Count Substrings

Solve an easy problem of counting the number of unique substrings in a string using tries.

Problem statement

Given a stringSS, count and return the number of unique substrings of SS.

Example 1

Sample input

S = "aabb"

Sample output

8

Explanation

The set of distinct substrings for "aabb" is ["a", "aa", "aab", "aabb", "ab", "abb", "b", "bb"].

Example 2

Sample input

S = "gfedcba"

Sample output

28

Explanation

The set of distinct substrings for "gfedcba" is ["g","f","e","d","c","b","a","gf","fe","ed","dc", "cb","ba","gfe","fed","edc","dcb","cba","gfed","fedc","edcb","dcba","gfedc","fedcb","edcba", "gfedcb","fedcba","gfedcba"].

Try it yourself

Try to solve the problem yourself before reading the solution.

#include <iostream>
#include <vector>
using namespace std;
int countDistinctSubStr(string str) {
// your code goes here
return -1;
}
Solve count substrings in C++

Intuition

Let's try to break the problem. Instead of finding unique substrings, let's find all the substrings of a given string. For this, a nested loop that iterates over all the combinations of the start and end indexes of a string will generate all the possible substrings.

Press + to interact
// iterate from 0 to len(str)
for(int i=0; i<strLen; i++){
//iterate from ith index to len(str)
for(int j=i; j<strLen; j++){
//substring = chacraters from index i to j (inclusive)
str.substr(i, j);
}
}

If we insert two strings in a trie, only a single path is createdSingle_path_trie_for_same_strings because the set of alphabets is the same. This property of tries can help us identify the count of unique substrings.

Algorithm

The summary of the algorithm is given below.

Insert substrings in a trie and maintain the count

Iterate through the given string SSand generate its substrings using a nested loop. Then, insert every substring into the trie and maintain its count. While inserting the substring, check whether a string is already present in the trie.

If while inserting a string in the trie, the last node's isEndOfWord is already true, it means the string has been inserted in the trie previously. Otherwise, if while inserting a string in the trie, the last node's isEndOfWord is false, it means it hasn't been inserted in the trie before. Therefore, we increment the unique substring counter and mark isEndOFWord as true for the last node of the current substring.

Example walk-through

Let's try to understand the procedure better with the help of the given example string aaab.

Step 1: While inserting the substring a, a new trie node for a is created since it doesn't already exist. Then, we increment the unique string counter and mark isEndOfWord as true.

Step 2: While inserting the substring aa, a new trie node for a is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord for the newly added node for a as true.

Step 3: While inserting the substring aaa, a new trie node for a is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord for the newly added a node as true.

Step 4: While inserting the substring aaab, a new trie node for b is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord for the newly added b node as true.

Step 5: While inserting the substring a, there is no need to create a new trie node since a is already present. Also, since isEndOfWord is true already for this node, the unique string counter is not incremented.

Step 6: While inserting the substring aa, there is no need to create a new trie node since aa is already present. Also, for the last node in the string aa, isEndOfWord is already true, so the unique string counter is not incremented.

Step 7: While inserting the substring aab, a new trie node b is created since the path aab didn't already exist. So we increment the unique string counter and mark isEndOfWord as true.

Step 8: While inserting the substring ab, a new trie node b is created since the path ab didn't already exist. Here also, for the last node b, isEndOfWord is false, so we increment the unique string counter and mark isEndOfWord as true.

Step 9: While inserting the substring b" a new trie node b is created since the path b didn't already exist. Here also, for the last node b, isEndOfWord is false, so we increment ...

Access this course and 1400+ top-rated courses and projects.