Count Substrings
Solve an easy problem of counting the number of unique substrings in a string using tries.
Problem statement
Given a string
Example 1
Sample input
S = "aabb"
Sample output
8
Explanation
The set of distinct substrings for "aabb" is ["a", "aa", "aab", "aabb", "ab", "abb", "b", "bb"]
.
Example 2
Sample input
S = "gfedcba"
Sample output
28
Explanation
The set of distinct substrings for "gfedcba" is ["g","f","e","d","c","b","a","gf","fe","ed","dc", "cb","ba","gfe","fed","edc","dcb","cba","gfed","fedc","edcb","dcba","gfedc","fedcb","edcba", "gfedcb","fedcba","gfedcba"]
.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include <iostream>#include <vector>using namespace std;int countDistinctSubStr(string str) {// your code goes herereturn -1;}
Intuition
Let's try to break the problem. Instead of finding unique substrings, let's find all the substrings of a given string. For this, a nested loop that iterates over all the combinations of the start and end indexes of a string will generate all the possible substrings.
// iterate from 0 to len(str)for(int i=0; i<strLen; i++){//iterate from ith index to len(str)for(int j=i; j<strLen; j++){//substring = chacraters from index i to j (inclusive)str.substr(i, j);}}
If we insert two strings in a trie,
Algorithm
The summary of the algorithm is given below.
Insert substrings in a trie and maintain the count
Iterate through the given string
If while inserting a string in the trie, the last node's isEndOfWord
is already true, it means the string has been inserted in the trie previously. Otherwise, if while inserting a string in the trie, the last node's isEndOfWord
is false, it means it hasn't been inserted in the trie before. Therefore, we increment the unique substring counter and mark isEndOFWord
as true for the last node of the current substring.
Example walk-through
Let's try to understand the procedure better with the help of the given example string aaab
.
Step 1: While inserting the substring a
, a new trie node for a
is created since it doesn't already exist. Then, we increment the unique string counter and mark isEndOfWord
as true.
Step 2: While inserting the substring aa
, a new trie node for a
is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord
for the newly added node for a
as true.
Step 3: While inserting the substring aaa
, a new trie node for a
is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord
for the newly added a
node as true.
Step 4: While inserting the substring aaab
, a new trie node for b
is created since it doesn't already exist. We increment the unique string counter and mark isEndOfWord
for the newly added b
node as true.
Step 5: While inserting the substring a
, there is no need to create a new trie node since a
is already present. Also, since isEndOfWord
is true already for this node, the unique string counter is not incremented.
Step 6: While inserting the substring aa
, there is no need to create a new trie node since aa
is already present. Also, for the last node in the string aa
, isEndOfWord
is already true, so the unique string counter is not incremented.
Step 7: While inserting the substring aab
, a new trie node b
is created since the path aab
didn't already exist. So we increment the unique string counter and mark isEndOfWord
as true.
Step 8: While inserting the substring ab
, a new trie node b
is created since the path ab
didn't already exist. Here also, for the last node b
, isEndOfWord
is false, so we increment the unique string counter and mark isEndOfWord
as true.
Step 9: While inserting the substring b"
a new trie node b
is created since the path b
didn't already exist. Here also, for the last node b
, isEndOfWord
is false, so we increment ...