Solution: Count the Number of Good Subsequences

Let's solve the Count the Number of Good Subsequences problem using the Dynamic Programming pattern.

Statement

Count and return the number of good subsequences in the given string s. You may return the modulo 109+710^9 + 7 of the count.

  • A subsequence is a sequence formed from another sequence by deleting some or no elements while keeping the order of the remaining elements unchanged.

  • good subsequence is a subsequence of a string if it is not empty and the frequency of each character is the same.

Constraints:

  • 11 \leqs.length 104\leq 10^4

  • s will only contain lowercase English characters.

Solution

The solution to this problem involves calculating the combinations. We will start by calculating the frequency of each character and then iterate over the frequencies, from 1 to the highest frequency of any character. We will count subsequences where each character appears for some or all of its frequency, i.e., calculating combinations. The count is summed up in each iteration, and after the loop terminates, 1 is deducted from the final count. The reason for deducting 1 is that the empty subsequence will also be counted. Because empty subsequence does not qualify as a good subsequence, we will remove its count from the final calculation.

Computing the combinations is a typical Binomial Coefficient problemThe binomial coefficient, often denoted as "n choose k," calculates the number of ways to choose k elements from a set of n distinct elements., known as n choose k problem, which is in mathematics written as (nk)=n!k!(nk)!\binom{n}{k} = \frac{n!}{k! \cdot (n - k)!}. Counting the combinations would require us to calculate the factorial and division by factorial, calculated by multiplying with the modular inverses.

Because calculating the factorial[i] requires calculating the [factorial[i-1]..factorial[2]] again and again, therefore we will use dynamic programming to calculate the factorial[i] by multiplying i only with factorial[i — 1]. That is, we will reuse the existing results instead of calculating them again.

The algorithm to solve this problem is as follows:

  • Initialize two arrays: factorials and inverses of size n + 1, an integer array, character_frequencies, of the same size to count the frequencies of each character and an integer variable finalCount with 00.

  • Start a loop from 11 to n + 1. Inside the loop, compute the factorial and inverses of each ith number and store it in factorials[i] and inverses[i], respectively.

  • Start a loop from 00 to n + 1 and count the frequencies of each character of s. Moreover, keep the record of maximum frequency and save it in max_frequency.

  • Start a nested loop, where the outer loop, controlled by i, runs for max_frequency times and the inner loop, controlled by j runs for 2626 times.

    • Inside the outer loop, initialize a variable count with the value. Perform the following steps Inside the inner loop:

      • Compute the number of combinations, that is, n choose k, where n is the value of character_frequency[j], and k is the value of i. The number of combinations can be calculated with (factorials[n] * inverses[k] % MOD) * inverses[n-k] % MOD.

      • Add 11 to the number of combinations. This is because we consider the possibility of not choosing a character when constructing subsequences. In other words, for each character that can be included in a subsequence, there are two choices: either include the character or do not. Adding 11 captures these two possibilities. For example, if we have the string aabbcaabbc, we can either choose one aa and one bb or one aa and no bb, or no aa and one bb, or neither aa nor bb. If we don't add 11 to our calculation, we would miss the possibility of having a non-empty subsequence that excludes both aa and bb.

      • Update the count by multiplying this calculation with the current value of the count.

    • Deduct 11 from the count, add it to the current value of final_count, and take the modulus of this value. Update the final_count value with this new value. The reason to deduct 11 from the count is that we need to exclude the empty subsequence.

  • After the loop terminates, the value of final_count holds the number of good subsequences in the input string s.

Let’s look at the following illustration to get a better understanding of the solution:

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.