Feature #2: Detect Virus
Implementing the "Detect Virus" feature for our "Computational Biology" project.
Description
While studying different DNA samples, we observed that a certain virus consists of very long sequences of k distinct nucleotides. The virus infects a species by embedding itself into the species’s DNA. We are devising a test to detect the virus. The idea is to find the longest substring of a species’s DNA that contains at most k distinct nucleotides.
We’ll be provided with a string representing a chromosome from the infected DNA and a value k supplied by a hidden function. Our task is to find the longest substring of the chromosome string that contains at most k distinct nucleotides.
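To pin down the expected output before optimizing, here is a minimal brute-force sketch that checks every starting position. The function name `longest_at_most_k_bruteforce` and the sample chromosome are hypothetical, chosen only for illustration. It assumes the goal is the longest contiguous run with at most k distinct characters, and that ties go to the earliest such run.

```python
def longest_at_most_k_bruteforce(chromosome: str, k: int) -> str:
    """Quadratic-time reference: longest substring with at most k distinct characters."""
    best = ""
    for start in range(len(chromosome)):
        seen = set()  # distinct nucleotides in chromosome[start:end + 1]
        for end in range(start, len(chromosome)):
            seen.add(chromosome[end])
            if len(seen) > k:
                break  # extending further can only keep the window invalid
            if end - start + 1 > len(best):
                best = chromosome[start:end + 1]
    return best


print(longest_at_most_k_bruteforce("AACCGGGT", 2))  # CCGGG
```

With k = 2, the candidates in "AACCGGGT" are "AACC", "CCGGG", and "GGGT", so the longest valid substring is "CCGGG".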
Here is an illustration to better understand this process:
Solution
Since we want to return a substring from a specific window over the original string, we can use a sliding window approach to accomplish this efficiently. We’ll use two pointers, left and right, to denote the boundaries of our sliding window.
Initially, both pointers will be at the beginning of the string, at position 0. We’ll keep moving the right pointer to the right as long as there are at most k distinct characters in our window. If we get k + 1 distinct characters at any point, the left pointer will be moved to the ...