Feature #2: Detect Virus

Implementing the "Detect Virus" feature for our "Computational Biology" project.

Description

While studying different DNA samples, we observed that a certain virus consists of very long sequences of k distinct nucleotides. The virus infects a species by embedding itself into the species’s DNA. We are devising a test to detect the virus. The idea is to find the longest substring of a species’s DNA that contains at most k distinct nucleotides.

We’ll be given a string representing a chromosome from the infected DNA and a value k supplied by a hidden function. Our task is to find the longest substring (a contiguous run of characters, not a subsequence) of the chromosome string that contains at most k unique nucleotides.
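To make the task concrete, here is a minimal brute-force sketch in Python that checks every starting position. The function name `longest_at_most_k_brute` and the sample chromosome are hypothetical and chosen only for illustration. It is a slow reference for what the answer should be, not the approach used in the solution.

```python
def longest_at_most_k_brute(chromosome: str, k: int) -> str:
    """Return the first longest substring with at most k distinct characters.

    Brute-force reference: O(n^2) time in the worst case.
    """
    best = ""
    n = len(chromosome)
    for start in range(n):
        seen = set()  # distinct characters in chromosome[start:end + 1]
        for end in range(start, n):
            seen.add(chromosome[end])
            if len(seen) > k:
                break  # extending further can only keep too many characters
            if end - start + 1 > len(best):
                best = chromosome[start:end + 1]
    return best


# Made-up sample input: with k = 2, "GTTTT" and "TTTTA" both have length 5;
# the sketch keeps the first one it encounters.
print(longest_at_most_k_brute("ACCGTTTTACG", 2))
```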

Here is an illustration to better understand this process:

Solution

Since we want to return a substring from a specific window over the original string, we can use a sliding window approach to accomplish this efficiently. We’ll use two pointers, left and right, to denote the boundaries of our sliding window.

Initially, both pointers will be at the beginning of the string, at position 0. We’ll keep moving the right pointer to the right as long as there are at most k distinct characters in our window. If we get k + 1 distinct characters at any point, the left pointer will be moved to the ...
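The two-pointer idea can be sketched in Python as follows. This is one possible implementation under stated assumptions, not necessarily the lesson's own code: the function name `detect_virus` is hypothetical, and the sketch assumes the left pointer advances one character at a time until the window is back to at most k distinct characters.

```python
from collections import defaultdict


def detect_virus(chromosome: str, k: int) -> str:
    """Return the first longest substring with at most k distinct characters."""
    if k <= 0:
        return ""

    counts = defaultdict(int)  # character -> occurrences inside the window
    left = 0
    best_start, best_len = 0, 0

    for right, ch in enumerate(chromosome):
        counts[ch] += 1

        # Assumption: shrink from the left one character at a time until
        # the window holds at most k distinct characters again.
        while len(counts) > k:
            left_ch = chromosome[left]
            counts[left_ch] -= 1
            if counts[left_ch] == 0:
                del counts[left_ch]
            left += 1

        # Record the window if it is strictly longer than the best so far.
        window_len = right - left + 1
        if window_len > best_len:
            best_start, best_len = left, window_len

    return chromosome[best_start:best_start + best_len]


# Made-up sample input for illustration.
print(detect_virus("ACCGTTTTACG", 2))
```

Each character enters the window once and leaves it at most once, so this sketch runs in time linear in the length of the chromosome, with extra space proportional to the number of distinct characters held in the window.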
