Feature #6: Identify a Species
Implement the "Identify a Species" feature for our "Computational Biology" project.
Description
The DNA for an alien species consists of a sequence of nucleotides. We can uniquely identify a species by finding the longest substring of nucleotides in the DNA, where no nucleotide appears twice. This substring of DNA is known as a species marker. Given an animal’s DNA, where each nucleotide is represented by a letter, we can find its species marker.
The following examples might clarify the requirements:
Solution
The basic idea is to traverse the entire sequence of nucleotides and search for a substring in which no nucleotides appear twice. For each visited nucleotide, we can store its last occurrence in a hash table, with the key as the nucleotide and the value as its last position in the sequence.
Here’s how we implement this feature.
-
We initialize the following set of variables to
0
to keep track of the visited nucleotides:st_curr
: the starting index of the current substring.longest
: the length of the longest substring.start
: the
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.