Charging Station: Generating the Neighborhood of a String
Let’s look at how we can generate the neighborhood of a string by employing different algorithms.
We'll cover the following...
The d-neighborhood
Our goal is to generate the d-neighborhood Neighbors(Pattern, d), the set of all k-mers whose Hamming distance from Pattern doesn’t exceed d. We’ll first generate the 1-neighborhood of Pattern using the following pseudocode:
ImmediateNeighbors(Pattern)
Neighborhood ← the set consisting of the single string Pattern
for i = 1 to |Pattern|
symbol ← i-th nucleotide of Pattern
for each nucleotide x different from symbol
Neighbor ← Pattern with the i-th nucleotide substituted by x
add Neighbor to Neighborhood
return Neighborhood
Our idea for generating Neighbors(Pattern, d) is as follows. If we remove the first symbol of Pattern (denoted FirstSymbol(Pattern)) then we’ll obtain a (k − 1)-mer that we denote by Suffix(Pattern).
STOP and Think: If we know Neighbors(Suffix(Pattern), d), how does it help us construct Neighbors(Pattern, d)?
Now, consider a (k − 1)-mer ...