Charging Station: Generating the Neighborhood of a String

Let’s look at how we can generate the neighborhood of a string by employing different algorithms.

We'll cover the following...

The d-neighborhood
Iterative version of Neighbors

The d-neighborhood

Our goal is to generate the d-neighborhood Neighbors(Pattern, d), the set of all k-mers whose Hamming distance from Pattern doesn’t exceed d. We’ll first generate the 1-neighborhood of Pattern using the following pseudocode:

  ImmediateNeighbors(Pattern)
    Neighborhood ← the set consisting of the single string Pattern 
    for i = 1 to |Pattern|
        symbol ← i-th nucleotide of Pattern
        for each nucleotide x different from symbol
            Neighbor ← Pattern with the i-th nucleotide substituted by x 
            add Neighbor to Neighborhood
    return Neighborhood

Our idea for generating Neighbors(Pattern, d) is as follows. If we remove the first symbol of Pattern (denoted FirstSymbol(Pattern)) then we’ll obtain a (k − 1)-mer that we denote by Suffix(Pattern).

STOP and Think: If we know Neighbors(Suffix(Pattern), d), how does it help us construct Neighbors(Pattern, d)?

Now, consider a (k − 1)-mer $Pattern^{\prime}$ ...

Before Getting Started

Where in the Genome Does DNA Replication Begin?

DNA Replication: Open Problems, Charging Stations, and Detours

How Do We Assemble Genomes?

Assemble Genomes: Charging Stations, and Detours

How Do We Compare Biological Sequences?

Biological Sequences: Detours

Conclusion

Charging Station: Generating the Neighborhood of a String

The d-neighborhood