Computing alignment score using linear memory

The runtime of the dynamic programming algorithm for aligning two strings of lengths n and m is proportional to the number of edges in their alignment graph, which is $O(n · m)$ . The memory required by this algorithm is also $O(n · m)$ , since we need to store the backtracking references. We’ll now demonstrate how to construct an alignment in just $O(n)$ space at the expense of doubling the runtime, meaning that the runtime is still $O(n · m)$ . In this section, for the sake of simplicity, we’ll consider the case of global alignment with non-affine gap penalties, i.e., when each symbol in a gap contributes the same score.

Divide-and-conquer algorithm

A divide-and-conquer algorithm often works when a solution to a large problem can be constructed from solutions of smaller problem instances. Such a strategy proceeds in two phases. The divide phase splits a problem instance into smaller instances and solves them; the conquer phase stitches the smaller solutions into a solution to the original problem. To learn more, see DETOUR: Divide-and-Conquer Algorithms.

Before we proceed with the divide-and-conquer algorithm for linear-space alignment, note that if we only wish to compute the score of an alignment rather than the alignment itself, then the space required can easily be reduced to just twice the number of nodes in a single column of the alignment graph, or $O(n)$ . This reduction derives from the observation that the only values needed to compute the alignment scores in column j are the alignment scores in column $j - 1$ . Therefore, the alignment scores in the columns before column $j - 1$ can be discarded when computing the alignment scores for column j, as illustrated in the figure below. Unfortunately, finding the longest path requires us to store an entire matrix of backtracking pointers, which causes the quadratic space requirement. The idea of the space reduction technique is that we don’t need to store any backtracking pointers if we’re willing to spend a little more time. In fact, we’ll show how to construct an alignment without storing any backtracking pointers.

Before Getting Started

Where in the Genome Does DNA Replication Begin?

DNA Replication: Open Problems, Charging Stations, and Detours

How Do We Assemble Genomes?

Assemble Genomes: Charging Stations, and Detours

How Do We Compare Biological Sequences?

Biological Sequences: Detours

Conclusion

Space Efficient Sequence Alignment

Computing alignment score using linear memory

Divide-and-conquer algorithm

The Middle Node Problem