Assembling genomes: Paired De Bruijn Graphs
Let’s talk about paired de Bruijn graphs.
Given a (k, d)-mer (a . . . a | b, . . . b), we define its prefix and suffix as the following (k −1, d + 1)-mers:
Prefix((a …a |b,…b)) = (a …a |b …b)
Suffix((a …a |b,…b)) = (a …a |b …b)
For example Prefix((GAC | TCA)) = (GA | TC) and Suffix((GAC | TCA)) = (AC | CA). Note that for consecutive (k, d)-mers appearing in Text, the suffix of the first (k, d)-mer is equal to the prefix of the second (k, d)-mer. For example for the consecutive (k, d)-mers (TAA | GCC) and (AAT | CCA) in TAATGCCATGGGATGTT.
Suffix((TAA | GCC)) = Prefix((AAT | CCA)) = (AA | CC)
Constructing a path graph
Given a string Text, we construct a graph PathGraph (Text) that represents a path formed by |Text| − (k + d + k) + 1 edges corresponding to all (k, d)-mers in Text. We label edges in this path by (k, d)-mers and label the starting and ending nodes of an edge by its prefix and suffix, respectively (see figure given below).
Get hands-on with 1400+ tech skills courses.