Assembling genomes: Transforming Read-Pairs to Long Virtual Reads

Let’s see how to transform read-pairs into long virtual reads and how to construct de Bruijn graphs from those long virtual reads.

Constructing a de Bruijn graph from long virtual reads

Let Reads be the collection of all 2N k-mer reads taken from N read-pairs. Note that a read-pair formed by k-mer reads Read_1 and Read_2 corresponds to two edges in the de Bruijn graph DeBruijn_k(Reads). Since these reads are separated by distance d in the genome, there must be a path of length k + d + 1 in DeBruijn_k(Reads) connecting the node at the beginning of the edge corresponding to Read_1 with the node at the end of the edge corresponding to Read_2, as shown in the figure below. Because each node is a (k − 1)-mer and each edge adds one symbol, a path of k + d + 1 edges spells out a string of (k − 1) + (k + d + 1) = 2 · k + d symbols. If there’s only one path of length k + d + 1 connecting these nodes, or if all such paths spell out the same string, then we can transform the read-pair formed by Read_1 and Read_2 into a virtual read of length 2 · k + d that starts as Read_1, spells out this path, and ends with Read_2.
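
The transformation described above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names `de_bruijn` and `virtual_read`, the toy genome string, and the values k = 3 and d = 1 are all assumptions made for the example. The sketch enumerates every path of k + d + 1 edges by brute force, which is only practical for small inputs.

```python
from collections import defaultdict


def de_bruijn(reads):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes of the k-mer reads."""
    graph = defaultdict(set)
    for read in reads:
        graph[read[:-1]].add(read[1:])
    return graph


def virtual_read(graph, read1, read2, d):
    """Return the long virtual read for (read1, read2), or None if no
    unique string is spelled by the paths of k + d + 1 edges."""
    k = len(read1)
    start = read1[:-1]        # node at the beginning of the edge for read1
    target = read2[1:]        # node at the end of the edge for read2
    n_edges = k + d + 1       # path length given in the text
    spelled = set()

    # Depth-first enumeration of all paths with exactly n_edges edges.
    stack = [(start, start, 0)]  # (current node, string so far, edges used)
    while stack:
        node, text, used = stack.pop()
        if used == n_edges:
            if node == target:
                spelled.add(text)
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, text + nxt[-1], used + 1))

    # Accept only if every such path spells out the same string.
    return spelled.pop() if len(spelled) == 1 else None


# Toy data (hypothetical): read-pairs sampled from a short string.
genome = "TAATGCCATGGGATGTT"
k, d = 3, 1
pairs = [(genome[i:i + k], genome[i + k + d:i + 2 * k + d])
         for i in range(len(genome) - 2 * k - d + 1)]
reads = [read for pair in pairs for read in pair]  # 2N reads from N pairs
graph = de_bruijn(reads)

print(virtual_read(graph, "TAA", "GCC", d))  # TAATGCC
print(virtual_read(graph, "ATG", "CAT", d))  # None
```

In this toy example, the pair (TAA, GCC) yields the virtual read TAATGCC of length 2 · k + d = 7, because only one path of 5 edges connects TA to CC. The pair (ATG, CAT) cannot be transformed, because two paths of 5 edges connect AT back to AT and they spell out different strings (ATGCCAT and ATGGGAT).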
