Assembling genomes: From Composition to Paired Composition

Let’s find out how the idea of k-mer composition extends to paired composition.

Notation of paired compositions

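Given a string Text, a (k, d)-mer is a pair of k-mers in Text separated by distance d. This means exactly d symbols lie between the end of the first k-mer and the start of the second, so if the first k-mer starts at position i, the second starts at position i + k + d. We use the notation $(\text{Pattern}_1 \mid \text{Pattern}_2)$ to refer to a (k, d)-mer whose k-mers are $\text{Pattern}_1$ and $\text{Pattern}_2$. For example, (AAT | TGG) is a (3, 4)-mer in TAATGCCATGGGATGTT: counting positions from 0, AAT starts at position 1 and TGG starts at position 1 + 3 + 4 = 8, with the four symbols GCCA between them. The (k, d)-mer composition of Text, denoted $\text{PairedComposition}_{k,d}(\text{Text})$, is the collection of all (k, d)-mers in Text (including repeated (k, d)-mers). For example, here’s $\text{PairedComposition}_{3,1}(\text{TAATGCCATGGGATGTT})$, listed in the order in which each (3, 1)-mer starts in Text:

(TAA | GCC) (AAT | CCA) (ATG | CAT) (TGC | ATG) (GCC | TGG) (CCA | GGG) (CAT | GGA) (ATG | GAT) (TGG | ATG) (GGG | TGT) (GGA | GTT)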
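The definition of the (k, d)-mer composition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `paired_composition` and `format_pair` are hypothetical, positions are counted from 0, and the (k, d)-mers are returned in the order in which they start in Text (sort the list if you prefer lexicographic order). Each (k, d)-mer covers 2k + d symbols, so a string of length n has n - (2k + d) + 1 of them.

```python
def paired_composition(text: str, k: int, d: int) -> list[tuple[str, str]]:
    """Return all (k, d)-mers of text in order of their starting position.

    A (k, d)-mer starting at position i pairs text[i:i+k] with the k-mer
    that begins d symbols after the first one ends, i.e. text[i+k+d:i+2k+d].
    Repeated (k, d)-mers are kept, so the list represents a multiset.
    """
    span = 2 * k + d  # total number of symbols covered by one (k, d)-mer
    return [
        (text[i:i + k], text[i + k + d:i + span])
        for i in range(len(text) - span + 1)  # empty if text is shorter than span
    ]


def format_pair(pair: tuple[str, str]) -> str:
    """Render a (k, d)-mer in the (Pattern1 | Pattern2) notation."""
    return f"({pair[0]} | {pair[1]})"


if __name__ == "__main__":
    text = "TAATGCCATGGGATGTT"
    # PairedComposition_{3,1}(TAATGCCATGGGATGTT), one (3, 1)-mer per start position
    print(" ".join(format_pair(p) for p in paired_composition(text, 3, 1)))
    # The example from the text: (AAT | TGG) is a (3, 4)-mer of the same string
    print(("AAT", "TGG") in paired_composition(text, 3, 4))  # True
```

Slicing keeps the code close to the definition: the first k-mer is `text[i:i + k]` and the second starts k + d symbols later, which is exactly the i + k + d offset from the definition.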
