...

/

Constructing Universal Strings

Constructing Universal Strings

Let's construct universal strings using Eulerian cycles.

We'll cover the following...

Now that you know how to use the de Bruijn graph to solve the String Reconstruction Problem, you can also construct a k-universal string for any value of k. We should note that de Bruijn was interested in constructing k-universal circular strings. For example, 00011101 is a 3-universal circular string, as it contains each of the eight binary 3-mers exactly once (in the figure given below).

k-Universal Circular String Problem

Problem overview:
Find a k-universal circular string.

Input: An integer k.
Output: A k-universal circular string.

Sample dataset:

4

Sample output:

0000110010111101

The code below shows the k-Universal Circular String Problem.

Press + to interact
import itertools
def k_universal_string_problem(k):
## get an Eulerian cycle from a de Bruijn graph made from the k-mers
cycle = eulerian_cycle_problem(debrujin_graph_from_kmers(binary_strings(k)))
## set cycle from start to k-1
cycle = cycle[:-(k-1)]
## set genome to be the first value in cycle after removing the final character
genome = cycle[0][:-1]
## append the last character of every cycle to genome
for i in cycle:
genome += i[-1]
return genome
## get Eulerian cycles
def eulerian_cycle_problem(dict):
stack = []
random_vertex = sorted(dict.keys())[0]
stack.append(random_vertex)
path = []
while stack != []:
u_v = stack[-1]
try:
w = dict[u_v][0]
stack.append(w)
dict[u_v].remove(w)
except:
path.append(stack.pop())
return path[::-1]
def binary_strings(k):
universe = ["0", "1"]
kmers = ["".join(el) for el in itertools.product(universe, repeat=k)]
return sorted(kmers)
## get a de Bruijn graph from k-mers
def debrujin_graph_from_kmers(patterns):
kmers = []
for pattern in patterns:
kmers = kmers+suffix_composition(len(pattern), pattern, uniq=True)
kmers = set(kmers)
dict = {}
for kmer1 in kmers:
dict[kmer1] = []
for kmer in patterns:
dict[prefix(kmer)].append(suffix(kmer))
return dict
def suffix_composition(k, text, uniq=False):
kmers = []
for i in range(len(text)+1-k):
kmers.append(text[i:i+k-1])
if uniq:
return sorted(list(kmers))
else:
return sorted(kmers)
def suffix(string):
return string[1:]
def prefix(string):
return string[0:-1]
print(k_universal_string_problem(4))

Like its analogue for linear strings, the k-Universal Circular String Problem is just a specific case of a more general problem, which requires us to reconstruct a circular string given its k-mer composition. This problem models the assembly of a circular genome containing a single chromosome, like the genomes of most bacteria.

Circular string construction

We know that we can reconstruct a circular string from its k-mer composition by finding an Eulerian cycle in the de Bruijn graph constructed from these k-mers. Therefore, we can construct a k-universal circular binary string by finding an Eulerian cycle in the de Bruijn graph constructed from the ...