...

Epilogue: Multiple Sequence Alignment

Learn multiple sequence alignment with the help of a three-dimensional Manhattan and a greedy multiple alignment algorithm.

We'll cover the following...

Building a three-dimensional Manhattan
- T-way alignment
A greedy multiple alignment algorithm

Amino acid sequences of proteins performing the same function are likely to be somewhat similar, but these similarities may be elusive in the case of distant species. You now possess an arsenal of algorithms for aligning pairs of sequences, but if sequence similarity is weak, pairwise alignment may not identify biologically related sequences. However, simultaneous comparison of many sequences often allows us to find similarities that pairwise sequence comparison fails to reveal. Bioinformaticians sometimes say that while pairwise alignment whispers, multiple alignment shouts.

Building a three-dimensional Manhattan

We’re now ready to use pairwise sequence analysis to build up our intuition for comparison of multiple sequences. In our three-way alignment of A-domains from the introduction, we found 19 conserved columns:

The score of a multiple alignment is defined as the sum of the scores of its columns (or, equivalently, the sum of the weights of the edges in the alignment path), and an optimal alignment is one that maximizes this score. For an amino acid alphabet, a very general scoring method is a t-dimensional matrix with $21^{t}$ entries that give the score of every possible combination of t symbols (the 20 amino acids plus the space symbol). For more details, see DETOUR: Scoring Multiple Alignments. Intuitively, we should reward more conserved columns with higher scores. For example, in the Multiple Longest Common Subsequence Problem, the score of a column is 1 if all of the column’s symbols are identical, and 0 if even one symbol disagrees.
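The column-based scoring described above can be sketched in a few lines. This is only an illustration, not the lesson's own implementation: the function names `column_score`, `alignment_score`, and `scoring_matrix_entries` are hypothetical, the space symbol is assumed to be written as `-`, and a column made up entirely of space symbols is assumed to score 0.

```python
from typing import Sequence


def column_score(column: Sequence[str]) -> int:
    """Multiple LCS column score: 1 if every symbol is the same letter, else 0.

    Assumption: '-' is the space symbol, and an all-space column scores 0.
    """
    first = column[0]
    if first == "-":
        return 0
    return 1 if all(symbol == first for symbol in column) else 0


def alignment_score(rows: Sequence[str]) -> int:
    """Score of a multiple alignment: the sum of its column scores."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*rows) walks the alignment column by column.
    return sum(column_score(column) for column in zip(*rows))


def scoring_matrix_entries(t: int, alphabet_size: int = 21) -> int:
    """Number of entries in a fully general t-dimensional scoring matrix."""
    return alphabet_size ** t


# A hypothetical three-row alignment, used only to exercise the functions.
rows = ["ATGC-", "AT-CG", "ATGCG"]
print(alignment_score(rows))       # 3 fully conserved columns
print(scoring_matrix_entries(3))   # 21**3 = 9261
```

The exponential growth of the general matrix is one reason simple rules such as the Multiple LCS score are attractive: they can be computed from the column itself, with no table to store.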

import itertools

# Multiple LCS scoring: a column scores 1 only if all of its symbols agree.
MATCH = 1
MISMATCH = 0
GAP = 0

def MultipleAlignment(dna):
    k = len(dna)  # number of sequences, i.e. the dimension of the grid

    def go():
        # All backward moves: each coordinate steps by 0 (gap) or -1 (consume a symbol).
        g = itertools.product([0, -1], repeat=k)
        next(g)  # skip the all-zero move, which goes nowhere
        return g

    score = {}      # best score of a path from the source to each cell
    backtrack = {}  # move used to reach each cell optimally
    # Cells are produced in lexicographic order, so predecessors are always scored first.
    cells = itertools.product(*[range(len(d) + 1) for d in dna])
    start = next(cells)  # the source (0, ..., 0)
    score[start] = 0
    backtrack[start] = None
    for c in cells:
        score[c] = -10 ** 6  # effectively minus infinity
        backtrack[c] = None
        for d in go():
            prev = tuple(x + y for x, y in zip(c, d))
            if any(x < 0 for x in prev):
                continue  # this move would leave the grid
            if d.count(0):  # at least one sequence contributes a gap
                penalty = GAP
            elif any(dna[i][prev[i]] != dna[0][prev[0]] for i in range(k)):
                penalty = MISMATCH  # the symbols in the column disagree
            else:
                penalty = MATCH  # all symbols in the column are identical
            if score[c] < score[prev] + penalty:
                score[c] = score[prev] + penalty
                backtrack[c] = d
    c = tuple(len(d) for d in dna)  # the sink
    final_score = score[c]
    # The score does not require the alignment itself, but we recover it by backtracking.
    alignment = ['' for _ in dna]
    d = backtrack[c]
    while d:
        c = tuple(x + y for x, y in zip(c, d))  # step back to the predecessor cell
        for i, g in enumerate(d):
            alignment[i] += dna[i][c[i]] if g else '-'
        d = backtrack[c]
    # Rows were built from sink to source, so reverse each one.
    return '%d\n%s' % (final_score, '\n'.join(x[::-1] for x in alignment))

DNA = """AATATCCG
TCCGA
ATGTACTG""".splitlines()
print(MultipleAlignment(DNA))
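
The listing above enumerates backward moves with `itertools.product([0, -1], repeat=k)` and skips the all-zero move. The rough sketch below counts those moves and the number of grid cells, to show how the work grows with the number of sequences. The helper names `backward_moves` and `grid_size` are hypothetical and are not part of the lesson's code.

```python
import itertools
from typing import List, Sequence, Tuple


def backward_moves(t: int) -> List[Tuple[int, ...]]:
    """All moves into a cell of a t-dimensional grid.

    Each coordinate either stays put (0, a gap in that row) or steps back
    by one (-1, a symbol in that row). The all-zero move is excluded.
    """
    return [move for move in itertools.product([0, -1], repeat=t) if any(move)]


def grid_size(lengths: Sequence[int]) -> int:
    """Number of cells in the grid for sequences of the given lengths."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells


print(len(backward_moves(2)))  # 3 moves in the pairwise (two-dimensional) case
print(len(backward_moves(3)))  # 7 moves in the three-dimensional case
print(grid_size([8, 5, 8]))    # cells for the three strings in the listing above
```

In general a cell has up to 2^t - 1 incoming moves, so both the grid and the per-cell work grow quickly as more sequences are aligned at once.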
