Scoring Alignments
Explore what scoring matrices are and what is wrong with the LCS scoring model.
What is wrong with the LCS scoring model?
Recall Marahiel’s alignment of the A-domains coding for Asp and Orn, which had 19 + 10 matches:
It’s not difficult to construct an alignment having even more matches at the expense of introducing more indels. Yet the more indels we add, the less biologically relevant the alignment becomes, as it diverges further and further from the biologically correct alignment found by Marahiel. Below is the alignment with the maximum number of matches, representing an LCS of length 19 + 8 + 19 = 46 (the green symbols represent new matches). This alignment is so long that we can’t fit it on a single line.
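To see how quickly indels accumulate when we maximize matches alone, here is a minimal Python sketch of the standard dynamic programming computation of LCS length. The function names `lcs_length` and `indels_in_lcs_alignment` and the two toy strings are assumptions chosen for illustration; they are not the A-domain sequences from Marahiel’s alignment. The sketch relies on the fact that an alignment made up only of matches and indels, with m matches, must contain |v| + |w| - 2m indels.

```python
def lcs_length(v: str, w: str) -> int:
    """Return the length of a longest common subsequence of v and w."""
    # prev[j] holds the LCS length of the first i-1 symbols of v
    # and the first j symbols of w.
    prev = [0] * (len(w) + 1)
    for i in range(1, len(v) + 1):
        curr = [0] * (len(w) + 1)
        for j in range(1, len(w) + 1):
            if v[i - 1] == w[j - 1]:
                # Matching symbols extend a common subsequence by one.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise skip a symbol in v or in w (an indel).
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(w)]


def indels_in_lcs_alignment(v: str, w: str) -> int:
    """Count the indels in a match-and-indel-only alignment that realizes an LCS."""
    # Every symbol that is not part of a match must be paired with a gap.
    return len(v) + len(w) - 2 * lcs_length(v, w)


# Toy strings (hypothetical example, not the A-domain sequences).
v, w = "ATGTTATA", "ATCGTCC"
print(lcs_length(v, w))               # 4
print(indels_in_lcs_alignment(v, w))  # 7
```

For these toy strings, four matches come at the cost of seven indels, and the LCS model treats those indels as free.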
STOP and Think: If Marahiel had constructed this alignment, would he have been able to infer the eight amino acid-long signatures of the non-ribosomal code?
Below, the amino acids representing the non-ribosomal signatures are highlighted in purple. Although these signatures are grouped in eight conserved columns in Marahiel’s alignment from the beginning of the chapter, only five of these columns have “survived” in the LCS alignment above, making it impossible to infer the non-ribosomal signatures:
The frivolous matches that hide the real evolutionary scenario appeared because nothing stopped us from introducing an excessive number of indels when building an LCS. Our original alignment game only rewarded matched symbols, so we need some way of penalizing indels and mismatches. First, let’s handle indels. Say that in addition to assigning matches a premium of +1, we decide to assess each indel a penalty of -4. The top-scoring alignment of the ...