A Final Attempt at Finding DnaA Boxes in E. coli

Learn that computational predictions, though powerful, can be inconclusive.

We now make a final attempt to find DnaA boxes in E. coli by finding the most frequent 9-mers with mismatches and reverse complements in the region that the minimum skew suggests as ori. Although the minimum of the skew diagram for E. coli is found at position 3923620, we shouldn’t assume that ori lies exactly at this position, because random fluctuations in the skew can shift the minimum. To remedy this issue, we could choose a larger window size (e.g., L = 1000), but expanding the window introduces the risk of bringing in other clumped 9-mers that don’t represent DnaA boxes yet appear in this window more often than the true DnaA box. It makes more sense to try a small window either starting, ending, or centered at the position of minimum skew.
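The window choices described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `skew_minima` and `candidate_windows` are hypothetical, positions are assumed to be 0-based with half-open windows, and the window length is assumed not to exceed the genome length.

```python
def skew_minima(genome):
    """Return every prefix length i at which skew(i) = #G - #C in genome[:i] is minimal."""
    skew = 0
    best = 0          # skew of the empty prefix
    positions = [0]
    for i, base in enumerate(genome, start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        if skew < best:
            best = skew
            positions = [i]
        elif skew == best:
            positions.append(i)
    return positions


def candidate_windows(position, length, genome_length):
    """Return (start, end) pairs for windows starting, centered, or ending at position.

    Assumption: 0-based, half-open coordinates, so a window is genome[start:end].
    Windows are shifted inward if they would run past either end of the genome.
    """
    placements = {
        "starting": position,
        "centered": position - length // 2,
        "ending": position - length,
    }
    windows = {}
    for name, start in placements.items():
        start = max(0, min(start, genome_length - length))
        windows[name] = (start, start + length)
    return windows


# The genome_length below is a placeholder for illustration, not the real E. coli length.
windows = candidate_windows(3923620, 500, genome_length=5_000_000)
```

Each resulting pair can be used to slice the genome string, for example `genome[start:end]`, before searching that slice for frequent 9-mers.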

Let’s cross our fingers and identify the most frequent 9-mers (with 1 mismatch and reverse complements) within a window of length 500 starting at position 3923620 of the E. coli genome. Bingo! The experimentally confirmed DnaA box in E. coli (TTATCCACA) is a most frequent 9-mer with 1 mismatch, along with its reverse complement TGTGGATAA.
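A search of this kind can be sketched as follows. This is one straightforward way to count k-mers with mismatches and reverse complements, assuming the text contains only the letters A, C, G, and T; the function names are illustrative and the code is not tuned for speed. For each k-mer in the window, every string within Hamming distance d is credited once, and so is that string's reverse complement.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(pattern):
    """Return the reverse complement of a DNA string."""
    return pattern.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(pattern, d):
    """Return the set of all strings within Hamming distance d of pattern."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set("ACGT")
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming(pattern[1:], suffix) < d:
            # A mismatch is still available, so any first base is allowed.
            for base in "ACGT":
                result.add(base + suffix)
        else:
            # All d mismatches are used up, so the first base must match.
            result.add(pattern[0] + suffix)
    return result


def frequent_words_mismatches_rc(text, k, d):
    """Return all k-mers maximizing Count_d(text, p) + Count_d(text, rc(p))."""
    counts = Counter()
    for i in range(len(text) - k + 1):
        for neighbor in neighbors(text[i:i + k], d):
            counts[neighbor] += 1                      # approximate match on this strand
            counts[reverse_complement(neighbor)] += 1  # approximate match on the other strand
    if not counts:
        return []
    best = max(counts.values())
    return sorted(p for p, c in counts.items() if c == best)
```

Given the genome as a string (loading it is left out here), the search described above would be `frequent_words_mismatches_rc(genome[3923620:3923620 + 500], 9, 1)`, again assuming 0-based positions.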
