An Explosion of Hidden Messages

Learn more about hidden messages in multiple genomes and how we can identify clumps in a sequence.

Looking for hidden messages in multiple genomes

We should not jump to the conclusion that ATGATCAAG/CTTGATCAT is a hidden message for all bacterial genomes without first checking whether it even appears in known ori regions from other bacteria. After all, maybe the clumping effect of ATGATCAAG/CTTGATCAT in the ori region of Vibrio cholerae is simply a statistical fluke that has nothing to do with replication. Or maybe different bacteria have different DnaA boxes.

Let’s check the proposed ori region of Thermotoga petrophila, a bacterium that thrives in extremely hot environments; its name derives from its discovery in the water beneath oil reservoirs, where temperatures can exceed 80° Celsius.


This region doesn’t contain a single occurrence of ATGATCAAG or CTTGATCAT! Thus, different bacteria may use different DnaA boxes as “hidden messages” to the DnaA protein.
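A check like this one can be sketched in a few lines of Python. The snippet below is only an illustration: the function names and the short `toy_region` string are made up for this example, and the real ori sequence is not reproduced here. It counts a 9-mer together with its reverse complement, including overlapping occurrences.

```python
# Minimal sketch: count a k-mer and its reverse complement in a DNA string.
# All names and the toy sequence are hypothetical, not taken from the lesson.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(pattern: str) -> str:
    """Complement every nucleotide, then reverse the string."""
    return pattern.translate(COMPLEMENT)[::-1]


def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def count_with_reverse_complement(text: str, pattern: str) -> int:
    """Count pattern plus its reverse complement (once if they coincide)."""
    rc = reverse_complement(pattern)
    total = pattern_count(text, pattern)
    if rc != pattern:
        total += pattern_count(text, rc)
    return total


# Made-up stand-in for an ori region, for demonstration only.
toy_region = "AACCTACCACCTTGGTGGTAGGAA"

print(reverse_complement("ATGATCAAG"))                          # CTTGATCAT
print(count_with_reverse_complement(toy_region, "ATGATCAAG"))   # 0
print(count_with_reverse_complement(toy_region, "CCTACCACC"))   # 2
```

Running the same function on a real ori region, once for each candidate 9-mer, is enough to see whether a DnaA box from one bacterium appears in another.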

Significance of the Frequent Words Problem

Application of the Frequent Words Problem to the ori region above reveals that the following six 9-mers appear in this region three or more times:

AACCTACCA AAACCTACC ACCTACCAC
CCTACCACC GGTAGGTTT TGGTAGGTT
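One way to produce a list like the one above is to tally every k-mer in the region and keep those that reach a chosen count. The following is a minimal sketch under that assumption; the function name `kmers_with_min_count` and the toy string are hypothetical, and the example uses 4-mers on a made-up sequence rather than 9-mers on the real ori region.

```python
# Minimal sketch: list all k-mers that occur at least min_count times.
# The function name and toy input are illustrative assumptions.
from collections import Counter


def kmers_with_min_count(text: str, k: int, min_count: int) -> list[str]:
    """Return, in sorted order, every k-mer occurring >= min_count times."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return sorted(kmer for kmer, c in counts.items() if c >= min_count)


# Made-up sequence, for demonstration only.
toy_region = "ACGTACGTACGTTT"

print(kmers_with_min_count(toy_region, 4, 3))  # ['ACGT']
```

For the analysis described in the text, the call would use `k=9` and `min_count=3` on the actual ori region.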

Something peculiar must be happening because it’s extremely unlikely that six different 9-mers will occur so frequently within a short region in a random string. We’ll cheat a little and consult Ori-Finder, a software tool for finding replication origins in DNA sequences. This software chooses CCTACCACC (along with its reverse complement GGTGGTAGG) as a working hypothesis for the DnaA box in Thermotoga petrophila. Together, these two complementary 9-mers appear five times in the replication origin.

The Clump Finding Problem

Now imagine that ...