Detour: The Overlapping Words Paradox
Discover how the overlapping words paradox arises in a two-player game with k-mers and coin flips, revealing surprising non-transitive outcomes. Learn to analyze these sequences using correlation frequency and Conway's formula to understand underlying probabilities in DNA replication problems.
We'll cover the following...
We illustrate the overlapping words paradox with a two-player game called “Best Bet for Simpletons.” Player 1 selects a binary k-mer A, and Player 2, knowing what A is, selects a different binary k-mer B. The two players then flip a coin multiple times, with coin flips represented by strings of “1” (“heads”) and “0” (“tails”); the game ends when A or B appears as a block of k consecutive coin flips.
STOP and Think: Do the two players always have the same chance of winning?
Chances of winning
At first glance, you might guess that every k-mer has an equal chance of winning. Yet suppose that Player 1 chooses “00” and Player 2 chooses “10.” After two flips, either Player 1 wins (“00”), Player 2 wins (“10”), or the game continues (“01” or “11”). If the game continues, then Player 1 should surrender, since Player 2 will win as soon as “tails” (“0”) is next flipped. Player 2 is therefore three times more likely to win!
Non-transitive game
It may seem that Player 1 should have the advantage by simply selecting the “strongest” k-mer. However, an intriguing feature of Best Bet for Simpletons is that if k > 2, then Player 2 can always choose a k-mer B that beats A, regardless of Player 1’s choice of A. Another surprise is that Best Bet for Simpletons is a non-transitive game: if A defeats B, and B defeats C, then we can’t automatically conclude that A defeats C (c.f. rock-paper-scissors).
To analyze Best Bet for Simpletons, we say that B i-overlaps with A if the last i digits of A coincide with the first i digits of B. For example, “110110” 1-overlaps, 2-overlaps, and 5-overlaps with “011011.”
Correlation frequency
Given two k-mers A and B, the correlation of A and B, denoted Corr(A, B) = (c0, . . . , c), is a k-letter binary word such that c = 1 if B (k − i)-overlaps with A, and 0 otherwise.
The correlation frequency of A and B is defined as:
...