...


An Application of Markov Model


In this lesson, let's have a look at an application of the Markov model: Randomized Text Generation.

In the previous lesson, we implemented the Markov process distribution, which is a distribution over state sequences: the initial state is drawn from an initial distribution, and each subsequent state is drawn from a distribution conditioned on the previous state. There are lots of applications of Markov processes. In this lesson, we will have a look at randomized text generation.


Randomized Text Generation

Randomized text generation using a Markov process has a long history on the internet; one of our favorite examples was “Mark V. Shaney”.

Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroup were generated by Markov chain techniques, based on text from other postings. The username is a play on the words “Markov chain”. In modern terms, we can think of it as a “bot”: it ran a Markov process built by analyzing USENET posts, sampled from the resulting distribution to produce a “Markov chain” of words, and posted the generated text right back to USENET.

In this lesson, we are going to replicate that. The basic technique is straightforward:

  1. Start with a corpus of texts.
  2. Break up the texts into words.
  3. Group the words into sentences.
  4. Take the first word of each sentence and make a weighted distribution; the more times this word appears as a first word in a sentence, the higher it is weighted. This gives us our initial distribution.
  5. Take every word in every sentence. For each word, generate a distribution of the words which follow it in the corpus of sentences. For example, if we have “frog” in the corpus, then we make a distribution based on the words which follow: “prince” twice, “pond” ten times, and so on. This gives us
...
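The steps above can be sketched in code. This is a minimal illustration, not the implementation from the series: it assumes the corpus has already been split into sentences, each a list of words, and it uses simple weighted counters for both the initial distribution and the per-word transition distributions.

```python
import random
from collections import Counter, defaultdict

def build_model(sentences):
    """Build the two distributions described in the steps above.

    sentences: a list of sentences, each a list of words,
               e.g. [["the", "frog", "prince"], ...]
    """
    initials = Counter()                 # weighted distribution of first words
    transitions = defaultdict(Counter)   # word -> weighted distribution of followers
    for words in sentences:
        if not words:
            continue
        initials[words[0]] += 1
        # For each word, count the words that follow it in the corpus.
        for current, following in zip(words, words[1:]):
            transitions[current][following] += 1
    return initials, transitions

def sample(counter):
    # Draw one word from a weighted distribution stored as a Counter.
    words = list(counter)
    weights = [counter[w] for w in words]
    return random.choices(words, weights=weights)[0]

def generate(initials, transitions, max_len=20):
    # Start from the initial distribution, then repeatedly sample a
    # follower; stop when a word has no recorded followers.
    word = sample(initials)
    result = [word]
    while word in transitions and len(result) < max_len:
        word = sample(transitions[word])
        result.append(word)
    return " ".join(result)
```

For example, if “frog” is followed by “prince” twice and “pond” ten times in the corpus, then `transitions["frog"]` samples “pond” five times as often as “prince”, exactly the weighting the steps describe.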