An Application of Markov Models
In this lesson, let's have a look at an application of Markov models: randomized text generation.
In the previous lesson, we implemented the Markov process distribution: a distribution over state sequences in which the initial state is drawn at random, and each subsequent state is drawn from a distribution that depends only on the previous state. Markov processes have many applications; randomized text generation is the one we will explore here.
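To make that concrete, here is a minimal sketch in Python of what sampling from such a distribution could look like; the dictionary-of-weights representation, the name `sample_sequence`, and the toy two-state example are illustrative assumptions, not the distribution types built in the previous lesson.

```python
import random

def sample_sequence(initial, transition, max_length):
    """Sample one state sequence from a Markov process.

    initial: dict mapping each possible start state to a weight.
    transition: dict mapping each state to a dict of successor weights.
    """
    # The initial state is drawn from the initial distribution.
    state = random.choices(list(initial), weights=list(initial.values()))[0]
    sequence = [state]
    # Each subsequent state is drawn from a distribution that
    # depends only on the current state.
    while len(sequence) < max_length and transition.get(state):
        successors = transition[state]
        state = random.choices(list(successors),
                               weights=list(successors.values()))[0]
        sequence.append(state)
    return sequence

# A toy two-state process; the weights are arbitrary example numbers.
print(sample_sequence(
    {"sunny": 3, "rainy": 1},
    {"sunny": {"sunny": 4, "rainy": 1}, "rainy": {"sunny": 1, "rainy": 2}},
    10))
```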
Randomized Text Generation
Randomized text generation using a Markov process has a long history on the internet; one of our favorite examples was “Mark V. Shaney”.
Mark V. Shaney was a synthetic Usenet user whose postings in the net.singles newsgroup were generated by Markov chain techniques, based on text from other postings. The username is a play on the words “Markov chain”. We can now think of it as a “bot” that ran a Markov process built by analyzing Usenet posts, sampled from the resulting distribution to produce a “Markov chain” sequence of words, and posted the generated text right back to Usenet.
We are going to replicate that here. The basic technique is straightforward:
- Start with a corpus of texts.
- Break up the text into words.
- Group the words into sentences.
- Take the first word of each sentence and make a weighted distribution: the more often a word appears as the first word of a sentence, the more heavily it is weighted. This gives us our initial distribution.
- Take every word in every sentence. For each word, build a distribution of the words that follow it in the corpus. For example, if “frog” appears in the corpus, we make a distribution based on the words that follow it: “prince” twice, “pond” ten times, and so on. This gives us our transition function, as shown in the sketch below.
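Putting these steps together, here is one way the whole pipeline might look in Python; the sentence-splitting regex, the names `train` and `generate`, the toy corpus, and the `Counter`-based weighting are illustrative assumptions, not the lesson's actual implementation.

```python
import random
import re
from collections import Counter, defaultdict

def train(corpus):
    # Break the corpus into sentences, and each sentence into words.
    sentences = [s.split() for s in re.split(r"[.!?]+", corpus) if s.split()]
    # Weight each word by how often it starts a sentence: the initial distribution.
    initial = Counter(sentence[0] for sentence in sentences)
    # For each word, count the words that follow it: the transition function.
    transitions = defaultdict(Counter)
    for sentence in sentences:
        for word, follower in zip(sentence, sentence[1:]):
            transitions[word][follower] += 1
    return initial, transitions

def generate(initial, transitions, max_words=25):
    # Sample the first word from the initial distribution, then repeatedly
    # sample a successor until a word has no recorded followers
    # (transitions is a defaultdict, so missing words yield an empty Counter)
    # or we hit the word cap.
    word = random.choices(list(initial), weights=list(initial.values()))[0]
    words = [word]
    while len(words) < max_words and transitions[word]:
        followers = transitions[word]
        word = random.choices(list(followers),
                              weights=list(followers.values()))[0]
        words.append(word)
    return " ".join(words) + "."

corpus = "The frog met a prince. The frog sat by a pond. A pond is quiet."
initial, transitions = train(corpus)
print(generate(initial, transitions))
```

Note that in this toy corpus “prince” never has a follower, so generation stops whenever it is reached; a fuller implementation might model end-of-sentence as an explicit state rather than relying on dead ends.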