Project Creation: Part Two

In this lesson, we will discuss sampling and build our Markov model.

Load the dataset

Now’s the time to work with our real corpus. Click the download button below to get the dataset. This dataset contains the speech of the Honorable Prime Minister of India in English.

train_corpus.txt
text_path = "train_corpus.txt"

def load_text(filename):
    # Read the whole file as UTF-8 and convert it to lowercase
    with open(filename, encoding='utf8') as f:
        return f.read().lower()

text = load_text(text_path)
print('Loaded the dataset.')

Understand sampling

Before moving forward, we need to address one more important concept: sampling. In simple terms, sampling means taking samples of something for analysis. For our model, that means picking an outcome at random according to a probability distribution. Let’s understand sampling with the help of an example. Run the ...
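
As a minimal sketch of the idea, and not necessarily the lesson's own example, the snippet below draws characters at random from a small probability distribution. The names `sample_char` and `next_char_probs`, along with the probabilities themselves, are assumptions made up for illustration. Only Python's standard `random` and `collections` modules are used.

```python
import random
from collections import Counter

def sample_char(distribution, rng=random):
    """Draw one character at random, weighted by its probability.

    `distribution` is a hypothetical dict mapping characters to probabilities.
    """
    chars = list(distribution.keys())
    weights = list(distribution.values())
    # random.choices returns a list of k items; take the single drawn item
    return rng.choices(chars, weights=weights, k=1)[0]

# Made-up probabilities for the character that follows some context
next_char_probs = {"a": 0.7, "b": 0.2, "c": 0.1}

# A seeded generator keeps the demonstration repeatable
rng = random.Random(0)
draws = [sample_char(next_char_probs, rng) for _ in range(10_000)]

# Over many draws, the observed frequencies should approach the probabilities
counts = Counter(draws)
for ch in next_char_probs:
    print(ch, counts[ch] / len(draws))
```

A single draw can return any of the three characters, but over many draws "a" should appear roughly 70% of the time. That is the behavior we want from sampling: likely outcomes are chosen often, while unlikely ones still get a chance.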