Sentiment Analysis with spaCy
Let's look at a real-world dataset and train a sentiment analysis model.
In this lesson, we'll train spaCy's TextCategorizer on a real-world dataset: the Amazon Fine Food Reviews dataset from Kaggle. The original dataset is huge, with 100,000 rows; we sampled 4,000 of them. The dataset contains customer reviews of fine foods sold on Amazon. Each review includes user and product information, a user rating, and the review text.
We can load the dataset directly from its URL with pandas:
import pandas as pd
url = 'https://raw.githubusercontent.com/PacktPublishing/Mastering-spaCy/main/Chapter08/data/Reviews.zip'
df = pd.read_csv(url, compression='zip')
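To see what compression='zip' does without downloading anything, here is a minimal offline sketch. It writes a tiny CSV into a temporary ZIP archive and reads it back the same way. The file names and the two sample rows are made up for illustration, and the Score column name is an assumption rather than the confirmed schema of the real dataset.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical stand-in data: two made-up reviews. The "Score" column name
# is an assumption for illustration, not the confirmed dataset schema.
csv_text = "Score,Text\n5,Great taste\n1,Arrived stale\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "sample_reviews.zip")

    # Create a ZIP archive that holds exactly one CSV file.
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("sample_reviews.csv", csv_text)

    # pandas unpacks the single CSV inside the archive before parsing it.
    sample_df = pd.read_csv(zip_path, compression="zip")

print(sample_df.shape)  # (rows, columns) -> (2, 2)
```

Reading a ZIP this way works when the archive contains a single data file, which is the case the sketch sets up.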
Exploring the dataset
Now, we're ready to explore the dataset step by step:
First, we'll do the imports for reading and visualizing the dataset:
import pandas as pd
import matplotlib.pyplot as plt
We'll read the CSV file into a pandas DataFrame and output the shape of the DataFrame:
url = 'https://raw.githubusercontent.com/PacktPublishing/Mastering-spaCy/main/Chapter08/data/Reviews.zip'
reviews_df = pd.read_csv(url, compression='zip')
print(reviews_df.shape)
Next, we examine the rows and the columns of the dataset by printing the first five rows:
print(reviews_df.head())
We'll be using the
Text
...