Sentiment Analysis with spaCy

Let's look at a real-world dataset and train a sentiment analysis model.

In this lesson, we'll train spaCy's TextCategorizer on a real-world dataset: the Amazon Fine Food Reviews dataset from Kaggle. The original dataset is huge, with 100,000 rows, so we sampled 4,000 rows from it. The dataset contains customer reviews of fine foods sold on Amazon. Each review includes user and product information, the user's rating, and the review text.
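The lesson doesn't show how the 4,000-row sample was drawn. As a rough sketch of one common way to take a reproducible random sample with pandas, the snippet below uses `DataFrame.sample` on a small made-up table. The column names `rating` and `text`, the helper `sample_rows`, and the seed value are assumptions for illustration only, not the authors' actual procedure.

```python
import pandas as pd

# Made-up stand-in for a large reviews table (hypothetical columns).
full_df = pd.DataFrame({
    "rating": [1, 2, 3, 4, 5] * 20,
    "text": [f"review {i}" for i in range(100)],
})

def sample_rows(df, n, seed=42):
    """Draw n rows at random without replacement; a fixed seed makes it repeatable."""
    return df.sample(n=n, random_state=seed).reset_index(drop=True)

# Keep 10 of the 100 toy rows; the same idea scales to larger samples.
sample_df = sample_rows(full_df, n=10)
print(sample_df.shape)
```

Fixing `random_state` means that rerunning the code returns the same rows, which keeps later experiments comparable.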

We can load the dataset as follows:

import pandas as pd

# The reviews are stored as a zipped CSV file in the book's GitHub repository
url = 'https://raw.githubusercontent.com/PacktPublishing/Mastering-spaCy/main/Chapter08/data/Reviews.zip'
df = pd.read_csv(url, compression='zip')
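To see what `compression='zip'` does without downloading anything, here is a minimal sketch that builds a tiny ZIP archive in memory and reads it the same way. The CSV content, the file name `toy_reviews.csv`, and the column names are made up for illustration and do not reflect the real dataset.

```python
import io
import zipfile

import pandas as pd

# Made-up CSV content, used only for illustration.
csv_text = "rating,text\n5,Great coffee\n1,Stale crackers\n"

# Build a ZIP archive in memory that holds exactly one CSV file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("toy_reviews.csv", csv_text)
buffer.seek(0)  # rewind so pandas reads the archive from the start

# compression='zip' tells pandas to unzip the data before parsing it.
# pandas expects the archive to contain a single data file.
toy_df = pd.read_csv(buffer, compression="zip")
print(toy_df.shape)
```

The same call works with a URL or a local path in place of the in-memory buffer, as in the lesson's snippet.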

Exploring the dataset

Now, we're ready to explore the dataset step by step:

  1. First, we'll do the imports for reading and visualizing the dataset:

import pandas as pd
import matplotlib.pyplot as plt

  2. We'll read the CSV file into a pandas DataFrame and output the shape of the DataFrame:

# Read the zipped CSV file into a DataFrame
url = 'https://raw.githubusercontent.com/PacktPublishing/Mastering-spaCy/main/Chapter08/data/Reviews.zip'
reviews_df = pd.read_csv(url, compression='zip')
# shape is a (number of rows, number of columns) tuple
print(reviews_df.shape)

  3. Next, we examine the rows and the columns of the dataset by printing the first five rows:

print(reviews_df.head())

  4. We'll be using the Text ...