Exploratory Data Analysis

Learn how to perform exploratory data analysis (EDA) for natural language processing on a dataset and plot the data as bar charts and word clouds.

We’ll perform EDA on the BBC News dataset.

Bar chart

Using the pandas value_counts() and plot() methods, we can create a bar chart that shows how many samples fall into each class.

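# Plot the number of samples per category as a bar chart
# (assumes `data` is the BBC News DataFrame loaded earlier)
import matplotlib.pyplot as plt

colors = ['C0', 'C1', 'C2', 'C3', 'C4']  # one default Matplotlib color per class
categories = data['category'].value_counts()
categories.plot(kind='bar', figsize=(12, 8), color=colors)
plt.show()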
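
To put numbers on what the bar chart displays, the same value_counts() call can also return proportions through its normalize parameter. The sketch below is only an illustration: the toy DataFrame and its labels are hypothetical stand-ins, not the BBC News data, and the largest-to-smallest ratio is just one simple way to summarize imbalance.

```python
import pandas as pd

# Hypothetical toy labels for illustration only; not the BBC News data.
toy = pd.DataFrame(
    {"category": ["sport", "sport", "sport", "tech", "tech", "business"]}
)

# Absolute number of samples per class, sorted from most to least frequent.
counts = toy["category"].value_counts()

# Share of the dataset that each class represents (values sum to 1).
proportions = toy["category"].value_counts(normalize=True)

# A simple imbalance summary: size of the largest class over the smallest.
imbalance_ratio = counts.max() / counts.min()

summary = pd.DataFrame({"count": counts, "proportion": proportions.round(3)})
print(summary)
print(f"Imbalance ratio (largest / smallest): {imbalance_ratio:.2f}")
```

On the real dataset, the same calls would be applied to data['category']. A ratio close to 1 suggests roughly balanced classes, while larger values indicate stronger imbalance.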

As the bar chart shows, the dataset is imbalanced because the classes aren’t evenly distributed. We’ll deal with this issue later because it may cause problems when training a classification model. The two most common ...