Exploratory Data Analysis

Learn how to perform exploratory data analysis for anomaly detection on our dataset and visualize the data with histograms, box plots, bar plots, heatmaps, and scatter plots.

Histogram of numeric features

We begin the EDA section of the project by visualizing the distribution of the numeric features. We’ll use the pandas DataFrame.hist() method to accomplish this task.

# Plotting histogram of numeric features
data[numeric].hist(bins=30, figsize=(9,8), grid=False)
plt.show()

All the features are strongly right-skewed: most instances have small values, while a few have values far higher than the rest. These high-value instances are potential outliers.
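To back up the visual impression with numbers, the skewness of each feature can be computed and the values in the long right tail can be counted. The following is a minimal sketch, not the lesson's own code: the `demo` DataFrame is synthetic log-normal data standing in for the project's `data[numeric]`, the column name `Other` is hypothetical, and the 1.5 × IQR fence (Tukey's rule) is one common convention for flagging candidate outliers rather than a method stated in the lesson.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project's numeric features (assumption):
# log-normal draws are right-skewed, like the histograms above.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Grocery": rng.lognormal(mean=8.5, sigma=1.0, size=500),
    "Other": rng.lognormal(mean=9.0, sigma=1.0, size=500),  # hypothetical column
})

# Sample skewness per column; positive values indicate a longer right tail.
skewness = demo.skew()

# Tukey's rule: values above Q3 + 1.5 * IQR are candidate outliers.
q1 = demo.quantile(0.25)
q3 = demo.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Count, per column, how many values lie above the upper fence.
n_high = (demo > upper_fence).sum()

print(skewness)
print(n_high)
```

On the real dataset, the same calls could be applied to `data[numeric]`. A large positive skewness together with a non-trivial count above the fence would be consistent with the right-skewed histograms.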

Numeric and categorical features

Visualizing the relationship between a categorical and a numeric variable can be useful in many cases. We can do this by using the hue mapping feature of the Seaborn histplot() function.

# Relationship between categorical and numeric features
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, col in zip(axes.flatten(), categorical):
    # One stacked 'Grocery' histogram per categorical column
    sns.histplot(data=data, x='Grocery', hue=col,
                 multiple='stack', ax=ax)
plt.show()

In this case, we plotted a histogram of the 'Grocery' variable with a different color for every category of the Channel and Region variables. Retail customers tend to spend more on grocery products than Horeca customers, as the histogram bins with large values ...
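A comparison between categories like this one can also be checked numerically with a grouped summary. The sketch below is illustrative only: `demo` is synthetic data deliberately generated so that the Retail group has larger values, and the string labels in `Channel` are an assumption (the real column may be coded differently). It shows the shape of the check, not a result from the project's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (assumption): Retail spending is drawn from a
# distribution with a higher center than Horeca spending.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Channel": ["Horeca"] * 300 + ["Retail"] * 150,
    "Grocery": np.concatenate([
        rng.lognormal(mean=8.0, sigma=0.8, size=300),
        rng.lognormal(mean=9.3, sigma=0.8, size=150),
    ]),
})

# Per-category count, median, and mean of the numeric variable.
# The median is less sensitive than the mean to the long right tail.
summary = demo.groupby("Channel")["Grocery"].agg(["count", "median", "mean"])
print(summary)
```

On the project's data, replacing `demo` with `data` and looping over the columns in `categorical` would give one such table per categorical variable.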