Exploratory Data Analysis

Explore how to use various data visualization techniques like histograms, box plots, bar plots, heatmaps, and scatter plots with PyCaret to analyze numeric and categorical features. Understand how these visualizations help detect outliers, identify correlations, and differentiate data groups essential for effective anomaly detection.

Histogram of numeric features

We begin the EDA section of the project by visualizing the distribution of the numeric features. We'll use the pandas DataFrame.hist() method, which draws one histogram per column.

Python 3.5
# Plotting histogram of numeric features
data[numeric].hist(bins=30, figsize=(9,8), grid=False)
plt.show()

All the features are strongly right-skewed: most values cluster at the low end, while a few instances have much higher values than the rest. These extreme values are candidate outliers.
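The visual impression of skew can also be checked numerically. The following is a minimal sketch, not part of the original lesson. It assumes a synthetic stand-in DataFrame named demo, with placeholder column names and log-normal values, rather than the project's data. It computes the sample skewness with the pandas skew() method and counts the values above the upper Tukey fence (Q3 + 1.5 × IQR), a common rule of thumb for flagging candidate outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project data (assumption): right-skewed spending values.
# The column names are placeholders chosen for illustration.
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame({
    "Fresh": rng.lognormal(mean=9.0, sigma=1.0, size=500),
    "Grocery": rng.lognormal(mean=8.5, sigma=1.0, size=500),
})

# Sample skewness per column; positive values indicate a longer right tail
skewness = demo.skew()

# Tukey's rule: values above Q3 + 1.5 * IQR are candidate outliers
q1 = demo.quantile(0.25)
q3 = demo.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
n_high = (demo > upper_fence).sum()

print(pd.DataFrame({"skewness": skewness, "n_above_fence": n_high}))
```

With the project's data, the same calls could be applied to data[numeric] in place of demo. The 1.5 multiplier is a convention, not a threshold prescribed by the lesson.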

Numeric and categorical features

Visualizing the relationship between a categorical and a numeric variable can reveal whether the groups differ in their distributions. We can do this by using the hue mapping of the Seaborn histplot() function, which colors the histogram bars by category.

Python 3.5
# Relationship between categorical and numeric features
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, col in zip(axes.flatten(), categorical):
    # One stacked 'Grocery' histogram per categorical feature, colored by category
    sns.histplot(data=data, x='Grocery', hue=col,
                 multiple='stack', ax=ax)
plt.show()

In this case, we plotted the 'Grocery' variable histogram with a different color for every category of the Channel and Region variables. Retail customers tend to spend more on grocery products than Horeca customers as the histogram bins with large values ...