Exploratory Data Analysis
Learn how to perform exploratory data analysis (EDA) for anomaly detection on our dataset and visualize it with histograms, box plots, bar plots, heatmaps, and scatter plots.
Histogram of numeric features
We begin the EDA section of the project by visualizing the distribution of the numeric features, using the pandas hist() function.
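import matplotlib.pyplot as plt

# Plot a histogram of every numeric feature (30 bins each).
# `data` and `numeric` are defined in earlier steps of this project.
data[numeric].hist(bins=30, figsize=(9, 8), grid=False)
plt.show()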
All the features are strongly right-skewed: most instances have small values, while a few have much larger values than the rest. These extreme instances are potential outliers.
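The skew visible in the histograms can also be checked numerically. The sketch below uses synthetic log-normal columns (`feature_a` and `feature_b` are hypothetical names, not the lesson's data) to show three quick checks: the pandas `skew()` statistic, the mean sitting above the median, and a count of points beyond the common 1.5 × IQR upper fence. On the real dataset you would apply the same calls to `data[numeric]`.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed stand-in for the spending features (NOT the lesson's data)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.lognormal(mean=8, sigma=1.0, size=440),
    "feature_b": rng.lognormal(mean=7, sigma=1.2, size=440),
})

# Sample skewness per column: values well above 0 indicate a long right tail
skewness = df.skew()
print(skewness)

# In a right-skewed column, the few large values pull the mean above the median
print((df.mean() > df.median()).all())

# Count points above the 1.5 * IQR upper fence, a simple rule for flagging candidate outliers
q1, q3 = df.quantile(0.25), df.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
n_above = (df > upper_fence).sum()  # DataFrame vs Series comparison aligns on column names
print(n_above)
```

The IQR fence is only one heuristic; here it simply makes the "few instances with much larger values" visible as counts per column.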
Numeric and categorical features
Visualizing the relationship between a categorical and a numeric variable is often useful. We can do this with the hue mapping feature of the Seaborn histplot() function.
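import matplotlib.pyplot as plt
import seaborn as sns

# Relationship between categorical and numeric features:
# one stacked 'Grocery' histogram per categorical column, colored by its categories.
# `data` and `categorical` are defined in earlier steps of this project.
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, col in zip(axes.flatten(), categorical):
    sns.histplot(data=data, x='Grocery', hue=col, multiple='stack', ax=ax)
plt.show()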
Here, we plotted a histogram of the 'Grocery' variable with a different color for each category of the Channel and Region variables. Retail customers tend to spend more on grocery products than Horeca customers as the histogram bins with large values ...
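A per-category summary table is a compact numeric complement to the colored histogram. The sketch below is built on a small synthetic frame: the `Channel` labels, group sizes, and spending values are invented for illustration and are not the lesson's data. It uses `groupby()` to compare `Grocery` spending across categories. The median is reported alongside the mean because the spending features are right-skewed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: two channel labels with different (invented) spending levels
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "Channel": ["Horeca"] * 300 + ["Retail"] * 140,
    "Grocery": np.concatenate([
        rng.lognormal(mean=8.0, sigma=0.8, size=300),
        rng.lognormal(mean=9.2, sigma=0.8, size=140),
    ]),
})

# Count, median, and mean of Grocery spending for each channel;
# the median is less affected than the mean by the long right tail
summary = toy.groupby("Channel")["Grocery"].agg(["count", "median", "mean"])
print(summary)
```

On the real data, the same call with `data` and each column in `categorical` gives one such table per categorical feature.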