Exploratory Data Analysis
Explore how to use data visualization techniques such as histograms, box plots, bar plots, heatmaps, and scatter plots with PyCaret to analyze numeric and categorical features. Understand how these visualizations help detect outliers, identify correlations, and distinguish groups in the data, all of which support effective anomaly detection.
Histogram of numeric features
We begin the EDA section of the project by visualizing the distribution of the numeric features. We’ll use the pandas hist() method, which draws one histogram per numeric column of a DataFrame.
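The following is a minimal sketch of this step, not the lesson's exact code. Because the project dataset is not reproduced here, it assumes a small synthetic DataFrame named data with three hypothetical right-skewed columns (Fresh, Milk, and Grocery) drawn from a log-normal distribution. The bin count, figure size, and layout are illustrative choices as well.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in a notebook
import numpy as np
import pandas as pd

# Assumption: synthetic stand-in data, NOT the lesson's dataset.
# Log-normal draws are used only because they produce right-skewed columns.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Fresh": rng.lognormal(mean=9.0, sigma=1.0, size=400),
    "Milk": rng.lognormal(mean=8.0, sigma=1.0, size=400),
    "Grocery": rng.lognormal(mean=8.5, sigma=1.0, size=400),
})

# DataFrame.hist() draws one histogram per numeric column.
axes = data.hist(bins=30, figsize=(12, 4), layout=(1, 3))

# A positive sample skewness is a numeric hint of a long right tail.
skewness = data.skew()
print(skewness)
```

Printing the skewness next to the plots gives a quick numeric check of what the histograms show visually.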
All the features are extremely right-skewed. This indicates that a few instances have significantly higher values than the rest, which makes them likely outliers.
Numeric and categorical features
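Visualizing the relationship between a categorical and a numeric variable shows whether the numeric values are distributed differently across categories. We can do this by using the hue mapping feature of the Seaborn histplot() function, which draws the histogram in a separate color for each category.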
In this case, we plotted the 'Grocery' variable histogram with a different color for every category of the Channel and Region variables. Retail customers tend to spend more on grocery products than Horeca customers as the histogram bins with large values ...