Exploratory Data Analysis
Explore how to conduct exploratory data analysis for clustering using PyCaret. Understand the use of histograms for visualizing multimodal distributions, correlation heatmaps for identifying variable relationships, and scatter plot matrices to reveal cluster separations in your data. This lesson prepares you to analyze datasets with multiple distinct groups effectively.
Histogram
The pandas DataFrame.hist() method draws one histogram per numeric column, which lets us easily visualize the distribution of each variable.
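As a minimal sketch of this step, the snippet below calls hist() on a small synthetic DataFrame. The lesson's own dataset is not reproduced here, so the data, the column names (feature1 to feature3), the group centers, and the bins and figsize values are all assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data (not the lesson's dataset):
# three groups whose means differ in every feature.
rng = np.random.default_rng(seed=0)
centers = np.array([
    [0.0, 0.0, 10.0],
    [5.0, 8.0, 0.0],
    [10.0, 4.0, 5.0],
])
samples = [rng.normal(loc=c, scale=0.7, size=(200, 3)) for c in centers]
data = pd.DataFrame(
    np.vstack(samples),
    columns=["feature1", "feature2", "feature3"],  # assumed names
)

# One histogram per numeric column; bins and figsize are illustrative choices.
axes = data.hist(bins=30, figsize=(9, 6))
plt.tight_layout()
plt.savefig("histograms.png")  # in a notebook, the figure is displayed inline instead
```

Because each synthetic group is centered at a different value, every histogram in this sketch should show several separate peaks, one per group.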
We can see in the output that every variable's distribution is either bimodal (two peaks) or multimodal (more than two peaks). This typically happens when the dataset contains multiple groups with different characteristics. In this case, the dataset was specifically created to contain ...