Exploratory Data Analysis

Let’s learn how to perform exploratory data analysis (EDA) on our regression dataset by plotting histograms, bar charts, and scatter plots.

We’ll now perform EDA on our data. As mentioned earlier, EDA is a method that helps us understand the properties of a dataset through descriptive statistics and visualization. It is an important part of every machine learning or data science project, because we need to understand the dataset before we use it to build a model.
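The descriptive-statistics half of EDA can be sketched with pandas' describe() method, which summarizes every numeric column in one table. This is a minimal illustration, assuming the same four numeric columns used later in this lesson; the rows below are made-up values, not records from the real dataset.

```python
import pandas as pd

# Hypothetical toy rows with the lesson's numeric columns.
# The values are invented for illustration only.
data = pd.DataFrame({
    "age": [19, 33, 46, 52, 61],
    "bmi": [27.9, 22.7, 33.4, 30.8, 25.9],
    "children": [0, 1, 3, 2, 0],
    "charges": [1500.0, 4200.0, 8800.0, 12000.0, 30000.0],
})

numeric = ["age", "bmi", "children", "charges"]

# describe() reports count, mean, std, min, 25%, 50%, 75%, and max per column.
summary = data[numeric].describe()
print(summary.round(2))
```

Reading this table before plotting gives a first sense of each variable's range and spread, which the histograms then show in more detail.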

Histogram of numeric variables

The distribution of each numeric variable can be visualized with a histogram, which pandas creates in a single call with the DataFrame's hist() function. It draws one histogram per column.

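# Histogram of numeric variables
import matplotlib.pyplot as plt

# `data` is the DataFrame loaded earlier in the lesson
numeric = ['age', 'bmi', 'children', 'charges']
data[numeric].hist(bins=20, figsize=(10, 5))  # one 20-bin histogram per column
plt.show()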
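To see what the bins argument controls, here is a rough sketch of the binning step behind each histogram panel, using NumPy's np.histogram with equal-width bins. The ages are hypothetical values chosen for illustration, and 5 bins are used instead of 20 so the counts are easy to read.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, invented for illustration only.
ages = pd.Series([18, 19, 23, 25, 31, 34, 38, 42, 47, 52, 58, 64])

# Split [min, max] into 5 equal-width bins and count the values in each.
# Every bin is half-open [left, right) except the last, which also
# includes its right edge.
counts, edges = np.histogram(ages, bins=5)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n}")
```

Each bar in a histogram is one of these counts. More bins give finer detail but noisier bars, so bins=20 is a trade-off rather than a fixed rule.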

As we can see in the output, some of the ...