Data Preprocessing
Perform data cleaning and create dummies.
We know from exploratory data analysis (EDA) that some of the data in our dataset is missing. Let's deal with that first.
Data cleaning
The Age column is missing about 19.9% of its data. A convenient way to fix the Age column is to fill in the missing values with the mean (average) age of all passengers. We can do better in this case, though: because there are three passenger classes, we can fill each missing age with the average age of that passenger's class instead. Let's use a boxplot() to visually check whether there is a relationship between passenger class and age.
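A minimal sketch of class-wise mean imputation with pandas follows. It assumes the class column is named Pclass (as in the common Kaggle Titanic data; this name is an assumption, not given above), and the toy values are made up purely for illustration. It contrasts filling with the overall mean against filling with the mean of each passenger's own class.

```python
import pandas as pd


def impute_age_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing Age values replaced by the
    mean Age of the passenger's class (column name Pclass is assumed)."""
    out = df.copy()
    # Mean age per class, broadcast back to every row of that class
    class_mean = out.groupby("Pclass")["Age"].transform("mean")
    # Only NaN entries are replaced; known ages are left untouched
    out["Age"] = out["Age"].fillna(class_mean)
    return out


# Hypothetical toy data, NOT taken from the real Titanic dataset
nan = float("nan")
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Age": [40.0, 36.0, nan, 30.0, nan, 20.0, 26.0, nan],
})

# Naive approach: one overall mean for every missing value
overall_filled = toy["Age"].fillna(toy["Age"].mean())

# Class-aware approach: each class gets its own mean
filled = impute_age_by_class(toy)

print(toy.groupby("Pclass")["Age"].mean())
print(overall_filled.tolist())
print(filled["Age"].tolist())
```

With these toy values, the overall mean fills every gap with 30.4, while the class-aware version fills the gaps with 38.0, 30.0, and 23.0 for classes 1, 2, and 3. The class-aware fill only makes sense if the boxplot actually shows age differing by class; if the boxes overlap heavily, the simpler overall mean loses little.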