Summary
Go over a summary of what we have learned in this chapter.
In this chapter, we learned about the following concepts.
The data and exploratory data analysis
The Titanic dataset is commonly used as a first exercise in classification in machine learning. The goal is to predict whether a passenger survived the sinking of the Titanic, which makes this a binary classification problem.
Exploratory data analysis (EDA) of the data revealed that:
- The Cabin column is missing 77.1% of its values, the Age column is missing 19.9%, and the Embarked column is missing 0.2%.
- Most of the passengers who died were male.
- The survival rate was higher for class-1 (first-class) passengers.
- Port S (Southampton) was the busiest embarkation port for every class, so we might expect more survivors from S in absolute numbers. However, the survival rate was higher for passengers who embarked at port C (Cherbourg).
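The EDA findings above reduce to three kinds of aggregate: the share of missing values per column, counts per group, and the mean of the 0/1 survival label per group. The sketch below computes them in plain Python so it runs without extra packages. It assumes the column names of the commonly distributed Titanic CSV (Survived, Pclass, Sex, Age, Cabin, Embarked); the six rows are invented toy data, so its numbers will not match the percentages reported for the real dataset.

```python
from collections import Counter, defaultdict

# Invented toy rows for illustration only; they are NOT the real Titanic data.
# Column names are assumed to follow the common Titanic CSV layout.
# None marks a missing value.
ROWS = [
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 38.0, "Cabin": "C85", "Embarked": "C"},
    {"Survived": 1, "Pclass": 3, "Sex": "female", "Age": None, "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 35.0, "Cabin": "C123", "Embarked": "S"},
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": None, "Cabin": None, "Embarked": "Q"},
    {"Survived": 0, "Pclass": 1, "Sex": "male", "Age": 54.0, "Cabin": "E46", "Embarked": None},
]


def missing_percent(rows, column):
    """Percentage of rows whose value in `column` is missing (None)."""
    missing = sum(1 for row in rows if row[column] is None)
    return 100.0 * missing / len(rows)


def counts_by(rows, column, keep=lambda row: True):
    """Number of rows per value of `column`, counting only rows where keep(row) is true."""
    counts = Counter(row[column] for row in rows if keep(row) and row[column] is not None)
    return dict(counts)


def survival_rate_by(rows, column):
    """Fraction of survivors (mean of the 0/1 Survived label) per value of `column`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        key = row[column]
        if key is None:  # skip rows where the grouping value itself is missing
            continue
        totals[key] += 1
        survivors[key] += row["Survived"]
    return {key: survivors[key] / totals[key] for key in totals}


if __name__ == "__main__":
    for col in ("Cabin", "Age", "Embarked"):
        print(f"{col}: {missing_percent(ROWS, col):.1f}% missing")
    # Sex breakdown among passengers who did not survive.
    print("Deceased by sex:", counts_by(ROWS, "Sex", keep=lambda r: r["Survived"] == 0))
    print("Survival rate by class:", survival_rate_by(ROWS, "Pclass"))
    print("Passengers per port:", counts_by(ROWS, "Embarked"))
    print("Survival rate by port:", survival_rate_by(ROWS, "Embarked"))
```

Keeping counts and rates separate mirrors the distinction in the last finding: a port can have the most passengers (a count) without having the highest survival rate (a mean). With pandas, the same aggregates are typically written as `df.isna().mean()` and `df.groupby("Pclass")["Survived"].mean()`.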
Data preprocessing and preparation
Moving toward the model training and evaluation phase involves preprocessing, such as removing missing values, handling categorical features by creating ...
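One common way to carry out this kind of preprocessing is sketched below, again in plain Python on invented toy rows. The specific choices are assumptions and not necessarily the ones this chapter made: the sparse Cabin column is dropped, the few rows missing Embarked are removed, missing Age values are filled with the median of the known ages, and the categorical Sex and Embarked columns are one-hot encoded into 0/1 indicator columns. The helper name `preprocess` is hypothetical.

```python
from statistics import median

# Invented toy rows for illustration only; they are NOT the real Titanic data.
ROWS = [
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 38.0, "Cabin": "C85", "Embarked": "C"},
    {"Survived": 1, "Pclass": 3, "Sex": "female", "Age": None, "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 35.0, "Cabin": "C123", "Embarked": "S"},
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": None, "Cabin": None, "Embarked": "Q"},
    {"Survived": 0, "Pclass": 1, "Sex": "male", "Age": 54.0, "Cabin": "E46", "Embarked": None},
]


def preprocess(rows):
    """Hypothetical helper: handle missing values, then one-hot encode categoricals.

    Assumed choices (not necessarily the chapter's):
      * drop the sparse Cabin column,
      * drop rows whose Embarked value is missing,
      * fill missing Age with the median of the remaining known ages,
      * one-hot encode Sex and Embarked as 0/1 indicator columns.
    """
    kept = [row for row in rows if row["Embarked"] is not None]
    age_fill = median([row["Age"] for row in kept if row["Age"] is not None])
    sexes = sorted({row["Sex"] for row in kept})
    ports = sorted({row["Embarked"] for row in kept})

    processed = []
    for row in kept:
        new_row = {
            "Survived": row["Survived"],
            "Pclass": row["Pclass"],
            "Age": row["Age"] if row["Age"] is not None else age_fill,
        }
        # One indicator column per observed category, e.g. Sex_male, Embarked_S.
        for sex in sexes:
            new_row[f"Sex_{sex}"] = int(row["Sex"] == sex)
        for port in ports:
            new_row[f"Embarked_{port}"] = int(row["Embarked"] == port)
        processed.append(new_row)  # Cabin is intentionally not copied over
    return processed


for row in preprocess(ROWS):
    print(row)
```

In a real workflow, the median and the set of categories should be computed on the training split only and then reused on the test split, so that no information from the test data leaks into training.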