Getting the Data Ready and Building a Machine Learning Model
Get the data ready for training and build a machine learning model.
Typically, once we have the processed data, we split it into training and test sets using scikit-learn's train_test_split().
# Import the required function from scikit-learn
from sklearn.model_selection import train_test_split

# Hold out 33% of the rows for testing (the default would be 25%); a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
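To make the idea behind the split concrete, here is a simplified pure-Python sketch of what a shuffled train/test split does. It is not scikit-learn's implementation: the function name simple_train_test_split is hypothetical, and because it uses Python's random module it will not select the same rows as train_test_split with random_state=42. It does mirror one detail of scikit-learn, which rounds a fractional test size up, so 33% of 10 rows becomes 4 test rows.

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.25, seed=None):
    """Hypothetical sketch: shuffle row indices, take the first ceil(test_size * n) as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n = len(X)
    n_test = math.ceil(test_size * n)  # round the test size up for a fractional test_size
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # a fixed seed makes the split reproducible
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Ten toy rows: each feature row is [row_id] and its label is row_id % 2
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, test_size=0.33, seed=42)
print(len(X_train), len(X_test))  # 6 4
```

Features and labels are indexed with the same shuffled positions, so each row keeps its own label after the split.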
Data split
In this project, however, the training and test data come in separate files, so instead of splitting one dataset we work with two. This is common in real-world projects. Every preprocessing step we apply to the training data must also be applied, in the same way, to the test data.
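Applying preprocessing "in the same way" means learning any values (such as a fill value) from the training data and then reusing them on the test data. The sketch below assumes, for illustration only, an Age column with missing values filled by the training median; the rows and helper names are hypothetical, and the course's actual preprocessing steps may differ.

```python
from statistics import median

# Hypothetical rows; None marks a missing Age value
train_rows = [{"Age": 22.0}, {"Age": None}, {"Age": 38.0}, {"Age": 26.0}]
test_rows = [{"Age": None}, {"Age": 30.0}]


def fit_age_median(rows):
    """Learn the fill value from the given (training) rows, ignoring missing ages."""
    return median(r["Age"] for r in rows if r["Age"] is not None)


def fill_missing_age(rows, fill_value):
    """Return copies of the rows with a missing Age replaced by fill_value."""
    return [{**r, "Age": fill_value if r["Age"] is None else r["Age"]} for r in rows]


age_median = fit_age_median(train_rows)  # median of 22.0, 38.0, 26.0 is 26.0
train_clean = fill_missing_age(train_rows, age_median)
test_clean = fill_missing_age(test_rows, age_median)  # reuse the training median; do not recompute it on test
print(test_clean)  # [{'Age': 26.0}, {'Age': 30.0}]
```

Recomputing the median on the test rows would make the two datasets inconsistent and leak information from the test set into preprocessing.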
Train part
Let's separate the features into X_train and the target into y_train. The target column is Survived; every other column in train (the entire training dataset) is a feature.
X_train = train.drop('Survived', axis=1)  # features or variables
y_train = train['Survived']  # target, the values we need to predict
print(X_train.shape, y_train.shape)
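For readers who want to see what dropping the target column amounts to without pandas, here is a plain-Python analogue using a list of row dictionaries. The helper split_features_target and the three sample rows are hypothetical; the course itself works with a pandas DataFrame.

```python
def split_features_target(rows, target="Survived"):
    """Split row dicts into feature dicts (every column except target) and a list of target values."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Hypothetical three-row sample with a few Titanic-style columns
train = [
    {"Pclass": 3, "Sex": "male", "Age": 22.0, "Survived": 0},
    {"Pclass": 1, "Sex": "female", "Age": 38.0, "Survived": 1},
    {"Pclass": 3, "Sex": "female", "Age": 26.0, "Survived": 1},
]
X_train, y_train = split_features_target(train)

# Analogue of the (rows, columns) and (rows,) shapes that print(X_train.shape, y_train.shape) reports
print((len(X_train), len(X_train[0])), (len(y_train),))  # (3, 3) (3,)
```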
We have separated the features into X_train, and it’s always good ...