

Getting Data Ready and Building Machine Learning Model

Get the data ready for training and build a machine learning model.

Typically, once we have the processed data, we split it into train and test parts using train_test_split().

# Importing required method from sklearn
from sklearn.model_selection import train_test_split
# Hold out 33% of the rows for testing; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
Data split
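
The snippet above assumes that X and y already exist. As a minimal, self-contained sketch of the same call, the toy lists below (made up purely for illustration, not taken from the lesson's dataset) show how the rows are divided:

```python
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: 100 rows with two features each
X = [[i, i * 2] for i in range(100)]
y = [i % 2 for i in range(100)]

# Same arguments as in the lesson: 33% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Features and targets stay aligned: each pair has the same length
print(len(X_train), len(y_train))
print(len(X_test), len(y_test))
```

Every row ends up in exactly one of the two parts, so the lengths of the train and test parts add up to the original 100 rows.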

In this project, however, the training and test datasets come in separate files, so we will work with two separate datasets. This is common in real-life projects. Every preprocessing step that we apply to the train part must also be applied to the test part.
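
One way to keep the preprocessing identical for both parts is to wrap it in a single function and call it on each dataset. The sketch below is only an illustration under stated assumptions: the toy frames, the column names Age and Sex, and the helper preprocess() are hypothetical and not part of the lesson's code. Note that the fill value is computed from the training data and then reused for the test data.

```python
import pandas as pd

# Hypothetical stand-ins for the two files (toy values, assumed column names)
train_toy = pd.DataFrame({
    "Age": [22.0, None, 35.0],
    "Sex": ["male", "female", "female"],
    "Survived": [0, 1, 1],
})
test_toy = pd.DataFrame({
    "Age": [None, 40.0],
    "Sex": ["female", "male"],
})

# Statistic learned from the training data only
age_median = train_toy["Age"].median()

def preprocess(df, age_fill):
    """Hypothetical helper: apply the same steps to any dataset."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(age_fill)               # fill missing ages
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})  # encode as numbers
    return out

train_clean = preprocess(train_toy, age_median)
test_clean = preprocess(test_toy, age_median)
print(test_clean)
```

Because both calls go through the same function with the same fill value, the train and test parts cannot drift apart in how they are prepared.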

Train part

Let's separate the features as X_train and the target as y_train. Our target column is Survived; all the other columns in train (the entire training dataset) are features.

X_train = train.drop('Survived', axis = 1) # features or variables
y_train = train['Survived'] # target, the values we need to predict
print(X_train.shape, y_train.shape)

We have separated the features as X_train, and it’s always good to standardize them. Another good ...