

Model Training Using Scaled Features


Train a model using scaled data and predict.

We have the scaled data from the previous lesson, so let’s split it into training and test sets. We can use either the newly created data frame (df_scaled_features) or the NumPy array (scaled_features); let’s use the NumPy array for now.

from sklearn.model_selection import train_test_split

X = scaled_features  # scaled features in X
y = target  # targets in y
print("Scaled features are in X and the target is in y now!")
test_size = 0.30
random_state = 42
# Split the data; keep test_size and random_state the same to reproduce the results shown here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
print("Train and test data sets are ready with test_size = {} and random_state = {}".format(test_size, random_state))
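train_test_split chooses which rows go where using only the number of samples and random_state. That means passing the NumPy array or the data frame should put the same rows in each split. The sketch below checks this on hypothetical random data. The names scaled_features_demo, df_scaled_demo and target_demo are made-up stand-ins for the lesson's scaled_features, df_scaled_features and target, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for scaled_features, df_scaled_features and target
rng = np.random.default_rng(0)
scaled_features_demo = rng.normal(size=(100, 3))
df_scaled_demo = pd.DataFrame(scaled_features_demo, columns=["f1", "f2", "f3"])
target_demo = rng.integers(0, 2, size=100)

# Same test_size and random_state -> same shuffled row indices for both inputs
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    scaled_features_demo, target_demo, test_size=0.30, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    df_scaled_demo, target_demo, test_size=0.30, random_state=42)

print(Xa_train.shape, Xa_test.shape)                 # (70, 3) (30, 3)
print(np.array_equal(Xa_test, Xd_test.to_numpy()))   # True: identical test rows
```

With 100 rows and test_size=0.30, 30 rows go to the test set and 70 to the training set. The main practical difference is that the data frame split keeps the column names and original row index, which can make inspecting individual rows easier.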

We’ll train the model with the training data.

Model training using scaled features

Let's create an instance of KNeighborsClassifier under a new name, fit it to the training data, and predict on the test data, all in a single cell. To compare directly with the unscaled model and isolate the effect of scaling, we'll keep n_neighbors = 3, the same value we used without scaling.

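from sklearn.neighbors import KNeighborsClassifier

n_neighbors = 3  # same k as the model trained without scaling
# Creating instance
knn_scaled = KNeighborsClassifier(n_neighbors=n_neighbors)
# model fitting/training
knn_scaled.fit(X_train, y_train)
# prediction from the trained model
predictions = knn_scaled.predict(X_test)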
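Keeping n_neighbors fixed makes the comparison fair. The minimal sketch below shows this on hypothetical synthetic data, not the lesson's dataset. One feature carries the class signal on a small scale, and the other is pure noise on a huge scale. The helper knn_accuracy is an illustrative name, not part of the lesson. The split, random_state and k are identical in both runs, so any accuracy difference comes from scaling alone.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: feature 0 carries the class signal on a ~0-1 scale,
# feature 1 is pure noise on a ~0-100000 scale.
rng = np.random.default_rng(0)
y_syn = rng.integers(0, 2, size=300)
X_syn = np.column_stack([
    y_syn + rng.normal(0, 0.1, size=300),  # informative, small scale
    rng.uniform(0, 100_000, size=300),     # uninformative, large scale
])
# Scaled the same way as in the lesson: before splitting
X_syn_scaled = StandardScaler().fit_transform(X_syn)

def knn_accuracy(features, labels, k=3):
    # Same split and same k on every call, so only the feature scaling differs
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.30, random_state=42)
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return model.score(X_te, y_te)  # mean accuracy on the test split

acc_raw = knn_accuracy(X_syn, y_syn)
acc_scaled = knn_accuracy(X_syn_scaled, y_syn)
print(f"unscaled: {acc_raw:.2f}  scaled: {acc_scaled:.2f}")
```

On this made-up data the unscaled model should land near chance, because the noise feature dominates every distance. The scaled model should be close to perfect. The actual numbers for the lesson's dataset will differ.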

With the model trained and its predictions made, we can evaluate it.

Predictions and evaluations

Let's perform the model evaluations.

from sklearn.metrics import confusion_matrix

print('Our k (n_neighbors) = {} here.\n'.format(n_neighbors))
print(confusion_matrix(y_test, predictions))
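In scikit-learn's confusion matrix, each row is an actual class and each column is a predicted class, so correct predictions sit on the diagonal. The minimal sketch below uses hypothetical toy labels, not the lesson's output, to show how to read the matrix and derive accuracy, precision and recall from its cells. The four-way unpacking assumes a binary problem.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy labels, only to show how to read the matrix
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class
print(cm)
# [[2 2]
#  [1 3]]

tn, fp, fn, tp = cm.ravel()         # binary case only: unpack the four cells
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
precision = tp / (tp + fp)          # of the predicted 1s, how many were right
recall = tp / (tp + fn)             # of the actual 1s, how many were found
print(accuracy, accuracy_score(y_true, y_pred))  # 0.625 0.625
print(precision, recall)                         # 0.6 0.75
```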

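Compared with the model trained on unscaled features, the scaled features give a significant improvement in the evaluation results. This shows how much feature scaling matters for distance-based algorithms like KNN, where features with larger numeric ranges would otherwise dominate the distance calculation. However, we are not done yet. There is another thing we need to check ...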
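Euclidean distance sums the squared differences across features, so a feature measured in hundreds swamps one measured in fractions. The sketch below illustrates this with a hypothetical four-row toy dataset. A hypothetical helper, nearest_neighbor, finds the closest point to row 0 before and after StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: feature 0 lies in 0-1, feature 1 lies in the hundreds
X_toy = np.array([
    [0.1, 200.0],  # row 0: the point whose nearest neighbor we look for
    [0.9, 210.0],  # row 1: far on feature 0, very close on feature 1
    [0.2, 900.0],  # row 2: close on feature 0, very far on feature 1
    [0.8, 100.0],  # row 3: fairly far on feature 0, moderately far on feature 1
])

def nearest_neighbor(data, i):
    dists = np.linalg.norm(data - data[i], axis=1)  # Euclidean distance from row i to every row
    dists[i] = np.inf                               # exclude the point itself
    return int(np.argmin(dists))

raw_nn = nearest_neighbor(X_toy, 0)
scaled_nn = nearest_neighbor(StandardScaler().fit_transform(X_toy), 0)
print(raw_nn, scaled_nn)  # 1 3
```

In raw units, row 1 wins because it is only 10 units away on feature 1, even though it is the farthest point from row 0 on feature 0. After standardization, both features contribute on comparable scales, and row 3 becomes the nearest neighbor. KNN's votes shift in the same way on real data.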