Model Training Using Scaled Features
Train a model using scaled data and predict.
We have the scaled data from the previous lesson. Let's split the data into train and test parts. We can use either our newly created data frame (`df_scaled_features`) or the NumPy array (`scaled_features`). Let's try the NumPy array for now.
```python
X = scaled_features  # scaled features in X
y = target           # targets in y
print("Scaled features are in X and the target is in y now!")

test_size = 0.30
random_state = 42

# splitting data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=random_state
)

# keep test_size and random_state the same if we want the same results as given here!
print("train and test data sets are ready with test_size = {} and random_state = {}".format(test_size, random_state))
```
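As noted above, `train_test_split` also accepts a pandas `DataFrame` directly, which keeps the column names on the resulting splits. Here is a minimal, self-contained sketch with a small stand-in data frame (the values and column names are hypothetical, not the lesson's data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df_scaled_features from the previous lesson
df_scaled_features = pd.DataFrame(
    {"feat_a": [0.1, -0.2, 1.3, -1.1, 0.5, -0.6],
     "feat_b": [1.0, -0.9, 0.2, -0.3, 1.4, -1.4]}
)
target = pd.Series([0, 1, 0, 1, 0, 1], name="target")

# train_test_split works on DataFrames too and preserves column names
X_train, X_test, y_train, y_test = train_test_split(
    df_scaled_features, target, test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)
```

Either form works; the data frame version is handy when we later want to inspect which features the model saw, by name.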
We’ll train the model with the training data.
Model training using scaled features
Let's create an instance of `KNeighborsClassifier` with a different name, fit it on the training data, and predict, all in a single cell. For a direct comparison that shows the effect of scaling, we'll keep `n_neighbors = 3`, the same value as without scaling.
```python
# creating an instance
knn_scaled = KNeighborsClassifier(n_neighbors=n_neighbors)

# model fitting/training
knn_scaled.fit(X_train, y_train)

# predictions from the trained model
predictions = knn_scaled.predict(X_test)
```
Once our model is trained, we can evaluate it.
Predictions and evaluations
Let's perform the model evaluations.
```python
print('Our k (n_neighbors) = {} here.\n'.format(n_neighbors))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
```
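If the confusion matrix layout is unfamiliar, here is a tiny, self-contained sketch with made-up labels (not the lesson's data) showing how to read it: rows are the true classes, columns are the predicted classes, and the diagonal counts correct predictions.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Tiny illustrative labels, not the lesson's dataset
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = true class, columns = predicted class; diagonal = correct
print(classification_report(y_true, y_pred))
```

In this toy case, the diagonal sums to 4 correct predictions out of 6, which is exactly the accuracy the classification report prints.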
We are getting much better evaluation metrics, a significant improvement over the unscaled features. This shows the importance of feature scaling in distance-based algorithms like KNN. However, we are not done yet. There is another thing we need to check ...
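The effect described above can be reproduced end to end on synthetic data. The sketch below (a minimal illustration, not the lesson's dataset) inflates one noise feature so that it dominates the Euclidean distance, then compares KNN accuracy with and without `StandardScaler`:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data; with shuffle=False the last column is pure noise
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=2, shuffle=False, random_state=42)
X[:, -1] *= 1000  # the noise column now swamps the Euclidean distance

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# KNN on raw (unscaled) features
knn_raw = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
acc_raw = accuracy_score(y_test, knn_raw.predict(X_test))

# KNN on scaled features (scaler fitted on the training split only)
scaler = StandardScaler().fit(X_train)
knn_scaled = KNeighborsClassifier(n_neighbors=3).fit(
    scaler.transform(X_train), y_train
)
acc_scaled = accuracy_score(y_test, knn_scaled.predict(scaler.transform(X_test)))

print("accuracy without scaling: {:.3f}".format(acc_raw))
print("accuracy with scaling:    {:.3f}".format(acc_scaled))
```

Because one unscaled feature dominates the distance computation, the raw model hovers near chance, while the scaled model recovers the signal in the other features.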