Solution: Model Evaluation

Follow the instructions to perform model evaluation on real-world data.

The model selection coding challenge has several valid solutions, depending on the cross-validation method we choose. Whichever method we use, the solution should do the following:

  1. Choose an appropriate metric for a classification task.

  2. Use a cross-validation method to select the best model.
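To illustrate why the choice of metric in step 1 matters, the following minimal sketch compares accuracy with the F1 score on a small set of made-up labels. The helper names `accuracy_from_labels` and `f1_from_labels` and the toy data are assumptions introduced for illustration; they are not part of the challenge dataset.

```python
def accuracy_from_labels(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_from_labels(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 8 negatives followed by 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Predictor A never predicts the positive class
always_negative = [0] * 10

# Predictor B finds one positive, misses one, and raises one false alarm
one_hit = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

for name, y_pred in [("always_negative", always_negative), ("one_hit", one_hit)]:
    print(name,
          "accuracy:", accuracy_from_labels(y_true, y_pred),
          "F1:", f1_from_labels(y_true, y_pred))
```

Both predictors reach the same accuracy, but only the F1 score separates the one that never detects a positive case from the one that does. This is one reason F1 is a common choice when the positive class is the one we care about.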

Here is one possible solution:

main.py
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

preprocessed = pd.read_csv("preprocessed.csv")

# Define X (model features) and y (target variable).
# X_var (list of feature column names) and y_var (target column name)
# are not defined in this snippet and must be set for the dataset first.
X = preprocessed[X_var]
y = preprocessed[y_var]

# Three algorithms
classifiers = [
    LogisticRegression(penalty='l2', C=10),
    KNeighborsClassifier(
        n_neighbors=4, metric='euclidean', weights='distance'
    ),
    DecisionTreeClassifier(
        max_depth=5, min_samples_split=10
    )
]

# Import evaluation metric
from sklearn.metrics import f1_score

# Initialize k-fold cross-validation
from sklearn.model_selection import KFold
k = 3
kf = KFold(n_splits=k)

# Perform k-fold cross-validation for each model
for model in classifiers:
    # Initialize a list to store the F1 scores for each fold
    f1_scores = []
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        # Train the model on the training folds
        model.fit(X_train, y_train)
        # Calculate F1 score for the current (held-out) fold
        y_test_pred = model.predict(X_test)
        f1_scores.append(f1_score(y_test, y_test_pred))
    # Report the mean F1 score across all k folds for this model
    print(f"Average F1 Score for {type(model).__name__}:", np.mean(f1_scores))
  • The `classifiers` list: We initialize three different classification ...
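Since the solution depends on the cross-validation method we choose, here is a sketch of one alternative: scikit-learn's `cross_val_score` with a shuffled `StratifiedKFold` in place of the manual `KFold` loop. This is not the course's reference solution. The synthetic data from `make_classification`, the names `X_demo`, `y_demo`, `candidates`, and `mean_f1`, and the added `max_iter` and `random_state` arguments are all assumptions made so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the challenge data (assumption: binary target)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Same three model families as in the solution above
candidates = {
    "LogisticRegression": LogisticRegression(penalty="l2", C=10, max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(
        n_neighbors=4, metric="euclidean", weights="distance"
    ),
    "DecisionTreeClassifier": DecisionTreeClassifier(
        max_depth=5, min_samples_split=10, random_state=0
    ),
}

# Stratified folds keep the class ratio roughly equal in every fold
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

mean_f1 = {}
for name, model in candidates.items():
    # cross_val_score fits a fresh clone of the model on each training split
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="f1")
    mean_f1[name] = scores.mean()
    print(f"Average F1 Score for {name}: {mean_f1[name]:.3f}")

# Select the model with the highest mean F1 score across folds
best_name = max(mean_f1, key=mean_f1.get)
print("Selected model:", best_name)
```

Stratification and shuffling are design choices rather than requirements. They tend to help when the classes are imbalanced or the rows are ordered, which may or may not apply to the challenge data.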