Evaluating classification models

Just as there are many ways to evaluate linear regression models, there are many ways to evaluate the performance of classification models. Accuracy is one of them, but on its own it is not a sufficient metric. Why?

Think about a scenario where our model predicts a rare disease that is present in only 0.01% of the data. If the model always predicts that no disease is present, it will still be accurate 99.99% of the time, yet it will miss every actual case, which is exactly when a correct diagnosis matters most.
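To make this scenario concrete, here is a minimal sketch with made-up data: 10,000 hypothetical patients, exactly one of whom has the disease (0.01%), and a "model" that always predicts no disease. It assumes NumPy and scikit-learn are installed, and it also uses recall_score from sklearn.metrics, which measures the share of actual disease cases the model catches.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical data: 10,000 patients, exactly 1 has the disease (0.01%).
# Label 1 = disease present, label 0 = no disease.
y_true = np.zeros(10_000, dtype=int)
y_true[0] = 1

# A "model" that always predicts "no disease".
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # share of actual disease cases that were caught

print(f"accuracy = {acc:.4f}")  # accuracy = 0.9999
print(f"recall   = {rec:.4f}")  # recall   = 0.0000
```

The accuracy looks excellent, but a recall of zero shows that the model never detects the disease.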

Classification report

A classification report is a table that calculates several metrics, separately for each class, to evaluate our model. We can obtain the table using the function classification_report from sklearn.metrics.

We will make the same model that we made in the last lesson.


import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
df = pd.read_csv('credit_card_cleaned.csv')
# Make data: features X and target Y (a 1-D Series, which is the shape scikit-learn expects for labels)
X = df.drop(columns=['default.payment.next.month', 'MARRIAGE', 'GENDER'])
Y = df['default.payment.next.month']
# Fit model
lr = LogisticRegression()
lr.fit(X, Y)
# Get predictions and accuracy on the training data
preds = lr.predict(X)
acc = accuracy_score(y_true=Y, y_pred=preds)
print('accuracy =', acc)
# Print precision, recall, f1-score and support for each class
print(classification_report(y_true=Y, y_pred=preds))
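The report computes its metrics once per class, each time treating that class as the class of interest. The sketch below checks this on hypothetical "yes"/"no" labels (made up for illustration, not taken from the credit card data): with output_dict=True, classification_report returns a nested dict, and each class's row matches precision_score and recall_score called with pos_label set to that class.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Hypothetical labels for ten customers ("yes" = defaults, "no" = does not default).
y_true = ["no"] * 5 + ["yes"] * 5
y_pred = ["no", "no", "no", "no", "yes",   # customers who actually did not default
          "no", "no", "yes", "yes", "yes"]  # customers who actually defaulted

# output_dict=True returns the report as a nested dict instead of a printed table.
report = classification_report(y_true, y_pred, output_dict=True)

scores = {}
for label in ["no", "yes"]:
    # Score the predictions with `label` treated as the class of interest.
    p = precision_score(y_true, y_pred, pos_label=label)
    r = recall_score(y_true, y_pred, pos_label=label)
    # The report's row for `label` holds exactly these values.
    assert abs(report[label]["precision"] - p) < 1e-12
    assert abs(report[label]["recall"] - r) < 1e-12
    scores[label] = (p, r)
    print(f"{label}: precision = {p:.2f}, recall = {r:.2f}")

# no: precision = 0.67, recall = 0.80
# yes: precision = 0.75, recall = 0.60
```

The same predictions give different numbers depending on which class is treated as the one of interest, which is why the report lists both rows.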

By printing the classification report, we can see several metrics listed for each of the two classes. Let's look at them one by one. Before that, we need the concepts of a positive class and a negative class. The positive class is the class we are interested in detecting. If we are interested in predicting customers who will not default, then "no" is the positive class and "yes" is the negative class. We could just as well invert this and call "yes" the positive class and "no" the negative class. In our example above, "no" is the positive class. Some of the concepts associated with the positive and negative classes are:

True Positive: A true positive is an outcome where the model correctly predicts the positive class.
False Positive: A false positive is an outcome where the actual class is the negative class, but the model predicts the positive class.
True Negative: A true negative is an outcome where the model correctly predicts the negative class.
False Negative: A false negative is an outcome where the actual class is the positive class, but the model predicts the negative class.
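The four outcomes above can be counted with confusion_matrix from sklearn.metrics. This sketch uses hypothetical labels with "no" as the positive class, as in the example above. Passing labels=["yes", "no"] lists the negative class first and the positive class last, so ravel() returns the counts in the order TN, FP, FN, TP; the same counts are then recomputed directly from the definitions as a check.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for ten customers; "no" is the positive class here.
y_true = ["no"] * 5 + ["yes"] * 5
y_pred = ["no", "no", "no", "no", "yes",   # actual "no"
          "no", "no", "yes", "yes", "yes"]  # actual "yes"

# Negative class first, positive class last, so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=["yes", "no"]).ravel()

# The same counts, computed directly from the definitions.
pairs = list(zip(y_true, y_pred))
manual = {
    "TP": sum(t == "no" and p == "no" for t, p in pairs),
    "FP": sum(t == "yes" and p == "no" for t, p in pairs),
    "TN": sum(t == "yes" and p == "yes" for t, p in pairs),
    "FN": sum(t == "no" and p == "yes" for t, p in pairs),
}
assert manual == {"TP": tp, "FP": fp, "TN": tn, "FN": fn}
print(manual)  # {'TP': 4, 'FP': 2, 'TN': 3, 'FN': 1}
```

If "yes" were chosen as the positive class instead, the true positives and true negatives would swap, as would the false positives and false negatives.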

Precision

Precision is a ...
