Exercise: Calculating True and False Rates and Confusion Matrix

Learn to calculate the true and false positive and negative rates and the confusion matrix.

Confusion matrix calculation in Python

In this exercise, we’ll use the test data and model predictions from the logistic regression model we created previously, using only the EDUCATION feature. We will illustrate how to manually calculate the true and false positive and negative rates, as well as the numbers of true and false positives and negatives needed for the confusion matrix. Then we will show a quick way to calculate a confusion matrix with scikit-learn. Perform the following steps to complete the exercise, noting that some code from the previous lesson must be run before doing this exercise:

  1. Run this code to calculate the number of positive samples:

    P = sum(y_test) 
    P 
    

    The output should appear like this:

    # 1155 
    

    Now we need the number of true positives. These are samples where the true label is 1 and the prediction is also 1. We can identify these with a logical mask for the samples that are positive (y_test==1) AND (& is Python's element-wise logical AND operator for arrays) have a positive prediction (y_pred==1).

  2. Use this code to calculate the number of true positives:

    TP = sum( (y_test==1) & (y_pred==1))
    TP
    

    Here is the output:

    # 0
    

    The true positive rate is the proportion of positives that are correctly predicted, TP/P, which of course would be 0 here.

  3. Run the following code to obtain the TPR:

    TPR = TP/P 
    TPR 
    

    You will obtain the following output:

    # 0.0 
    

    Similarly, we can identify the false negatives.

  4. Calculate the number of false negatives with this code:

    FN = sum( (y_test==1) & (y_pred==0) ) 
    FN 
    

    This should output the following:

    # 1155
    

    We’d also like the FNR.

  5. Calculate the FNR with this code:

    FNR = FN/P 
    FNR 
    

    This should output the following:

    # 1.0 
    

    What have we learned from the true positive and false negative rates?

    First, we can confirm that they sum to 1. This fact is easy to see because the TPR = 0 and the FNR = 1. What does this tell us about our model? On the test set, at least for the positive samples, the model has in fact acted as a majority-class null model. Every positive sample was predicted to be negative, so none of them was correctly predicted.

  6. Let’s find the TNR and FPR of our test data. Because these calculations are very similar to those we looked at previously, we show them all at once and illustrate a new Python function:

    N = sum(y_test==0)
    N
    
    # 4178
    
    TN = sum( (y_test==0) & (y_pred==0))
    TN
    
    # 4178
    
    FP = sum( (y_test==0) & (y_pred==1))
    FP
    
    # 0
    
    TNR = TN/N
    FPR = FP/N
    print('The true negative rate is {} and the false positive rate is {}'.format(TNR, FPR))
    
    # The true negative rate is 1.0 and the false positive rate is 0.0
    
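As promised at the start of the exercise, scikit-learn can compute all four of these counts at once with `confusion_matrix`. The sketch below assumes `y_test` and `y_pred` from the lesson's logistic regression model; since that code isn't reproduced here, we rebuild synthetic arrays with the same counts (4,178 negatives, 1,155 positives, every sample predicted negative) purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins matching the counts from this exercise:
# 4178 negative samples followed by 1155 positive samples.
y_test = np.concatenate([np.zeros(4178, dtype=int), np.ones(1155, dtype=int)])
# The model predicted the negative (majority) class for every sample.
y_pred = np.zeros_like(y_test)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[4178    0]
#  [1155    0]]
```

Note that scikit-learn places true negatives in the upper left, so `cm.ravel()` unpacks as `TN, FP, FN, TP`, matching the manual counts computed above.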
