How to create a confusion matrix without Scikit-Learn

Machine learning algorithms are often used to classify data into categories. To test whether a classifier works well and produces correct results, we need to look at several metrics. For a binary classifier, each prediction falls into one of four categories.

  1. True Positives: Values that are actually positive and are identified as positive by the algorithm.
  2. False Positives: Values that are actually negative, but are identified as positive by the algorithm.
  3. False Negatives: Values that are actually positive, but are identified as negative by the algorithm.
  4. True Negatives: Values that are actually negative and are identified as negative by the algorithm.
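
As a minimal sketch of these four categories, the snippet below tallies them for a pair of label lists. The function name `count_outcomes` and the sample labels are illustrative assumptions, and labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def count_outcomes(predicted, actual):
    """Tally TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["TP"] += 1  # actually positive, predicted positive
        elif p == 1 and a == 0:
            counts["FP"] += 1  # actually negative, predicted positive
        elif p == 0 and a == 1:
            counts["FN"] += 1  # actually positive, predicted negative
        elif p == 0 and a == 0:
            counts["TN"] += 1  # actually negative, predicted negative
    return counts


# Hypothetical labels, made up for illustration only.
predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 1, 0, 1, 0]
print(count_outcomes(predicted, actual))
# prints {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```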

A confusion matrix summarizes the output of a classifier by storing the counts of the four categories above. Below is the layout of a confusion matrix for two classes, with predicted classes as rows and actual classes as columns.

                        Actual positive    Actual negative
  Predicted positive    TP                 FP
  Predicted negative    FN                 TN

This confusion matrix can be used to calculate several evaluation metrics.

  • The first row can be used to calculate the precision.

    • Precision = TP / (TP + FP)
  • The first column can be used to calculate the recall or sensitivity.

    • Recall = TP / (TP + FN)
  • The second row can be used to calculate the negative predictive value.

    • Negative Predictive Value = TN / (FN + TN)
  • The second column can be used to calculate the specificity.

    • Specificity = TN / (FP + TN)
  • A confusion matrix can also be used to find different F-scores, such as F1, F2, F3, etc.

    • F-score = (B^2 + 1)(Precision)(Recall) / ((B^2)(Precision) + Recall), where B = 1, 2, 3, …
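
The formulas above can be sketched as a small helper that takes the four counts directly. This is only an illustration: the name `metrics_from_counts`, the `beta` parameter (standing in for B), and the sample counts are assumptions, and the sketch assumes every denominator is nonzero.

```python
def metrics_from_counts(tp, fp, fn, tn, beta=1.0):
    """Compute the metrics listed above from TP, FP, FN, and TN counts."""
    precision = tp / (tp + fp)      # first row of the matrix
    recall = tp / (tp + fn)         # first column
    npv = tn / (fn + tn)            # second row
    specificity = tn / (fp + tn)    # second column
    b2 = beta ** 2
    f_score = (b2 + 1) * precision * recall / (b2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "negative_predictive_value": npv,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts, made up for illustration only.
m = metrics_from_counts(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Passing `beta=2` weights recall more heavily than precision and gives the F2 score; the default `beta=1` gives F1.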

The following function builds the 2x2 confusion matrix for binary labels and prints the metrics described above:

import numpy as np

def confusion_matrix(pred, original):  # pass predicted and original labels to this function
    matrix = np.zeros((2, 2))  # start from a 2x2 matrix of zeros
    # the confusion matrix is for 2 classes: 1 = positive, 0 = negative
    for i in range(len(pred)):
        if int(pred[i]) == 1 and int(original[i]) == 1:
            matrix[0, 0] += 1  # True Positives
        elif int(pred[i]) == 1 and int(original[i]) == 0:
            matrix[0, 1] += 1  # False Positives
        elif int(pred[i]) == 0 and int(original[i]) == 1:
            matrix[1, 0] += 1  # False Negatives
        elif int(pred[i]) == 0 and int(original[i]) == 0:
            matrix[1, 1] += 1  # True Negatives
    # the loop above adds up the frequencies of the TPs, FPs, FNs, and TNs to form the matrix
    precision = matrix[0, 0] / (matrix[0, 0] + matrix[0, 1])
    print("Precision:", precision)
    recall = matrix[0, 0] / (matrix[0, 0] + matrix[1, 0])
    print("Recall:", recall)
    specificity = matrix[1, 1] / (matrix[0, 1] + matrix[1, 1])
    print("Specificity:", specificity)
    negative_pred_value = matrix[1, 1] / (matrix[1, 0] + matrix[1, 1])
    print("Negative Predictive Value:", negative_pred_value)
    f1 = 2 * (precision * recall) / (precision + recall)
    print("F1 score:", f1)
    return matrix
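
For comparison, the same counts can be obtained without an explicit loop by using NumPy boolean masks. The sketch below is an alternative, not the approach used above; the name `confusion_matrix_vectorized` and the sample labels are assumptions, and it keeps the same layout (rows are predictions, columns are actual labels) and the same 1/0 label encoding.

```python
import numpy as np


def confusion_matrix_vectorized(pred, original):
    """Return [[TP, FP], [FN, TN]] for labels encoded as 1 (positive) and 0 (negative)."""
    pred = np.asarray(pred).astype(int)
    original = np.asarray(original).astype(int)
    tp = np.sum((pred == 1) & (original == 1))
    fp = np.sum((pred == 1) & (original == 0))
    fn = np.sum((pred == 0) & (original == 1))
    tn = np.sum((pred == 0) & (original == 0))
    return np.array([[tp, fp], [fn, tn]])


# Hypothetical labels, made up for illustration only.
pred = [1, 0, 1, 1, 0, 0, 1, 0]
original = [1, 0, 0, 1, 1, 0, 1, 0]
matrix = confusion_matrix_vectorized(pred, original)
print(matrix)
# prints:
# [[3 1]
#  [1 3]]
```

Each mask marks the samples that belong to one category, and summing a boolean mask counts its `True` entries, so the four sums are the four cells of the matrix.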