Machine Learning

Decoding And Understanding The Classification Metrics

Because the best evaluation techniques yield the best models

Tooba Jamal

--

Photo by Diana Polekhina on Unsplash

Table Of Contents

· Table Of Contents
· Accuracy
· Confusion Matrix
· Precision and Recall
· F1 Score
· ROC Curve (Receiver Operating Characteristic)
· Conclusion

Writing one line of code to evaluate the performance of a machine learning model is all cool, but knowing what happens behind that line of code is what makes you a good data scientist.

We learn accuracy to evaluate our models in the early stages of our data science journey, but accuracy keeps a lot of useful information about our models hidden.

In this post, we are discussing the following classification metrics and what they tell us about the performance of our models.

  1. Accuracy
  2. Confusion Matrix
  3. Precision and Recall
  4. F1 score
  5. ROC curve

Accuracy

Accuracy is the proportion of correct predictions among the total number of predictions. It gives an idea of the overall performance of our models, but it can often be misleading: a higher accuracy does not always mean our model is working well, as the following example shows.

To understand this, let’s suppose we build a model to classify individuals as male or female based on their attributes. The dataset has an imbalanced class distribution, meaning the number of records in each class is not equal. Say we have a total of 100 records, with 99 male and only 1 female. A model that simply predicts “male” for every record will be correct 99 times and report 99% accuracy, yet it never identifies the single female record. Hence, accuracy is only reliable when the class distribution is balanced.
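To make this concrete, here is a minimal sketch using scikit-learn, with made-up labels mirroring the 99-male/1-female example above. A classifier that always predicts the majority class still scores 99% accuracy:

```python
from sklearn.metrics import accuracy_score

# 99 "male" (0) records and 1 "female" (1) record, as in the example above
y_true = [0] * 99 + [1]

# A naive model that always predicts the majority class "male"
y_pred = [0] * 100

# 0.99 -- looks great, yet the single female record is never found
print(accuracy_score(y_true, y_pred))
```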

Confusion Matrix

The confusion matrix provides a more comprehensive view of model performance by giving us the exact counts of correctly and wrongly predicted classes in the form of a table or matrix.

Consider a problem where we build a model that detects cancer in patients. In such a case, we would never want a person with a cancerous tumor to be diagnosed as non-cancerous. Here the confusion matrix comes to the rescue: it tells us the exact counts of correct and wrong predictions, as represented in the image below.

confusion matrix created by the author

Here, true positives are the samples that were predicted positive and were actually positive. True negatives are the samples that were predicted negative and were actually negative. False positives are the samples that were predicted positive but were actually negative, and false negatives are the samples that were predicted negative but were actually positive.
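As a rough illustration, continuing the same made-up labels, scikit-learn can compute the matrix directly. The labels=[1, 0] argument below is only there to lay the matrix out as [[TP, FN], [FP, TN]] for positive class 1:

```python
from sklearn.metrics import confusion_matrix

y_true = [0] * 99 + [1]   # actual labels
y_pred = [0] * 100        # naive "always male" predictions

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[ 0  1]
#  [ 0 99]]
```

The single false negative in the top-right cell is exactly the information that the 99% accuracy score hides.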

Precision and Recall

Precision tells us how often our model is correct when it predicts the positive class: the ratio of correctly predicted positive classes to all the positive classes predicted by our model. Mathematically, it is defined as

precision = TP / (TP + FP)

Recall is the true positive rate: it tells us what fraction of the actual positive classes were correctly predicted. It is useful in cases where we never want to miss a positive class, as in the cancer example mentioned above. Mathematically,

recall = TP / (TP + FN)
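Here is a short sketch of both scores with scikit-learn; the tiny cancer-style label arrays below are invented purely for illustration (1 = cancerous, 0 = non-cancerous):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical actual labels
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # hypothetical predictions

# precision = TP / (TP + FP) = 2 / (2 + 1)
print(precision_score(y_true, y_pred))  # 0.666...

# recall = TP / (TP + FN) = 2 / (2 + 1)
print(recall_score(y_true, y_pred))     # 0.666...
```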

F1 Score

The F1 score is the harmonic mean of precision and recall, which combines both scores into a single number. The F1 score is high only when both precision and recall are high, and it drops when either one is low. The mathematical formula is

f1 score = 2 * (precision * recall) / (precision + recall)
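Using the same hypothetical labels as above, the F1 score can be computed either from the formula or with scikit-learn’s f1_score:

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = 2 / 3, 2 / 3

# Harmonic mean of precision and recall
manual_f1 = 2 * (precision * recall) / (precision + recall)

print(manual_f1)                 # 0.666...
print(f1_score(y_true, y_pred))  # same value
```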

ROC Curve (Receiver Operating Characteristic)

The ROC curve is a graphical representation of model performance across different classification thresholds. The true positive rate is plotted on the y-axis and the false positive rate on the x-axis. Classifiers whose ROC curve lies closer to the top-left corner give better predictions.

photo from Wikipedia
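A minimal sketch of plotting a ROC curve with scikit-learn and matplotlib, assuming a classifier that outputs probability scores for the positive class (the scores below are invented for illustration; the AUC value in the legend is just a common one-number summary of the curve):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

# Compute false positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

The dashed diagonal represents random guessing; the further the curve bows toward the top-left corner, the better the classifier.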

Conclusion

In this post, we have discussed five of the most popular classification metrics and what they tell us about our model performance. Ideally, we use more than one metric, as together they provide more information when comparing different models. I hope this has helped you understand the difference between these metrics and broadened your understanding of how you can evaluate your models. Hit the clap and share the article if you like it. Thank you for reading!

PS: I am conducting a survey on the airline satisfaction of passengers. Please help me out by filling out this Google form, it will merely take a minute or two. Thanks in advance!
