Accuracy in Machine Learning: Definition, Formula, Python Code, and Limitations
Supervised learning with scikit-learn (Part 17)
📚 Chapter 6: Measuring model performance
If you want to read more articles about Supervised Learning with Sklearn, don’t forget to stay tuned :) click here.
Introduction
In the dynamic world of machine learning, evaluating model performance is crucial to ensure reliability and effectiveness. One of the foundational tools for this evaluation is the accuracy metric, which helps us understand how well a model predicts outcomes. In classification problems, accuracy is the most commonly used and most straightforward indicator of model performance: overall, how often is the classifier correct? It measures the percentage of correct predictions. This blog delves into the accuracy metric, its significance, how to compute it in Python, and how to interpret its limitations to improve model performance.
Sections
Definition
Concepts
Formula
Python Code
Use Case
Limitations
Definition
Def: Accuracy is the ratio of correctly predicted observations to the total observations. It is a straightforward metric that indicates the overall correctness of the model.
Def: The accuracy of a classifier is defined as the number of correct predictions divided by the total number of data points.
Def: Accuracy is the percentage of predictions you got right out of all the predictions you made.
Def: It is simply the ratio of correctly predicted observations to the total number of observations.
Concepts
Can we say that if we have high accuracy, our model is the best? Accuracy is a great measure, but only on symmetric datasets, where the classes are balanced and false positives and false negatives carry roughly the same cost. This begs another question: which data do we use to compute accuracy? What we are really interested in is how well our model will perform on new data, that is, samples the algorithm has never seen before. You could compute the accuracy on the data you used to fit the classifier, but since that data was used to train it, the classifier’s performance on it will not be indicative of how well it can generalize to unseen data. For this reason, it is common practice to split your data into two sets: a training set and a test set. You train, or fit, the classifier on the training set, then make predictions on the labeled test set and compare these predictions with the known labels to compute your accuracy, as the sketch below illustrates.
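As a concrete illustration of this workflow, here is a minimal sketch using scikit-learn’s train_test_split (the iris dataset and the DecisionTreeClassifier are illustrative assumptions; any labeled data and any estimator would do):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a toy labeled dataset (illustrative choice)
X, y = load_iris(return_X_y=True)

# Hold out a test set that the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the classifier on the training set only
DTC = DecisionTreeClassifier(random_state=42)
DTC.fit(X_train, y_train)

The fitted classifier DTC and the held-out X_test, y_test are reused in the Python Code section below.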
Formula
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
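To make the formula concrete, here is a minimal sketch that recovers these four counts from scikit-learn’s confusion_matrix on some toy labels (the labels themselves are illustrative assumptions):

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# For binary labels, ravel() unpacks the matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 4 correct out of 6 -> 0.666...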
Python Code
Scikit-learn provides a function named accuracy_score() that lets us calculate the accuracy of a model. We pass it the actual labels and the predicted labels, and it returns the accuracy score.
from sklearn.metrics import accuracy_score

# Compare the known labels with the predictions
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)  # 0.75

Alternatively, every scikit-learn classifier exposes a score() method that returns its mean accuracy on the given data. Reusing the fitted DecisionTreeClassifier DTC and the test split from the sketch in the Concepts section:

y_pred_DT = DTC.predict(X_test)
DT_Acc = DTC.score(X_test, y_test)
print('Accuracy score = {:.4f}'.format(DT_Acc))
Use Case
Accuracy is suitable when the classes in the dataset are balanced. For instance, in a spam detection model where spam and non-spam emails occur in roughly equal numbers, accuracy gives a clear picture of model performance. In other cases, however, accuracy is not an ideal metric for measuring a model’s performance. For example, in a healthcare project, even 99% accuracy may not be acceptable, because the remaining 1% of errors could carry risks to human life.
Limitations
1- Imbalanced Classes
In cases of imbalanced datasets (e.g., fraud detection), accuracy can be misleading. For example, if 95% of transactions are non-fraudulent, a model that always predicts non-fraudulent transactions would have 95% accuracy but would be ineffective at identifying fraud.
More generally, accuracy is deceptive whenever the dataset’s classes are uneven. If the dominant class comprises 99% of the data, a model that merely predicts the majority class will be 99% accurate, yet it will never correctly classify the minority class. Other metrics, including precision, recall, and F1-score, should be used to address this issue.
Class imbalance example: Emails
Consider a spam classification problem in which 99% of emails are real and only 1% are spam. I could build a model that classifies all emails as real; this model would be correct 99% of the time and thus have an accuracy of 99%, which sounds great. However, this naive classifier does a horrible job of predicting spam: it never predicts spam at all, so it completely fails at its original purpose. The situation when one class is more frequent is called class imbalance because the class of real emails contains way more instances than the class of spam. This is a very common situation in practice and requires a more nuanced metric to assess the performance of our model.
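Here is a minimal sketch of this effect, using a synthetic 99/1 label split and scikit-learn’s DummyClassifier to stand in for the naive always-predict-real model (the counts and the dummy features are assumptions mirroring the example above):

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels: 99% real (0), 1% spam (1)
X = np.zeros((1000, 1))  # features are irrelevant for this demo
y = np.array([0] * 990 + [1] * 10)

# A classifier that always predicts the majority class
naive = DummyClassifier(strategy="most_frequent")
naive.fit(X, y)
y_pred = naive.predict(X)

print(accuracy_score(y, y_pred))                 # 0.99 -- looks great
print(recall_score(y, y_pred, zero_division=0))  # 0.0 -- never catches spam

The recall of 0.0 on the spam class exposes what the 99% accuracy hides: the model never identifies a single spam email.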
2- Overfitting
A model is said to be overfit when it is trained too closely on the training data and underperforms on the test data. As a result, accuracy may be high on the training set but poor on the test set. Techniques like cross-validation, ensemble methods, and regularization should be applied to solve this issue; the sketch below shows the cross-validation check.
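Cross-validation scores the model on several held-out folds instead of a single split, which makes unstable, overfit behavior easier to spot. A minimal sketch with scikit-learn’s cross_val_score (the iris data and the decision tree are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# 5-fold cross-validation: each fold is held out once for scoring
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())  # a large spread across folds hints at instability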
3- Data Bias
The model will produce biased predictions if the training dataset is biased. This may yield high accuracy on the training data, but performance on unseen data may be subpar. Techniques like data augmentation and resampling should be utilized to address this issue; a resampling sketch follows below.
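As one illustration, here is a minimal sketch that oversamples a minority class with scikit-learn’s resample utility (the toy data and the oversampling choice are assumptions; undersampling the majority class is the mirror-image option):

import numpy as np
from sklearn.utils import resample

# Toy imbalanced data: 990 majority-class rows, 10 minority-class rows
rng = np.random.default_rng(42)
X = rng.random((1000, 3))
y = np.array([0] * 990 + [1] * 10)

# Oversample the minority class until the classes are balanced
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=42)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.array([0] * len(X_maj) + [1] * len(X_min_up))
print(np.bincount(y_balanced))  # [990 990]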
📌 Related Posts You Might Like
1- Key Issues Associated with Classification Accuracy
2- A Guide to Accuracy in Machine Learning
3- Beyond Accuracy: A Deep Dive into Classification Metrics (Precision, Recall, F1, ROC-AUC, PR-AUC)
🎯 Call to Action
Liked this tutorial?
👉 Subscribe to our newsletter for more Python + ML tutorials
👉 Follow our GitHub for code notebooks and projects
👉 Leave a comment below if you’d like a tutorial on vectorized backpropagation next!
👉 Supervised Learning with sklearn: Enroll for Full Course to find notes, repository etc.
👉 Deep Learning and Neural Network: Enroll for Full Course to find notes, repository etc.
🎁 Access exclusive Supervised Learning with sklearn bundles and premium guides on our Gumroad store: from sentiment analysis notebooks to fine-tuning transformers. Download, learn, and implement faster.