Classification is a machine learning task in which a model learns to assign inputs to predefined categories: spam or not spam, cancer or no cancer, which digit is this. It is one of the most widely deployed AI capabilities in the world, powering everything from your email inbox to medical imaging to content moderation.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 5 min read


Classification — What It Is and How Machine Learning Learns to Sort the World into Categories

What is Classification?

Every time Gmail quietly moves an email to your spam folder, it has run a classification model. Every time a radiologist’s software highlights a suspicious region in a scan, a classification model found it. Every time your bank’s app flags a transaction as potentially fraudulent, classification made that call.

Classification is the task of teaching a machine to sort inputs into categories. You provide thousands of labelled examples: emails marked spam or not spam, tumours marked malignant or benign, loan applications marked approved or rejected. The model finds the patterns that separate one category from another. Then you give it new, unlabelled inputs, and it assigns each one to the most likely category, often in milliseconds.
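Here is a minimal sketch of one of the simplest possible classifiers, a nearest-centroid model in plain Python. It makes "finding the patterns that separate categories" concrete. It is not how Gmail or any production filter works. The two email features (number of links, fraction of words in ALL CAPS) and the training examples are hypothetical. Training averages the labelled examples for each category. Prediction picks the category whose average is closest to the new input.

```python
from statistics import mean

def fit_centroids(examples):
    """Learn one centroid (the average feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

# Hypothetical features per email: (number of links, fraction of words in ALL CAPS)
training = [
    ((8, 0.40), "spam"),
    ((12, 0.55), "spam"),
    ((10, 0.35), "spam"),
    ((1, 0.02), "not spam"),
    ((0, 0.05), "not spam"),
    ((2, 0.01), "not spam"),
]

centroids = fit_centroids(training)
print(predict(centroids, (9, 0.50)))  # spam
print(predict(centroids, (1, 0.03)))  # not spam
```

In practice you would scale the features before measuring distance. In this toy data the link count dominates.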

How Classification works

  1. Collect labelled training data — inputs paired with their correct category.
  2. Choose a classification algorithm (logistic regression, decision tree, neural network, etc.).
  3. Train the model — it adjusts its internal parameters until it can correctly separate categories in the training data.
  4. Evaluate on a held-out test set — measure accuracy, precision, recall, and F1 score.
  5. Deploy — feed new unlabelled inputs and the model returns a predicted category and a confidence score.
  6. Monitor — real-world data distributions change, so retrain periodically to maintain performance.
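The steps above can be sketched end to end in plain Python. This is a minimal illustration built on three assumptions:

* The dataset is synthetic, with two features per example.
* The model is logistic regression trained by batch gradient descent.
* The function names, learning rate and epoch count are illustrative choices, not a reference implementation.

Monitoring (step 6) is left out because it depends on your production setup.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Step 3: adjust weights and bias by batch gradient descent on log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict(model, x, threshold=0.5):
    return 1 if predict_proba(model, x) >= threshold else 0

# Step 1: hypothetical labelled data. Class 1 clusters near (2, 2), class 0 near (-2, -2).
rng = random.Random(0)
data = [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(100)]
data += [((rng.gauss(-2, 1), rng.gauss(-2, 1)), 0) for _ in range(100)]
rng.shuffle(data)

# Step 4 needs a held-out test set: keep 25% aside and never train on it.
split = int(0.75 * len(data))
train, test = data[:split], data[split:]
X_train = [x for x, _ in train]
y_train = [y for _, y in train]

# Steps 2 and 3: logistic regression, trained on the training split only.
model = train_logistic_regression(X_train, y_train)

# Step 4: evaluate on the held-out examples.
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")

# Step 5: a new, unlabelled input gets a category plus a probability.
new_input = (1.5, 1.0)
print(predict(model, new_input), round(predict_proba(model, new_input), 3))
```

In a real project you would normally use a library implementation, such as scikit-learn's `LogisticRegression`, rather than hand-rolled gradient descent. The point here is only to show where each step happens.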

Real-world examples

Not theory — what real teams actually shipped using this technique.

  • Gmail's spam filter classifies billions of emails every day. Google says it blocks more than 99.9% of spam, phishing and malware, at a scale that would take millions of human moderators to match.
  • PathAI's pathology models classify cancer cells in tissue samples with accuracy reported to be comparable to that of expert pathologists, helping labs process more samples faster.
  • Content moderation on social platforms uses multi-class classification to sort posts into categories — safe, violent, hate speech, misinformation — flagging the harmful ones for review or removal.

Common pitfalls

  • Class imbalance — if 99% of your data is one class, a model that always predicts that class is 99% accurate but completely useless. Use techniques like oversampling, undersampling, or class-weighted loss.
  • Threshold selection — classification models output a probability score. Choosing where to draw the line (0.5? 0.7?) affects the tradeoff between false positives and false negatives. This is a business decision, not just a technical one.
  • Data leakage — if your training data contains information that would not be available at prediction time, accuracy looks great during training and testing but collapses in production.
  • Confusing classification with regression — classification predicts a category. Regression predicts a number. Predicting whether a customer will churn is classification. Predicting how much they will spend next month is regression.
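The first two pitfalls are easy to see with numbers. The sketch below uses hypothetical fraud labels and model scores to show three things:

* A majority-class baseline can be 99% accurate yet catch no fraud.
* "Balanced" class weights can be computed for a class-weighted loss, using n_samples / (n_classes × class_count). This is the formula scikit-learn uses for `class_weight="balanced"`.
* Moving the threshold changes the counts of false positives and false negatives.

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count): rarer classes weigh more in the loss."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

def counts_at_threshold(scores, labels, threshold):
    """Count false positives and false negatives when predicting 1 for score >= threshold."""
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return fp, fn

# Class imbalance: hypothetical 990 legitimate (0) and 10 fraudulent (1) transactions
labels = [0] * 990 + [1] * 10
always_legit = [0] * len(labels)
accuracy = sum(p == y for p, y in zip(always_legit, labels)) / len(labels)
caught = sum(p == 1 and y == 1 for p, y in zip(always_legit, labels))
print(accuracy, caught)  # 0.99 0

weights = {c: round(w, 3) for c, w in balanced_class_weights(labels).items()}
print(weights)  # {0: 0.505, 1: 50.0}

# Threshold selection: hypothetical model scores and true labels for ten cases
scores = [0.95, 0.85, 0.72, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10]
truth = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
for t in (0.3, 0.5, 0.7):
    fp, fn = counts_at_threshold(scores, truth, t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
# threshold 0.3: 4 false positives, 0 false negatives
# threshold 0.5: 2 false positives, 1 false negatives
# threshold 0.7: 1 false positives, 2 false negatives
```

Lowering the threshold trades false negatives for false positives. Which mix is acceptable is the business decision the pitfall describes.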

Frequently asked questions

QUESTION 1 What is classification in machine learning?

ANSWER 1 Teaching a machine to put things into categories. Show it thousands of labelled examples and it learns the boundary between categories. Give it new examples and it assigns each to the most likely category.

QUESTION 2 What is the difference between binary and multi-class classification?

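ANSWER 2 Binary classification has two categories (spam or not spam). Multi-class classification has three or more (cat, dog, bird, or fish). Many algorithms handle both directly. Others extend a binary classifier to many classes, for example by training one classifier per category (one-vs-rest). With more categories, the model has more boundaries to learn and more ways to confuse one class with another.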
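A common way for one model to handle many categories is the softmax function. The model produces one raw score per class, softmax turns the scores into probabilities that sum to 1, and the most probable class is picked. The sketch below uses hypothetical scores for the four animal classes. It also checks a useful identity: with two classes, softmax over the scores (z, 0) gives the same probability as the logistic sigmoid used in binary classification.

```python
import math

def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

classes = ["cat", "dog", "bird", "fish"]
raw_scores = [2.0, 1.0, 0.1, -1.0]  # hypothetical model outputs for one image
probs = softmax(raw_scores)
prediction = classes[probs.index(max(probs))]
print(prediction)  # cat

# Binary special case: softmax over (z, 0) matches sigmoid(z)
z = 1.3
print(math.isclose(softmax([z, 0.0])[0], sigmoid(z)))  # True
```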

QUESTION 3 What algorithms are used for classification?

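ANSWER 3 Logistic regression, decision trees, random forests, support vector machines (SVM), k-nearest neighbours (KNN), gradient-boosted trees such as XGBoost, and neural networks. The right choice depends on the type and size of your data and on how interpretable the model needs to be.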
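Of the algorithms listed, k-nearest neighbours is the easiest to write from scratch, which makes it a useful mental model. You store the labelled examples, then classify a new input by majority vote among the k closest ones. The sketch below is minimal plain Python. The (weight in kg, ear length in cm) features for cats and dogs are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among the k closest labelled examples."""
    nearest = sorted(training, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled examples: ((weight_kg, ear_length_cm), label)
training = [
    ((4.0, 6.0), "cat"),
    ((3.5, 5.5), "cat"),
    ((5.0, 7.0), "cat"),
    ((25.0, 12.0), "dog"),
    ((30.0, 11.0), "dog"),
    ((20.0, 13.0), "dog"),
]

print(knn_predict(training, (5.0, 6.5)))    # cat
print(knn_predict(training, (22.0, 11.0)))  # dog
```

k-NN has no real training step. Instead, the naive version compares each query against every stored example, so prediction slows down as the dataset grows.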

QUESTION 4 How do you measure classification performance?

ANSWER 4 Accuracy, precision, recall, and F1 score. For imbalanced datasets, accuracy alone is misleading; always check performance on the minority class separately.
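Precision, recall and F1 are all computed from counts of true positives, false positives and false negatives. The sketch below uses hypothetical fraud labels and computes these metrics for each class separately, so the minority class cannot hide behind overall accuracy. Here the model is 80% accurate overall, yet it catches only half of the fraud cases.

```python
def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for each class, treating that class as 'positive'."""
    results = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[c] = (precision, recall, f1)
    return results

# Hypothetical test set: 8 legitimate cases, 2 fraud cases
y_true = ["legit"] * 8 + ["fraud"] * 2
y_pred = ["legit"] * 7 + ["fraud"] + ["legit", "fraud"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy {accuracy:.2f}")  # accuracy 0.80

metrics = per_class_metrics(y_true, y_pred)
for c, (p, r, f) in metrics.items():
    print(f"{c}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# fraud: precision=0.500 recall=0.500 f1=0.500
# legit: precision=0.875 recall=0.875 f1=0.875
```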

