Naive Bayes is a simple probabilistic classifier that uses Bayes' theorem to calculate the probability of each class, assuming all features are independent of one another once the class is known. That independence assumption is almost always wrong, but the algorithm works surprisingly well for text classification, spam filtering, and sentiment analysis. It trains in seconds and is often competitive with far more complex models on text tasks.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read


Naive Bayes — What It Is and Why a Wrong Assumption Produces Surprisingly Good Results

What is Naive Bayes?

An email arrives. Is it spam? You notice certain words: “FREE”, “WINNER”, “CLICK HERE”, “UNSUBSCRIBE”. Each word individually raises your suspicion. Naive Bayes formalises this intuition into probabilities.

For each class (spam, not spam), it asks: how likely is each word to appear in an email of that class? From historical, labelled emails it estimates P(“FREE” | spam) and P(“FREE” | not spam) for every word in the vocabulary, along with the overall share of each class, P(spam) and P(not spam). To classify a new email, it multiplies each class's prior by the probabilities of the email's words given that class (the “naive” step) and picks the class with the highest result. By Bayes' theorem, that product is proportional to P(class | email), so the highest score marks the most probable class.
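A minimal from-scratch sketch of that procedure, assuming the multinomial variant commonly used for text and emails that are already split into lowercase words. The function names `train` and `predict` and the four-email corpus are made up for illustration. It adds log probabilities instead of multiplying raw ones, which picks the same winner while avoiding numerical underflow on long emails, and applies add-one (Laplace) smoothing through the `alpha` parameter.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Estimate log P(class) and log P(word | class) from tokenised, labelled docs."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> how often each word occurs in it
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)  # add-one smoothing keeps every word > 0
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    scores = {}
    for c in log_prior:
        # Summing logs is equivalent to multiplying probabilities.
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
docs = [
    "free winner click here".split(),
    "free prize winner".split(),
    "meeting agenda for monday".split(),
    "lunch on monday".split(),
]
labels = ["spam", "spam", "ham", "ham"]
prior, lik = train(docs, labels)

print(predict("free winner".split(), prior, lik))     # spam
print(predict("monday meeting".split(), prior, lik))  # ham
```

Skipping unseen words at prediction time is a common simplification, not the only option.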

The naivety is in the multiplication. Multiplying assumes that, within a class, words appear independently: that “FREE” appearing in a spam email does not change the probability that “WINNER” also appears. Any marketer knows this is false; the two co-occur constantly in spam. Yet despite this unrealistic assumption, Naive Bayes works remarkably well.
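The gap between the assumption and reality can be measured directly. This sketch uses five made-up spam emails, each reduced to the set of words it contains, and compares how often FREE and WINNER actually appear together with what the independence assumption predicts. The helper `p` is hypothetical.

```python
# Hypothetical spam emails, each reduced to the set of words it contains.
spam = [
    {"free", "winner"},
    {"free", "winner"},
    {"free", "winner"},
    {"free"},
    {"unsubscribe"},
]


def p(*words):
    """Fraction of spam emails that contain all of the given words."""
    return sum(all(w in email for w in words) for email in spam) / len(spam)


joint = p("free", "winner")      # what actually happens in the data
naive = p("free") * p("winner")  # what the independence assumption predicts

print(f"actual P(FREE, WINNER | spam)            = {joint:.2f}")
print(f"naive  P(FREE | spam) * P(WINNER | spam) = {naive:.2f}")
```

Here the pair co-occurs in 60% of the emails, while multiplying the separate rates predicts 48%.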

Why does the wrong assumption work?

For classification, you only need to know which class has the highest probability, not the exact probability values. Even when the independence assumption distorts the absolute probabilities, the ranking of classes is often preserved: a spam email usually still scores higher for spam than for not-spam, even if the numbers themselves are off.
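A small worked example with hypothetical numbers shows the effect. Suppose FREE and WINNER always appear together, in 60% of spam and 5% of ham, and both classes are equally likely beforehand. The correct posterior treats the pair as one piece of evidence; the naive model multiplies in each word separately and so counts the same signal twice. The function name `posterior_spam` is made up for illustration.

```python
def posterior_spam(lik_spam, lik_ham, prior_spam=0.5):
    """Bayes' theorem for two classes: P(spam | evidence)."""
    s = prior_spam * lik_spam
    h = (1 - prior_spam) * lik_ham
    return s / (s + h)


# Hypothetical rates: FREE and WINNER always appear together,
# in 60% of spam emails and 5% of ham emails.
p_pair_spam, p_pair_ham = 0.60, 0.05

# Correct: the pair is a single piece of evidence.
true_post = posterior_spam(p_pair_spam, p_pair_ham)
# Naive: each word is treated as independent evidence, so the signal counts twice.
naive_post = posterior_spam(p_pair_spam * p_pair_spam, p_pair_ham * p_pair_ham)

print(f"true  P(spam | FREE, WINNER) = {true_post:.3f}")   # 0.923
print(f"naive P(spam | FREE, WINNER) = {naive_post:.3f}")  # 0.993
```

The naive posterior is more confident than the correct one, but both pick spam. This will not hold in every case: when the evidence is close, double-counting can flip the decision.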

This is why Naive Bayes powered some of the first effective spam filters in the early 2000s and why it remains a competitive baseline for text classification tasks today.

Real-world examples

Not theory: where this technique has been used in practice.

  • Paul Graham’s “A Plan for Spam” (2002) described a Bayesian spam filter that helped make statistical spam filtering mainstream. It learns individual word probabilities from labelled spam and ham and combines them naively, and Graham argued that this approach beat the hand-written, rule-based filters common at the time.
  • Medical diagnosis support in resource-constrained settings: Naive Bayes trained on symptom-disease pairs can provide fast, interpretable preliminary diagnoses where complex models are impractical to deploy or explain.
  • Language identification: given a short text sample, Naive Bayes over character n-gram frequencies can classify the language in milliseconds with high accuracy.
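The combining step behind Bayesian spam filters like Graham's can be sketched from per-word probabilities P(spam | word). Assuming equal class priors and independent words, Bayes' theorem reduces to the rule below, which matches the combining formula given in “A Plan for Spam”. The function name `combine` is hypothetical, and Graham's full filter adds details (such as keeping only the most extreme word probabilities) that are not shown here.

```python
import math


def combine(probs):
    """Combine per-word spam probabilities P(spam | word) into one score.

    Assumes equal class priors and words that are independent given the class.
    """
    spam = math.prod(probs)                # product of p_i
    ham = math.prod(1 - p for p in probs)  # product of (1 - p_i)
    return spam / (spam + ham)


print(combine([0.99, 0.95, 0.2]))
```

For example, `combine([0.99, 0.95, 0.2])` is about 0.998: two strongly spammy words outweigh one mildly innocent one.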

Common pitfalls

  • Zero probability problem: if a word never appeared in the training data for a class, its estimated probability for that class is zero, and the whole product becomes zero regardless of the other evidence. Laplace smoothing (adding one to every word count, or a small constant α in the general additive form) prevents this.
  • Correlated features: when features are strongly correlated (as they almost always are in text), the independence assumption counts the same evidence several times, pushing probability estimates toward zero or one even when the true probability is uncertain.
  • Continuous features need discretisation or a different variant: the multinomial and Bernoulli variants used for text model discrete counts or word presence. Gaussian Naive Bayes extends the method to continuous features by assuming each feature follows a Gaussian distribution within each class.
  • Outperformed at scale: on large labelled text datasets, gradient boosting and transformer-based models typically outperform Naive Bayes. Use it as a fast baseline or in resource-constrained settings.
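Two of these pitfalls in a few lines of Python, with made-up numbers. The first part shows one unseen word zeroing out a product of word probabilities and add-one smoothing fixing it; the vocabulary size of 11 is an assumption. The second part sketches the Gaussian variant for a single hypothetical continuous feature, the fraction of capital letters in an email, assuming equal class priors. All function names are hypothetical.

```python
import math

# Part 1: the zero-probability problem.
# Hypothetical word counts from spam training emails (7 words in total).
spam_counts = {"free": 2, "winner": 2, "click": 1, "here": 1, "prize": 1}
vocab_size = 11  # assumed number of distinct words across both classes


def p_word(word, counts, alpha):
    """P(word | class) with additive smoothing; alpha=0 means no smoothing."""
    total = sum(counts.values())
    return (counts.get(word, 0) + alpha) / (total + alpha * vocab_size)


email = ["free", "winner", "monday"]  # "monday" never appeared in spam
unsmoothed = math.prod(p_word(w, spam_counts, alpha=0) for w in email)
smoothed = math.prod(p_word(w, spam_counts, alpha=1) for w in email)
print(unsmoothed)  # 0.0: one unseen word wipes out all other evidence
print(smoothed)    # small but non-zero


# Part 2: Gaussian Naive Bayes for one continuous feature.
def fit_gaussian(values):
    """Per-class mean and (population) variance of the feature."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical fraction of capital letters in training emails.
spam_mean, spam_var = fit_gaussian([0.30, 0.35, 0.40])
ham_mean, ham_var = fit_gaussian([0.05, 0.08, 0.10])

x = 0.33  # new email
# With equal priors, the class with the larger likelihood wins.
label = "spam" if gaussian_pdf(x, spam_mean, spam_var) > gaussian_pdf(x, ham_mean, ham_var) else "ham"
print(label)  # spam
```

With several continuous features, the Gaussian variant fits one mean and variance per feature per class and multiplies the resulting densities, just as the text variants multiply word probabilities.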

Frequently asked questions

QUESTION 1 What is Naive Bayes in simple terms?

ANSWER 1 A classifier that uses Bayes' theorem to calculate the probability of each class, multiplying individual feature probabilities together on the assumption that features are independent within each class.

QUESTION 2 Why is it called ‘naive’?

ANSWER 2 Because it assumes all features are independent, which is almost always false in practice. It still works because the ranking of class probabilities is often preserved, and the ranking is all a classifier needs to pick a label.

QUESTION 3 What is Naive Bayes used for?

ANSWER 3 Spam detection, text classification, sentiment analysis, language identification, medical diagnosis support, and as a fast baseline model.

QUESTION 4 When should you NOT use Naive Bayes?

ANSWER 4 When features are strongly correlated and that correlation matters, when well-calibrated probabilities are needed, or when maximum accuracy with sufficient data is the goal.

