⚡ Overfitting is when a model memorises training data instead of learning general patterns — achieving near-perfect training accuracy while failing on new data. Like a student who memorises exam answers verbatim and fails when the questions change. Detected by the gap between training and validation performance. Fixed by more data, regularisation, dropout, and early stopping.
Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 5 min read
Overfitting — What It Is, How to Detect It & How to Fix It
What is Overfitting?
A student has access to every past exam paper. They memorise every question and every answer — all 500 of them. Come exam day, if any of the previous questions appear, they answer perfectly. But when a new question appears — one not in the past papers — they are helpless. They learned the answers, not the subject.
Overfitting is the machine learning equivalent. A model that has “seen” its training data enough times, with sufficient capacity, can memorise every training example — including the noise, the quirks, and the irrelevant details specific to those particular examples. On training data it scores perfectly. On new data it has never seen, it fails — because the specific memorised patterns do not generalise.
The goal of machine learning is not to perform well on training data. It is to perform well on new, unseen data — to generalise. Overfitting is the failure mode that prevents generalisation.
How Overfitting works
The standard diagnostic: monitor both training loss and validation loss during training and plot them together.
Healthy training: both curves decrease together, converging to similar values. The model is learning genuinely generalisable patterns.
Overfitting signature: training loss continues to decrease while validation loss stops decreasing and starts increasing. The curves diverge. The model is memorising training examples at the expense of generalisation.
The validation loss minimum — the lowest point before it starts rising — is the optimal stopping point. Training beyond it makes the model worse on new data, not better.
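As a minimal sketch of this diagnostic, assuming you already record one training loss and one validation loss per epoch, the check below finds the epoch with the lowest validation loss and flags the divergence pattern. The function name `diagnose_curves`, the `patience` threshold, and the loss values are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def diagnose_curves(train_loss: Sequence[float],
                    val_loss: Sequence[float],
                    patience: int = 3) -> dict:
    """Summarise a pair of loss curves recorded once per epoch."""
    if len(train_loss) != len(val_loss) or len(val_loss) == 0:
        raise ValueError("need two non-empty curves of equal length")

    # Epoch (0-indexed) with the lowest validation loss: the candidate stopping point.
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_since_best = len(val_loss) - 1 - best_epoch

    # Divergence: training loss kept falling after validation loss bottomed out.
    train_still_falling = train_loss[-1] < train_loss[best_epoch]
    overfitting = epochs_since_best >= patience and train_still_falling

    return {
        "best_epoch": best_epoch,
        "gap_at_end": val_loss[-1] - train_loss[-1],
        "overfitting": overfitting,
    }


# Made-up curves: validation loss bottoms out at epoch 3, then rises.
train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18, 0.12, 0.08]
val = [1.05, 0.80, 0.65, 0.60, 0.62, 0.66, 0.71, 0.78]

report = diagnose_curves(train, val)
print(report)
```

The `patience` argument is a design choice rather than a rule: a single noisy epoch should not count as divergence, so the check waits a few epochs past the minimum before raising the flag.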
How to prevent it
More training data: usually the most effective fix. More diverse examples make it harder to memorise specifics and force the model to learn general patterns. Data augmentation artificially extends the effective dataset size.
Regularisation: add a penalty term to the loss function that discourages large weights. L2 regularisation (weight decay) shrinks all weights toward zero. L1 regularisation drives some weights exactly to zero, producing sparse models. Both constrain model complexity.
Dropout: randomly deactivate a fraction of neurons during each training step. This discourages co-adaptation, where one neuron comes to rely on another to correct its errors, so each neuron must learn features that are useful on their own. It is one of the most widely used ways to reduce overfitting in neural networks.
Early stopping: halt training when validation loss stops improving and restore the weights from the best validation epoch. This prevents the model from drifting further into memorisation after the optimal point.
Simpler architecture: reduce the model’s capacity to match the complexity of the available data. A model with 10 million parameters trained from scratch on 1,000 examples usually has far more capacity than the data can constrain. A simpler model is forced to learn what matters.
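Two of these techniques, the L2 penalty and early stopping, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: `l2_penalised_loss`, `EarlyStopping`, the `patience` and `min_delta` parameters, and the scripted validation losses are all hypothetical names and values, and the `weights` dictionary stands in for real model parameters.

```python
import copy


def l2_penalised_loss(data_loss, weights, lam):
    """Data loss plus an L2 penalty: lam times the sum of squared weights."""
    return data_loss + lam * sum(w * w for w in weights)


class EarlyStopping:
    """Track the best validation loss and signal when to stop training."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.best_weights = None
        self._epoch = -1

    def step(self, val_loss, weights):
        """Record one epoch; return True when training should stop."""
        self._epoch += 1
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = self._epoch
            # Keep a copy so later updates cannot overwrite the best weights.
            self.best_weights = copy.deepcopy(weights)
            return False
        return self._epoch - self.best_epoch >= self.patience


# Scripted validation losses standing in for a real training run.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.65, 0.70, 0.80]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    weights = {"w": epoch}  # placeholder for the parameters after this epoch
    if stopper.step(val_loss, weights):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "restoring weights from epoch", stopper.best_epoch)
print("penalised loss:", l2_penalised_loss(0.5, [3.0, -4.0], lam=0.01))
```

In this scripted run the validation loss is lowest at epoch 3, so training halts three epochs later and the weights saved at epoch 3 are the ones to restore. The penalty strength `lam` is the hyperparameter that trades fit against weight size.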
Real-world examples
Three illustrations of how overfitting shows up in practice.
- A medical AI trained on 500 hospital patients from one institution achieves 96% accuracy on that institution’s test data but 71% accuracy when deployed at a different hospital — the model memorised the specific patterns of one hospital’s equipment and patient population.
- A fraud detection model trained on 2019-2020 data memorised specific fraud patterns from that period — when fraud tactics changed in 2021, the model’s accuracy dropped dramatically because the memorised patterns no longer applied.
- A classic illustration: fit a polynomial of degree 9 to 10 data points. It passes through every point exactly, so training error is zero. Evaluated on new points from the same underlying function, it oscillates wildly between the training points. This is overfitting in its purest form.
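The polynomial illustration is easy to reproduce. The sketch below assumes NumPy is available and makes several choices the text does not specify: a sine curve as the underlying function, Gaussian noise with standard deviation 0.2 on the training targets, and a fixed random seed. It fits a degree-9 polynomial to 10 points and compares the error on those points with the error against the true function on a dense grid.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_fn(x):
    """Assumed underlying function; any smooth curve would do."""
    return np.sin(2 * np.pi * x)


# 10 noisy training points on [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Degree 9 has 10 coefficients: exactly enough to pass through all 10 points.
poly = Polynomial.fit(x_train, y_train, deg=9)

# New points from the same underlying function, without noise.
x_test = np.linspace(0.0, 1.0, 200)

train_mse = np.mean((poly(x_train) - y_train) ** 2)
test_mse = np.mean((poly(x_test) - true_fn(x_test)) ** 2)

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.2e}")
```

The training error is zero up to floating-point rounding because the polynomial interpolates the noisy points, noise included. The test error is many orders of magnitude larger, which is the train/validation gap described earlier in numerical form.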
Common pitfalls
- Using test data to detect overfitting: if you tune the model based on test set performance, the test set is no longer a true measure of generalisation. Always use a separate validation set during development and keep the test set completely untouched until the final evaluation.
- Mis-tuned regularisation strength: too much regularisation causes underfitting; too little allows overfitting. The regularisation strength is a hyperparameter to tune on the validation set.
- Assuming more data always fixes overfitting: additional data that is near-duplicate or drawn from the same narrow source adds little new information. If the model will be deployed on a different population, as in the single-hospital example above, only data that covers those conditions helps.
- Leaving dropout on at inference: dropout is active during training but must be disabled at inference. Leaving it active at serving time degrades predictions and makes outputs non-deterministic.
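The dropout pitfall comes down to a single flag. Here is a minimal NumPy sketch of the common "inverted dropout" formulation, in which surviving activations are scaled up during training so that nothing needs rescaling at inference. The function name `dropout` and its `training` argument are hypothetical; deep learning frameworks expose the same switch through their own train and evaluation modes.

```python
import numpy as np


def dropout(activations, rate, training, rng=None):
    """Inverted dropout: zero a fraction `rate` of units, but only in training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Inference: pass activations through unchanged and deterministically.
        return activations
    if rng is None:
        rng = np.random.default_rng()
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Scale survivors so the expected activation matches inference.
    return activations * mask / keep_prob


h = np.ones((4, 1000))  # stand-in for a layer's activations

train_out = dropout(h, rate=0.5, training=True, rng=np.random.default_rng(0))
infer_out = dropout(h, rate=0.5, training=False)

print("training mean:", train_out.mean())
print("inference output unchanged:", np.array_equal(infer_out, h))
```

With `training=True` roughly half the units are zeroed and the rest are doubled, so the output changes from call to call unless the random generator is seeded. With `training=False` the input is returned as is, which is the behaviour a serving path needs.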
Frequently asked questions
Q: What is overfitting in simple terms?
A: Memorising instead of learning: perfect on training data, poor on new data. It is like a student who memorises past exam answers and fails when new questions appear.
Q: How do you detect overfitting?
A: Plot training loss and validation loss. Overfitting looks like training loss decreasing while validation loss stops improving and starts rising, so the two curves diverge.
Q: How do you prevent overfitting?
A: More diverse data, regularisation (L1/L2), dropout, early stopping, data augmentation, and simpler model architecture.
Q: What is the difference between overfitting and underfitting?
A: An overfitted model is too complex and memorises the training data, giving high training accuracy but low validation accuracy. An underfitted model is too simple and performs poorly on both training and validation data. Fix overfitting with regularisation; fix underfitting with more model capacity.