⚡ An epoch is one complete pass through the entire training dataset. Models typically train for many epochs, seeing all the data repeatedly and adjusting weights each time, until performance converges. Too few epochs and the model underfits. Too many and it overfits, memorising training examples instead of learning general patterns.
Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read
What is an epoch?
Learning something once is rarely enough. A student reading a textbook chapter once retains less than one who reads it three times, practises questions, and reviews it again a week later. Machine learning works the same way. A model that sees each training example only once rarely learns the underlying patterns well enough to generalise.
An epoch is one complete pass through the entire training dataset: every example is seen once, and the weights are updated along the way based on what was seen. After one epoch, the model starts over and sees all the data again. This repetition, with the same data revisited in a different order and from an improved set of weights each time, is what lets training converge.
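To make the idea of repeated passes concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter model (y = w * x) and a hypothetical four-example dataset, neither of which comes from the article. The outer loop is the epoch loop; the inner loop visits every example once.

```python
def train(data, epochs, lr=0.01):
    """Fit y ~ w * x by per-example gradient descent; return (w, loss history)."""
    w = 0.0
    history = []
    for epoch in range(epochs):
        for x, y in data:                 # one epoch = every example seen once
            error = w * x - y
            w -= lr * 2 * error * x       # gradient of squared error w.r.t. w
        # mean squared error over the full dataset after this epoch
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        history.append(loss)
    return w, history


# Toy dataset (an assumption for illustration); the true weight is 2.0
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
w, history = train(data, epochs=20)
print(f"learned w = {w:.4f}")
print(f"loss after epoch 1 = {history[0]:.4f}, after epoch 20 = {history[-1]:.2e}")
```

In this toy setting the loss shrinks with every additional epoch; real models behave less tidily, which is why the loss curves discussed later matter.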
HOW EPOCH RELATES TO BATCHES AND ITERATIONS
Training data is rarely processed one example at a time or all at once. It is split into batches — small subsets processed together before each weight update.
If you have 10,000 training examples and use a batch size of 100:
- One iteration = processing 1 batch (100 examples) → 1 weight update
- One epoch = 100 iterations (all 10,000 examples seen once)
- 50 epochs = 5,000 total weight updates
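The arithmetic above can be sketched as a small helper. The function name `training_counts` is hypothetical, and rounding up assumes the final partial batch is still processed (some training setups drop it instead).

```python
import math


def training_counts(num_examples, batch_size, epochs):
    """Return (iterations per epoch, total weight updates)."""
    # Round up so a final, smaller batch still counts as one iteration
    iterations_per_epoch = math.ceil(num_examples / batch_size)
    total_updates = iterations_per_epoch * epochs
    return iterations_per_epoch, total_updates


print(training_counts(10_000, 100, 50))  # (100, 5000)
```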
Batch size is a hyperparameter. Smaller batches mean noisier but more frequent updates, which often generalise better. Larger batches usually get through an epoch faster on parallel hardware but can converge to sharper, less generalisable minima.
HOW MANY EPOCHS
- Start training and log both training loss and validation loss after every epoch.
- Training loss should decrease steadily — the model is learning.
- Validation loss should also decrease initially — the model is generalising.
- Watch for the point where validation loss stops decreasing or starts rising while training loss keeps falling — this is the onset of overfitting.
- The optimal number of epochs is just before this point.
- Use early stopping to automate this: halt training when validation loss has not improved for N consecutive epochs, restore the best weights.
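The early stopping rule in the last step can be sketched as follows. This is a simplified illustration that works on a pre-recorded list of validation losses (the numbers are made up), not a full training loop; in practice the comparison would run after each epoch, with a checkpoint saved whenever the loss improves.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses.

    Epochs are 1-indexed. stop_epoch is the epoch at which training halts,
    or the last epoch if patience is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch            # real loop: save a checkpoint here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # real loop: restore the checkpoint
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improving until epoch 4, then drifting upward
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
print(early_stopping_epoch(val_losses, patience=3))  # (4, 7)
```

Here training halts at epoch 7 and the weights from epoch 4 would be restored. The patience value N trades wasted compute against the risk of stopping on a temporary plateau.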
Real-world examples
Not theory — what real teams actually shipped using this technique.
- ResNet-50 trained on ImageNet typically uses 90 epochs with a learning rate schedule that reduces the learning rate at epochs 30 and 60, allowing fine-grained convergence after initial rapid learning.
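An epoch-based schedule like the one in the ResNet-50 example can be sketched as a step function. The starting rate of 0.1 and the tenfold reduction per step are assumptions chosen for illustration; the article only states that the rate drops at epochs 30 and 60.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    base_lr and gamma are illustrative assumptions, not values from the article.
    """
    # Count how many milestones this epoch has reached or passed
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.4f}")
```

Under these assumed values the rate stays at 0.1 for the first 30 epochs, then drops to roughly 0.01 and finally to roughly 0.001.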
- GPT-style language models often train for about one epoch or less on their massive datasets. The corpus is so large that even a single pass through all the data can take weeks and already provides sufficient signal.
- Fine-tuning a pretrained BERT model on a small classification task typically requires only 3–5 epochs — the model already knows language; it just needs a few passes to specialise for the new task.
Common pitfalls
- Training too long — the most common mistake. Validation loss rising while training loss falls is overfitting. Always monitor both curves.
- Training too short: especially with small learning rates, models need many epochs to converge. A training loss that looks flat after 5 epochs may just mean the model needs 50.
- Shuffling matters — shuffling the training data before each epoch prevents the model from learning order-specific patterns that do not generalise. Always shuffle between epochs.
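Per-epoch shuffling can be sketched in plain Python as below. The helper name `epoch_batches` and the ten-item dataset are hypothetical; most training frameworks provide an equivalent option in their data loaders.

```python
import random


def epoch_batches(examples, batch_size, rng):
    """Yield mini-batches covering every example exactly once, in a fresh random order."""
    order = list(examples)                # copy so the caller's data is untouched
    rng.shuffle(order)                    # reshuffle at the start of each epoch
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


rng = random.Random(0)                    # fixed seed only to keep the sketch reproducible
examples = list(range(10))                # stand-in for a real training set
for epoch in range(3):
    batches = list(epoch_batches(examples, batch_size=4, rng=rng))
    print(f"epoch {epoch + 1}: {batches}")
```

Each epoch still covers all ten examples exactly once, but the batch composition changes from one epoch to the next.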
- Epoch count as a proxy for training time — epochs are dataset-relative. 10 epochs on 100 examples is trivial. 10 epochs on 1 billion examples is weeks of compute. Use wall-clock time and loss curves, not just epoch count, to evaluate training progress.
Frequently asked questions
QUESTION 1 What is an epoch in machine learning?
ANSWER 1 One full pass through the entire training dataset. The model sees each example once per epoch, adjusting weights after every batch. Multiple epochs are needed for convergence.
QUESTION 2 What is the difference between epoch, batch, and iteration?
ANSWER 2 Batch: a subset of the data processed before one weight update. Iteration: one weight update (one batch). Epoch: one full pass through all the data. With 1,000 examples and a batch size of 100, one epoch is 10 iterations.
QUESTION 3 How many epochs should you train for?
ANSWER 3 Monitor validation loss and stop when it stops improving. Early stopping automates this: halt training when validation loss has not improved for N consecutive epochs.
QUESTION 4 What is early stopping?
ANSWER 4 Automatically halting training when validation loss stops improving, then restoring the weights from the best epoch. Prevents overfitting without specifying epochs in advance.