An epoch is one complete pass through the entire training dataset. Models usually train for many epochs, seeing all the data repeatedly and adjusting weights along the way, until performance converges. With too few epochs the model underfits. With too many it overfits, memorising training examples instead of learning general patterns.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read


What is EPOCH?

Learning something once is rarely enough. A student reading a textbook chapter once retains less than one who reads it three times, practises questions, and reviews it again a week later. Machine learning works the same way. A model that sees each training example only once rarely learns the underlying patterns well enough to generalise.

An epoch is one complete pass through the entire training dataset: every example is seen once, and the weights are updated along the way based on what was seen. After one epoch, the model starts over and sees all the data again. This repetition, usually with the examples in a different order each time, is what lets training converge.

HOW EPOCH RELATES TO BATCHES AND ITERATIONS

Training data is rarely processed one example at a time or all at once. It is split into batches — small subsets processed together before each weight update.

If you have 10,000 training examples and use a batch size of 100:

  • One iteration = processing 1 batch (100 examples) → 1 weight update
  • One epoch = 100 iterations (all 10,000 examples seen once)
  • 50 epochs = 5,000 total weight updates

Batch size is a hyperparameter. Smaller batches give noisier but more frequent updates, which often helps generalisation. Larger batches make better use of parallel hardware, so each epoch finishes faster, but they can converge to sharper minima that generalise less well.
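The arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch: the function names and the drop_last option (whether a final, smaller batch is discarded) are illustrative assumptions, not part of any particular library.

```python
import math


def iterations_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of weight updates needed to see every example once."""
    if drop_last:
        # A final partial batch is discarded.
        return num_examples // batch_size
    # A final partial batch still counts as one iteration.
    return math.ceil(num_examples / batch_size)


def total_updates(num_examples, batch_size, epochs, drop_last=False):
    """Total weight updates over a whole training run."""
    return epochs * iterations_per_epoch(num_examples, batch_size, drop_last)


print(iterations_per_epoch(10_000, 100))  # 100 iterations per epoch
print(total_updates(10_000, 100, 50))     # 5000 weight updates in 50 epochs
```

When the dataset size is not a multiple of the batch size, the last batch of each epoch is smaller than the rest, which is why the sketch rounds up by default.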

HOW MANY EPOCHS

  1. Start training and log both training loss and validation loss after every epoch.
  2. Training loss should decrease steadily — the model is learning.
  3. Validation loss should also decrease initially — the model is generalising.
  4. Watch for the point where validation loss stops decreasing or starts rising while training loss keeps falling — this is the onset of overfitting.
  5. The optimal number of epochs is just before this point.
  6. Use early stopping to automate this: halt training when validation loss has not improved for N consecutive epochs (N is often called the patience), then restore the best weights.
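The steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than a reference implementation: the EarlyStopping class, its min_delta parameter, the validation losses, and the dictionary standing in for model weights are all made up for the example.

```python
import copy


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience          # the "N" from step 6
        self.min_delta = min_delta        # smallest change that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_weights = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss, weights):
        """Record one epoch's result; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            # Snapshot the weights so they can be restored later.
            self.best_weights = copy.deepcopy(weights)
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical validation losses, one per epoch (not real results).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]

stopper = EarlyStopping(patience=3)
weights = {"w": 0.0}
stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    weights["w"] = float(epoch)  # stand-in for one epoch of real training
    if stopper.step(epoch, val_loss, weights):
        stopped_at = epoch
        break

weights = stopper.best_weights  # restore the best weights
print(f"stopped at epoch {stopped_at}, best epoch was {stopper.best_epoch}")
```

With these made-up losses, the best validation loss occurs at epoch 4, and training halts three epochs later because nothing has beaten it. A larger patience tolerates noisier validation curves at the cost of extra compute.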

Real-world examples

Not theory — what real teams actually shipped using this technique.

  • ResNet-50 trained on ImageNet typically uses 90 epochs with a step learning rate schedule: the learning rate is reduced at epochs 30 and 60 so the model can make fine-grained progress after the initial rapid learning.
  • GPT-style language models are often trained for roughly one epoch or less on their massive datasets. The dataset is so large that even a single pass through all the data takes weeks and provides sufficient signal.
  • Fine-tuning a pretrained BERT model on a small classification task typically requires only 3–5 epochs — the model already knows language; it just needs a few passes to specialise for the new task.
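An epoch-based step schedule like the one in the ResNet-50 example can be sketched as a small function. The milestones (30 and 60) come from the example above; the base learning rate of 0.1, the reduction factor of 0.1, and the 0-indexed epoch counter are assumptions chosen for illustration.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    """Learning rate for a 0-indexed epoch under a step schedule.

    The rate is multiplied by `gamma` once for every milestone
    that the current epoch has reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Print the rate just before and just after each milestone.
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.4f}")
```

Tying the schedule to the epoch count rather than to the iteration count keeps it meaningful when the batch size changes.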

Common pitfalls

  • Training too long — the most common mistake. Validation loss rising while training loss falls is overfitting. Always monitor both curves.
  • Training too short — especially with small learning rates, models need many epochs to converge. A flat training loss after 5 epochs may just mean it needs 50.
  • Shuffling matters — shuffling the training data before each epoch prevents the model from learning order-specific patterns that do not generalise. Always shuffle between epochs.
  • Epoch count as a proxy for training time — epochs are dataset-relative. 10 epochs on 100 examples is trivial. 10 epochs on 1 billion examples is weeks of compute. Use wall-clock time and loss curves, not just epoch count, to evaluate training progress.
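The shuffling advice above can be illustrated with a small batch generator. This is a minimal sketch: the epoch_batches name, the toy dataset of ten integers, and the fixed seed are assumptions made for the example.

```python
import random


def epoch_batches(examples, batch_size, rng):
    """Yield mini-batches that cover every example exactly once,
    in a fresh random order on each call."""
    order = list(range(len(examples)))
    rng.shuffle(order)  # new order for this epoch
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


examples = list(range(10))   # toy dataset
rng = random.Random(0)       # fixed seed so runs are repeatable

all_epochs = []
for epoch in range(1, 4):
    batches = list(epoch_batches(examples, batch_size=4, rng=rng))
    all_epochs.append(batches)
    print(f"epoch {epoch}: {batches}")
```

Each epoch still contains every example exactly once. Only the order, and therefore the composition of each batch, changes from one epoch to the next.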

Frequently asked questions

QUESTION 1 What is an epoch in machine learning?

ANSWER 1 One full pass through the entire training dataset. The model sees each example once per epoch, adjusting weights after every batch. Multiple epochs are usually needed for convergence.

QUESTION 2 What is the difference between epoch, batch, and iteration?

ANSWER 2 Batch: a subset of the data processed before one weight update. Iteration: one weight update (one batch). Epoch: one full pass through all the data. For example, 1,000 examples with a batch size of 100 gives 10 iterations per epoch.

QUESTION 3 How many epochs should you train for?

ANSWER 3 Monitor validation loss and stop when it stops improving. Use early stopping to automate this: halt training when validation loss has not improved for N consecutive epochs.

QUESTION 4 What is early stopping?

ANSWER 4 Automatically halting training when validation loss stops improving, then restoring the weights from the best epoch. It limits overfitting without having to fix the number of epochs in advance.

