⚡ A loss function measures how wrong a model’s prediction is compared to the correct answer. Training minimises this score by adjusting weights — the model literally optimises whatever you measure. Choose the wrong loss function and you get a technically correct but practically useless model. MSE for regression. Cross-entropy for classification. Choosing right is half the battle.
Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read
Loss Function — What It Is and Why It Determines What Your AI Model Actually Optimises For
What is a loss function?
Every model training process needs to answer one question: how do I know when the model is getting better? The loss function is the answer — a single number that measures the current quality of the model’s predictions. Low loss means the model is predicting accurately. High loss means it is predicting poorly. Gradient descent adjusts the model’s weights to push that number down.
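A minimal sketch of that loop. It assumes a one-weight linear model y ≈ w·x, made-up data, and a hand-picked learning rate, so none of it comes from a particular library. Each step computes the MSE gradient and moves w against it.

```python
# A minimal sketch: fit y ≈ w * x by gradient descent on mean squared error.
# The data, starting weight, and learning rate are illustrative assumptions.

def mse_loss(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """Derivative of mse_loss with respect to w: mean of 2 * (w * x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, w=0.0, lr=0.05, steps=100):
    """Repeatedly step w against the gradient; return final w and loss history."""
    history = []
    for _ in range(steps):
        history.append(mse_loss(w, xs, ys))
        w -= lr * mse_grad(w, xs, ys)    # move downhill to lower the loss
    return w, history

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                # generated by a "true" weight of 2
w, history = train(xs, ys)
print(f"learned w = {w:.4f}, first loss = {history[0]:.2f}, last loss = {history[-1]:.6f}")
# learned w = 2.0000, first loss = 30.00, last loss = 0.000000
```

Deep learning frameworks compute the same kind of gradient automatically across millions of weights. The principle does not change: the weights move in whichever direction lowers the loss.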
The crucial insight: the model optimises exactly what the loss function measures — nothing more, nothing less. If your loss function measures average squared error on house prices, the model will minimise average squared error. Whether it makes terrible predictions on luxury properties because they are rare in the training set — that is not in the loss. The model does not care about what you did not measure.
This is why loss function selection is a fundamental design decision, not an afterthought.
COMMON LOSS FUNCTIONS
Mean Squared Error (MSE) — for regression. Computes the average of squared differences between predictions and actual values. Penalises large errors heavily (because they are squared). Standard for predicting continuous values: prices, temperatures, sales volumes.
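A worked example of MSE = (1/n) Σ (ŷᵢ − yᵢ)² on made-up house prices in thousands. A single 40k miss contributes 1,600 of the 1,900 total squared error, which is the heavy penalty on large errors described above.

```python
def mse(predictions, targets):
    """Mean squared error: the average of (prediction - target) ** 2."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Made-up house prices in thousands.
actual    = [300, 250, 400, 500]
predicted = [310, 240, 390, 460]   # errors: +10, -10, -10, -40

print(mse(predicted, actual))      # (100 + 100 + 100 + 1600) / 4 = 475.0
```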
Mean Absolute Error (MAE) — for regression. Average of absolute differences. Less sensitive to outliers than MSE because errors are not squared. Better when outliers exist and should not dominate the loss.
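The following sketch shows the difference in outlier sensitivity. On a small made-up dataset with one outlier, it asks each loss for the single constant prediction that minimises it. MSE is minimised by the mean, which the outlier drags upward. MAE is minimised by the median, which ignores how far away the outlier sits. This is also a small instance of "the model optimises exactly what you measure".

```python
import statistics

def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def mae(predictions, targets):
    """Mean absolute error: the average of |prediction - target|."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

targets = [10, 11, 12, 13, 100]   # made-up data with one outlier (100)

def best_constant(loss, candidates=range(0, 101)):
    """The single constant prediction c that minimises loss([c] * n, targets)."""
    return min(candidates, key=lambda c: loss([c] * len(targets), targets))

print("MSE picks", best_constant(mse), "| mean =", statistics.mean(targets))
print("MAE picks", best_constant(mae), "| median =", statistics.median(targets))
```

MSE settles on 29, far from every typical value, because squaring makes the outlier's error dominate. MAE settles on 12.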
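Cross-Entropy Loss (Log Loss) — for classification. Measures the difference between the model’s predicted probability distribution and the true label distribution. With a single correct label, this reduces to the negative log of the probability the model assigned to that class. Severely penalises confident wrong predictions. The default choice for most classification tasks.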
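Here is a plain-Python sketch of cross-entropy for one example with a one-hot label, written for readability. Frameworks ship fused, more numerically careful versions. The `softmax` helper turns raw model scores (logits) into probabilities. The three probability vectors are made up to show how the loss grows as the model becomes confidently wrong.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """With a one-hot label, cross-entropy is -log(probability of the correct class)."""
    return -math.log(probs[true_class])

true_class = 0
examples = {
    "confident and right": [0.90, 0.05, 0.05],
    "unsure":              [0.40, 0.30, 0.30],
    "confident and wrong": [0.01, 0.98, 0.01],
}
for name, probs in examples.items():
    print(f"{name}: {cross_entropy(probs, true_class):.3f}")
# confident and right: 0.105
# unsure: 0.916
# confident and wrong: 4.605

# Starting from logits, as a model would output them:
print(f"from logits: {cross_entropy(softmax([2.0, 1.0, 0.1]), true_class):.3f}")
```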
Binary Cross-Entropy — cross-entropy for binary classification (spam/not spam, fraud/not fraud).
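A sketch of binary cross-entropy, −[y·log(p) + (1 − y)·log(1 − p)], averaged over a small batch. It is the two-class case of cross-entropy, written in terms of a single probability p for the positive class. The labels (1 = spam, 0 = not spam) and probabilities are made up.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """-[y * log(p) + (1 - y) * log(1 - p)] for one example.

    y is the true label (1 or 0); p is the predicted probability that y == 1.
    """
    p = min(max(p, eps), 1 - eps)   # clip so log(0) can never occur
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mean_bce(probs, labels):
    """Average binary cross-entropy over a batch."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(labels)

labels = [1, 0, 1, 0]               # 1 = spam, 0 = not spam (made-up)
probs  = [0.9, 0.1, 0.8, 0.3]       # model's predicted P(spam)
print(round(mean_bce(probs, labels), 4))   # 0.1976
```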
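Focal Loss — modified cross-entropy that down-weights easy, already well-classified examples and focuses training on hard, misclassified ones. Designed for extreme class imbalance. It was introduced for dense object detectors (RetinaNet), where background locations vastly outnumber objects, and is widely used in object detection.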
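A sketch of the binary focal loss in the form FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t), where p_t is the probability the model gave to the true class. The defaults γ = 2 and α = 0.25 follow the values reported in the RetinaNet paper (Lin et al., 2017). The two examples are made up. Setting γ = 0 recovers α-weighted cross-entropy, so the printed ratio isolates the (1 − p_t)^γ down-weighting factor.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class and y is 1 or 0.
    p_t is the probability the model gave to the true class.
    """
    p = min(max(p, eps), 1 - eps)          # clip to avoid log(0)
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

easy = {"p": 0.05, "y": 0}   # background the model already rejects confidently
hard = {"p": 0.20, "y": 1}   # real object the model is mostly missing

for name, ex in [("easy", easy), ("hard", hard)]:
    fl = focal_loss(**ex)
    weighted_ce = focal_loss(**ex, gamma=0.0)   # gamma = 0 is alpha-weighted cross-entropy
    print(f"{name}: focal keeps {fl / weighted_ce:.2%} of the cross-entropy loss")
```

The easy example keeps (1 − 0.95)² = 0.25% of its loss and the hard one keeps (1 − 0.2)² = 64%. The thousands of easy background examples therefore stop drowning out the few hard ones.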
Contrastive / Triplet Loss — for learning embeddings. Pulls similar examples together in embedding space and pushes dissimilar ones apart. Used in face recognition, image similarity, and recommendation systems.
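A sketch of the triplet loss, max(0, d(anchor, positive) − d(anchor, negative) + margin), on made-up 2-D embeddings. The plain Euclidean distance and the margin of 1.0 are illustrative choices. Implementations differ; FaceNet, for instance, used squared distances.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor         = [0.0, 0.0]
positive       = [0.0, 1.0]   # same identity, distance 1.0
close_negative = [1.5, 0.0]   # different identity, only distance 1.5
far_negative   = [3.0, 0.0]   # different identity, distance 3.0

print(triplet_loss(anchor, positive, close_negative))   # 1.0 - 1.5 + 1.0 = 0.5
print(triplet_loss(anchor, positive, far_negative))     # 1.0 - 3.0 + 1.0 < 0, so 0.0
```

Only the first triplet produces a loss, and minimising it pulls the positive closer and pushes the negative away. The second triplet already satisfies the margin and contributes nothing, so training concentrates on pairs that are still confused.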
Real-world examples
- Airbnb pricing model uses a custom loss function that penalises under-pricing more than over-pricing — reflecting the real business cost asymmetry. A standard MSE treats both errors equally, which does not reflect Airbnb’s actual objective.
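- YOLO object detectors train on a composite loss that sums a localisation term on bounding-box coordinates, a confidence term on the objectness score, and a classification term on class predictions: three separate objectives combined into one training signal. In YOLOv3, for example, the box term is a squared error and the objectness and class terms are binary cross-entropy. Later versions replace the squared error on boxes with IoU-based losses.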
- Language model pretraining uses cross-entropy loss over the vocabulary: at each position, the loss measures how surprised the model was by the actual next token. Minimising it trains the model to predict text accurately.
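The Airbnb example above depends on an asymmetric loss. Airbnb's actual formulation is not given here, so the sketch below uses a hypothetical `asymmetric_squared_error` with an illustrative 3× weight on under-pricing to show the mechanism. On made-up nightly prices, the best single price under plain MSE is the mean. The asymmetric loss shifts it upward to avoid the costlier error.

```python
# Hypothetical asymmetric loss for illustration; not Airbnb's real formulation.
def asymmetric_squared_error(predictions, targets, under_weight=3.0, over_weight=1.0):
    """Squared error, weighted more heavily when the prediction is below the target."""
    total = 0.0
    for p, t in zip(predictions, targets):
        err = p - t
        weight = under_weight if err < 0 else over_weight   # err < 0 means under-priced
        total += weight * err ** 2
    return total / len(targets)

def mse(predictions, targets):
    """Plain mean squared error: both error directions cost the same."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

fair_prices = [100, 120, 140]   # made-up nightly prices

def best_price(loss, candidates=range(100, 141)):
    """Single price that minimises the given loss across all nights."""
    return min(candidates, key=lambda c: loss([c] * len(fair_prices), fair_prices))

print("MSE picks", best_price(mse))                                   # 120, the mean
print("asymmetric loss picks", best_price(asymmetric_squared_error))  # 128
```

Changing the loss changes what "best" means. Nothing else about the model has to change for it to start leaning towards over-pricing.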
Common pitfalls
- Proxy metrics vs real objectives — your loss function is a proxy for what you care about. Minimising average price prediction error is a proxy for “make good property valuations.” Always evaluate whether the model with the lowest loss also performs best on your actual business metric.
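- Gradient vanishing with the wrong loss — pairing MSE with a sigmoid or softmax output for classification gives near-zero gradients whenever the output saturates near 0 or 1, even when that prediction is confidently wrong, so training barely updates. Cross-entropy avoids this. For a sigmoid output, its gradient with respect to the logit is simply p − y, which stays large precisely when the model is badly wrong.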
- Loss is not accuracy — a model with lower loss does not always have higher accuracy. Loss measures probability quality; accuracy measures binary correctness. Both matter; neither alone tells the full story.
- Imbalanced classes — standard cross-entropy on imbalanced data lets the model ignore the minority class and still achieve low loss by predicting the majority class. Use class-weighted loss or focal loss.
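Two of these pitfalls can be checked numerically. The plain-Python sketch below uses made-up numbers. First, it compares the gradient that MSE and binary cross-entropy send back through a sigmoid when the model is confidently wrong. Second, it shows how a positive-class weight changes the best constant prediction on a 1-in-100 imbalanced dataset. The name `pos_weight` is used illustratively here; PyTorch's `BCEWithLogitsLoss` exposes an argument of the same name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Pitfall: MSE vs cross-entropy gradients through a sigmoid output.
def grad_mse_wrt_logit(z, y):
    """d/dz of (sigmoid(z) - y)**2 = 2 * (p - y) * p * (1 - p)."""
    p = sigmoid(z)
    return 2 * (p - y) * p * (1 - p)

def grad_bce_wrt_logit(z, y):
    """d/dz of binary cross-entropy on sigmoid(z), which simplifies to p - y."""
    return sigmoid(z) - y

# Pitfall: imbalanced classes. pos_weight scales the loss on positive examples.
def weighted_bce(p, y, pos_weight=1.0):
    p = min(max(p, 1e-12), 1 - 1e-12)   # clip to avoid log(0)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))

def best_constant_prob(labels, pos_weight):
    """Grid-search the single probability that minimises mean weighted BCE."""
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda p: sum(weighted_bce(p, y, pos_weight) for y in labels) / len(labels))

z, y = -8.0, 1   # true label is 1, but sigmoid(-8) ≈ 0.0003: confidently wrong
print(f"MSE gradient wrt logit: {grad_mse_wrt_logit(z, y):.6f}")
print(f"BCE gradient wrt logit: {grad_bce_wrt_logit(z, y):.6f}")

labels = [0] * 99 + [1]   # made-up data: 1 positive in 100
print("unweighted best constant:", best_constant_prob(labels, pos_weight=1.0))
print("pos_weight=99 best constant:", best_constant_prob(labels, pos_weight=99.0))
```

The MSE gradient is roughly a thousand times smaller than the cross-entropy gradient, so the confidently wrong prediction barely moves. Without weighting, the loss is minimised by predicting the 1% base rate for everything, which is exactly the "ignore the minority class" failure. With positives weighted 99×, both classes count equally and the best constant moves to 0.5.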
Frequently asked questions
QUESTION 1 What is a loss function in simple terms?
ANSWER 1 A score measuring how wrong the model is. For common losses such as MSE and cross-entropy, a perfect prediction scores 0, and the more wrong the model is, the higher the score. Training adjusts weights to push this number down.
QUESTION 2 What is the difference between MSE and cross-entropy?
ANSWER 2 MSE: average squared difference, for regression (predicting numbers). Cross-entropy: probability distribution difference, for classification (predicting categories).
QUESTION 3 What is the danger of optimising the wrong loss?
ANSWER 3 The model gets very good at minimising your loss — even if that loss does not capture what you actually care about. Always check whether low loss aligns with your real objective.
QUESTION 4 What is focal loss?
ANSWER 4 Modified cross-entropy that focuses training on hard examples and down-weights easy ones. It was designed for class imbalance and is widely used in object detection.