Focal loss is a modification of cross-entropy that down-weights easy examples and focuses training on hard, misclassified examples. It was designed for object detection with extreme class imbalance, where background regions vastly outnumber foreground objects. With plain cross-entropy, the model wastes training effort on easy negatives; focal loss redirects that effort toward the hard cases that matter.
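The down-weighting can be sketched in a few lines. This is a minimal binary version under stated assumptions: the function names are illustrative, gamma = 2.0 is just a commonly used setting, and the optional class-balancing weight that many implementations add is left out.

```python
import math

def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p  # probability the model gave to the true class
    return -math.log(p_t)

def binary_focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Cross-entropy scaled by (1 - p_t) ** gamma, so well-classified examples count less."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Hypothetical examples: an easy background region and a badly missed object.
examples = {
    "easy negative": (0.05, 0),  # true class gets probability 0.95
    "hard positive": (0.10, 1),  # true class gets probability 0.10
}

for name, (p, y) in examples.items():
    ce = binary_cross_entropy(p, y)
    fl = binary_focal_loss(p, y)
    # The ratio fl / ce equals (1 - p_t) ** gamma: 0.0025 for the easy case, 0.81 for the hard one.
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}  kept={fl / ce:.4f}")
```

With gamma set to 0 the scaling factor is 1 and the focal loss reduces to ordinary cross-entropy.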

Loss Function – UseCaseinAI

Q: What is a loss function in simple terms?

A loss function is a score that measures how wrong the model is. Perfect prediction = loss of 0. The more wrong the prediction, the higher the loss. During training, gradient descent adjusts the model's weights to reduce this score. The model literally optimises to minimise the loss — which is why choosing the right loss function is so important.

Q: What is the difference between MSE and cross-entropy loss?

Mean Squared Error (MSE) measures the average squared difference between predicted and actual values — used for regression problems where the output is a continuous number (house price, temperature). Cross-entropy loss measures how well the model's predicted probabilities match the true distribution — used for classification problems where the output is a category. Using the wrong one (MSE for classification) leads to poorly calibrated models.
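To make the contrast concrete, here is a small sketch that computes both losses by hand. The prices and probabilities are made-up illustration values, and the helper names are not from any particular library.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences, for continuous targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_class, probs):
    """Negative log of the probability the model assigned to the correct class."""
    return -math.log(probs[true_class])

# Regression: predicted vs actual house prices (in thousands, hypothetical).
actual = [300.0, 450.0, 500.0]
predicted = [310.0, 430.0, 500.0]
print(f"MSE: {mean_squared_error(actual, predicted):.2f}")  # (100 + 400 + 0) / 3

# Classification: three classes, the true class is index 1.
confident_right = [0.05, 0.90, 0.05]
confident_wrong = [0.90, 0.05, 0.05]
print(f"cross-entropy, confident and right: {cross_entropy(1, confident_right):.3f}")
print(f"cross-entropy, confident and wrong: {cross_entropy(1, confident_wrong):.3f}")
```

The confidently wrong prediction costs far more than the confidently right one, which is the behaviour you want from a classification loss.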

Q: What is the danger of optimising the wrong loss?

The model will become very good at minimising the loss you specified, even if that loss does not capture what you actually care about. A model trained to minimise average error on house prices might be excellent on average but terrible at the extremes, because a handful of rare cases carries little weight in an average over thousands of examples. A model trained to maximise accuracy might ignore rare but critical cases. Always sanity-check whether your loss aligns with your real objective.
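The accuracy trap is easy to reproduce. The sketch below assumes a hypothetical fraud dataset where 1% of cases are positive and a "model" that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)

# Hypothetical data: 10 fraud cases (label 1) among 1000 transactions.
labels = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # never flags anything

print(accuracy(labels, always_negative))  # 0.99
print(recall(labels, always_negative))    # 0.0
```

The score being optimised looks excellent while every critical case is missed.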

⚡ A loss function measures how wrong a model’s prediction is compared to the correct answer. Training minimises this score by adjusting weights — the model literally optimises whatever you measure. Choose the wrong loss function and you get a technically correct but practically useless model. MSE for regression. Cross-entropy for classification. Choosing right is half the battle.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read

Loss Function — What It Is and Why It Determines What Your AI Model Actually Optimises For

What is a Loss Function?

Every model training process needs to answer one question: how do I know when the model is getting better? The loss function is the answer — a single number that measures the current quality of the model’s predictions. Low loss means the model is predicting accurately. High loss means it is predicting poorly. Gradient descent adjusts the model’s weights to push that number down.
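Here is a minimal sketch of that loop, assuming a one-weight model y = w * x, an MSE loss, and made-up data generated with w = 2. The learning rate and step count are arbitrary choices for the illustration.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the "true" weight w = 2

def mse(w):
    """Loss: average squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.05
history = [mse(w)]

for step in range(20):
    w -= learning_rate * mse_gradient(w)  # step against the gradient
    history.append(mse(w))

print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.2e}, learned w: {w:.4f}")
```

Each update moves w a little closer to 2 and the loss shrinks at every step.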

The crucial insight: the model optimises exactly what the loss function measures, nothing more, nothing less. If your loss function measures average squared error on house prices, the model will minimise average squared error. If it makes terrible predictions on luxury properties because they are rare in the training set, the loss does not single that out: it only sees the average. The model does not care about what you did not measure.
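A small sketch shows how much an average can hide. It compares two hypothetical models by their per-property errors (in thousands); the numbers are invented so that the overall MSE comes out identical.

```python
def mse(errors):
    """Mean squared error computed from a list of prediction errors."""
    return sum(e ** 2 for e in errors) / len(errors)

# Model A: off by 10 on every one of 100 properties.
errors_a = [10.0] * 100

# Model B: exact on 99 ordinary properties, off by 100 on the single luxury one.
errors_b = [0.0] * 99 + [100.0]

print(mse(errors_a))  # 100.0
print(mse(errors_b))  # 100.0
print(max(abs(e) for e in errors_a), max(abs(e) for e in errors_b))  # 10.0 100.0
```

The loss cannot tell these two models apart. If the luxury segment matters to you, it has to be measured separately or built into the loss.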

This is why loss function selection is a fundamental design decision, not an afterthought.

COMMON LOSS FUNCTIONS

Mean Squared Error (MSE) — for regression. Computes the average of squared differences between predictions and actual values. Penalises large errors heavily (because they are squared). Standard for predicting continuous values: prices, temperatures, sales volumes.

Mean Absolute Error (MAE) — for regression. Average of absolute differences. Less sensitive to outliers than MSE because errors are not squared. Better when outliers exist and should not dominate the loss.

Cross-Entropy Loss (Log Loss) — for classification. Measures the difference between the model’s predicted probability distribution and the true label distribution. Severely penalises confident wrong predictions. The default choice for most classification tasks.

Binary Cross-Entropy — cross-entropy for binary classification (spam/not spam, fraud/not fraud).

Focal Loss — modified cross-entropy that down-weights easy examples and focuses on hard misclassified cases. Designed for class imbalance — standard in object detection.

Contrastive / Triplet Loss — for learning embeddings. Pulls similar examples together in embedding space and pushes dissimilar ones apart. Used in face recognition, image similarity, and recommendation systems.
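The "pull together, push apart" idea behind triplet loss can be sketched as follows. This uses one common form with plain Euclidean distance and a margin of 1.0; both are assumptions for the illustration, and some implementations use squared distances instead. The 2-D embeddings are made up.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # same identity, distance 1.0 from the anchor
near_negative = [0.0, 1.5]  # different identity, distance 1.5 (too close)
far_negative = [3.0, 4.0]   # different identity, distance 5.0 (far enough)

print(triplet_loss(anchor, positive, near_negative))  # 0.5
print(triplet_loss(anchor, positive, far_negative))   # 0.0
```

Only the first triplet produces a training signal; the second already satisfies the margin, so it contributes nothing.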

Real-world examples

Not theory — what real teams actually shipped using this technique.

Airbnb pricing model uses a custom loss function that penalises under-pricing more than over-pricing — reflecting the real business cost asymmetry. A standard MSE treats both errors equally, which does not reflect Airbnb’s actual objective.
YOLO object detectors use a composite loss combining classification loss (cross-entropy on class predictions), localisation loss (squared error on bounding box coordinates in early versions, IoU-based losses in many later ones), and confidence loss (binary cross-entropy on objectness score): three separate objectives combined into one training signal. The exact terms vary between YOLO versions.
Language model pretraining uses cross-entropy loss over the vocabulary: for each position, the loss measures how surprised the model was by the actual next token. Minimising this trains the model to predict text accurately.
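An asymmetric loss of the kind described in the pricing example could look roughly like this. It is a hypothetical sketch, not Airbnb's actual loss: the function name, the squared-error form, and the 3:1 weighting are all assumptions chosen for illustration.

```python
def asymmetric_squared_error(y_true, y_pred, under_weight=3.0, over_weight=1.0):
    """Squared error where predicting too low costs more than predicting too high."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = p - t
        # A negative error means the prediction is below the true price (under-pricing).
        weight = under_weight if error < 0 else over_weight
        total += weight * error ** 2
    return total / len(y_true)

true_price = [100.0]
print(asymmetric_squared_error(true_price, [90.0]))   # under by 10 -> 300.0
print(asymmetric_squared_error(true_price, [110.0]))  # over by 10  -> 100.0
```

Plain MSE would score both predictions at 100.0. The weights are where the business cost asymmetry gets encoded.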

Common pitfalls

Proxy metrics vs real objectives — your loss function is a proxy for what you care about. Minimising average price prediction error is a proxy for “make good property valuations.” Always evaluate whether the model with the lowest loss also performs best on your actual business metric.
Gradient vanishing with wrong loss — using MSE on a sigmoid or softmax output for classification produces near-zero gradients when predictions saturate near 0 or 1, even when they are confidently wrong, so training barely updates. Cross-entropy paired with those outputs keeps a useful gradient throughout the output range.
Loss is not accuracy — a model with lower loss does not always have higher accuracy. Loss measures probability quality; accuracy measures binary correctness. Both matter; neither alone tells the full story.
Imbalanced classes — standard cross-entropy on imbalanced data lets the model ignore the minority class and still achieve low loss by predicting the majority class. Use class-weighted loss or focal loss.
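The vanishing-gradient pitfall above can be checked numerically. The sketch assumes a single sigmoid output and compares the gradient of each loss with respect to the logit z for a confidently wrong prediction. The formulas follow from the chain rule; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_wrt_logit(z, y):
    """d/dz of (sigmoid(z) - y)^2: includes the factor p * (1 - p), which vanishes at saturation."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def bce_grad_wrt_logit(z, y):
    """d/dz of binary cross-entropy on sigmoid(z): simplifies to p - y."""
    return sigmoid(z) - y

# True label is 1, but the model is confidently predicting 0 (p is about 0.0003).
z, y = -8.0, 1

print(f"MSE gradient:           {mse_grad_wrt_logit(z, y):.6f}")
print(f"cross-entropy gradient: {bce_grad_wrt_logit(z, y):.6f}")
```

The cross-entropy gradient is close to -1, a strong correction, while the MSE gradient is well under a thousandth of that, so the badly wrong prediction is barely updated.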

Frequently asked questions

Q: What is a loss function in simple terms?

A score measuring how wrong the model is. Perfect prediction = 0. The more wrong, the higher. Training adjusts weights to push this number down.

Q: What is the difference between MSE and cross-entropy?

MSE: average squared difference, for regression (predicting numbers). Cross-entropy: probability distribution difference, for classification (predicting categories).

Q: What is the danger of optimising the wrong loss?

The model gets very good at minimising your loss, even if that loss does not capture what you actually care about. Always check whether low loss aligns with your real objective.

Q: What is focal loss?

Modified cross-entropy that focuses training on hard examples and down-weights easy ones. Designed for class imbalance, standard in object detection.
