Q: When should you prioritise recall over precision, or vice versa?

Prioritise recall when missing a positive is very costly: cancer screening (missing cancer is worse than a false alarm that triggers more testing), fraud detection (missing fraud is worse than investigating a legitimate transaction). Prioritise precision when false positives are costly: spam filtering (blocking legitimate emails is worse than letting occasional spam through), medical treatment recommendations where unnecessary treatment causes harm.

F1 Score – UseCaseinAI

Q: What is the F1 score in simple terms?

The F1 score combines two things: how precise the model is (when it says positive, how often is it right?) and how sensitive it is (of all the actual positives, how many did it find?). A model that finds every cancer case but also flags many healthy patients as having cancer has high recall but low precision, and therefore a poor F1 score. F1 rewards models that are both precise and sensitive.

Q: What is the difference between precision and recall?

Precision: of all the cases the model predicted as positive, what fraction actually were positive? High precision means few false alarms. Recall: of all the actual positive cases that exist, what fraction did the model find? High recall means few missed cases. There is usually a tradeoff — increasing recall tends to decrease precision and vice versa. F1 balances both.

Q: Why is accuracy misleading for imbalanced datasets?

If 99% of your data is class A and 1% is class B, a model that always predicts class A achieves 99% accuracy without detecting a single class B case. In fraud detection, cancer screening, or fault detection, class B is the entire point. F1 score measured on class B exposes this: a model that never predicts class B has a class B F1 of 0, regardless of its accuracy.

⚡ The F1 score is the harmonic mean of precision (when the model says positive, how often is it right?) and recall (of all actual positives, how many did the model find?). It is the go-to metric for imbalanced datasets — where accuracy is misleading. A cancer screener that never flags anyone is 99% accurate if 99% of patients are healthy. Its F1 score is zero.

Category: Machine Learning · Difficulty: Beginner · Last updated: 15 May 2026 · 4 min read

What is F1 score?

Accuracy sounds like the obvious way to measure a model. If 94 out of 100 predictions are correct, the model is 94% accurate. Simple. Useful. Until you have an imbalanced dataset.

Imagine building a model to detect a rare disease that affects 1% of the population. A model that always says “no disease” is 99% accurate — it correctly classifies every healthy person. But it misses every single patient. That is a useless model with a great accuracy score. Accuracy lied.

F1 score fixes this by combining two complementary metrics — precision and recall — into one number that penalises both false alarms and missed detections. A model that ignores the rare class entirely scores an F1 of zero, regardless of its accuracy. You cannot hide behind the majority class.
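The rare-disease scenario above can be checked with a few lines of plain Python. This is a minimal sketch, not a reference implementation: the population size of 1,000 and the helper names (confusion_counts, accuracy, f1) are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn):
    # Algebraically the same as the harmonic mean of precision and recall,
    # written so that a model with no positive predictions scores 0
    # instead of dividing by zero.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical population: 1,000 people, 10 of whom (1%) have the disease.
y_true = [1] * 10 + [0] * 990
# A "model" that always says "no disease".
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
baseline_accuracy = accuracy(tp, fp, fn, tn)
baseline_f1 = f1(tp, fp, fn)
print(f"accuracy = {baseline_accuracy:.2f}, F1 = {baseline_f1:.2f}")
# prints: accuracy = 0.99, F1 = 0.00
```

The always-negative model gets every healthy person right and every patient wrong, so accuracy stays at 0.99 while F1 on the disease class drops to 0.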

How F1 score works

Precision — of everything the model predicted as positive, what fraction actually was positive?
Formula: True Positives / (True Positives + False Positives)
High precision = few false alarms. “When I say someone has the disease, I am usually right.”

Recall (Sensitivity) — of all the actual positives that exist, what fraction did the model find?
Formula: True Positives / (True Positives + False Negatives)
High recall = few missed cases. “I find most of the disease cases that are actually there.”

F1 Score — the harmonic mean of precision and recall.
Formula: 2 × (Precision × Recall) / (Precision + Recall)
Ranges from 0 (worst) to 1 (perfect). Requires both precision and recall to be high to score well. Punishes models that sacrifice one for the other.
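To see why the harmonic mean "punishes" a lopsided model, it helps to put it next to a plain average. The sketch below implements the three formulas above from confusion-matrix counts. The function name precision_recall_f1 and the counts are hypothetical, and returning 0.0 when a denominator is zero is a common convention rather than part of the definition.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: the model flags only 10 cases, all of them correctly,
# but misses the other 90 real positives.
p, r, f1_score = precision_recall_f1(tp=10, fp=0, fn=90)
arithmetic_mean = (p + r) / 2

print(f"precision = {p:.2f}, recall = {r:.2f}")
print(f"arithmetic mean = {arithmetic_mean:.2f}, F1 = {f1_score:.2f}")
# prints:
# precision = 1.00, recall = 0.10
# arithmetic mean = 0.55, F1 = 0.18
```

A plain average would give this model a respectable-looking 0.55. The harmonic mean stays close to the weaker of the two numbers, which is the behaviour you want from a metric that demands both.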

Real-world examples

Three scenarios where the choice of metric changes what you conclude about a model.

A fraud detection model with 99% accuracy sounds impressive — until you check that 99.5% of transactions are legitimate. F1 score on the fraud class reveals whether the model actually catches fraud or just classifies everything as legitimate.
A cancer screening AI evaluated only on accuracy might look excellent. Evaluate it on recall and you might discover that it misses 30% of actual cancer cases. Those are people who receive a false all-clear and do not get treatment.
Email spam filters use precision-recall tradeoffs deliberately — they prioritise precision (rarely blocking legitimate emails) over recall (occasionally letting spam through), because a false positive costs the user more than a false negative.
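The fraud scenario can be made concrete with some arithmetic. The counts below are hypothetical, chosen only so that the model lands on 99% accuracy with 99.5% of transactions legitimate. They are not taken from any real system.

```python
# Hypothetical confusion-matrix counts for 10,000 transactions,
# 50 of which are fraudulent (0.5%).
tp, fn = 10, 40      # fraud caught / fraud missed
fp, tn = 60, 9890    # legitimate flagged / legitimate passed

total = tp + fn + fp + tn
model_accuracy = (tp + tn) / total
majority_accuracy = (fp + tn) / total  # always predicting "legitimate"

fraud_precision = tp / (tp + fp)
fraud_recall = tp / (tp + fn)
fraud_f1 = 2 * fraud_precision * fraud_recall / (fraud_precision + fraud_recall)

print(f"model accuracy    = {model_accuracy:.3f}")
print(f"majority accuracy = {majority_accuracy:.3f}")
print(f"fraud precision = {fraud_precision:.2f}, "
      f"recall = {fraud_recall:.2f}, F1 = {fraud_f1:.2f}")
```

With these counts the 99%-accurate model is actually less accurate than predicting "legitimate" every time, and its F1 on the fraud class is about 0.17. Accuracy alone would not show either fact.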

Common pitfalls

F1 treats false positives and false negatives equally, but they rarely cost the same in the real world. Use the F-beta score to weight recall more (beta > 1) or precision more (beta < 1), based on your specific cost structure.
Macro vs micro F1: for multi-class problems, macro F1 averages the per-class F1 scores with equal weight, so rare classes count as much as common ones. Micro F1 pools true positives, false positives, and false negatives across all classes before computing F1, so common classes dominate. Prefer macro F1 when performance on rare classes matters as much as performance on common ones.
F1 is class-specific: always state which class you are measuring F1 for. F1 on the positive class tells a very different story from F1 on the negative class.
Threshold dependence: F1 is calculated at a specific decision threshold. The precision-recall curve shows performance across all thresholds, so examine the full curve rather than only F1 at the default 0.5 threshold.
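Three of these pitfalls (F-beta weighting, macro vs micro averaging, and threshold dependence) are easier to see with numbers. The following is a rough sketch in plain Python. The scores, labels, per-class counts, and helper names are all made up for illustration, and the F-beta formula is written in its count-based form, (1 + beta²)·TP / ((1 + beta²)·TP + beta²·FN + FP).

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta from counts; beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def counts_at_threshold(y_true, scores, threshold):
    """Predict positive when score >= threshold, then count TP, FP, FN."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    return tp, fp, fn


def macro_and_micro_f1(per_class_counts):
    """per_class_counts: one (tp, fp, fn) tuple per class."""
    macro = sum(f_beta(*c) for c in per_class_counts) / len(per_class_counts)
    tp, fp, fn = (sum(col) for col in zip(*per_class_counts))
    micro = f_beta(tp, fp, fn)
    return macro, micro


# Threshold dependence and F-beta: hypothetical labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.3, 0.2, 0.1, 0.1]
thresholds = [0.3, 0.5, 0.7]

for t in thresholds:
    c = counts_at_threshold(y_true, scores, t)
    print(f"threshold {t}: F0.5 = {f_beta(*c, beta=0.5):.3f}, "
          f"F1 = {f_beta(*c):.3f}, F2 = {f_beta(*c, beta=2.0):.3f}")

best_for_recall = max(
    thresholds, key=lambda t: f_beta(*counts_at_threshold(y_true, scores, t), beta=2.0))
best_for_precision = max(
    thresholds, key=lambda t: f_beta(*counts_at_threshold(y_true, scores, t), beta=0.5))

# Macro vs micro: a common class scored well and a rare class scored badly.
macro_f1, micro_f1 = macro_and_micro_f1([(90, 5, 10), (1, 10, 5)])
print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```

On this toy data the recall-weighted F2 prefers the 0.3 threshold while the precision-weighted F0.5 prefers 0.5, so the "best" threshold depends on which error you care about. The macro F1 (about 0.52) is far lower than the micro F1 (about 0.86) because the poorly handled rare class counts for half of the macro average.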

Frequently asked questions

QUESTION 1 What is the F1 score in simple terms?

ANSWER 1 It combines precision (when the model says positive, how often is it right?) and recall (of all actual positives, how many did it find?) into one number — penalising models that sacrifice either.

QUESTION 2 What is the difference between precision and recall?

ANSWER 2 Precision: few false alarms. Recall: few missed cases. There is usually a trade-off, and F1 balances the two in a single metric.

QUESTION 3 Why is accuracy misleading for imbalanced datasets?

ANSWER 3 A model predicting the majority class for every input achieves high accuracy while completely ignoring the minority class — which is often the entire point of the model.

QUESTION 4 When should you prioritise recall over precision?

ANSWER 4 When missing a positive is very costly — cancer screening, fraud detection. Prioritise precision when false positives are costly — spam filtering, unnecessary medical treatment.
