F1 Score
The F1 Score is a performance metric for classification problems that balances the trade-off between Precision and Recall (also known as Sensitivity). It is especially useful when the dataset is imbalanced and both false positives and false negatives matter.
Definition
The F1 Score is the harmonic mean of Precision and Recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Where:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- TP = True Positives
- FP = False Positives
- FN = False Negatives
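The definitions above translate directly into code. A minimal sketch (the function name `f1_from_counts` is our own, and the zero-division guards are a common convention, not part of the formula itself):

```python
def f1_from_counts(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```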
Why Harmonic Mean?
The harmonic mean punishes extreme values more than the arithmetic mean. So if either precision or recall is very low, the F1 score will be low too. This makes it a balanced measure when you need both high precision and recall.
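To see this effect concretely, compare the two means on a skewed pair of values (an illustrative sketch with made-up numbers):

```python
def arithmetic_mean(a, b):
    return (a + b) / 2

def harmonic_mean(a, b):
    # Harmonic mean of two values; 0 if both inputs are 0.
    return 2 * a * b / (a + b) if (a + b) else 0.0

# High precision but very low recall:
precision, recall = 0.95, 0.05
print(arithmetic_mean(precision, recall))  # 0.5 — looks deceptively decent
print(harmonic_mean(precision, recall))    # 0.095 — exposes the weak recall
```

The arithmetic mean hides the near-useless recall, while the harmonic mean (i.e., the F1 score) drags the result down toward the weaker of the two values.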
Simple Example
Imagine a medical test for a disease:
- True Positives (TP) = 80
- False Positives (FP) = 20
- False Negatives (FN) = 10
First, calculate Precision and Recall:
- Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
- Recall = TP / (TP + FN) = 80 / (80 + 10) ≈ 0.889
Now compute the F1 Score:
- F1 = 2 × (0.8 × 0.889) / (0.8 + 0.889) ≈ 0.842
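The same calculation can be checked in a few lines of Python:

```python
tp, fp, fn = 80, 20, 10

precision = tp / (tp + fp)  # 80 / 100 = 0.8
recall = tp / (tp + fn)     # 80 / 90 ≈ 0.889
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.889 0.842
```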
When to Use F1 Score
Use the F1 Score when:
- You care equally about false positives and false negatives
- The dataset is imbalanced
- Neither precision nor recall alone gives a complete picture
Not Ideal When
- Class distribution is balanced and you want to evaluate overall correctness → Accuracy may be enough.
- You want to analyze performance per class → Consider using Macro F1 or Weighted F1 in multi-class problems.
F1 Score Variants
- Micro F1: Aggregates total TP, FP, FN across all classes before computing F1
- Macro F1: Calculates F1 for each class, then averages
- Weighted F1: Like macro, but weighted by class support (number of instances)
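These three variants can be sketched from scratch for a multi-class problem. This is a minimal pure-Python illustration (the helper names `f1` and `f1_variants` are our own; in practice you would typically reach for `sklearn.metrics.f1_score` with its `average` parameter):

```python
from collections import Counter

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def f1_variants(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    counts = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts[c] = (tp, fp, fn)
    # Micro: pool TP/FP/FN across classes, then compute one F1.
    micro = f1(*(sum(x) for x in zip(*counts.values())))
    per_class = {c: f1(*counts[c]) for c in classes}
    # Macro: unweighted average of per-class F1 scores.
    macro = sum(per_class.values()) / len(classes)
    # Weighted: per-class F1 averaged by class support.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return micro, macro, weighted

micro, macro, weighted = f1_variants([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
```

On this toy example, macro F1 exceeds weighted F1 because the perfectly classified class (2) has the smallest support, so it counts for less in the weighted average.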
Related Metrics
- Precision
- Recall
- Accuracy
SEO Keywords
f1 score in machine learning, f1 score formula, harmonic mean of precision and recall, model evaluation metrics, f1 score example, f1 vs accuracy, precision recall f1 trade-off