F1 Score

The F1 Score is a performance metric for classification problems that balances the trade-off between Precision and Recall (Recall is also known as Sensitivity). It is especially useful when the dataset is imbalanced and both false positives and false negatives matter.

Definition

The F1 Score is the harmonic mean of Precision and Recall.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Where:

  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • TP = True Positives
  • FP = False Positives
  • FN = False Negatives
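The definition above translates directly into code. The following is a minimal sketch (the function name `f1_score` and its signature are our own choice, not a library API):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute the F1 score directly from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Note that true negatives (TN) do not appear anywhere: the F1 score ignores them by construction, which is part of why it suits imbalanced problems.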

Why Harmonic Mean?

The harmonic mean punishes extreme values more than the arithmetic mean. So if either precision or recall is very low, the F1 score will be low too. This makes it a balanced measure when you need both high precision and recall.
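A quick numerical comparison illustrates the point. With one metric near perfect and the other near zero, the arithmetic mean still looks moderate while the harmonic mean collapses:

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2

def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# Perfect precision but terrible recall: a classifier that almost
# never predicts the positive class but is right when it does.
p, r = 1.0, 0.01
print(arithmetic_mean(p, r))  # 0.505 — looks deceptively acceptable
print(harmonic_mean(p, r))    # ≈ 0.0198 — exposes the weak recall
```

This is exactly the behavior you want when both metrics must be high at the same time.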

Simple Example

Imagine a medical test for a disease:

  • True Positives (TP) = 80
  • False Positives (FP) = 20
  • False Negatives (FN) = 10

First, calculate:

Precision = 80 / (80 + 20) = 0.8
Recall = 80 / (80 + 10) ≈ 0.8889

Now compute F1 Score:

F1 = 2 × (0.8 × 0.8889) / (0.8 + 0.8889) = (2 × 0.7111) / 1.6889 ≈ 0.8421
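The same arithmetic can be checked in a few lines of Python, using the counts from the medical-test example above:

```python
tp, fp, fn = 80, 20, 10

precision = tp / (tp + fp)   # 80 / 100 = 0.8
recall = tp / (tp + fn)      # 80 / 90  ≈ 0.8889
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.8421
```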

When to Use F1 Score

Use the F1 Score when:

  • You care equally about false positives and false negatives
  • The dataset is imbalanced
  • Neither precision nor recall alone gives a complete picture

Not Ideal When

  • Class distribution is balanced and you want to evaluate overall correctness → Accuracy may be enough.
  • You want to analyze performance per class → Consider using Macro F1 or Weighted F1 in multi-class problems.

F1 Score Variants

  • Micro F1: Aggregates total TP, FP, FN across all classes before computing F1
  • Macro F1: Calculates F1 for each class, then averages
  • Weighted F1: Like macro, but weighted by class support (number of instances)
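The three variants differ only in where the averaging happens. The sketch below computes all three from scratch for a single-label multi-class problem (the helper names and the toy labels are illustrative assumptions; in practice you would typically reach for a library such as scikit-learn's `f1_score` with its `average` parameter):

```python
from collections import Counter

def per_class_counts(y_true, y_pred, label):
    """TP, FP, FN for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_variants(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    counts = {c: per_class_counts(y_true, y_pred, c) for c in labels}
    support = Counter(y_true)

    # Micro: sum TP/FP/FN over all classes first, then one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)

    # Macro: unweighted mean of the per-class F1 scores.
    per_class = {c: f1_from_counts(*counts[c]) for c in labels}
    macro = sum(per_class.values()) / len(labels)

    # Weighted: per-class F1 weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return micro, macro, weighted
```

Macro F1 treats rare and common classes equally, so it is the stricter choice when minority-class performance matters; micro F1, for single-label multi-class data, coincides with accuracy.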
