Weighted F1 Score
The Weighted F1 Score is a metric used in multi-class classification to evaluate model performance by computing the F1 Score for each class and taking the average, weighted by the number of true instances for each class (i.e., the class "support").
It is especially useful when working with imbalanced datasets, where some classes are more frequent than others.
Definition

$$
\text{Weighted F1} = \frac{1}{\sum_{i=1}^{N} s_i} \sum_{i=1}^{N} s_i \cdot F1_i
$$

Where:
- $N$ = number of classes
- $F1_i$ = F1 Score for class $i$
- $s_i$ = support (number of true instances) of class $i$
Key Features
- Classes with more data have more influence on the final score.
- Helps prevent small classes from skewing the result disproportionately.
- Available in most ML libraries; in Scikit-learn (Python), it is selected by passing `average='weighted'` to metrics such as `f1_score` (the default there is `'binary'`, not weighted).
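A minimal sketch of computing the metric with scikit-learn's `f1_score`; the label arrays below are illustrative, not from the article:

```python
# Weighted F1 with scikit-learn: per-class F1, averaged by class support.
from sklearn.metrics import f1_score

# Toy multi-class labels (3 classes, imbalanced supports).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

# average='weighted' weights each class's F1 by its true-instance count.
weighted = f1_score(y_true, y_pred, average="weighted")
print(round(weighted, 3))
```

Here class 0 (support 3) and class 1 (support 2) each score F1 = 0.8 and class 2 (support 1) scores 1.0, so the weighted average is (3·0.8 + 2·0.8 + 1·1.0) / 6 ≈ 0.833.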
Simple Example
Suppose a dataset has three classes with these F1 Scores and supports:
- F1(Class A) = 0.90, Support = 50
- F1(Class B) = 0.70, Support = 30
- F1(Class C) = 0.50, Support = 20
First calculate total support:

Total support = 50 + 30 + 20 = 100

Now calculate the weighted F1:

Weighted F1 = (0.90 × 50 + 0.70 × 30 + 0.50 × 20) / 100 = (45 + 21 + 10) / 100 = 0.76
So the Weighted F1 Score is **0.76**, favoring the majority class's performance.
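The same calculation can be checked in a few lines of plain Python (no libraries needed), using the F1 scores and supports from the example above:

```python
# Weighted F1 = sum(F1_i * support_i) / total support.
f1_scores = {"A": 0.90, "B": 0.70, "C": 0.50}
supports = {"A": 50, "B": 30, "C": 20}

total_support = sum(supports.values())  # 50 + 30 + 20 = 100
weighted_f1 = sum(f1_scores[c] * supports[c] for c in f1_scores) / total_support
print(round(weighted_f1, 2))  # 0.76
```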
Weighted vs Macro vs Micro F1
| Metric | Weighting | Best For |
|---|---|---|
| Macro F1 | Equal weight for all classes | Equal treatment for each class |
| Micro F1 | Global average over all TP, FP, FN | Imbalanced data, overall performance |
| Weighted F1 | Weighted by class support | Imbalanced datasets, with performance emphasis on larger classes |
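The three averaging modes can diverge sharply on imbalanced data. A hypothetical sketch: a classifier that always predicts the majority class on an 8-vs-2 split scores well on micro and weighted F1 but poorly on macro F1:

```python
# Macro vs micro vs weighted F1 on an imbalanced toy problem.
from sklearn.metrics import f1_score

y_true = [0] * 8 + [1] * 2   # class 0 dominates (support 8 vs 2)
y_pred = [0] * 10            # model never predicts the minority class

# zero_division=0 silences the warning for the minority class's undefined F1.
for avg in ("macro", "micro", "weighted"):
    print(avg, round(f1_score(y_true, y_pred, average=avg, zero_division=0), 3))
```

Class 0's F1 is 8/9 ≈ 0.889 and class 1's is 0, so macro averages them to ≈ 0.444, while weighted (≈ 0.711) and micro (0.8) are pulled up by the dominant class.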
Use Cases
- **Text classification** (e.g., news topics, sentiment analysis)
- **Image classification** where some labels are rare
- **Healthcare diagnosis** with rare but critical outcomes
- **Customer segmentation** with uneven population groups
Limitations
- Might mask poor performance on minority classes if the model performs well on dominant ones.
- If class fairness is a concern, Macro F1 Score might be more appropriate.