Macro F1 Score
The Macro F1 Score is an evaluation metric for multi-class classification tasks. It computes the F1 Score independently for each class and then takes the unweighted average across all classes.
Unlike the standard F1 Score, which is typically applied to binary classification, Macro F1 is designed for problems involving more than two classes.
Definition
1. Compute Precision and Recall for each class individually
2. Compute the F1 Score for each class
3. Take the arithmetic mean of these F1 Scores:

Macro F1 = (1 / N) × Σ F1_i

Where:
- N = Total number of classes
- F1_i = F1 Score for class i
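The three steps above can be sketched in pure Python. This is a minimal from-scratch illustration (not a library implementation): it counts per-class true positives, false positives, and false negatives, computes one F1 Score per class, and averages them with equal weight.

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # missed an instance of class t
    f1_scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

For example, `macro_f1(["a","a","b","b","c","c"], ["a","a","b","c","c","b"])` scores class "a" perfectly (F1 = 1.0) and classes "b" and "c" at 0.5 each, giving a Macro F1 of 2/3.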
Why Use Macro F1?
Macro F1 treats all classes equally, regardless of their frequency. It is best used when:
- All classes are equally important
- You want a balanced measure that does not favor majority classes
Simple Example
Suppose you have a 3-class classification problem with the following F1 Scores:
- F1(Class A) = 0.90
- F1(Class B) = 0.70
- F1(Class C) = 0.50
Then,

Macro F1 = (0.90 + 0.70 + 0.50) / 3 = 0.70

This means the average F1 Score across all classes is 70%.
Macro F1 vs Micro F1
- Macro F1: Averages F1 scores per class — **all classes treated equally**
- Micro F1: Aggregates all TP, FP, FN before computing Precision/Recall — **class imbalance considered**
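To make the contrast concrete, here is a from-scratch sketch on a hypothetical imbalanced toy dataset. For single-label multi-class problems, pooling TP/FP/FN makes Micro F1 equal to plain accuracy (every error is one FP for the predicted class and one FN for the true class), while Macro F1 is pulled down by mistakes on the minority class.

```python
from collections import defaultdict

def per_class_f1(y_true, y_pred):
    # Per-class TP/FP/FN counts, then one F1 per class.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for c in set(y_true) | set(y_pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

def macro_f1(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def micro_f1(y_true, y_pred):
    # Pooled counts: micro precision = micro recall = accuracy here,
    # so Micro F1 reduces to the fraction of correct predictions.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical imbalanced dataset: 8 instances of "a", 2 of "b",
# with one minority-class instance misclassified.
y_true = ["a"] * 8 + ["b", "b"]
y_pred = ["a"] * 8 + ["a", "b"]

print(micro_f1(y_true, y_pred))  # 0.9: one error out of ten predictions
print(macro_f1(y_true, y_pred))  # ≈ 0.80: the miss on "b" weighs heavily
```

Micro F1 barely notices the minority-class error, while Macro F1 drops noticeably, which is exactly the behavior described above.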
Use Macro F1 When
- You want to treat all classes fairly, regardless of how often they appear.
Use Micro F1 When
- Your dataset is imbalanced and you want a measure of overall performance across all predictions.
Macro F1 vs Weighted F1
- Macro F1: Equal weight for all classes
- Weighted F1: Weighted by class support (number of instances per class)
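As an illustration, reusing the per-class F1 Scores from the example above with hypothetical class supports (the counts are invented for this sketch) shows how the two averages diverge when the best-performing class is also the largest:

```python
# Hypothetical per-class F1 scores and supports (instance counts per class).
f1 = {"A": 0.90, "B": 0.70, "C": 0.50}
support = {"A": 80, "B": 15, "C": 5}

macro = sum(f1.values()) / len(f1)                      # equal weight per class
total = sum(support.values())
weighted = sum(f1[c] * support[c] / total for c in f1)  # weight by support

print(macro)     # ≈ 0.70 (up to float rounding)
print(weighted)  # ≈ 0.85
```

Because class A dominates the dataset and also has the highest F1, the Weighted F1 (0.85) is much higher than the Macro F1 (0.70); Macro F1 exposes the weak minority class C instead of letting the majority class mask it.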
Applications
- Multi-class text classification
- Image recognition with many categories
- Part-of-speech tagging tasks
- Any classification task where fairness across classes is important