Macro F1 Score
The Macro F1 Score is an evaluation metric for multi-class classification tasks. It computes the F1 Score independently for each class and then takes the unweighted average across all classes.
Unlike the standard F1 Score, which is typically applied to binary classification, Macro F1 is designed for problems involving more than two classes.
Definition
1. Compute Precision and Recall for each class individually
2. Compute the F1 Score for each class
3. Take the arithmetic mean of these F1 Scores:

Macro F1 = (1 / N) × Σ F1_i

Where:
- N = Total number of classes
- F1_i = F1 Score for class i
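The three steps above can be sketched in pure Python. This is a minimal from-scratch illustration (not a library implementation): it counts per-class true positives, false positives, and false negatives, computes one F1 Score per class, and averages them with equal weight.

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # missed an instance of class t
    f1_scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

For example, `macro_f1(["a","a","b","b","c","c"], ["a","a","b","c","c","b"])` scores class "a" perfectly (F1 = 1.0) and classes "b" and "c" at 0.5 each, giving a Macro F1 of 2/3.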
Why Use Macro F1?
Macro F1 treats all classes equally, regardless of their frequency. It is best used when:
- All classes are equally important
- You want a balanced measure that does not favor majority classes
Simple Example
Suppose you have a 3-class classification problem with the following F1 Scores:
- F1(Class A) = 0.90
- F1(Class B) = 0.70
- F1(Class C) = 0.50
Then,

Macro F1 = (0.90 + 0.70 + 0.50) / 3 = 0.70

This means the average F1 Score across all classes is 70%.
Macro F1 vs Micro F1
- Macro F1: Averages F1 scores per class — **all classes treated equally**
- Micro F1: Aggregates all TP, FP, FN before computing Precision/Recall — **class imbalance considered**
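To make the contrast concrete, here is a from-scratch sketch on a hypothetical imbalanced toy dataset. For single-label multi-class problems, pooling TP/FP/FN makes Micro F1 equal to plain accuracy (every error is one FP for the predicted class and one FN for the true class), while Macro F1 is pulled down by mistakes on the minority class.

```python
from collections import defaultdict

def per_class_f1(y_true, y_pred):
    # Per-class TP/FP/FN counts, then one F1 per class.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for c in set(y_true) | set(y_pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

def macro_f1(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def micro_f1(y_true, y_pred):
    # Pooled counts: micro precision = micro recall = accuracy here,
    # so Micro F1 reduces to the fraction of correct predictions.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical imbalanced dataset: 8 instances of "a", 2 of "b",
# with one minority-class instance misclassified.
y_true = ["a"] * 8 + ["b", "b"]
y_pred = ["a"] * 8 + ["a", "b"]

print(micro_f1(y_true, y_pred))  # 0.9: one error out of ten predictions
print(macro_f1(y_true, y_pred))  # ≈ 0.80: the miss on "b" weighs heavily
```

Micro F1 barely notices the minority-class error, while Macro F1 drops noticeably, which is exactly the behavior described above.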
Use Macro F1 When
- You want to treat all classes fairly, regardless of how often they appear.
Use Micro F1 When
- Your dataset is imbalanced and you want a measure of overall performance across all predictions.
Macro F1 vs Weighted F1
- Macro F1: Equal weight for all classes
- Weighted F1: Weighted by class support (number of instances per class)
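As an illustration, reusing the per-class F1 Scores from the example above with hypothetical class supports (the counts are invented for this sketch) shows how the two averages diverge when the best-performing class is also the largest:

```python
# Hypothetical per-class F1 scores and supports (instance counts per class).
f1 = {"A": 0.90, "B": 0.70, "C": 0.50}
support = {"A": 80, "B": 15, "C": 5}

macro = sum(f1.values()) / len(f1)                      # equal weight per class
total = sum(support.values())
weighted = sum(f1[c] * support[c] / total for c in f1)  # weight by support

print(macro)     # ≈ 0.70 (up to float rounding)
print(weighted)  # ≈ 0.85
```

Because class A dominates the dataset and also has the highest F1, the Weighted F1 (0.85) is much higher than the Macro F1 (0.70); Macro F1 exposes the weak minority class C instead of letting the majority class mask it.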
Applications
- Multi-class text classification
- Image recognition with many categories
- Part-of-speech tagging tasks
- Any classification task where fairness across classes is important