Precision-Recall Curve

The Precision-Recall Curve (PR Curve) is a graphical representation used to evaluate the performance of binary classification models, especially on imbalanced datasets where the positive class is rare.

It plots Precision (y-axis) against Recall (x-axis) for different classification thresholds.

Why Use the Precision-Recall Curve?

In many real-world problems such as fraud detection, disease diagnosis, and spam filtering, the positive class is far less frequent than the negative class. Metrics such as accuracy or the ROC curve can be misleading in these cases, because a model that mostly predicts the majority (negative) class can still score well on them.

The Precision-Recall curve focuses on the performance of the positive class, showing how precision and recall change as the classification threshold varies.

Definitions

  • Precision measures the proportion of correctly predicted positive observations among all predicted positives:
Precision = TP / (TP + FP)
  • Recall (Sensitivity) measures the proportion of correctly predicted positive observations among all actual positives:
Recall = TP / (TP + FN)

Where:

  • TP = True Positives
  • FP = False Positives
  • FN = False Negatives
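
As a concrete illustration of these two formulas, here is a minimal Python sketch that computes precision and recall from made-up confusion-matrix counts (the numbers are purely hypothetical):

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp = 80   # true positives
fp = 20   # false positives
fn = 40   # false negatives

precision = tp / (tp + fp)  # 80 / 100 = 0.80
recall = tp / (tp + fn)     # 80 / 120 ≈ 0.67

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```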

How to Interpret the Curve

  • The top-right corner (Precision = 1, Recall = 1) represents perfect classification.
  • A high area under the PR curve indicates both high precision and high recall.
  • The curve helps select the best threshold by balancing false positives and false negatives.
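
As a rough sketch of how such a curve is produced in practice, the snippet below uses scikit-learn's precision_recall_curve on a synthetic, imbalanced dataset (the dataset, model choice, and parameters are illustrative assumptions, not part of any particular workflow):

```python
# Sketch: plot a PR curve for a logistic regression on a synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Synthetic data where only ~5% of the samples belong to the positive class
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# One (precision, recall) pair per candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test, scores)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```

Each point on the plotted curve corresponds to one threshold applied to the predicted scores, so reading the curve from right to left traces out increasingly strict thresholds.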

Area Under the Precision-Recall Curve (AUPRC)

Similar to ROC AUC, the Area Under the Precision-Recall Curve (AUPRC) summarizes the model’s ability to balance precision and recall.

  • Higher AUPRC means better model performance on the positive class.
  • AUPRC is typically more informative than ROC AUC on highly skewed data, where the large number of true negatives can make ROC AUC look overly optimistic.
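
In practice, AUPRC is often estimated with average precision. The snippet below is a small sketch using scikit-learn's average_precision_score on hypothetical labels and scores (the arrays are invented for illustration):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical ground-truth labels and predicted scores
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_scores = np.array([0.1, 0.3, 0.7, 0.2, 0.8, 0.4, 0.05, 0.6, 0.9, 0.15])

# Average precision summarizes the PR curve as a weighted mean of precisions
# achieved at each threshold; higher values mean better ranking of positives.
auprc = average_precision_score(y_true, y_scores)
print(f"AUPRC (average precision): {auprc:.3f}")
```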

Example

Imagine a spam detection system:

  • At a low threshold, many emails are classified as spam (high recall), but many legitimate emails are incorrectly flagged (low precision).
  • At a high threshold, only very confident spam predictions are flagged (high precision), but some spam emails go undetected (low recall).
  • The PR curve shows how precision and recall trade off as the threshold changes.
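
The following sketch makes this trade-off concrete with hypothetical spam scores, computing precision and recall at a low and a high threshold (all labels and scores are invented for illustration):

```python
import numpy as np

# Hypothetical spam classifier output (1 = spam, 0 = legitimate)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
scores = np.array([0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10, 0.70, 0.55, 0.05])

def precision_recall_at(threshold):
    """Compute precision and recall when flagging emails with score >= threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.3, 0.75):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up scores, the low threshold (0.3) catches every spam email (recall 1.0) but flags several legitimate ones (precision ≈ 0.57), while the high threshold (0.75) flags only spam (precision 1.0) but misses half of it (recall 0.5).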

Precision-Recall Curve vs ROC Curve

Aspect          | Precision-Recall Curve                                 | ROC Curve
Best used when  | Positive class is rare / data are imbalanced           | Classes are balanced or error costs are similar
Focus           | Performance on the positive class                      | Trade-off between TPR and FPR (sensitivity and specificity)
Interpretation  | Emphasizes the impact of false positives on precision  | Emphasizes the false positive rate relative to the number of negatives

Related Pages
