Precision-Recall Curve

The Precision-Recall Curve (PR Curve) is a graphical representation used to evaluate the performance of binary classification models, especially on imbalanced datasets where the positive class is rare.

It plots Precision (y-axis) against Recall (x-axis) for different classification thresholds.

Why Use the Precision-Recall Curve?

In many real-world problems such as fraud detection, disease diagnosis, and spam filtering, the positive class is far less frequent than the negative class. Metrics such as accuracy or the ROC curve can be misleading in these cases, because a model that mostly predicts the majority (negative) class can still score well on them.

The Precision-Recall curve focuses on the performance of the positive class, showing how precision and recall change as the classification threshold varies.

Definitions

  • Precision measures the proportion of correctly predicted positive observations among all predicted positives:
Precision = TP / (TP + FP)
  • Recall (Sensitivity) measures the proportion of correctly predicted positive observations among all actual positives:
Recall = TP / (TP + FN)

Where:

  • TP = True Positives
  • FP = False Positives
  • FN = False Negatives
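
As a concrete illustration of these two formulas, here is a minimal Python sketch that computes precision and recall from made-up confusion-matrix counts (the numbers are purely hypothetical):

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp = 80   # true positives
fp = 20   # false positives
fn = 40   # false negatives

precision = tp / (tp + fp)  # 80 / 100 = 0.80
recall = tp / (tp + fn)     # 80 / 120 ≈ 0.67

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```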

How to Interpret the Curve

  • The top-right corner (Precision = 1, Recall = 1) represents perfect classification.
  • A high area under the PR curve indicates both high precision and high recall.
  • The curve helps select the best threshold by balancing false positives and false negatives.
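
As a rough sketch of how such a curve is produced in practice, the snippet below uses scikit-learn's precision_recall_curve on a synthetic, imbalanced dataset (the dataset, model choice, and parameters are illustrative assumptions, not part of any particular workflow):

```python
# Sketch: plot a PR curve for a logistic regression on a synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Synthetic data where only ~5% of the samples belong to the positive class
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# One (precision, recall) pair per candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test, scores)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```

Each point on the plotted curve corresponds to one threshold applied to the predicted scores, so reading the curve from right to left traces out increasingly strict thresholds.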

Area Under the Precision-Recall Curve (AUPRC)

Similar to ROC AUC, the Area Under the Precision-Recall Curve (AUPRC) summarizes the model’s ability to balance precision and recall.

  • Higher AUPRC means better model performance on the positive class.
  • AUPRC is typically more informative than ROC AUC on highly skewed data, where the large number of true negatives can make ROC AUC look overly optimistic.
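
In practice, AUPRC is often estimated with average precision. The snippet below is a small sketch using scikit-learn's average_precision_score on hypothetical labels and scores (the arrays are invented for illustration):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical ground-truth labels and predicted scores
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_scores = np.array([0.1, 0.3, 0.7, 0.2, 0.8, 0.4, 0.05, 0.6, 0.9, 0.15])

# Average precision summarizes the PR curve as a weighted mean of precisions
# achieved at each threshold; higher values mean better ranking of positives.
auprc = average_precision_score(y_true, y_scores)
print(f"AUPRC (average precision): {auprc:.3f}")
```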

Example

Imagine a spam detection system:

  • At a low threshold, many emails are classified as spam (high recall), but many legitimate emails are incorrectly flagged (low precision).
  • At a high threshold, only very confident spam predictions are flagged (high precision), but some spam emails go undetected (low recall).
  • The PR curve shows how precision and recall trade off as the threshold changes.
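
The following sketch makes this trade-off concrete with hypothetical spam scores, computing precision and recall at a low and a high threshold (all labels and scores are invented for illustration):

```python
import numpy as np

# Hypothetical spam classifier output (1 = spam, 0 = legitimate)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
scores = np.array([0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10, 0.70, 0.55, 0.05])

def precision_recall_at(threshold):
    """Compute precision and recall when flagging emails with score >= threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.3, 0.75):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up scores, the low threshold (0.3) catches every spam email (recall 1.0) but flags several legitimate ones (precision ≈ 0.57), while the high threshold (0.75) flags only spam (precision 1.0) but misses half of it (recall 0.5).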

Precision-Recall Curve vs ROC Curve

Aspect          | Precision-Recall Curve                                 | ROC Curve
Best used when  | Positive class is rare / data are imbalanced           | Classes are balanced or error costs are similar
Focus           | Performance on the positive class                      | Trade-off between TPR and FPR (sensitivity and specificity)
Interpretation  | Emphasizes the impact of false positives on precision  | Emphasizes the false positive rate relative to the number of negatives

Related Pages
