Predicted Probability & Calibration Explorer

Understand what a predicted probability means, how calibration works, and how thresholds affect decisions. Tune class imbalance, score separation, and the calibration method to see how they affect calibration error, loss, and ROC/PR performance.


[Interactive demo: a reliability diagram (diagonal = perfect calibration) with live ECE, Brier, and Log Loss readouts; ROC and PR curves with ROC-AUC, PR-AUC, and Accuracy/F1 at the chosen threshold τ (AUC summarizes ranking performance; PR is more informative for rare positives); and Data & Model, Calibration, and Threshold controls. ECE uses fixed bins; higher score separation makes classification easier.]

How to interpret

What: A predicted probability is the model’s confidence (0–1) that an outcome is positive. Over many similar cases, a prediction of 0.8 should be positive about 80% of the time.
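To make this concrete, here is a minimal sketch (not the tool's code) that simulates a perfectly calibrated model and checks that cases predicted near 0.8 are in fact positive about 80% of the time; the simulation setup and the 0.75–0.85 window are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A perfectly calibrated "model": draw a probability p for each case,
# then draw the outcome y ~ Bernoulli(p).
p = rng.uniform(0.0, 1.0, size=100_000)
y = rng.uniform(0.0, 1.0, size=p.size) < p

# Among cases predicted near 0.8, the observed positive rate should be ~0.8.
near = (p > 0.75) & (p < 0.85)
print(f"mean prediction {p[near].mean():.3f} -> observed rate {y[near].mean():.3f}")
```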

Calibration: A well-calibrated model's predicted probabilities match observed frequencies. Use Platt scaling (a logistic remap of the score) or isotonic regression (a monotonic, non-parametric remap) to correct over- or under-confidence.
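The sketch below shows both remaps with scikit-learn. The overconfident raw-score model is an assumption, and fitting a logistic regression directly on the raw probability is a simplification of classic Platt scaling (which fits a sigmoid to the decision score), not the tool's exact implementation:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Assumed setup: an overconfident model whose raw scores are pushed toward 0/1.
true_p = rng.uniform(0.05, 0.95, size=20_000)
y = (rng.uniform(0.0, 1.0, size=true_p.size) < true_p).astype(int)
raw = np.clip(true_p + 0.3 * np.sign(true_p - 0.5), 0.001, 0.999)

# Platt-style scaling: fit a logistic regression on the 1-D raw score.
platt = LogisticRegression().fit(raw.reshape(-1, 1), y)
p_platt = platt.predict_proba(raw.reshape(-1, 1))[:, 1]

# Isotonic regression: a monotonic, non-parametric remap of score -> probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y)
p_iso = iso.predict(raw)
```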

Discrimination vs calibration: ROC/PR show how well the model ranks examples; metrics like ECE, Brier, and Log Loss show how trustworthy the probabilities are.
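A minimal sketch of those calibration metrics, with ECE computed over fixed-width bins as in the panel above (the bin count of 10 is an assumption):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

def ece(y_true, p_pred, n_bins=10):
    """Expected Calibration Error over fixed-width bins (bin count assumed)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_pred, edges) - 1, 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Gap between average confidence and observed frequency, size-weighted.
            err += mask.mean() * abs(p_pred[mask].mean() - y_true[mask].mean())
    return err

rng = np.random.default_rng(2)
p = rng.uniform(0.0, 1.0, 50_000)
y = (rng.uniform(0.0, 1.0, p.size) < p).astype(int)

# A calibrated model scores well on ECE/Brier/log loss; ROC-AUC measures ranking only.
print(f"ECE={ece(y, p):.4f}  Brier={brier_score_loss(y, p):.4f}  "
      f"LogLoss={log_loss(y, p):.4f}  ROC-AUC={roc_auc_score(y, p):.4f}")
```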

Thresholds: The default 0.5 may not be optimal; adjust τ for class imbalance or asymmetric error costs and watch Accuracy and F1 change.
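A short sketch of such a threshold sweep; the 10% positive rate and the score model are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(3)

# Imbalanced data (assumed 10% positives) with a mildly informative score.
y = (rng.uniform(0.0, 1.0, 20_000) < 0.10).astype(int)
score = np.clip(rng.normal(0.35 + 0.3 * y, 0.15), 0.0, 1.0)

# Sweep the decision threshold tau and watch Accuracy and F1 trade off.
for tau in (0.3, 0.5, 0.7):
    pred = (score >= tau).astype(int)
    print(f"tau={tau:.1f}  acc={accuracy_score(y, pred):.3f}  "
          f"f1={f1_score(y, pred, zero_division=0):.3f}")
```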


About This Tool & Methodology

The tool simulates predicted scores under tunable class imbalance and separation, maps scores to probabilities with optional calibration (Platt or isotonic), plots reliability diagrams and ROC/PR curves, and reports ECE, Brier score, and Log Loss, all in your browser.
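A minimal sketch of this kind of simulation; the two-Gaussian score model, the sigmoid squashing, and the parameter names are assumptions, not the tool's exact implementation:

```python
import numpy as np

def simulate_scores(n=10_000, pos_rate=0.2, separation=1.5, seed=0):
    """Draw labels at the given positive rate, then raw scores from two
    Gaussians whose means are `separation` apart (assumed score model)."""
    rng = np.random.default_rng(seed)
    y = (rng.uniform(0.0, 1.0, n) < pos_rate).astype(int)
    raw = rng.normal(loc=separation * y, scale=1.0)
    # Squash raw scores into (0, 1) with a sigmoid before optional calibration.
    return y, 1.0 / (1.0 + np.exp(-raw))

y, p = simulate_scores()
```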

Learning Outcomes

  • Differentiate discrimination (ROC/PR) from calibration (reliability/ECE/Brier).
  • See how imbalance and score separation influence thresholds and metrics.
  • Practice choosing thresholds for specific precision/recall or cost trade‑offs.

Authorship & Review

  • Author: 8gwifi.org engineering team
  • Reviewed by: Anish Nath
  • Last updated: 2025-11-19

Trust & Privacy

  • All computations run locally with synthetic data by default; nothing is uploaded.