🎯 Probability Calibration Lab

Compare model confidence before and after calibration. Inspect the reliability diagram, probability histogram, and calibration metrics (Brier, Log Loss, ECE, MCE).


Panels:
  • Reliability Diagram: the ideal is the diagonal; the calibrated curve should move closer to it. Reports Brier and ECE for raw vs. calibrated probabilities.
  • Predicted Probability Histogram: overconfident models pile up near 0 or 1 but don't align with actual frequencies.
📦 Dataset & Generator
  • Train size: 1000
  • Test size: 1000
  • Model sharpness (T): 0.80
  • Label noise: 0.00
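
A hypothetical generator matching these controls (an assumption for illustration, not this lab's actual code): draw latent logits, turn them into true probabilities, sample labels, flip a fraction of them for label noise, and report sigmoid(z / T) as the model's score, so T < 1 produces overconfident probabilities.

```python
import numpy as np

def make_data(n=1000, T=0.8, noise=0.0, seed=0):
    """Hypothetical synthetic generator; names and distributions are illustrative."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.5, n)                      # latent logits
    p_true = 1.0 / (1.0 + np.exp(-z))                # ground-truth P(y=1)
    y = (rng.uniform(0.0, 1.0, n) < p_true).astype(int)
    flip = rng.uniform(0.0, 1.0, n) < noise          # label noise: random flips
    y[flip] = 1 - y[flip]
    p_model = 1.0 / (1.0 + np.exp(-z / T))           # reported score; T < 1 sharpens it
    return p_model, y

p_raw, y = make_data(n=1000, T=0.8, noise=0.0)       # mirrors the default controls
```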
🧪 Calibration
  Reports Log Loss and MCE for raw vs. calibrated probabilities.

What is Calibration? Why it matters
Calibration basics
  • Well-calibrated: Of all samples scored ~0.7, about 70% are positive
  • Discrimination vs calibration: AUC measures ranking; calibration measures probability accuracy
  • Reliability diagram: Plots predicted vs empirical frequency (diagonal = perfect)
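
A minimal NumPy sketch of the binning behind a reliability diagram; the function name and the 10-bin default are illustrative choices rather than this lab's exact implementation (scikit-learn's `sklearn.calibration.calibration_curve` computes the same pairs):

```python
import numpy as np

def reliability_curve(y, p, n_bins=10):
    """Per-bin mean predicted probability vs. empirical positive rate."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(p, edges[1:-1])        # bin index 0..n_bins-1 for each score
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            mean_pred.append(p[mask].mean())     # x-axis: average confidence in the bin
            frac_pos.append(y[mask].mean())      # y-axis: observed frequency in the bin
    return np.array(mean_pred), np.array(frac_pos)
```

Plotting `frac_pos` against `mean_pred` gives the curve; points below the diagonal mean the model is overconfident in that bin.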
Methods
  • Platt scaling: Logistic mapping of scores (smooth, monotonic)
  • Isotonic regression: Flexible, non-parametric monotonic fit (data-hungry)
  • Temperature scaling: Adjusts logit sharpness (preserves ranking)
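
A compact sketch of all three methods, assuming binary labels `y`, held-out raw scores, and (for temperature scaling) logits; the helper names and the bounded search range for T are illustrative. In scikit-learn, `CalibratedClassifierCV` with `method="sigmoid"` or `method="isotonic"` wraps the first two.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def platt(scores, y):
    """Platt scaling: sigmoid(a*s + b), fit as a 1-feature logistic regression."""
    lr = LogisticRegression().fit(scores.reshape(-1, 1), y)
    return lambda s: lr.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]

def isotonic(p, y):
    """Isotonic regression: monotonic piecewise-constant remap (data-hungry)."""
    return IsotonicRegression(out_of_bounds="clip").fit(p, y).predict

def temperature(logits, y):
    """Temperature scaling: one scalar T > 0 minimizing log loss; ranking unchanged."""
    def nll(T):
        q = np.clip(1.0 / (1.0 + np.exp(-logits / T)), 1e-12, 1 - 1e-12)
        return -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))
    T = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
    return lambda z: 1.0 / (1.0 + np.exp(-z / T))
```

Fit any of these on a held-out calibration split, never on the data used to train the model, or the correction will be optimistic.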
When to calibrate
  • Imbalanced data and rare-event prediction
  • Risk-sensitive thresholds and capacity planning
  • Domain shift (different prevalence); recalibration often required
Metrics
  • Brier score: Mean squared error of probabilities (lower is better)
  • Log loss: Penalizes overconfident errors (lower is better)
  • ECE/MCE: Average and worst-case calibration gaps
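
A sketch of these metrics for binary labels `y` and predicted probabilities `p`; Brier and log loss come straight from scikit-learn, while the ECE/MCE helper uses equal-width bins (the bin count, and whether bins are equal-width or equal-mass, are common variations, so this may differ from the lab's exact numbers).

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

def ece_mce(y, p, n_bins=10):
    """Equal-width-bin ECE (sample-weighted mean gap) and MCE (worst bin gap)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(p, edges[1:-1])
    gaps, weights = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gaps.append(abs(p[mask].mean() - y[mask].mean()))  # |confidence - accuracy|
            weights.append(mask.mean())                        # fraction of samples in bin
    gaps, weights = np.array(gaps), np.array(weights)
    return float(np.sum(gaps * weights)), float(gaps.max())

# Example: brier = brier_score_loss(y, p); nll = log_loss(y, p); ece, mce = ece_mce(y, p)
```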

About This Tool & Methodology

This lab estimates calibration via reliability diagrams and the Brier score; it supports Platt scaling and isotonic regression (where enabled) and shows pre- and post-calibration curves. All computations run locally.

Learning Outcomes

  • Interpret calibration curves vs discrimination metrics (ROC/AUC).
  • Apply Platt/Isotonic to improve probability estimates where needed.
  • Use Brier score and expected calibration error (ECE) where available.

Authorship & Review

  • Author: 8gwifi.org engineering team
  • Reviewed by: Anish Nath
  • Last updated: 2025-11-19

Trust & Privacy

  • Runs locally with synthetic/provided scores; no uploads stored.