Model Calibration — Public Receipts
When our model says 72%, does it actually hit 72%? Every settled bet gets graded into this table. The numbers below are computed by backend/calibration_monitor.py on a real-time loop — they're not curated, not cherry-picked, not edited. Nothing below is approved or filtered by a human.
How to read this →
ECE = Expected Calibration Error. One-number summary: how far off, on average, our predicted probability sits from the actual hit rate. Lower is better; under 0.03 is good, 0.03–0.06 is fair, and above 0.06 is poor, meaning we shouldn't be sizing bets in that cell.
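For readers who want to check the arithmetic, here is a minimal sketch of the standard binned ECE calculation. It is not the code in backend/calibration_monitor.py, which isn't shown on this page; the function names, the 10 equal-width bins, and the grade() helper are assumptions for illustration. Only the good / fair / poor cutoffs come from the text above.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE: weighted mean of |avg predicted prob - hit rate| per bin.

    probs    -- predicted win probabilities in [0, 1]
    outcomes -- settled results, 1 for a hit and 0 for a miss
    n_bins   -- number of equal-width probability bins (assumed, not from the page)
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        hit_rate = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's gap by the share of bets that landed in it.
        ece += (len(bucket) / total) * abs(mean_prob - hit_rate)
    return ece


def grade(ece):
    """Map an ECE value to the labels used in the legend above."""
    if ece < 0.03:
        return "good"
    if ece <= 0.06:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # 100 bets predicted at 72%, of which 72 hit: calibrated in this bin.
    probs = [0.72] * 100
    outcomes = [1] * 72 + [0] * 28
    print(grade(expected_calibration_error(probs, outcomes)))
```

With very few settled bets in a cell, a binned ECE like this is noisy, so a poor grade on a thin sample is weaker evidence than the same grade on a large one.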
Brier = mean squared error between predicted probability and outcome (0 or 1). Lower is better. Useful for comparing two models on the same data; less interpretable in isolation. 0.25 is what you score by predicting 50% on every bet, i.e. "no better than flipping a coin"; 0.18 is solid.
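The Brier definition above is short enough to sketch directly. The function name brier_score is an assumption for illustration, not necessarily what the monitor calls it.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    # Always predicting 50% scores 0.25 whatever the results are.
    print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
```

Because every 50% prediction contributes (0.5)² = 0.25 whether it hits or misses, 0.25 is the coin-flip baseline the legend refers to.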
Worst-first ordering in the per-market table is intentional: that's where the model is losing money and where we work next. The walk-forward block is the receipt that calibration holds up on never-seen data, not just on the training set.