Calibration Metrics
How well model probabilities match real-world outcomes, verified bucket by bucket.
Model Calibration · v3.0 Walk-Forward
3,160 out-of-sample (OOS) fights
| Metric | Value | Reference |
|---|---|---|
| Expected Calibration Error (ECE) | 0.143 | Average gap between predicted and actual win rates; lower = better. Rated: Reasonable |
| Brier Score | 0.1860 | 0.25 = coin flip |
| Log Loss | 0.5515 | 0.693 = coin flip |
| Accuracy | 76.4% | Walk-forward |
| Buckets | 7 | Confidence tiers |
Reliability by Confidence Bucket
Chart: predicted vs. actual win rate for each of the 7 confidence buckets (50-85%).
| Bucket | Fights | Avg Predicted | Actual Win Rate | Delta (pp) |
|---|---|---|---|---|
| 50-55% | 1988 | 52.5% | 67.0% | +14.5 |
| 55-60% | 59 | 57.3% | 89.8% | +32.6 |
| 60-65% | 58 | 62.4% | 86.2% | +23.8 |
| 65-70% | 64 | 66.1% | 89.1% | +23.0 |
| 70-75% | 42 | 72.2% | 85.7% | +13.5 |
| 75-80% | 320 | 76.3% | 90.9% | +14.6 |
| 80-85% | 629 | 84.4% | 94.4% | +10.0 |
The model is under-confident: in every bucket its picks win more often than the stated probability, so the real edge is larger than the displayed confidence suggests.
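As a sanity check, the headline ECE and accuracy can be recomputed from the bucket table alone. The sketch below assumes ECE is the fight-weighted mean absolute gap between average predicted probability and actual win rate (the usual binned definition), and it assumes each actual win rate came from a whole number of wins so the counts can be recovered by rounding. The names `BUCKETS`, `ece_from_buckets`, and `accuracy_from_buckets` are illustrative, not part of the dashboard.

```python
# Bucket rows copied from the reliability table:
# (label, fights, avg predicted %, actual win rate %)
BUCKETS = [
    ("50-55%", 1988, 52.5, 67.0),
    ("55-60%", 59, 57.3, 89.8),
    ("60-65%", 58, 62.4, 86.2),
    ("65-70%", 64, 66.1, 89.1),
    ("70-75%", 42, 72.2, 85.7),
    ("75-80%", 320, 76.3, 90.9),
    ("80-85%", 629, 84.4, 94.4),
]


def ece_from_buckets(buckets):
    """Fight-weighted mean of |actual - predicted|, returned as a fraction (0-1)."""
    total = sum(n for _, n, _, _ in buckets)
    weighted_gap = sum(n * abs(actual - pred) for _, n, pred, actual in buckets)
    return weighted_gap / total / 100.0


def accuracy_from_buckets(buckets):
    """Overall hit rate; assumes integer win counts behind the rounded rates."""
    total = sum(n for _, n, _, _ in buckets)
    wins = sum(round(n * actual / 100.0) for _, n, _, actual in buckets)
    return wins / total


print(f"ECE      = {ece_from_buckets(BUCKETS):.3f}")
print(f"Accuracy = {accuracy_from_buckets(BUCKETS):.1%}")
```

Weighting by fights matters here: the 50-55% bucket holds 1,988 of the 3,160 fights, so its +14.5 pp gap dominates, while an unweighted mean of the seven deltas would come out near 18.9 pp instead of the reported 14.3 pp.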
Walk-forward: computed from compact per-fight validation rows (3,160 fights). Every prediction is truly out-of-sample, with zero lookahead.
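One way to picture a walk-forward, zero-lookahead setup is an expanding window: each fight is scored only by a model fit on fights dated strictly before it. The sketch below is a hypothetical illustration (`walk_forward_splits` and its `min_train` parameter are not from the source), and the source does not say how often v3.0 is refit. Because the comparison is on dates, fights on the same card never see each other's results.

```python
from datetime import date
from typing import Iterator, List, Tuple


def walk_forward_splits(fight_dates: List[date],
                        min_train: int = 1) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) pairs with no lookahead.

    Each test fight is paired only with fights dated strictly earlier,
    so same-day fights never leak into each other's training data.
    """
    order = sorted(range(len(fight_dates)), key=lambda i: fight_dates[i])
    for test_idx in order:
        train = [i for i in order if fight_dates[i] < fight_dates[test_idx]]
        if len(train) >= min_train:  # skip fights with no usable history
            yield train, test_idx


# Two fights on the same card, then two later events.
dates = [date(2024, 1, 13), date(2024, 1, 13), date(2024, 1, 20), date(2024, 2, 3)]
for train, test in walk_forward_splits(dates):
    print(test, train)
```

The first two fights are skipped because nothing precedes them; fight 2 trains on fights 0 and 1, and fight 3 on fights 0 to 2.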
Method-Call Reliability
| Actual Method | Fights | Winner + Method Called Correctly |
|---|---|---|
| DEC | 1578 | 48.0% |
| KO/TKO | 1019 | 60.1% |
| SUB | 563 | 24.0% |
When a fight actually ends by submission, our full read (winner + method) matched it only ~24% of the time; for KO/TKO finishes it matched ~60%, and for decisions ~48% (n = 3,160). Submissions are our hardest outcome to project, so treat the finish-method line as directional.
Method rows are descriptive reliability checks and do not alter the win-probability model.
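The method rows group fights by how they actually ended and count a hit only when both the winner and the method were called. A minimal sketch of that tally, assuming each resolved fight is a record with the hypothetical fields `pred_winner`, `actual_winner`, `pred_method`, and `actual_method`:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def method_call_reliability(fights: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Share of fights, grouped by actual method, where winner AND method were both called."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for f in fights:
        method = f["actual_method"]
        totals[method] += 1
        # A right method with the wrong winner still counts as a miss.
        if f["pred_winner"] == f["actual_winner"] and f["pred_method"] == method:
            hits[method] += 1
    return {m: hits[m] / totals[m] for m in totals}


sample = [
    {"pred_winner": "A", "actual_winner": "A", "pred_method": "SUB", "actual_method": "SUB"},
    {"pred_winner": "A", "actual_winner": "A", "pred_method": "DEC", "actual_method": "SUB"},
    {"pred_winner": "B", "actual_winner": "A", "pred_method": "KO/TKO", "actual_method": "KO/TKO"},
    {"pred_winner": "B", "actual_winner": "B", "pred_method": "KO/TKO", "actual_method": "KO/TKO"},
]
print(method_call_reliability(sample))  # {'SUB': 0.5, 'KO/TKO': 0.5}
```

Because the denominator is the actual method, each row answers "when a fight really ended this way, how often did we call it?", which is why the rarer submission outcome can score low even if submission picks are made sparingly.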
Empirical Hit Rates by Public Zone
125 scored
| Zone | Scored | Record | Expected | Hit Rate | 95% Wilson CI |
|---|---|---|---|---|---|
| LOCK | 28 | 19/28 | 78.9% | 67.9% | 49.3%-82.1% |
| STRONG | 3 | 2/3 | 65.9% | 66.7% | 20.8%-93.9% |
| SOLID | 8 | 6/8 | 62.0% | 75.0% | 40.9%-92.9% |
| LEAN | 7 | 3/7 | 58.6% | 42.9% | 15.8%-75.0% |
| COIN FLIP | 79 | 52/79 | 52.4% | 65.8% | 54.8%-75.3% |
These rows are read-only audit data from resolved prediction-log entries. Zones with no scored predictions still appear, so gaps in the sample are obvious.
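The 95% intervals in the zone table match the Wilson score interval with z = 1.96, which behaves better than the plain normal approximation at small samples like STRONG (n = 3). A short sketch that recomputes them; `wilson_interval` is an illustrative name, and returning (0, 1) for an empty zone is an assumed convention, since the formula is undefined at n = 0:

```python
import math
from typing import Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # assumed convention: no data, uninformative interval
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


for zone, wins, n in [("LOCK", 19, 28), ("STRONG", 2, 3), ("SOLID", 6, 8),
                      ("LEAN", 3, 7), ("COIN FLIP", 52, 79)]:
    lo, hi = wilson_interval(wins, n)
    print(f"{zone:<9} {wins}/{n}: {lo:.1%}-{hi:.1%}")
```

Run as written, this reproduces every interval in the table (for example, LOCK 19/28 gives 49.3%-82.1%). The wide STRONG and LEAN ranges are a reminder that a few fights cannot confirm or refute a zone's expected rate.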
How to Read These Metrics
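Brier Score: mean squared error of the predicted win probabilities, reflecting both calibration and discrimination. 0.25 = coin flip; lower = better. Rating: Excellent.
Log Loss: average negative log-likelihood of the actual outcomes (information-theoretic quality). 0.693 = coin flip; lower = better. Rating: Excellent.
ECE (Expected Calibration Error): fight-weighted average gap between predicted and actual win rates across confidence buckets. Lower = better calibrated. Rating: Reasonably calibrated.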
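For readers reproducing these scores from their own logs, a minimal sketch follows, assuming each prediction is stored as the probability that a given fighter wins plus a 0/1 result for that fighter. Log loss uses the natural logarithm, which is the convention that makes the coin-flip baseline ln 2 ≈ 0.693. Function names are illustrative.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted win probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood (natural log); probabilities clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# A model that always says 50% lands exactly on the coin-flip baselines.
coin = [0.5] * 4
results = [1, 0, 1, 0]
print(brier_score(coin, results))         # 0.25
print(round(log_loss(coin, results), 3))  # 0.693
```

Log loss punishes confident misses far harder than Brier does (a 95% pick that loses costs about 3.0 in log loss but only 0.90 in Brier), so the two scores can rank models differently.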
Live Tracking Data (125 predictions scored)
| Metric | Value |
|---|---|
| Scored | 125 |
| Accuracy | 66% |
| Brier | 0.239 |
Calibration by Confidence Bucket
| Predicted Range | Count | Avg Predicted | Actual Win Rate | Delta (pp) | Status |
|---|---|---|---|---|---|
| 50%-55% | 74 | 52.2% | 64.9% | +12.7 | Under-confident |
| 55%-60% | 12 | 57.6% | 58.3% | +0.7 | Well calibrated |
| 60%-65% | 8 | 62.0% | 75.0% | +13.0 | Under-confident |
| 65%-70% | 3 | 65.9% | 66.7% | +0.7 | Well calibrated |
| 70%-75% | 3 | 72.1% | 66.7% | -5.4 | Over-confident |
| 75%-80% | 13 | 75.8% | 61.5% | -14.3 | Over-confident |
| 80%-85% | 4 | 81.9% | 50.0% | -31.9 | Over-confident |
| 85%-90% | 8 | 85.0% | 87.5% | +2.5 | Well calibrated |
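The bucket table can be rebuilt from raw picks by grouping them into 5-point confidence bands and comparing each band's average predicted probability with its actual win rate. In the sketch below, `calibration_rows` is an illustrative name and `TOLERANCE_PP = 5.0` is a hypothetical cut-off: the live table labels +2.5 pp "Well calibrated" and -5.4 pp "Over-confident", so the real threshold lies somewhere between those values but is not stated.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical threshold; the dashboard's actual cut-off is not documented.
TOLERANCE_PP = 5.0


def calibration_rows(probs: Sequence[float], won: Sequence[int],
                     width_pp: int = 5) -> List[Tuple[str, int, float, float, float, str]]:
    """Bucket picks by confidence and label each bucket's calibration."""
    groups: Dict[int, List[Tuple[float, int]]] = {}
    for p, y in zip(probs, won):
        pct = round(p * 100, 9)  # guard against float artefacts such as 59.999...
        lo = min(width_pp * math.floor(pct / width_pp), 100 - width_pp)
        groups.setdefault(lo, []).append((p, y))

    rows = []
    for lo in sorted(groups):
        items = groups[lo]
        avg_pred = 100 * sum(p for p, _ in items) / len(items)
        actual = 100 * sum(y for _, y in items) / len(items)
        delta = actual - avg_pred  # percentage points
        if abs(delta) <= TOLERANCE_PP:
            status = "Well calibrated"
        elif delta > 0:
            status = "Under-confident"
        else:
            status = "Over-confident"
        rows.append((f"{lo}%-{lo + width_pp}%", len(items), avg_pred, actual, delta, status))
    return rows


# Toy log: four 52% picks (3 won) and two 81% picks (1 won).
for row in calibration_rows([0.52, 0.52, 0.52, 0.52, 0.81, 0.81], [1, 1, 1, 0, 1, 0]):
    print(row)
```

With buckets as small as 3 or 4 picks, a single result moves the delta by 25 to 33 pp, so the status labels on the thinner live rows are noisy.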
Rolling Accuracy (10-fight window)
| Fight | Rolling Accuracy |
|---|---|
| 118 | 50% |
| 119 | 50% |
| 120 | 40% |
| 121 | 50% |
| 122 | 60% |
| 123 | 70% |
| 124 | 70% |
| 125 | 60% |
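A 10-fight rolling accuracy is the hit rate over the most recent ten scored picks, updated after each result. A minimal sketch using a fixed-length `deque`; emitting a value only once the window is full is an assumption, since the dashboard shows only fights 118 to 125, where the window is already full:

```python
from collections import deque
from typing import List, Sequence


def rolling_accuracy(hits: Sequence[bool], window: int = 10) -> List[float]:
    """Accuracy over the last `window` scored picks, emitted once the window is full."""
    recent = deque(maxlen=window)  # oldest result drops off automatically
    out: List[float] = []
    for hit in hits:
        recent.append(1 if hit else 0)
        if len(recent) == window:
            out.append(sum(recent) / window)
    return out


# Alternating wins and losses, then two more losses.
print(rolling_accuracy([True, False] * 5 + [False, False]))
```

Each step in a 10-fight window moves the figure by 10 pp, so swings like 40% to 70% within a few fights are expected noise rather than evidence of a shift in model quality.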