Model Calibration · v3.0 Walk-Forward · 3,160 OOS fights
Expected Calibration Error: 0.143 (average gap between predicted and actual win rates; lower = better; rated Reasonable)
Brier Score: 0.1860 (0.25 = coin flip)
Log Loss: 0.5515 (0.693 = coin flip)
Accuracy: 76.4% (walk-forward)
Buckets: 7 (confidence tiers)
Reliability by Confidence Bucket (chart: predicted vs. actual win rate for each bucket; the values appear in the table that follows)
Calibration reliability by confidence bucket

| Bucket | Fights | Predicted average | Actual win rate | Delta (pp) |
|--------|--------|-------------------|-----------------|------------|
| 50-55% | 1988 | 52.5% | 67.0% | +14.5 |
| 55-60% | 59 | 57.3% | 89.8% | +32.6 |
| 60-65% | 58 | 62.4% | 86.2% | +23.8 |
| 65-70% | 64 | 66.1% | 89.1% | +23.0 |
| 70-75% | 42 | 72.2% | 85.7% | +13.5 |
| 75-80% | 320 | 76.3% | 90.9% | +14.6 |
| 80-85% | 629 | 84.4% | 94.4% | +10.0 |
The model is under-confident in every walk-forward bucket: its picks won between 10.0 and 32.6 percentage points more often than predicted, so the real edge is larger than the displayed confidence.
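As a cross-check, the headline ECE can be reproduced from the walk-forward bucket figures. The sketch below assumes ECE is the fight-weighted mean of the absolute gap between predicted and actual win rate per bucket, which is the common definition. The dashboard does not state its exact formula, and the names `buckets` and `expected_calibration_error` are illustrative.

```python
# (fights, predicted average, actual win rate) per bucket, copied from the
# walk-forward reliability figures on this page.
buckets = [
    (1988, 0.525, 0.670),
    (59, 0.573, 0.898),
    (58, 0.624, 0.862),
    (64, 0.661, 0.891),
    (42, 0.722, 0.857),
    (320, 0.763, 0.909),
    (629, 0.844, 0.944),
]


def expected_calibration_error(rows):
    """Fight-weighted mean absolute gap between predicted and actual win rate."""
    total = sum(n for n, _, _ in rows)
    weighted_gap = sum(n * abs(actual - predicted) for n, predicted, actual in rows)
    return weighted_gap / total


ece = expected_calibration_error(buckets)
print(f"ECE = {ece:.3f}")
```

Under that assumption the result rounds to 0.143, in line with the headline figure. The 50-55% bucket holds 1,988 of the 3,160 fights, so its +14.5 pp gap dominates the total.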
Walk-forward results come from compact per-fight validation rows (3,160 fights). Every prediction is truly out-of-sample, with zero lookahead.
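For readers unfamiliar with the term, here is a minimal sketch of a walk-forward split. It assumes an expanding window in which each fight is predicted from all earlier fights. The page does not describe the actual retraining cadence or minimum history, so `walk_forward_splits` and `min_train` are hypothetical.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_fights: int, min_train: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) for fights sorted oldest first.

    Each test fight is paired only with fights that happened before it,
    which is what rules out lookahead.
    """
    for test_index in range(min_train, n_fights):
        yield list(range(test_index)), test_index


splits = list(walk_forward_splits(n_fights=6, min_train=3))
for train, test in splits:
    # Every training index precedes the fight being predicted.
    assert max(train) < test
    print(f"train on fights {train}, predict fight {test}")
```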
Method-call reliability

| Actual Method | Sample | Method Calls Correct |
|---------------|--------|----------------------|
| DEC | 1578 | 48.0% |
| KO/TKO | 1019 | 60.1% |
| SUB | 563 | 24.0% |
When a fight actually ends by submission, our full read (winner + method) matched it only ~24% of the time; for KO/TKO finishes the figure is ~60%, and for decisions ~48% (n=3,160). Submissions are our hardest outcome to project, so treat the finish-method line as directional. Method rows are descriptive reliability checks and do not alter the win-probability model.
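Empirical Hit Rates by Public Zone (125 scored)

| Zone | Scored | Record | Expected | Hit Rate | 95% Wilson CI |
|------|--------|--------|----------|----------|---------------|
| LOCK | 28 | 19/28 | 78.9% | 67.9% | 49.3%-82.1% |
| STRONG | 3 | 2/3 | 65.9% | 66.7% | 20.8%-93.9% |
| SOLID | 8 | 6/8 | 62.0% | 75.0% | 40.9%-92.9% |
| LEAN | 7 | 3/7 | 58.6% | 42.9% | 15.8%-75.0% |
| COIN FLIP | 79 | 52/79 | 52.4% | 65.8% | 54.8%-75.3% |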
These rows are read-only audit data from resolved prediction-log entries. Empty zones stay visible so missing sample areas are obvious.
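The 95% Wilson CI column can be reproduced with the standard Wilson score interval. The sketch below uses z = 1.96. The function name `wilson_interval` and the choice to return (0, 1) for an empty zone are assumptions, not something the page specifies.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n == 0:
        # Assumption: an empty zone carries no information about its hit rate.
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half_width, center + half_width)


lo, hi = wilson_interval(19, 28)  # LOCK zone record: 19/28
print(f"{lo:.1%} - {hi:.1%}")
```

For the LOCK record this gives roughly 49.3% to 82.1%, matching the reported interval. Unlike the simple normal approximation, the Wilson interval stays inside 0-100% even for very small samples such as STRONG (2/3).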
How to Read These Metrics
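Brier Score: mean squared error between the predicted win probability and the actual outcome (1 = win, 0 = loss). 0.25 = coin flip, lower = better. Rating: Excellent
Log Loss: average negative log of the probability assigned to the actual outcome. 0.693 (ln 2) = coin flip, lower = better. Rating: Excellent
ECE (Expected Calibration Error): average gap between predicted and actual win rates across confidence buckets. Lower = better calibrated. Rating: Reasonably calibrated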
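The coin-flip baselines quoted above follow directly from the metric definitions. This sketch uses the standard formulas for Brier score and binary log loss. The outcome list is made up purely for illustration, and the function names are not taken from the model's code.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed 0/1 outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


outcomes = [1, 0, 1, 1]             # illustrative results, not real fights
coin_flip = [0.5] * len(outcomes)   # always predict 50%

print(f"Brier at 50%:    {brier_score(coin_flip, outcomes):.3f}")
print(f"Log loss at 50%: {log_loss(coin_flip, outcomes):.3f}")
```

A forecaster that always says 50% scores 0.25 and ln 2 (about 0.693) regardless of the outcomes, which is why those values serve as the reference points.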
Live Tracking Data (125 predictions scored)
Scored: 125
Accuracy: 66%
Brier: 0.239
Calibration by Confidence Bucket
| Predicted Range | Count | Avg Predicted | Actual Win Rate | Delta (pp) | Status |
|-----------------|-------|---------------|-----------------|------------|--------|
| 50%-55% | 74 | 52.2% | 64.9% | +12.7 | Under-confident |
| 55%-60% | 12 | 57.6% | 58.3% | +0.7 | Well calibrated |
| 60%-65% | 8 | 62.0% | 75.0% | +13.0 | Under-confident |
| 65%-70% | 3 | 65.9% | 66.7% | +0.7 | Well calibrated |
| 70%-75% | 3 | 72.1% | 66.7% | -5.4 | Over-confident |
| 75%-80% | 13 | 75.8% | 61.5% | -14.3 | Over-confident |
| 80%-85% | 4 | 81.9% | 50.0% | -31.9 | Over-confident |
| 85%-90% | 8 | 85.0% | 87.5% | +2.5 | Well calibrated |
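The Status labels appear to follow a simple tolerance rule on the delta. The sketch below assumes a cutoff of 5 percentage points. The page does not state the real threshold, and any value between 2.5 and 5.4 would reproduce the labels shown, so both `calibration_status` and `tolerance_pp` are hypothetical.

```python
def calibration_status(delta_pp: float, tolerance_pp: float = 5.0) -> str:
    """Label a bucket from its delta (actual minus predicted, in pp).

    tolerance_pp is an assumed cutoff, not a documented setting.
    """
    if delta_pp > tolerance_pp:
        return "Under-confident"   # picks won more often than predicted
    if delta_pp < -tolerance_pp:
        return "Over-confident"    # picks won less often than predicted
    return "Well calibrated"


live_deltas = [12.7, 0.7, 13.0, 0.7, -5.4, -14.3, -31.9, 2.5]
labels = [calibration_status(d) for d in live_deltas]
for delta, label in zip(live_deltas, labels):
    print(f"{delta:+.1f} pp -> {label}")
```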
Rolling Accuracy (10-fight window)

| Fight | Rolling accuracy |
|-------|------------------|
| 118 | 50% |
| 119 | 50% |
| 120 | 40% |
| 121 | 50% |
| 122 | 60% |
| 123 | 70% |
| 124 | 70% |
| 125 | 60% |
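A rolling accuracy series like this one can be computed by keeping the most recent results in a fixed-size window. This is a minimal sketch, assuming each scored pick is stored as 1 (correct) or 0 (miss) in fight order and that no value is reported until the window is full. The name `rolling_accuracy` and the demo data are illustrative.

```python
from collections import deque


def rolling_accuracy(results, window=10):
    """Return (fight_number, accuracy) pairs over the last `window` picks.

    results: iterable of 1 (correct pick) or 0 (miss), oldest first.
    """
    recent = deque(maxlen=window)  # old results drop off automatically
    series = []
    for fight_number, hit in enumerate(results, start=1):
        recent.append(hit)
        if len(recent) == window:
            series.append((fight_number, sum(recent) / window))
    return series


# Tiny demo with a 2-fight window so the arithmetic is easy to follow.
demo = rolling_accuracy([1, 0, 1, 1], window=2)
print(demo)
```

With a 10-fight window each result moves the series by 10 points, which is why the values above change in steps of 10%.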