Model Calibration · v3.0 Walk-Forward · 3,160 out-of-sample (OOS) fights

Expected Calibration Error

0.143

Average gap between predicted and actual win rates, weighted by the number of fights in each confidence bucket. Lower = better. Rating: Reasonable.

Brier Score

0.1860

0.25 = coin flip

Log Loss

0.5515

0.693 = coin flip

Accuracy

76.4%

walk-forward

Figure: Reliability by Confidence Bucket, predicted vs. actual win rate across 7 confidence tiers (values in the table below).

Calibration reliability by confidence bucket
Bucket	Fights	Avg. predicted	Actual win rate	Delta (pp)
50-55%	1988	52.5%	67.0%	+14.5
55-60%	59	57.3%	89.8%	+32.6
60-65%	58	62.4%	86.2%	+23.8
65-70%	64	66.1%	89.1%	+23.0
70-75%	42	72.2%	85.7%	+13.5
75-80%	320	76.3%	90.9%	+14.6
80-85%	629	84.4%	94.4%	+10.0

The model is under-confident in every bucket: its picks win more often than the stated probability implies, so the real edge is larger than the displayed confidence.
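As a cross-check, the headline ECE of 0.143 can be recomputed from this table as the fight-weighted mean of |actual - predicted|. The sketch below assumes that standard binned definition; `BUCKETS` and `ece_from_buckets` are illustrative names, not part of the dashboard. Because it works from the rounded percentages, the 55-60% gap comes out as 32.5 rather than 32.6 pp, which does not change the result at three decimals.

```python
# Walk-forward buckets copied from the table above:
# (fights, average predicted %, actual win rate %)
BUCKETS = [
    (1988, 52.5, 67.0),
    (59, 57.3, 89.8),
    (58, 62.4, 86.2),
    (64, 66.1, 89.1),
    (42, 72.2, 85.7),
    (320, 76.3, 90.9),
    (629, 84.4, 94.4),
]


def ece_from_buckets(buckets):
    """Fight-weighted mean absolute gap between actual and predicted win rate, as a fraction."""
    total = sum(n for n, _, _ in buckets)
    weighted_gap = sum(n * abs(actual - pred) for n, pred, actual in buckets)
    return weighted_gap / total / 100.0


if __name__ == "__main__":
    print(f"ECE = {ece_from_buckets(BUCKETS):.3f}")  # prints ECE = 0.143
```

The large 50-55% bucket (1,988 fights at +14.5 pp) contributes roughly two thirds of the total, so it dominates the headline ECE.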


Walk-forward: 3,160 compact per-fight validation rows, each scored using only data available before that fight. True out-of-sample, with zero lookahead.
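A minimal sketch of the walk-forward idea, assuming each validation row carries a fight date: every fight is scored by a model fit only on fights from strictly earlier dates, so results from the same card or later never leak in. `walk_forward`, the `date` and `fighter_a_won` fields, and the toy base-rate model are hypothetical stand-ins; the actual v3.0 features and model are not described here.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical per-fight row; only "date" is required here


def walk_forward(
    rows: List[Row],
    fit: Callable[[List[Row]], Any],
    predict: Callable[[Any, Row], float],
) -> List[Tuple[Row, float]]:
    """Score each fight with a model fit only on fights from strictly earlier dates."""
    by_date: Dict[date, List[Row]] = defaultdict(list)
    for r in rows:
        by_date[r["date"]].append(r)

    history: List[Row] = []
    scored: List[Tuple[Row, float]] = []
    for day in sorted(by_date):
        card = by_date[day]
        if history:  # nothing to train on before the first date
            model = fit(history)  # refit once per date on all earlier fights
            scored.extend((r, predict(model, r)) for r in card)
        history.extend(card)  # results join the training set only after their date
    return scored


def fit_base_rate(history: List[Row]) -> float:
    """Toy model (hypothetical): historical share of fights won by the listed fighter A."""
    return sum(1 for r in history if r["fighter_a_won"]) / len(history)


def predict_base_rate(model: float, row: Row) -> float:
    return model  # ignores the row; a real model would use its features


if __name__ == "__main__":
    fights = [
        {"date": date(2024, 1, 6), "fighter_a_won": True},
        {"date": date(2024, 1, 6), "fighter_a_won": False},
        {"date": date(2024, 1, 13), "fighter_a_won": True},
        {"date": date(2024, 1, 20), "fighter_a_won": True},
    ]
    for row, p in walk_forward(fights, fit_base_rate, predict_base_rate):
        print(row["date"], round(p, 3))
```

Refitting once per event date rather than once per fight keeps the loop cheap while still guaranteeing that no fight sees its own card's results.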

Method-call reliability

Actual Method	Fights	Winner + Method Correct
DEC	1578	48.0%
KO/TKO	1019	60.1%
SUB	563	24.0%

When a fight actually ends by submission, our full read (winner and method) matched it only about 24% of the time, versus about 60% for KO/TKO and about 48% for decisions (n = 3,160 fights). Submissions are our hardest outcome to project, so treat the finish-method line as directional. Method rows are descriptive reliability checks and do not alter the win-probability model.
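The per-method rows can be tallied roughly as below. This sketch assumes each resolved fight records the actual winner and method alongside the picked winner and method; the tuple layout and `method_call_reliability` are hypothetical. A read counts as correct only when both the winner and the method match, which is what "full read" means above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row layout: (actual_method, actual_winner, picked_winner, picked_method)
Fight = Tuple[str, str, str, str]


def method_call_reliability(fights: Iterable[Fight]) -> Dict[str, Tuple[int, float]]:
    """Map each actual method to (sample size, share of full reads that were correct).

    A full read is correct only if both the picked winner and the picked method match.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for actual_method, actual_winner, picked_winner, picked_method in fights:
        totals[actual_method] += 1
        if picked_winner == actual_winner and picked_method == actual_method:
            hits[actual_method] += 1
    return {m: (totals[m], hits[m] / totals[m]) for m in totals}


if __name__ == "__main__":
    sample = [
        ("SUB", "A", "A", "SUB"),        # right winner, right method
        ("SUB", "B", "A", "SUB"),        # wrong winner
        ("SUB", "A", "A", "DEC"),        # right winner, wrong method
        ("KO/TKO", "B", "B", "KO/TKO"),  # right winner, right method
    ]
    print(method_call_reliability(sample))
```

Grouping by the actual method (not the predicted one) answers "when it ended this way, how often did we call it?", matching the table's framing.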

Empirical Hit Rates by Public Zone (125 scored)

Zone	Scored	Record	Expected (avg. predicted)	Hit Rate	95% Wilson CI
LOCK	28	19/28	78.9%	67.9%	49.3%-82.1%
STRONG	3	2/3	65.9%	66.7%	20.8%-93.9%
SOLID	8	6/8	62.0%	75.0%	40.9%-92.9%
LEAN	7	3/7	58.6%	42.9%	15.8%-75.0%
COIN FLIP	79	52/79	52.4%	65.8%	54.8%-75.3%

These rows are read-only audit data from resolved prediction-log entries. Empty zones stay visible so missing sample areas are obvious.
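The interval column is consistent with the standard Wilson score interval at z = 1.96: the sketch below reproduces the LOCK row (19/28 gives 49.3%-82.1%) and the COIN FLIP row (52/79 gives 54.8%-75.3%). `wilson_interval` is an illustrative name, not a dashboard function.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial hit rate (z = 1.96 gives ~95% coverage)."""
    if n == 0:
        raise ValueError("need at least one scored prediction")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    lo, hi = wilson_interval(19, 28)  # LOCK zone record
    print(f"{lo:.1%} - {hi:.1%}")
```

Wilson is a sensible choice over the plain normal approximation here because it stays inside [0, 1] and remains usable at tiny samples such as STRONG (n = 3), where the interval correctly spans most of the range.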

How to Read These Metrics

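Brier Score: mean squared error between the predicted win probability and the actual result (1 = win, 0 = loss), so it reflects both calibration and how decisive the picks are. 0.25 = coin flip; lower = better. Rating: Excellent.

Log Loss: mean negative log-likelihood of the actual results, which penalizes confident misses heavily. 0.693 (ln 2) = coin flip; lower = better. Rating: Excellent.

ECE (Expected Calibration Error): fight-weighted average gap between predicted and actual win rates across confidence buckets. Lower = better calibrated. Rating: Reasonably calibrated.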
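For reference, a minimal sketch of the two scoring rules, assuming one predicted win probability and one 0/1 result per fight (function names are illustrative). Feeding it a constant 50% forecast gives the coin-flip baselines quoted above: 0.25 for Brier and ln 2 (about 0.693) for log loss.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted win probability and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the results; p is clipped so log(0) never occurs."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


if __name__ == "__main__":
    coin = [0.5, 0.5, 0.5, 0.5]
    results = [1, 0, 1, 0]
    print(round(brier_score(coin, results), 3))  # 0.25
    print(round(log_loss(coin, results), 3))     # 0.693
```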

Live Tracking Data (125 predictions scored)

Scored

125

Accuracy

66%

Brier

0.239

Calibration by Confidence Bucket

Predicted Range	Count	Avg Predicted	Actual Win Rate	Delta (pp)	Status
50%-55%	74	52.2%	64.9%	+12.7	Under-confident
55%-60%	12	57.6%	58.3%	+0.7	Well calibrated
60%-65%	8	62.0%	75.0%	+13.0	Under-confident
65%-70%	3	65.9%	66.7%	+0.7	Well calibrated
70%-75%	3	72.1%	66.7%	-5.4	Over-confident
75%-80%	13	75.8%	61.5%	-14.3	Over-confident
80%-85%	4	81.9%	50.0%	-31.9	Over-confident
85%-90%	8	85.0%	87.5%	+2.5	Well calibrated
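The bucket rows above can be produced from raw scored predictions roughly like this. The sketch assumes 5-point buckets and a hypothetical tolerance of 5 pp for "Well calibrated"; the dashboard's actual cut-off is not stated, though 5 pp happens to reproduce every Status label in the live table. `calibration_table` and `TOLERANCE_PP` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

TOLERANCE_PP = 5.0  # hypothetical cut-off; the dashboard's real threshold is not stated


def calibration_table(
    probs: Sequence[float], outcomes: Sequence[int], width_pp: int = 5
) -> List[Tuple[str, int, float, float, float, str]]:
    """Group picks into fixed-width confidence buckets and label each bucket's gap."""
    groups: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # round() guards against float noise in p * 100 at bucket edges
        k = int(round(p * 100, 6) // width_pp)
        groups[k].append((p, y))

    rows = []
    for k in sorted(groups):
        picks = groups[k]
        avg_pred = 100 * sum(p for p, _ in picks) / len(picks)
        actual = 100 * sum(y for _, y in picks) / len(picks)
        delta = actual - avg_pred  # positive = picks win more often than predicted
        if delta > TOLERANCE_PP:
            status = "Under-confident"
        elif delta < -TOLERANCE_PP:
            status = "Over-confident"
        else:
            status = "Well calibrated"
        label = f"{k * width_pp}%-{(k + 1) * width_pp}%"
        rows.append((label, len(picks), avg_pred, actual, delta, status))
    return rows


if __name__ == "__main__":
    probs = [0.52, 0.53, 0.78, 0.79]
    wins = [1, 1, 0, 1]
    for row in calibration_table(probs, wins):
        print(row[0], row[1], f"{row[2]:.1f}%", f"{row[3]:.1f}%", f"{row[4]:+.1f}", row[5])
```

With only 3 to 13 picks in most live buckets, a single result moves the actual win rate by several points, so these labels are far noisier than the walk-forward ones.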

Rolling Accuracy (10-fight window)

Fight	Rolling accuracy (last 10 fights)
118	50%
119	50%
120	40%
121	50%
122	60%
123	70%
124	70%
125	60%
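A rolling 10-fight accuracy like this series can be computed with a fixed-length window. This sketch assumes one boolean hit per scored pick in chronological order and only emits a value once 10 picks are available; how the dashboard handles the first nine fights is not stated. `rolling_accuracy` is an illustrative name.

```python
from collections import deque
from typing import Deque, Iterable, List


def rolling_accuracy(hits: Iterable[bool], window: int = 10) -> List[float]:
    """Share of correct picks over the last `window` scored fights, once the window is full."""
    recent: Deque[bool] = deque(maxlen=window)
    out: List[float] = []
    for hit in hits:
        recent.append(hit)  # deque drops the oldest pick automatically
        if len(recent) == window:
            out.append(sum(recent) / window)
    return out


if __name__ == "__main__":
    picks = [True] * 7 + [False] * 3 + [True, False]
    print(rolling_accuracy(picks))
```

Each new fight shifts a 10-fight window by exactly 10 pp per changed result, which is why the series above moves in 10-point steps.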

Calibration Metrics