- Extra Trees / Random Forest: tree ensembles used to catch nonlinear interactions between team form, market-like team strength, roster inputs, and sport-specific tabular features.
- Logistic L2: a regularized linear baseline. If this wins, the signal is mostly smooth and additive rather than heavily interaction-driven.
- Hist GB: histogram gradient boosting, useful for nonlinear tabular signals and quick to train.
- MLP: a neural tabular baseline that tests whether dense feature interactions beat the tree and linear baselines.
- Graph Role / GraphRL: graph-style role features and policy-inspired features that represent matchup structure instead of only flat team rows.
How the Betting Lab models were tested, what data they saw, and where each model family performs best.
Model Performance
This page explains the model comparison behind Betting Lab. We tested several model families against historical match outcomes, compared their probability quality, and kept the live lab focused on model-vs-market edge instead of mixing methodology into the betting workflow.
- ROC-AUC: ranking quality. Higher means the model more often gives stronger win probability to the team that actually won.
- Accuracy: simple winner classification rate after converting probability to a pick. Helpful, but less informative than calibration metrics.
- Brier score: mean squared error between the predicted probability and the actual outcome. Lower is better, and confident wrong probabilities are punished most heavily.
- Realism score: a simulation sanity score used internally to check whether the model stack produces believable sport-specific outcomes and distributions.
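As a rough illustration of the first three metrics, here is a minimal plain-Python sketch. The function names and the toy data are hypothetical and are not taken from the Betting Lab codebase. The 0.5 pick threshold is also an assumption. The realism score is internal and is not reproduced here.

```python
def roc_auc(y_true, p_win):
    """Chance that a random winner gets a higher probability than a random
    loser (ties count as half). Pairwise form of ROC-AUC."""
    pos = [p for y, p in zip(y_true, p_win) if y == 1]
    neg = [p for y, p in zip(y_true, p_win) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one win and one loss")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def accuracy(y_true, p_win, threshold=0.5):
    """Share of games where the thresholded pick matches the outcome."""
    picks = [1 if p >= threshold else 0 for p in p_win]
    return sum(int(pick == y) for pick, y in zip(picks, y_true)) / len(y_true)


def brier_score(y_true, p_win):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_win)) / len(y_true)


# Toy data (made up): 1 = the team won, 0 = it lost.
y_true = [1, 0, 1, 1, 0]
p_win = [0.8, 0.3, 0.6, 0.4, 0.7]

auc = roc_auc(y_true, p_win)
acc = accuracy(y_true, p_win)
brier = brier_score(y_true, p_win)
print(f"AUC={auc:.3f} accuracy={acc:.3f} Brier={brier:.3f}")
```

The pairwise AUC is quadratic in the number of games, which is fine for a sketch but slow on large samples. A library implementation would normally be used in practice.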
| Sport | Rows | Window | Source frame |
|---|---|---|---|
| EPL | 2,561 | 2019-2026 | EPL match-level history joined with FPL-style team/player context. |
| NBA | 10,835 | 2018-2026 | NBA tabular feature file with game outcomes, team context, and roster-derived features. |
| NFL | 7,276 | 1999-2026 | NFL tabular features covering a longer historical window because season sample sizes are smaller. |
| NHL | 11,266 | 2018-2026 | NHL tabular features with a wide column set, including team and game-context signals. |
Betting Lab should not blindly pick the highest historical AUC model for every sport. The live page uses model probability, market probability, EV, and Kelly preview together. A model can rank games well but still need calibration checks before stake sizing. That is why this report separates evaluation from the live betting workflow.
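To make the link between model probability, market probability, EV, and Kelly concrete, here is a minimal sketch using the standard textbook definitions for decimal odds. This page does not specify how Betting Lab computes these values, so the function names, the decimal-odds convention, and the use of raw implied probability (with no adjustment for bookmaker margin) are all assumptions made for illustration.

```python
def implied_probability(decimal_odds):
    """Market probability implied by decimal odds (assumption: margin ignored)."""
    return 1.0 / decimal_odds


def expected_value(p_model, decimal_odds):
    """Expected profit per 1 unit staked, using the model's win probability."""
    return p_model * (decimal_odds - 1.0) - (1.0 - p_model)


def kelly_fraction(p_model, decimal_odds):
    """Full-Kelly share of bankroll, floored at 0 when there is no edge."""
    b = decimal_odds - 1.0  # net payout per unit staked
    f = (p_model * b - (1.0 - p_model)) / b
    return max(f, 0.0)


# Hypothetical game: the model says 55%, the market offers decimal odds of 2.00.
p_model = 0.55
odds = 2.00

p_market = implied_probability(odds)
edge = p_model - p_market
ev = expected_value(p_model, odds)
kelly = kelly_fraction(p_model, odds)
print(f"market={p_market:.3f} edge={edge:.3f} EV={ev:.3f} Kelly={kelly:.3f}")
```

Both EV and the Kelly fraction take the model probability at face value, so a poorly calibrated model inflates the suggested stake even when its ranking quality is good. That is the reason calibration checks come before stake sizing.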
Current read: NBA and EPL have the strongest predictive separation. NFL is usable but more sensitive to season context and sample size. NHL is the weakest in AUC, so NHL edges should be treated more conservatively until its calibration improves.