Model report

How the betting lab models were tested, what data they saw, and where each model family performs best.

4 sports tested · 31k+ games in rich sets · 7 model families
Evaluation report · out-of-sample comparison

Model Performance

This page explains the model comparison behind Betting Lab. We tested several model families against historical match outcomes, compared their probability quality, and kept the live lab focused on model-vs-market edge instead of mixing methodology into the betting workflow.

Model families tested:

  • Extra Trees / Random Forest: tree ensembles used to catch nonlinear interactions between team form, market-like team strength, roster inputs, and sport-specific tabular features.
  • Logistic L2: a regularized linear baseline. If this wins, the signal is mostly smooth and additive rather than heavily interaction-driven.
  • Hist GB: histogram gradient boosting, useful for nonlinear tabular signals with compact training time.
  • MLP: a neural tabular baseline that tests whether dense feature interactions beat the tree and linear baselines.
  • Graph Role / GraphRL: graph-style role features and policy-inspired features that represent matchup structure instead of only flat team rows.

Metrics reported:

  • ROC-AUC: ranking quality. Higher means the model more often assigns a higher win probability to the team that actually won.
  • Accuracy: simple winner classification rate after converting probability to a pick. Helpful, but less informative than the probability-based metrics.
  • Brier score: mean squared error between the predicted probability and the actual 0/1 outcome. Lower is better; confident wrong probabilities are punished most.
  • Realism score: a simulation sanity score used internally to check whether the model stack produces believable sport-specific outcomes and distributions.
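
As a rough illustration of the three public metrics above, here is a minimal pure-Python sketch. It assumes each game is encoded as a binary label (1 if the reference team won, 0 otherwise) with one predicted win probability per game; the function names and toy data are hypothetical and are not taken from the Betting Lab codebase. The realism score is internal and is not reproduced here.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], p_win: Sequence[float], threshold: float = 0.5) -> float:
    """Share of games where the thresholded pick matches the outcome."""
    picks = [1 if p >= threshold else 0 for p in p_win]
    return sum(int(pick == y) for pick, y in zip(picks, y_true)) / len(y_true)


def brier_score(y_true: Sequence[int], p_win: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(p_win, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], p_win: Sequence[float]) -> float:
    """Chance that a random win is scored above a random loss (ties count half)."""
    pos = [p for p, y in zip(p_win, y_true) if y == 1]
    neg = [p for p, y in zip(p_win, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one win and one loss")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


# Toy data (made up for illustration): five games, one probability each.
outcomes = [1, 0, 1, 1, 0]
probs = [0.8, 0.3, 0.6, 0.4, 0.7]

print(f"accuracy = {accuracy(outcomes, probs):.3f}")
print(f"brier    = {brier_score(outcomes, probs):.3f}")
print(f"roc_auc  = {roc_auc(outcomes, probs):.3f}")
```

The pairwise AUC loop is quadratic in the number of games, which is fine for a sanity check but slow for tens of thousands of rows; a rank-based implementation or a library routine would be the usual choice there.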
Sport | Rows   | Window    | Source frame
EPL   | 2,561  | 2019-2026 | EPL match-level history joined with FPL-style team/player context.
NBA   | 10,835 | 2018-2026 | NBA tabular feature file with game outcomes, team context, and roster-derived features.
NFL   | 7,276  | 1999-2026 | NFL tabular features covering a longer historical window because season sample sizes are smaller.
NHL   | 11,266 | 2018-2026 | NHL tabular features with a wide column set, including team and game-context signals.

Betting Lab should not blindly pick the model with the highest historical AUC for every sport. The live page uses model probability, market probability, expected value (EV), and Kelly preview together. A model can rank games well and still need calibration checks before stake sizing, which is why this report separates evaluation from the live betting workflow.
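
To make the relationship between model probability, market probability, EV, and Kelly sizing concrete, the sketch below uses the textbook formulas for a single bet at decimal odds. The report does not state how the live page computes these values, so the function names, the use of decimal odds, and the raw implied probability (which still contains the bookmaker margin) are all assumptions made for illustration.

```python
def implied_probability(decimal_odds: float) -> float:
    """Market probability implied by decimal odds (margin not removed)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_value(p_model: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked, given the model's win probability."""
    net_win = decimal_odds - 1.0  # profit per unit when the bet wins
    return p_model * net_win - (1.0 - p_model)


def kelly_fraction(p_model: float, decimal_odds: float) -> float:
    """Full-Kelly bankroll fraction; 0.0 when the model sees no edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    net_win = decimal_odds - 1.0
    fraction = (p_model * net_win - (1.0 - p_model)) / net_win
    return max(0.0, fraction)


# Hypothetical example: the model says 55%, the market offers even money.
p_model = 0.55
odds = 2.0

edge = p_model - implied_probability(odds)
print(f"market prob = {implied_probability(odds):.3f}")
print(f"edge        = {edge:.3f}")
print(f"EV per unit = {expected_value(p_model, odds):.3f}")
print(f"Kelly stake = {kelly_fraction(p_model, odds):.3f}")
```

Because the Kelly fraction scales directly with the model probability, a miscalibrated model inflates stakes even when its ranking quality is good, which is the practical reason calibration is checked before sizing.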

Current read: NBA and EPL have the strongest predictive separation. NFL is usable but more sensitive to season context and sample size. NHL has the weakest AUC, so NHL edges should be treated more conservatively until calibration improves.