Forty8 forecasts the 2026 World Cup with an ensemble of established, published statistical methods,
recomputed from open data after every matchday. The code, the data manifests, and every forecast
ever published are open, and each forecast is timestamped before kickoff.
Data
Match history comes from the community-maintained international results dataset (1872–present, CC0);
the schedule from the public-domain openfootball project. Every input file's SHA256 goes into a manifest,
so any published probability can be traced back to its exact input bytes.
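As an illustration of the manifest idea, here is a minimal sketch that hashes every file under a data directory and later checks the files against the recorded hashes. The function names and the flat path-to-hash layout are assumptions made for this example, not the project's actual manifest format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large inputs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its SHA256 hex digest."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing or whose bytes have changed."""
    return [
        name
        for name, expected in manifest.items()
        if not (data_dir / name).is_file() or sha256_of(data_dir / name) != expected
    ]
```

An empty list from `verify_manifest` means every input still matches the bytes that were hashed when the forecast was produced.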
Model
Ratings: Elo with constants fitted on out-of-sample predictive likelihood
(not the folk defaults), plus Pi-ratings and a rating-uncertainty term for teams the data knows less about.
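For readers unfamiliar with Elo, the standard update can be sketched as follows. The values of `k` and `home_advantage` below are placeholders chosen for illustration; they are exactly the kind of constants the text says are fitted rather than assumed, and the fitted values are not given here. Pi-ratings and the uncertainty term are not shown.

```python
def expected_score(rating_a: float, rating_b: float, home_advantage: float = 0.0) -> float:
    """Standard Elo expectation for team A on the usual 400-point logistic scale."""
    diff = rating_a + home_advantage - rating_b
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def update(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k: float = 20.0,               # placeholder, not a fitted constant
    home_advantage: float = 0.0,   # placeholder, not a fitted constant
) -> tuple[float, float]:
    """Return both ratings after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The update is zero-sum: what A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b, home_advantage))
    return rating_a + delta, rating_b - delta
```

Fitting on out-of-sample likelihood would mean choosing `k` and `home_advantage` to minimise the log loss of predictions made for matches the ratings have not yet seen.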
Goals: a time-decayed attack/defence Poisson model with the Dixon-Coles
low-score correction and shrinkage toward confederation means — this supplies full scoreline
distributions, which group tiebreakers need.
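To show what a full scoreline distribution looks like, the sketch below builds a Poisson score grid with the Dixon-Coles low-score factor applied to the 0-0, 0-1, 1-0 and 1-1 cells. The scoring rates and `rho` are illustrative inputs; time decay and shrinkage toward confederation means, which determine those rates in the model described above, are not shown.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate**k / math.factorial(k)


def dixon_coles_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Low-score adjustment; lam and mu are the two teams' scoring rates."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def scoreline_matrix(lam: float, mu: float, rho: float, max_goals: int = 10) -> list[list[float]]:
    """grid[x][y] = P(team A scores x, team B scores y), truncated and renormalised."""
    grid = [
        [
            dixon_coles_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]
    total = sum(sum(row) for row in grid)
    return [[p / total for p in row] for row in grid]


def outcome_probs(grid: list[list[float]]) -> tuple[float, float, float]:
    """Collapse the grid to (A wins, draw, B wins)."""
    n = len(grid)
    win = sum(grid[x][y] for x in range(n) for y in range(n) if x > y)
    draw = sum(grid[x][x] for x in range(n))
    loss = sum(grid[x][y] for x in range(n) for y in range(n) if x < y)
    return win, draw, loss
```

A negative `rho` raises the 0-0 and 1-1 cells relative to independent Poisson scoring, which is the direction of the original Dixon-Coles correction.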
Market (when odds are supplied): bookmaker outright odds, de-vigged with
Shin's method, averaged on the logit scale, and inverted through tournament simulation into
per-team strengths.
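The first two market steps can be sketched as below, under stated assumptions. `shin_probabilities` solves Shin's model for the insider share `z` by bisection, using the common closed form for the implied probabilities given `z`. `average_on_logit_scale` averages several books per team and then renormalises so the result sums to one; that renormalisation step is an assumption of this sketch. Inverting the pooled outright probabilities into team strengths requires the tournament simulator and is not shown.

```python
import math


def shin_probabilities(decimal_odds: list[float]) -> list[float]:
    """Remove the bookmaker margin from one book's odds using Shin's model."""
    q = [1.0 / o for o in decimal_odds]   # raw implied probabilities
    booksum = sum(q)
    if booksum <= 1.0:                    # no margin to remove
        return [qi / booksum for qi in q]

    def probs(z: float) -> list[float]:
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * qi * qi / booksum) - z) / (2.0 * (1.0 - z))
            for qi in q
        ]

    # The sum of probs(z) falls as z rises; bisect for the z where it equals 1.
    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    total = sum(p)
    return [pi / total for pi in p]       # absorb residual rounding error


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def average_on_logit_scale(books: list[list[float]]) -> list[float]:
    """Average each team's probability across books on the logit scale."""
    n_teams = len(books[0])
    pooled = []
    for i in range(n_teams):
        mean_logit = sum(logit(book[i]) for book in books) / len(books)
        pooled.append(1.0 / (1.0 + math.exp(-mean_logit)))
    total = sum(pooled)                   # assumption: renormalise to sum to 1
    return [p / total for p in pooled]
```

Compared with simply dividing by the book sum, Shin's method removes proportionally more margin from longshots than from favourites.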
Ensemble: the heads are combined in a log-opinion pool with weights fitted on
a held-out window, then temperature-calibrated. Calibration is verified with reliability tables,
not asserted.
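A minimal sketch of the three pieces named above: a log-opinion pool, temperature scaling, and a reliability table. The weights and temperature would come from the held-out fit described in the text; the values a caller passes in here are illustrative, and the equal-width binning is an assumption of this example.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_opinion_pool(heads: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted geometric mean of the heads' probabilities, renormalised."""
    floor = 1e-12  # guard against log(0)
    n_outcomes = len(heads[0])
    pooled_logs = [
        sum(w * math.log(max(head[i], floor)) for head, w in zip(heads, weights))
        for i in range(n_outcomes)
    ]
    return softmax(pooled_logs)


def apply_temperature(probs: list[float], temperature: float) -> list[float]:
    """temperature > 1 flattens the distribution, < 1 sharpens it, 1 leaves it unchanged."""
    return softmax([math.log(max(p, 1e-12)) / temperature for p in probs])


def reliability_table(
    predicted: list[float], occurred: list[int], n_bins: int = 10
) -> list[tuple[int, float, float]]:
    """Per non-empty bin: (count, mean predicted probability, observed frequency)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, hit in zip(predicted, occurred):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, hit))
    table = []
    for members in bins:
        if members:
            count = len(members)
            mean_p = sum(p for p, _ in members) / count
            freq = sum(hit for _, hit in members) / count
            table.append((count, mean_p, freq))
    return table
```

In a well-calibrated forecast the second and third columns of each reliability row are close to each other, up to sampling noise in small bins.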
Simulation: 100,000 tournament runs through the real 48-team format:
group tiebreakers, ranking of third-placed teams, bracket assignment, extra time, shootouts.
Championship odds carry 90% intervals.
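To give a feel for the simulation step, here is a sketch of a single knockout tie: 90 minutes of Poisson scoring, extra time if level, then a shootout. Several details are assumptions of this example and not taken from the text: extra time is scored at one third of the 90-minute rate, the shootout is an exact coin flip, and the interval returned reflects only Monte Carlo noise (a normal approximation). How the published 90% intervals are constructed is not stated here. The full format (groups, third-placed ranking, bracket assignment) is not shown.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for football-sized rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_knockout(
    rate_a: float, rate_b: float, rng: random.Random, shootout_p_a: float = 0.5
) -> bool:
    """Return True if team A advances."""
    goals_a = sample_poisson(rate_a, rng)
    goals_b = sample_poisson(rate_b, rng)
    if goals_a == goals_b:
        # Assumption: 30 minutes of extra time at one third of the 90-minute rate.
        goals_a += sample_poisson(rate_a / 3.0, rng)
        goals_b += sample_poisson(rate_b / 3.0, rng)
    if goals_a == goals_b:
        return rng.random() < shootout_p_a
    return goals_a > goals_b


def advance_probability(
    rate_a: float, rate_b: float, runs: int = 100_000, seed: int = 0
) -> tuple[float, float, float]:
    """Estimate P(A advances) with a 90% Monte Carlo interval (normal approximation)."""
    rng = random.Random(seed)
    wins = sum(simulate_knockout(rate_a, rate_b, rng) for _ in range(runs))
    estimate = wins / runs
    half_width = 1.645 * math.sqrt(estimate * (1.0 - estimate) / runs)
    return estimate, estimate - half_width, estimate + half_width
```

Fixing the seed makes a run reproducible, which matters when forecasts are published and later audited.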
Honest limitations
Expected accuracy is about 55% on three-way results (win, draw, loss). That is the state of the art,
and roughly where bookmakers sit. Anyone claiming much more is leaking future information into the
model or cherry-picking.
Shootouts are modeled as near coin flips; the evidence for a skill effect is weak.
Squad/injury information enters only when squad data is loaded; between updates a late
injury is invisible to the model.
Group tiebreakers beyond points, goal difference and goals scored (head-to-head subtables,
fair play) are approximated by a seeded, reproducible random draw.
Backtests are walk-forward (no leakage), but four tournaments is a small sample —
uncertainty on the model's own quality is real.
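The group tiebreaker approximation listed among the limitations above (points, then goal difference, then goals scored, then a seeded random draw) could look like the sketch below. `GroupRow` and `rank_group` are hypothetical names for this example; the project's own data structures are not described here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupRow:
    team: str
    points: int
    goal_difference: int
    goals_scored: int


def rank_group(rows: list[GroupRow], seed: int) -> list[str]:
    """Order teams by points, goal difference, goals scored, then a seeded draw.

    The random number stands in for the criteria that are not modeled
    (head-to-head subtables, fair play). The same seed and input order
    always give the same ranking.
    """
    rng = random.Random(seed)
    keyed = [
        (row.points, row.goal_difference, row.goals_scored, rng.random(), row.team)
        for row in rows
    ]
    keyed.sort(key=lambda item: item[:4], reverse=True)
    return [item[4] for item in keyed]
```

Deriving the seed from the simulation run number would keep each simulated tournament reproducible while still varying the draw across runs.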
Pre-registration
Every forecast set is committed to a public, append-only history before kickoff and never
edited afterwards. Judge us on the archive, not on highlights.
Use of these numbers
This is not betting advice. Markets already incorporate most of this information.
If you choose to gamble, do it responsibly and within your means.