Methodology — CupProbs World Cup 2026 Forecast


The forecast turns match results and betting-market prices into a probability for every remaining outcome. It is a hybrid rating + market blend fed through a Monte-Carlo simulation of the rest of the tournament. Everything is open source and reproducible from a fixed random seed.

In one line: rate every team from history, the FIFA ranking, player value, and the market → turn each matchup into expected goals and a scoreline distribution → simulate the rest of the 104-match tournament 100,000 times → count how often each thing happens.

1. Team strength

Each team gets a single strength rating on an Elo scale. That rating is an ensemble of three independent signals, each capturing something the others miss:

a. History Elo

A World-Football-Elo rating computed over ~49,000 international matches (1872–present). Every result moves the two teams' ratings by K · (actual − expected), with K weighted by match importance (a World Cup game counts far more than a friendly), the margin of victory, and home advantage. This rewards recent, meaningful results, and the rating updates continuously, one match at a time.
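The update rule can be sketched in a few lines. This is illustrative Python, not the project's R code. The K values, the 400-point logistic scale, and the margin multipliers follow the commonly published World Football Elo conventions and are assumptions here, as is the choice to apply home advantage as a rating offset inside the expected result.

```python
def expected_score(rating_a, rating_b, home_adv=0.0):
    """Expected result for team A (1 = win, 0.5 = draw, 0 = loss)."""
    diff = rating_a + home_adv - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def margin_multiplier(goal_diff):
    """Scale K by margin of victory (assumed convention, not confirmed by the source)."""
    if goal_diff <= 1:
        return 1.0
    if goal_diff == 2:
        return 1.5
    return 1.75 + (goal_diff - 3) / 8.0


def elo_update(rating_a, rating_b, goals_a, goals_b, k=20.0, home_adv=0.0):
    """Return both teams' new ratings after one match: K * (actual - expected)."""
    expected_a = expected_score(rating_a, rating_b, home_adv)
    actual_a = 1.0 if goals_a > goals_b else 0.5 if goals_a == goals_b else 0.0
    delta = k * margin_multiplier(abs(goals_a - goals_b)) * (actual_a - expected_a)
    # The exchange is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated teams on neutral ground, 1-0 in a high-importance match (assumed K = 60).
new_a, new_b = elo_update(1500.0, 1500.0, 1, 0, k=60.0)
```

With equal ratings the expected result is 0.5, so the winner gains K · 0.5 points and the loser gives up the same amount.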

b. FIFA ranking

The latest official FIFA/Coca-Cola world-ranking points, rescaled onto the Elo scale. The published ranking is itself a points model with its own match weighting, so it is a useful second opinion that is built slightly differently from our history Elo.
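One simple way to put ranking points on the Elo scale is a linear map that matches the mean and spread of the history Elo over the same teams. The sketch below assumes that approach; the source does not say which rescaling is used, and the team names and numbers are made up for illustration.

```python
from statistics import mean, pstdev


def rescale_to_elo(points, elo):
    """Linearly map `points` so they share the mean and spread of `elo`."""
    teams = sorted(set(points) & set(elo))
    pts = [points[t] for t in teams]
    ref = [elo[t] for t in teams]
    slope = pstdev(ref) / pstdev(pts)
    return {t: mean(ref) + slope * (points[t] - mean(pts)) for t in teams}


# Hypothetical inputs: three teams with made-up ranking points and Elo ratings.
fifa_points = {"A": 1850.0, "B": 1700.0, "C": 1550.0}
history_elo = {"A": 2050.0, "B": 1800.0, "C": 1700.0}
fifa_on_elo = rescale_to_elo(fifa_points, history_elo)
```

The map keeps the FIFA ordering of teams intact and only changes the units, so the two signals can be averaged directly.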

c. Player value (a Pelé-style rating)

A squad-quality rating built bottom-up from players, not results. We take the Transfermarkt market values of each nation's 23 most valuable squad players and adjust each player toward current ability with:

an age curve (peak around 28) that discounts the forward-looking resale value baked into young players' prices;
a caps weight that downweights uncapped or fringe selections;
a league-strength factor from the club's domestic UEFA coefficient, so value earned in a stronger league counts for more.

The adjusted values are weighted toward the projected starting XI, summed, log-transformed, and rescaled to the Elo scale. This captures emerging talent that results have not caught up with yet.
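The bottom-up construction can be sketched as follows. Every functional form and constant here is a hypothetical stand-in (the Gaussian age curve, the caps weight, the 1.0 / 0.3 XI and bench weights, the 150 points per log unit, and the dictionary keys); the source gives only the ingredients, not the formulas. The projected XI is approximated by the 11 highest adjusted values.

```python
import math


def age_factor(age, peak=28.0, width=8.0):
    """Hypothetical age curve: 1.0 at the peak age, lower on either side."""
    return math.exp(-((age - peak) / width) ** 2)


def caps_weight(caps, softness=5.0):
    """Hypothetical caps weight: near 0 for uncapped players, approaching 1 for regulars."""
    return (caps + 1.0) / (caps + 1.0 + softness)


def squad_log_value(players, squad_size=23, xi_weight=1.0, bench_weight=0.3):
    """Log of the XI-weighted sum of adjusted values for the most valuable players."""
    top = sorted(players, key=lambda p: p["value_eur"], reverse=True)[:squad_size]
    adjusted = sorted(
        (p["value_eur"] * age_factor(p["age"]) * caps_weight(p["caps"]) * p["league_factor"]
         for p in top),
        reverse=True,
    )
    weighted = sum(v * (xi_weight if i < 11 else bench_weight)
                   for i, v in enumerate(adjusted))
    return math.log(weighted)


def to_elo_scale(log_value, ref_log_value, ref_elo=1500.0, points_per_log_unit=150.0):
    """Hypothetical linear map from log squad value to Elo points."""
    return ref_elo + points_per_log_unit * (log_value - ref_log_value)


# Two made-up squads that differ only in market value (a factor of ten).
strong = [{"value_eur": 40e6, "age": 27, "caps": 40, "league_factor": 1.0}] * 23
weak = [{"value_eur": 4e6, "age": 27, "caps": 40, "league_factor": 1.0}] * 23
gap = squad_log_value(strong) - squad_log_value(weak)
```

Because of the log transform, a tenfold difference in squad value becomes a fixed additive gap (ln 10) rather than a tenfold rating gap, which keeps very expensive squads from dominating the scale.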

Combining them

The three signals are blended with weights calibrated on held-out recent internationals (we fit the weights on older data and score them on data they never saw). The result favours the FIFA ranking and player value over raw history Elo (roughly 0.25 / 0.35 / 0.40 for history / FIFA / player value), with a small shrinkage toward the mean. Hosts (USA, Canada, Mexico) get a home-advantage bump when playing at home.
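As a worked illustration of the blend, the sketch below applies the approximate weights quoted above. The shrinkage fraction, the field mean, and the host bonus parameter are hypothetical placeholders, since the source does not give their values.

```python
# Approximate weights from the text; all three signals are already on the Elo scale.
WEIGHTS = {"history": 0.25, "fifa": 0.35, "player_value": 0.40}


def blend_rating(signals, field_mean, shrink=0.05, host_bonus=0.0):
    """Weighted average of the signals, pulled slightly toward the field mean."""
    blended = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    shrunk = field_mean + (1.0 - shrink) * (blended - field_mean)
    return shrunk + host_bonus


# Made-up team: history 1800, FIFA 1700, player value 1900, field mean 1500.
rating = blend_rating(
    {"history": 1800.0, "fifa": 1700.0, "player_value": 1900.0},
    field_mean=1500.0,
)
```

Here the weighted average is 450 + 595 + 760 = 1805, and a 5% shrink toward 1500 gives 1500 + 0.95 × 305 = 1789.75.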

2. The market blend

Bookmaker prices are sharp — they aggregate a lot of information quickly. For every match with odds we:

De-vig the offered prices to strip out the bookmaker's margin (the "overround"), using the Shin method rather than naive proportional scaling, to recover fair probabilities;
Blend those market probabilities into the model's team strengths so the market informs the whole forecast, not just one game.
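The de-vig step can be sketched with the standard closed form of Shin's method, solving for the insider-trading parameter z by bisection so that the fair probabilities sum to one. This is an illustrative Python version, not the project's R code, and the example odds are made up.

```python
import math


def shin_probabilities(decimal_odds):
    """Return (fair_probabilities, z) from decimal odds using Shin's method."""
    inv = [1.0 / o for o in decimal_odds]   # raw implied probabilities
    booksum = sum(inv)                      # exceeds 1 by the bookmaker's margin

    def probs(z):
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * q * q / booksum) - z) / (2.0 * (1.0 - z))
            for q in inv
        ]

    if booksum <= 1.0:
        # No margin to remove; fall back to plain normalisation.
        return [q / booksum for q in inv], 0.0

    lo, hi = 0.0, 0.999
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if sum(probs(mid)) > 1.0:
            lo = mid        # probabilities still too large: increase z
        else:
            hi = mid
    z = (lo + hi) / 2.0
    return probs(z), z


# Made-up home / draw / away prices.
odds = [1.5, 4.0, 7.0]
fair, z = shin_probabilities(odds)
proportional = [(1.0 / o) / sum(1.0 / x for x in odds) for o in odds]
```

Compared with proportional scaling, Shin's method removes more of the margin from longshots, so the favourite ends up with a slightly higher fair probability.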

How much to trust the market versus the model was calibrated on club football (the English Premier League and Championship, where odds and results are abundant), then transferred to internationals with a discount, for two reasons: our ensemble is stronger than a plain results model, and World Cup odds are softer than top-league club odds. Early in the tournament the market carries more weight, because the model has little tournament-specific signal; as games are played, the model's own form and matchup detail earn more weight.

3. From ratings to a match

Given two teams' strengths plus context, the model produces expected goals for each side. The mapping is multiplicative, so expected goals never go negative even for huge mismatches: a rating edge tilts one side's expected goals up and the other's down symmetrically. Scorelines are drawn from a Dixon–Coles model, which corrects the low-score dependence (0–0, 1–0, 0–1, 1–1) that independent Poisson goals get wrong — and that correction matters a lot for draw rates and therefore group tables. A host edge applies only to the three host nations at home; all other games are treated as neutral.
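A minimal sketch of this step: an exponential (multiplicative) link from the rating gap to expected goals, followed by a Dixon–Coles scoreline grid. The baseline of 1.35 goals, the slope, and ρ = −0.1 are hypothetical values, not the fitted ones, and the host edge is omitted.

```python
import math


def expected_goals(rating_a, rating_b, base=1.35, scale=0.0025):
    """Multiplicative link: always positive, tilted symmetrically by the rating gap."""
    diff = rating_a - rating_b
    return base * math.exp(scale * diff), base * math.exp(-scale * diff)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dixon_coles_grid(lam_a, lam_b, rho=-0.1, max_goals=10):
    """grid[x][y] = P(A scores x, B scores y) with the low-score correction."""
    def tau(x, y):
        if x == 0 and y == 0:
            t = 1.0 - lam_a * lam_b * rho
        elif x == 0 and y == 1:
            t = 1.0 + lam_a * rho
        elif x == 1 and y == 0:
            t = 1.0 + lam_b * rho
        elif x == 1 and y == 1:
            t = 1.0 - rho
        else:
            t = 1.0
        return max(t, 0.0)  # guard against extreme rates pushing a cell negative

    grid = [[tau(x, y) * poisson_pmf(x, lam_a) * poisson_pmf(y, lam_b)
             for y in range(max_goals + 1)] for x in range(max_goals + 1)]
    total = sum(map(sum, grid))  # renormalise after truncating at max_goals
    return [[p / total for p in row] for row in grid]


def outcome_probs(grid):
    """Collapse the grid to (win, draw, loss) for team A."""
    n = len(grid)
    win = sum(grid[x][y] for x in range(n) for y in range(n) if x > y)
    draw = sum(grid[x][x] for x in range(n))
    return win, draw, 1.0 - win - draw


lam_a, lam_b = expected_goals(1900.0, 1700.0)
win, draw, loss = outcome_probs(dixon_coles_grid(lam_a, lam_b))
```

With a negative ρ the correction moves probability onto 0–0 and 1–1 and away from 1–0 and 0–1, which raises the draw rate relative to independent Poisson goals.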

For matches in progress, only the remaining goals are random: each side's rate is scaled by the fraction of the match left and added on top of the current score, so live win/draw/loss probabilities update with the scoreboard and the clock.
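The in-play logic can be illustrated like this. For brevity the sketch uses independent Poisson goals for the remainder of the match rather than the Dixon–Coles correction, and it ignores stoppage time; both simplifications are assumptions of the example.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def live_outcome_probs(lam_a, lam_b, score_a, score_b, minutes_played,
                       match_length=90, max_goals=10):
    """Win/draw/loss for team A given the current score and the clock."""
    frac_left = max(0.0, (match_length - minutes_played) / match_length)
    rate_a, rate_b = lam_a * frac_left, lam_b * frac_left
    win = draw = loss = 0.0
    for x in range(max_goals + 1):          # further goals for A
        for y in range(max_goals + 1):      # further goals for B
            p = poisson_pmf(x, rate_a) * poisson_pmf(y, rate_b)
            final_a, final_b = score_a + x, score_b + y
            if final_a > final_b:
                win += p
            elif final_a == final_b:
                draw += p
            else:
                loss += p
    total = win + draw + loss               # renormalise the truncated grid
    return win / total, draw / total, loss / total


# Team A leads 1-0; compare early and late in the match (made-up full-match rates).
early = live_outcome_probs(1.4, 1.1, 1, 0, minutes_played=10)
late = live_outcome_probs(1.4, 1.1, 1, 0, minutes_played=80)
```

The same lead is worth more late in the match because the trailing side has less time, and therefore a lower remaining goal rate, to equalise.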

4. Simulating the tournament

We run the rest of the tournament 100,000 times. In each simulation:

completed results are fixed; every remaining group game is sampled from the match model;
each group is ranked by the full FIFA Article 13 tiebreakers — points, then head-to-head (points → goal difference → goals among the tied teams, applied recursively), then overall goal difference and goals, then fair-play conduct (simulated cards), then FIFA ranking, then drawing of lots;
the top two of every group plus the eight best third-placed teams advance; the third-placed teams are slotted into the bracket using FIFA's exact published 495-row allocation table;
the knockout rounds are played out — extra time is a shorter continuation and penalties are a near-coin-flip tilted slightly toward the stronger side — to a champion.
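The knockout step in the last bullet can be sketched as three stages. The 30/90 scaling of goal rates in extra time, the tanh form of the penalty tilt, and the tilt size of 0.05 are hypothetical choices for illustration; the source only says that extra time is a shorter continuation and that penalties are close to a coin flip.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def play_knockout(rng, lam_a, lam_b, rating_diff, pen_tilt=0.05):
    """Return 'A' or 'B': regulation, then extra time, then penalties."""
    goals_a, goals_b = sample_poisson(rng, lam_a), sample_poisson(rng, lam_b)
    if goals_a != goals_b:
        return "A" if goals_a > goals_b else "B"
    # Extra time: 30 more minutes at the same per-minute scoring rates.
    extra_a = sample_poisson(rng, lam_a * 30 / 90)
    extra_b = sample_poisson(rng, lam_b * 30 / 90)
    if extra_a != extra_b:
        return "A" if extra_a > extra_b else "B"
    # Penalties: a coin flip nudged toward the higher-rated side.
    p_a = 0.5 + pen_tilt * math.tanh(rating_diff / 400.0)
    return "A" if rng.random() < p_a else "B"


rng = random.Random(2026)  # fixed seed, so the sequence of results is reproducible
results = [play_knockout(rng, 2.0, 0.8, rating_diff=300.0) for _ in range(5000)]
share_a = results.count("A") / len(results)
```

A knockout tie always produces a winner, so unlike a group game the two sides' advancement probabilities sum to one.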

Aggregating across all 100,000 runs gives every number on the site: chances of finishing 1st/2nd/3rd in a group, of advancing, of reaching each knockout round, of winning the title, plus per-match win/draw/loss probabilities. The simulation runs in parallel across CPU cores and uses a fixed seed, so runs are reproducible and comparable.
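The aggregation itself is just counting. The toy example below replaces the real tournament with a four-team single-elimination bracket and a made-up strength-share win rule, to show how a fixed seed makes the resulting frequencies reproducible; it is not the project's simulation and runs on a single core.

```python
import random
from collections import Counter


def simulate_once(rng, teams, strength):
    """Toy bracket: (0 v 1) and (2 v 3), then a final. Returns the champion."""
    def winner(a, b):
        return a if rng.random() < strength[a] / (strength[a] + strength[b]) else b

    return winner(winner(teams[0], teams[1]), winner(teams[2], teams[3]))


def title_odds(teams, strength, n_sims=100_000, seed=2026):
    """Share of simulations won by each team, using a seeded generator."""
    rng = random.Random(seed)
    counts = Counter(simulate_once(rng, teams, strength) for _ in range(n_sims))
    return {t: counts[t] / n_sims for t in teams}


teams = ["A", "B", "C", "D"]
strength = {"A": 4.0, "B": 2.0, "C": 2.0, "D": 1.0}  # made-up strengths
odds = title_odds(teams, strength)
```

Running `title_odds` twice with the same seed returns identical numbers, which is what makes two forecast runs directly comparable.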

Reading the bracket: because group finishes aren't settled yet, a team can reach a given match through several different bracket paths, sometimes from more than one side of the draw. We show each team's chance of appearing in a match at all (summed across those paths) and, next to it, its chance of winning that match if it gets there. Multiplying the two gives the chance of both reaching and winning the match, which is why a team's title odds can be well below its odds of reaching the final: it still has to win the final once there.
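The two bracket figures can be computed from simulation records as below. The record format, one `(participants, winner)` pair per simulation for a given match, is a hypothetical simplification, and the four records are made up.

```python
from collections import Counter


def match_table(runs):
    """Map team -> (P(appears in the match), P(wins | appears))."""
    n = len(runs)
    appear, wins = Counter(), Counter()
    for participants, winner in runs:
        appear.update(participants)   # counts every path that leads into this match
        wins[winner] += 1
    return {t: (appear[t] / n, wins[t] / appear[t]) for t in appear}


# Four made-up simulations of the final.
runs = [
    (("A", "B"), "A"),
    (("A", "C"), "C"),
    (("B", "C"), "B"),
    (("A", "B"), "A"),
]
table = match_table(runs)
p_reach, p_win_if_there = table["A"]
p_title = p_reach * p_win_if_there
```

Team A reaches the final in three of four runs and wins two of those three, so its title probability is 0.75 × 2/3 = 0.5.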

5. Backtesting & validation

Before trusting the output we hindcast it out of sample. Rolling the ratings forward through history and scoring predictions on matches the model hadn't seen, the ensemble beats both a FIFA-ranking baseline and a base-rate baseline on log-loss (≈0.889 vs 0.922 vs 1.052 over thousands of post-2016 internationals). The goals-spread parameter was tuned on this backtest, and the blend and market weights were each fit on held-out data rather than eyeballed. We also check calibration — that, say, things we call 30%-likely happen about 30% of the time — and the internal invariants (probabilities sum to one, exactly 32 teams advance, round probabilities nest correctly).
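For readers unfamiliar with the metrics, here is a small sketch of three-outcome log-loss, a base-rate baseline, and a binned calibration check. The function names and data layout are illustrative assumptions, not the project's code, and no real backtest data is used.

```python
import math
from collections import Counter


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-probability assigned to what actually happened (lower is better)."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, outcomes)) / len(outcomes)


def base_rate_forecast(train_outcomes, classes=("W", "D", "L")):
    """Baseline that always predicts the historical outcome frequencies."""
    counts = Counter(train_outcomes)
    return {c: counts[c] / len(train_outcomes) for c in classes}


def calibration_table(pred_probs, happened, n_bins=10):
    """Per probability bin: (mean predicted, observed frequency, count)."""
    bins = {}
    for p, h in zip(pred_probs, happened):
        bins.setdefault(min(int(p * n_bins), n_bins - 1), []).append((p, h))
    return {
        b: (sum(p for p, _ in v) / len(v), sum(h for _, h in v) / len(v), len(v))
        for b, v in sorted(bins.items())
    }


# A uniform one-third guess scores ln 3 on any set of results.
uniform = {"W": 1 / 3, "D": 1 / 3, "L": 1 / 3}
outcomes = ["W", "D", "L", "W"]
uniform_loss = log_loss([uniform] * len(outcomes), outcomes)
```

For scale, the uniform guess gives ln 3 ≈ 1.0986, so any useful three-way forecast should score below that; a well-calibrated model shows observed frequencies close to the mean predicted probability in every bin.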

6. Limitations

It's a model, not a crystal ball: every number is a probability, and underdogs do win.
Squad values are a snapshot and don't see late injuries or form swings between updates.
Pre-knockout, a team's exact route isn't fixed, so per-match figures sum over multiple possible bracket paths; this resolves once the groups finish.
Free-tier data feeds mean odds and live updates can lag during very busy windows.

Data & reproducibility

Ratings are built from public international-results and ranking data plus Transfermarkt squad values; live results and odds come from free feeds. The full pipeline (R) and site (vanilla JS) are open source, and every figure is reproducible from the committed data and the fixed simulation seed.
