The World Cup has only been played for a few days, yet some AI prediction models have become legendary while others have flopped

Asher

Senior Odaily Author

@Asher_0210

2026-06-15 08:50


On the first day of predictions, Qwen stole the show, while Grok and Claude looked more like NPCs.

AI Summary

Core Insight: Multiple major AI models (Qwen, Copilot, ChatGPT, etc.) have been used to predict World Cup match results, with their accuracy varying significantly. Qwen stood out on the first day by accurately predicting the score and the red card risk; Copilot and ChatGPT showed some highlights but were generally conservative in identifying upsets and draws. This suggests that AI can serve as a reference, but it is still far from being a definitive answer.
Key Points:
1. Qwen accurately predicted the opening match score of Mexico 2-0 South Africa and the associated red card risk, and correctly forecast South Korea's 2-1 comeback win against the Czech Republic. This created a sense of "scripted drama," drawing significant attention to AI predictions.
2. Copilot's full tournament predictions had some highlights (e.g., Brazil 1-1 Morocco) but clearly failed in predicting upsets like Australia defeating Turkey and Japan holding the Netherlands to a draw, revealing its conservative judgment regarding underdogs.
3. ChatGPT's analytical logic was thorough, with well-reasoned predictions for the opening match, but it also failed to accurately identify matches where results deviated from on-paper strength, such as Qatar drawing with Switzerland and the Netherlands being held by Japan.
4. Models like Gemini, Grok, and Claude showed mixed results in individual match predictions. For example, Gemini hit the opening match score, while Grok and Claude failed to predict the exact scores. The sample size is too small to form a clear ranking.

Original Article: Odaily Planet Daily (@OdailyChina)

Author: Asher (@Asher_0210)

The most exciting place during this World Cup isn't just on the pitch.

As interest in World Cup-related prediction events heats up, more and more users are engaging in trades with real money. Who will win, what will be the score, will there be an upset, will there be a red card, which player will score — these topics, once just pre-match chatter among fans, are now being broken down into tradable prediction events.

When predictions become trades, users need more than just emotion and intuition. Odds changes, team form, injury information, historical head-to-heads, and market sentiment all become references before trading. In this process, AI models are increasingly being brought into World Cup prediction scenarios.

Major models like Qwen, ChatGPT, Gemini, Claude, DeepSeek, and Copilot can not only answer "which team is more likely to win" but also provide score predictions, upset probabilities, red card risks, key player performances, and match trend analyses. For prediction market participants, AI's pre-game reasoning is becoming another layer of reference alongside odds, news, team data, and market sentiment.

However, predictions must ultimately return to the game itself.

With the World Cup officially underway, the results of the first few matches are already in. The AI analyses that users turned to for pre-game guidance now have real-world answers to compare against: Were the scores predicted correctly? Were upsets anticipated? How many details like red cards, last-minute winners, and match momentum were actually captured by the models?

The First to Go Viral: Qwen

The most entertaining outcome on the first day of the World Cup undoubtedly involved Qwen.

For the opening match between Mexico and South Africa, Qwen's pre-game prediction was Mexico 2-0 South Africa. After the match, the score indeed ended 2-0. Even more noteworthy, the three red cards shown during the match closely aligned with Qwen's pre-game risk assessment that "South Africa's overly aggressive defending could lead to an early numerical disadvantage."

Predicting a Mexico win wasn't particularly surprising; as one of the host nations, Mexico was the favorite. However, Qwen nailed the more specific match details: the 2-0 scoreline, South Africa's red card risk, and the gradual shift in momentum during the middle and later stages of the game.

Following that, for the South Korea vs. Czech Republic match, Qwen predicted a 2-1 win for South Korea.

This match was not an easy one to predict beforehand. The Czech Republic had physical strength, set-piece threats, and the experience typical of European teams. The match itself wasn't one-sided; the Czechs took the lead first, South Korea equalized, and the game remained deadlocked at 1-1 for a long period. It wasn't until the final stages that South Korea scored the winning goal, making the final score 2-1.

This gave Qwen's prediction a much stronger "storyline" feel. Predicting the winner can be based on paper strength, and guessing the score involves some luck, but details like red cards, comebacks, and late winners are what truly make people think, "there's something to this." After the first two days, Qwen successfully drew attention to the use of AI for World Cup predictions.

Copilot: Brilliant Hits and Clear Misses

Before the tournament, USA Today had Copilot predict all 104 matches of this World Cup. Based on the matches that have concluded, this prediction set includes both highlights and clear misses.

Among them, three predictions were most impressive.

For the opener, Mexico vs. South Africa, Copilot predicted Mexico 2-0, the exact final score. For South Korea vs. Czech Republic, it predicted South Korea 2-1, also matching the result. And for Brazil vs. Morocco, Copilot forecasted a 1-1 draw, which is precisely how it ended — with Morocco holding Brazil.

The Brazil 1-1 Morocco prediction is particularly noteworthy. Brazil is a traditional powerhouse with a top-tier squad that commands enormous attention. Although Morocco reached the semi-finals of the last World Cup, predicting a draw against Brazil wasn't a particularly safe choice. Yet, when the match ended, Brazil had failed to secure an opening win, and Morocco had once again shown its resilience in major tournaments. Copilot's prediction here was a stroke of genius.

But Copilot's weaknesses quickly became apparent.

It predicted Canada would beat Bosnia and Herzegovina 2-1, but the match ended 1-1. It predicted Switzerland would narrowly beat Qatar 1-0, but Switzerland was also held to a draw. It predicted the USA would win 2-0 against Paraguay; while the direction was correct, the actual score was 4-1, significantly underestimating the offensive output.

More obvious misses occurred in matches with upsets and strong teams being thwarted.

For Turkey vs. Australia, Copilot predicted a 2-1 win for Turkey, but Australia pulled off a 2-0 upset. For Ecuador vs. Ivory Coast, it predicted Ecuador 2-1, but Ivory Coast won 1-0. For Netherlands vs. Japan, it predicted Netherlands 2-1, but Japan equalized twice for a 2-2 draw. For Sweden vs. Tunisia, it predicted 1-1, but Sweden won emphatically 5-1.

Copilot hit the exact scores in the Mexico, South Korea, and Brazil matches, showing it doesn't just favor popular teams. However, matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands reveal that its judgments on underdogs and draws still tend to be conservative.
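The Copilot hits and misses listed above can be tallied in a few lines. The following is a minimal sketch, not an official scoring method: the `Match` structure, the `outcome` helper, and the two metrics (exact score and correct win/draw/loss direction) are assumptions made for illustration, and the data covers only the ten Copilot predictions cited in this article, with Switzerland vs. Qatar taken as the 1-1 draw reported in the ChatGPT section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    team_a: str
    team_b: str
    predicted: tuple[int, int]  # (team_a goals, team_b goals) predicted by Copilot
    actual: tuple[int, int]     # final score as reported in this article


def outcome(score: tuple[int, int]) -> str:
    """Collapse a score into 'a' (team_a wins), 'b' (team_b wins) or 'draw'."""
    goals_a, goals_b = score
    if goals_a > goals_b:
        return "a"
    if goals_a < goals_b:
        return "b"
    return "draw"


# Only the ten Copilot predictions cited in the article, not all 104 matches.
COPILOT = [
    Match("Mexico", "South Africa", (2, 0), (2, 0)),
    Match("South Korea", "Czech Republic", (2, 1), (2, 1)),
    Match("Brazil", "Morocco", (1, 1), (1, 1)),
    Match("Canada", "Bosnia and Herzegovina", (2, 1), (1, 1)),
    Match("Switzerland", "Qatar", (1, 0), (1, 1)),
    Match("USA", "Paraguay", (2, 0), (4, 1)),
    Match("Turkey", "Australia", (2, 1), (0, 2)),
    Match("Ecuador", "Ivory Coast", (2, 1), (0, 1)),
    Match("Netherlands", "Japan", (2, 1), (2, 2)),
    Match("Sweden", "Tunisia", (1, 1), (5, 1)),
]


def tally(matches: list[Match]) -> dict[str, int]:
    """Count exact-score hits and correct win/draw/loss calls."""
    exact = sum(m.predicted == m.actual for m in matches)
    direction = sum(outcome(m.predicted) == outcome(m.actual) for m in matches)
    return {
        "matches": len(matches),
        "exact_score": exact,
        "correct_outcome": direction,
    }


if __name__ == "__main__":
    print(tally(COPILOT))
```

On this subset, the tally comes to three exact scores and four correct outcome directions out of ten, with every miss falling on a draw or an underdog win that Copilot had not called.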

ChatGPT: Comprehensive Analysis, but Less Accurate on Upsets

Compared to Copilot's full tournament schedule prediction, ChatGPT is more like a "pre-game analyst."

For the opening match, ChatGPT predicted Mexico 2-0 South Africa, nailing the final score. Its reasoning was also quite complete, citing Mexico's home advantage, recent form, South Africa's attacking struggles, and factors like Mexico City's high altitude and home atmosphere. In this instance, ChatGPT didn't just provide the result; the logic behind its prediction aligned well with the match outcome.

However, when it came to predicting the full World Cup schedule, ChatGPT's consistency wavered. It correctly predicted Mexico 2-0 South Africa and Brazil 1-1 Morocco, and also got the win/loss direction right for matches involving Scotland, Germany, and Sweden. But it made incorrect calls on games that ended South Korea 2-1 Czech Republic, Qatar 1-1 Switzerland, Australia 2-0 Turkey, and Japan 2-2 Netherlands. In several of these, ChatGPT had backed the team that was stronger on paper (e.g., Switzerland over Qatar, Turkey over Australia, Netherlands over Japan).

ChatGPT is not without predictive power. It can clearly break down team strength, home environment, and recent form, and it can predict scores for some matches. But based on current results, it seems better at explaining "why the favorite is more reasonable" than at identifying which games might deviate from the favored narrative.

Gemini, Grok, Claude: Different Models Write Different Scripts for the Same Match

Besides Qwen, Copilot, and ChatGPT, some social media users fed the same match details to multiple models for pre-game predictions.

Taking the opening match between Mexico and South Africa as an example, one blogger simultaneously tested ChatGPT, Gemini, Grok, and Claude for pre-game predictions. The results showed that ChatGPT and Gemini both predicted Mexico 2-0 South Africa, matching the final score. Grok predicted Mexico 2-1, and Claude predicted Mexico 3-1; while both correctly predicted a Mexico win, they didn't hit the exact scoreline.

For this opening match, the different models produced three distinct "scripts." ChatGPT Go and Gemini Pro were closer to the actual game: Mexico dominated, South Africa's attack was ineffective, and South Africa was eventually shut out. Grok offered a more open scoreline, suggesting South Africa might manage a consolation goal. Claude Sonnet set Mexico's attacking expectations higher, resulting in a more wide-open 3-1 outcome.
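One simple way to express how far each "script" landed from the real result is to add up the goal differences per team. The sketch below assumes that metric (`goal_error` is a hypothetical helper, not something the blogger or any model used) and relies only on the four predictions and the 2-0 final score quoted above.

```python
# Final score of the opener as reported above: Mexico 2-0 South Africa.
ACTUAL = (2, 0)

# Pre-game predictions (Mexico goals, South Africa goals) from the comparison.
PREDICTIONS = {
    "ChatGPT": (2, 0),
    "Gemini": (2, 0),
    "Grok": (2, 1),
    "Claude": (3, 1),
}


def goal_error(predicted: tuple[int, int], actual: tuple[int, int]) -> int:
    """Sum of absolute per-team goal differences; 0 means an exact score."""
    return abs(predicted[0] - actual[0]) + abs(predicted[1] - actual[1])


def rank(predictions: dict[str, tuple[int, int]],
         actual: tuple[int, int]) -> list[tuple[int, str]]:
    """Return (error, model) pairs sorted from closest to furthest."""
    return sorted((goal_error(score, actual), name)
                  for name, score in predictions.items())


if __name__ == "__main__":
    for error, name in rank(PREDICTIONS, ACTUAL):
        print(f"{name}: off by {error} goal(s)")
```

Under this assumed metric, ChatGPT and Gemini are off by zero goals, Grok by one, and Claude by two. A single match is far too little to separate the models, so the ordering is illustrative only.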

Summary

Given the limited sample of AI predictions available for review, it is too early to say which model understands football best.

However, even based on the few matches that have concluded, differences are already emerging. Qwen has been the most memorable so far, consecutively hitting Mexico 2-0 South Africa and South Korea 2-1 Czech Republic on the first day, while also detecting red card risks and match momentum. It's a standout performance in a small sample. Whether it can maintain this hit rate needs verification with more matches.

Copilot and ChatGPT both had moments of brilliance, hitting specific scores. However, they also share a common weakness: neither was sensitive enough to outcomes that deviated from on-paper strength, such as Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands.

As for models like Gemini, Grok, and Claude, currently available public samples are mostly limited to single matches or social media comparisons. These comparisons are useful as a reference, but it's too early to rank the models definitively.

AI can already serve as a layer of reference for World Cup prediction market users, but it is far from being the definitive answer. Going forward, Odaily Planet Daily will continue to collect pre-match predictions from various models and track their performance as the tournament progresses: to see which models were just lucky initially, and which can truly withstand the test of results over more matches.
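To give a rough sense of how little a handful of matches can separate luck from skill, the sketch below computes a 95% Wilson score interval for a hit rate. The choice of the Wilson interval is an assumption made here for illustration, not a method used in the article, and it treats matches as independent trials, which is a simplification. The example inputs are the Copilot figures that follow from the ten matches cited above: four correct outcome directions and three exact scores.

```python
import math


def wilson_interval(hits: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a hit rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Copilot on the ten matches cited in the article (illustrative subset only).
    for label, hits in (("correct outcome", 4), ("exact score", 3)):
        low, high = wilson_interval(hits, 10)
        print(f"{label}: {hits}/10, plausible long-run rate {low:.0%} to {high:.0%}")
```

With only ten matches, the interval for the outcome hit rate runs from roughly 17% to 69%, which is wide enough to cover both a poor forecaster and a good one. That is why more matches are needed before any ranking means much.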
