AI Prediction Record: Want to Make Money in Prediction Markets with AI? But It Might Not Even Read the Question Properly

Odaily资深作者

2026-01-04 08:41

This article is about 2312 words, reading the full article takes about 4 minutes

Originally hoping to use collective intelligence for a dimensional strike, but without proper guidance, AI still frequently experiences hallucinations.

AI Summary

Expand

Core Viewpoint: AI outperforms some humans in prediction markets.
Key Elements:
1. Grok's win rate is 75%, higher than the human rate of 66.7%.
2. AI relies on search and logic but is prone to misjudgment.
3. AI predictions do not rely on market data, avoiding herd mentality.
Market Impact: AI may become a new market analysis tool.
Timeliness Note: Medium-term impact.

Original | Odaily (@OdailyChina)

Author｜Nan Zhi (@Assassin_Malvo)

After many sectors have been proven false, prediction markets have become one of the few sectors within the Crypto space that is still experiencing positive growth. On November 20th, Nan Zhi began attempting to find "smart money" in prediction markets using the same approach used to find smart money in Meme coins last year, and achieved good initial results.

In early December, coinciding with the launch of Gemini 3 Pro, while testing related models, the idea arose of whether AI could be used to analyze and predict prediction markets, pitting humans against AI to see which side makes more accurate predictions.

When introducing prediction markets, they are often described as moving the market closer to the "truth" by "allowing people with insights to place bets with real money." However, some argue that Crypto+prediction markets allow "insiders" to safely profit from information asymmetry, thereby driving the market towards the "insider outcome." This is essentially a clash between the concepts of "wisdom of the crowd" and "truth being held by a few." AI prediction leans more towards "wisdom of the crowd," thus requiring a vast amount of available knowledge and insights.

Therefore, regarding the selection of AI models, Gemini and Grok were initially chosen because they rely on Google and the X platform, respectively, allowing for the most direct access to vast amounts of knowledge and insights. Recently, Nan Zhi added the combination of "Doubao + Douyin Knowledge," but due to the limited number of prediction questions involving this combination, it is not covered in this article.

Basic Rules

AI Versions: Gemini 2.5 pro (with built-in Google Search), Grok 4 Fast (called via OpenRouter, with native search function enabled)
Question Selection: Humans select the betting questions, and AI follows with predictions, but the Crypto category is excluded.
Input Content: Official question (title), official description (Description), and optional answers (which are essentially only Yes and No).

Note: Polymarket's questions are divided into main categories (Events) and subcategories (Markets). Main category Events are broad questions like "Who will be the next Fed Chair?" or "When will Saylor sell Bitcoin?". An Event contains N sub-markets, such as "Will Hassett become the next Fed Chair?" or "Will Saylor sell Bitcoin before March 31, 2026?". To align with human predictions, Markets were chosen as the questions for AI judgment. Other options are not input; for example, the AI is only asked to judge "Will Hassett become the next Fed Chair?" rather than asking it to choose the most likely candidate from N possibilities.

Prompt Design:
Require AI to search for the latest news, official announcements, and expert analysis reports.
Require the exclusion and prohibition of using prediction market data.
Make judgments based on "evidence" and logical reasoning.
Only allow output of Yes or No, accompanied by a paragraph explaining the reasoning logic.

Current Results

Among the prediction questions, 21 have been settled. Grok has the highest win rate at 75%, humans at 66.7%, and Gemini the lowest at 52.4%. The current results can be viewed on the relevant website.

What Mistakes Did the AI Make?

Gemini Occasionally Misjudges the Current Time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini stated that it is currently the first half of 2025, so anything is possible, and gave a random answer.

However, when the author used a program to directly ask Gemini to output the current time, Gemini was able to give the correct answer. It is still unclear why such an error in time perception occurred.

AI Lacks Sufficient Depth of Thought

In the question "Gemini 3.0 Flash released by December 16?", Grok reasoned that "officials have recently only mentioned Gemini 3 Pro and 2.5 related versions, with very few mentions of 3 Flash, therefore there is insufficient evidence to make a judgment," considering only current information.

Meanwhile, Gemini pointed out that "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Continuing this pattern, releasing a 3.0 version by the end of 2025 is logical," and also noted "a leaked demo about 'Gemini 3.0 Flash' circulating in online communities recently (December 14, 2025), further increasing the likelihood of its imminent public release."

Although, from a conclusion standpoint, Gemini's answer turned out to be wrong, in this question, a clear gap in the breadth of information relied upon by the two models is evident.

AI Infers Based on Common Sense Rather Than Evidence+Logic

In the question "Trump approval Up or Down this week?", Gemini stated that "predicting a single week's approval poll rating more than a year in the future is highly uncertain," first showing another instance of "time misjudgment." Then Gemini said that "in any given ordinary week, the probability of events causing a slight decline in approval ratings might be slightly higher than the probability of positive events significantly boosting them," so a decline in approval rating is more likely. The generated conclusion was based solely on subjective common-sense assumptions.

In this question, Grok based its reasoning on news reports and polling data regarding "government shutdown, economic concerns, immigration policy disputes, and negative backlash from comments on Rob Reiner's death," which aligns with the design expectations.

Incorrect Judgment of Settlement Conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok already knew that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19th)." The settlement conditions clearly stated that "if the government publicly releases any files related to Epstein's illegal activities that were not public before the listed date, it will be judged as Yes."

However, under this condition, Gemini stated that "completing the release of 'all' files by December 20th is impossible," clearly misjudging the conditions required for settlement, and thus gave the wrong answer.

Summary

In summary, Grok's prediction win rate has already surpassed that of the "smart money" that has made hundreds of thousands or even millions of dollars in profit on prediction markets. However, upon deeper examination of its prediction logic, there are still many areas that can be guided and corrected.

Gemini

Prediction Market

Welcome to Join Odaily Official Community