The results of the Alpha Arena 1.5 season, hosted by the quantitative platform nof1.ai, are in. An experimental xAI model with the internal code name Grok4.20 won with a return of +12.11%, turning $10,000 into $12,193 over 14 trading days, and was the only large language model to finish with a positive return. Over the same period, GPT-5.1 and Gemini 3.0 lost 3.4% and 5.7%, respectively.

No human intervention across four "Hell Modes"

The competition rules prohibit any human intervention. The models must switch automatically between the "Ascetic Mode" (strict leverage restrictions) and the "Context-Aware Mode" (opponents' positions are visible). In the context-aware round, Grok4.20 opened a 10x leveraged long position in Palantir (PLTR) two hours ahead of the move. That day the retail investor sentiment index surged by 38%, and the position closed with a profit of 11.4%, which the organizer called "a textbook case of sentiment arbitrage."
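To make the leverage arithmetic concrete, here is a minimal sketch of how a 10x position scales gains and losses. It assumes the 11.4% figure is measured on posted margin and ignores fees, funding, and slippage; the organizer did not state this, and the function names are hypothetical.

```python
def return_on_margin(price_move: float, leverage: float) -> float:
    """Return on posted margin for a long position.

    price_move is the fractional change in the underlying (0.01 = +1%).
    Losses are capped at -1.0, i.e. the whole margin is lost.
    Fees and funding costs are ignored (simplifying assumption).
    """
    return max(price_move * leverage, -1.0)


def required_price_move(target_return: float, leverage: float) -> float:
    """Underlying price move needed to reach a target return on margin."""
    return target_return / leverage


if __name__ == "__main__":
    # Under the stated assumption, an 11.4% gain at 10x needs about a 1.14% move.
    print(f"{required_price_move(0.114, 10):.4%}")
    # The same leverage turns a 10% adverse move into a total loss of margin.
    print(f"{return_on_margin(-0.10, 10):.0%}")
```

The same multiplier that makes a small sentiment-driven move profitable also means a modest adverse move can wipe out the position.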

Real-time X data stream becomes the key weapon

The organizer disclosed that Grok4.20 can query the X (Twitter) Firehose with millisecond latency, processing an average of 68 million English tweets per day. It uses an embedded sentiment-price model to generate ultra-short-term signals within 1 to 5 minutes. By contrast, GPT-5.1 can only use news summaries delayed by 15 minutes, and Gemini 3.0 relies on financial reports and SEC filings, so its information lags by more than 30 minutes.
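The organizer did not describe how the sentiment-price model works. As a purely illustrative sketch of the general idea, the snippet below turns per-minute average sentiment scores into a signal using a rolling mean over a short window. The input format, the window length, the threshold, and the function name are all assumptions, not details of Grok4.20.

```python
from collections import deque
from statistics import mean


def rolling_signals(minute_scores, window=5, threshold=0.3):
    """Yield (minute_index, signal) once a full window of scores is available.

    minute_scores: per-minute average sentiment in [-1, 1] (hypothetical input).
    signal: "long" if the rolling mean is above +threshold,
            "short" if it is below -threshold, otherwise "flat".
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for i, score in enumerate(minute_scores):
        recent.append(score)
        if len(recent) < window:
            continue  # not enough history yet
        avg = mean(recent)
        if avg > threshold:
            signal = "long"
        elif avg < -threshold:
            signal = "short"
        else:
            signal = "flat"
        yield i, signal


if __name__ == "__main__":
    # Made-up scores: sentiment builds, then reverses sharply.
    scores = [0.0, 0.1, 0.6, 0.7, 0.8, -0.9, -0.9, -0.9]
    for minute, signal in rolling_signals(scores, window=3):
        print(minute, signal)
```

A window of a few minutes matches the 1 to 5 minute horizon mentioned above; a feed delayed by 15 or 30 minutes would only see the same shift after the window had already passed.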

Elon Musk personally weighs in; Grok5 is on the way

Elon Musk, founder of xAI, posted after the competition: "Grok knows the vibes. 4.20→5.0🚀", hinting that the next-generation Grok5 will upgrade the real-time sentiment engine into a multimodal, three-dimensional "market-social-macro" framework. Rumors suggest that xAI plans to launch a "Grok Trader API" for institutions in Q1 2025, with an annual fee of up to $500,000, and that more than 20 hedge funds have already expressed interest in subscribing.

Wall Street sounds the alarm

The CEO of nof1.ai stated that the competition aims to explore the feasibility of LLMs placing orders directly. In his view, the results show that sentiment data combined with reinforcement learning can generate excess returns. "When models can read retail memes within 2 hours, the high-frequency advantage of traditional quantitative funds will be weakened," he said. However, he also warned that a win in a single season does not demonstrate that a strategy is stable. Future seasons will introduce T+0 two-way trading, options, and cryptocurrencies to further test the models' adaptability.

Industry signal: AI trading enters the "real-time sentiment" stage