On March 12, 2026, xAI officially launched Grok 4.20 Beta, its next-generation large language model. The model set a new industry record for factual reliability while remaining competitively priced.

According to the latest assessment by Artificial Analysis, the reasoning version of Grok 4.20 scored 48 on the Intelligence Index, 6 points higher than its predecessor. Although it still trails Gemini 3.1 Pro Preview and GPT-5.4 (both at 57) on this composite benchmark, it performed exceptionally well on the AA-Omniscience test, with a non-hallucination rate of 78%. This makes real progress on a common weakness of AI models: fabricating false information.


On the product side, xAI released three API variants at once: a reasoning model, a non-reasoning model, and a multi-agent mode. The model supports a context window of up to 2 million tokens and is aggressively priced, at $2 to $6 per million tokens, well below Grok 4. In practice, Grok 4.20 is markedly more restrained when a question falls outside its knowledge: it admits "I don't know" far more often, and its error rate is roughly one in five.


Global competition among large models has shifted away from parameter scale alone toward a two-front contest over reasoning depth and factual accuracy. The release of Grok 4.20 signals that xAI is trying to build a differentiated advantage in its pursuit of artificial general intelligence (AGI) by emphasizing "honesty" and a low hallucination rate. This focus on factual reliability increases AI's practical value in industries that demand rigor, and it lays a firmer foundation of trust for future multi-agent collaboration.