Google Unveils Gemini 3: 1-Million-Token Context Window Takes On GPT-5.1 and Tops LMArena

Alphabet's Google has officially launched Gemini 3, which offers a context window of up to 1 million tokens and supports native multimodal reasoning across text, images, video, and code. According to Google, Gemini 3 Pro achieved 91.9% accuracy on the GPQA Diamond graduate-level benchmark and scored 1501 Elo on LMArena, surpassing GPT-5.1 and Claude 4.5 to become the highest-scoring model on the current public leaderboard.

Gemini 3 features a new Deep Think enhanced reasoning mode, which productizes the reasoning chain through "thought signatures" and "thinking levels." It scored 45.1% on ARC-AGI-2, setting a new state of the art in multi-step logic, factual accuracy, and understanding of scientific charts. Google also launched the Google Antigravity development platform, which supports "agent-based coding" and "visual coding." On coding benchmarks, Gemini 3 reaches an Elo of 2439 on LiveCodeBench Pro and 54.2% accuracy on Terminal-Bench 2.0 terminal operations, enabling it to autonomously complete an entire workflow of data crawling, analysis, reporting, and deployment.

Gemini 3 is now available to Google AI Ultra subscribers and will be gradually rolled out to the Gemini app, AI Mode in Search, and enterprise-grade Vertex AI over the next few weeks. Google stated that the model was trained on its self-developed TPU v6 Pods. Combined with a 90% search market share and 2 billion monthly active users for "AI Overviews," this is accelerating the transition of AI from the laboratory to production.

Google Search AI Overview Accuracy is Only 90%, Easily Affected by False Information

According to The New York Times, Google's AI Overview feature is about 90% accurate. With Google handling more than 5 trillion searches a year, this implies that millions of incorrect answers may be generated every hour, or nearly a million per minute. An assessment by the startup Oumi showed that the accuracy of Google's Gemini model rose from 85% in October last year to 91% in February this year.
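
The per-hour and per-minute figures follow from simple arithmetic on the numbers quoted above. The sketch below reproduces that calculation under two labeled assumptions that are not in the report: annual volume is taken as exactly 5 trillion searches, and every search is assumed to return an AI Overview (the hypothetical `OVERVIEW_SHARE = 1.0`). Since not every search shows an overview, the results are best read as a rough upper bound.

```python
# Back-of-the-envelope check of the error-rate figures quoted in the article.
SEARCHES_PER_YEAR = 5e12  # article says "exceeding 5 trillion"; ASSUMED to be exactly 5 trillion
ACCURACY = 0.90           # reported AI Overview accuracy
OVERVIEW_SHARE = 1.0      # ASSUMPTION: every search returns an AI Overview (upper bound)

HOURS_PER_YEAR = 365 * 24
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60


def wrong_answers(periods_per_year: int) -> float:
    """Expected number of incorrect overviews per period (hour, minute, ...)."""
    wrong_per_year = SEARCHES_PER_YEAR * OVERVIEW_SHARE * (1 - ACCURACY)
    return wrong_per_year / periods_per_year


wrong_per_hour = wrong_answers(HOURS_PER_YEAR)
wrong_per_minute = wrong_answers(MINUTES_PER_YEAR)

print(f"Incorrect answers per hour:   {wrong_per_hour:,.0f}")
print(f"Incorrect answers per minute: {wrong_per_minute:,.0f}")
```

Under these assumptions the result is roughly 57 million incorrect answers per hour and about 951,000 per minute, which is consistent with the "nearly a million per minute" figure. Lowering `OVERVIEW_SHARE` scales both numbers down proportionally.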
