Google DeepMind has officially released a preview of Gemini 3.1 Flash-Lite, the fastest and most cost-effective member of the Gemini 3 series. An iteration of Gemini 2.5 Flash-Lite, the new model maintains a throughput of over 360 tokens per second and an average response time of 5.1 seconds while delivering a significant leap in intelligence. On the Artificial Analysis Intelligence Index, its score rose 12 points to 34, and an Elo score of 1432 on the Arena.ai leaderboard shows it is highly competitive in human-preference comparisons.

Gemini 3.1 Flash-Lite performs exceptionally well in core dimensions such as multimodal understanding and scientific reasoning, scoring 86.9% on GPQA Diamond and 76.8% on the MMMU-Pro benchmark, surpassing heavyweight models such as Claude Opus 4.6 and Kimi K2.5. Notably, the model lets developers customize the "depth" of its thinking, so it can flexibly adapt to scenarios ranging from simple automated translation to complex UI building.
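In earlier Gemini releases, thinking depth is controlled through a thinking budget in the request's generation config. The sketch below builds such a request body for the REST API under that assumption; the `thinkingConfig`/`thinkingBudget` field names follow the Gemini 2.5 API, and the model identifier `gemini-3.1-flash-lite-preview` is a guess, not a confirmed name.

```python
# Hedged sketch: generateContent request bodies with different thinking
# budgets. Field names follow the Gemini 2.5 REST API; the model name
# below is an assumed placeholder for the 3.1 Flash-Lite preview.
MODEL = "gemini-3.1-flash-lite-preview"  # assumed identifier

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent request body with an explicit thinking budget.

    A budget of 0 disables thinking for fast, cheap tasks; a larger budget
    allows deeper reasoning for complex tasks.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# Simple automation task: no thinking needed.
fast = build_request("Translate to French: good morning", thinking_budget=0)

# Complex task: allow a generous reasoning budget.
deep = build_request("Design a responsive settings page layout",
                     thinking_budget=8192)
```

The request body would then be POSTed to the model's `generateContent` endpoint; only the budget value changes between the two scenarios.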

However, the dual advances in performance and speed come with significant price adjustments. The input price of Gemini 3.1 Flash-Lite has been raised to $0.25 per million tokens, and the output price has jumped from $0.40 to $1.50 per million tokens, almost quadrupling the previous rate.
This pricing strategy reflects the cost pressure model providers face in balancing fast inference against high-precision reasoning. With the model now available for testing on Google AI Studio and Vertex AI, the lightweight-model market is shifting from simple low-price competition to a new phase in which high-performance reasoning becomes broadly accessible.
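The pricing figures above can be turned into a quick cost estimate. The sketch below uses the per-million-token prices stated in the article (the example workload of 2M input and 500k output tokens is illustrative, not from the source):

```python
# Per-million-token prices for Gemini 3.1 Flash-Lite, as reported above.
INPUT_PRICE_PER_M = 0.25       # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50      # USD per 1M output tokens (new price)
OLD_OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens (previous price)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts and per-million prices."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Illustrative workload: 2M input tokens + 500k output tokens
# -> 2 * 0.25 + 0.5 * 1.50 = 0.50 + 0.75 = $1.25
total = cost_usd(2_000_000, 500_000)

# The output-price ratio: 1.50 / 0.40 = 3.75x the old rate.
ratio = OUTPUT_PRICE_PER_M / OLD_OUTPUT_PRICE_PER_M
```

The 3.75× ratio is why the output-price change dominates the cost story: for output-heavy workloads, bills scale almost four times faster than before.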
