The "cost-performance battle" in the large-model field has escalated again. Google has officially released its latest lightweight flagship model, Gemini 3 Flash. Surprisingly, this "fast and cheap" newcomer has not only fully replaced its predecessor as the default engine behind Google Search's AI Mode and the Gemini app, but has also pulled off several "underdog victories" in real-world tests.

🚀 Three Times Faster, Prices Dramatically Reduced

For enterprises and developers, the arrival of Gemini 3 Flash is a godsend. According to Google's official figures, the model runs three times faster than Gemini 2.5 Pro while inference costs drop sharply: input is priced at just $0.50 per million tokens, a 60% reduction from 2.5 Pro, and the output price has fallen from $10 to $3 per million tokens.


This aggressive price-performance ratio puts large-scale deployment of complex AI agents within reach. Combined with a 90% discount on context caching, Google is building a pricing "moat" that competitors will find hard to answer.
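To make the pricing concrete, here is a back-of-envelope cost sketch in Python. The $0.50 input and $3 output prices come from the article; the 2.5 Pro input price of $1.25 is inferred from the stated 60% reduction, and applying the same 90% cache discount to both models is purely an assumption made for comparison.

```python
# Back-of-envelope cost comparison: Gemini 2.5 Pro vs. Gemini 3 Flash.
# Prices are USD per 1M tokens, taken from the article; the 2.5 Pro input
# price ($1.25) is inferred from the stated "60% reduction" and is an assumption.

PRICES = {
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
}

CACHE_DISCOUNT = 0.90  # article: 90% discount on cached context tokens

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """USD cost of one request; cached tokens are billed at 10% of the input rate."""
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    cost = (fresh * p["input"] + cached_tokens * p["input"] * (1 - CACHE_DISCOUNT)) / 1e6
    cost += output_tokens * p["output"] / 1e6
    return cost

# Example: 100k input tokens (80k served from cache), 5k output tokens.
old = request_cost("gemini-2.5-pro", 100_000, 5_000, cached_tokens=80_000)
new = request_cost("gemini-3-flash", 100_000, 5_000, cached_tokens=80_000)
print(f"2.5 Pro: ${old:.4f}  3 Flash: ${new:.4f}")
```

On this hypothetical workload the Flash request costs roughly a third of the 2.5 Pro request, which is where the "no longer out of reach" claim for agent fleets comes from: agents replay long shared contexts, so the cache discount compounds with the lower base rates.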

Amazing Intelligence: Surpassing the Flagship in Programming?

If cheap and fast was expected, the "intelligence" of Gemini 3 Flash exceeded expectations. On the authoritative SWE-Bench Verified leaderboard, which measures coding ability, the Flash version scored 78%, directly surpassing the higher-end flagship Gemini 3 Pro.

It also introduces an innovative "Thinking Level" control. Developers can dial between "low latency/low cost" and "deep reasoning" much like turning a volume knob. For simple everyday conversation it responds like lightning; when facing complex debugging or legal-document analysis, it can "take a deep breath," mobilizing more computing power to ensure accuracy.
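To illustrate how a developer might use such a knob, here is a minimal routing sketch in Python. The heuristic, the "low"/"high" level names, and the config shape in the final comment are all illustrative assumptions, not the official API; the actual parameter names are documented in the Gemini API reference.

```python
# Sketch: routing requests between "low latency" and "deep reasoning" modes.
# The level names and the heuristic below are illustrative assumptions;
# consult the official Gemini API docs for the real parameter names.

HARD_MARKERS = ("debug", "stack trace", "contract", "clause")

def pick_thinking_level(prompt: str) -> str:
    """Crude heuristic: short, chatty prompts get fast low-effort replies;
    long or code/legal-heavy prompts get deeper reasoning."""
    text = prompt.lower()
    is_short = len(text.split()) < 20
    if is_short and not any(marker in text for marker in HARD_MARKERS):
        return "low"   # lightning-fast everyday conversation
    return "high"      # "take a deep breath": spend more compute on accuracy

# The chosen level would then go into the request config, e.g. (hypothetical):
# config = {"thinking_level": pick_thinking_level(prompt)}
```

Routing like this is how the cost story and the intelligence story connect: the bulk of traffic runs at the cheap fast setting, while only the hard tail pays for deep reasoning.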


The Era of AI for Everyone: Everyone Is a "Vibe Coder"

The full release of Gemini 3 Flash signals that AI is moving from showpiece to everyday tool. Through Google AI Studio or Vertex AI, developers can build responsive applications almost in real time. Early users have already dubbed it a "vibe coding" tool: as long as you have an idea and can describe it in natural language, this "powerful" model can quickly turn it into executable code logic.

As Gemini 3 Flash becomes the foundation of Google Search, every future search, every line of code, and even every video analysis will be driven by this smarter, cheaper "brain."

Key Points:

  • ⚡ Fast and Low Cost: Gemini 3 Flash is three times faster, with inference costs falling to roughly a third of the previous generation's (by the prices cited above), breaking the myth that high performance must mean high prices.

  • 🏆 Programming "Upset": Scored 78% on SWE-Bench Verified, astonishingly surpassing Gemini 3 Pro and becoming one of the most cost-effective coding models available today.

  • 🎚️ Dynamic Reasoning Control: A new "Thinking Level" parameter lets developers adjust the model's reasoning depth and response latency to match task difficulty.