Google has made a major upgrade to its artificial intelligence ecosystem, officially integrating the native "computer use" tool directly into the Gemini 3.5 Flash model, fully replacing the previous Gemini 2.5 testing framework. This move marks the acceleration of artificial intelligence from being a mere "conversational partner" to becoming an actual "digital colleague," driving AI agents from concept to practical implementation.

Through the Gemini API, developers can now build intelligent agents using the native capabilities of Gemini 3.5 Flash. These agents no longer rely on complex low-level code writing but instead navigate applications intuitively by perceiving and understanding visual information like screenshots, enabling them to automatically perform various complex desktop tasks.

This shows great potential in scenarios such as office automation, software testing, and cross-platform data processing, including automating website browsing, filling out long forms, clicking interface buttons, and efficiently handling repetitive data collection tasks across desktop, mobile, and browser environments. To accelerate this ecosystem development, Google has set up a real-time demonstration space on Browserbase for developers to immediately test the features of the Gemini enterprise agent platform.

QQ20260625-141617.jpg

Facing the potential security challenges brought by granting AI control over the mouse and keyboard, such as the risk of indirect instruction injection, Google emphasized that targeted adversarial training has been implemented to enhance the model's defensive capabilities. At the same time, Google also launched two enterprise-level security systems: one allows companies to set up software requiring AI to obtain explicit human approval before performing sensitive or permanent changes; the other can immediately freeze running tasks when detecting potential attacks, providing multi-dimensional protection for user desktop security.

Complementing this model upgrade, Google also released the stable version of Chrome 149 on the same day. This version introduces a practical feature called "Select from Screen," which users can enable in the browser's attachment menu. By dragging a box to select any image or text on the current tab, it can be instantly added as a prompt for Gemini, greatly improving the convenience of interactive questioning based on web content.

Google's integration of native computer usage tools into Gemini 3.5 Flash not only deepens the combination of its AI model with the operating system but also indicates that the AI industry is shifting from pursuing large model parameters to focusing on practical tool usage and task execution capabilities. This trend will accelerate the adoption of AI agents in enterprise automation and consumer services, reshaping human-computer interaction and software application forms, making more advanced autonomous AI agents possible.