Google's long-awaited AI vision is finally becoming a reality. Today, Google and its partners jointly announced that the Gemini-based "task automation" feature has entered beta testing. The feature marks the transformation of AI assistants from mere "information seekers" into "digital assistants" capable of performing cross-app tasks, simulating human operations to complete complex processes such as ordering food and hailing a taxi.


Visual Impact: Watching the Phone "Use Itself"

Unlike traditional API integration, Gemini's automation feature simulates real user operations within a virtual window:

  • Smart Taxi Hailing: When you give the instruction "Hail a taxi to the airport," Gemini will automatically open Uber, confirm the specific terminal (asking proactively if there are several), and fill in the destination.

  • Ordering Food: Given the instruction "Order me a coffee and a croissant," the AI will scroll through the screen on its own to find specific items on the Starbucks menu (such as a Flat White), handling even fiddly scrolling selections the way a human would.
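The behavior described above is essentially an observe-and-act loop: rather than calling an app's API, the agent looks at what is on screen, picks a UI action (tap, scroll, type), and repeats until it finds the target. Google has not published its implementation, so the following is only a minimal sketch with entirely hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class UiAction:
    kind: str    # "tap", "scroll", or "type"
    target: str  # UI element or region the action applies to

def plan_next_action(goal: str, screen_elements: list[str]) -> UiAction:
    """Toy policy: tap the first on-screen element mentioned in the goal,
    otherwise scroll to reveal more of the menu (as when hunting for a
    Flat White on a long Starbucks menu)."""
    for element in screen_elements:
        if element.lower() in goal.lower():
            return UiAction("tap", element)
    return UiAction("scroll", "menu")

def run_agent(goal: str, screens: list[list[str]]) -> list[UiAction]:
    """Step through successive screen snapshots until the goal item is tapped."""
    trace = []
    for elements in screens:
        action = plan_next_action(goal, elements)
        trace.append(action)
        if action.kind == "tap":
            break
    return trace

# The target item only appears after one scroll, mimicking a long menu.
trace = run_agent(
    "Order me a Flat White",
    screens=[["Latte", "Cappuccino"], ["Flat White", "Croissant"]],
)
print([(a.kind, a.target) for a in trace])
# → [('scroll', 'menu'), ('tap', 'Flat White')]
```

The real system presumably drives pixels and accessibility trees rather than string lists, but the loop structure (perceive screen, choose action, repeat) is the core idea of UI-level automation.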

Security Logic: Human Control at Key Points

To avoid the risks associated with autonomy, Google has implemented a strict human review mechanism in the automation process:

Explicit Operation: Users can watch Gemini's every step in real time and take control of, or terminate, the automation process at any moment.

Last Confirmation: Before submitting an order or a payment, the system stops at the payment screen and waits for the user to verify the details and manually tap "Confirm," ensuring that every transaction completes under the user's control.
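This "last confirmation" safeguard can be thought of as a hard gate in the automation flow: the agent prepares everything but cannot complete payment without an explicit human approval. A minimal sketch, with hypothetical names (this is not Google's actual API):

```python
from typing import Callable

def place_order(order: dict, confirm: Callable[[str], bool]) -> str:
    """Fill in the order, then stop at the payment step and hand control
    to the user; only their explicit approval submits the transaction."""
    summary = f"{order['item']} for ${order['price']:.2f} via {order['app']}"
    if not confirm(summary):   # human reviews details on the pay screen
        return "cancelled"     # user can terminate at any point
    return "submitted"         # only a manual "Confirm" completes payment

# Simulate a user who checks the summary and taps "Confirm".
status = place_order(
    {"item": "Flat White", "price": 4.50, "app": "Starbucks"},
    confirm=lambda summary: "Flat White" in summary,
)
print(status)  # → submitted
```

The design point is that the confirmation callback sits between the agent and any irreversible side effect, so autonomy never extends past the payment boundary.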

Currently, the feature is rolling out first to delivery and ride-hailing applications. For users in this beta and those who follow, the phone is no longer just a carrier for running apps but a "super agent" that understands natural-language intent and converts it into concrete actions.

Although the AI still occasionally looks "clumsy" when scrolling menus and identifying options, an automation model that works directly through UI interactions rather than requiring deep API adaptation greatly expands what AI assistants can reach. As the algorithms iterate, we are leaving the era of constantly switching between apps and entering a genuinely intelligent stage where small tasks are completed with a single sentence.