Alibaba's Tongyi Lab recently released MAI-UI, a family of multimodal, general-purpose GUI agents. Beyond basic GUI operation, the system integrates agent-user interaction, MCP tool use, device-cloud collaboration, and online reinforcement learning. It reports leading results in GUI grounding and mobile GUI navigation, surpassing competitors such as Gemini 2.5 Pro, Seed1.8, and UI-TARS-2.

MAI-UI is built on Qwen3-VL and comes in several sizes: 2B, 8B, 32B, and 235B-A22B. The models take natural language instructions and UI screenshots as input and output structured actions that can be executed in a live Android environment, including tapping elements, swiping, entering text, and pressing system buttons. Beyond these GUI actions, MAI-UI can answer user questions directly, ask for clarification when a goal is ambiguous, and call external tools via MCP, so the agent can mix GUI steps, direct language responses, and API-level operations within a single trajectory.
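To make the idea of a mixed trajectory concrete, here is a minimal Python sketch of how such structured actions might be represented and routed. It is an illustration only: the JSON field names, the action labels (`click`, `swipe`, `type`, `press`, `answer`, `ask_user`, `mcp_call`), and the tool name `calendar.create_event` are all hypothetical and are not taken from MAI-UI's actual output format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Union


# Hypothetical action schema; MAI-UI's real output format may differ.
@dataclass
class Click:
    x: int
    y: int


@dataclass
class Swipe:
    x1: int
    y1: int
    x2: int
    y2: int


@dataclass
class TypeText:
    text: str


@dataclass
class PressButton:
    button: str  # e.g. "back" or "home"


@dataclass
class Answer:
    text: str  # direct language response to the user


@dataclass
class AskUser:
    question: str  # clarification request for an ambiguous goal


@dataclass
class McpCall:
    tool: str  # name of an external tool exposed via MCP
    arguments: dict[str, Any]


Action = Union[Click, Swipe, TypeText, PressButton, Answer, AskUser, McpCall]

# Maps the assumed "action" label in the model output to a dataclass.
_ACTION_TYPES = {
    "click": Click,
    "swipe": Swipe,
    "type": TypeText,
    "press": PressButton,
    "answer": Answer,
    "ask_user": AskUser,
    "mcp_call": McpCall,
}


def parse_action(raw: str) -> Action:
    """Parse one JSON-encoded model output into a typed action."""
    payload = json.loads(raw)
    kind = payload.pop("action")
    if kind not in _ACTION_TYPES:
        raise ValueError(f"unknown action type: {kind!r}")
    return _ACTION_TYPES[kind](**payload)


def channel(action: Action) -> str:
    """Classify an action as a GUI step, a tool call, or a language turn."""
    if isinstance(action, (Click, Swipe, TypeText, PressButton)):
        return "gui"
    if isinstance(action, McpCall):
        return "tool"
    return "language"


# One trajectory mixing all three channels.
trajectory = [
    '{"action": "click", "x": 540, "y": 1200}',
    '{"action": "mcp_call", "tool": "calendar.create_event", '
    '"arguments": {"title": "Standup"}}',
    '{"action": "answer", "text": "The event has been created."}',
]
channels = [channel(parse_action(step)) for step in trajectory]
```

In a real agent loop, each channel would be handled by a different executor: GUI steps by the device controller, tool calls by an MCP client, and language turns by the chat interface.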

On the navigation side, MAI-UI strengthens robustness through a self-evolving data pipeline and an online reinforcement learning framework. Tongyi Lab starts from seed tasks drawn from application manuals, designed scenarios, and public data; multiple agents and human annotators then execute these tasks to generate the trajectories used to optimize navigation behavior.
On the MobileWorld benchmark, MAI-UI achieved a success rate of 41.7%. On AndroidWorld, its best-performing variant reached a success rate of 76.7%, ahead of the other systems it was compared against.
The release of MAI-UI marks notable progress for GUI agents on mobile, with the aim of making smart devices more efficient and capable when handling complex operations.
GitHub: https://github.com/Tongyi-MAI/MAI-UI
Key Points:
🌟 MAI-UI is a family of GUI agents from Alibaba's Tongyi Lab that integrates MCP tool use, device-cloud collaboration, and online reinforcement learning.
📱 MAI-UI supports tapping, swiping, text entry, and system buttons, and can carry out complex user interactions in live Android environments.
🚀 MAI-UI reports success rates of 41.7% on MobileWorld and up to 76.7% on AndroidWorld, ahead of competing systems.
