When "get it done in one sentence" turns from a marketing slogan into a real experience, phones are finally starting to understand what users intend. ZTE's recently launched Nebula-GUI small model integrates AI agents deeply into the phone operating system, turning flagship models such as the Nubia Z70 Ultra and Z80 Ultra into "personal assistants." There is no need to open an app: given a single voice command such as "book me a high-speed train ticket to Shanghai tomorrow afternoon" or "take a photo of this cake with the food mode," the phone completes the entire process across apps automatically.

This capability is backed by a major breakthrough by ZTE in offline, on-device AI agents. Recent evaluations show that Nebula-GUI, with only 7 billion parameters, won a silver medal in authoritative offline mobile GUI Agent tests, achieving a comprehensive score of 84.38. In high-complexity tasks such as automatic ticket booking and dining reservations, its operation speed and accuracy significantly outperform similar solutions. More importantly, it does not require an internet connection: all inference runs on the device, which benefits both response speed and user privacy.

Currently, Nebula-GUI covers more than 30 mainstream apps, including 12306, Meituan, Gaode, WeChat, and Alipay, with an average task completion accuracy of over 90% for common scenarios. Users no longer need to manually switch apps, fill out forms, or click through multiple menus; complex operations are compressed into a single natural language interaction.

Breaking through the Chinese GUI data bottleneck, building an end-to-end training system

The biggest challenge in achieving this experience is the extreme scarcity of high-quality Chinese graphical user interface (GUI) data. To address this, ZTE developed an end-to-end data preparation system that combines automated screenshot collection, semantic annotation, and synthetic instruction generation into a training loop covering thousands of operational paths. The system significantly improves the efficiency and consistency of data annotation while greatly reducing production costs, providing a solid foundation for model training.
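To make the three stages more concrete, here is a minimal Python sketch of how a captured screen might flow from annotation to synthetic instructions. ZTE has not published the internals of its pipeline, so everything here is an assumption made for illustration: the `Element` record, the `annotate` and `synthesize` helpers, the raw node format, and the instruction templates are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record for one annotated UI element (not ZTE's schema).
@dataclass
class Element:
    label: str    # visible text of the element
    role: str     # "button" or "input" in this toy example
    bbox: tuple   # (left, top, right, bottom) in pixels

# Assumed instruction templates, one per element role.
TEMPLATES = {
    "button": "Tap the '{label}' button",
    "input": "Enter text in the '{label}' field",
}

def annotate(raw_nodes):
    """Semantic annotation step: keep labeled, interactive nodes only."""
    return [
        Element(node["text"], node["role"], tuple(node["bbox"]))
        for node in raw_nodes
        if node.get("text") and node.get("role") in TEMPLATES
    ]

def synthesize(elements):
    """Synthetic instruction step: pair each element with an instruction."""
    return [
        {
            "instruction": TEMPLATES[e.role].format(label=e.label),
            "action": "tap" if e.role == "button" else "type",
            "bbox": e.bbox,
        }
        for e in elements
    ]

# Stand-in for the output of an automated screenshot collector.
raw_nodes = [
    {"text": "Search", "role": "button", "bbox": [10, 20, 110, 60]},
    {"text": "Destination", "role": "input", "bbox": [10, 80, 300, 120]},
    {"text": "", "role": "button", "bbox": [0, 0, 5, 5]},           # unlabeled, dropped
    {"text": "Banner", "role": "image", "bbox": [0, 0, 320, 100]},  # not interactive, dropped
]

samples = synthesize(annotate(raw_nodes))
for sample in samples:
    print(sample["instruction"], sample["action"], sample["bbox"])
```

A real system would presumably derive labels and roles from screenshots or accessibility data and generate instructions with a model rather than fixed templates; the sketch only shows the shape of the data moving between stages.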

On top of this, the team used supervised fine-tuning (SFT) to turn a general multimodal large model into a GUI agent with a closed "perception-understanding-execution" loop. The agent not only recognizes screen elements but also understands user intent, plans operation paths, calls system permissions, and dynamically corrects errors during execution, which keeps it robust in real-world scenarios.
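The closed loop described above can be illustrated with a toy Python sketch. It is not Nebula-GUI's implementation: `FakeScreen`, `plan`, and `run_agent` are hypothetical names, the planner is a hand-written rule rather than a model, and error correction is reduced to re-observing the screen and replanning after a failed action.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "tap" or "type"
    target: str     # label of the UI element to act on
    text: str = ""  # text to enter when kind == "type"

class FakeScreen:
    """Toy stand-in for a phone UI (hypothetical, for illustration only)."""

    def __init__(self):
        self.elements = {"search_box", "search_button"}
        self.typed = ""
        self.done = False
        self._flaky = True  # the first tap on the button fails on purpose

    def observe(self):
        # Perception: report what is currently on screen.
        return {"elements": set(self.elements), "typed": self.typed, "done": self.done}

    def execute(self, action):
        # Execution: apply the action and report whether it succeeded.
        if action.target not in self.elements:
            return False
        if action.kind == "type":
            self.typed = action.text
            return True
        if action.kind == "tap" and action.target == "search_button":
            if self._flaky:
                self._flaky = False
                return False
            self.done = bool(self.typed)
            return self.done
        return False

def plan(intent, observation):
    """Understanding: choose the next action from the intent and the screen."""
    if observation["done"]:
        return None
    if observation["typed"] != intent:
        return Action("type", "search_box", intent)
    return Action("tap", "search_button")

def run_agent(intent, screen, max_steps=10):
    """Observe, plan, act; a failed action is retried via a fresh observation."""
    log = []
    for _ in range(max_steps):
        action = plan(intent, screen.observe())
        if action is None:
            return True, log
        ok = screen.execute(action)
        log.append((action.kind, action.target, ok))
    return False, log

ok, log = run_agent("train to Shanghai", FakeScreen())
print(ok, log)
```

Because the loop plans from a fresh observation on every step, the deliberately failed tap is simply attempted again on the next iteration, and `max_steps` bounds the number of retries.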

From the lab to commercial use, defining the next generation of phone interaction

The commercial deployment of Nebula-GUI marks a new stage in which phone AI assistants move from "voice Q&A" to "proactive execution." ZTE revealed that its next step is to expand into more complex scenarios such as shopping price comparison, travel planning, and cross-app information extraction, making the phone assistant more practical still.