On April 2, Zhipu released a new native multi-modal coding foundation model.

Key Breakthroughs: Understanding Images and Writing Code
As a native multi-modal coding foundation model, it delivers:
Multi-dimensional Perception: Native understanding of images, videos, design drafts, and complex document layouts, with support for visual tools such as frame capture, screenshots, and web reading.
Extended Vision: The context window is extended to 200K tokens, allowing it to handle large-scale engineering projects or lengthy technical documents with ease.
Performance Leadership: In core benchmarks such as multi-modal coding and GUI agent (graphical user interface agent) tasks, the model outperforms comparable products while being smaller.
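To give a concrete sense of what a 200K-token window means in practice, here is a minimal sketch that checks whether a project's source files fit under such a budget. The ~4-characters-per-token ratio is a rough heuristic for English text and code, not the model's actual tokenizer, and the numbers here are illustrative only:

```python
CONTEXT_WINDOW = 200_000  # tokens, per the announced window size

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_in_context(files: dict[str, str], budget: int = CONTEXT_WINDOW) -> bool:
    """Return True if the combined file contents fit under the token budget."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= budget

# A small demo project comfortably fits; a multi-megabyte codebase would not.
project = {
    "app.py": "print('hello')\n" * 100,
    "README.md": "# Demo\n" * 50,
}
print(fits_in_context(project))
```

A real workflow would count tokens with the model's own tokenizer, but the budget check itself is this simple.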

Typical Scenarios: From "Sketch" to "Final Product" in Seconds
The addition of visual understanding unlocks several typical scenarios:
Front-end Replication: Simply send a screenshot of a design draft or a screen recording, and the model can understand the layout, color scheme, and interaction logic, generating a front-end project that can be run directly.
GUI Autonomous Exploration: Combined with frameworks such as Claude Code, it can browse websites, map navigation relationships, and collect assets like a human would, enabling full-site visual replication.
Interactive Editing: Supports adding, deleting, or modifying modules, styles, or layouts through dialogue, enabling visual code iteration.
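The three scenarios above share one interaction pattern: an image (design draft, screenshot, recording frame) plus iterative text instructions in a multimodal conversation. As a minimal sketch of how such a request might be assembled, assuming an OpenAI-style message format with base64 data URLs (the field names follow that convention; the actual API of this model may differ):

```python
import base64

def image_block(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content block (data URL)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def start_conversation(screenshot: bytes, instruction: str) -> list[dict]:
    """First turn: the design screenshot plus the initial build request."""
    return [{"role": "user",
             "content": [image_block(screenshot),
                         {"type": "text", "text": instruction}]}]

def add_edit_turn(messages: list[dict], model_reply: str,
                  edit_request: str) -> list[dict]:
    """Record the model's generated code, then append a follow-up edit."""
    messages.append({"role": "assistant", "content": model_reply})
    messages.append({"role": "user", "content": edit_request})
    return messages

# Placeholder bytes stand in for a real PNG screenshot.
msgs = start_conversation(b"\x89PNG...", "Replicate this design as HTML/CSS.")
add_edit_turn(msgs, "<html>...</html>", "Make the header sticky.")
```

Interactive editing then reduces to appending further user turns to the same message list, so each revision is grounded in both the original image and the code produced so far.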
Empowering "Lobster": AutoClaw Gets a Visual Upgrade
After integrating this model into Zhipu's self-developed agent AutoClaw (nicknamed "Lobster"), the agent gains visual capabilities: it can now see the interfaces it operates on rather than relying on text alone.
Industry Insight: Programming Is No Longer "Feeling in the Dark"
With the release of this model, programming no longer means "feeling in the dark": instead of inferring a UI from textual descriptions, the model works directly from what it can see.
