Tongyi Qianwen has officially released its latest image generation model, Z-Image. On its release day the model quickly topped the Hugging Face trending chart, reaching 500,000 downloads. With only 6 billion parameters, Z-Image delivers photorealistic quality comparable to much larger models: it accurately reproduces skin texture, hair detail, natural lighting, and material textures, with well-judged composition and atmosphere.

Alongside it, the team released Z-Image-Turbo, an accelerated variant that needs only 8 inference steps to generate high-quality images, making it well suited to everyday creation, poster design, and rapid prototyping. Even in complex text layouts, Z-Image-Turbo accurately renders mixed Chinese and English text, keeping the text legible while preserving realistic faces and overall visual appeal.
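What "8 inference steps" means can be sketched in a few lines of plain Python: each step is one evaluation of the denoising network. This sketch assumes a flow-matching-style sampler integrated with simple Euler steps, and it replaces the network with a hypothetical toy velocity function that knows the target image exactly. It illustrates the step-count idea only and is not Z-Image-Turbo's actual sampler.

```python
def euler_sample(velocity, x_noise, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 1 (pure noise) to t = 0 (image).

    Each step makes exactly one call to `velocity`, which stands in for one
    forward pass of the denoising network, so num_steps=8 costs 8 passes.
    """
    x = list(x_noise)
    dt = 1.0 / num_steps
    network_calls = 0
    for i in range(num_steps):
        t = 1.0 - i * dt                              # current time, 1.0 down to dt
        v = velocity(x, t)
        network_calls += 1
        x = [xi - dt * vi for xi, vi in zip(x, v)]    # Euler step toward t - dt
    return x, network_calls


def make_toy_velocity(target):
    """Hypothetical exact velocity for a single known target image.

    Assuming x_t = (1 - t) * target + t * noise, the velocity is
    noise - target = (x_t - target) / t. A real model learns this field.
    """
    def velocity(x, t):
        return [(xi - ci) / t for xi, ci in zip(x, target)]
    return velocity


if __name__ == "__main__":
    target = [0.2, -0.5, 0.9]      # hypothetical 3-"pixel" image
    noise = [1.3, 0.4, -0.7]
    sample, calls = euler_sample(make_toy_velocity(target), noise, num_steps=8)
    print(calls)                    # 8
    print([round(s, 6) for s in sample])
```

Because the toy velocity is exact, any step count lands on the target here. The sampling trajectories of a learned model are generally curved, which is why generating good images in only 8 steps is the notable part of the claim.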
The model also has extensive real-world knowledge: it can generate famous landmarks such as the Eiffel Tower and the Forbidden City with accurate real-world details, proportions, and context. With a prompt enhancer, Z-Image can interpret and handle complex requests, demonstrating not only drawing ability but also creativity grounded in understanding.
Additionally, Z-Image-Edit focuses on executing complex composite editing instructions, such as "make the person smile + turn their head + replace the background with cherry blossoms + add Chinese captions." It maintains high consistency in lighting, identity, and style during major modifications, avoiding common misalignment and distortion issues.
On the data side, Z-Image builds an efficient data ecosystem, aiming to improve training efficiency by using "the right data." Architecturally, Z-Image adopts a scalable single-stream diffusion Transformer (S³-DiT), which concatenates text and image tokens into one sequence processed by shared Transformer blocks, improving parameter utilization. Training follows a three-stage progressive strategy that systematically injects world knowledge, and the resulting Z-Image-Turbo variant enables real-time, high-quality generation.
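To make the single-stream idea more concrete, here is a toy sketch in plain Python. It assumes, as a simplification, that text and image tokens are already embeddings of the same width, and it shows only one attention layer with shared Q/K/V weights (no multi-head split, MLP, normalization, or timestep conditioning). All function names, sizes, and random weights are hypothetical and not taken from Z-Image's code.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for embeddings or learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def single_stream_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    """One self-attention layer over the joint text + image sequence.

    Both modalities are concatenated and projected with the SAME weights,
    so every token attends to every other token. A dual-stream block would
    instead keep a separate set of projections for each modality.
    """
    seq = text_tokens + image_tokens                 # one joint sequence
    q, k, v = matmul(seq, w_q), matmul(seq, w_k), matmul(seq, w_v)
    d = len(w_q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    n_text = len(text_tokens)
    return out[:n_text], out[n_text:]                # split back by modality


def projection_params(d_model, n_streams):
    """Weights in the Q, K, V and output projections of one block.

    n_streams=1 models a single-stream block; n_streams=2 models a dual-stream
    block with separate projections for text and image tokens.
    """
    return n_streams * 4 * d_model * d_model


if __name__ == "__main__":
    rng = random.Random(0)
    d = 8
    text = random_matrix(3, d, rng)      # 3 hypothetical text-token embeddings
    image = random_matrix(5, d, rng)     # 5 hypothetical image-latent tokens
    w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))
    text_out, image_out = single_stream_attention(text, image, w_q, w_k, w_v)
    print(len(text_out), len(image_out))                           # 3 5
    print(projection_params(1024, 1), projection_params(1024, 2))  # 4194304 8388608
```

The parameter count shows one way to read "parameter utilization": at the same width, a single-stream block stores one set of projections that every token uses, while a dual-stream block splits twice as many projection weights between the modalities.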
GitHub: https://github.com/Tongyi-MAI/Z-Image
Hugging Face: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
Key Points:
🌟 Z-Image reached 500,000 downloads on its first day and quickly topped the Hugging Face trending chart.
🎨 With only 6 billion parameters, Z-Image achieves high-quality photorealistic results and supports mixed Chinese and English text rendering.
🚀 Z-Image-Turbo and Z-Image-Edit provide efficient image generation and editing capabilities.
