Google officially released the experimental open-source language model DiffusionGemma on June 10, 2026, breaking with the traditional autoregressive paradigm in which large models generate text one token at a time. It is the first to bring the diffusion mechanism used in image-generation AI to text generation. Starting from random noise, the model refines its output over multiple iterations and produces a block of 256 tokens in parallel rather than emitting them one by one.
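
To make the idea of parallel, iterative refinement concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's actual algorithm: `toy_denoiser` is a hypothetical stand-in that already knows the answer and assigns random confidences, and the "commit the most confident positions each step" schedule is an assumption chosen for illustration. The point is only the shape of the loop: every position is proposed at once, and the block is firmed up over a few steps instead of one token at a time.

```python
import random


def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for a model.

    Proposes a token and a confidence for EVERY position in one call.
    A real model would predict tokens from the noisy block; this oracle
    simply returns the target token with a random confidence.
    """
    return [(target[i], rng.random()) for i in range(len(block))]


def generate_block(target, vocab, steps, seed=0):
    rng = random.Random(seed)
    n = len(target)
    block = [rng.choice(vocab) for _ in range(n)]  # start from random noise
    fixed = [False] * n                            # positions already committed
    per_step = -(-n // steps)                      # ceil(n / steps)

    for _ in range(steps):
        proposals = toy_denoiser(block, target, rng)
        # Rank the still-open positions by confidence, highest first.
        open_positions = [i for i in range(n) if not fixed[i]]
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        # Commit the most confident positions in parallel.
        for i in open_positions[:per_step]:
            block[i] = proposals[i][0]
            fixed[i] = True
    return block


vocab = list("abcdefghijklmnopqrstuvwxyz ")
result = generate_block(list("hello world"), vocab, steps=4)
print("".join(result))
```

With 11 positions and 4 steps, the loop commits up to 3 positions per step, so the whole block is settled in 4 passes rather than 11 sequential token steps.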

On the hardware side, after deep optimization by NVIDIA, the model runs nearly four times as fast as comparable traditional models in single-GPU, single-user mode. When serving a single request on an H100 GPU, its output speed can reach 1,000 tokens per second. Even on a high-end consumer GPU such as the RTX 5090, it can exceed 700 tokens per second.

DiffusionGemma has 26 billion parameters and uses a mixture-of-experts (MoE) architecture, with only 3.8 billion parameters activated in a single step. Although its text generation quality and accuracy fall slightly behind the traditional Gemma 4 series models on standard benchmarks, its "full-block awareness" removes a limitation of autoregressive models, in which each token can attend only to the tokens before it. Because all tokens in a block can refer to one another during generation, the model shows significant advantages on tasks involving nonlinear and structured data, such as text completion, code infilling, Sudoku solving, and amino acid sequence processing.
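
The difference between "looking backward only" and "full-block awareness" can be illustrated with attention masks. The sketch below is a simplified illustration, not DiffusionGemma's implementation: the function names are hypothetical, and it assumes that full-block awareness corresponds to every position in a block being visible to every other position.

```python
def causal_mask(n):
    """Autoregressive view: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_block_mask(n):
    """Assumed diffusion-style view: every position sees the whole block."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """Indices that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n = 5        # a tiny block of five tokens
middle = 2   # the position being filled in, e.g. a gap in code

print(visible_positions(causal_mask(n), middle))      # [0, 1, 2]
print(visible_positions(full_block_mask(n), middle))  # [0, 1, 2, 3, 4]
```

For a fill-in-the-middle task, the causal view cannot see the tokens to the right of the gap, while the full-block view can use context on both sides. This is one way to read why infilling and constraint puzzles such as Sudoku suit this style of generation.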

The model weights are currently available on Hugging Face under the Apache 2.0 license and are fully compatible with mainstream inference frameworks such as vLLM and MLX. The approach not only eases the memory-bandwidth bottleneck that limits how much GPU compute can be used but also opens a new technical path for future AI applications in complex logic and nonlinear text generation tasks.
