While the global AI race is still fiercely contested among autoregressive large models such as GPT-5 and Gemini, a new startup is quietly breaking through with a disruptive architecture. Inception, an AI company led by Stanford University professor Stefano Ermon, recently announced a $50 million seed round led by Menlo Ventures, with participation from Microsoft M12, NVIDIA NVentures, Snowflake Ventures, Databricks Investment, and Mayfield. Andrew Ng and Andrej Karpathy have joined as angel investors, rounding out a remarkable lineup.
Inception's core bet is to bring diffusion models, originally developed for image generation, into text and code, challenging the dominant autoregressive paradigm. Ermon points out that models like GPT and Gemini generate output through word-by-word prediction, a strictly sequential process that caps speed and efficiency. Diffusion models instead refine the entire output through parallel, iterative denoising, which gives them a marked advantage on large codebases and long documents (see the sketch below).
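To make the contrast concrete, here is a minimal Python sketch of the two decoding loops. The functions `predict_next_token` and `denoise_all_positions` are hypothetical stand-ins for real model calls, not Inception's actual API; the point is only the shape of the computation.

```python
# Minimal sketch contrasting the two decoding paradigms.
# `predict_next_token` and `denoise_all_positions` are hypothetical
# stand-ins for real model forward passes.

def autoregressive_decode(prompt_tokens, n_new, predict_next_token):
    # Tokens are produced one at a time: each step depends on all
    # previously generated tokens, so the loop is forced to run
    # sequentially, n_new times.
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        tokens.append(predict_next_token(tokens))
    return tokens

def diffusion_decode(prompt_tokens, n_new, denoise_all_positions, steps=8):
    # All output positions start as placeholders ("noise") and are
    # refined together: each pass updates every position at once,
    # so only a handful of sequential steps are needed.
    MASK = -1                      # placeholder id for an undecided token
    draft = [MASK] * n_new
    for _ in range(steps):
        draft = denoise_all_positions(list(prompt_tokens), draft)
    return list(prompt_tokens) + draft
```

The autoregressive loop needs one sequential step per generated token, while the diffusion loop needs only a fixed number of passes regardless of output length, trading loop iterations for parallel work per pass.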
The idea has already shipped as a product: alongside the funding announcement, the company launched Mercury, its latest model, built specifically for software development and already integrated into developer tools such as ProxyAI, Buildglare, and Kilo Code. Testing shows Mercury sustains inference speeds of over 1,000 tokens per second on tasks such as code completion, refactoring, and cross-file understanding, far outpacing existing autoregressive models. "Our architecture is inherently designed for parallelism," Ermon emphasized. "It is faster, more efficient, and extremely cost-friendly in terms of compute."
Why are diffusion models suitable for code?
Code differs from natural language: it is structured, depends on global context, and often requires cross-file associations. Because they generate token by token, autoregressive models can lose track of overall logical consistency on such tasks. Diffusion models instead start from "noise" and converge on the target output through repeated global refinement passes, which makes them a natural fit for highly structured data; the toy example below illustrates the loop. Their parallelism also lets them fully utilize GPU/TPU clusters, cutting latency and energy consumption and directly addressing the cost pressures in today's AI infrastructure.
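As a loose illustration of that noise-to-output process, the toy Python example below commits the most confident positions in parallel each round and reconsiders the remaining masked positions with the newly committed context, in the spirit of masked/discrete diffusion decoding. The scoring function is a random stub invented for this sketch; a real diffusion LLM would score all masked positions with a single parallel forward pass.

```python
import random

def toy_model_scores(tokens, vocab):
    # Hypothetical stand-in: propose a candidate token and a confidence
    # for every masked position simultaneously (one "parallel" pass).
    return {
        i: (random.choice(vocab), random.random())
        for i, t in enumerate(tokens)
        if t is None
    }

def refine(length, vocab, steps=4):
    tokens = [None] * length            # start from pure "noise" (all masked)
    for _ in range(steps):
        scores = toy_model_scores(tokens, vocab)
        if not scores:
            break
        # Commit the most confident half of the masked positions this
        # round; the rest stay masked and are re-scored next pass.
        ranked = sorted(scores.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _) in ranked[: max(1, len(ranked) // 2)]:
            tokens[i] = tok
    # Fill any positions still masked after the scheduled passes.
    for i, (tok, _) in toy_model_scores(tokens, vocab).items():
        tokens[i] = tok
    return tokens

print(refine(8, vocab=["def", "foo", "(", ")", ":", "return", "x", "\n"]))
```

Real systems vary in whether committed tokens can later be revisited; this sketch keeps them fixed for simplicity.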
Why are the giants betting on it?
As the costs of AI training and inference soar, efficiency has become the new battleground. Investors such as Microsoft, NVIDIA, and Databricks are all building out AI development stacks and urgently need a high-performance, low-cost model foundation. Inception's approach may offer a new path to commercializing large models: less compute, higher throughput.
AIbase believes Inception's rise signals that the exploration of AI architectures has entered deeper waters. As the marginal returns of the parameter race diminish, paradigm innovation at the foundations will become the key to a breakthrough. If diffusion-based LLMs continue to prove their advantages in high-value scenarios such as code, research, and finance, this technological shift, begun in a Stanford laboratory, may reshape the future landscape of generative AI.
