Audio Creation Achieves New Breakthrough! Stability AI Launches Stable Audio 3: Subsecond Generation of Long Audio

Published in Latest AI News

Time: May 27, 2026

Read: 4 minutes

Stability AI has officially released Stable Audio 3, its latest-generation large audio model, and has open-sourced part of the model weights at the same time. A latent diffusion model designed specifically for audio generation and editing, the system supports high-quality stereo output and delivers a significant improvement in generation speed.

The new model family spans a range of sizes, from small to large, to cover needs such as music creation and sound effect production. Notably, the models support variable-length audio generation and introduce an audio editing feature based on inpainting, which lets creators regenerate a selected portion of a clip while leaving the rest intact.

Innovative Architecture Breaks Hardware Limitations

Stable Audio 3 consists of two core components: a semantic-acoustic autoencoder called SAME and an efficient diffusion transformer. The SAME autoencoder compresses audio by a factor of up to 4096, which greatly shortens the latent sequence the diffusion transformer has to process.

Thanks to this compression, the model can handle long-duration audio generation smoothly even on ordinary consumer-grade hardware. This lowers the technical barrier to high-quality audio creation and makes professional-level audio and video production at home feasible for individual creators.
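To give a rough sense of what a 4096x compression factor means for sequence length, here is a minimal sketch. It assumes the factor is a purely temporal downsampling ratio and that the audio is sampled at 44.1 kHz; neither detail is stated in the article, and the function name `latent_length` is hypothetical.

```python
import math


def latent_length(duration_s: float,
                  sample_rate: int = 44_100,   # assumed sample rate, not from the article
                  downsample: int = 4096) -> int:
    """Approximate number of latent frames for a clip.

    Assumes the 4096x figure is a temporal downsampling ratio, so one
    latent frame stands in for 4096 raw audio samples.
    """
    samples = int(round(duration_s * sample_rate))
    return math.ceil(samples / downsample)


if __name__ == "__main__":
    for seconds in (20, 380):
        print(f"{seconds:>3} s of audio: about {latent_length(seconds)} latent frames")
```

Under these assumptions, even a clip several minutes long maps to only a few thousand latent frames, which is the kind of sequence length a transformer can process comfortably.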

High Efficiency Enables Near-Instant Rendering

Because generation is variable-length, the model's computational cost scales with the requested audio duration, eliminating the compute previously wasted on fixed-length generation. In tests on high-performance hardware, the model rendered 20 seconds of audio in about 0.62 seconds and generated a 380-second piece of music in just 1.31 seconds.
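The two timings quoted above can be turned into a real-time factor, that is, seconds of audio produced per second of compute. The short sketch below only does that arithmetic on the article's figures; the helper name `real_time_factor` is hypothetical.

```python
def real_time_factor(audio_seconds: float, render_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock rendering time."""
    if render_seconds <= 0:
        raise ValueError("render_seconds must be positive")
    return audio_seconds / render_seconds


if __name__ == "__main__":
    # (audio duration, reported render time) pairs taken from the article.
    reported = [(20, 0.62), (380, 1.31)]
    for audio_s, render_s in reported:
        rtf = real_time_factor(audio_s, render_s)
        print(f"{audio_s} s in {render_s} s: about {rtf:.0f}x faster than real time")
```

The longer clip comes out far more efficient per second of audio, which suggests that a fixed overhead accounts for much of the short-clip render time, although the article does not break the timings down.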

In addition, thanks to a three-stage training process, Stable Audio 3 no longer relies on classifier-free guidance during inference and can therefore generate with a single forward pass, which makes it very fast. The small and medium model weights are currently available to the public on Hugging Face, while the larger, more capable version will be offered under a commercial license.
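To illustrate why dropping classifier-free guidance saves compute, the sketch below compares a standard guided prediction, which evaluates the model once without the prompt and once with it, against a guidance-free prediction that evaluates it once. `ToyDenoiser` is a hypothetical stand-in that only counts calls; it is not Stable Audio 3's model, and the article does not describe how the three-stage training removes the need for guidance.

```python
from typing import List, Optional


class ToyDenoiser:
    """Hypothetical stand-in for a diffusion model; counts forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def __call__(self, latent: List[float], prompt: Optional[str]) -> List[float]:
        self.forward_passes += 1
        shift = 0.0 if prompt is None else 0.5  # arbitrary toy conditioning effect
        return [0.9 * v + shift for v in latent]


def cfg_prediction(model: ToyDenoiser, latent: List[float],
                   prompt: str, scale: float) -> List[float]:
    """Classifier-free guidance: two forward passes, then extrapolate."""
    uncond = model(latent, None)    # pass 1: prompt dropped
    cond = model(latent, prompt)    # pass 2: prompt included
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def guidance_free_prediction(model: ToyDenoiser, latent: List[float],
                             prompt: str) -> List[float]:
    """Guidance-free inference: a single conditional forward pass."""
    return model(latent, prompt)


if __name__ == "__main__":
    guided, unguided = ToyDenoiser(), ToyDenoiser()
    cfg_prediction(guided, [1.0, -1.0], "rain on a tin roof", scale=3.0)
    guidance_free_prediction(unguided, [1.0, -1.0], "rain on a tin roof")
    print("passes with guidance:", guided.forward_passes)
    print("passes without guidance:", unguided.forward_passes)
```

Halving the number of model evaluations per prediction is the saving shown here; any further speedup from the training process itself is outside what this toy example captures.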
