Kling AI, a subsidiary of Kuaishou, launched version 2.6 on the first day of its Omni Ecosystem Week. The release introduces built-in audio generation for the first time, supporting bilingual dialogue, singing, and synchronized sound effects, and closing a one-click "text ⇄ video ⇄ audio" loop. The official slogan, "See the Sound, Hear the Visual," highlights its multimodal synchronization capabilities.
On the technical side, version 2.6 retains 10-second 1080p high-definition output and costs 25 points per 5 seconds of video, 30% less than the previous version. Its architecture, a diffusion transformer with 3D spatiotemporal joint attention, brings three improvements: adherence to complex instructions is up 15%, cross-shot character consistency reaches state-of-the-art (SOTA) levels, and the model outperforms Seedance 1.0 by 285% in blind testing.
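The pricing figures above can be worked through in a few lines. The following is a minimal sketch, not an official pricing calculator: the function names are hypothetical, and it assumes clips are billed in whole 5-second blocks and that the 30% reduction applies to the per-block price. Neither assumption is confirmed by the announcement.

```python
import math

POINTS_PER_BLOCK = 25   # from the article: 25 points per 5 seconds
BLOCK_SECONDS = 5       # from the article
REDUCTION = 0.30        # from the article: 30% lower than the previous version


def clip_cost(seconds: int) -> int:
    """Points for one clip, ASSUMING billing in whole 5-second blocks."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    blocks = math.ceil(seconds / BLOCK_SECONDS)
    return blocks * POINTS_PER_BLOCK


def implied_previous_block_price() -> float:
    """Per-block price before the cut, ASSUMING the 30% applies per block."""
    return POINTS_PER_BLOCK / (1 - REDUCTION)


if __name__ == "__main__":
    print(clip_cost(10))                              # 50
    print(round(implied_previous_block_price(), 1))   # 35.7
```

Under these assumptions, a full 10-second clip costs 50 points, and the implied earlier price is about 35.7 points per 5 seconds.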
On the market side, Kling 2.6 will launch first on professional platforms such as Artlist, offering scene-expansion and multi-element editing APIs aimed at film, short drama, advertising, and music video production. Kuaishou stated that by Q1 2026 it will release a 4K/60fps version and open a custom voice library, continuing to lower the barrier to "AI filmmaking."
Industry observers believe that audio synchronization fills the last gap in AI video generation and expect post-production editing time to be cut by more than 50%. With the launch of Kling 2.6, competition among AI creation tools is expanding from "visuals" to "sound," which could trigger a new wave of short videos with generated audio.
