ElevenLabs, a leading company in multimodal AI, has officially announced that its new "Image & Video" platform is now live. The product is no longer just a voice tool but an all-in-one AI content studio that combines image generation, video generation, voice synthesis, music creation, and sound design. Creators and marketers no longer need to switch between multiple platforms and can take a commercial video from script to final cut within a single platform.


One-stop workflow: from zero to finished video on a single platform

The new platform combines visual generation with ElevenLabs' audio capabilities. Users first generate images and video, then add professional-grade narration, background music, and ambient sound effects within the same interface, without leaving the workflow. According to ElevenLabs, going from a concept to a marketing video that is ready to publish can take as little as a few minutes.

Model lineup: top visual and audio models in one place

The Image & Video platform brings together a lineup of leading multimodal models, including:

Google Veo (ultra-long consistency video)

OpenAI Sora (cinematic image quality)

Kling (super realistic physics animation)

Nanobanana, Flux Kontext, Seedream, and other rising stars

These models work alongside ElevenLabs' in-house natural AI voices and its latest music generation models. Users can freely pair the "best visuals" with the "best audio," a combination intended to outperform what any single model delivers on its own.

Designed for business: a big deal for marketers and short-video creators

The platform is deeply optimized for creators and marketers:

Exports directly in vertical and horizontal aspect ratios for Douyin, Xiaohongshu, TikTok, and YouTube

Includes a commercially safe voice and music library, so generated content can be used directly in advertising

Swaps the voiceover language in one click to create multilingual versions

Provides a full timeline editor with frame-accurate synchronization of audio and video

Real-world results: a 30-second brand ad in 5 minutes

According to the official demo, a script for a 30-second ad is all that is needed to complete the following steps within the platform:

Generate brand storyboard images → Convert them into smooth video → Add a natural, CEO-style voiceover → Layer in emotive background music and ambient sound effects → Export a 4K, commercial-ready video. At no point is it necessary to move files between Premiere, Midjourney, Runway, and Suno.

AIbase Editorial Department Comment:

ElevenLabs' move raises the ceiling for "text to video" yet again. More striking still, it tackles audio-visual synchronization, one of the hardest problems in the field, in a single step. With visual generation and audio generation now combined on one platform, independent creators and small and medium-sized businesses face an era of real disruption. How many editors and voice actors might this update put out of work?