Volcano Engine Launches Doubao Audio Generation Model 1.0: Generate Cinematic-Quality Audio with One Sentence, Character Voices Stay Consistent for 10 Minutes

Yesterday, Volcano Engine officially launched the Doubao Audio Generation Model 1.0 (Doubao-Seed-Audio 1.0), which accepts either text or audio as input and generates complete audio works end to end. The model's core breakthrough is that a single prompt covers dialogue, sound effects, and background music together, eliminating the traditional workflow of manual multi-track editing.

Turn a sentence into an "audio director," skipping all post-production

Previously, producing a high-quality audio piece meant generating dialogue, sound effects, and music one by one, aligning them by hand, and mixing multiple tracks, a complicated process that depended heavily on post-production skills. Doubao Audio Generation Model 1.0 compresses all of this into a single prompt: in one instruction, users can define multiple characters' lines, tone, and emotional rhythm, embed details such as laughter, sighs, pauses, and dialect accents, and have background music and ambient sound effects generated at the same time, with a finished product as the direct output. A creator can type a description and immediately receive a podcast, audiobook, or brand audio piece that is ready for release.

Long audio doesn't fall apart: character voices stay consistent from start to finish

The hardest problem in long-form audio creation is consistency: whether a character sounds the same in the first minute and in the tenth. Doubao Audio Generation Model 1.0 tightly integrates text-to-audio generation with reference audio, keeping voice quality consistent throughout long audio so that creators do not need to compare and revise segment by segment. The model currently supports 2 minutes of audio per generation, and its extension feature maintains consistent voice quality across longer generation, meeting the needs of audiobooks, podcasts, and long-running series.

In addition, the model supports decoupled control of voice and style, so the same voice can adapt to different emotions and contexts. This even enables "one voice, multiple roles," in which the same voice takes on different expressions under different role settings, significantly improving flexibility in character voice acting and creative audio production. Volcano Ark has now opened API testing, and individual users receive a 30-minute creation quota in the Experience Center. Doubao Audio Generation Model 1.0 will also be rolled out in products such as CapCut, Jimeng, and Tomato.

Doubao Audio Generation Model 1.0 Launched, Opening the Era of Audio Direction

Volcano Engine has launched Doubao Audio Generation Model 1.0, built on two core technologies, "Multimodal Reference Generation" and "Long-term Voice Consistency," which simplify traditional audio post-production. The model offers one-stop generation of dialogue, sound effects, and background music, improving creative efficiency.

ByteDance Doubao Launches Seed 2.1 Series: Coding and Agent Capabilities Comparable to GPT-5.5 on Three Metrics

ByteDance released the Seed 2.1 model family (Pro/Turbo) and Seed-Evolving, targeting the coding and agent era for complex engineering and scalable production. Upgrades cover coding delivery, long-horizon agent task execution, and multimodal understanding, with stronger self-planning and dynamic repair.....

Amazon Launches Beta Testing of Generative AI Assistant Alexa+ in Hindi in India

Amazon has launched a beta test of the Hindi version of Alexa+ in India. Selected users are being invited by email to fill out a form before June 22 to help refine the localized experience. Alexa+ is a generative AI assistant launched in 2025. After a gradual rollout in the United States, it became fully available in February of this year and has since expanded to markets including the UK, Canada, Brazil, Mexico, Italy, and Germany.

From Painting Cats to Understanding the Body: Midjourney Crosses into Healthcare, Launches Full-Body Ultrasound Scanner

Midjourney is entering healthcare, partnering with Butterfly Network to launch its first hardware product, the Midjourney Scanner. The full-body ultrasound scanner uses a ring-shaped sensor array to extend AI image generation capabilities to quantitative sensing of human body structures, marking the company's move from creative imagery into medical diagnosis.

WordPress VIP Releases AI Survival Report: Over 80% of Consumers Do Not Fully Trust AI-Generated Content, 42% Trust Content Less When Sources Are Missing

According to the WordPress VIP report, consumer trust in AI continues to decline: 60% are annoyed by brands using artificial intelligence for marketing, 86% do not fully trust AI, and 73% believe the internet needs more warmth. In addition, 42% of the audience place less trust in content without clear sources. Brands are finding it easier to earn AI citations, but harder to win trust.