Recently, the Qwen team at Alibaba Cloud released two new artificial intelligence models for generating and cloning voices. The first, Qwen3-TTS-VD-Flash, generates voices from detailed text descriptions, letting users precisely define characteristics such as emotion and speaking rhythm.

For example, users can request a "middle-aged man with a loud baritone voice: an energetic advertisement narrator with a fast speech rate, exaggerated intonation changes, and a persuasive sales delivery." According to the team, this model outperforms OpenAI's recently launched gpt-4o-mini-tts API.

The second model, Qwen3-TTS-VC-Flash, can clone a voice from just three seconds of audio and reproduce it in ten languages. Qwen claims this model's error rate is lower than that of competitors such as ElevenLabs and MiniMax.

In addition, the model can handle complex texts, imitate animal sounds, and extract voices from recordings. Both models are accessible via Alibaba Cloud's API, and demos of the voice-design and voice-cloning models are available on the Hugging Face platform.
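To make the API access concrete, the following Python sketch shows roughly what a request to the voice-design model could look like. The endpoint URL, environment variable, request fields, and response format are assumptions for illustration only and are not taken from Alibaba Cloud's documentation; consult the official API reference for the actual interface. The cloning model would presumably take a short reference audio clip in place of the text description.

```python
# Hypothetical sketch of calling the Qwen3-TTS-VD-Flash voice-design model.
# Endpoint URL, field names, and response shape are assumptions, not the
# documented Alibaba Cloud API.
import os

import requests

API_KEY = os.environ.get("ALIBABA_CLOUD_API_KEY", "your-api-key")   # assumed variable name
ENDPOINT = os.environ.get("QWEN_TTS_ENDPOINT", "https://example.com/v1/tts")  # placeholder URL

payload = {
    "model": "Qwen3-TTS-VD-Flash",
    "text": "Welcome to our summer sale, everything must go!",
    # Voice description taken from the example quoted in the article.
    "voice_description": (
        "Middle-aged man with a loud baritone voice: an energetic "
        "advertisement narrator with a fast speech rate, exaggerated "
        "intonation changes, and a persuasive sales delivery."
    ),
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# Assume the service returns raw audio bytes and save them to disk.
with open("ad_narrator.wav", "wb") as f:
    f.write(response.content)
```

This is only a shape sketch: the real service may return a JSON envelope with an audio URL rather than raw bytes, and authentication details may differ.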

Key Points:  

🌟 The new Qwen models support generating voices from text descriptions and cloning voices from short audio samples.  

🎤 Qwen3-TTS-VC-Flash can clone a voice from just three seconds of audio and supports ten languages.  

🚀 The models reportedly outperform competitors and are suited to handling complex texts and voice imitation.