A strong new competitor has entered AI voice synthesis: the startup Resemble AI has officially released an open-source text-to-speech model called "Chatterbox Turbo," aimed squarely at industry giants such as ElevenLabs and Cartesia.

According to Resemble AI, the model can accurately clone a target voice from just five seconds of reference audio and output the first audio segment in as little as 150 milliseconds. Latency this low makes it well suited to building real-time AI agents, automated customer support, dynamic game characters, virtual avatars, and social platform interactions. Resemble AI also claims that the model surpasses existing closed-source competitors in voice quality and gives developers more natural-sounding synthesis.
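For readers who want to check a "time to first audio" figure like the 150 milliseconds quoted above, the measurement itself is simple: start a timer, request synthesis, and stop when the first audio chunk arrives. The Python sketch below illustrates the idea under stated assumptions. `fake_streaming_tts` is a hypothetical stand-in that sleeps and yields placeholder bytes; it is not Chatterbox Turbo's API, and the numbers it produces say nothing about the real model.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_streaming_tts(text: str, chunk_delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine (not a real API).

    Yields one placeholder "audio" chunk per word after a simulated delay.
    """
    for _ in text.split():
        time.sleep(chunk_delay_s)  # simulate synthesis time per chunk
        yield b"\x00" * 320        # placeholder bytes, not real audio


def time_to_first_audio(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total chunk count)."""
    start = time.perf_counter()
    first_chunk_latency = None
    chunk_count = 0
    for _ in stream:
        if first_chunk_latency is None:
            # Record latency the moment the first chunk is available.
            first_chunk_latency = time.perf_counter() - start
        chunk_count += 1
    if first_chunk_latency is None:
        raise ValueError("stream produced no audio")
    return first_chunk_latency, chunk_count


if __name__ == "__main__":
    latency, chunks = time_to_first_audio(fake_streaming_tts("hello real time world"))
    print(f"first audio after {latency * 1000:.1f} ms, {chunks} chunks total")
```

To test a real engine, the stand-in generator would be replaced with whatever streaming interface that engine exposes. Network round trips and model loading would need to be included in or excluded from the timed region depending on what is being compared.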

For security and compliance, Chatterbox Turbo embeds a neural watermark called "PerTh," aimed at regulated industries. The watermark can be used to verify that a piece of speech was AI-generated, which helps address deepfake risks. More disruptively, Resemble AI has released the model under the MIT license: developers worldwide can test it freely on platforms such as Hugging Face, RunPod, Modal, Replicate, and Fal, and can also obtain the complete code on GitHub to modify and distribute commercially.

Resemble AI also offers managed services and plans to launch a further optimized, lower-latency version soon, aiming to reshape the competitive landscape of the voice synthesis market through an open-source ecosystem.