On July 3, the French AI research institute Kyutai Labs announced the open-sourcing of its latest text-to-speech (TTS) technology, Kyutai TTS, offering developers and AI enthusiasts an efficient, real-time speech generation solution. Kyutai TTS stands out for its low latency and high-fidelity voice quality, and it supports streaming text input: audio generation can begin before the full text is available, making it particularly well suited to real-time interactive scenarios.

Kyutai TTS delivers strong performance. On a single NVIDIA L40S GPU, the model can serve 32 concurrent requests with a latency of only 350 milliseconds. Beyond generating high-quality audio, the system also outputs precise word-level timestamps, which is useful for real-time subtitle generation and for interactive applications such as the interruption-handling feature on the Unmute platform.
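To illustrate how word-level timestamps enable live subtitling, here is a minimal sketch that groups per-word timings into subtitle cues. The input format (a list of `(word, start, end)` triples) is an assumption for illustration, not the library's actual output schema.

```python
# Hypothetical sketch: turning per-word timestamps, like those a TTS
# system emits alongside audio, into simple subtitle cues. The triple
# format used here is an assumed example, not Kyutai's real schema.

def words_to_cues(words, max_gap=0.5, max_words=6):
    """Group (word, start, end) triples into subtitle cues.

    A new cue starts when the pause before a word exceeds `max_gap`
    seconds or the current cue already holds `max_words` words.
    """
    cues = []
    current = []
    for word, start, end in words:
        if current and (start - current[-1][2] > max_gap
                        or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append((word, start, end))
    if current:
        cues.append(current)
    # Each cue becomes (cue_start, cue_end, joined text).
    return [
        (cue[0][1], cue[-1][2], " ".join(w for w, _, _ in cue))
        for cue in cues
    ]

timestamps = [
    ("Hello", 0.00, 0.35), ("world", 0.40, 0.80),
    ("this", 1.60, 1.85), ("is", 1.90, 2.00), ("streaming", 2.05, 2.60),
]
for start, end, text in words_to_cues(timestamps):
    print(f"[{start:.2f}-{end:.2f}] {text}")
# → [0.00-0.80] Hello world
# → [1.60-2.60] this is streaming
```

Because the timestamps arrive incrementally with the audio, cues like these can be rendered in sync with playback, which is what makes interruption handling and live captions possible.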

In terms of language support and quality, Kyutai TTS currently covers English and French, with word error rates (WER) of 2.82 and 3.29 respectively, indicating high intelligibility. Speaker similarity reaches 77.1% for English and 78.7% for French, so generated voices stay natural and close to the reference sample. The model can also handle long-form text, going beyond the 30-second limit typical of traditional TTS systems, which makes it suitable for generating long content such as news articles and audiobooks.

Kyutai TTS is built on a delayed streams modeling (DSM) architecture, paired with a Rust server for efficient batched inference. The source code and model weights are available on GitHub and Hugging Face, helping developers worldwide advance innovation in speech technology.
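The core idea behind delayed streams, reduced to a toy scheduler: the audio stream runs a fixed number of steps behind the text stream, so synthesis can start while text is still arriving. This sketch shows only the scheduling principle with placeholder tokens; it is not Kyutai's actual model, token format, or API.

```python
# Conceptual toy sketch of the delayed-streams idea: audio output lags
# the text input by a fixed `delay`, so generation overlaps with text
# arrival instead of waiting for the full input. All names here are
# illustrative assumptions, not Kyutai's real interfaces.

def delayed_schedule(text_tokens, delay=2):
    """Return (step, text_in, audio_out) tuples for a toy delayed stream.

    At each step one text token is consumed (while any remain), and once
    `delay` steps have passed, one audio frame is emitted for the text
    token seen `delay` steps earlier.
    """
    steps = []
    total = len(text_tokens) + delay  # drain the tail of the audio stream
    for step in range(total):
        text_in = text_tokens[step] if step < len(text_tokens) else None
        audio_out = (f"audio({text_tokens[step - delay]})"
                     if step >= delay else None)
        steps.append((step, text_in, audio_out))
    return steps

for step, t, a in delayed_schedule(["Bon", "jour", "le", "monde"], delay=2):
    print(step, t, a)
```

With a small delay, the first audio frames appear only a couple of steps after the first text tokens, which is the property that lets a streaming TTS keep latency low regardless of total input length.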