NetEase Youdao Launches Open-Source Speech Synthesis Engine 'Yimosheng' Supporting Over 2,000 Voices


The Zhipu team has open-sourced four core technologies - the GLM-4.6V visual understanding, AutoGLM device control, GLM-ASR speech recognition, and GLM-TTS speech synthesis models - showcasing its latest progress in the multimodal field.
Amid the rapid development of speech synthesis technology, Mianbi Intelligence and the Human-Computer Speech Interaction Laboratory of Tsinghua University's Shenzhen International Graduate School (THUHCSI) recently released a new speech generation model, VoxCPM. With a parameter size of 0.5B, the model aims to deliver high-quality, natural-sounding speech synthesis. Its release marks another milestone in high-fidelity speech generation, with the model reaching industry-leading levels on key metrics such as naturalness, voice similarity, and prosodic expressiveness.
Recently, the French AI laboratory Kyutai announced that it has officially open-sourced its new text-to-speech model, Kyutai TTS, giving developers and researchers worldwide a high-performance, low-latency speech synthesis solution. The release not only advances open-source AI technology but also opens up new possibilities for multilingual voice interaction applications. AIbase provides an exclusive analysis of this technological highlight and its potential impact. With its ultra-low latency, Kyutai TTS offers a new real-time interaction experience and has become an industry standout for its performance.
On July 3rd, the French AI research institution Kyutai Labs announced the open-sourcing of its latest text-to-speech (TTS) technology, Kyutai TTS, offering developers and AI enthusiasts an efficient, real-time speech generation solution. Kyutai TTS combines low latency with high-fidelity audio and supports streaming text input, so audio generation can begin before the full text is available, making it especially well suited to real-time interaction scenarios. The model performs well in practice, running on a single NVIDIA L40S GPU.
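To make the streaming idea concrete, here is a minimal conceptual sketch of the pattern described above: text is pushed to the synthesizer piece by piece, and audio chunks are consumed as soon as they are produced rather than after the full text is known. The `StreamingTTSClient` class and its methods are hypothetical placeholders for illustration only, not the actual Kyutai TTS API; consult the official repository for the real interface.

```python
# Conceptual sketch of streaming text-to-speech consumption.
# StreamingTTSClient is a hypothetical stand-in, NOT the Kyutai TTS API.

import queue
import threading


class StreamingTTSClient:
    """Hypothetical streaming TTS engine used to illustrate the data flow."""

    def __init__(self):
        self._audio_chunks = queue.Queue()

    def feed_text(self, text_piece: str) -> None:
        # A real engine would start synthesizing immediately; here we emit a
        # dummy audio chunk per text piece to show that audio can be produced
        # before the remaining text arrives.
        self._audio_chunks.put(b"\x00" * 3200)  # ~100 ms of 16 kHz 16-bit silence

    def close(self) -> None:
        self._audio_chunks.put(None)  # sentinel: no more audio will follow

    def audio_stream(self):
        # Yield audio chunks as they become available.
        while True:
            chunk = self._audio_chunks.get()
            if chunk is None:
                break
            yield chunk


def speak_incrementally(text_pieces):
    client = StreamingTTSClient()

    def producer():
        for piece in text_pieces:
            client.feed_text(piece)  # push text as soon as it is available
        client.close()

    threading.Thread(target=producer, daemon=True).start()

    # Audio chunks can be played (or written to a file) while later text
    # is still being generated or typed.
    for chunk in client.audio_stream():
        print(f"received {len(chunk)} bytes of audio")


if __name__ == "__main__":
    speak_incrementally(["Hello, ", "this text arrives ", "one piece at a time."])
```

The point of the sketch is the decoupling: the producer thread feeds text fragments while the consumer loop plays audio as it arrives, which is what keeps end-to-end latency low in real-time interaction.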
Sesame's newly released Conversational Speech Model (CSM) has recently sparked heated discussion on X, where it has been lauded as a voice model that sounds "just like a real person." Its naturalness and emotional expressiveness make it hard for users to distinguish from human speech, and Sesame claims the model has overcome the uncanny valley effect in voice technology. As demonstration videos and user feedback spread, CSM is rapidly emerging as a leader in AI voice technology.