Capable of speaking, singing, and even playing tricks! Xiaomi launches the MiMo-V2-TTS large model: dialects and emotions are handled effortlessly

Speech synthesis technology is making a qualitative leap from "mechanical repetition" to "emotional resonance." On March 19, Xiaomi officially launched Xiaomi MiMo-V2-TTS, its self-developed large speech synthesis model. It is not just a tool that lets machines "speak," but a "versatile voice actor" that can act, speak, and sing.

MiMo-V2-TTS is built on Xiaomi's self-developed Audio Tokenizer and a multi-codebook speech-text joint modeling architecture. After pre-training on hundreds of millions of hours of speech data, it offers multi-granularity control over speech style:

Emotion Master: The model supports precise adjustment at every level, from the overall tone down to local emotions. It can shift tone and subtly change emotion within a single sentence, closely reproducing the natural rhythm of human speech.
Cross-over Singer: Beyond speaking, it offers high-quality singing synthesis, rendering pitch and rhythm accurately with natural, expressive vocal technique.
Dialect Expert: To serve users in different regions, the model supports multiple dialects and accents, including Northeastern, Sichuan, Henan, Cantonese, and Taiwanese-accented speech, and can deliver them in character or in a chosen style.

Notably, MiMo-V2-TTS greatly lowers the cost of interaction. It recognizes punctuation, tone words (modal particles), and emphasis markers in the text and automatically renders them as appropriate speech, with no additional annotation or manual intervention required from the user.
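To make the kinds of textual cues mentioned above more concrete, here is a minimal rule-based sketch in Python. It is purely illustrative and is not how MiMo-V2-TTS works internally: the article describes a model that handles these cues on its own, and gives no implementation details. The names `Cue`, `extract_cues`, `PUNCTUATION_HINTS`, and `TONE_WORDS`, the `*word*` emphasis convention, and the hint labels are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical hint labels; the article does not say how cues are represented.
PUNCTUATION_HINTS = {
    ",": "short_pause",
    ".": "full_stop",
    "?": "rising_intonation",
    "!": "excited",
    "…": "trailing_off",
}

# A tiny, assumed list of tone words for demonstration only.
TONE_WORDS = {"wow", "hmm", "oh", "ah", "haha"}


@dataclass
class Cue:
    kind: str  # "punctuation", "tone_word", or "emphasis"
    text: str  # the matched text
    hint: str  # an illustrative prosody hint


def extract_cues(text: str) -> List[Cue]:
    """Collect punctuation, tone words, and *emphasized* spans from plain text."""
    cues: List[Cue] = []
    # Alternatives, in order: *emphasis* span, a word, a punctuation mark.
    pattern = re.compile(r"\*([^*]+)\*|(\w+)|([,.?!…])")
    for match in pattern.finditer(text):
        emphasized, word, mark = match.groups()
        if emphasized:
            cues.append(Cue("emphasis", emphasized, "stress"))
        elif word and word.lower() in TONE_WORDS:
            cues.append(Cue("tone_word", word, "expressive"))
        elif mark:
            cues.append(Cue("punctuation", mark, PUNCTUATION_HINTS[mark]))
    return cues


if __name__ == "__main__":
    for cue in extract_cues("Wow, that is *really* good!"):
        print(cue)
```

A hand-written rule set like this needs a separate entry for every cue, which is the kind of manual annotation effort the article says the model removes.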

For Xiaomi, the release of this large model marks a key milestone in its speech technology roadmap. Going forward, Xiaomi plans to support more languages beyond Chinese and English and to integrate the model deeply with the multimodal understanding capabilities of MiMo-V2-Omni.

When AI agents can not only understand the world but also describe it in a compelling human voice, the future of human-computer interaction comes into focus. As MiMo-V2-TTS rolls out, smart devices in the Xiaomi ecosystem will no longer be cold terminals but more "human-like" digital companions.

Say goodbye to complicated operations! iPadOS 27 brings comprehensive upgrades, turning the tablet into a computer in a flash

iPadOS 27 brings major upgrades to productivity and daily efficiency. It features revolutionary automation, search, and web browsing, with smarter multitasking that narrows the gap between tablet and PC. A key highlight is Magic Keyboard automation triggers, enabling custom actions based on connection state to streamline workflows...

Xiaomi announced that MiMo-V2-Pro/Omni will be discontinued in June 2026, and will fully switch to the V2.5 series

Xiaomi announced that the older MiMo-V2-Pro and MiMo-V2-Omni models will be discontinued on June 30, 2026, and replaced by the MiMo V2.5 series. MiMo-V2-Pro will migrate to V2.5-Pro, and MiMo-V2-Omni will be upgraded to the new V2.5 model. The new versions are already fully available, offering stronger reasoning and better cost-effectiveness, and developers are encouraged to migrate.
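For developers planning the switch, the migration described above could be captured in a small lookup like the Python sketch below. The model identifier strings, the `resolve_model` helper, and the choice to treat June 30, 2026 itself as the first day the old models are unavailable are all assumptions for illustration; the actual identifiers and cutoff behavior should be taken from Xiaomi's official announcement.

```python
from datetime import date

# Discontinuation date stated in the announcement.
RETIREMENT_DATE = date(2026, 6, 30)

# Hypothetical identifiers; the real API model names may differ.
MIGRATION_MAP = {
    "mimo-v2-pro": "mimo-v2.5-pro",
    "mimo-v2-omni": "mimo-v2.5",
}


def resolve_model(name: str, today: date) -> str:
    """Return the model to call, swapping in the V2.5 successor once the
    old model is assumed to be retired (on or after RETIREMENT_DATE)."""
    replacement = MIGRATION_MAP.get(name.lower())
    if replacement is not None and today >= RETIREMENT_DATE:
        return replacement
    return name


if __name__ == "__main__":
    print(resolve_model("mimo-v2-pro", date(2026, 6, 1)))
    print(resolve_model("mimo-v2-pro", date(2026, 7, 1)))
```

Names that are not in the map pass through unchanged, so the helper can wrap every model lookup without affecting other models.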

Investing 60 Billion in AI over Three Years, Xiaomi's Large Model Takes Two Global Firsts, Accelerating Intelligent Transformation

On May 26, Lei Jun, Chairman of Xiaomi Group, announced R&D spending of 9 billion yuan in Q1 2026, up 33.4% year-on-year, with over 26,000 R&D staff and annual R&D investment expected to exceed 40 billion yuan. Its self-developed large model, Xiaomi MiMo-V2.5-Pro, ranked first globally among open-source models in both comprehensive intelligence and Agent indices on the Artificial Analysis list...

The MIIT and Three Other Departments Jointly Released the National Standard 'Grading of Intelligence for Artificial Intelligence Terminals'

On May 8, the MIIT, the Administration for Market Regulation, and the Ministry of Commerce jointly released the national standard 'Grading of Intelligence for Artificial Intelligence Terminals', establishing a unified evaluation system. The standard adopts a '2+N' framework that clarifies the definition of smart terminals, the grading logic, and the testing methods. It defines four capability levels, from L1 (Response Level) to L4 (Collaboration Level). The L4 level will be refined further as the technology develops.
