MiniMax has announced the launch of Speech2.5, its new speech generation model. The company says the model sets a new benchmark in speech technology and strengthens its claim to the world's strongest speech model. Speech2.5 brings significant improvements in three areas: multilingual expressiveness, voice replication, and language coverage.
Compared with Speech02, released in May this year, Speech2.5 makes a leap in multilingual expressiveness. According to MiniMax, it remains best-in-class in Chinese while improving across the board in English and other languages. The model surpasses its predecessor in word error rate, speaker similarity, and natural rhythm, and lets users switch easily among 40 languages. Whether for business meetings, daily conversations, or English podcasts, it delivers more natural, fluent speech and is designed to eliminate the "mechanical feel" common in earlier speech synthesis.
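Word error rate (WER), one of the metrics cited above, is conventionally defined as the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The Python sketch below illustrates that standard definition only. It is not MiniMax's evaluation pipeline, and the function name `word_error_rate` is a hypothetical chosen for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch of the standard WER definition; the name and
    whitespace tokenization are assumptions, not part of any MiniMax API.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

In a typical speech synthesis evaluation, the hypothesis would come from running a speech recognizer over the generated audio, so a lower WER suggests the synthesized speech is more intelligible. Languages without whitespace word boundaries, such as Chinese, are usually scored at the character level instead.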
In voice replication, MiniMax says Speech2.5 has reached industry-leading precision. It can carry an accent across languages, preserve regional accents within a single language, and accurately reproduce the voices of speakers in distinctive age groups. Whether in extreme scenarios or when switching between languages, Speech2.5 maintains highly realistic vocal detail. For example, when a voice modeled on the Queen of England's classic pronunciation is used to introduce Speech2.5, the model reproduces her distinctive pauses, rhythm, and articulation, and it retains her accent even when switching between Italian and English.
In addition, Speech2.5 significantly expands language coverage over the previous version, to 40 languages. Newly added languages include Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, and Afrikaans. This expansion gives Speech2.5 a clear advantage in global content creation: users can quickly generate high-quality multilingual voice content for cross-border e-commerce, overseas customer service, or localized marketing.
The release of Speech2.5 opens up convenience and new opportunities across multiple industries. For enterprise customers, the cost of multilingual customer service and international advertising dubbing is expected to fall sharply. Dubbing product promotions for global markets used to be expensive and slow, but the audio can now be generated in just 10 minutes. For creators, realistic personal voice replication makes it easier to produce short videos for a global audience, letting one person speak 40 languages. Educators benefit as well: the production cycle for courseware in less commonly taught languages shrinks from weeks to 10 minutes, and customizing cross-border dialect textbooks becomes more convenient.
Building on Speech02, Speech2.5 further improves performance while, according to MiniMax, continuing to offer the best price-performance ratio globally. MiniMax Speech models are already widely adopted worldwide, including by overseas agent platforms such as Vapi and Pipecat and by leading AI applications such as Hedra, Icon, and Syllaby. In China, leading platforms and products such as Gaotu Education, Ximalaya, NetEase, and Rokid glasses have also integrated MiniMax Speech.
MiniMax Open Platform:
minimaxi.com/platform_overview
MiniMax Audio:
minimaxi.com/audio