Article

Latency below 0.2 seconds! Mistral AI releases Voxtral Transcribe 2 speech model with support for real-time Chinese transcription

Published in Latest AI News

Time :Feb 5, 2026

Read :4minute

French artificial intelligence startup Mistral AI recently announced the launch of a new series of speech-to-text models — Voxtral Transcribe2. This series includes two models optimized for different application scenarios, aiming to solve the pain points of high latency and cost in voice interaction.

Among them, the most attention-grabbing is the real-time transcription model called Voxtral Realtime. This model has a parameter scale of 4B (4 billion) and uses an innovative streaming architecture. Its core highlight is extreme response speed: the model can transcribe audio input instantly and synchronously. Official data shows that the transcription delay has been reduced to below 200ms (0.2 seconds). This means that in real-time conversation or simultaneous interpretation scenarios, users almost feel no processing pause. To promote the development of the developer community ecosystem, Mistral AI has officially opened the model weights under the Apache 2.0 license.

The other model, Voxtral Mini Transcribe V2, focuses on large-scale processing and high cost-effectiveness. This model is specifically designed for processing long audio, supporting recording files up to 3 hours in a single request. In terms of accuracy, Mistral's official statement says this model has surpassed GPT-4o mini Transcribe and Gemini2.5Flash.

In terms of language support and cost, both new models have excellent universality, supporting 13 mainstream languages including Chinese. The pricing strategy is also very competitive: the offline batch processing API costs $0.003 per minute, while the real-time version API, which pursues optimal performance, costs $0.006 per minute.

Key Points:

⚡ Extremely Low Latency: The Voxtral Realtime model reduces transcription delay to within 200ms, supports instant audio transcription, and has already open-sourced the model weights.
🏆 High Cost-Effectiveness: The Voxtral Mini version outperforms similar products like GPT-4o mini in accuracy, supports ultra-long recordings of 3 hours, and offers highly competitive pricing.
🌐 Multi-language Support: The entire series of models natively supports 13 languages including Chinese, widely adapting to globalized voice office and real-time interaction scenarios.

Related Recommendations

Publishers and Authors Sue Google for Allegedly Using Copyrighted Content to Train Gemini, Internal Documents Warned of Risk of Billions in Fines

Publishers and authors, including Hachette, Cengage, Elsevier, and writer Scott Turow, have collectively sued Google for allegedly using copyrighted works without permission to train its AI platform Gemini, and intentionally removing or altering copyright information to cover up the infringement. They claim that works were provided to Google only for limited purposes like book search, but Google used them to train AI models through Google Books a....

Jul 15, 2026

85.2k

OpenAI's First Hardware Product Exposed! Smart Speaker Will Become Your AI Companion Assistant

OpenAI will launch its first hardware product: a movable, screenless smart speaker positioned as a home AI companion assistant, offering a more human-like interactive experience. It supports smart home control, audio/video playback, Q&A, and messaging.....

Jul 15, 2026

137.7k

AI Daily: GPT5.6 Series Models Released, Codex Disappears; Tencent Plans to Take Over Manus as the Largest Shareholder; MiniMax Founder Announces Zero Salary Until Achieving AGI

AI Daily covers AI trends and product innovations. This issue: OpenAI updates its Chrome extension, allowing ChatGPT to live in the sidebar, read pages, control tabs, access local files, and summarize PDFs—no app switching needed. Limited to Plus and Pro users.....

Jul 10, 2026

377.6k

Outperforming Competitors! Google Gemini 3.5 Pro Revealed: Epic Upgrade Scheduled for July 17

Google will launch Gemini 3.5 Pro on July 17, featuring a 2-million-token context window and "Deep Think" reasoning mode. Aimed at advanced systems, it excels in complex agent workflows, action execution, sub-agent collaboration, coding, and long-duration tasks, outperforming the earlier Flash version.....

Jul 7, 2026

282.9k

Survey Shows More Than Half of Companies Regret Laying Off Employees Due to AI; Ford, IBM and Other Giants Rehire Human Workers

After aggressive AI-driven layoffs, the industry is returning to human-machine collaboration. A survey shows that among 39% of business leaders who laid off staff due to AI, 55% admit it was a mistake. Research indicates that focusing solely on tech replacement while neglecting employee training will lead to team failure due to lack of critical oversight.....

Jul 1, 2026

276.3k

Intelligent Future, Your Artificial Intelligence Solution Think Tank

English 简体中文繁體中文にほんご