Cohere, which has been active in the enterprise AI market, officially launched an open-source speech recognition model called Cohere Transcribe on March 26, 2026.

The model has 2 billion parameters and is designed for edge devices, aiming to remove the latency bottleneck caused by the large size of earlier speech models. By releasing it under the Apache 2.0 license, Cohere is following a path similar to Meta's: relying on the developer community to build out the ecosystem quickly, with the eventual goal of feeding that adoption back into its commercial business.

A Performance Powerhouse at the Edge: Supports 14 Languages and Outperforms Mainstream Competitors

Cohere Transcribe was trained on 14 languages, including Chinese, Japanese, French, and Hebrew. According to the latest data from the Hugging Face Open ASR Leaderboard, the model has already surpassed competitors such as ElevenLabs Scribe and Alibaba's Qwen3.

Thanks to its relatively small parameter count, it can be deployed directly on end devices such as smartphones, PCs, or industrial gateways without frequent calls to cloud computing resources. This not only greatly reduces data transmission latency but also offers a more secure option for privacy-sensitive industries such as banking, sales, and healthcare.

Strategic Expansion from Text into Speech: Rebuilding the Foundation of AI Agent Interaction

Although Cohere has long focused on text generation, its move into speech recognition is seen as a key step toward building a full-featured AI agent. The company announced that Cohere Transcribe will soon be integrated into North, its AI agent orchestration platform.

Analysts point out that as Siri-style voice interaction becomes a starting point for the current AI wave, speech capabilities have become the essential "ears" through which AI agents perceive the world. With this "small but powerful" open-source strategy, Cohere is competing head-on in the edge computing and real-time speech translation market with IBM, Alibaba, and Zoom, which launched AI Companion 3.0.