IBM has officially launched Granite 4.0 1B Speech, a compact speech language model designed for edge computing and enterprise deployment. It aims to provide efficient multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST).

Granite 4.0 1B Speech has half the parameters of its predecessor, yet it achieves significant performance improvements. The new model adds support for Japanese ASR, introduces a keyword biasing feature, and greatly improves the accuracy of English transcription. Its core design goal is to reduce memory usage, inference latency, and computational cost without sacrificing core capabilities.

The model uses a two-stage design. The system first converts audio into text, and the text is then processed by a dedicated Granite language model. This modular design lets developers arrange the pipeline according to their needs. The model currently supports English, French, German, Spanish, Portuguese, and Japanese, and can also handle translation from English to Chinese (Mandarin).
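The two-stage flow described here can be pictured as two interchangeable steps. The following is only a minimal sketch: `two_stage`, `Transcriber`, and `TextModel` are hypothetical names chosen for illustration and are not part of the Granite release or of any library. The real model is loaded through frameworks such as Transformers or vLLM, whose calls are not shown.

```python
from typing import Callable, Optional

# Hypothetical type aliases, for illustration only.
Transcriber = Callable[[bytes], str]  # stage 1: audio -> text
TextModel = Callable[[str], str]      # stage 2: text -> text


def two_stage(
    audio: bytes,
    transcribe: Transcriber,
    postprocess: Optional[TextModel] = None,
) -> str:
    """Run stage 1, then optionally hand the transcript to stage 2."""
    text = transcribe(audio)
    if postprocess is None:
        # Stage 2 is optional: the caller may only need the transcript.
        return text
    return postprocess(text)
```

Because each stage is passed in as a plain callable, a caller can run transcription alone or swap in a different second stage without touching the first.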

In performance testing, Granite 4.0 1B Speech ranked first on the Open ASR Leaderboard, with an average word error rate (WER) of just 5.52%. IBM has open-sourced the model under the Apache 2.0 license, and developers can deploy it locally using mainstream frameworks such as Transformers or vLLM, bringing speech capabilities to resource-constrained mobile and edge devices.
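For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below is a plain illustration of that definition. The function name `word_error_rate` is chosen here for illustration, and the whitespace tokenization is an assumption: benchmark suites usually normalize text (casing, punctuation) before scoring, which this example omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, a six-word reference with one wrongly transcribed word gives a WER of 1/6, or about 16.7%. A leaderboard average is typically this ratio aggregated over several test sets.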

Project: https://huggingface.co/ibm-granite/granite-4.0-1b-speech