Meta's Fundamental Artificial Intelligence Research (FAIR) team recently launched an automatic speech recognition system called Omnilingual ASR, capable of transcribing speech in over 1600 languages. Until now, most speech recognition tools have focused on a few hundred resource-rich languages, leaving thousands of the world's more than 7000 languages with little or no AI support.


The launch of Omnilingual ASR aims to fill this gap. Meta states that 500 of the 1600 supported languages have never been covered by any AI system. The FAIR team hopes the system is a step toward a "universal transcription system" that helps break down global language barriers.

The system's accuracy depends on the amount of available training data. According to Meta, Omnilingual ASR achieves a character error rate (CER) below 10% on 78% of the 1600 languages tested. Among languages with at least 10 hours of training audio, 95% meet this bar; even among "low-resource" languages with less than 10 hours of training audio, 36% stay below a 10% CER.
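To make the 10% threshold concrete, here is a minimal sketch of the standard CER computation (Levenshtein edit distance over characters, divided by reference length); the `cer` helper is illustrative, not part of Meta's release:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length, as a percentage."""
    ref, hyp = list(reference), list(hypothesis)
    # Levenshtein distance via dynamic programming over a rolling row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return 100.0 * prev[-1] / max(len(ref), 1)

# One deleted character out of 11 gives a CER just under the 10% bar.
print(round(cer("hello world", "helo world"), 1))  # → 9.1
```

A CER below 10% thus means fewer than one character in ten of the reference transcript needs correcting.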

To further support research and practical applications, Meta also released the Omnilingual ASR corpus, a large dataset of transcribed speech in 350 underrepresented languages. The data are provided under a Creative Commons (CC-BY) license to help developers and researchers build or adapt speech recognition models for specific local needs.

A key feature of Omnilingual ASR is the "language-in-the-box" option, which uses in-context learning. Users need only provide a small number of paired audio and text samples, and the system learns directly from these examples, without retraining and without significant computational resources. Meta states that this approach could in principle extend Omnilingual ASR to over 5400 languages, far exceeding current industry coverage. Although recognition quality for these under-supported languages does not yet match fully trained systems, it offers a practical path for communities that have never had access to speech recognition technology before.
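The in-context workflow can be sketched as follows. This is a hypothetical illustration of the data shape only; the `Example` type and `build_context` helper are invented for this sketch and the real Omnilingual ASR API may differ:

```python
from dataclasses import dataclass

@dataclass
class Example:
    audio_path: str   # a short recording in the target language (hypothetical field)
    transcript: str   # its ground-truth text (hypothetical field)

def build_context(examples: list[Example], query_audio: str) -> dict:
    """Package few-shot audio/text pairs plus a query utterance into one request.

    No weights are updated: the model conditions on the paired examples at
    inference time, which is why no retraining or large compute is needed."""
    return {
        "context": [(e.audio_path, e.transcript) for e in examples],
        "query": query_audio,
    }

request = build_context(
    [Example("sample1.wav", "transcript one"),
     Example("sample2.wav", "transcript two")],
    "new_utterance.wav",
)
print(len(request["context"]))  # → 2
```

The point is the protocol, not the API: a handful of paired examples ride along with each request instead of being baked into the model by training.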

Meta has released Omnilingual ASR as an open-source project under the Apache 2.0 license, allowing researchers and developers to freely use, modify, and build on the models, including for commercial purposes. The model family ranges from a lightweight 300-million-parameter version to a top-accuracy 7-billion-parameter version. All models are built on FAIR's PyTorch-based framework, and demonstrations are available on the official website.

Demo: https://aidemos.atmeta.com/omnilingualasr/language-globe

Key Points:

🌍  Meta launched the Omnilingual ASR system, supporting speech recognition in over 1600 languages, aiming to fill the gap in AI language recognition.

📊  Accuracy depends on training data: most supported languages achieve a character error rate below 10%, and even 36% of low-resource languages meet that bar.

📦  Omnilingual ASR is an open-source project that provides a rich dataset, supporting developers in building models for local needs.