On April 7, the Microsoft Bing team officially announced the open-source release of a new series of text embedding models called "Harrier," aiming to reshape the foundations of global search, retrieval, and AI agents. The Harrier series comprises three versions, with the flagship 27B model ranking first on the multilingual MTEB v2 benchmark, surpassing mainstream proprietary models from OpenAI, Amazon, and Google (Gemini).

The model's technical foundations meet a high industrial standard: Harrier supports more than 100 languages and offers a context window of up to 32,000 tokens. On the training side, Microsoft not only used more than 2 billion real examples but also augmented them with synthetic data generated by GPT-5. This combination of high-quality data gives Harrier a clear advantage in understanding complex contexts and handling long texts. Alongside the full 27-billion-parameter version, Microsoft simultaneously released smaller 0.6B and 2.7B versions to suit different computing environments, and all three are open-sourced on the Hugging Face platform under the MIT license.

Embedding models are a key technology for organizing and retrieving information in AI systems, and their performance directly determines the accuracy of RAG (Retrieval-Augmented Generation) systems. Microsoft plans to integrate the technology deeply into the Bing search engine and its new AI agent services. As artificial intelligence moves toward autonomous multi-step tasks, the open sourcing of Harrier not only gives developers a high-performance alternative to proprietary models but also marks a milestone for the open-source ecosystem in semantic representation, further accelerating the deployment of AI agents in global multilingual environments.
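The retrieval step that embedding models power in a RAG pipeline can be sketched with a minimal, self-contained example: documents and the user's query are mapped to vectors, and the documents whose vectors are most similar to the query vector are fetched. The tiny hand-made 3-dimensional vectors and document names below are illustrative stand-ins, not output from Harrier or any real model, which would produce much higher-dimensional embeddings:

```python
import math

# Toy corpus: each "embedding" here is a hand-made stand-in for the
# high-dimensional vector an embedding model would produce per passage.
documents = {
    "doc_a": [0.9, 0.1, 0.0],   # e.g. a passage about search engines
    "doc_b": [0.1, 0.8, 0.3],   # e.g. an unrelated passage
    "doc_c": [0.7, 0.2, 0.1],   # e.g. a passage about web indexing
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, docs, top_k=2):
    """Rank documents by similarity to the query embedding, keep top_k."""
    ranked = sorted(docs,
                    key=lambda d: cosine_similarity(query_vec, docs[d]),
                    reverse=True)
    return ranked[:top_k]

query = [0.8, 0.15, 0.05]  # stand-in for the embedded user question
print(retrieve(query, documents))  # → ['doc_a', 'doc_c']
```

In a real RAG system, the `top_k` retrieved passages would then be inserted into the generation model's prompt; the quality of the embedding model determines whether the right passages surface at this step.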
