On April 7, the Microsoft Bing team officially announced the open-source release of a new series of text embedding models called "Harrier," aiming to reshape the foundations of global search, retrieval, and AI agents. The Harrier series comprises three versions, with the flagship 27B model ranking first on the multilingual MTEB v2 benchmark, surpassing mainstream proprietary models from OpenAI, Amazon, and Google (Gemini).

The model's technical foundations meet a high industrial standard: Harrier supports more than 100 languages and offers a context window of up to 32,000 tokens. On the training side, Microsoft not only used more than 2 billion real examples but also augmented them with synthetic data generated by GPT-5. This combination of high-quality data gives Harrier a clear advantage in understanding complex contexts and handling long texts. Alongside the full 27-billion-parameter version, Microsoft simultaneously released smaller 0.6B and 2.7B versions to suit different computing environments, and all three are open-sourced on the Hugging Face platform under the MIT license.

Embedding models are a key technology for organizing and retrieving information in AI systems, and their performance directly determines the accuracy of RAG (Retrieval-Augmented Generation) systems. Microsoft plans to integrate the technology deeply into the Bing search engine and its new AI agent services. As artificial intelligence moves toward autonomous multi-step tasks, the open sourcing of Harrier not only gives developers a high-performance alternative to proprietary models but also marks a milestone for the open-source ecosystem in semantic representation, further accelerating the deployment of AI agents in global multilingual environments.
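The retrieval step that embedding models power in a RAG pipeline can be sketched with a minimal, self-contained example: documents and the user's query are mapped to vectors, and the documents whose vectors are most similar to the query vector are fetched. The tiny hand-made 3-dimensional vectors and document names below are illustrative stand-ins, not output from Harrier or any real model, which would produce much higher-dimensional embeddings:

```python
import math

# Toy corpus: each "embedding" here is a hand-made stand-in for the
# high-dimensional vector an embedding model would produce per passage.
documents = {
    "doc_a": [0.9, 0.1, 0.0],   # e.g. a passage about search engines
    "doc_b": [0.1, 0.8, 0.3],   # e.g. an unrelated passage
    "doc_c": [0.7, 0.2, 0.1],   # e.g. a passage about web indexing
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, docs, top_k=2):
    """Rank documents by similarity to the query embedding, keep top_k."""
    ranked = sorted(docs,
                    key=lambda d: cosine_similarity(query_vec, docs[d]),
                    reverse=True)
    return ranked[:top_k]

query = [0.8, 0.15, 0.05]  # stand-in for the embedded user question
print(retrieve(query, documents))  # → ['doc_a', 'doc_c']
```

In a real RAG system, the `top_k` retrieved passages would then be inserted into the generation model's prompt; the quality of the embedding model determines whether the right passages surface at this step.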
