Breaking the "English-centric" barrier in semantic representation has become the new battleground for large model evolution.
On March 26, the new embedding model family was released as open source.

Outstanding Performance: 11 SOTA Wins on MTEB
On MTEB, the most authoritative benchmark for evaluating embedding models, the new release delivered a standout performance:
11 Championships: It ranked first on 11 language and domain leaderboards, including German, French, Japanese, and code retrieval.
Competitive Edge: Even the lightweight members of the family repeatedly beat well-known industry models of the same size.
Comprehensive Coverage: The evaluation spanned 430 sub-tasks, from medical Q&A to code retrieval.
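Retrieval leaderboards like these score a model by embedding queries and documents and checking whether the relevant document ranks near the top. As a simplified illustration (MTEB itself uses richer metrics such as nDCG@10), here is a minimal recall@k computation over toy vectors:

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant_idx, k=1):
    """Fraction of queries whose relevant document appears in the
    top-k results when ranked by cosine similarity."""
    # Normalize rows so a plain dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                          # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k] # indices of the k best docs
    hits = [rel in row for rel, row in zip(relevant_idx, topk)]
    return float(np.mean(hits))

# Toy embeddings: query i is a slightly perturbed copy of document i,
# so a good metric should report a perfect score.
rng = np.random.default_rng(0)
docs = rng.normal(size=(4, 32))
queries = docs + 0.01 * rng.normal(size=(4, 32))
print(recall_at_k(queries, docs, relevant_idx=[0, 1, 2, 3], k=1))  # → 1.0
```

Real evaluations run this over hundreds of thousands of query–document pairs per task; the ranking logic is the same.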

All-Round Understanding: Mastering 282 Natural Languages and Over 40 Programming Languages
The model's strength extends well beyond English:
Multi-language Enhancement: It particularly strengthens support for medium- and low-resource languages (such as Nordic and Southeast Asian language families), achieving true global coverage.
Programming Expertise: It deeply understands over 40 programming languages, such as Python, Java, and Go, making it well suited to RAG (Retrieval-Augmented Generation) and code-assistant development.
High-Quality Data: It is trained on 60 million rigorously cleaned, publicly available samples, ensuring both the quality and the breadth of the model's knowledge.
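In a RAG pipeline, the embedding model's only job is to map queries and passages into vectors whose cosine similarity reflects relevance; everything downstream is plain vector search. A minimal sketch of that flow, where `embed()` is a deterministic stand-in for the real model's encode call (not its actual API):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding function: hashes the text into a fixed unit
    vector. A real pipeline would call the model's encode API here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

corpus = [
    "def quicksort(arr): ...",
    "Aspirin is commonly used to reduce fever.",
    "La tour Eiffel se trouve a Paris.",
]
doc_vecs = np.stack([embed(t) for t in corpus])

def retrieve(query: str, k: int = 1):
    """Return the top-k corpus passages by cosine similarity."""
    sims = doc_vecs @ embed(query)
    return [corpus[i] for i in np.argsort(-sims)[:k]]

# An exact duplicate of a stored passage retrieves itself perfectly.
print(retrieve(corpus[1])[0])
```

With a real multilingual, code-aware model, the same loop lets a French query find an English passage or a natural-language question find the right function body.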

Extreme Efficiency: A Full Model Family from 80M to 14B Parameters
To cover scenarios from mobile devices to cloud deployment, the family spans a full range of sizes:
Mobile-Friendly: The small models, from 80M to 330M parameters, are built with model pruning and knowledge distillation so they run smoothly on mobile devices.
Nested Embeddings: Dynamic dimension adjustment lets users freely trade off between as few as 8 dimensions and the full dimensionality, striking the right balance between inference speed and storage cost.
Completely Open Source: Setting a New Standard for Transparency
Unlike many "black box" models, this release is open in every respect:
Full Release: All model weights of every size are available for download.
Transparent Details: It publishes a complete technical report documenting the entire training process.
Reproducibility: It releases all code and checkpoints, so researchers worldwide can build directly on them.
Conclusion: Breaking Barriers, Exploring the Infinite Possibilities of AI
As another major achievement of the open-source ecosystem, this model family breaks the English-centric barrier in semantic representation and opens the door to new AI applications worldwide.
