The Baoling large model series under Ant Group received a major update today with the open-source release of Ling-2.6-flash, an Instruct model with a total parameter count of 104B and an activated parameter count of 7.4B.

Technical Highlights: Hybrid Architecture and Extreme Efficiency
Hybrid Linear Architecture: Through underlying computational optimization, the model achieves excellent inference speed: on 4 H20 GPUs its decode speed reaches up to 340 tokens/s, and its Prefill (pre-fill) throughput is 2.2 times that of Nemotron-3-Super, significantly reducing response latency.
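To put the reported decode speed in perspective, the figure below is a back-of-the-envelope latency sketch. The 340 tokens/s rate is the article's reported figure on 4 H20 GPUs; the 2,000-token response length is an illustrative assumption, not from the source.

```python
# Back-of-the-envelope decode-latency estimate (a sketch, not a benchmark).
# 340 tokens/s is the article's reported decode speed on 4 H20 GPUs;
# the response length below is an illustrative assumption.
def decode_time_seconds(num_tokens: float, tokens_per_second: float = 340.0) -> float:
    """Time to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second

# A 2,000-token reply would stream in roughly 6 seconds at this rate.
print(round(decode_time_seconds(2000), 2))
```

At a constant 340 tokens/s, even long-form responses stay in the single-digit-second range, which is where the "significantly reducing response latency" claim comes from.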
Outstanding "Smart Efficiency Ratio": The development team calibrated token efficiency in depth during training. Evaluation data show that, for tasks of the same quality, Ling-2.6-flash consumes only about 15M tokens, roughly one-tenth of similar competitors, greatly reducing commercial costs.
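The cost implication of that one-tenth token-consumption claim can be sketched with simple arithmetic. The per-million-token price below is a made-up placeholder for illustration only, not a quoted rate for any model.

```python
# Illustrative cost comparison based on the article's efficiency claim:
# ~15M tokens for a task set vs. ~10x that for comparable models.
PRICE_PER_M_TOKENS = 2.0  # hypothetical USD per million tokens (assumption)

def task_cost_usd(tokens_millions: float, price: float = PRICE_PER_M_TOKENS) -> float:
    """Cost of a workload given its token consumption in millions."""
    return tokens_millions * price

efficient = task_cost_usd(15)    # claimed consumption: ~15M tokens
baseline = task_cost_usd(150)    # one order of magnitude more, per the claim
print(efficient, baseline, baseline / efficient)
```

Whatever the actual per-token price, a 10x reduction in tokens consumed translates directly into a 10x reduction in inference spend for the same workload.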
Scenario Deepening: Targeted Enhancement of Agent Capabilities
For agent (intelligent agent) scenarios, among the most widely used applications of large models, the team made targeted enhancements to the model's agent capabilities.
Currently, developers can access the model's open-source resources through Hugging Face and ModelScope, further exploring its potential in various industry applications.
