Ant Group's BaiLing team has officially announced the open-source release of the latest member of its model family, Ling-2.6-flash. The model ships in multiple precision variants, including BF16, FP8, and INT4, aiming to give developers worldwide more flexible hardware compatibility options and to further lower the barrier to AI deployment.
As a high-performance model, Ling-2.6-flash has 104B total parameters, of which 7.4B are activated. Before release, the model competed anonymously on mainstream international evaluation platforms and, drawing on developer feedback from that period, underwent multiple rounds of deep optimization for Chinese-English switching and code adaptation.
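For orientation, below is a minimal sketch of how one of these precision variants might be loaded with Hugging Face transformers. The repository id `inclusionAI/Ling-2.6-flash` is an illustrative assumption, as is the BF16 setting; the real paths and the names of the FP8/INT4 checkpoints should be taken from the official model cards.

```python
# Minimal sketch: loading a precision variant with Hugging Face transformers.
# The repo id below is a hypothetical placeholder; check the official model
# cards on Hugging Face / ModelScope for the actual paths and variant names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "inclusionAI/Ling-2.6-flash"  # assumed repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    torch_dtype=torch.bfloat16,  # BF16 variant; FP8/INT4 would load the
    device_map="auto",           # corresponding quantized checkpoints
    trust_remote_code=True,
)

prompt = "Explain mixture-of-experts models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```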

Significant Improvement in Inference Efficiency
In terms of technical architecture, Ling-2.6-flash introduces an advanced hybrid linear architecture that substantially improves compute efficiency. Running on mainstream H20 GPUs, it generates up to 340 tokens per second, a throughput well ahead of comparable models in the industry.
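The 340 tokens-per-second figure is Ant Group's own measurement. As a hedged illustration of how decode throughput is commonly measured, the sketch below times generation using the `model` and `tokenizer` objects from the loading example above; it is a generic wall-clock method, not the official benchmark setup.

```python
# Sketch of a crude decode-throughput measurement (tokens per second).
# Wall-clock timing only; a rigorous GPU benchmark would also synchronize
# CUDA and average over warmed-up runs. Not Ant Group's benchmark method.
import time

def decode_throughput(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> float:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    elapsed = time.perf_counter() - start
    # Count only newly generated tokens, excluding the prompt.
    generated = output.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed

print(f"{decode_throughput(model, tokenizer, 'Write a haiku about GPUs.'):.1f} tok/s")
```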
Beyond raw speed, the model is also remarkably token-efficient. Evaluation data shows that, for tasks of the same complexity, Ling-2.6-flash consumes only one-tenth the tokens of models in its class, effectively reducing long-term operating costs for enterprises.
Enhanced Capabilities for Agent Scenarios
For today's popular agent applications, Ant Group has specifically strengthened the model's capabilities. In both complex tool calling and long-horizon task planning, Ling-2.6-flash demonstrates strong logical execution and high task success rates.
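The announcement does not document the tool-calling interface. As a hedged illustration of what tool calling against such a model typically looks like, here is a sketch assuming the model is served behind an OpenAI-compatible endpoint (for example via vLLM); the endpoint URL, served-model name, and `get_weather` tool are all illustrative assumptions.

```python
# Hypothetical tool-calling request against an OpenAI-compatible server.
# The base_url, model name, and get_weather tool schema are illustrative
# assumptions, not a documented API for Ling-2.6-flash.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Ling-2.6-flash",  # assumed served-model name
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```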
The model is now available on mainstream open-source communities such as Hugging Face and ModelScope. With this open-source release, Ant Group hopes to empower more developers in vertical domains to explore new frontiers of large-model applications while ensuring data privacy.
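For completeness, fetching the weights from the Hugging Face Hub would look roughly like the sketch below; again, the repo id is an assumed placeholder, and ModelScope users would use that platform's equivalent download utility.

```python
# Sketch: downloading the full model snapshot from the Hugging Face Hub.
# The repo id is a hypothetical placeholder; see the official org page
# for the real path. ModelScope offers an analogous snapshot_download.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("inclusionAI/Ling-2.6-flash")
print("Model files downloaded to:", local_dir)
```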
