Recently, the SiliconFlow large-model service platform officially launched Ling-flash-2.0, the latest open-source model from Ant Group's Bailing team. It is the 130th model to go live on the platform.

Ling-flash-2.0 is a large language model built on an MoE architecture, with 100 billion total parameters of which only 6.1 billion are activated per token (4.8 billion excluding embeddings). After pre-training on more than 20 trillion tokens of high-quality data, followed by supervised fine-tuning and multi-stage reinforcement learning, the model delivers performance comparable to a roughly 40-billion-parameter dense model while activating only a fraction of its parameters.


Ling-flash-2.0 excels at complex reasoning, code generation, and front-end development, and supports a context length of up to 128K, giving users stronger long-text processing capabilities. Pricing is comparatively affordable: 1 yuan per million input tokens and 4 yuan per million output tokens. New users on the domestic and international sites receive a trial credit of 14 yuan or 1 US dollar, respectively.
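As a quick sanity check of those rates, the sketch below (an illustrative helper, not an official SDK) computes the cost of a single request at the published per-million-token prices:

```python
# Published Ling-flash-2.0 rates on SiliconFlow:
# 1 yuan per million input tokens, 4 yuan per million output tokens.
INPUT_YUAN_PER_M = 1.0
OUTPUT_YUAN_PER_M = 4.0

def request_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for one request at the stated per-million-token rates."""
    return (input_tokens * INPUT_YUAN_PER_M
            + output_tokens * OUTPUT_YUAN_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 1,000-token completion
print(f"{request_cost_yuan(2_000, 1_000):.4f} yuan")  # 0.0060 yuan
```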

Ling-flash-2.0 shows clear performance advantages. Compared with dense models under 40 billion parameters (such as Qwen3-32B-Non-Thinking and Seed-OSS-36B-Instruct) and MoE models with larger activated parameter counts (such as Hunyuan-A13B-Instruct and GPT-OSS-120B/low), Ling-flash-2.0 demonstrates stronger complex-reasoning capability, and it remains highly competitive on creative tasks as well.

In addition, the architecture of Ling-flash-2.0 has been carefully designed for ultra-fast inference. Guided by the Ling Scaling Laws, Ling 2.0 adopts an MoE architecture with a 1/32 activation ratio and applies multiple optimizations, allowing a small-activation MoE model to match the performance advantages of a dense architecture. Deployed on NVIDIA H20 GPUs, Ling-flash-2.0 generates more than 200 tokens per second, roughly three times faster than a 36B dense model.
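To make the 1/32 activation ratio concrete, here is a toy top-k gating sketch. It is not the actual Ling-flash-2.0 implementation, and the expert counts (64 experts, top-2 routing) are illustrative numbers chosen only because 2/64 equals the 1/32 ratio described above; the real model's configuration may differ:

```python
import numpy as np

# Toy top-k MoE router. With 64 experts and top-2 routing, each token
# touches 2/64 = 1/32 of the expert parameters, mirroring the activation
# ratio described in the article. All sizes here are hypothetical.
NUM_EXPERTS, TOP_K, HIDDEN = 64, 2, 512

rng = np.random.default_rng(0)
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))  # router projection

def route(token: np.ndarray) -> list[int]:
    """Return the indices of the top-k experts selected for one token."""
    logits = token @ router          # one score per expert
    return np.argsort(logits)[-TOP_K:].tolist()  # keep the k best

token = rng.standard_normal(HIDDEN)
print(route(token), f"activation ratio = {TOP_K}/{NUM_EXPERTS}")
```

Because only the selected experts run for each token, per-token compute stays close to that of a small dense model even though the total parameter count is far larger, which is what enables the speedup over a 36B dense model.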

The SiliconFlow platform is committed to providing developers with fast, affordable, and reliable large-model API services. Beyond Ling-flash-2.0, the platform brings together a wide range of language, image, audio, and video models to meet developers' different needs. Developers can freely compare and combine models on the platform and call them through efficient APIs, helping them put generative AI applications into practice.
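A minimal sketch of calling the model through an OpenAI-compatible endpoint is shown below. The base URL and the model identifier are assumptions on my part; check the model page on the platform for the exact values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed domestic endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="inclusionAI/Ling-flash-2.0",  # assumed model id; verify on the platform
    messages=[
        {"role": "user",
         "content": "Write a short Python function that reverses a string."}
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```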

Try it on the domestic site:

https://cloud.siliconflow.cn/models

Try it on the international site:

https://cloud.siliconflow.com/models

Key points:

🌟 Ling-flash-2.0 is a 100-billion-parameter MoE language model that activates only 6.1 billion parameters per token, with strong complex-reasoning capabilities.

⚡ The model supports a context length of up to 128K and delivers an ultra-fast inference experience, with output speeds exceeding 200 tokens per second.

💰 New users receive a trial credit on both the domestic and international sites. The SiliconFlow platform offers a wide range of large-model services to help developers innovate.