Recently, the SiliconFlow large-model service platform officially launched Ling-flash-2.0, the latest open-source model from Ant Group's Bailing team. It is the 130th model to go live on the platform.

Ling-flash-2.0 is a large language model built on an MoE architecture, with 100 billion total parameters of which only 6.1 billion are activated per token (4.8 billion excluding embeddings). After pre-training on more than 20 trillion tokens of high-quality data, followed by supervised fine-tuning and multi-stage reinforcement learning, the model delivers performance comparable to a roughly 40-billion-parameter dense model while activating only a fraction of its parameters.


Ling-flash-2.0 excels at complex reasoning, code generation, and front-end development, and supports a context length of up to 128K, giving users stronger long-text processing capabilities. Pricing is comparatively affordable: 1 yuan per million input tokens and 4 yuan per million output tokens. New users on the domestic and international sites receive a trial credit of 14 yuan or 1 US dollar, respectively.
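As a quick sanity check of those rates, the sketch below (an illustrative helper, not an official SDK) computes the cost of a single request at the published per-million-token prices:

```python
# Published Ling-flash-2.0 rates on SiliconFlow:
# 1 yuan per million input tokens, 4 yuan per million output tokens.
INPUT_YUAN_PER_M = 1.0
OUTPUT_YUAN_PER_M = 4.0

def request_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for one request at the stated per-million-token rates."""
    return (input_tokens * INPUT_YUAN_PER_M
            + output_tokens * OUTPUT_YUAN_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 1,000-token completion
print(f"{request_cost_yuan(2_000, 1_000):.4f} yuan")  # 0.0060 yuan
```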

Ling-flash-2.0 shows clear performance advantages. Compared with dense models under 40 billion parameters (such as Qwen3-32B-Non-Thinking and Seed-OSS-36B-Instruct) and MoE models with larger activated parameter counts (such as Hunyuan-A13B-Instruct and GPT-OSS-120B/low), Ling-flash-2.0 demonstrates stronger complex-reasoning capability, and it remains highly competitive on creative tasks as well.

In addition, the architecture of Ling-flash-2.0 has been carefully designed for ultra-fast inference. Guided by the Ling Scaling Laws, Ling 2.0 adopts an MoE architecture with a 1/32 activation ratio and applies multiple optimizations, allowing a small-activation MoE model to match the performance advantages of a dense architecture. Deployed on NVIDIA H20 GPUs, Ling-flash-2.0 generates more than 200 tokens per second, roughly three times faster than a 36B dense model.
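To make the 1/32 activation ratio concrete, here is a toy top-k gating sketch. It is not the actual Ling-flash-2.0 implementation, and the expert counts (64 experts, top-2 routing) are illustrative numbers chosen only because 2/64 equals the 1/32 ratio described above; the real model's configuration may differ:

```python
import numpy as np

# Toy top-k MoE router. With 64 experts and top-2 routing, each token
# touches 2/64 = 1/32 of the expert parameters, mirroring the activation
# ratio described in the article. All sizes here are hypothetical.
NUM_EXPERTS, TOP_K, HIDDEN = 64, 2, 512

rng = np.random.default_rng(0)
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))  # router projection

def route(token: np.ndarray) -> list[int]:
    """Return the indices of the top-k experts selected for one token."""
    logits = token @ router          # one score per expert
    return np.argsort(logits)[-TOP_K:].tolist()  # keep the k best

token = rng.standard_normal(HIDDEN)
print(route(token), f"activation ratio = {TOP_K}/{NUM_EXPERTS}")
```

Because only the selected experts run for each token, per-token compute stays close to that of a small dense model even though the total parameter count is far larger, which is what enables the speedup over a 36B dense model.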

The SiliconFlow platform is committed to providing developers with fast, affordable, and reliable large-model API services. Beyond Ling-flash-2.0, the platform brings together a wide range of language, image, audio, and video models to meet developers' different needs. Developers can freely compare and combine models on the platform and call them through efficient APIs, helping them put generative AI applications into practice.
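A minimal sketch of calling the model through an OpenAI-compatible endpoint is shown below. The base URL and the model identifier are assumptions on my part; check the model page on the platform for the exact values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed domestic endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="inclusionAI/Ling-flash-2.0",  # assumed model id; verify on the platform
    messages=[
        {"role": "user",
         "content": "Write a short Python function that reverses a string."}
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```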

Try it on the domestic site:

https://cloud.siliconflow.cn/models

Try it on the international site:

https://cloud.siliconflow.com/models

Key points:

🌟 Ling-flash-2.0 is a 100-billion-parameter MoE language model that activates only 6.1 billion parameters per token, with strong complex-reasoning capabilities.

⚡ The model supports a context length of up to 128K and delivers an ultra-fast inference experience, with output speeds exceeding 200 tokens per second.

💰 New users receive a trial credit on both the domestic and international sites. The SiliconFlow platform offers a wide range of large-model services to help developers innovate.