Ant Group recently announced the open-sourcing of its latest flagship large model, Ling-1T, which has up to 1 trillion parameters and is the largest known base model trained in FP8 low-precision mode. Ling-1T was developed by Ant's "Bailing" team, marking another breakthrough in the team's artificial intelligence work.


According to the team, Ling-1T belongs to the Ling 2.0 model family, which comprises three series: Ling, Ring, and Ming. The Ling series targets general tasks, prioritizing speed and efficiency; the Ring series focuses on deep thinking and complex reasoning; and the Ming series is multimodal, handling a wider range of information types.

Ling-1T has 1 trillion total parameters, but only about 50 billion are activated for each token (roughly a 5% activation ratio), which greatly reduces per-token compute cost. To guide training at this scale, the Ant team proposed the "Ling Scaling Law": after experiments on more than 300 models, they characterized how computational efficiency relates to the expert activation ratio. They also developed a learning-rate scheduler called WSM, which automatically adjusts the learning-rate schedule during training to keep training stable and efficient.
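As a rough illustration of this sparse-activation idea, the sketch below routes a single token to a small top-k subset of experts. Only the 1-trillion-total and ~50-billion-active figures come from the article; the expert count, top-k value, and dimensions are hypothetical and are not taken from Ant's announcement.

```python
# Hedged illustration only: the expert count and top-k below are made up to
# show how a mixture-of-experts model can activate only a few percent of its
# weights per token. Only the total/active parameter counts come from the article.
import numpy as np

TOTAL_PARAMS = 1_000e9   # ~1 trillion total parameters (reported)
ACTIVE_PARAMS = 50e9     # ~50 billion activated per token (reported)
print(f"activation ratio = {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # 5.0%

def topk_route(token_hidden, router_weights, k=2):
    """Toy top-k MoE router: score every expert, but only run the k best.

    token_hidden:   (d_model,) hidden state for one token
    router_weights: (n_experts, d_model) router projection (hypothetical)
    """
    scores = router_weights @ token_hidden                     # one score per expert
    top = np.argsort(scores)[-k:]                              # indices of the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()    # softmax over the selected experts
    return top, gates                                          # only these experts run for this token

# Tiny demo: 64 experts, 2 active per token, so most expert weights stay idle.
rng = np.random.default_rng(0)
hidden = rng.normal(size=128)
router = rng.normal(size=(64, 128))
experts, gates = topk_route(hidden, router)
print("active experts:", experts, "gate weights:", np.round(gates, 3))
```

At this kind of activation ratio, each forward pass touches only a small slice of the total weights, which is how a trillion-parameter model can keep per-token compute closer to that of a much smaller dense model.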

Ling-1T was trained in three stages: pre-training, mid-training, and post-training. In pre-training, the model saw more than 20 trillion tokens of data, including a large amount of reasoning-intensive text. Mid-training further strengthened the model's reasoning abilities, and post-training used "evolutionary chain of thought" self-iteration to improve reasoning accuracy.

In comparisons with other mainstream models, Ling-1T performed well across multiple benchmarks, standing out in mathematical reasoning and code generation. In community testing it also handled complex tasks well, such as simulating physical phenomena and cosmic evolution.

Although Ling-1T demonstrates strong capabilities, it still has some limitations, such as high costs when processing extremely long contexts. The Ant team has stated that they are researching a new hybrid attention architecture to address this issue.

Open-source links (a usage sketch follows below):

HuggingFace: https://huggingface.co/inclusionAI/Ling-1T  

GitHub: https://github.com/inclusionAI/Ling-V2  
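For readers who want to try the released weights, here is a minimal, hedged loading sketch using Hugging Face transformers. The exact flags (trust_remote_code, dtype, device mapping) are assumptions to verify against the model card, and a model of this size realistically requires a multi-GPU or multi-node setup.

```python
# Hedged sketch: loading the released checkpoint with Hugging Face transformers.
# The trust_remote_code, dtype, and device_map settings are assumptions; check
# the model card on Hugging Face before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-1T"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # a custom MoE architecture likely needs this
    torch_dtype="auto",       # let transformers pick the checkpoint precision
    device_map="auto",        # shard across available GPUs (requires accelerate)
)

prompt = "Explain why only a fraction of a mixture-of-experts model runs per token."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```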

Key points:   

🔍 Ling-1T has up to 1 trillion parameters and is the largest known base model trained in FP8 low-precision mode.   

🚀 The model outperforms several mainstream models on mathematical reasoning and code generation benchmarks.   

⚙️ The Ant team is researching a new hybrid attention architecture to reduce Ling-1T's cost when processing extremely long contexts.