Alibaba Tongyi Lab Launches FIPO Algorithm to Significantly Enhance Large Model Reasoning Capabilities

Published in Latest AI News

Time: Apr 7, 2026

The Qwen Pilot team at Alibaba Tongyi Lab recently introduced a new algorithm called FIPO (Future-KL Influenced Policy Optimization), which aims to overcome a bottleneck large models face when reasoning. Conventional reinforcement learning with verifiable rewards (RLVR) often cannot tell which tokens in a reasoning chain are critical to the final result. Accurately identifying those key tokens has therefore become a pressing problem.

FIPO introduces a Future-KL mechanism that specifically rewards tokens with a significant impact on subsequent reasoning, which addresses the "reasoning length stagnation" seen in pure RL training. In tests under a 32B pure-RL setup, FIPO outperformed models such as o1-mini and DeepSeek-Zero-MATH.
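The article does not give the Future-KL formula, so the following is only a hypothetical sketch of the general idea, not the team's implementation. It assumes the "influence on subsequent reasoning" of a token can be approximated by a discounted sum of the log-probability shifts of the tokens that follow it, and that this sum is turned into a clipped multiplicative weight on the token's advantage. The function name, the discount `gamma`, and the clip range are all illustrative assumptions.

```python
import math

def future_influence_weights(logp_old, logp_new, gamma=0.5, clip=(0.5, 2.0)):
    """Hypothetical per-token weights (an assumption, not FIPO's published rule).

    logp_old / logp_new: log-probabilities of the sampled tokens under the
    policy before and after an update. The weight at position t is
    exp(discounted sum of shifts at positions after t), clipped to `clip`.
    """
    n = len(logp_old)
    weights = [1.0] * n
    running = 0.0  # discounted sum of shifts for tokens strictly after t
    # Walk backwards so each position sees only the tokens that follow it.
    for t in range(n - 1, -1, -1):
        weights[t] = min(max(math.exp(running), clip[0]), clip[1])
        running = (logp_new[t] - logp_old[t]) + gamma * running
    return weights

# Toy rollout: the 2nd and 4th tokens became more likely after the update.
logp_old = [-1.0, -2.0, -0.5, -3.0]
logp_new = [-1.0, -1.5, -0.5, -2.0]
weights = future_influence_weights(logp_old, logp_new)

# A uniform sequence-level advantage becomes token-specific once reweighted.
advantage = 1.0
weighted_advantages = [advantage * w for w in weights]
```

In this sketch the last token always gets weight 1.0 because nothing follows it, while earlier tokens are up-weighted when the tokens after them shifted strongly, which is one way a uniform reward could be made token-specific.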

According to the team's research, most tokens change little between the start and end of training, which indicates that the effect of reinforcement learning is extremely sparse. The team also found that metrics commonly used in the industry, such as entropy and KL divergence, struggle to pinpoint the changes in key tokens. They therefore introduced a new measure, the signed log-probability difference (Δlog p), which captures the direction of optimization.
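A toy comparison illustrates why a signed quantity carries information that KL divergence does not. This is a minimal sketch under stated assumptions: a two-token vocabulary, hand-picked probabilities, and helper names chosen here for illustration rather than taken from the team's code.

```python
import math

def signed_logprob_shift(p_before, p_after):
    """Δlog p for a sampled token: positive if training raised its probability."""
    return math.log(p_after) - math.log(p_before)

def kl_divergence(dist_before, dist_after):
    """KL(before || after) over a next-token distribution; never negative."""
    return sum(p * math.log(p / q)
               for p, q in zip(dist_before, dist_after) if p > 0)

# Toy vocabulary of two tokens; index 0 is the token that was sampled.
before = [0.5, 0.5]
raised = [0.7, 0.3]   # training made the sampled token more likely
lowered = [0.3, 0.7]  # training made the sampled token less likely

up = signed_logprob_shift(before[0], raised[0])
down = signed_logprob_shift(before[0], lowered[0])
kl_up = kl_divergence(before, raised)
kl_down = kl_divergence(before, lowered)
```

Here `kl_up` and `kl_down` are equal, so KL alone cannot say whether the sampled token was encouraged or suppressed, whereas `up` is positive and `down` is negative.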

In the experiments, FIPO was trained on the base model Qwen2.5-32B-Base and broke through the reasoning-length bottleneck, with average reasoning length exceeding 10,000 tokens. The algorithm also delivered a significant improvement in reasoning accuracy, demonstrating its potential for complex mathematical reasoning.

Key points:
🔍 FIPO was developed by Alibaba Tongyi Lab to enhance the reasoning ability of large models.
📈 The algorithm identifies tokens that have a significant impact on reasoning, breaking through the reasoning-length bottleneck.
🧠 Experiments show that FIPO performs significantly better than traditional algorithms on complex mathematical reasoning.
