According to reports, the Qwen Pilot team at Alibaba's Tongyi Lab has introduced a new algorithm called FIPO. The algorithm targets a bottleneck of traditional reinforcement learning (RL) on complex logical tasks, and it is reported to improve both reasoning length and accuracy.

Core Breakthrough: Solving "Reasoning Length Stagnation"

Traditional models often struggle to distinguish which tokens are key to reaching the correct answer on complex problems such as mathematics. FIPO addresses this with targeted changes:

Future-KL Mechanism: FIPO introduces a Future-KL strategy that specifically rewards tokens with a significant positive impact on subsequent reasoning, enabling the model to "think ahead."

Symbolic Log Probability Difference: This new mechanism is meant to capture the direction in which the model is being optimized, preventing the reasoning process from getting stuck in unproductive loops.

Reasoning Length Leap: On a base model, FIPO increased the average reasoning length to over 10,000 tokens, addressing the problem of insufficient reasoning depth.
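
To make the first two mechanisms more concrete, here is a minimal Python sketch of the general idea, not the team's actual implementation. It assumes the log probability difference is a signed per-token difference between an updated and an old policy, that the "future" signal for a token is a discounted sum of the differences at later tokens, and that this signal rescales a single sequence-level advantage. The function names, the discount `gamma`, and the clipping bounds `low` and `high` are all hypothetical choices made for illustration.

```python
import math


def signed_logprob_diff(logp_new, logp_old):
    """Per-token log-probability change between the updated and the old policy."""
    if len(logp_new) != len(logp_old):
        raise ValueError("both policies must score the same tokens")
    return [new - old for new, old in zip(logp_new, logp_old)]


def discounted_future_signal(deltas, gamma=0.9):
    """For each position t, sum the deltas of later tokens, discounted by distance.

    signal[t] = deltas[t+1] + gamma * deltas[t+2] + gamma**2 * deltas[t+3] + ...
    (an assumed form; the last token has no future, so its signal is 0.0).
    """
    signal = [0.0] * len(deltas)
    running = 0.0
    # Walk backwards so each position reuses the sum already built for its successor.
    for t in range(len(deltas) - 1, -1, -1):
        signal[t] = running
        running = deltas[t] + gamma * running
    return signal


def reweight_advantage(advantage, signal, low=0.5, high=2.0):
    """Scale one sequence-level advantage per token by a clipped exp(signal)."""
    return [advantage * min(high, max(low, math.exp(s))) for s in signal]


# Toy three-token response scored by an old and an updated policy.
logp_old = [-1.0, -2.0, -0.5]
logp_new = [-0.9, -2.2, -0.2]

deltas = signed_logprob_diff(logp_new, logp_old)
signal = discounted_future_signal(deltas, gamma=0.5)
token_advantages = reweight_advantage(1.0, signal)
```

In this toy example, the second token is followed by a token whose probability rose under the updated policy, so its weight ends up above 1, while the first token's weight ends up slightly below 1. The clipping keeps any single token from dominating the update; whether FIPO uses a comparable safeguard is not stated in the reports.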

Outstanding Performance: 32B Model Surpasses o1-mini

In practical tests, a 32B-scale model trained with the FIPO algorithm delivered strong performance:

Surpassing Competitors: In a pure reinforcement learning setup, its reasoning performance surpassed models of the same scale and also outperformed OpenAI's o1-mini on some metrics.

Mathematical Potential: The algorithm performs especially well on high-level mathematical reasoning tasks, showing strong logical deduction capabilities.

Industry Context: Tongyi Lab's "Intelligent Evolution"

Alibaba's Tongyi Lab has been active in fundamental AI algorithms recently. In addition to FIPO, the team launched version 1.0 of CoPaw at the beginning of March, reflecting its continued work on the logical rigor and interaction depth of its models.

Conclusion: The "Second Curve" of Reasoning Efficiency

While the industry is still debating parameter scale, Ali Tongyi