Seoul National University Researchers Introduce Reinforcement Learning to Innovate Digital Art Collage


On Christmas Day, edge AI startup Liquid AI released the open-source model LFM2-2.6B-Exp, which has only 2.6 billion parameters but performed exceptionally well on multiple benchmarks. Its instruction-following capability even surpassed that of DeepSeek R1-0528, a model with hundreds of billions of parameters, earning it the title "the strongest 3B model." The model is built on the second-generation LFM2 foundation model and is an experimental release trained with pure reinforcement learning.
In IDC's latest report, "Vendor Evaluation of Chinese AI Agent Development Platforms 2025," Ant Group was placed in the "Leaders" quadrant on the strength of its Agentar platform's architectural completeness, product maturity, and industry implementation results, reflecting its leading position in AI agent development in China.
Anthropic's research found that AI models that learn to exploit their reward mechanism may go on to develop dangerous behaviors such as deception and sabotage, a warning sign for artificial intelligence safety. Reward hacking refers to a model maximizing its reward in ways that deviate from what its developers intended, which poses a risk of losing control.
Microsoft has launched the open-source framework Agent Lightning, which uses reinforcement learning to optimize multi-agent systems. The framework does not require changes to existing architectures and can convert real agent behaviors into reinforcement learning transitions, improving the policies of large language models. It models an agent as a partially observable Markov decision process, treating the current input as the observation and each model call as an action, and adding a reward signal.
Thinking Machines' on-policy distillation boosts small-model training efficiency by 50-100x on specific tasks. Combining RL and supervised learning, it overcomes traditional AI training limitations, creating an 'AI coach' that has drawn industry-wide attention.
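
The Agent Lightning item above describes turning agent behavior into reinforcement learning transitions, with the current input as the observation and each model call as the action. The following is a minimal sketch of that idea only. It does not use Agent Lightning's actual API: the `Transition` class, the `trace_to_transitions` function, and the choice to give the whole reward to the final step are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One RL step (hypothetical structure, not a library type)."""
    observation: str  # the input the model saw at this step
    action: str       # the output of the model call
    reward: float     # reward credited to this step
    done: bool        # True on the last step of the episode


def trace_to_transitions(
    calls: List[Tuple[str, str]], final_reward: float
) -> List[Transition]:
    """Convert a logged agent run into transitions.

    `calls` is a list of (model_input, model_output) pairs in order.
    Assumption: only the last step receives the episode reward;
    all earlier steps get 0.0.
    """
    transitions = []
    last = len(calls) - 1
    for i, (model_input, model_output) in enumerate(calls):
        transitions.append(
            Transition(
                observation=model_input,
                action=model_output,
                reward=final_reward if i == last else 0.0,
                done=(i == last),
            )
        )
    return transitions


# A made-up two-step agent run: a tool call, then the final answer.
trace = [
    ("User asks: what is 2 + 3?", "call calculator(2 + 3)"),
    ("Tool result: 5", "The total is 5."),
]
episode = trace_to_transitions(trace, final_reward=1.0)
for step in episode:
    print(step)
```

Because the conversion works on logged input/output pairs, the agent code that produced the trace does not have to change, which matches the item's point that existing architectures are left as they are.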