360 ZhiNao Team Successfully Replicates DeepSeek Reinforcement Learning Results, Releases Open-Source Model Light-R1-14B-DS

Recently, the 360 ZhiNao team announced that it had reproduced DeepSeek's reinforcement learning results and officially released the open-source reasoning model Light-R1-14B-DS. According to the team, the model outperforms DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B and is the industry's first 14B-parameter model to show gains from reinforcement learning. It significantly improves mathematical reasoning, outperforming most 32B-class models.

Compared to DeepSeek-R1-14B, Light-R1-14B-DS excels at competition mathematics: it scores 4.3 points higher on AIME24 and 10 points higher on AIME25. It also scores 61.7 on the GPQA science question-answering benchmark.

To achieve this, the 360 ZhiNao team used two training methods. The first is Curriculum SFT (curriculum supervised fine-tuning), a phased approach in which the model moves gradually from simple to complex mathematical problems, strengthening its logical reasoning. The second is reinforcement learning (RL), which the team says it applied successfully for the first time to a 14B-class reasoning model, improving reasoning accuracy while largely preserving other skills.
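The phased, easy-to-hard idea behind curriculum fine-tuning can be illustrated with a minimal Python sketch. This is not the team's actual pipeline: the `Example` type, the difficulty scores, and the `build_stages` helper are all hypothetical and only show how training data might be ordered into stages of increasing difficulty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    prompt: str
    difficulty: float  # hypothetical score; higher means harder


def build_stages(examples: List[Example], num_stages: int) -> List[List[Example]]:
    """Sort examples from easy to hard and split them into consecutive stages.

    Returns at most `num_stages` stages; the last one may be smaller.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=lambda e: e.difficulty)
    if not ordered:
        return []
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


# Illustrative data only; the prompts and scores are made up for this sketch.
examples = [
    Example("Prove a competition-level inequality", 0.9),
    Example("Compute 2 + 3", 0.1),
    Example("Solve x^2 = 4", 0.4),
    Example("Count lattice paths under a constraint", 0.8),
]

stages = build_stages(examples, num_stages=2)
for index, stage in enumerate(stages):
    # A real run would fine-tune on each stage in turn, starting from the
    # checkpoint produced by the previous stage.
    print(f"stage {index}: {[e.prompt for e in stage]}")
```

In this sketch the first stage holds the two easier problems and the second stage the two harder ones; how difficulty is actually measured, and how many stages are used, are design choices the article does not specify.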

This release includes not only the model itself but also the open-sourced SFT data, code, and technical report, providing valuable resources for the industry. This achievement marks significant progress in reinforcement learning for smaller-scale models and may further promote the widespread adoption and development of AI reasoning capabilities.

Project Address: https://github.com/Qihoo360/Light-R1

Model Address: https://huggingface.co/qihoo360/Light-R1-14B-DS

Data Address: https://huggingface.co/datasets/qihoo360/Light-R1-SFTData

IBM Research: How AI & Automation Protect Businesses from Data Breaches

IBM's report presents evidence that artificial intelligence, automation, and threat intelligence can help address data breaches across their entire lifecycle and reduce their cost. The research found that integrating artificial intelligence and automation into security operations teams can shorten the data breach lifecycle by 33% and cut costs by 33.6%. However, only 28% of enterprises currently use artificial intelligence and automation extensively. Many still rely on legacy systems, which attackers can easily bypass. The report underscores how effective artificial intelligence and automation are at improving cybersecurity and calls on enterprises to adopt these technologies widely to protect their data.

Google's AGI Robot Breakthrough: 54-Member Team's 7-Month Work, High Generalization and Reasoning

The robotics research team at Google DeepMind recently released a robotics project called RT-2. The project took 7 months to develop and uses a large model for training. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and can reason about and complete tasks based on human instructions. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as going from 'extinct animals' to 'plastic dinosaurs'. The project performed well in tests across several subcategories, with performance up to three times that of the previous generation of robot models. The result demonstrates the potential of large models in robotics research and is expected to drive future robot development.

RWKV: Small Team Aims to Be Android of AI Era with Big Model

Meta Intelligence OS is a startup founded by Peng Bo. It has developed a series of large models based on the open-source model RWKV and aims to become the Android of the large-model era. The RWKV model offers strong performance and low cost in inference tasks, which has attracted customers from industries such as finance, legal services, and smart hardware. The company's business model is model customization based on private data and internal AI Agent development. It hopes to solve the problems of API call latency and data security by deploying large models on end devices. RWKV versions are currently available for Windows, Mac, and Linux computers, and Android and iOS versions are in development. Meta Intelligence OS is raising funds and working with chip companies and computing power platforms to win benchmark customers. Luo Xuan said that the decisive battlefield for large models is hardware, and that both end devices and the cloud require dedicated chips.

OpenAI Urges US Federal Government to Strengthen AI Regulation

In a recent submission to the US government, OpenAI argued that the federal government should lead AI regulation rather than leaving individual states to impose stricter rules of their own. The company believes unified federal regulation would foster innovation in the US AI sector and reduce inconsistencies between state regulations. In the 15-page document, OpenAI also notes that China's AI regulatory measures could put the US at a disadvantage.

Microsoft Tests AI Summarization in Windows Notepad: Select Text, Get a Summary

Microsoft is testing a new AI-powered summarization feature within its Notepad app for Windows. Currently available in the Canary and Dev channels of the Windows Insider Program, this update aims to help users quickly grasp the core meaning of text. Users simply select the text they want summarized, right-click, and choose the 'Summarize' option. Notepad will then automatically generate a concise overview of the selected passage.
