Moonshot AI Releases New Open-Source Model Kimi-Dev-72B, Breaking Programming Benchmark Records

Moonshot AI has announced the release of Kimi-Dev-72B, a new open-source model specialized for software engineering tasks. It achieved the highest score among open-source models on SWE-bench Verified, a benchmark for AI programming. With only 72 billion parameters, Kimi-Dev-72B surpassed DeepSeek-R1, which has 671 billion parameters.

On SWE-bench Verified, Kimi-Dev-72B scored 60.4%, a new state of the art among open-source models. The model was optimized with large-scale reinforcement learning, in which it autonomously patches real repositories inside Docker environments. It receives a reward only when all test cases pass, which pushes it toward solutions that are correct and robust enough to meet the standards of real-world development.
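The all-or-nothing reward described above can be illustrated with a minimal sketch. The function name `outcome_reward` and the choice to give an empty test run no reward are assumptions made for illustration; the article does not describe Moonshot AI's actual reward code.

```python
from typing import Iterable


def outcome_reward(test_results: Iterable[bool]) -> float:
    """Return 1.0 only if every test passed, otherwise 0.0.

    Hypothetical helper: an empty result set earns no reward,
    because nothing was actually verified.
    """
    results = list(test_results)
    if not results:
        return 0.0
    return 1.0 if all(results) else 0.0


# A patch that fixes most tests but breaks one still gets zero reward.
print(outcome_reward([True, True, True]))
print(outcome_reward([True, False, True]))
```

Because partial credit is never given, a patch cannot earn reward by fixing the target bug while breaking other tests.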

Kimi-Dev-72B is now available for download on Hugging Face and GitHub. Users can access the model weights and source code, and a technical report will be released soon. The Hugging Face link is huggingface.co/moonshotai/Kimi-Dev-72B, and the GitHub link is github.com/MoonshotAI/Kimi-Dev.

In terms of design, Kimi-Dev-72B combines two roles, BugFixer and TestWriter. BugFixer fixes errors, while TestWriter writes the corresponding unit tests. The two roles complement each other and together make the model effective on programming tasks. The workflow of Kimi-Dev-72B is simple, with two stages: file localization and code editing.
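The two-stage workflow can be pictured as a small pipeline: first choose which files to touch, then rewrite only those files. The sketch below is an illustration under stated assumptions, not Moonshot AI's implementation. `solve_issue`, `stem_localize`, and `toy_edit` are hypothetical names, and the two toy functions stand in for what would be model calls in the real system.

```python
from typing import Callable, Dict, List

Repo = Dict[str, str]  # hypothetical representation: file path -> file contents


def solve_issue(
    issue: str,
    repo: Repo,
    localize: Callable[[str, Repo], List[str]],
    edit: Callable[[str, str, str], str],
) -> Repo:
    """Stage 1: pick the files to change. Stage 2: rewrite only those files."""
    targets = localize(issue, repo)
    patched = dict(repo)  # leave the input repository untouched
    for path in targets:
        patched[path] = edit(issue, path, repo[path])
    return patched


def stem_localize(issue: str, repo: Repo) -> List[str]:
    """Toy localizer: select files whose base name is mentioned in the issue."""
    text = issue.lower()
    return [p for p in repo if p.rsplit("/", 1)[-1].split(".")[0].lower() in text]


def toy_edit(issue: str, path: str, source: str) -> str:
    """Toy editor: swap floor division for ceiling division."""
    return source.replace("total // size", "-(-total // size)")


repo = {
    "src/pager.py": "def pages(total, size):\n    return total // size\n",
    "src/utils.py": "def noop():\n    pass\n",
}
patched = solve_issue("Off-by-one error in pager.pages", repo, stem_localize, toy_edit)
print(patched["src/pager.py"])
```

Keeping localization and editing as separate callables mirrors the split described in the article and lets each stage be swapped or evaluated on its own.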

To strengthen the model, Moonshot AI used approximately 150 billion tokens of high-quality data for mid-training, sourced from real issues and pull request commits on GitHub. After rigorous data decontamination, the model learned from this data how human developers reason about problems and write code. The reinforcement learning phase then focused on improving its code editing ability, using an outcome-based reward to optimize performance step by step.

At test time, Kimi-Dev-72B uses a self-play mechanism to coordinate the BugFixer and TestWriter roles, which improves its performance. For each problem, it can generate up to 40 patch candidates and 40 test candidates.
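One plausible way to combine patch candidates with test candidates is to run every generated test against every generated patch and keep the patch that passes the most tests. The article does not state the actual selection rule, so the sketch below is an assumption; `select_patch` and `passes` are hypothetical names, and patches are modeled as plain Python callables for simplicity.

```python
from typing import Any, Callable, Sequence

Test = Callable[[Any], bool]  # a test takes a patch and reports pass/fail


def passes(test: Test, patch: Any) -> bool:
    """Run one test against one patch; a crash counts as a failure."""
    try:
        return bool(test(patch))
    except Exception:
        return False


def select_patch(
    patches: Sequence[Any],
    tests: Sequence[Test],
    max_candidates: int = 40,  # cap taken from the article's "up to 40"
) -> Any:
    """Return the patch that passes the most tests (first one wins ties)."""
    patches = list(patches)[:max_candidates]
    tests = list(tests)[:max_candidates]
    if not patches:
        raise ValueError("no patch candidates")
    return max(patches, key=lambda p: sum(passes(t, p) for t in tests))


# Toy example: two candidate "patches" for an absolute-value function.
candidates = [lambda x: x, lambda x: -x if x < 0 else x]
checks = [lambda f: f(-2) == 2, lambda f: f(3) == 3]
best = select_patch(candidates, checks)
print(best(-5))
```

In a real setting each test would run inside an isolated environment such as a Docker container rather than being called directly in the same process.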

Looking ahead, Moonshot AI plans to extend Kimi-Dev-72B by exploring deeper integration with popular development tools so that it fits more seamlessly into developers' workflows. The company says it is committed to continuously improving the model, conducting rigorous red-teaming, and releasing stronger versions to the community.

Key Takeaways:

🔍 Kimi-Dev-72B is a newly released open-source model that scored 60.4% on SWE-bench Verified, a record among open-source models.

🚀 The model combines two roles, BugFixer and TestWriter, to fix bugs and write the unit tests that verify the fixes.

💡 Moonshot AI will continue to improve Kimi-Dev-72B and plans deeper integration with popular development tools.
