Moonshot AI has released Kimi-Dev-72B, a new open-source model specialized in software engineering tasks. It achieved the highest score among open-source models on SWE-bench Verified, a benchmark for AI programming. With 72 billion parameters, Kimi-Dev-72B surpasses DeepSeek-R1, a far larger model with 671 billion parameters.

On SWE-bench Verified, Kimi-Dev-72B scored 60.4%, a new high among open-source models. The model was optimized with large-scale reinforcement learning, in which it autonomously patches real repositories inside Docker environments. It receives a reward only when the entire test suite passes, so it is trained toward solutions that are correct and robust rather than partially working, in line with the standards of real-world development.
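
The all-or-nothing reward can be sketched in a few lines of Python. This is a minimal illustration, not Moonshot AI's training code: the `CaseResult` type and the `outcome_reward` function are hypothetical names, and treating an empty test run as a failure is an assumption made here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CaseResult:
    """Outcome of one test case run against a patched repository (hypothetical type)."""
    name: str
    passed: bool


def outcome_reward(results: Sequence[CaseResult]) -> float:
    """Return 1.0 only when every test case passed, otherwise 0.0.

    Assumption: a run that produced no test results earns no reward,
    so a patch cannot score by breaking the test runner.
    """
    if not results:
        return 0.0
    return 1.0 if all(r.passed for r in results) else 0.0
```

Under this scheme a patch that fixes nine of ten tests earns the same reward as one that fixes none, which is what pushes the policy toward complete fixes.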

Kimi-Dev-72B is now available for download on Hugging Face and GitHub, where users can access the model weights and source code. A technical report will be released soon. The Hugging Face link is huggingface.co/moonshotai/Kimi-Dev-72B, and the GitHub link is github.com/MoonshotAI/Kimi-Dev.

In its design, Kimi-Dev-72B combines two roles, BugFixer and TestWriter. BugFixer repairs the faulty code, while TestWriter writes the corresponding unit tests. The two roles complement each other: a good patch should pass the tests, and good tests should expose the bug. The workflow is simple and has two stages: file localization, followed by code editing.
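
The two-stage workflow can be pictured as a small pipeline. The sketch below only illustrates the control flow under stated assumptions: `resolve_issue`, `Localizer`, and `Editor` are hypothetical names, the repository is modeled as a plain dictionary from file path to source text, and in the real system both stages would be handled by the model rather than by injected callables.

```python
from typing import Callable, Dict, List

Repo = Dict[str, str]                              # file path -> file contents
Localizer = Callable[[str, List[str]], List[str]]  # (issue, all paths) -> paths to edit
Editor = Callable[[str, str, str], str]            # (issue, path, source) -> new source


def resolve_issue(issue: str, repo: Repo, localize: Localizer, edit: Editor) -> Repo:
    """Run stage 1 (file localization), then stage 2 (code editing)."""
    # Stage 1: decide which files are relevant to the issue.
    target_paths = localize(issue, sorted(repo))

    # Stage 2: rewrite only the localized files; work on a copy of the repo.
    updated = dict(repo)
    for path in target_paths:
        if path in repo:  # ignore paths that do not exist in the repository
            updated[path] = edit(issue, path, repo[path])
    return updated
```

Keeping localization separate from editing means the editing step only has to reason about a handful of files instead of the whole repository.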

To strengthen the model, Moonshot AI used roughly 150 billion tokens of high-quality data for mid-training, drawn from real issues and pull requests on GitHub. After rigorous data cleaning, this corpus taught the model how human developers reason about problems and write fixes. The reinforcement learning phase then focused on code editing ability, improving performance step by step through an outcome-based reward.

At test time, Kimi-Dev-72B coordinates its BugFixer and TestWriter roles through a self-play mechanism, which further improves results. For each issue it can generate up to 40 patch candidates and 40 test candidates, and the candidate tests are used to check the candidate patches.
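
One way to picture how patch and test candidates might be combined is to run every candidate test against every candidate patch and keep the patch that passes the most tests. The article does not spell out the actual selection rule, so the sketch below is an assumption for illustration: `pass_counts`, `select_patch`, and the `passes` callback are hypothetical names, and ties are broken in favor of the earliest candidate.

```python
from typing import Callable, List, Sequence


def pass_counts(
    patches: Sequence[str],
    tests: Sequence[str],
    passes: Callable[[str, str], bool],
) -> List[int]:
    """For each patch, count how many candidate tests it passes."""
    return [sum(1 for t in tests if passes(p, t)) for p in patches]


def select_patch(
    patches: Sequence[str],
    tests: Sequence[str],
    passes: Callable[[str, str], bool],
) -> str:
    """Pick the patch that passes the most candidate tests (first one wins ties)."""
    if not patches:
        raise ValueError("at least one patch candidate is required")
    counts = pass_counts(patches, tests, passes)
    best_index = max(range(len(patches)), key=counts.__getitem__)
    return patches[best_index]
```

In practice `passes` would apply the patch and run the test inside an isolated environment such as a Docker container. With 40 patches and 40 tests that means up to 1,600 runs per issue.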

Looking ahead, Moonshot AI plans to extend Kimi-Dev-72B through deeper integration with popular development tools, so that it fits more naturally into developers' workflows. The company says it will keep improving the model, conduct rigorous red-teaming, and release stronger versions to the community.

Key Takeaways:

🔍 Kimi-Dev-72B is a newly released open-source model that scored 60.4% on SWE-bench Verified, the highest result among open-source models.

🚀 This model combines the functions of BugFixer and TestWriter to improve programming efficiency and code quality.  

💡 Moonshot AI will continue to optimize Kimi-Dev-72B, planning deeper integration with popular development tools in the future.