Kunlun Weibo officially released and open-sourced Skywork-SWE-32B, its self-developed code agent model for software engineering, on June 20. According to the team, it is the strongest open-source model for code repair at the 32B parameter scale. To train it, the team constructed more than 10,000 verifiable task instances from GitHub repositories, which it describes as the largest verifiable dataset of its kind currently available, and used the data to systematically verify a data scaling law for large models on software engineering tasks.
Skywork-SWE-32B achieves 38.0% pass@1 accuracy on the SWE-bench Verified benchmark, a new best among models based on Qwen2.5-Coder-32B under the OpenHands agent framework. With test-time scaling, accuracy rises further to 47.0%, surpassing existing open-source models at the 32B parameter scale and below and narrowing the gap with some closed-source models.
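To make the two numbers concrete: SWE-bench Verified contains 500 task instances, so 38.0% pass@1 corresponds to 190 resolved tasks and 47.0% to 235. The article does not say how test-time scaling is implemented. The sketch below assumes one common form, best-of-N selection, in which several rollouts are sampled per task and a scorer picks one to submit. The `Rollout` type, its `score` field, and both function names are hypothetical and are not taken from the Skywork release.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Rollout:
    """One agent attempt at a task (hypothetical structure)."""
    patch: str       # the diff proposed by the agent
    score: float     # assumed verifier/critic score; higher is better
    resolved: bool   # whether the patch passes the benchmark's tests


def pass_at_1(resolved_flags: Iterable[bool]) -> float:
    """Fraction of tasks resolved with a single submitted attempt each."""
    flags = list(resolved_flags)
    return sum(flags) / len(flags) if flags else 0.0


def best_of_n(rollouts: List[Rollout]) -> Rollout:
    """Pick the rollout with the highest score among N samples for one task."""
    return max(rollouts, key=lambda r: r.score)


# Toy example: two tasks, three sampled rollouts each.
tasks = [
    [Rollout("p1", 0.2, False), Rollout("p2", 0.8, True), Rollout("p3", 0.5, False)],
    [Rollout("q1", 0.6, False), Rollout("q2", 0.4, False), Rollout("q3", 0.1, True)],
]
single = pass_at_1(task[0].resolved for task in tasks)          # first sample only
scaled = pass_at_1(best_of_n(task).resolved for task in tasks)  # after selection
print(single, scaled)
```

In the second toy task the scorer prefers an unresolved patch, which is why selection quality, not just sample count, bounds the gain from this kind of scaling.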
To address shortcomings in the mainstream datasets for SWE tasks, the Kunlun Weibo team built an automated three-stage pipeline for collecting and verifying training data. In the data collection stage, they used the GitHub API to gather information from over 150,000 open-source repositories and, after a series of strict filtering steps, retained 23,389 candidate task samples. In the verification stage, they used unified command generation and Docker-based environment construction to confirm the validity of each task sample, which left 10,169 high-quality samples.
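The verification stage can be sketched as an execution-based filter. The article does not spell out the validity criterion, so the code below assumes the usual SWE-bench convention: an instance is kept only if at least one test fails before the reference patch is applied and passes afterwards. The per-test result dictionaries stand in for the output of a test run inside a Docker container, and all names here are hypothetical.

```python
from typing import Dict, List

TestResults = Dict[str, bool]  # test name -> passed?


def fail_to_pass(before: TestResults, after: TestResults) -> List[str]:
    """Tests that pass after the patch but did not pass (or exist) before it."""
    return sorted(
        name for name, passed in after.items()
        if passed and not before.get(name, False)
    )


def is_valid_instance(before: TestResults, after: TestResults) -> bool:
    """Assumed criterion: keep the instance if the patch fixes at least one test."""
    return len(fail_to_pass(before, after)) > 0


# Toy results for one candidate instance.
before = {"test_parse": False, "test_io": True}
after = {"test_parse": True, "test_io": True}
print(fail_to_pass(before, after), is_valid_instance(before, after))

# Share of candidates that survived verification, using the article's counts.
retention = 10_169 / 23_389
print(f"{retention:.1%}")
```

With the counts reported in the article, a little under half of the filtered candidates survive this stage.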
In the agent trajectory generation stage, the team used the open-source OpenHands framework with commercial large models as the backbone to run multiple rounds of interaction on each task, recording the agent's full problem-solving process. This yielded 8,209 high-quality validated trajectories, which served as the training data for Skywork-SWE-32B.
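As a rough illustration of this stage, the sketch below records each rollout as a list of action and observation steps and keeps only the rollouts whose final patch was validated. The record layout and field names are assumptions made for illustration; they do not reflect the actual OpenHands trajectory format or the team's filtering code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trajectory:
    """One recorded agent rollout on a task (hypothetical layout)."""
    task_id: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (action, observation)
    validated: bool = False  # did the final patch pass the task's tests?

    def record(self, action: str, observation: str) -> None:
        """Append one round of interaction to the trajectory."""
        self.steps.append((action, observation))


def keep_validated(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Keep only rollouts that ended in a validated patch."""
    return [t for t in trajectories if t.validated]


# Toy example: two rollouts on the same task, one successful.
good = Trajectory("repo__issue-1")
good.record("run tests", "1 failed")
good.record("edit parser.py", "ok")
good.validated = True

bad = Trajectory("repo__issue-1")
bad.record("run tests", "1 failed")

training_set = keep_validated([good, bad])
print(len(training_set), len(training_set[0].steps))
```

Because several rollouts can be run per task, the number of validated trajectories need not match the number of task instances.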
The release of Skywork-SWE-32B gives new momentum to the development of software engineering agents and demonstrates their potential in handling complex development scenarios.
Blog address 🔗
https://quixotic-sting-239.notion.site/eb17f379610040ceb54da5d5d24065bd
HuggingFace address 🔗
https://huggingface.co/Skywork/Skywork-SWE-32B
Key points:
🌟 Skywork-SWE-32B achieves 38.0% pass@1 accuracy on the SWE-bench Verified benchmark, a new best among 32B open-source models.
📈 With test-time scaling, the model's accuracy rises to 47.0%, significantly narrowing the performance gap with closed-source models.
🔍 Kunlun Weibo established an automated pipeline that produced a dataset of more than 10,000 high-quality, verifiable SWE task instances, laying the groundwork for model training.