Cursor, a well-known AI programming assistant, recently shared an internal test finding: when handling long-running, autonomous programming tasks, OpenAI's latest GPT-5.2 model proved more reliable than Anthropic's Claude Opus 4.5. To test the model's capabilities, the Cursor team attempted to build a fully functional web browser from scratch, covering complex underlying components such as HTML parsing, CSS layout, and a custom JavaScript virtual machine.

The test results showed that GPT-5.2 followed complex instructions more accurately and stayed focused on long-horizon tasks that involve millions of lines of code and take weeks, avoiding the "goal drift" common in such work. In contrast, Claude Opus 4.5, although it performs well in many scenarios, tends to stop midway or look for shortcuts on projects of this scale, handing back control prematurely.
Cursor has now made the GPT-5.2 model available on its platform, aiming to explore whether AI agents can independently complete large projects that usually take human teams months to finish. In addition to the browser experiment, the model has also completed other complex tasks, including a Windows 7 simulator and a migration involving over a million lines of code, demonstrating the potential of generative AI for autonomous software engineering.
Key points:
🚀 Advantages in long-running tasks: Cursor reported that, compared with Claude Opus 4.5, GPT-5.2 stays more focused on its goals and does not take shortcuts or fall apart during long, large-scale autonomous programming tasks.
🌐 Concrete test cases: The team used AI agents to write a browser engine in Rust from scratch, demonstrating the model's engineering capability on a codebase of millions of lines.
🛠️ Significant efficiency improvement: In one task, the AI agent rewrote a rendering pipeline to run 25 times faster, and it was also able to add complex visual effects such as smooth zooming and motion blur on its own.