On March 19, Cursor officially announced the launch of its self-developed coding model, Composer 2. The announcement immediately set the developer community abuzz: according to Cursor's data, the model scored 61.7% on Terminal-Bench 2.0, significantly higher than Claude Opus 4.6's 58.0% under the same testing setup.

Was Anthropic's flagship model surpassed by its own IDE's built-in model? Once the news spread, discussions naturally followed.


Three Core Benchmark Scores

Cursor has released three sets of benchmark data, all officially published:

  • Terminal-Bench 2.0 (agent-style terminal coding tasks): Composer 2 reached 61.7%, exceeding Claude Opus 4.6's 58.0%. However, OpenAI GPT-5.4 still leads with 75.1%.
  • CursorBench (real coding scenarios within Cursor): Composer 2 reached 61.3%, a significant improvement from the previous version Composer 1.5's 44.2%, and also outperformed Claude Opus 4.6's 58.2%.
  • SWE-bench Multilingual (multilingual software engineering): Composer 2 achieved 73.7%, showing a notable improvement over the previous version.

However, one point is worth noting: Anthropic previously reported that Claude Opus 4.6 achieved 65.4% on Terminal-Bench 2.0 under optimized settings, well above the 58.0% Cursor measured for the same model. The difference comes down to the testing harness: Cursor used third-party agent environments such as Harbor and averaged results over five runs, while Anthropic's number came from its own optimized configuration. The two figures do not share a frame of reference, so comparing them directly is somewhat like comparing apples and oranges. Cursor did not dodge the issue; the announcement explicitly stated that "the results depend on the agent, harness, and settings."

Cost Is Just One-Tenth of Opus 4.6's

Cost-effectiveness is actually Composer 2's real secret weapon.

Composer 2 is priced at $0.50 / $2.50 per million input/output tokens, versus Claude Opus 4.6's $5 / $25 and GPT-5.4's $2.5 / $15; the gap is obvious. Cursor explains that Composer 2 was designed from the start for long-running coding tasks, combining its in-house RL training with a "self-summarization" technique to cut both latency and cost, what the company calls "frontier intelligence + extreme speed."
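Using the prices quoted above, the "one-tenth" claim is easy to check with a bit of arithmetic. A minimal sketch (the 50M/10M token workload below is a hypothetical example, not a figure from Cursor):

```python
# Per-million-token prices from the article, in USD: (input, output)
PRICES = {
    "Composer 2": (0.50, 2.50),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4": (2.50, 15.00),
}

def cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Total USD cost of a workload under per-million-token pricing."""
    p_in, p_out = PRICES[model]
    return (input_tokens / 1e6) * p_in + (output_tokens / 1e6) * p_out

# Hypothetical month of heavy agentic coding: 50M input, 10M output tokens
for model in PRICES:
    print(f"{model}: ${cost(model, 50e6, 10e6):,.2f}")
# Composer 2 comes to $50.00 vs. $500.00 for Opus 4.6: exactly 10x cheaper
# on this mix, consistent with the headline ratio.
```

Note that the actual ratio depends on the input/output mix: both the input and output rates happen to differ by 10x here, so any workload shape gives the same one-tenth result for this pair.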

Composer 2 is Cursor's third-generation self-developed model, following Composer 1 (released in October 2025) and version 1.5 (February 2026). This upgrade focuses on long-horizon tasks, and a faster, lighter variant has been set as the default model in the Cursor IDE.

What Does This "Overtaking" Mean?

Cursor's willingness to compare its own model directly with Opus 4.6 reflects a shift in the overall logic of the AI coding tools market.

OpenAI and Anthropic are competing on general frontier capabilities, while vertical tool vendors like Cursor have taken a different path: polishing performance on specific tasks and then leveraging a price advantage to differentiate. Media outlets such as VentureBeat and The New Stack noted that Composer 2 will accelerate the practical adoption of "multi-model routing": using Opus or GPT for complex reasoning and switching back to Composer 2 for routine, high-frequency coding, getting the best of both worlds.
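The routing idea described above can be sketched as a simple dispatch rule. Everything here is illustrative: the model names, signals, and threshold are assumptions, not any vendor's actual API or routing logic.

```python
# Hypothetical multi-model router: send hard tasks to a frontier model,
# routine edits to the cheap, fast model. Names and heuristics are made up.
def route(task_description: str, files_touched: int) -> str:
    """Pick a model based on rough task-complexity signals."""
    hard_signals = ("architecture", "refactor", "debug", "design")
    is_complex = files_touched > 5 or any(
        s in task_description.lower() for s in hard_signals
    )
    return "claude-opus-4.6" if is_complex else "composer-2"

print(route("rename a variable", 1))         # -> composer-2 (routine edit)
print(route("refactor the auth module", 8))  # -> claude-opus-4.6 (complex)
```

In practice a router like this would likely use a classifier or the model's own self-assessment rather than keyword matching, but the economic logic is the same: reserve the expensive model for the minority of requests that need it.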

Claude Opus 4.6 was released on February 5 this year and led multiple rankings, including Terminal-Bench 2.0, Humanity's Last Exam, and GDPval-AA. Cursor's latest data at least casts doubt on that lead within the coding niche.

Developer feedback has been mostly positive so far, but many say they will wait to see how the model performs in real projects before passing judgment, which is reasonable: benchmarks are just benchmarks. Cursor has already opened free trial access to Composer 2 within the IDE for subscribers.

Source of data: Official announcements from Cursor and mainstream tech media, as of March 20, 2026. Real-time rankings can be checked at tbench.ai or Cursor's official website.