GPT-5.5 Dominates AI Vulnerability Challenge, DeepSeek Crowned King of Cost-Effectiveness

Published in Latest AI News

Date: Jun 4, 2026

Security researcher Kasra Rahjerdi recently published a report that tested the security reasoning of several mainstream large language models against a deliberately vulnerable book review app. To simulate a real-world flaw, he embedded credentials for Google's mobile backend service in the application package. To succeed, a model had to unpack the app, find those credentials, and use them to access the database directly.
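
The following minimal Python sketch shows the kind of search a model had to perform after unpacking. It assumes the leaked secret is a Google-style API key, which by convention is the prefix "AIza" followed by 35 URL-safe characters and is a heuristic many secret scanners use. The report does not say which credential format was used, and the function name find_google_keys is hypothetical. An APK is a ZIP archive, so the sketch simply scans the raw bytes of every entry.

```python
import io
import re
import zipfile

# Assumption: the credential is a Google-style API key ("AIza" + 35 URL-safe chars).
# This is a common secret-scanner heuristic, not an official format specification.
GOOGLE_API_KEY = re.compile(rb"AIza[0-9A-Za-z_\-]{35}")


def find_google_keys(apk_bytes: bytes) -> dict[str, set[str]]:
    """Hypothetical helper: scan every entry of an APK (a ZIP file) for key-like strings."""
    hits: dict[str, set[str]] = {}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            data = apk.read(name)
            # findall returns the matched bytes because the pattern has no groups
            found = {m.decode("ascii") for m in GOOGLE_API_KEY.findall(data)}
            if found:
                hits[name] = found
    return hits


# Usage: build a tiny fake "APK" in memory with a placeholder (not real) key.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("res/values/strings.xml", "<string>AIza" + "A" * 35 + "</string>")
    z.writestr("classes.dex", b"\x00no secrets here")
print(find_google_keys(buf.getvalue()))
```

A naive byte scan like this can miss keys that the build compiled into resources.arsc as UTF-16 strings. Decoding the resources first with a tool such as apktool is usually more reliable.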

Top Models Face Off

With each run capped at 2 hours and a $10 budget, results varied widely across models. GPT-5.5 showed the strongest technical capability: it solved the challenge in 7 of 10 runs, the highest solve rate of any model tested. According to the report, GPT-5.5 located the key credentials immediately after unpacking the app and was not distracted by its complex user interface or its conventional API endpoints.

Google's Gemini, by contrast, was disappointing. Gemini 3.1 Pro Preview triggered its built-in refusal mechanism almost immediately at the start of every task. As a result, it consumed far fewer tokens than the other models tested.

The Ultimate Cost-Effectiveness Battle

Although GPT-5.5 had the highest success rate, its average cost per successful run was as high as $9.46. That cost can deter teams that need to run such tools in bulk. Here DeepSeek V4 Pro stood out for cost-effectiveness. It succeeded in only 3 of 10 runs, but its average cost per successful run was just $0.62.

In other words, measured purely by cost per success, DeepSeek V4 Pro costs about one-fifteenth as much as GPT-5.5. In some failed attempts it mistakenly targeted the backend's authentication endpoint. Even so, a cost advantage this large has real practical value for teams deploying security testing at scale.
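
The cost comparison above is simple division. The following minimal Python sketch uses only the figures quoted in the report. The dictionary layout and the helper names (solve_rate, cost_ratio, successes_per_budget) are illustrative. The budget extrapolation assumes the average cost per success stays constant, which the report does not claim.

```python
# Figures as quoted in the report; layout and helper names are illustrative.
REPORTED = {
    "GPT-5.5": {"successes": 7, "runs": 10, "cost_per_success": 9.46},
    "DeepSeek V4 Pro": {"successes": 3, "runs": 10, "cost_per_success": 0.62},
}


def solve_rate(model: dict) -> float:
    """Fraction of runs in which the model reached the database."""
    return model["successes"] / model["runs"]


def cost_ratio(expensive: dict, cheap: dict) -> float:
    """How many times more one model pays per success than another."""
    return expensive["cost_per_success"] / cheap["cost_per_success"]


def successes_per_budget(model: dict, budget_usd: float) -> float:
    """Naive extrapolation: assumes the average cost per success stays constant."""
    return budget_usd / model["cost_per_success"]


gpt, deepseek = REPORTED["GPT-5.5"], REPORTED["DeepSeek V4 Pro"]
print(f"Solve rates: {solve_rate(gpt):.0%} vs {solve_rate(deepseek):.0%}")  # 70% vs 30%
print(f"Cost ratio per success: {cost_ratio(gpt, deepseek):.1f}x")  # 15.3x
print(f"Successes per $100: {successes_per_budget(gpt, 100):.1f} vs "
      f"{successes_per_budget(deepseek, 100):.1f}")
```

The ratio of roughly 15.3 is what "about one-fifteenth" refers to. The lower solve rate still matters, because each DeepSeek V4 Pro run is less likely to succeed.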
