Chinese Team Makes History: First Score Above 30 on Humanity's Last Exam as AI Competition Intensifies

Amid intensifying global competition in artificial intelligence, Shanghai Jiao Tong University and the DeepSeek team have scored 32.1 on the benchmark known as "Humanity's Last Exam" (HLE), the first result above 30 points. The test set is known for its extreme difficulty: when it was first released, no model scored above 10 points. Until recently, the highest score was only 26.9, a record shared by Kimi-Research and Gemini Deep Research.

The research introduces X-Master, a tool-augmented reasoning agent, and X-Masters, a multi-agent workflow system. Beyond its benchmark results, the solution has been open-sourced to promote collaboration and development in the AI field.

The core idea of X-Master is to simulate how human researchers solve problems, switching seamlessly between internal reasoning and external tools. When it encounters a problem it cannot solve through reasoning alone, X-Master writes its action plan as code, executes that code with tools (such as NumPy and SciPy), and integrates the results back into the agent's knowledge. This creates an efficient feedback loop that lets the agent keep refining its reasoning.
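The reason, write code, execute, and integrate loop described above can be illustrated with a minimal sketch. This is not X-Master's actual implementation: the `<code>` and `<result>` tags, the `scripted_model` stand-in for the reasoning model, and the function names are all hypothetical, and the sketch uses only the Python standard library rather than NumPy or SciPy.

```python
import contextlib
import io
import re

# Hypothetical convention: the model wraps tool calls in <code>...</code>.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Execute a snippet and return what it printed, or the error it raised."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})  # a real agent would use an isolated sandbox
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def scripted_model(context: str) -> str:
    """Hypothetical stand-in for the reasoning model (no real LLM call)."""
    if "<result>" not in context:
        # First pass: decide that a computation is needed and emit code.
        return "I should compute this. <code>import math\nprint(math.comb(10, 3))</code>"
    # Later pass: read the latest tool result and answer.
    result = re.findall(r"<result>(.*?)</result>", context, re.DOTALL)[-1]
    return f"Final answer: {result}"


def solve(question: str, model=scripted_model, max_steps: int = 5) -> str:
    """Alternate between model reasoning and code execution until an answer."""
    context = question
    for _ in range(max_steps):
        reply = model(context)
        context += "\n" + reply
        match = CODE_RE.search(reply)
        if match is None:
            return reply  # no tool call, so treat the reply as the answer
        # Feed the execution result back into the context for the next step.
        context += f"\n<result>{run_code(match.group(1))}</result>"
    return "No answer within the step budget"


print(solve("How many ways are there to choose 3 items from 10?"))
```

The key design point is that tool output is appended to the same context the model reasons over, so each execution result can shape the next reasoning step.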

X-Masters goes a step further, using a distributed-stacked agent workflow that increases both the breadth and the depth of reasoning. In the distributed phase, multiple solvers work in parallel to generate different solutions, and a critic agent evaluates and improves them. In the stacked phase, a rewriter agent consolidates all outputs into a better solution, and finally a selector agent chooses the best answer.
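The solver, critic, rewriter, and selector roles can be sketched as a small pipeline. The code below is an illustration under stated assumptions, not the team's implementation: the `run_workflow` function, the `width` parameter, the choice to run several rewriters, and the toy agents (which use string cleanup and majority voting in place of real model calls) are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_workflow(question, solver, critic, rewriter, selector, width=3):
    """Run the distributed phase in parallel, then consolidate and select."""
    with ThreadPoolExecutor(max_workers=width) as pool:
        # Distributed phase: independent solvers produce different drafts.
        drafts = list(pool.map(lambda seed: solver(question, seed), range(width)))
        # Each draft is then reviewed and improved by a critic.
        revised = list(pool.map(lambda draft: critic(question, draft), drafts))
    # Consolidation: each rewriter sees every revised draft (assumed count: width).
    rewrites = [rewriter(question, revised) for _ in range(width)]
    # Selection: one agent picks the final answer.
    return selector(question, rewrites)


# Toy stand-ins for model-backed agents (hypothetical, for illustration only).
def toy_solver(question, seed):
    return ["4", " 4 ", "5"][seed % 3]  # drafts differ, and one is wrong


def toy_critic(question, draft):
    return draft.strip()  # "improves" a draft by cleaning it up


def toy_rewriter(question, drafts):
    return Counter(drafts).most_common(1)[0][0]  # merge by majority


def toy_selector(question, candidates):
    return Counter(candidates).most_common(1)[0][0]


answer = run_workflow("What is 2 + 2?", toy_solver, toy_critic, toy_rewriter, toy_selector)
print(answer)
```

Passing the agents in as plain callables keeps the orchestration separate from whatever model sits behind each role.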

X-Masters also performed especially well in the biology/medicine category of the test, surpassing existing agent systems and demonstrating its strength on complex problems.

"Humanity's Last Exam" was launched this year by the Center for AI Safety and Scale AI to assess the capability level of AI systems. Its questions were contributed by more than 1,000 scholars at over 500 institutions and are extremely difficult.

Study: Global AI Chipset Market to Exceed $700B with 31.8% CAGR

According to TMR Research, the global artificial intelligence chipset market is expected to exceed $700 billion, growing at a compound annual growth rate of 31.8% from 2022 to 2031. The article covers development trends, application areas, and key players in the AI chipset market.
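To put the growth rate in perspective, the short sketch below compounds 31.8% per year over the forecast window. It rests on two assumptions that the summary does not state outright: that 2022 to 2031 means nine compounding periods, and that $700 billion is the value at the end of that window. The function names are illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Starting value that reaches `end_value` at the given CAGR."""
    return end_value / growth_multiple(cagr, years)


CAGR = 0.318
YEARS = 2031 - 2022          # assumption: nine compounding periods
END_VALUE_BILLION = 700.0    # assumption: $700B is the 2031 figure

multiple = growth_multiple(CAGR, YEARS)
start = implied_start_value(END_VALUE_BILLION, CAGR, YEARS)
print(f"growth multiple: {multiple:.1f}x, implied 2022 base: ${start:.0f}B")
```

Under these assumptions, the market would grow roughly twelvefold over the period.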

IBM Research: How AI & Automation Protect Businesses from Data Breaches

IBM's report presents evidence that artificial intelligence, automation, and threat intelligence can help address data breaches across their lifecycle and reduce costs. The research found that integrating AI and automation into security operations teams can shorten the data breach lifecycle by 33% and cut costs by 33.6%. However, only 28% of enterprises currently use AI and automation extensively. Many still rely on legacy systems that attackers can easily bypass. The article emphasizes how effective AI and automation are at improving cybersecurity and calls on enterprises to adopt these technologies widely to protect their data.

Google's AGI Robot Breakthrough: 54-Member Team's 7-Month Work, High Generalization and Reasoning

The robotics research team at Google DeepMind recently released a robotics project called RT-2. The project took 7 months to develop and uses a large model for training. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and it can reason about and complete tasks based on human instructions. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as going from 'extinct animals' to 'plastic dinosaurs'. The project performed well across various sub-category tests, with performance up to three times that of the previous generation of robot models. The result demonstrates the potential of large models in robotics research and is expected to drive future robot development.

RWKV: Small Team Aims to Be Android of AI Era with Big Model

Meta Intelligence OS is a startup founded by Peng Bo. It has developed a series of large models based on the open-source model RWKV and aims to become the Android of the large-model era. The RWKV model offers strong performance and low cost in inference tasks, which has attracted customers from industries such as finance, law, and smart hardware. The company's business model is model customization based on private data, along with internal AI agent development. It hopes to solve the problems of API call latency and data security by deploying large models on end devices. RWKV versions are currently available for Windows, Mac, and Linux computers, and Android and iOS versions are in development. Meta Intelligence OS is raising funds and working with chip companies and computing power platforms to build benchmark customer cases. Luo Xuan said that the decisive battlefield for large models is hardware, and that both end devices and the cloud require dedicated chips.
