In scientific research, reasoning ability is crucial. Scientists do not simply recall facts; they propose hypotheses, test and refine them, and synthesize ideas across disciplines. As AI models grow more capable, evaluating whether they can perform this kind of deep scientific reasoning has become an important question.


Recently, AI models have achieved milestone results in several major fields, including strong performances at the International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). At the same time, advanced models such as GPT-5 are already accelerating real scientific workflows: researchers use these systems for interdisciplinary literature searches and complex mathematical proofs, cutting work that once took days or weeks down to hours.

To further evaluate AI's capabilities in scientific research, we have introduced a new benchmark, FrontierScience, which focuses on assessing expert-level scientific reasoning in fields such as physics, chemistry, and biology. FrontierScience comprises hundreds of expert-verified challenging problems organized into two tracks, Olympiad and Research, which measure Olympiad-style scientific reasoning and real-world research capability, respectively. Preliminary evaluation results show that GPT-5.2 outperforms other models on both the FrontierScience Olympiad and Research tracks.

Specifically, GPT-5.2 scored 77% on the Olympiad track and 25% on the Research track. Although current models can already support structured reasoning in research workflows, open-ended thinking still has room for improvement. Today, scientists use these models to accelerate their research, but problem framing and validation still depend on human judgment. Going forward, we will continue to improve the FrontierScience benchmark and expand its application areas, helping models become reliable partners in scientific discovery.
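For readers who want a concrete picture of what such track-level percentages mean, the minimal Python sketch below aggregates per-problem pass/fail grades into a per-track accuracy. The data layout, track names, and grading scheme here are illustrative assumptions for exposition only, not the actual FrontierScience format or scoring pipeline.

```python
from collections import defaultdict

# Hypothetical graded results: one (track, solved_correctly) pair per problem.
# Track names and the binary grading scheme are assumptions, not the real schema.
graded = [
    ("Olympiad", True),
    ("Olympiad", False),
    ("Olympiad", True),
    ("Research", False),
    ("Research", True),
]

def track_accuracy(results):
    """Return the fraction of correctly solved problems for each track."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for track, is_correct in results:
        total[track] += 1
        correct[track] += int(is_correct)
    return {track: correct[track] / total[track] for track in total}

print(track_accuracy(graded))
# -> {'Olympiad': 0.666..., 'Research': 0.5}, which would be reported as 67% and 50%
```

Reported benchmark numbers like the 77% and 25% above are, in essence, this kind of per-track accuracy computed over the full expert-verified problem set.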

Key Points:   

🔍 FrontierScience is a newly launched benchmark designed to assess AI's reasoning abilities in scientific fields.   

📊 Preliminary evaluations show that GPT-5.2 performs strongly on scientific reasoning, but its open-ended thinking still has room for improvement.   

🚀 Advancements in AI models are accelerating the scientific research process, and future efforts will focus on optimizing evaluation benchmarks and expanding application areas.