Recently, an episode that may earn a place in the history of mathematics unfolded. Timothy Gowers, a Cambridge University professor and Fields Medalist, shared an astonishing experience on his personal blog: using an unreleased version of ChatGPT 5.5 Pro, he solved a long-standing open problem in combinatorics within just one hour.
For a long time, the academic community has been skeptical of large models' ability to do advanced mathematics, believing that they mostly "memorize" answers by retrieving the literature or imitating known derivations. Gowers' test, however, shattered that prejudice. He found that the model, still in internal testing, not only identified concise arguments that even human experts might overlook, but also constructed highly original proof logic without support from existing theory.
Overcoming a Problem in Additive Number Theory: A Leap from Exponential to Polynomial Bounds
The target of the test was a problem posed by mathematician Mel Nathanson concerning upper bounds on the diameter of sumsets. Previously, MIT student Isaac Rajagopal had proven an upper bound that grows exponentially. Under Gowers' guidance, however, ChatGPT 5.5 Pro began a remarkable process of self-improvement.
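The article does not state Nathanson's problem precisely, but the objects it names, sumsets and their diameters, are standard in additive number theory. A minimal sketch in Python, where the particular set `A` and the h-fold construction are illustrative assumptions rather than the actual problem setup:

```python
from itertools import product

def sumset(a, b):
    """The sumset A + B = {x + y : x in A, y in B}."""
    return {x + y for x, y in product(a, b)}

def h_fold_sumset(a, h):
    """The h-fold sumset hA = A + A + ... + A (h copies)."""
    s = set(a)
    for _ in range(h - 1):
        s = sumset(s, a)
    return s

def diameter(s):
    """Diameter of a finite set of integers: max(S) - min(S)."""
    return max(s) - min(s)

# Illustrative example (not Nathanson's actual instance):
A = {0, 1, 4}
print(sorted(h_fold_sumset(A, 2)))   # [0, 1, 2, 4, 5, 8]
print(diameter(h_fold_sumset(A, 2)))  # 8
```

The bound in question concerns how quantities like this diameter grow with the problem's parameters: an exponential bound was known, and the model's proof reportedly established a polynomial one.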
On the first attempt, the model improved the known upper bound in just 16 minutes. It then expressed strong confidence that a polynomial bound exists and independently identified several key technical propositions to verify. After roughly an hour of reasoning and self-correction, it submitted a complete proof. On reviewing it, Isaac Rajagopal marveled that the proof was not only logically flawless but that its core idea was "original and ingenious": a result a human mathematician could be proud of even after weeks of thought.
New Ethical Challenges in Academia: Who Owns AI-Generated Papers?
As AI demonstrates this "doctoral-level" capacity for original research, profound questions about academic norms and the educational system have come to the fore. Gowers pointed out that such AI-generated results are fully capable of meeting the publication standards of leading journals, yet the current academic system has not made room for them. For example, the preprint platform arXiv currently refuses to accept content written by AI, which may leave valuable breakthroughs like this one facing a "dissemination dilemma."
The future of mathematical education also faces redefinition. In the past, solving moderately difficult open problems served as the "whetstone" for training doctoral students. Now that AI can complete such tasks within an hour, human scholars are forced toward deeper and more challenging topics. When "entry-level" research is taken over by AI, where will the core competitiveness of human mathematicians lie? This is not only a technological revolution, but also a re-examination of the boundaries of human intelligence.
