Hallucinations, in which large models state factual errors with complete confidence, have long been a core challenge for the AI industry. The problem is especially dangerous in high-stakes fields such as healthcare and law.
For a long time, the industry has fought hallucinations along two main paths: expanding training data in an attempt to make AI "omniscient," and adding defensive mechanisms that let AI "stay silent" when it is uncertain. Both approaches have clear limitations. The former can never cover every fact in the world, so blind spots remain; the latter often imposes a severe "utility tax": to eliminate errors, the AI must also decline many questions it would have answered correctly, which badly degrades the user experience.
Recently, a paper jointly published by Google Research and Tel Aviv University offered a new perspective on this dilemma: metacognition. The study argues that the key to solving hallucinations lies not in forcing AI never to make mistakes, but in enabling AI to "know what it knows and know what it does not know."

Figure: The difference between calibration and discrimination. The left panel shows a well-calibrated model (red line close to the diagonal), while the right panel shows the harsh reality: even with perfect calibration, reducing the error rate from 25% to 5% requires sacrificing 52% of correct answers.
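To make the distinction in the figure concrete, here is a minimal Python sketch; it is not code from the paper. It uses expected calibration error (ECE) as a stand-in for calibration and AUROC as a stand-in for discrimination, which is an assumption, since the paper may measure both differently. The function names and the toy data are illustrative only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += len(items) / total * abs(mean_conf - accuracy)
    return ece


def auroc(confidences, correct):
    """Chance that a random correct answer outranks a random wrong one (ties count half)."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one wrong answer")
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 of 4 answers are correct.
correct = [1, 1, 1, 0]

# Model A reports 0.75 on every answer: calibrated, but it cannot tell right from wrong.
flat_conf = [0.75, 0.75, 0.75, 0.75]
flat_ece = expected_calibration_error(flat_conf, correct)
flat_auroc = auroc(flat_conf, correct)

# Model B is certain exactly when it is right: calibrated and fully discriminating.
sharp_conf = [1.0, 1.0, 1.0, 0.0]
sharp_ece = expected_calibration_error(sharp_conf, correct)
sharp_auroc = auroc(sharp_conf, correct)

print(f"flat:  ECE={flat_ece:.2f}  AUROC={flat_auroc:.2f}")
print(f"sharp: ECE={sharp_ece:.2f}  AUROC={sharp_auroc:.2f}")
```

Both toy models have an ECE of zero, yet only the second can separate its right answers from its wrong ones. That is why good calibration alone does not remove the trade-off shown in the right panel.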
The paper redefines hallucinations: the key issue is not whether the AI's output is wrong, but whether the AI misleads users by sounding confident when it is not sure. The researchers argue that AI should exhibit "faithful uncertainty." In other words, when the model's internal computational state shows hesitation or low confidence, its wording should reflect the same reservation and caution instead of presenting the output as absolute fact.
Metacognition refers to an AI's awareness of its own cognitive processes. It requires a large model to perceive its internal states accurately and, based on that perception, to express its level of confidence honestly. In the era of AI agents, this ability is especially critical. An AI system without metacognition is like a pilot without an instrument panel: it can neither decide when to call a tool nor judge the reliability of search results, which makes it prone to tool misuse and even to "flying blind."
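As a rough illustration of what "faithful uncertainty" could look like at the output layer, the Python sketch below maps a confidence score to wording of matching strength. Everything in it is an assumption made for illustration: the function name `phrase_with_confidence`, the thresholds 0.9, 0.6 and 0.3, and the phrasings are hypothetical, and the sketch takes the confidence score as given instead of showing how a model would obtain it from its internal state.

```python
def phrase_with_confidence(answer: str, confidence: float) -> str:
    """Return the answer worded to match the given confidence in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.9:   # hypothetical cutoff for a flat assertion
        return f"The answer is {answer}."
    if confidence >= 0.6:   # hypothetical cutoff for a hedged assertion
        return f"I believe the answer is {answer}, but I am not fully certain."
    if confidence >= 0.3:   # hypothetical cutoff for an explicit guess
        return f"I am unsure; my best guess is {answer}."
    return "I don't know."


for score in (0.95, 0.7, 0.4, 0.1):
    print(score, "->", phrase_with_confidence("Canberra", score))
```

The point of the sketch is only that the wording tracks the score. A real system would still need a confidence signal that reflects the model's internal state, which is the hard part the paper is concerned with.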

Figure: Actual performance of major models on SimpleQA Verified. The star at the top right marks the ideal target, "Discrimination Gap" marks the distance between current models and that ideal, and "Utility Tax" indicates the utility cost that Claude Opus 4 pays to achieve high accuracy.
Of course, putting this approach into practice faces considerable challenges. One is how to distinguish "true metacognition" from a "deliberate performance of uncertainty." Another is how to avoid the side effects of RLHF (Reinforcement Learning from Human Feedback): because human raters often prefer confident answers, RLHF can actually encourage AI to feign confidence.
For the future development of AI, the study offers a practical recommendation: anti-hallucination techniques should no longer be evaluated on accuracy alone, but on the balance between utility and error rate. AI does not need to live up to the myth of a system that never makes mistakes, but it must have the basic quality of a professional: the ability to honestly distinguish "I am certain" from "I am guessing." A clear understanding of the boundaries of its own knowledge is essential to improving AI's credibility and practical value.
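One way to report the balance between utility and error rate is sketched below in Python. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: the error rate is taken to be wrong answers divided by all questions (abstentions count as neither right nor wrong), the "utility tax" is taken to be the share of originally correct answers lost to abstention, and the function name and data are hypothetical.

```python
def selective_answering_report(confidences, correct, target_error_rate):
    """Find the most permissive confidence threshold that meets the target error rate."""
    n = len(confidences)
    total_correct = sum(correct)
    if n == 0 or total_correct == 0:
        raise ValueError("need at least one question and one correct answer")
    # Raising the threshold can only remove answers, so the error rate never
    # increases; the first threshold that meets the target is the most permissive.
    for threshold in sorted(set(confidences)):
        answered = [y for c, y in zip(confidences, correct) if c >= threshold]
        kept_correct = sum(answered)
        error_rate = (len(answered) - kept_correct) / n
        if error_rate <= target_error_rate:
            return {
                "threshold": threshold,
                "error_rate": error_rate,
                "utility_tax": 1 - kept_correct / total_correct,
            }
    # No threshold works: abstain on every question.
    return {"threshold": None, "error_rate": 0.0, "utility_tax": 1.0}


# Hypothetical results for 8 questions: confidence and whether the answer was right.
confidences = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4]
correct = [1, 1, 0, 1, 1, 0, 1, 0]

report = selective_answering_report(confidences, correct, target_error_rate=0.125)
print(report)
```

On this toy data, cutting the error rate from 3 in 8 to 1 in 8 requires a threshold of 0.7 and costs one of the five correct answers, a utility tax of about 20%. Reporting the error rate and the utility tax together shows the cost of each reduction in errors, which accuracy alone hides.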
