Recently, Geoffrey A. Fowler, a technology columnist for The Washington Post, shared an alarming experience. He imported ten years of health data recorded by his Apple Watch into OpenAI's newly launched ChatGPT Health feature, only for the AI to incorrectly grade his heart health as an "F." The result shocked Fowler, who immediately contacted a doctor for further examination.

After a detailed medical evaluation, the doctor stated clearly that Fowler's heart health was excellent: his risk of a heart attack was very low, and he did not even need additional aerobic fitness testing. The finding reassured Fowler, but it also raised doubts about the accuracy of AI health assessments.


After further analysis, Fowler found that ChatGPT's misjudgment stemmed mainly from a misreading of the data. The AI treated the VO2max (maximum oxygen uptake) values recorded by the Apple Watch as precise medical measurements, when in fact Apple has stated that the figure is only an estimate, intended for tracking fitness trends rather than for clinical diagnosis. In addition, after Fowler upgraded to a new Apple Watch, the AI interpreted the resulting shift in his resting heart rate as a significant physiological change, ignoring the fact that the new device's sensors simply measure differently.
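The second error is easy to picture: before comparing heart-rate trends across years, the readings should be segmented at the hardware boundary. The sketch below uses entirely hypothetical numbers and device names, not Apple's actual export format; it only illustrates the kind of sanity check the AI apparently skipped.

```python
# Minimal sketch (hypothetical data): split resting-heart-rate readings
# by device generation before interpreting any trend, so a sensor change
# is not mistaken for a physiological change.
from statistics import mean

# Hypothetical export rows: (ISO date, device model, resting HR in bpm).
readings = [
    ("2023-06-01", "Watch Series 7", 58),
    ("2023-12-01", "Watch Series 7", 59),
    ("2024-06-01", "Watch Series 9", 63),  # new sensor, new baseline
    ("2024-12-01", "Watch Series 9", 62),
]

by_device: dict[str, list[int]] = {}
for _, device, bpm in readings:
    by_device.setdefault(device, []).append(bpm)

for device, bpms in by_device.items():
    print(f"{device}: mean resting HR {mean(bpms):.1f} bpm ({len(bpms)} readings)")

# The ~4 bpm jump coincides exactly with the device swap, so it should be
# flagged as a likely measurement artifact, not graded as a decline in
# cardiovascular health.
```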

More troublingly, ChatGPT Health's feedback was visibly unstable. When Fowler asked the same health question multiple times, the AI's grade swung between "F" and "B." The system also seemed to suffer from memory lapses, repeatedly forgetting Fowler's sex and age during the conversation, and even when he supplied recent blood-test reports, it selectively ignored this important clinical evidence.
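This kind of instability is straightforward to probe. ChatGPT Health itself has no public API, so the sketch below approximates the experiment against OpenAI's general chat endpoint; the prompt, model choice, and grade parsing are illustrative assumptions, not Fowler's actual queries.

```python
# Minimal sketch: send the identical question several times and compare
# the grades. A grade that changes run to run carries little diagnostic
# weight, which is exactly the instability Fowler observed.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "Given a VO2max estimate of 35 and a resting HR of 62, grade my heart health A-F."

grades = []
for _ in range(5):
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content or ""
    match = re.search(r"\b([A-F])\b", answer)  # crude grade extraction
    grades.append(match.group(1) if match else "?")

print("Grades across identical runs:", grades)
# Any spread in this list (e.g. ['B', 'F', 'C', ...]) signals an
# assessment too unstable to act on.
```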

Fowler's experience reminds us to remain vigilant when using AI technology for health assessments, as AI judgments are not always reliable.