The application of artificial intelligence in medicine is facing serious safety challenges. An independent safety assessment recently published in Nature Medicine found that OpenAI's ChatGPT Health performed poorly at identifying medical emergencies, underestimating the severity of the condition in more than half of the test cases. Experts have issued stark warnings that such flaws could lead to preventable harm and death.

Since its launch in January of this year, ChatGPT Health has been positioned as a smart assistant that helps users manage medical records and obtain health advice. Reportedly, more than 40 million people consult it about health issues every day. The latest findings, however, have poured cold water on the "AI in healthcare" boom.

Failure at Critical Moments: Emergency Recognition Rate Below 50%

The research team constructed 60 cases based on real patients, ranging from mild colds to life-threatening conditions, and compared the AI's recommendations with the clinical judgments of professional physicians. The results showed:

  • Life-threatening misjudgments: Among the emergency cases requiring immediate medical attention, ChatGPT Health advised patients to stay at home or schedule a routine appointment 51.6% of the time.

  • Respiratory failure still advised to "wait": In a typical asthma case, the system identified early signs of respiratory failure yet still gave the incorrect instruction to "continue monitoring" rather than "seek immediate medical care."

  • Severe overreaction: At the opposite extreme from missing emergencies, in simulations involving healthy individuals, the system advised 64.8% of them to seek immediate medical care.

A "False Sense of Security" Is the Greatest Danger

Researchers from University College London warned that this performance is extremely dangerous: the false sense of security the AI creates may cause patients to miss the critical window for treatment. More worrying still, the AI is easily swayed. If a user adds a sentence such as "a friend thinks it's not serious," the probability that the system downplays the condition rises nearly twelvefold.

Industry Calls for an Independent Audit Mechanism

In response to the criticism, an OpenAI spokesperson said the company welcomes independent research of this kind and emphasized that the model is continually being updated. The researchers, however, insist that clear safety standards and an independent audit mechanism must be established before AI becomes deeply involved in medical decision-making.

For ordinary users, current AI recommendations may serve as a reference, but blindly trusting AI instead of seeking professional medical help for symptoms such as chest pain or difficulty breathing could have irreversible, life-threatening consequences.