ChatGPT Health does not detect more than 50% of medical emergencies

ChatGPT Health incorrectly classifies more than half of medical emergencies and often misses suicidal ideation, study finds.

OpenAI launched ChatGPT Health in January 2026, allowing US users to connect their medical records to receive health advice.

About 40 million American adults use it for health advice every day, according to OpenAI figures.

But an independent safety assessment published in Nature Medicine on February 23 found that the AI tool failed to classify 52% of "standard emergencies" appropriately.

This included directing patients with diabetic ketoacidosis and impending respiratory failure to evaluation every 24 to 48 hours rather than to the emergency department.

Lead author of the study Dr Ashwin Ramaswamy said: “ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions.

“But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment is most important.”

Researchers at the Icahn School of Medicine at Mount Sinai created 60 patient scenarios, ranging from mild illnesses to medical emergencies.

Each case was reviewed by three independent doctors using established clinical guidelines, to see what level of care was necessary.

The team then generated nearly 1,000 AI responses under different conditions, including changing the patient’s gender, adding lab results and feedback from family members, before comparing the system’s recommendations with doctors’ assessments.

In one simulation, the tool sent a choking woman to a future appointment she might not have survived in more than eight out of 10 (84%) attempts.

However, the AI tool also overreacted in lower-risk cases, incorrectly telling users to seek immediate medical attention in 64.8% of safe scenarios.

ChatGPT Health was designed to direct users to a suicide crisis line in high-risk situations, but researchers found that these alerts were sometimes triggered in lower-risk scenarios and did not appear when users described specific plans to self-harm.

“The system’s alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases where someone shared how they intended to harm themselves.

“In real life, when someone talks about exactly how they would harm themselves, it is a sign of more immediate and serious danger, not less,” the researchers said.

The study also found that when family or friends downplayed symptoms, most triage recommendations skewed toward less urgent care.

However, the research team said the findings do not suggest that consumers should abandon AI health tools.

Alvira Tyagi, a medical student and second author of the study, said: “These systems are changing rapidly, so part of our training now needs to consider learning to critically understand their outputs, identify where they fail, and use them in ways that protect patients.”

Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London, said the findings are "incredibly dangerous."

“What worries me most is the false sense of security these systems create.

“If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life,” he said.

OpenAI told Digital Health News that the study does not reflect how people typically use ChatGPT Health or how the product is designed to work in real-world health scenarios.
