ChatGPT Health fails to escalate half of medical emergencies, research finds

25 February 2026

Jelena Stanojkovic/Getty Images

By Daniel Pye

ChatGPT Health fails to direct users to seek medical attention in more than half of emergency cases, research suggests.

The first independent evaluation of OpenAI's tool – which provides health guidance directly to the public – found it showed inconsistent suicide crisis alerts and missed ‘high risk emergencies’ including diabetic ketoacidosis.

ChatGPT Health has been rolled out last month to a small group of early users globally – but not yet in the EU, Switzerland, and UK. In the US there is a medical records integration feature to allow for tailored responses.

Its failure to escalate emergencies adds to prior evidence that large language models can be “brittle under clinically demanding decision tasks” and “may need human oversight”, said researchers from the Icahn School of Medicine at Mount Sinai.

Concerning responses

The study, published in Nature Medicine, provided the platform with imagined scenarios written by clinicians.

The team then ranked its response based on whether it over-triaged and put the local health system under more pressure, or under-triaged and risked patient safety by missing emergencies.







Log in or join for free to read more

You might also like