OpenAI’s ChatGPT Health, launched in January 2026 as a consumer-facing health tool, has rapidly gained millions of users seeking preliminary medical guidance. However, a recent, rigorous evaluation reveals significant shortcomings in its ability to accurately triage patients, particularly in high-risk scenarios. The study, published this week, highlights concerns about the potential for delayed care and inconsistent responses to critical health conditions.
Researchers conducted a “stress test” of ChatGPT Health, presenting 60 clinician-authored patient scenarios spanning 21 clinical areas, each under 16 different conditions, for a total of 960 responses analyzed. The findings demonstrate an “inverted U-shaped” pattern of performance, with the most substantial errors occurring at both ends of the urgency spectrum: cases presenting as non-urgent and those requiring immediate emergency attention. This raises questions about the readiness of AI-powered triage systems for widespread public use.
The study, detailed in Nature, found that ChatGPT Health under-triaged 52% of simulated emergency cases. Specifically, patients presenting with conditions like diabetic ketoacidosis and impending respiratory failure were advised to seek evaluation within 24-48 hours, rather than being directed to the emergency department. Conversely, the system correctly identified and prioritized classical emergencies such as stroke and anaphylaxis. This discrepancy underscores the variability in the AI’s assessment of critical conditions.
The influence of external framing on ChatGPT Health’s recommendations was also examined. When a scenario stated that family or friends were downplaying a patient’s symptoms (a test of anchoring bias), the AI’s triage recommendations shifted significantly toward less urgent care, with an odds ratio of 11.7 (95% confidence interval 3.7 to 36.6). This highlights the potential for biased input to skew the AI’s assessment and delay appropriate medical intervention.
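For readers unfamiliar with the statistic, an odds ratio compares the odds of an outcome between two groups. The sketch below shows the standard calculation from a 2×2 contingency table, with a Wald 95% confidence interval on the log-odds scale. The counts are hypothetical, chosen only to illustrate the mechanics; they are not the study’s data.

```python
import math

# Hypothetical 2x2 table (NOT the study's data):
# rows = framing present/absent, columns = less-urgent recommendation yes/no
a, b = 30, 10   # "symptoms downplayed" framing: less urgent / not less urgent
c, d = 8, 32    # no such framing:               less urgent / not less urgent

# Odds ratio: (a/b) / (c/d), i.e. the cross-product ratio
odds_ratio = (a * d) / (b * c)

# Wald 95% CI: standard error of log(OR) is sqrt of summed reciprocal counts
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.1f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

A wide interval like the study’s (3.7 to 36.6) typically reflects small cell counts: the reciprocal-count terms in the standard error grow as any cell shrinks.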
Crisis Intervention Response Variability
Perhaps most concerning, the study revealed unpredictable activation of crisis intervention messaging in scenarios involving suicidal ideation. Counterintuitively, the AI was more likely to trigger these messages when patients described no specific method of self-harm than when they explicitly detailed a plan. This inconsistency raises serious safety concerns about the reliability of the system’s mental health support features.
Researchers also investigated whether demographic factors like patient race, gender, or barriers to care influenced the AI’s triage recommendations. While no statistically significant effects were observed, the confidence intervals did not entirely rule out the possibility of clinically meaningful differences. Further investigation is needed to determine whether biases may exist within the system.
Implications for AI in Healthcare
The findings emphasize the need for cautious implementation of AI-driven triage systems. While tools like ChatGPT Health offer potential benefits in accessibility and efficiency, they are not without risks. The study’s authors stress the importance of prospective validation, that is, real-world testing, before widespread deployment. OpenAI positioned ChatGPT Health as a source of accessible health information, but these findings suggest a need for careful oversight.
The development of HealthBench, an evaluation benchmark for AI in healthcare, demonstrates a growing awareness of the need for rigorous testing and validation of these technologies. However, this latest research suggests that current systems still fall short of the reliability required for safe and effective triage.
As AI continues to play an increasingly prominent role in healthcare, ongoing research and careful monitoring will be crucial to ensure that these tools enhance, rather than compromise, patient safety. The potential for missed emergencies and inconsistent crisis support underscores the need for a measured approach to the integration of AI into clinical practice.
Disclaimer: This article provides informational content only and is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of a qualified healthcare provider with any questions you may have regarding a medical condition.