Researchers tested different medical scenarios with the chatbot. In more than half of cases in which doctors would send patients to the ER, the chatbot said it was OK to delay care.
ChatGPT Health — OpenAI’s new health-focused chatbot — frequently underestimated the severity of medical emergencies, according to a study published last week in the journal Nature Medicine.
In the study, researchers tested ChatGPT Health’s ability to triage, or assess the severity of, medical cases based on real-life scenarios.
Previous research has shown that ChatGPT can pass medical exams, and nearly two-thirds of physicians reported using some form of AI in 2024. But other research has shown that chatbots, including ChatGPT, don’t provide reliable medical advice.



Nothing Kairos is saying is misinformation, though. Temperature applies randomness to a generated probability distribution over tokens. That doesn't mean the distribution itself wasn't generated deterministically, and it doesn't mean the randomness applied couldn't itself be deterministic (a seeded PRNG, for instance). Their description of how it works is accurate. They don't need to prove their qualifications or command of the jargon for that to be a good argument, and by focusing on that aspect in a way that doesn't actually contradict their point, you're making a bad one.
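To make the point concrete, here is a minimal sketch of temperature sampling. The logit values and token names are invented for illustration; the shape of the computation is the standard one. The softmax with temperature is fully deterministic, and the sampling step, while random in character, is reproducible when the PRNG is seeded:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Deterministically convert logits into a probability distribution.

    Dividing the logits by the temperature rescales them before the
    softmax: higher temperature flattens the distribution, lower
    temperature sharpens it. Nothing random happens here.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]
tokens = ["go", "wait", "call", "rest"]
probs = softmax_with_temperature(logits, temperature=0.8)

# Sampling applies randomness to that distribution, but the randomness
# itself can be deterministic: a seeded PRNG yields the same token
# choices on every run.
rng = random.Random(42)
samples = [rng.choices(tokens, weights=probs, k=1)[0] for _ in range(5)]
```

Run twice with the same seed, `samples` comes out identical both times, which is exactly the sense in which a temperature-sampled system can still be deterministic end to end.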
What's lost is the question of what determinism even means in this context, or why the property of being deterministic would matter at all. It's unclear how being deterministic or not, by any definition, would have anything to do with how good an LLM is at making correct medical decisions, as the person who started this comment chain was implying.