A new study found that nearly half of the medical advice generated by popular AI chatbots like ChatGPT and Grok is problematic. The chatbots frequently provided incorrect health information, faked scientific references, and refused to admit ignorance.
The study was done in Feb 2025, and they probably wrote the research proposal months before that, then waited for approval, funding, etc. I don’t know how the process works in academia, but I imagine it’s very slow and bureaucratic.
https://bmjopen.bmj.com/content/16/4/e112695
Well, they didn’t even use the latest models available in Feb 2025. They should’ve used DeepSeek R1 and OpenAI o3-mini, which use additional test-time compute to arrive at better answers. They used GPT-3.5, which was about 2½ years old at the time.