The largest user study of large language models (LLMs) for assisting the general public in medical decisions has found that they present risks to people seeking medical advice due to their tendency to provide inaccurate and inconsistent information.
A new study from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford, carried out in partnership with MLCommons and other institutions, reveals a major gap between the promise of LLMs and their usefulness for people seeking medical advice. While these models now excel at standardised tests of medical knowledge, they pose risks to real users seeking help with their own medical symptoms.
Key findings
- No better than traditional methods
Participants used LLMs to investigate health conditions and decide on an appropriate course of action, such as seeing a GP or going to hospital, based on information provided in a series of specific medical scenarios developed by doctors. Those using LLMs did not make better decisions than participants who relied on traditional methods such as online searches or their own judgment.
- Communication breakdown
The study revealed a two-way communication breakdown. Participants often did not know what information the LLMs needed in order to offer accurate advice, and the responses they received frequently mixed good and poor recommendations, making it difficult to identify the best course of action.
Read the full story on the Nuffield Department of Primary Care Health Sciences website.
