Millions of people are now turning to AI chatbots for answers about their health — but a major new study warns this trust may be misplaced. The largest user study to date examining how large language models (LLMs) support real people making medical decisions finds that these systems can provide inaccurate, inconsistent, and potentially dangerous advice when users seek help with their own symptoms.


A new study from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford, carried out in partnership with MLCommons and other institutions, reveals a major gap between the promise of large language models (LLMs) and their usefulness for people seeking medical advice. While these models now excel at standardised tests of medical knowledge, they pose risks to real users seeking help with their own medical symptoms.

Key findings

  • No better than traditional methods

Participants used LLMs to identify possible health conditions and decide on an appropriate course of action, such as seeing a GP or going to hospital, based on information provided in a series of specific medical scenarios developed by doctors. Those using LLMs did not make better decisions than participants who relied on traditional methods like online searches or their own judgment.

  • Communication breakdown

The study revealed a two-way communication breakdown. Participants often didn’t know what information the LLMs needed to offer accurate advice, and the responses they received frequently combined good and poor recommendations, making it difficult to identify the best course of action.

 

Read the full story on the Nuffield Department of Primary Care Health Sciences website.