Researchers from Oxford University’s Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University College London and the Centre for Ethnic Health Research, supported by Health Data Research UK, have for the first time studied ethnicity data in the NHS in full detail. They outline the importance of using representative data in healthcare provision and have compiled this information into a research-ready database.
The new study, published in Scientific Data (a Nature Portfolio journal), is the first part of a three-phase project that aims to reduce bias in AI health prediction models trained on real-world patient data. The project, which addresses ethnic health disparities highlighted during the COVID-19 pandemic, is part of the UK Government’s COVID-19 Data and Connectivity National Core Study led by Health Data Research UK.
The researchers used de-identified data on ethnicity and other characteristics from general practice and hospital health records, accessed securely within NHS England’s Secure Data Environment (SDE) service via the British Heart Foundation Data Science Centre’s CVD-COVID-UK/COVID-IMPACT Consortium. This is the first time that patient ethnicity data has been studied at this depth and breadth for the whole population of England. By combining records, the researchers were able to analyse patients’ self-identified ethnicity as recorded through over 489 potential codes.
The researchers analysed how more than 61 million people in England identified their ethnicity across more than 250 different groups. They also examined the characteristics of people with no record of their ethnicity, and how conflicts in patient ethnicity data can arise. The data, now available for other researchers to use, show that one in ten patients lack an ethnicity record and that around 12% of patients have conflicting ethnicity codes in their records.