Developing a Comprehensive Framework for Clinical Validation of Generative AI in Primary Care: Beyond Performance Metrics

Commercial partner: EMIS

This project aims to develop a robust framework for the clinical evaluation of generative AI large language models (LLMs) in primary care settings, with a specific focus on the critical phase of clinical validation. While existing literature primarily concentrates on algorithmic performance, this research seeks to bridge the gap between technical capabilities and real-world clinical applicability.

The rapid development of generative AI models, particularly large language models (LLMs), has generated significant excitement and expectations within the healthcare sector. These powerful AI systems have demonstrated remarkable abilities to assist clinicians in various tasks, such as summarizing patient records, generating treatment plans, and answering medical queries. However, the clinical translation of these technologies remains a significant challenge, as healthcare professionals and regulators require rigorous evidence of safety, efficacy, and alignment with clinical workflows before widespread adoption.

To address this gap, the project will develop the framework in three stages: a systematic literature review, qualitative research with stakeholders, and a discrete choice experiment.

Systematic Literature Review:
The first stage of the project will involve a comprehensive review of current evaluation methods for AI in healthcare. This review will identify the strengths and limitations of existing approaches, with a particular focus on gaps in clinical validation methodologies. By synthesizing the current state of the field, the research team will gain a deeper understanding of the challenges and opportunities in assessing the real-world performance of generative AI LLMs in primary care.

Qualitative Research:
In the second stage, the project will leverage qualitative research methods to engage with key stakeholders, including general practitioners (GPs), policymakers, industry representatives, patient representatives, clinical safety specialists, information governance specialists and regulators. Through in-depth interviews, the research team will seek to understand the evidence requirements and concerns of these stakeholders regarding the safe and effective implementation of LLMs in primary care settings. This qualitative data will be instrumental in shaping the design of the subsequent quantitative phase of the research.

Discrete Choice Experiment (DCE):
Building on the insights gained from the literature review and qualitative research, the project will then employ a Discrete Choice Experiment (DCE) to quantify the preferences of healthcare professionals for different aspects of LLM implementation in primary care. We will sample multiple types of healthcare professionals to understand how their preferences differ. The DCE will present participants with a series of hypothetical scenarios in which LLMs are described by attributes covering performance, integration, and clinical workflow impact. By analyzing the choices participants make, we will determine the relative importance of each attribute.
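
As an illustration of what "relative importance" can mean in a DCE, one common summary takes the estimated part-worth utilities for each attribute level and expresses each attribute's utility range as a share of the total range. The sketch below assumes that approach. The attribute levels and utility values are hypothetical placeholders, not project data or results, and the project's actual analysis method may differ.

```python
from typing import Dict


def relative_importance(part_worths: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return each attribute's utility range as a share of the summed ranges."""
    # Range of an attribute = best level utility minus worst level utility.
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("All attributes have zero utility range.")
    return {attribute: r / total for attribute, r in ranges.items()}


# Hypothetical part-worth utilities (illustrative values only, not estimates).
example_part_worths = {
    "performance": {"low": 0.0, "high": 1.2},
    "integration": {"standalone": 0.0, "embedded": 0.6},
    "workflow_impact": {"slower": -0.2, "neutral": 0.0, "faster": 0.4},
}

importance = relative_importance(example_part_worths)
for attribute, share in importance.items():
    print(f"{attribute}: {share:.0%}")
```

With these placeholder values, performance accounts for about half of the total utility range, and the other two attributes for about a quarter each. In a real study the part-worths would come from a choice model fitted to participants' responses.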

The DCE will provide valuable data to inform the development of the clinical evaluation framework, as it will reveal healthcare professionals' priorities and the trade-offs they are willing to make when considering the adoption of generative AI LLMs in their clinical practice. This information will be crucial in ensuring that the framework aligns with the needs and preferences of end-users, thereby increasing the likelihood of successful implementation and uptake.

The overarching goal of this project is to develop a comprehensive, evidence-based framework that can guide the clinical evaluation of generative AI LLMs in primary care settings. By addressing the gap between technical capabilities and real-world clinical applicability, this research aims to facilitate the responsible and effective integration of these transformative technologies into the healthcare system.

Apply using course: DPhil Primary Health Care
