Observational health data science: an introduction to real-world data, epidemiology, and machine learning, in person
Thursday, 14 May 2026, 9am to 2pm
This course is suitable for MSD students and research staff interested in developing a basic understanding of “real-world” or observational health data science. The sessions are interactive. No prior knowledge is required.
course aim
This course will provide an opportunity to learn about the foundations of a series of key topics in observational health data science. It includes an introduction to “real-world” data sources, epidemiology principles, and applied supervised and unsupervised machine learning for uncovering patterns in “big data” through a variety of applications such as clinical risk prediction and identification of disease clusters.
The course is designed around needs identified by students and covers topics that are not taught together in any similar course across the Division.
course content
SESSION 1 – Introduction to real-world data sources in the UK: CPRD GOLD & Aurum, ONS, HES, UK Biobank
Students will learn about the most influential data sources available in the UK: why they are collected, how they are structured and linked, and how to gain permission to use them. The challenges of real-world data will be made apparent, together with solutions for achieving data harmonisation and standardised analytics.
SESSION 2 – Introduction to Epidemiology Basics
Students will learn the principles and scope of epidemiology and examine the benefits and limitations of epidemiological studies. Concepts such as ‘PICO’, ‘confounding’ and ‘bias’ will be introduced and the differences between various study designs such as cohort and case-control examined.
SESSION 3 – Introduction to Machine Learning for Healthcare
This session will offer a brief overview of machine learning methods for healthcare applications, including supervised and unsupervised learning, followed by real-world examples of data analysis using routinely collected data.
SESSION 4 – Introduction to Unsupervised Learning Approaches
This session will cover unsupervised learning methods and their application to cluster analysis and subgroup detection, using routinely collected data and real clinical case studies.
The aim is to help participants become familiar with some of the key observational health data sources and data science approaches, along with their strengths and limitations when applied in a variety of clinical scenarios.
course objectives
This course is suitable for those interested in developing a basic understanding of “real-world” or observational health data science as a foundation for more advanced study aimed at improving healthcare practice and policy, with a focus on health access, interventions, and outcomes.
participant numbers
Maximum 20
ATTENDANCE CERTIFICATE ON SURVEY COMPLETION
It is now a requirement that you complete the three short questions in the survey you receive after attending the course. Once you have submitted the survey, you will be sent an email with a link to your attendance certificate. This is to ensure we receive the feedback we need to evaluate and improve our courses. Survey results are downloaded and stored anonymously.
