** This course is now closed for booking **

This interactive online course is suitable for postgraduate research staff and students wanting to learn about simulating data to make their research more robust and reproducible. Please note that while the first part of the course in Canvas is available to those outside of MSD, only MSD postgraduate research staff and students can attend the live sessions.

Course objectives

  • Gain a deep understanding of the basic statistical methods used for null-hypothesis significance testing
  • Find out how to simulate experimental data with known characteristics, then apply standard statistical tests
  • Gain a comprehensive sense of what a 'p-value' means
  • Benefit from a gentle introduction to the R programming language

You are expected to complete pre-course work on Canvas and then attend two live sessions:

22 February 2021 @ 1pm - 3pm

8 March 2021 @ 1pm - 3pm

Please note: postgraduate research staff and students outside of MSD can access the Canvas materials, but due to limited numbers, only those from MSD can participate in the live sessions.

Course format

Two sessions of online lectures/exercises in Canvas, broken into chunks of 7-18 minutes, with each session followed by an optional interactive online session in small groups to discuss questions submitted in advance by participants. The interactive sessions will be moderated by Professor Dorothy Bishop, Dr Paul Thompson or Dr Adam Parker from the Department of Experimental Psychology.

Because numbers are limited for the interactive online sessions, only participants from MSD will be selected to take part on the basis of the questions submitted.

We will aim to group participants with similar questions. These questions might include: 

  • requests by novices for more explanation/instruction of the lecture material  
  • advice on best approaches for simulating data for specific projects 
  • advice on how to extend the simulation approach to more complex designs or datasets, or different analytic approaches (e.g. Bayesian methods).   

Please note:

To give the course organisers time to prepare, all questions must be submitted to courses@medsci.ox.ac.uk at least one week before the interactive session.

Participants should keep the slot free for the interactive session; we will give at least one day's notice to those selected to attend.

If we are unable to accommodate people in an interactive session, we will inform them in advance and aim to point them to additional resources relevant to their questions.

Please note that different versions of Excel could create compatibility issues.

Course description

In recent years, there has been an increasing focus on the 'replication crisis', with evidence that much published research is not robust. There are many reasons for this: in this course, the focus will be on the importance of having a deep understanding of the basic statistical methods that are commonly used for null-hypothesis significance testing. Human cognition is not well-suited to thinking about probability, and so if researchers are simply trained to apply statistical tests, they may do so in a way that is very likely to generate non-replicable findings. A good way to gain a deep understanding of the nature and limitations of statistical methods is to simulate experimental data with known characteristics, and then apply standard statistical tests.  This course will be in two parts, and students should attend both of these.  

In session 1, the focus will be on gaining a deep understanding of what a p-value means, and in particular how easy it is to obtain 'significant' results from null data if a flexible approach to data analysis is taken. In session 2, the focus moves to simulating datasets where there is a real effect, to consider how the choice of experimental design and analytic approach may influence whether the effect is detected.
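As a rough illustration of the session 1 idea (a minimal sketch in base R, not taken from the course materials): simulate many 'experiments' in which the null hypothesis is true, test each one, and note how often a 'significant' result appears by chance alone.

set.seed(123)
n_sims <- 10000
pvals <- replicate(n_sims, {
  groupA <- rnorm(20, mean = 0, sd = 1)   # both groups drawn from the same population,
  groupB <- rnorm(20, mean = 0, sd = 1)   # so any apparent 'effect' is pure noise
  t.test(groupA, groupB)$p.value
})
mean(pvals < 0.05)   # proportion of significant results under the null, close to 0.05

With a flexible (p-hacked) analysis, such as testing several outcomes and reporting only the smallest p-value, that proportion rises well above 0.05, which is the point the session explores.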

No prior knowledge of coding is required, and the initial exercises will use Excel to illustrate the basic principles involved in data simulation. Subsequent exercises will use the R programming language. No prior knowledge of R is required, and indeed this course can act as a gentle introduction to R. However, to get the most benefit from the course, participants should follow along with the coding exercises, and for this they will need R, RStudio and some related packages and scripts installed: instructions for doing this are provided in the document 'Initial installations'.
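The 'Initial installations' document is the definitive guide to setup; purely as an illustration, and assuming the packages named in the session outline below (simstudy and faux) are among those required, installation from within R would look something like this:

install.packages(c("simstudy", "faux"))   # simulation packages used in later blocks (assumed from the outline below)
library(simstudy)                         # check that the packages load without errors
library(faux)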

Learning outcomes

At the end of session 1, participants should be able to: 

  • Simulate distributions of variables with known means and standard deviations, using R (see the sketch after this list)
  • Understand when and why it is important to apply corrections for multiple statistical tests
  • Understand the value of simulating data to check an analysis plan prior to running an experiment
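As a sketch of the first two outcomes (illustrative values only, not course material): simulate a variable with a known mean and standard deviation, then correct a set of p-values for multiple testing using base R.

set.seed(42)
scores <- rnorm(n = 30, mean = 100, sd = 15)   # 30 observations with a known population mean and SD
mean(scores); sd(scores)                       # sample estimates will differ somewhat from 100 and 15

pvals <- replicate(5, t.test(rnorm(30, 100, 15), mu = 100)$p.value)   # five tests of true null hypotheses
p.adjust(pvals, method = "bonferroni")         # Bonferroni correction for running five tests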

At the end of session 2, participants should be able to: 

  • Understand why doing research with an insufficient sample size is wasteful and can result in false acceptance of a null hypothesis
  • Understand how to use simulation to do a power analysis (see the sketch after this list)
  • Be aware of how power can be influenced by the choice of experimental design and measures
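A minimal sketch of power analysis by simulation (illustrative effect size and sample sizes, not course material): repeatedly generate data containing a true effect and count how often a t-test detects it.

set.seed(2021)
power_sim <- function(n_per_group, effect_size, n_sims = 2000) {
  detected <- replicate(n_sims, {
    control   <- rnorm(n_per_group, mean = 0, sd = 1)
    treatment <- rnorm(n_per_group, mean = effect_size, sd = 1)   # true effect in SD units
    t.test(control, treatment)$p.value < 0.05
  })
  mean(detected)   # proportion of simulated experiments that detect the effect
}
power_sim(n_per_group = 20, effect_size = 0.5)   # under-powered: roughly 0.3
power_sim(n_per_group = 64, effect_size = 0.5)   # roughly 0.8, the conventional target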

Session 1 (total viewing time 114 minutes) 

Simulating random data (null effects) 

Block | Duration (min:sec) | Content | R script
1.1 | 18:43 | Introduction: using Excel to illustrate data simulation | –
1.2 | 9:36 | Doing a t-test on simulated data (Excel) | –
1.3 | 7:12 | Explanation of p-hacking | –
1.4 | 11:00 | The Garden of Forking Paths | –
1.5 | 12:38 | Simulating data in R | Simulation_ex1_intro
1.6 | 17:46 | Walking through the script | Simulation_ex1_intro
1.7 | 7:48 | Repeatedly running simulation in a loop | Simulation_ex1_multioutput
1.8 | 14:49 | Simulating correlated variables | Simulation_ex2a_correlations
1.9 | 13:45 | Overview; different kinds of p-hacking; different simulation packages | Forkingpaths_demo
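The course script for block 1.8 is Simulation_ex2a_correlations; as a rough sketch of the underlying idea (not the course's own code), correlated variables can be simulated with MASS::mvrnorm:

library(MASS)                                  # ships with R; provides mvrnorm()
set.seed(1)
sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), nrow = 2)         # target correlation of 0.6 between two standardised variables
dat <- mvrnorm(n = 100, mu = c(0, 0), Sigma = sigma)
cor(dat[, 1], dat[, 2])                        # sample correlation, close to 0.6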

Session 2 (total viewing time 81 minutes) 

Simulating data with an estimated true effect 

 

Block | Duration (min:sec) | Content | R script
2.1 | 8:30 | Impact of N on estimates of mean | sampling_demo
2.2 | 7:16 | How N and effect size affect p-value distribution | Simulation_ex1_multioutput
2.3 | 13:37 | More on effect sizes, and a demonstration using simstudy package | simstudy_power_demo
2.4 | 8:16 | Confronting the problem of low power | –
2.5 | 19:30 | Comparing power of between vs within-subjects design using faux package | faux_demo_bw
2.6 | 12:04 | Increasing number of observations to enhance power | simulating_items
2.7 | 10:17 | Increasing number of observations to improve test-retest reliability | simulating_reliability
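Block 2.5 uses the faux package (script faux_demo_bw); purely as a base-R sketch of why a within-subjects design can gain power, the same correlated data can be analysed with and without the pairing:

library(MASS)
set.seed(7)
sigma <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)   # conditions correlated 0.7 within participants
results <- replicate(2000, {
  d <- mvrnorm(20, mu = c(0, 0.5), Sigma = sigma)   # true difference of 0.5 SD between conditions
  c(between = t.test(d[, 1], d[, 2])$p.value < 0.05,                  # analysis ignoring the pairing
    within  = t.test(d[, 1], d[, 2], paired = TRUE)$p.value < 0.05)   # analysis using the pairing
})
rowMeans(results)   # the paired (within-subjects) analysis detects the effect much more often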

REFERENCE 

Bishop, D. V. M. (2019). World View: Rein in the four horsemen of irreproducibility. Nature, 568, 435. doi:10.1038/d41586-019-01307-2 

ATTENDANCE CERTIFICATE ON SURVEY COMPLETION

It is now a requirement that you complete the three short questions in the survey you receive after attending the course. Once you have submitted the survey, you will be sent an email with a link to your attendance certificate. This is to ensure we receive the feedback we need to evaluate and improve our courses. Survey results are downloaded and stored anonymously.

PLEASE NOTE

Where no cost is indicated in the shopping trolley, no deposit is required. However, two or more non-attendances or late cancellations without good reason will be logged and may mean you cannot attend any further MSD training that term. Please refer to our Terms and Conditions for further information.