Online Assessment: What is the best question type? How many options should students choose from?

This is very much a moot point. The extended matching questions (EMQs) originally espoused by Case and Swanson are gradually being replaced within the MSD, and more widely in the medical sciences, by standard multiple choice questions (MCQs). This is in response to various papers suggesting that, while increasing the number of answers from which students choose can increase the difficulty of questions, there is very little loss in discrimination with significantly fewer options (e.g. Swanson et al., 2006).

The effect of decreasing the number of choices per item while lengthening the test proportionately is to increase the efficiency of the test for high-level examinees and to decrease its efficiency for low-level examinees
- Lord, F. M. (1977). Optimal number of choices per item—A comparison of four approaches. Journal of Educational Measurement, 14, 33– 38.

In fact, in a meta-analysis of previous research, Rodriguez suggests that in many cases three options are the most efficient choice, as this minimises the time taken to write questions and for students to answer them, allowing better coverage of the subject matter. The argument is that, in many questions, only two of the incorrect answer options really act as distractors; any additional options are eliminated immediately, effectively leaving a three-option MCQ. The same argument suggests that, although a five-option MCQ might appear to give only a 20% likelihood of getting the right answer by chance, if there are only three plausible answers the likelihood is actually 33%, provided students have sufficient ability and aren’t under time pressure, and so are not guessing completely randomly.
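The arithmetic behind the 20% and 33% figures can be sketched as below. This is only an illustration: the function name and the `plausible_options` parameter are hypothetical, and it assumes a student guesses uniformly among whichever options they cannot rule out.

```python
def chance_of_correct_guess(total_options, plausible_options=None):
    """Probability of guessing the right answer by chance.

    Assumes the correct answer is among the plausible options and that
    the student picks uniformly at random among them. If no count of
    plausible options is given, all options are treated as plausible.
    """
    if plausible_options is None:
        plausible_options = total_options
    if not 1 <= plausible_options <= total_options:
        raise ValueError("plausible_options must be between 1 and total_options")
    return 1 / plausible_options


# Five options, all plausible: a pure random guess.
nominal = chance_of_correct_guess(5)

# Five options, but only three are plausible to the student.
effective = chance_of_correct_guess(5, plausible_options=3)
```

Here `nominal` is 0.2 and `effective` is one third, matching the figures in the text.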

This may be an over-simplification, with its assumptions about the ability of examiners to write plausible distractors and the abilities of students to spot and discount the implausible, but it does suggest that the number of answer options should be chosen to match the question/subject matter/ability of the author/ability of the students, rather than arbitrarily deciding on a fixed value.

In practice in the MSD, five option MCQs now form the great majority of summative assessment questions while EMQs, multiple response questions (tick boxes), and MCQs with a different number of options make up the remainder.

Some useful guides to question-writing include:

If you only have 5 minutes: 10 Tips for Writing Great Multiple Choice Questions

More in-depth:

Constructing Written Test Questions For the Basic and Clinical Sciences by Case and Swanson for the US National Board of Medical Examiners

Guidelines for writing Multiple Choice Questions (MCQs), Single Best Answer (SBA) & Extended Matching Items (EMI) formats from The Joint Committee on inter-collegiate examinations

Writing Questions for Undergraduate Exams – A resource pack for Medical Schools Council Assessment Alliance.

Campbell, D. E. (2011), How to write good multiple-choice questions. Journal of Paediatrics and Child Health, 47: 322–325. doi:10.1111/j.1440-1754.2011.02115.x

At Oxford, there is also a regular seminar run by the Medical School on question writing.

Online Assessment: Why can’t I access outside quizzes from my home computer? Why am I getting a ‘connection timed out’ message?

The online assessment system is only accessible from computers that are on the Oxford University network or are connected to it virtually using the Virtual Private Network (VPN) service from IT Services. Using the VPN requires you to download a piece of client software onto your machine and log in to the network using your Oxford username and remote access password.

Online Assessment: Standard setting - helping to create a defensible assessment

Two of the methods used to determine pass marks for online assessment within MSD are:

  1. The ‘Cohen method’ sets the pass mark relative to the performance of the best-performing students (typically the 95th centile) on the basis that these students will not vary significantly from year to year. This method requires a fairly large cohort of students for reliability. Cohen-Schotanus, J., & van der Vleuten, C. P. (2010). A standard setting method with the best performing students as point of reference: Practical and affordable. Medical Teacher, 32(2), 154–160.
  2. Variations on the Angoff method (see e.g. QuestionMark handout), in which subject matter experts estimate how likely a minimally competent or borderline candidate is to answer correctly. This can be applied question by question and the likelihoods then summed to give the pass mark. Alternatively, variations on the Ebel method can be used to classify questions according to difficulty (e.g. easy, medium, hard) and relevance (e.g. essential, important, acceptable and questionable). The Angoff likelihoods can then be applied to each class of question. This method requires a group of subject matter experts to go through an assessment question by question. We have a spreadsheet which can be used to collate Ebel-Angoff ratings.
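As a rough illustration of the second approach, the sketch below sums Angoff likelihoods question by question, and then applies a single likelihood per Ebel class. It is not the spreadsheet mentioned above: the function names and all of the ratings are hypothetical, and it assumes one mark per question and a simple mean across judges.

```python
def angoff_pass_mark(ratings_per_question):
    """Sum, over questions, of the mean judged likelihood that a
    borderline candidate answers correctly (one mark per question)."""
    return sum(sum(r) / len(r) for r in ratings_per_question)


def ebel_angoff_pass_mark(question_classes, class_likelihoods):
    """Apply one agreed likelihood to every question in the same
    (difficulty, relevance) class and sum the results."""
    return sum(class_likelihoods[c] for c in question_classes)


# Hypothetical data: three questions, each rated by two judges.
ratings = [[0.6, 0.8], [0.5, 0.5], [0.9, 0.7]]
pass_mark = angoff_pass_mark(ratings)

# Hypothetical Ebel classes for three questions, with made-up likelihoods.
classes = [("easy", "essential"), ("hard", "important"), ("easy", "essential")]
likelihoods = {("easy", "essential"): 0.9, ("hard", "important"): 0.5}
ebel_pass_mark = ebel_angoff_pass_mark(classes, likelihoods)
```

With these made-up numbers the question-by-question pass mark comes to about 2 marks out of 3 and the Ebel-Angoff pass mark to about 2.3 out of 3.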

Others, not used in the MSD as far as we are aware but worth considering, are the Hofstee method and the Borderline method for OSCEs.

Online Assessment: What results and analysis can I expect from my online assessment?

We can normally produce results immediately after the last participant has submitted their answers, although if any remarking is required (to account for a problem question), this will take a little longer.

Analysis for MCQs with one answer per question can also be produced almost instantly using the tools in Perception. Where a question requires more than one answer from a student, we have to run the results through our own analysis software, which can take significantly longer, particularly when the questions haven’t been created using our question-making tool.

See What do difficulty, correlation, discrimination, etc. in the question analysis mean?

Online Assessment: What do difficulty, correlation, discrimination, etc. in the question analysis mean?

Difficulty (or p value) should probably be called easiness, as a value of 1 indicates that all students got the question correct and a value of 0 indicates that no participant gave the correct answer. It is calculated by dividing the mean score on a question by the maximum possible score. A good rule of thumb is that difficulty should be around the pass mark of your assessment.
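The calculation described above can be sketched in a few lines. The function name and the sample scores are hypothetical; this is an illustration of the formula, not the code used by the analysis tools.

```python
def item_difficulty(scores, max_score=1):
    """Mean score on a question divided by the maximum possible score.

    Returns a value between 0 (nobody scored anything) and 1 (everyone
    got full marks), so higher values mean an easier question.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / (len(scores) * max_score)


# Hypothetical one-mark question answered by four students,
# three of whom got it right.
difficulty = item_difficulty([1, 1, 0, 1])
```

For this made-up example `difficulty` is 0.75, i.e. three quarters of the students answered correctly.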

Item Discrimination, where available, is calculated by taking the top and bottom 27% of students, based on overall score in the assessment, and subtracting the fraction of the bottom group who gave the answer from the fraction of the top group who gave the answer, giving a range of -1 to 1. So:

  • A positive item discrimination means a higher proportion of people in the top group chose the answer than in the bottom group. A high positive value for the correct answer generally means the question is a good discriminator, which is what we want (but is difficult to achieve!). A positive discrimination for an incorrect answer may indicate a problem, but could just mean that it is a good distractor.
  • An item discrimination of 0 means the same number of people from each group gave the answer, so the answer doesn’t discriminate at all. Questions where everyone got the correct answer will always have a discrimination of 0.
  • A negative item discrimination means a higher proportion of people in the bottom group chose the answer. This would be expected for an incorrect answer. A negative discrimination on a correct answer may indicate something is wrong, as the ‘good’ students are not choosing the correct answer.
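The top-and-bottom-group calculation can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the rounding of the group size and the handling of tied overall scores are assumptions rather than a description of how Perception does it.

```python
def item_discrimination(total_scores, chose_answer, fraction=0.27):
    """Fraction of the top group who chose an answer minus the fraction
    of the bottom group who chose it.

    total_scores: overall assessment score for each student.
    chose_answer: True/False for each student (same order), saying
                  whether they chose the answer being analysed.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))  # assumption: round to nearest
    # Student indices ordered from lowest to highest overall score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom = order[:group_size]
    top = order[-group_size:]
    top_fraction = sum(chose_answer[i] for i in top) / group_size
    bottom_fraction = sum(chose_answer[i] for i in bottom) / group_size
    return top_fraction - bottom_fraction


# Hypothetical cohort of ten students with overall scores 1 to 10.
totals = list(range(1, 11))
# Only the five highest scorers chose the correct answer.
chose_correct = [score > 5 for score in totals]
strong = item_discrimination(totals, chose_correct)

# Everyone chose the correct answer, so it cannot discriminate.
flat = item_discrimination(totals, [True] * 10)
```

In this made-up cohort `strong` is 1.0 (all of the top group and none of the bottom group chose the answer) and `flat` is 0.0.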

Item-total correlation discrimination uses a Pearson product-moment coefficient to give the correlation between the question score and the assessment score. Higher positive correlation values indicate that participants who obtain high question scores also obtain high assessment scores and that participants who obtain low question scores also obtain low assessment scores. This is what we want. Low or negative values are potentially worrying, as they indicate low-scoring participants doing well on this question and/or high-scoring students doing badly. So low values here could indicate unhelpful questions that are worth looking at in more detail.
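A Pearson coefficient between question scores and assessment scores can be computed as in the sketch below. The function name and data are hypothetical, and details such as whether the question's own score is included in the total are assumptions, not a statement of what the analysis tools do.

```python
import math


def item_total_correlation(question_scores, total_scores):
    """Pearson product-moment correlation between the scores on one
    question and the overall assessment scores.

    Returns None when either set of scores has no variation (for
    example, everyone got the question right), because the coefficient
    is undefined in that case.
    """
    n = len(question_scores)
    mean_q = sum(question_scores) / n
    mean_t = sum(total_scores) / n
    covariance = sum((q - mean_q) * (t - mean_t)
                     for q, t in zip(question_scores, total_scores))
    var_q = sum((q - mean_q) ** 2 for q in question_scores)
    var_t = sum((t - mean_t) ** 2 for t in total_scores)
    if var_q == 0 or var_t == 0:
        return None
    return covariance / math.sqrt(var_q * var_t)


# Hypothetical data: the two highest-scoring students got the question right.
good_question = item_total_correlation([0, 0, 1, 1], [1, 2, 3, 4])

# Hypothetical data: only the two lowest-scoring students got it right.
worrying_question = item_total_correlation([1, 1, 0, 0], [1, 2, 3, 4])
```

With these made-up scores `good_question` is strongly positive and `worrying_question` is equally strongly negative, which is the pattern that would prompt a closer look at the question.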

WebLearn: How can I see what students are going to see?

  1. Use the Preview Site button in the top right-hand corner to switch to see how an ‘access’ user would see your site. Click Exit preview to return to normal. However, this will not let you check things such as whether users in a particular group can see something.
  2. For a more reliable test of your site, particularly if it concerns something that is sensitive in some way, you are far better off adding yourself as an external user to your site and any necessary group(s) and then logging in as that user.
    1. In Site Info, select Add Participants
    2. In the Email Address of Non-Oxford Participant box, paste in a personal email address that you can access, then click Continue.
    3. From the Roles list, choose access then Continue.
    4. Choose whether an email containing a link to the site is sent to that account. If an account hasn’t already been created for this address, an automatic email will be sent linking to a page where the user can choose a password. Click Continue.
    5. Check the details on the confirmation page then click Finish.
    6. Now check your email account for an email from weblearn-noreply@it.ox.ac.uk – check your ‘junk’ folder if it doesn’t appear almost immediately.
    7. In the email, click on the link which begins:
      Accept this invitation https://weblearn.ox.ac.uk/accountvalidator...
      and complete the form, choosing a password you can remember.
    8. If you have another browser on your machine (other than the one you have been using so far), open this up now and visit the page you are working on. If not, click the Logout button (or close your browser) and reopen the page.
    9. Now click the Other Users Login button and log in using your personal email address and the password you have just set. You are now looking at the site as an access user.

General: Is educational technology really worth it? Is there any evidence of its usefulness?

The situation in MSD at Oxford

In Oxford’s teaching environment, where face to face and small group teaching are the norm, the main roles for educational technology are probably:

General educational technology

However, there is a considerable body of research out there on the efficacy of educational technology.

In a meta-analysis of 25 meta-analyses with minimal overlap in primary literature, encompassing 1,055 primary studies, Tamim et al. (2011) found that educational technology led to attainment that was, on average, 12% higher than in controls but that:

“effect sizes pertaining to computer technology used as “support for cognition” were significantly greater than those related to computer use for “presentation of content.” Taken together with the current study, there is the suggestion that one of technology’s main strengths may lie in supporting students’ efforts to achieve rather than acting as a tool for delivering content.”

Higgins et al. (2012) carried out a similar meta-analysis of 48 meta-analyses. Although it was based on research in schools, its principles are equally applicable in HE. They were more cautious about the general impact of educational technology:

“the correlational and experimental evidence does not offer a convincing case for the general impact of digital technology on learning outcomes”

But they did find evidence for the benefits of certain approaches to the use of educational technology:

  • Collaborative use of technology (in pairs or small groups) is usually more effective than individual use, though some pupils, especially younger children, may need guidance in how to collaborate effectively and responsibly.
  • Technology can be as powerful as a short but focused intervention to improve learning, particularly when there is regular and frequent use (about three times a week) over the course of about a term (5 - 10 weeks). Sustained use over a longer period is usually less effective at providing this kind of boost to attainment.
  • Remedial and tutorial use of technology can be particularly practical for lower attaining pupils, those with special educational needs or those from disadvantaged backgrounds in providing intensive support to enable them to catch up with their peers.
  • In researched interventions, technology is best used as a supplement to normal teaching rather than as a replacement for it. This suggests some caution in the way in which technology is adopted or embedded in schools.

Higgins, S., Xiao, Z., and Katsipataki, M., 2012. The Impact of Digital Technology on Learning: A Summary for the Education Endowment Foundation. University of Durham/Education Endowment Foundation. 52pp.

Tamim, R. M., Bernard, R. M., Borokhovski, E., Abrami, P. C., & Schmid, R. F., 2011. What Forty Years of Research Says About the Impact of Technology on Learning: A Second-Order Meta-Analysis and Validation Study. Review of Educational Research, 81(1), 4–28.

To support these more general studies, several studies have looked at the impact of individual technologies.

Audience response

Kay, R.H and LeSage, A., 2009. Examining the benefits and challenges of using audience response systems: A review of the literature. Computers and Education, 53(3): 819-827

Preszler, R.W., Dawe, A., Shuster, C.B. and Shuster, M., 2007. Assessment of the effects of student response systems on student learning and attitudes over a broad range of biology courses. CBE-Life Sciences Education, 6(1): 29-41

Peer assessment

Although it covered secondary school studies, and included self-assessment alongside peer assessment, the systematic review by Sebba et al. (2008) found that most studies reported some positive outcomes, including: pupil attainment (9 out of 15 studies); pupil self-esteem (7 out of 9 studies); and increased engagement with learning (17 out of 20 studies).

A more pertinent, but much smaller, study of 71 medical students by Eldredge et al. (2013) found that students who had been trained in peer assessment had higher mean scores (45.7 out of 49) on a formative test than did students in the control group (43.5 out of 49), with a p-value of 0.06, just outside the conventional 0.05 threshold for statistical significance.

Sebba, J., Deakin Crick, R., Yu, G., Lawson, H. and Harlen, W., 2008. Systematic review of research evidence of the impact on students in secondary schools of self and peer assessment - Report. Social Science Research Unit, Institute of Education, University of London. 29pp.

Eldredge, J.D., Bear, D.G., Wayne, S.J. and Perea, P.P., 2013. Student peer assessment in evidence-based medicine (EBM) searching skills training: an experiment. J Med Libr Assoc., 101(4): 244–251