This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Serious Games, is properly cited. The complete bibliographic information, a link to the original publication on http://games.jmir.org, as well as this copyright and license information must be included.
The gold standard for evaluating medical students’ knowledge is the multiple choice question (MCQ) test, an objective and effective means of testing recall of book-based knowledge. However, concerns have been raised regarding its effectiveness in evaluating global medical skills. Furthermore, MCQs of unequal difficulty can generate frustration and may also lead to a sizable proportion of close results with low score variability. Serious games (SGs) have recently been introduced to better evaluate students’ medical skills.
The study aimed to compare MCQs with SG for medical student evaluation.
We designed a cross-over randomized study including volunteer medical students from two medical schools in Paris (France) from January to September 2016. The students were randomized into two groups and evaluated either by the SG first and then the MCQs, or vice-versa, for a cardiology clinical case. The primary endpoint was score variability evaluated by variance comparison. Secondary endpoints were differences in and correlation between the MCQ and SG results, and student satisfaction.
A total of 68 medical students were included. The score variability was significantly higher in the SG group (σ²=265.4) than the MCQs group (σ²=140.2;
Our study suggests that SGs are more effective in terms of score variability than MCQs. In addition, they are associated with a higher student satisfaction rate. SGs could represent a new evaluation modality for medical students.
Student evaluation is one of the most important components of a medical educational program and is used for training and for validating degrees and career options. If handled well, it can improve student motivation for learning and provide educators with useful feedback. Medical education cannot be limited to book-based knowledge, which is defined as the ability to provide an answer from the medical literature [
From January to September 2016, we included all volunteer medical students with previous cardiology validation in two medical schools (University Paris Descartes, Paris, France and University Denis Diderot, Paris, France). Students were randomized in a cross-over design between two groups to avoid order bias. Group 1 started with evaluation by SG and finished with evaluation by MCQs, and group 2 took the tests in the reverse order. The tests were performed in the examination centers of both medical schools. Each test lasted 30 minutes, and the two tests were taken consecutively. The study was approved by the educational committees of both institutions. All students gave their informed consent before inclusion.
We used a clinical case from an SG (Medusims, Paris, France and iLUMENS, Medical Simulation Department, Université Sorbonne Paris Cité, Paris, France). The SG focuses on the management of atrial fibrillation. It depicts a cardiologist and a patient in a free three-dimensional (3D) environment set in a medical office, and is available on computers and tablets (
Serious game illustration (in French).
We built an online MCQ test of 15 questions based on the SG clinical case with the same clinical and electrocardiographic presentation. Each MCQ presented five possible answers. The student scored full points if all the selected answers were correct, 50% if one answer was incorrect, and 20% if two answers were incorrect. No points were awarded if three or more answers were incorrect. The marking scheme was aligned with that of the SG, giving a final score out of 100 points. A translated version of the MCQ test is available in the
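The partial-credit rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring software: the function names are hypothetical, and two assumptions are made that the text does not state. First, an option counts as an incorrect answer whenever the student's choice for it (ticked or left blank) disagrees with the answer key. Second, all questions carry equal weight within the 100-point total.

```python
def mcq_fraction(selected, correct, options="ABCDE"):
    """Fraction of a question's points under the 100% / 50% / 20% / 0% rule."""
    selected, correct = set(selected), set(correct)
    # Assumption: an option is "incorrect" when the student's choice for it
    # (ticked or not ticked) differs from the answer key.
    errors = sum((opt in selected) != (opt in correct) for opt in options)
    if errors == 0:
        return 1.0
    if errors == 1:
        return 0.5
    if errors == 2:
        return 0.2
    return 0.0  # three or more incorrect answers


def mcq_test_score(responses, answer_key, total_points=100):
    """Total score, assuming every question is worth the same number of points."""
    points_per_question = total_points / len(answer_key)
    return sum(
        points_per_question * mcq_fraction(given, key)
        for given, key in zip(responses, answer_key)
    )
```

For example, with a hypothetical two-question key `["AB", "AB"]`, a student answering `["AB", "A"]` makes no error on the first question and one error on the second, and so earns three quarters of the total points.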
Questionnaires to record student characteristics and satisfaction were designed by a psychologist from the Medical Simulation Department of University Paris Descartes (iLUMENS, Paris, France). The student satisfaction questionnaire was filled in online immediately after each evaluation. The student characteristics questionnaire was filled in online at the end of the study protocol to assess the medical degree and whether the student played video games regularly at the time of the study (gamers) or not (non-gamers;
Flowchart.
The primary objective of the study was to compare the students’ scores on the MCQ and SG tests. The related primary endpoint was therefore the score variability, calculated as a variance for each test. The secondary endpoints were student satisfaction, measured with semiquantitative questions scored from 0 (no, not at all) to 5 (yes, entirely), and the correlation between the test results. A subgroup analysis compared SG results between gamers and non-gamers.
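As a minimal sketch of these two endpoints, the Python below computes the variance of each set of scores, the ratio between two variances, and a Pearson correlation coefficient. It is illustrative only: the helper names and the example scores are hypothetical, the use of the sample variance (n − 1 denominator) and of Pearson's coefficient are assumptions, and the code does not reproduce whichever statistical test the study used to compare variances.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(var_a, var_b):
    """Ratio of the larger variance to the smaller one (always >= 1)."""
    return max(var_a, var_b) / min(var_a, var_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores out of 100 for five students (not study data).
sg_scores = [40, 55, 62, 70, 85]
mcq_scores = [58, 62, 65, 68, 72]

# Assumption: sample variance (n - 1 denominator), as statistics.variance computes.
sg_var = variance(sg_scores)
mcq_var = variance(mcq_scores)
ratio = variance_ratio(sg_var, mcq_var)
r = pearson_r(sg_scores, mcq_scores)
```

Applied to the variances reported for this study, `variance_ratio(265.4, 140.2)` gives a ratio of roughly 1.9, meaning that SG scores were spread almost twice as widely (in variance terms) as MCQ scores.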
Summary descriptive statistics are reported as mean and standard deviation, median (interquartile range), or counts (%), as appropriate. We used the
A total of 68 medical students were included (34 in each group), of whom 29 were male (43%), and the mean age was 23 (SD 1) years. Students were in their 5th [
Student characteristics.
Student description | Overall | Group 1 | Group 2 | Comparison between groups 1 and 2 (P value) |
Sex (male), n (%) | 29 (43) | 17 (50) | 12 (35) | .22 |
Age in years, mean (SD) | 23 (1) | 23 (1) | 23 (1) | .26 |
Year of medical school, mean (SD) | 4.7 (1.0) | 4.7 (0.8) | 5.1 (0.9) | .08 |
Cardiology internship within the past 12 months, n (%) | 34 (50) | 16 (47) | 18 (53) | .74 |
Owns a cell phone with Internet connection and social network account, n (%) | 67 (99) | 33 (97) | 34 (100) | >.99 |
Owns a tablet, n (%) | 31 (46) | 13 (39) | 18 (54) | .20 |
Owns a computer with Internet connection, n (%) | 68 (100) | 34 (100) | 34 (100) | >.99 |
Owns a video game console, n (%) | 21 (31) | 14 (42) | 7 (20) | .07 |
Past video game experience, n (%) | 60 (88) | 28 (83) | 32 (94) | .26 |
Age in years at first video game experience, mean (SD) | 9 (3) | 9 (3) | 9 (3) | .51 |
Currently playing video games, n (%) | 22 (32) | 14 (40) | 8 (26) | .31 |
Hours of video game play per week, mean (SD) | 1.6 (3.0) | 1.9 (3.7) | 1.3 (2.1) | .65 |
The score variability, expressed as the variance of the students’ results, was significantly higher in the SG group (σ²=265.4) compared with the MCQ group (σ²=140.2;
Histogram of results.
Individual test results in the left panel (A) and correlation coefficient in the right panel (B).
Satisfaction questionnaire: results are expressed as mean (SD) of a numeric ordinal variable from 1 (no, not at all) to 5 (yes, entirely).
Questions | Serious game | Multiple choice questions | P value |
Did you encounter difficulties answering the questions? | 2.18 (1.14) | 2.21 (1.14) | .89 |
Were you able to concentrate while answering the questions? | 3.93 (0.99) | 3.71 (1.06) | .15 |
Do you think that this test is close to clinical reality? | 4.21 (0.75) | 2.68 (0.88) | <.001 |
Did you find this test stressful? | 2.51 (1.05) | 2.30 (1.17) | .24 |
Did you understand the goal of the test? | 4.24 (0.75) | 3.97 (0.94) | .10 |
Do you consider that this kind of test represents a proper evaluation? | 3.91 (0.87) | 3.04 (1.02) | <.001 |
Are you satisfied with your test performance? | 3.05 (1.09) | 3.22 (0.98) | .41 |
Do you think that your knowledge progressed after this test? | 3.56 (1.09) | 2.42 (0.99) | <.001 |
Are you satisfied with this type of evaluation? | 3.88 (1.42) | 2.98 (1.53) | <.001 |
Assessment of serious games as a tool to learn medicine. Results are expressed as mean (SD) of a numeric ordinal variable from 1 (no, not at all) to 5 (yes, entirely).
Assessment of serious games as a tool to learn medicine | Serious game |
Educational quality | 4.86 (0.35) |
Feeling of connection or attachment to the serious game | 3.60 (1.19) |
Possibility of playing with other students | 3.26 (1.18) |
Possibility of comparing results with other students | 3.44 (1.33) |
Fun | 3.37 (1.16) |
Original, innovative or new | 3.90 (0.98) |
Possibility to adapt level of difficulty | 4.36 (0.68) |
Availability on smartphone | 4.00 (1.07) |
The satisfaction questionnaires showed significantly higher overall self-reported satisfaction for the SG compared with the MCQ test. Students reported that the SG was closer to clinical practice, that it represented a proper evaluation, and that they felt they had learned more with the SG than with MCQs, thus representing a better evaluation modality (
The questionnaire was also designed to evaluate whether students thought that SGs could be an interesting tool to learn and evaluate medical skills. Most of the students thought that they could be. The highest-ranking items (≥4) were educational quality, the possibility of adapting the level of difficulty of an SG, and availability on smartphone (
This study is the first to compare an SG to MCQs in terms of score variability for medical students. It shows that the SG was associated with higher score variability and a lower mean score compared with MCQs. Moreover, the SG was associated with significantly higher student satisfaction compared with MCQs. Most medical student evaluation to date is based on MCQ tests, which are performed on a large student population. Student grading might therefore be difficult, with a sizable proportion of students scoring the same and limited score variability between them. We believe that tests evaluating a large population of medical students should produce sufficient variability in overall results and achieve high student satisfaction. For these reasons, we sought to evaluate medical students with a simulation based on an SG compared to MCQs. MCQs evaluate medical knowledge by means of closed questions, but medical skills and competence are better assessed by on-site (bedside) evaluation or simulation [
Medical education encompasses both medical knowledge and reasoning skills. Although it is simple to develop MCQs to test medical knowledge, it becomes much more challenging to evaluate reasoning skills and global medical skills with MCQs. Interestingly, our study did not find any correlation between the two sets of test results, suggesting that success in MCQs does not predict success in SGs and vice versa. This finding might suggest that good results in an SG reflect something other than pure medical knowledge, and that medical skills might increase result variability, since the medical knowledge tested was similar in both tests. If SGs are considered to be closer to medical practice, this finding questions the effectiveness of MCQs in evaluating medical students [
We acknowledge several limitations in our study. Our study compared two different test modalities evaluating a relatively small number of medical students in managing a cardiology clinical case. Therefore, further studies are needed to confirm our findings in larger student populations and in other medical fields. Although we found an order bias in our study (the second test was associated with better results because of similar questions, retention of the students’ answers, and indirect access to the corrections), we believe that randomization into two similar groups allowed us to draw reliable conclusions. As specific questionnaires were designed for this study, no pretest was conducted. Nevertheless, we believe that the questionnaires are valid, since each student acted as his or her own control in this study, interpreting the questions in the same way when evaluating two different test modalities. Finally, our sample consisted of volunteer students, and we cannot rule out the possibility that they had a particular interest in SGs. This might also limit the generalization of our conclusions to the whole population of medical students.
SGs potentially represent a new evaluation modality for medical students. Our study suggests that they are more effective in grading medical students with a higher variability of performance. In addition, SGs seem to be associated with higher student satisfaction compared to MCQs.
Multiple choice question test translated into English.
MCQ: multiple choice question
SG: serious game
SD: standard deviation
The authors thank Dr G Kanellopoulos and G Durand-Viel for their help and support with this project.
None declared.