This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Serious Games, is properly cited. The complete bibliographic information, a link to the original publication on https://games.jmir.org, as well as this copyright and license information must be included.
Clinical reasoning (CR) is a fundamental skill for all medical students. In our medical education system, however, there are shortcomings in the conventional methods of teaching CR. New technology is needed to enhance our CR teaching, especially as we are facing an influx of new health trainees. China Medical University (CMU), in response to this need, has developed a computer-based CR training system (CMU-CBCRT).
We aimed to find evidence of construct validity of the CMU-CBCRT.
We recruited 385 students, from fifth-year undergraduates to postgraduate year (PGY) 3, to complete the test on the CMU-CBCRT. The known-groups technique was used to evaluate the construct validity of the CBCRT by comparing test scores among 4 training levels (fifth-year MD, PGY-1, PGY-2, and PGY-3).
We found that test scores increased with years of training. Significant differences across training years were found in the scores for information collection, diagnosis, and treatment, as well as in total scores. However, no significant differences were found for treatment errors.
We provided evidence of construct validity of the CMU-CBCRT, which can determine the CR skills of medical students at varying early stages of their careers.
Each year, several hundred thousand students enter medical school, all of whom need to equip themselves with the necessary health care skills and knowledge [
Traditionally, CR is taught in the classroom (didactic lecture) and by the patient’s side (clinical clerkship) [
In contrast to a paper- or lecture-based curriculum, computer-based CR training allows trainees to interactively take information from patients in a step-by-step process. There is also the possibility of accumulating a large volume of cases through international collaboration.
Currently, computer-based CR training can have different interfaces such as text, graphics, and animation [
Types of computer-based clinical reasoning simulations and comparison.
Media | Advantage | Disadvantage |
Text based | Relatively easy and rapid to develop; less expensive | Low level of fidelity |
Graphic and animation based | Presents rich clinical evidence; moderate cost with enhanced fidelity | Replicates only part of clinical settings; low level of interactivity |
Virtual reality | Combines highly sophisticated, life-like models with computer animations; can provide interactivity and feedback | Challenge to developers; often expensive |
Sponsored by the National Medical Examination Center of China, China Medical University (CMU) started to develop a computer-based CR training system in 2001. Educators and researchers at the Institute for International Health Professions Education and Research of CMU began to work with clinicians to develop cases for training CR skills and established the computer-based CR testing (CBCRT) system. Since 2002, the CBCRT has been used as one part of the final comprehensive examinations of CMU to test the clinical skills of undergraduate students.
The CBCRT is composed of 5 interactive modules that allow students to interact with simulations to complete tasks: (1) history taking and physical examination, (2) writing orders and obtaining lab and medical imaging results, (3) reviewing obtained results, (4) working out diagnosis and differential diagnosis, and (5) observing the patient’s condition change at different phases and changing locations for managing different therapies. The main features of the CMU-CBCRT virtual patient are displayed in
To test the face validity of the CMU-CBCRT, we called a series of meetings with physicians and surgeons at which we screened and selected key information on each clinical topic for CR training. When a CR case was developed, our clinical team was surveyed to verify its clinical relevance. They then evaluated the interactive interface and rated their level of satisfaction.
To briefly summarize, the CBCRT provides clinical features of patients, including history and physical and laboratory findings, and then requires students to make a diagnosis as well as a treatment plan for the simulated patients. The CBCRT has also been welcomed by examinees, based on their positive feedback toward the system. Of 300 students surveyed by questionnaire, 99.4% enjoyed participating in the CBCRT examination; 95.9% believed that the system accurately represents the real clinical environment; and 72.5% agreed that the CBCRT is a better tool for teaching their clinical abilities. We therefore consider the face validity of the CBCRT satisfactory.
However, face validity is the weakest form of validity evidence. It can only be used at the primary stage of designing an assessment method [
This study investigates the construct validity of the CMU-CBCRT in medical trainees across 5 medical schools. We hypothesize that the CMU-CBCRT will be able to distinguish CR levels among trainees in different years of training; specifically, senior trainees will achieve higher CBCRT scores than junior trainees.
Screenshot displaying the main features of the China Medical University–computer-based clinical reasoning testing.
Methods used for the project were reviewed and approved by the ethical review boards of the CMU (ERB 2016-027) and the 5 medical schools. Informed consent was obtained from each participant before they started the test with the CMU-CBCRT.
From November 24 to December 8, 2016, we implemented the CMU-CBCRT system in 5 collaborative medical schools: China Medical University, Fudan University School of Medicine, Sun Yat-sen University School of Medicine, Xuzhou Medical University, and Binzhou Medical College.
In China, medical students start their clerkships in the fifth year of medical training. Clinical training then continues for 3 postgraduate years (PGYs). PGY 1 through PGY 3 are similar to residency in North American medical schools. We recruited students from fifth-year undergraduates to PGY 3. The actual number of participants from each of the 5 medical schools and their training years are shown in
Students from the 5 medical schools and their training years.
Schools | Fifth year medical student | PGY-1a | PGY-2 | PGY-3 | Subtotal |
China Medical University | 40 | 18 | 7 | 2 | 67 |
Fudan University School of Medicine | 17 | 41 | 16 | 18 | 92 |
Sun Yat-sen University School of Medicine | 12 | 28 | 20 | 12 | 72 |
Xuzhou Medical University | 20 | 19 | 20 | 21 | 80 |
Binzhou Medical College | 20 | 19 | 19 | 16 | 74 |
Total | 109 | 125 | 82 | 69 | 385 |
aPGY: postgraduate year.
Before testing, each student watched a 5-minute presentation to become familiar with the testing interface. Demographics and level of medical training were surveyed and recorded. The computer recorded participants’ typing and computer activity, including typing and performance times. The interaction between a learner and how data are captured is displayed in
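The kind of timestamped activity capture described above can be sketched as a simple append-only session log. This is a hypothetical illustration, not the CMU-CBCRT implementation; the class, field names, and module labels are assumptions.

```python
import json
import time

class SessionLog:
    """Append-only log of a participant's actions with timestamps.

    Hypothetical sketch of keystroke/activity capture; field names
    are illustrative, not taken from the CMU-CBCRT system.
    """
    def __init__(self, participant_id):
        self.participant_id = participant_id
        self.events = []

    def record(self, module, action):
        # time.monotonic() gives a clock suitable for measuring
        # elapsed (performance) time between actions.
        self.events.append({
            "t": time.monotonic(),
            "module": module,
            "action": action,
        })

    def dump(self):
        """Serialize the session for later scoring or analysis."""
        return json.dumps({"participant": self.participant_id,
                           "events": self.events})

log = SessionLog("S001")
log.record("history_taking", "asked: onset of chest pain")
log.record("orders", "ordered: ECG")
print(len(log.events))  # → 2
```

Recording both the action and its timestamp is what makes it possible to analyze performance times alongside correctness.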
Outline of interaction flow through China Medical University–computer-based clinical reasoning testing.
The known-groups technique was used to evaluate the construct validity of the CBCRT by comparing the scores among the fifth year MD, PGY-1, PGY-2, and PGY-3 participants. Testing scores, including total and subtotal, were compared over the 4 training groups using a 1-way analysis of variance (ANOVA). Results were reported as mean and standard deviation.
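The known-groups comparison above can be sketched as a one-way ANOVA, which tests whether group means differ by comparing between-group to within-group variance. The implementation below is a minimal self-contained sketch; the score lists are hypothetical placeholders, not the study's data.

```python
def one_way_anova_f(groups):
    """Return the F statistic and degrees of freedom for a
    one-way ANOVA over a list of score lists."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group means vs grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: observations vs their group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical total scores for the four training levels,
# loosely echoing the pattern of seniors outscoring juniors.
scores = [
    [55, 60, 58, 62, 57],   # fifth-year students
    [61, 63, 60, 65, 62],   # PGY-1
    [68, 70, 67, 69, 71],   # PGY-2
    [67, 69, 68, 70, 66],   # PGY-3
]
f, dfb, dfw = one_way_anova_f(scores)
print(f"F({dfb}, {dfw}) = {f:.2f}")  # → F(3, 16) = 31.23
```

A large F relative to its degrees of freedom yields a small p value, supporting the known-groups hypothesis that training level separates the scores.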
ANOVA revealed a group difference in total score among training levels (
Test scores of students over training years.
Scores | Fifth year medical student, mean (SD) | PGY-1a, mean (SD) | PGY-2, mean (SD) | PGY-3, mean (SD) | P value |
Information collection | 43.42 (12.63) | 46.70 (11.48) | 49.73 (9.12) | 51.38 (9.08) | <.001 |
Diagnosis | 10.90 (4.74) | 11.24 (4.97) | 12.76 (3.90) | 11.25 (4.22) | .034 |
Treatment | 4.79 (3.81) | 4.61 (3.36) | 6.19 (3.73) | 5.45 (3.72) | .013 |
Treatment error | –0.06 (0.23) | –0.04 (0.20) | 0.00 (0.00) | –0.01 (0.12) | .13 |
Total | 59.01 (16.68) | 62.50 (14.45) | 68.68 (11.76) | 68.06 (12.67) | .001 |
aPGY: postgraduate year.
Total score of students over training years.
ANOVA revealed group differences by training level in information collection (
Subscore for information collection of students over training years.
Subscore for diagnosis of students over training years.
Subscore for treatment of students over training years.
Before applying an assessment tool for use with medical students, we must obtain evidence for the instrument’s reliability and validity [
Looking specifically at the 4 categories of skills that we tested, we found that the most significant differences between junior and senior medical students were in the information collection, diagnosis, and treatment scores. This was as predicted: with years of training, trainees’ experience and ability to reason clinically improve, and as a result, they performed better on information collection, diagnosis, and treatment, as well as on the total CBCRT score. This further suggests that the CMU-CBCRT can determine the CR skills of students at varying levels.
We also carefully analyzed why there were no significant differences in treatment error scores among the 4 training groups. For a simulated case of myocardial infarction, the test results show the challenge faced by participants who had never experienced this form of examination before: with the passing score set at 60%, the average score in this case (59.01 [SD 16.68]) fell below it. A wrong treatment choice is a negatively scored item, so the item-writing experts were very cautious in formulating the scoring standard. Only behavior that would cause extreme consequences resulted in points being deducted, and the weight was set at a very low level (ie, –1%). In this test, we observed that treatment errors happened more often with junior students than with senior students, although the difference was not statistically significant.
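The cautious negative-scoring rule described above can be sketched as follows. This is a hypothetical implementation under stated assumptions: the order names, the `extreme` flag, and a 100-point total are illustrative, with only extreme-consequence errors deducting at the –1% weight.

```python
# Illustrative constants (assumptions, not the study's rubric):
TOTAL_SCORE = 100
ERROR_WEIGHT = -0.01  # -1% of the total score per extreme error

def treatment_error_score(errors):
    """Sum deductions for treatment orders flagged as extreme errors.

    `errors` is a list of dicts with an 'extreme' flag; per the
    cautious scoring standard, only errors that would cause extreme
    consequences are penalized, and at a very low weight.
    """
    n_extreme = sum(1 for e in errors if e["extreme"])
    return n_extreme * ERROR_WEIGHT * TOTAL_SCORE

# A student ordering one harmless wrong test and one dangerous wrong
# treatment loses only 1 point under this rule:
errors = [
    {"order": "unneeded test", "extreme": False},
    {"order": "contraindicated drug", "extreme": True},
]
print(treatment_error_score(errors))  # → -1.0
```

With deductions this small and this rare, treatment error scores cluster near zero for all groups, which is consistent with the nonsignificant ANOVA result for that subscore.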
In the absence of an available gold standard for measuring CR, evidence for construct validity is sought after in this area of research. This is an ongoing process, in which the skill measured by the assessment tool is linked to some other attribute by a hypothesis or construct. With the development of validity theory, the validity concept has a new connotation and forms a method based on multilevel evidence [
With the positive evidence presented, we should still be aware that validity verification is a dynamic process [
However, our study had some limitations to its generalizability. First, the respondents were from only 5 medical institutions in China. Second, the findings were limited by the representativeness and scale of the study population.
We provided evidence of construct validity of the CMU-CBCRT. It is able to determine CR skills across different levels of medical education, especially in the early stages of students’ medical careers.
Scoring sheet and scoring rubrics.
China Medical University–computer-based clinical reasoning testing data.
analysis of variance
computer-based clinical reasoning testing
China Medical University
clinical reasoning
problem-based learning
postgraduate year
The authors thank the National Medical Examination Center of China, which sponsored this research through the funded project Sustainable Development of Computer-based Case Simulations Examination System. We thank the educators and collaborators from China Medical University, Fudan University School of Medicine, Sun Yat-sen University School of Medicine, Xuzhou Medical University, and Binzhou Medical College who provided testing sites and expertise that greatly assisted the research. Special thanks to Ms Kritika Taparia for proofreading the manuscript and providing valuable comments.
TZ, XG, and BQ collected the data. TZ, BZ, and BS analyzed the data and wrote the first draft of the manuscript.
None declared.