This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Serious Games, is properly cited. The complete bibliographic information, a link to the original publication on https://games.jmir.org, as well as this copyright and license information must be included.
Using traditional simulators (eg, cadavers, animals, or actors) to upskill health workers is becoming less common because of ethical issues, commitment to patient safety, and cost and resource restrictions. Virtual reality (VR) and augmented reality (AR) may help to overcome these barriers. However, their effectiveness is often contested and poorly understood and warrants further investigation.
The aim of this review is to develop, test, and refine an evidence-informed program theory on how, for whom, and to what extent training using AR or VR
We conducted a realist synthesis using the following 3-step process: theory elicitation, theory testing, and theory refinement. We first searched 7 databases and 11 practitioner journals for literature on AR or VR used to train health care staff. In total, 80 papers were identified, and information regarding context-mechanism-outcome (CMO) was extracted. We conducted a narrative synthesis to form an initial program theory comprising CMO configurations. To refine and test this theory, we identified empirical studies through a second search of the same databases used in the first search. We used the Mixed Methods Appraisal Tool to assess the quality of the studies and to determine our confidence in each CMO configuration.
Of the 41 CMO configurations identified, we had moderate to high confidence in 9 (22%) based on 46 empirical studies reporting on VR, AR, or mixed simulation training programs. These stated that realistic (high-fidelity) simulations trigger perceptions of realism, easier visualization of patient anatomy, and an interactive experience, which result in increased learner satisfaction and more effective learning. Immersive VR or AR engages learners in
Technical and nontechnical skills training programs using AR or VR for health care staff may trigger perceptions of realism and deep immersion and enable easier visualization, interactivity, enhanced skills, and repeated practice in a safe environment. This may improve skills and increase learning, knowledge, and learner satisfaction. The future testing of these mechanisms using hypothesis-driven approaches is required. Research is also required to explore implementation considerations.
As in most businesses, upskilling health care workers is vital to improving and advancing existing skills and practices and closing gaps in knowledge so that employees may continue practicing with ease [
Traditional health care training consists of role modeling, shadowing, and the
Virtual reality (VR) and augmented reality (AR) training programs may help to overcome these barriers because they can be continuously available and used independently by learners, and they do not increase costs with use [
VR is a computer-generated simulated environment in which users are immersed [
The effectiveness and success of VR and AR training programs are often nonlinear and complicated. This is because fidelity and perceptions of
Previous literature reviews have focused on the novelty, application, and effectiveness of VR and AR training programs for health professionals, including for surgical training [
This realist review explores why there is variation in the effectiveness of VR and AR training programs and what factors influence their implementation and maintenance. Realist reviews can help to understand how, for whom, and in which contexts and conditions interventions or programs (such as the use of AR or VR for training) work. They offer a theory-driven approach to producing causal explanations of how different mechanisms of action may be triggered, which then lead to intended and unintended outcomes [
Ultimately, a program theory developed in alignment with realist methods will result in a collection of context-mechanism-outcome (CMO) configurations. The program theory explains how an intervention may contribute to a chain of events (ie, mechanisms) that result in expected and desired or unexpected outcomes. The realist approach also considers how interventions may work differently within different contexts or conditions. CMO configurations are presented as follows: Context + Mechanisms = Outcomes.
Underlying the realist methodology is the expectation that the VR or AR intervention does not produce outcomes by itself but is instead influenced by underlying social entities, processes, or social structures (mechanisms) [
The aim of this realist review is to develop, test, and refine an evidence-informed program theory on how, for whom, and to what extent training using AR or VR
The review addressed the following questions:
How, for whom, and to what extent does training using AR or VR for upskilling health care workers work?
What facilitates or constrains the implementation (and maintenance) of training using AR or VR in health and care settings?
This realist review adheres to the processes explained in the RAMESES (Realist and Meta-narrative Evidence Syntheses: Evolving Standards) training documents [
The purpose of the first step was to elicit an initial program theory from candidate theories found within existing literature, which could then be refined and tested. Academic and practitioner theories were located by searching a range of databases and practitioner journals for literature on using AR or VR to upskill health professionals. The databases, search terms, and eligibility criteria are presented in
Databases
MEDLINE
Scopus
CINAHL
Embase
Education Resources Information Center (ERIC)
PsycINFO
Web of Science
Journals
Academic Medicine
MedEdPORTAL
Medical Teacher
International Journal of Medical Education
Journal of Continuing Education in the Health Professions
GMS Journal for Medical Education
Focus on Health Professional Education
Medical Education
Journal of Nursing Education and Practice
Nurse Education Today
International Journal of Nursing Studies
Keywords with Boolean operators
Search example (Scopus)
Inclusion criteria
Using simulation technologies (any type of immersion)
Health workers, care workers, and postgraduate or registered learners
Any health, care, or university-based setting
Covers detail on what contexts, how, and for whom they
Published in English
Exclusion criteria
Simulation technologies that do not use augmentation or virtual reality (eg, web-based e-learning interventions or manikins)
Undergraduate students
Published in languages other than English
Exceptions
Work including undergraduate learners or other simulation technologies can be included if the data for postgraduate or registered learners and augmented reality or virtual reality can be separated
In alignment with previous realist reviews (eg, the study by Wong et al [
Data were extracted by 2 authors (NG and DD) into a coding sheet in Excel (Microsoft Corporation). This included information on the study (eg, author, date, title, research design, and sample), the intervention, contexts, mechanisms, outcomes, learning or technology adoption theories mentioned, and barriers and facilitators to implementation (or maintenance; see Table S1 in
A narrative synthesis was conducted to determine overlapping CMO configurations and the most common barriers and facilitators to implementation and maintenance. We aggregated authors’ hypothesized mechanisms, regardless of whether they had been tested, to identify the common ways in which VR or AR leads to the outcomes. The learning and technology adoption theories were also summarized and used to discuss and interpret the CMO configurations (in step 2).
Finally, the research team discussed the initial program theory and selected a number of CMO configurations to test, focusing on those that were expected to be most feasible, measurable, and likely to apply or transfer to future AR and VR interventions aimed at upskilling health care workers.
The purpose of step 2 was to test the initial program theory, using existing evidence. Empirical literature was identified in a 2-step process. First, empirical studies were identified from the first search by removing nonempirical and non–full-length papers. Second, the same search as in step 1 was repeated but with a time frame of 3-6 months to identify recently published work that may have been missed. This search was conducted on March 8, 2021. We used the same screening process as in step 1 to assess the relevance of newly identified articles. The first author (NG) screened the papers to identify a shortlist of possibly eligible papers. The second author (DD) then independently screened a random selection of these papers (abstracts and titles: 2/9, 20%; full texts: 1/2, 50%), with interrater agreement rates of 100%.
The same items as in step 1 were extracted, along with specific evidence for the mechanisms (where applicable) and the expected outcomes identified in the initial program theory. Studies that did not provide evidence relating to the outcomes were excluded. Studies were assessed for quality using the Mixed Methods Appraisal Tool (MMAT; version 2018) [
The MMAT consists of 2 screening questions and 5 study design–specific criteria that could be scored 1 (yes) or 0 (no) [
The quality of all the studies was assessed by 1 author (NG), whereas a second author (DD) assessed the quality of 22% (10/46) of the studies. We calculated the Cohen κ using SPSS software (version 23; IBM Corp) to determine the interrater reliability between the 2 authors.
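The interrater statistic named above can be illustrated with a minimal Python sketch. The authors report using SPSS; the function below is only a hand-rolled illustration of the standard Cohen κ formula, κ = (p_o − p_e) / (1 − p_e), and the two rating lists are hypothetical values, not data from this review.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical yes/no (1/0) judgments on four appraisal criteria (illustration only).
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 1, 0, 1]
kappa = cohen_kappa(rater_1, rater_2)
```

With the hypothetical ratings above, the raters agree on 3 of 4 items, and half of that agreement is expected by chance, so κ is lower than the raw agreement rate.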
To refine the theory, evidential fragments (parts of studies, rather than entire studies, that provided evidence) from the second search were compared and matched to the initial program theory. We made revisions by identifying differences and presented the final theory as a narrative and diagrammatic summary. The most commonly identified learning or technology adoption theories were used to discuss the program theory.
We then assessed our confidence in each CMO configuration as high, moderate, low, or very low according to the criteria presented in
Criteria used to determine confidence in each context-mechanism-outcome configuration.
Confidence | Number of supporting studies | Contesting studies (if applicable), % | MMATa average score, % |
High | ≥8 | 0-20 | 76-100 |
Moderate | 5-7 | 21-29 | 51-75 |
Low | 4 | 30-74 | 26-50 |
Very low | ≤3 | 75-100 | 0-25 |
aMMAT: Mixed Methods Appraisal Tool.
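The thresholds in the table above can be expressed as a small lookup, sketched here in Python. The source does not state how the 3 criteria were combined into a single rating, so taking the lowest of the 3 ratings is an assumption made purely for illustration, as is the handling of noninteger percentages that fall between the table's bands. All function names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # ordered from worst to best


def rate_supporting(n_studies):
    """Rating from the number of supporting studies (table column 2)."""
    if n_studies >= 8:
        return "high"
    if n_studies >= 5:
        return "moderate"
    if n_studies == 4:
        return "low"
    return "very low"


def rate_contesting(pct):
    """Rating from the percentage of contesting studies (table column 3)."""
    # Assumption: values between integer bands (eg, 20.5) fall into the lower rating.
    if pct <= 20:
        return "high"
    if pct < 30:
        return "moderate"
    if pct < 75:
        return "low"
    return "very low"


def rate_mmat(pct):
    """Rating from the average MMAT score in percent (table column 4)."""
    # Assumption: values between integer bands (eg, 75.5) fall into the higher rating.
    if pct > 75:
        return "high"
    if pct > 50:
        return "moderate"
    if pct > 25:
        return "low"
    return "very low"


def overall_confidence(n_supporting, contesting_pct, mmat_pct):
    """Assumed combination rule: the lowest of the three individual ratings."""
    ratings = [
        rate_supporting(n_supporting),
        rate_contesting(contesting_pct),
        rate_mmat(mmat_pct),
    ]
    return min(ratings, key=LEVELS.index)
```

For example, `overall_confidence(9, 10, 40)` returns a low rating under this assumed rule because the average MMAT score falls in the 26-50 band, even though the other 2 criteria would rate as high.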
The extended PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart [
Extended PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart depicting the literature search and screening process.
The initial search identified 1042 papers. After deduplication and abstract and title screening, 186 full texts, including 8 studies snowballed from the literature, were reviewed, of which we excluded 106 (57%), leaving 80 (43%) papers for inclusion for eliciting the initial theory. The most common reasons for exclusion were not including health care workers (39/106, 36.8%), not focusing on education and training (29/106, 27.4%), or relevant information not being separable (17/106, 16%).
The second search identified 46 recently published empirical studies. After deduplication and abstract and title screening, 7 full texts were screened, of which 5 (71%) were excluded because they did not cover AR or VR (3/5, 60%), did not include health care workers (1/5, 20%), or did not focus on education or training (1/5, 20%). Of the 7 studies, the 2 (29%) that remained were combined with the empirical literature from the first search (n=54). Of these 56 studies, 46 (82%) were included in testing and refining the theory, after 10 (18%) were excluded for not providing evidence on the CMO configurations.
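The screening counts reported for the 2 searches can be cross-checked with simple arithmetic. The following Python sketch uses only the counts stated in the text; the variable names and the `pct` helper are hypothetical.

```python
def pct(part, whole, digits=0):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# First search: full texts reviewed and excluded.
full_texts_first = 186
excluded_first = 106
included_first = full_texts_first - excluded_first  # papers used to elicit the initial theory

# Second search: full texts screened and excluded.
full_texts_second = 7
excluded_second = 5
retained_second = full_texts_second - excluded_second

# Pooling for theory testing, using the n=54 empirical papers stated in the text.
empirical_first = 54
pooled = empirical_first + retained_second
excluded_no_cmo_evidence = 10
tested = pooled - excluded_no_cmo_evidence

share_tested = pct(tested, pooled)                      # share of pooled studies used for testing
share_excluded_first = pct(excluded_first, full_texts_first, 1)
```

Keeping the flow as explicit subtraction and addition steps makes it easy to see that each reported total follows from the counts at the preceding stage.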
The 80 papers identified in the first search consisted of empirical research (55/80, 69%), literature reviews (22/80, 28%), case reports (2/80, 3%), and cost-benefit analyses (1/80, 1%). Of these, 83% (66/80) focused on VR, 11% (9/80) on AR, and 6% (5/80) focused on both.
Of the 46 empirical studies included in the second stage of the review, almost half (22/46, 48%) were quantitative descriptive studies [
A range of health care professionals participated, including surgeons, nurses, physicians, pharmacists, technicians, social workers, radiologists, community health workers, ophthalmologists, dentists, and respiratory therapists. Clinical experience ranged from <2 months [
In the initial program theory, a total of 13 contexts were identified.
The context-mechanism-outcome configurations identified in our initial program theory.
Context | Mechanisms | Outcomesa |
1. Realistic (high-fidelity) simulations | Perceptions of realistic haptics and imagery; triggers interactive learning; lack of perceived realism in haptics or tactile sensation | Enhanced skills and proficiency; learner satisfaction with realism; more effective learning; preference for non-VRb learning, for example, laboratory dissection or physical reality |
2. Artificial intelligence–enabled VRc | Provides feedback and highlights deficiencies | —d |
3. VR or ARe that immerses learners | Engages or exposes learners in deep immersion; provides a safe environment free from patient harm; cybersickness | Higher engagement and participation in training; improved learning, knowledge, and comfort with knowledge; improved skill performance |
4. Comfortable devicesc | Cybersickness | Poor learning experience |
5. VR or AR that delivers standardized teaching | Provides feedback to learners; enables repeated practice | Improves skill or performance; leads to better patient outcomes in the future |
6. Visualization through VR or AR | Interactive experience; easier and more detailed visualization of patient anatomy; perceived realism of the imagery | Learner satisfaction with tool and realism; increased understanding or learning of content; improved performance or skill |
7. Accounts for physical and mental workloadc | Psychological improvements (reduced stress and improved self-confidence) | Decreased mental demand, effort, and physical workload scores |
8. Team training delivered by AR or VRc | Interaction between learners and environment, as well as real-time collaboration and communication | Improves teamwork; results in learner satisfaction |
9. Knowledge or skill transfer | Enhances skills; practice in safe environment (with no risk to patients); deliberate practice | Knowledge transfer to clinical practice; skills transfer to cadaver, box trainer, and surgery and procedure; better patient care in the future |
10. Used with a teacherc | — | Improved instruction |
11. Embedded in curriculumc | — | — |
12. Limited training opportunities | Provides feedback on performance, skill or technique; repeated practice; access to experiential learning opportunities; safe and stress-free learning environment | Skill improvement, technical proficiency, and reduced incidence of complications or errors; learner satisfaction; improvements for learners with less experience |
13. Novices | Feedback and objective measurement of skills or knowledge; independent or self-directed training; safe, static, and risk-free environment without endangering patients; repeated practice; exposure to experience | Technical proficiency and skill acquisition; improved performance (including operative performance); learner satisfaction: VR was preferred; novices (less experienced people) improved most |
aContext + Mechanisms = Outcomes.
bVR: virtual reality.
cThe context-mechanism-outcome configurations for which we had low confidence that there would be evidence available to test them.
dNot available.
eAR: augmented reality.
The interventions presented in the empirical literature aimed at improving technical, behavioral, or nontechnical skills. The technical skills included laparoscopic procedural skills and camera navigation [
Of the 46 studies, 22 (48%) used nonimmersive VR simulators, of which computer-based programs and the LapSim, AnthroSim, and MIST-VR simulators were the most commonly used [
Of the 46 studies, 24 (52%) used haptic technology for force feedback or tactile sensation [
There was substantial agreement for the MMAT appraisals between the 2 raters (NG and DD; 90%; κ=0.778, 95% CI 0.625-0.931;
Overall, of the 46 studies, 13 (28%) were of high quality and 3 (7%) were of low quality, whereas the remaining 30 (65%) were of moderate quality. Of the 46 studies, 9 (20%) quantitative descriptive [
In all, 6 contexts were identified. We distinguished technology-related conditions (Table S3 in
Diagram of our program theory on AR and VR training for health care workers built from the context-mechanism-outcome configurations in which we had moderate or high confidence. AR: augmented reality; VR: virtual reality.
The first condition relates to when VR (all levels of immersion, with and without haptics), AR, and a combination of VR and AR training programs portray realistic (high-fidelity) simulations or imagery (eg, on patient anatomy). This triggered perceptions of reality, enabled visualization of patient anatomy, and triggered an interactive experience [
Across the mechanisms, 2 expected outcomes included more effective learning (increased understanding and learning of content as well as enhanced skills, proficiency, and performance) and increased learner satisfaction. There was strong supporting evidence for more effective learning when perceptions of realism and easier visualization were triggered. For example, in the study by Balian et al [
Increased learner satisfaction was contested within the evidence. Some studies identified that their haptic tools hindered perceptions of realism [
The second condition relates to when fully immersive VR (with and without haptics) or AR with a manikin immersed learners in the training environment [
Improved learning, knowledge, and comfort with knowledge and skill performance were observed by 22% (10/46) of the studies [
In the training-related context of knowledge and skill transfer, AR, combined AR and VR, and VR (all levels of immersion, with and without haptics) were used. When teaching transferable skills, three mechanisms may be triggered: enhancement of existing skills, practice in a perceived safe environment (away from patient harm, time restraints, and stress), and deliberate practice [
Empirical evidence was found for transferable skills, especially enhancing skills. Enhanced skills through VR or AR training helped to transfer knowledge and skills to clinical settings [
The last training-related context relates to when VR (nonimmersive and fully immersive, with and without haptics) or AR were used to train novices (learners with little or no experience). The programs were expected to trigger various resources and mechanisms, including feedback and objective measurement of skills or knowledge; independent and self-directed learning; a safe, static, and risk-free learning environment; repeated practice; and exposure to experience [
Evidence showed that repeated attempts and practice on VR or AR simulators significantly improved skills such as speed of decision-making [
Information regarding barriers and facilitating factors for implementing and maintaining VR or AR training programs for health care professionals was extracted from the studies included in creating (step 1) and refining (step 2) the program theory.
Some argued that high up-front expenses created barriers to implementation and maintenance, including purchasing simulators and headsets as well as software licenses, technology maintenance, staff training, and programming requirements [
The cost of VR and AR was expected to decrease with commercialization and market competition in this area [
A lack of acceptance (ie, negative attitudes) of VR and AR [
A cultural change toward acceptance was expected to occur as VR gains traction [
Developmental and logistical considerations further create barriers because implementing and maintaining VR and AR programs requires imagination, resources, and planning [
The studies highlighted access to training as a facilitator to uptake [
Conversely, some studies reported that learners were not able to complete the training because of scheduling conflicts with patients and time constraints [
The complexity involved in developing a standardized curriculum created barriers to implementation [
According to the studies, leadership and collaboration are crucial to facilitate implementation [
To our knowledge, this is the first realist review to explore AR and VR training programs for health care professionals. It contributes a transferable program theory that may be applicable to diverse health professionals and across AR and VR technologies with varying levels of fidelity and use of haptics or additional tools.
A total of 80 published papers were used to develop an initial program theory, and 46 empirical studies that reported on VR, AR, or mixed simulation training programs for health professionals then helped to refine and test the theory. A total of 41 individual CMO configurations were identified, across 6 contexts and conditions. Of the 41 CMO configurations, we had moderate to high confidence in 9 (22%) and low and very low confidence in 5 (12%) and 27 (66%), respectively. Our low confidence was often due to contesting studies as well as the outcomes (especially those on patient results) not being substantiated with sufficient empirical evidence.
We also identified barriers and facilitators to implementation and maintenance, which must be acknowledged for the CMO configurations to be operationalized. The most common barriers were up-front costs, poor acceptance, negative experiences (ie, cybersickness), logistics, and the complexity involved in developing a curriculum. Decreasing costs due to commercialization and the cost-effectiveness of training, a cultural shift toward acceptance, access to training opportunities, and leadership and collaboration facilitated implementation.
The CMO configurations can be explained by applying learning theories identified within some of the reviewed literature [
Cognitive load theory (CLT) can also help to explain the mechanisms, especially in the context of realistic simulations and visualization. CLT assumes that people have a finite amount of working memory available [
Some of the CLT literature suggests that VR and AR may help to reduce extraneous load (ie, processes not related to learning) by providing cues and feedback in real time [
It was evident that the literature on implementation is premature, with little focus on implementation experiences [
There was a clear absence of AR and VR training programs for allied health staff, care workers, and within care- and community-based settings. There was also less focus on simple behavioral skills such as disposing of hazardous medical waste or practicing hand hygiene, for which AR and VR smartphone apps have already been developed [
As is common in realist reviews [
More work is also needed to increase the confidence in some of the CMO configurations for which we had low or very low confidence and to understand context-dependent implementation outcomes, along with updating the barriers and facilitators to implementation. Cost and acceptance, for example, may not be a barrier in the future, given that commercialization and market demand will reduce up-front costs, whereas increasing use may create a cultural change that favors acceptance.
Unlike some realist reviews [
Limitations included not sense-checking our CMO configurations with AR or VR training experts and not comprehensively searching for gray literature, which means that some initial theories might have been missed. In addition, only 20% (9/46) of the included studies were assessed for quality by 2 researchers, so our quality assessments should be interpreted with some caution. However, we did not exclude research because of low quality, and we aggregated study quality to determine our confidence in the CMO configurations; therefore, we do not expect this to bias our results. Interrater reliability was also substantial.
This review explored the complex nature of AR and VR training programs for health care staff, highlighting how they may actually work in practice, for whom they are most likely to work, and in which contexts and circumstances or under which conditions they may work. We found evidence for improved skills, learning and knowledge, and learner satisfaction, but there was little evidence on patient results. We had moderate to high confidence that VR and AR training programs trigger perceptions of realism and deep immersion as well as enable easier visualization of patient anatomy, interactivity, enhanced skills, and repeated practice in a safe environment. Future testing of these mechanisms using hypothesis-driven approaches is required. More research is also required to explore implementation and maintenance considerations. Ultimately, our evidence-informed program theory can be used to support the development and implementation of AR and VR training programs for health care providers and as a starting point for further research.
Data extraction items, summary of the empirical articles (n=46), context-mechanism-outcome configurations, and our confidence in the configurations.
AR: augmented reality
CLT: cognitive load theory
CMO: context-mechanism-outcome
MMAT: Mixed Methods Appraisal Tool
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RAMESES: Realist and Meta-narrative Evidence Syntheses: Evolving Standards
VR: virtual reality
This work is funded by the National Institute for Health Research Applied Research Collaboration Greater Manchester. The views expressed in this publication are those of the authors and not necessarily those of the National Institute for Health Research or the Department of Health and Social Care.
NG conceived and designed the review with support from DD, SNvdV, and PW. NG and DD identified and analyzed the literature and conducted the quality assessments. All authors contributed to developing the program theory. NG wrote the first draft of the manuscript. All authors revised and approved the final manuscript.
None declared.