Published on 16.07.19 in Vol 7, No 3 (2019): Jul-Sep
Preprints (earlier versions) of this paper are available at http://preprints.jmir.org/preprint/14620, first published May 06, 2019.
Diagnostic Markers of User Experience, Play, and Learning for Digital Serious Games: A Conceptual Framework Study
Background: Serious games for medical education have seen a resurgence in recent years, partly due to the growth of the video game industry and the ability of such games to support learning achievements. However, there is little consensus on what the serious and game components in a serious game are composed of. As a result, electronic learning (e-learning) and medical simulation modules are sometimes mislabeled as serious games. We hypothesize that one of the main reasons is the difficulty for a medical educator to systematically and accurately evaluate key aspects of serious games.
Objective: This study aimed to identify markers that can evaluate serious games and distinguish between serious games, entertainment games, and e-learning.
Methods: Jabareen’s eight-phase framework-building procedure was used to identify the core markers of a serious game. The procedure was modified slightly to elicit “diagnostic criteria” as opposed to its original purpose of a conceptual framework. Following the identification of purported markers, the newly developed markers were tested on a series of freely available health care serious games—Dr. Game Surgeon Trouble, Staying Alive, and Touch Surgery—and the results were compared to the published test validity for each game.
Results: Diagnostic criteria for serious games were created, comprising the clusters of User Experience (UX), Play, and Learning. Each cluster was formed from six base markers, a minimum of four of which were required for a cluster to be considered present. These criteria were tested on the three games, and Dr. Game Surgeon Trouble and Staying Alive fit the criteria to be considered a serious game. Touch Surgery did not meet the criteria, but fit the definition of an e-learning module.
Conclusions: The diagnostic criteria appear to accurately distinguish between serious games and mediums commonly misidentified as serious games, such as e-learning modules. However, the diagnostic criteria do not determine if a serious game will be efficacious; they only determine if it is a serious game. Future research should include a much larger sample of games designed specifically for health care purposes.
JMIR Serious Games 2019;7(3):e14620
The past decade has seen a surge of digital games seeking to educate, train, or otherwise inform their players in a broad spectrum of topics . These serious games are often defined as games developed for nonrecreational purposes and are frequently deployed as adjuncts to education and therapy [ , ]. Documented uses of serious games include teaching mathematics to young children, motivating the elderly to exercise, and increasing surgical competencies of medical students and residents [ - ]. Although such games are unable to replace full-time educators, they are effective at reinforcing concepts learned through conventional teaching methods [ ].
This reinforcement is particularly notable in fields characterized by intensive, high-volume learning, such as medical education. Given the ubiquity of game playing in the present day, where video games generate astronomical levels of revenue, it is no surprise that educators are seeking to bend the vast potential of gaming toward serious causes like learning [, , ]. This shift in paradigm comes on the heels of the once ironclad assumptions linking the notions of play with time-wasting and frivolousness [ , ]. As more serious games begin featuring in institutions across the globe, they receive ever-increasing attention from stakeholders seeking to leverage the benefits of serious games, be it for profit or learning.
However, despite the growing interest in serious games and the vast array of such games being pushed to market, there remains little consensus on core components that afford a serious game its seriousness while also retaining the fun and entertaining elements that characterize recreational games [, , ]. Many serious games are affected by a variety of issues that impede their serious agenda. Such games often result from attempts to gamify an existing teaching method and suffer from poor instructional or game design, and thus perform poorly compared with the methods they were meant to replace or support [ , ]. Surprisingly, the reverse is also true: when educational elements are inserted into near-completed games as an afterthought, they do not necessarily perform well [ ].
For an educational serious game to be effective, the educational content must be robust, appropriate for the target audience, and well-integrated into the game [, ]. Thus, it comes as no surprise that many serious games are now subject to rigorous validation studies long before implementation.
Recent reviews of the literature suggested that the vast majority of research remains dedicated to testing the efficacies of specific serious games for their intended purposes and, where applicable, the cognitive processes activated by playing these games [, , , , ]. Often, newly developed games are trialed on a targeted sample, typically with a control group, and game efficacy is determined by whether the game achieved its intended result, such as increases in standardized test scores. This user-centric focus, while meritorious, often does not address the factors, mechanisms, or processes that afford serious games efficacy and receptivity by their target audiences. It leaves the following question unanswered: “How do I know this serious game will be both serious and a game?”
Consequently, this has resulted in a gap in the literature with regard to consistent, definitive, and validated diagnostic criteria to serve both as a reference and basis of evaluation for serious games. This has impeded the development of new serious games whose topics have not yet been subject to extensive testing and the objective-appropriate selection of ready-made games already available on the market. The lack of criteria is especially punishing for educational pathways featuring intensive study, such as medical institutions, that may be seeking additional educational strategies to enhance student engagement and quality of learning.
The development and acquisition of serious games are costly, time-consuming endeavors, especially in the context of health care. Often, the initiators of serious game development are health care professionals who conceive games to tackle specific health or educational challenges, while the actual game makers are technical specialists with comprehensive knowledge of game development. The absence of practical guidelines, or diagnostic criteria, risks the production of ineffective serious games, which, in turn, is compounded by the poor allocation and utilization of resources and leads to institutions being discouraged from adopting games that might significantly enhance the performance of their students.
Although Yusoff et al  and Rooney [ ] have proposed two frameworks established for use by game designers and educators, both were designed for use during the game development process, as opposed to the validation of ready-made games.
The framework by Yusoff et al  combined learning theory with gaming requirements to ensure games meet learning outcomes [ ]. The Learning Activity, built from the intended learning outcomes and game aspects that support learning and engagement, was key to this framework. It acted with the game’s genre and could be modified based on feedback derived from a player’s achievement within the game. However, the framework acts as a guiding tool during the developmental phase of game design and does not readily function as a validation framework. The theoretical bases of the framework also require that users possess familiarity with either or both game development and pedagogy and may not be used easily by prospective game producers unfamiliar with either.
Rooney  proposed a triadic interaction of play, pedagogy, and fidelity that together form a framework for serious game design in higher education. They discussed the theoretical underpinnings and key literature and challenges addressed. Although play and pedagogy referred to game play and the pedagogical aspects of learning, fidelity was defined as the extent to which the game emulated the real world, both physically, such as visual displays and behaviors of physics engines, and functionally, as the extent to which the game behaves like the real world in response to player actions. However, Rooney [ ] acknowledged difficulties in balancing the framework’s components, in part, due to the multidisciplinary and sometimes competing nature of game design that has thus far prevented reconciliation of the components into a coherent theoretical framework. In addition, no validation of the framework, such as designing a game from the ground up, was provided.
Although several frameworks related to serious games have been proposed, all require above-average knowledge of game development and apply to the stages of game development. Neither of the two frameworks discussed above defined the base markers comprising serious games (although both touched upon them), was validated against existing ready-made serious games, or was created with specific relevance to, or confirmed to be compatible with, medical and medical education games. In this context, none are easily distilled into the base components of a serious game.
Therefore, this study aims to deconstruct serious games into their base markers and then validate these markers, together with an accompanying set of diagnostic criteria for evaluating serious games, so that they can be used by nongaming experts. The proposed “diagnostic criteria” would enable the validation of ready-made serious games and act as a guide to ensure newly created serious games are both serious and games. The study’s objectives are hence twofold. The study will first detail the procedure used for the deconstruction of serious games into their base markers. These markers will then be tested on three established serious games to ensure that the diagnostic criteria are able to assess the serious and game aspects of serious games.
Deconstruction of Serious Games Into Base Markers
shows the eight-phase procedure by Jabareen [ ] for building a conceptual framework that was used for the isolation of base markers.
In phase 1, sources of serious game data were extensively mapped. Sources consisted of multidisciplinary journal articles reporting on evaluative research on serious games, consultations with educators experienced in game-based teaching, and auditing of learning-through-play workshops during international conferences at the Lee Kong Chian School of Medicine in Singapore. In phase 2, the accumulated data were extensively reviewed and categorized according to discipline, representative power, and importance. In phase 3, a qualitative inquiry into the data was conducted. Concepts that surfaced were identified and named even if they competed with or contradicted each other. In phase 4, concepts were deconstructed to identify their main attributes, characteristics, and roles in serious games. Deconstructed concepts were then sorted by their ontological, epistemological, and methodological roles. In phase 5, concepts with similar features were condensed, reducing the total number of concepts to more manageable levels. In phase 6, using an iterative process, the newly integrated concepts were synthesized into a single theoretical framework. Concepts were subject to repeated resynthesis until a sensible general theoretical framework was recognized. In this study, three clusters of interrelated serious games markers were produced instead of a framework. In phase 7, we examined three established serious games to determine whether the markers derived through Jabareen’s process could be confirmed in games via the detection of each cluster’s base markers. Finally, in phase 8, as theoretical frameworks representing multidisciplinary phenomena remain dynamic, the tested framework will eventually be revised in light of new findings and developments as part of future studies undertaken by the authors and the broader serious games scientific community.
Games Used to Validate the Framework
Three serious games for medical education were used in the validation of the clusters and their base markers. All three games had been previously validated as serious games efficacious in their target areas or audiences and are easily accessible by medical students and professionals, as they are freely available from either the Android or iOS mobile app stores or from the developer’s website [- ].
“Touch Surgery” simulates hundreds of different medical procedures and enables users to train in a three-dimensional environment. Upon selection of a procedure, users are guided step-by-step through the operation, following which they are provided opportunities for rehearsal and self-assessment via multiple-choice questions that focus on cognitive decision making. Of the available modules, the chest tube insertion procedure was chosen due to its relative novelty to the investigators of the study.
“Dr. Game Surgeon Trouble” trains medical residents to recognize and respond correctly to equipment failure events during laparoscopic surgery. Players’ attention is held by a match-three-puzzle minigame unrelated to surgery, and they must concurrently solve equipment-related problems in a visually embedded laparoscopic tower. The onset of problems is accompanied by signals such as camera blurring or changes in lighting intensity, which occur partially beyond a player’s direct focus of attention, emulating a surgical environment. Upon detection of a problem, the player is moved to a troubleshooting mode, where the educational aspect of the game features. At this stage, players interact with the laparoscopic tower to resolve equipment malfunctions and are given a set number of “attempts” at the correct solution. Throughout, the game maintains a cycle of challenges, actions, and direct feedback.
“Staying Alive” teaches the general public and health professionals about the management of a sudden cardiac arrest. Players are presented with a three-dimensional environment wherein a man has just collapsed due to the onset of cardiac arrest and are guided through the appropriate measures and techniques required to maximize his chances of survival. Difficulty increases across levels: level 1 takes place in an indoor office, while level 2 takes place in a sports field.
Validation of the Markers
In the validation phase of the study, each game was played to completion and markers that surfaced were noted using the Serious Games Markers Scoring Protocol (Multimedia Appendices 1 and 2). As the protocol serves to detect whether the base markers are present in a game, as opposed to how well or how much a marker is represented, a binary scoring system was used. Markers that clearly featured in games were marked as present and their total scores were tallied.
Markers of Serious Games
shows the three clusters of markers that, when brought together, form the base composition of a serious game—User Experience (UX), Play, and Learning—that must be present for a serious game to be considered both serious and a game. This is in contrast to recreational games and e-learning, which are respectively missing the clusters of Learning and Play.
UX encompassed the player’s cognitive and affective experiences while playing and interacting with the game and the ways in which these were moderated by the game’s actual play, professional applicability, and usability through its user interface. The cluster comprised elements detailed in.
The cluster of Play encompassed the game’s nonlearning mechanics, genres, style, control mechanism (ie, controller and keyboard), within-game objectives, playable content, and infrastructure (ie, cloud-based, hardware requirements). The cluster comprised markers detailed in.
Learning encompassed the scenes, scenarios, or situations wherein the player is exposed to the knowledge or skills the game intends to impart. They need not be singular, and distinct instances can be spread out or embedded into the game’s central mechanics. The cluster comprised markers detailed in.
Due to the need for a simple validation process, each marker of each cluster is marked on a binary level—they are either present (1) or not present (0)—and must be indisputably present in a game to be considered so. Each cluster (UX, Play, and Learning) contains six markers and will be scored from 0 to 6 depending on the number of markers present, with a score of ≥4 denoting a cluster as present. When all clusters score ≥4, the game is considered a serious game.
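The scoring rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' protocol: the names `cluster_score`, `is_serious_game`, and the `example` observation sheet are hypothetical, and markers are represented as anonymous booleans because the individual marker names are not listed in this passage.

```python
from typing import Dict, List

CLUSTERS = ("UX", "Play", "Learning")
MARKERS_PER_CLUSTER = 6
PRESENCE_THRESHOLD = 4  # a cluster counts as present at a score of >= 4


def cluster_score(markers: List[bool]) -> int:
    """Count the markers recorded as present (1) in one cluster."""
    if len(markers) != MARKERS_PER_CLUSTER:
        raise ValueError(
            f"expected {MARKERS_PER_CLUSTER} markers, got {len(markers)}"
        )
    return sum(1 for marker in markers if marker)


def is_serious_game(observations: Dict[str, List[bool]]) -> bool:
    """Return True only when all three clusters reach the threshold."""
    return all(
        cluster_score(observations[name]) >= PRESENCE_THRESHOLD
        for name in CLUSTERS
    )


# Hypothetical observation sheet (illustrative only, not study data).
example = {
    "UX": [True, True, True, True, True, True],
    "Play": [True, True, False, True, True, False],
    "Learning": [True, True, True, False, False, False],
}

for name in CLUSTERS:
    print(name, cluster_score(example[name]))
print("serious game:", is_serious_game(example))
```

In this hypothetical sheet, Learning falls below the threshold, so the game would not qualify even though UX and Play are present.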
The serious game to be validated is to be played from start to finish or, for games designed to end midway due to player error, until five game-ending mistakes are committed or until the player is no longer willing to continue due to the inability to overcome the obstacle wherein the mistakes were committed, whichever comes first. The player is to observe the game for each cluster’s markers and record them accordingly.
To facilitate the ease of reporting game scores posttabulation, scores are reported in a UX/Play/Learning format, abbreviated as #U/#P/#L, where # ranges from 0 to 6 depending on the number of markers recorded for the respective clusters.
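For readers who record scores digitally, the #U/#P/#L notation can be written and read back mechanically. The following is a minimal sketch under the assumption that scores are stored as three integers; the function names `format_scores` and `parse_scores` are hypothetical and are not part of the published scoring protocol.

```python
import re
from typing import Tuple

# Each cluster score is a single digit from 0 to 6.
_SCORE_PATTERN = re.compile(r"([0-6])U/([0-6])P/([0-6])L")


def format_scores(ux: int, play: int, learning: int) -> str:
    """Render three cluster scores in the #U/#P/#L format."""
    for value in (ux, play, learning):
        if not 0 <= value <= 6:
            raise ValueError(f"cluster score out of range: {value}")
    return f"{ux}U/{play}P/{learning}L"


def parse_scores(text: str) -> Tuple[int, int, int]:
    """Read a #U/#P/#L string back into (UX, Play, Learning)."""
    match = _SCORE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid #U/#P/#L score: {text!r}")
    ux, play, learning = (int(group) for group in match.groups())
    return ux, play, learning


print(format_scores(6, 5, 6))
print(parse_scores("6U/5P/6L"))
```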
Scoring the Seriousness of the Games
“Touch Surgery” scored 6U/0P/5L. It was found to possess all six markers of UX and five markers for Learning, missing out on Expectation Defying due to the absence of conditioning stimuli in the Learning cluster of the game. However, the game scored 0 points in the cluster of Play, as the “game” component resembled e-learning as opposed to games as defined by Alvarez and Djaouti .
“Dr. Game Surgeon Trouble” scored 6U/5P/6L. It was found to possess all six markers of UX and Learning but missed out on the Scaffolding component in Play because the study could not confirm whether its match-three-puzzle minigame increased in difficulty over time or whether the frequency of problem signaling increased over time.
“Staying Alive” scored 6U/5P/5L. It was also found to possess all six markers of UX but missed out on the Possibility component in Play because the second level was mostly the same as the first and there was uncertainty about whether there was an increase in difficulty.
summarizes the game scores for the three games.
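As an illustration, the reported #U/#P/#L scores can be checked against the ≥4 threshold programmatically. This is a sketch only: the `classify` function and its labels are hypothetical, and the mapping of a missing Play cluster to "e-learning" and a missing Learning cluster to "recreational game" is an assumption drawn from the cluster description above, with UX assumed to be required in both cases.

```python
from typing import Dict, Tuple

THRESHOLD = 4  # a cluster is considered present at a score of >= 4

# (UX, Play, Learning) triples as reported in #U/#P/#L form above.
REPORTED_SCORES: Dict[str, Tuple[int, int, int]] = {
    "Touch Surgery": (6, 0, 5),
    "Dr. Game Surgeon Trouble": (6, 5, 6),
    "Staying Alive": (6, 5, 5),
}


def classify(ux: int, play: int, learning: int) -> str:
    """Label a product from its three cluster scores (assumed mapping)."""
    has_ux = ux >= THRESHOLD
    has_play = play >= THRESHOLD
    has_learning = learning >= THRESHOLD
    if has_ux and has_play and has_learning:
        return "serious game"
    if has_ux and has_learning and not has_play:
        return "e-learning"  # assumption: Play is the missing cluster
    if has_ux and has_play and not has_learning:
        return "recreational game"  # assumption: Learning is missing
    return "unclassified"


for title, scores in REPORTED_SCORES.items():
    print(f"{title}: {classify(*scores)}")
```

Under these assumptions, the sketch labels Touch Surgery as e-learning and the other two titles as serious games, in line with the results described in the text.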
The diagnostic markers of serious games were developed by an iterative process that began with the identification of knowledge within the existing literature. During this process, the two existing frameworks of serious games proposed by Yusoff et al  and Rooney [ ] were found to be intended for use during the game development process rather than as validation tools for ready-made games [ , ]. As no such tool existed, this study attempted to close the gap in the literature through the construction and validation of diagnostic markers, which involved extensive deconstruction and resynthesis of data until a sensible cluster of markers was recognized. The newly developed clusters were then tested on a series of ready-made health care serious games. Although all games were previously validated by at least one study, some appeared to defy the criteria’s definition of a serious game.
Haubruck et al  conducted a randomized controlled trial that demonstrated the validity and effectiveness of “Touch Surgery” as an educational adjunct for chest tube insertion procedures [ ]. The study applied the Gameplay/Purpose/Scope (G/P/S) classification model by Djaouti et al [ ] to confirm Touch Surgery’s status as a serious game. However, the G/P/S model aims to precisely classify a serious game into a subgenre via a system that acknowledges both the “serious” and “game” dimensions. The model’s principal goal was to allow educators to find serious games that were useful for their specific causes but were otherwise not designed for or marketed toward their use. A secondary application of the model was to identify entertainment games that could be repurposed for use in education. In both circumstances, the model requires that the game being classified qualifies as one to begin with.
Moreover, apart from the study by Haubruck et al, no source referring to Touch Surgery as a serious game could be found, including the manufacturer’s website. This discrepancy is most likely a case of mistaken identity arising from the multitude of definitions of what exactly constitutes a serious game. Djaouti et al  noted that serious gaming draws expertise from a broad range of fields such as pedagogy, computer science, and medicine that may not wholly agree on the definition of a serious game [ ].
Alvarez and Djaouti defined “serious games” for health care as games that were designed specifically for the serious purpose of providing education via digital devices . The term “serious gaming” was defined as the use of any digital games for health care education and could also refer to nonserious games used for a serious purpose, which they termed “serious diverting” [ ]. Therefore, although none of the definitions define what constitutes a game, it can be inferred that the related software must be games to begin with. A related concept (“gamification”) was defined as the use of game elements in real-world applications that typically were not games to begin with [ ]. Such elements include rewarding users of e-learning with points, badges, or achievements upon completion of a module. As “Touch Surgery” does not demonstrate transparent implementation of game elements, does not offer users challenging problems outside of its assessment module, and was not cited as a serious game by its manufacturer, it is more appropriately categorized as an e-learning tool than a serious game, thereby falling in line with the markers and proposed diagnostic criteria, notwithstanding its efficacy as an educational tool.
Conversely, “Dr. Game Surgeon Trouble” appears to fit well with the proposed criteria for a serious game, and its status as one is supported by existing literature. Graafland et al  had developed the game specifically to improve the problem recognition and resolution skills of surgical trainees [ , ]. The game has had its construct validity established, and a randomized controlled trial demonstrated favorable outcomes in improving trainee problem recognition and resolution. Moreover, while there is no evidence to suggest that Graafland et al [ ] employed existing conceptual frameworks of serious games during the development phase, the game demonstrates the distinct, yet well-integrated game and educational components that fit neatly within common definitions of serious games. Although requiring only a few minutes to complete, “Staying Alive” appeared to fit well within the diagnostic criteria, and its status as a serious game is supported by both the manufacturer and a randomized controlled study by Drummond et al [ ].
Validating all three games through either Yusoff’s  or Rooney’s [ ] conceptual frameworks would logically require that both first be adapted for use in ready-made games. Such a venture, while plausible, is likely a complex undertaking that requires some degree of familiarity with game development, game playing, and related pedagogies such as learning through play or game-based learning in addition to validating the suitability of the adapted frameworks or diagnostic criteria. Instead, due to the differing goals of each, a more appropriate use would be their original purposes: Yusoff’s [ ] framework to ensure newly developed serious games can meet their learning outcomes and Rooney’s [ ] framework to ensure a balance of design and pedagogical elements during game development.
The diagnostic markers for serious games were designed with an aim of simplicity, and this, in turn, also serves as one of their limitations. In this regard, the markers may only be used to discriminate between serious games and other digital solutions that purport to be serious games but belong to other fields like e-learning. These markers, and the associated diagnostic criteria, cannot determine the efficacy of a serious game, as demonstrated by the analysis of “Staying Alive,” a serious game that did not perform as well as its creators had hoped as compared to a conventional e-learning alternative.
Another limitation is the lack of readily available, high-quality health care serious games to test the markers. Owing to the specific or controlled uses of health care serious games, many remain noncommercialized or unmaintained and thus difficult to acquire. The study was also unable to utilize serious games that required specialized or customized hardware due to logistical constraints.
The diagnostic markers of serious games presented in this study offer a simpler alternative that may be used by professionals or educators without extensive familiarity with serious games. They allow for the validation of ready-made games on the market and are used to confirm the presence of “seriousness” in any given health care serious game.
Although the markers may also be used during the game development phase to ensure the end product is both serious and a game, further evaluation is required to confirm their validity in this regard. The dynamic nature of diagnostic markers and the accompanying criteria, akin to those employed by modern medicine, could potentially see new, possibly industry-specific markers being developed from evidence surfacing in future works, which might result in criteria that enable educators or professionals to determine the efficacy of premade serious games while maintaining their simple approach to game validation.
Conflicts of Interest
Multimedia Appendix 1
Serious Games Markers Scoring Protocol (analog).DOCX File, 18KB
Multimedia Appendix 2
Serious Games Markers Scoring Protocol (digital).XLSX File (Microsoft Excel File), 15KB
- Connolly T, Boyle E, MacArthur E, Hainey T, Boyle J. A systematic literature review of empirical evidence on computer games and serious games. Computers & Education 2012 Sep;59(2):661-686. [CrossRef]
- Alvarez J, Djaouti D. Introduction Au Serious Games. Limoges, France: Questions Théoriques; 2012.
- Gentry SV, Gauthier A, L'Estrade Ehrstrom B, Wortley D, Lilienthal A, Tudor Car L, et al. Serious Gaming and Gamification Education in Health Professions: Systematic Review. J Med Internet Res 2019 Mar 28;21(3):e12994 [FREE Full text] [CrossRef] [Medline]
- Ravyse W, Seugnet Blignaut A, Leendertz V, Woolner A. Success factors for serious games to enhance learning: a systematic review. Virtual Reality 2016 Sep 20;21(1):31-58. [CrossRef]
- Ke F, Abras T. Games for engaged learning of middle school children with special learning needs. Br J Educ Technol 2012 May 23;44(2):225-242. [CrossRef]
- Gorbanev I, Agudelo-Londoño S, González RA, Cortes A, Pomares A, Delgadillo V, et al. A systematic review of serious games in medical education: quality of evidence and pedagogical strategy. Med Educ Online 2018 Dec;23(1):1438718 [FREE Full text] [CrossRef] [Medline]
- Chon S, Timmermann F, Dratsch T, Schuelper N, Plum P, Berlth F, et al. Serious Games in Surgical Medical Education: A Virtual Emergency Department as a Tool for Teaching Clinical Reasoning to Medical Students. JMIR Serious Games 2019 Mar 05;7(1):e13028 [FREE Full text] [CrossRef] [Medline]
- Dobrovsky A, Borghoff UM, Hofmann M. An approach to interactive deep reinforcement learning for serious games. 2016 Presented at: IEEE; Oct 16-18, 2016; Wroclaw, Poland p. 85.
- Heimo OI, Harviainen JT, Kimppa KK, Mäkilä T. Virtual to Virtuous Money: A Virtue Ethics Perspective on Video Game Business Logic. J Bus Ethics 2016 Dec 10;153(1):95-103. [CrossRef]
- Bigdeli S, Kaufman D. Digital games in medical education: Key terms, concepts, and definitions. Med J Islam Repub Iran 2017;31:52 [FREE Full text] [CrossRef] [Medline]