@Article{info:doi/10.2196/21620, author="Mitre-Hernandez, Hugo and Covarrubias Carrillo, Roberto and Lara-Alvarez, Carlos", title="Pupillary Responses for Cognitive Load Measurement to Classify Difficulty Levels in an Educational Video Game: Empirical Study", journal="JMIR Serious Games", year="2021", month="Jan", day="11", volume="9", number="1", pages="e21620", keywords="video games; pupil; metacognitive monitoring; educational technology; machine learning", abstract="Background: A learning task recurrently perceived as easy (or hard) may cause poor learning results. Gamer data such as errors, attempts, or time to finish a challenge are widely used to estimate the perceived difficulty level. In other contexts, pupillometry is widely used to measure cognitive load (mental effort); hence, this may describe the perceived task difficulty. Objective: This study aims to assess the use of task-evoked pupillary responses as a cognitive load measure for describing the difficulty levels in a video game. In addition, it proposes an image filter to better estimate baseline pupil size and to reduce the screen luminescence effect. Methods: We conducted an experiment that compares the baseline estimated from our filter against that estimated from common approaches. Then, a classifier with different pupil features was used to classify the difficulty of a data set containing information from students playing a video game for practicing math fractions. Results: We observed that the proposed filter better estimates a baseline. Mauchly's test of sphericity indicated that the assumption of sphericity had been violated ($\chi^{2}_{14}$=0.05; P=.001); therefore, a Greenhouse-Geisser correction was used ($\epsilon$=0.47). There was a significant difference in mean pupil diameter change (MPDC) estimated from different baseline images with the scramble filter ($F_{5,78}$=30.965; P<.001).
Moreover, according to the Wilcoxon signed rank test, pupillary response features that better describe the difficulty level were MPDC (z=−2.15; P=.03) and peak dilation (z=−3.58; P<.001). A random forest classifier for easy and hard levels of difficulty showed an accuracy of 75{\%} when the gamer data were used, but the accuracy increased to 87.5{\%} when pupillary measurements were included. Conclusions: The screen luminescence effect on pupil size is reduced with a scrambled filter on the background video game image. Finally, pupillary response data can improve classifier accuracy for the perceived difficulty of levels in educational video games.", issn="2291-9279", doi="10.2196/21620", url="http://games.jmir.org/2021/1/e21620/", url="https://doi.org/10.2196/21620", url="http://www.ncbi.nlm.nih.gov/pubmed/33427677" }