Past research has identified student engagement, or the extent to which students participate and are involved in classroom activities, as a crucial factor determining both the quality of education programs and the academic performance of individual students. As a result, many educators worldwide are actively trying to devise courses that maximize student engagement.
Assessing student engagement effectively and reliably, however, can be challenging. Techniques that monitor students' engagement and participation in the classroom over time, without intruding on or adversely affecting their learning experience, would thus be of great value, as they could be used to investigate the effectiveness of courses and educational strategies.
Researchers at the University of Tübingen and the Leibniz-Institut für Wissensmedien in Germany, as well as the University of Colorado Boulder, have recently investigated the potential of machine-learning techniques for assessing student engagement in the context of classroom research. More specifically, they devised a deep-neural-network-based architecture that can estimate student engagement by analyzing video footage collected in classroom environments.
"We used camera data collected during lessons to teach a deep-neural-network-based model to predict student engagement levels," Enkelejda Kasneci the leading HCI researcher in the multidisciplinary team that carried out the study, told TechXplore. "We trained our model on ground-truth data (e.g., expert ratings of students' level of engagement based on the videos recorded in the classroom). After this training, the model was able to predict, for instance, whether data obtained from a particular student at a particular point in time indicates high or low levels of engagement."
The model devised by Kasneci and her colleagues can scan large datasets of videos shot in classroom environments and identify instances where student engagement was either high or low. According to Peter Gerjets, the leading cognitive psychologist in the team, such a method could help to identify classroom instruction strategies that are associated with high student attention and could also be used in teacher training programs.
"For us as a research team, it is very important to stress that the goal is not to closely monitor specific students, but rather to develop intelligent engagement strategies for more effective instruction," Gerjets explained. "It can be used to improve teaching effectiveness, but never for teacher evaluation. In fact, when it comes to objectives regarding the application of these technologies, using machine learning to support instructional scenarios is not only a question of what can, but also of what should be done with these technologies. In all steps of our research, we closely attend to ethical issues that need to be discussed concerning topics of privacy, transparency, fairness, accountability and intended use."
Videos filmed in classroom environments have been used in education research for several years. Until now, however, these videos have typically been analyzed manually and annotated by human raters, who were asked to identify patterns or details relevant to the specific project at hand. Recent advances in computer vision and machine learning have enabled the development of techniques that can automatically analyze large volumes of video and identify specific patterns in them, including the one developed by the researchers in Tübingen and at the University of Colorado Boulder.
"Most previous works based on face analysis were on small-scale video data and depended on good face alignment and handcrafted features," Enkelejda Kasneci and Sidney D'Mello told TechXplore. "Deep learning, however, offers us the opportunity to learn useful representations from big data and improves the performance of engagement classifiers. Our study was aimed at enabling the automated estimation of engagement as seamlessly as possible without requiring any expensive manual ratings or intrusive sensors."
The deep neural model was trained primarily on visual data. Student engagement can be gauged largely from a student's attention and emotional responses (i.e., attentional and affective cues). The researchers thus trained two residual neural networks: the first (Attention-Net) estimates the direction in which students' heads are pointing, and the second (Affect-Net) determines their emotions by analyzing their facial expressions.
"Subsequently, we trained readout classifiers based on both of these features to classify engagement in three categories: low, medium and high," Kasneci said. "These classifiers are based on support vector machines, random forest, multilayer perceptron, and long- and short-term memory approaches."
Instead of training these classifiers on raw images, the researchers trained them on deep embeddings (i.e., low-dimensional representations of the images). This allows the classifiers to be easily re-trained or personalized using very limited new data (a short video sequence of 60 seconds).
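The personalization step can be illustrated with a small Python sketch. It rests on several assumptions that are not from the study: engagement is simplified to a binary high/low label, the classifier is a plain logistic regression, and the embeddings are two-dimensional toy values standing in for what a short clip of one student might yield. The idea shown is only that a small model on top of embeddings can continue training from existing parameters using a handful of new labelled examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, embedding):
    """Probability of high engagement under a logistic readout."""
    return sigmoid(sum(w * x for w, x in zip(weights, embedding)) + bias)

def fine_tune(weights, bias, embeddings, labels, learning_rate=0.5, epochs=200):
    """Continue training from existing parameters on a few labelled embeddings.

    Uses plain stochastic gradient descent on the logistic loss and
    returns new parameters without modifying the inputs.
    """
    weights = list(weights)
    for _ in range(epochs):
        for embedding, label in zip(embeddings, labels):
            error = predict_proba(weights, bias, embedding) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, embedding)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical generic model: relies only on the first embedding dimension.
GENERIC_WEIGHTS = [1.0, 0.0]
GENERIC_BIAS = -0.5

# Hypothetical embeddings from one student's short clip, where the second
# dimension is what actually separates high (1) from low (0) engagement.
CLIP_EMBEDDINGS = [[0.5, 0.9], [0.5, 0.1], [0.4, 0.8], [0.6, 0.2]]
CLIP_LABELS = [1, 0, 1, 0]

PERSONAL_WEIGHTS, PERSONAL_BIAS = fine_tune(
    GENERIC_WEIGHTS, GENERIC_BIAS, CLIP_EMBEDDINGS, CLIP_LABELS
)

for embedding in CLIP_EMBEDDINGS:
    before = predict_proba(GENERIC_WEIGHTS, GENERIC_BIAS, embedding)
    after = predict_proba(PERSONAL_WEIGHTS, PERSONAL_BIAS, embedding)
    print(embedding, round(before, 2), round(after, 2))
```

Because the inputs are compact embeddings rather than raw frames, the model being adapted has very few parameters, which is why a short clip can be enough in a setup of this kind.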
"To summarize, our study showed that deep learning can efficiently capture engagement in classroom research. The generalized engagement patterns, together with the corresponding teaching content, can be used to devise more effective educational strategies," Kasneci said. "This way, classroom research studies can be conducted more efficiently, consequently helping to improve teaching effectiveness. However, alongside the ethical considerations, there are also open research questions regarding deep learning, for instance those related to dataset and algorithmic fairness, interpretability and robustness."
In their next studies, the researchers plan to test the validity and effectiveness of their technique for assessing student engagement on different groups of students. They also plan to develop their approach further to ensure its reliability, fairness and interpretability.
As it was designed specifically for research purposes, the model ensures the anonymity of students captured in video recordings. Moreover, the system deletes raw video footage immediately after it is used to extract deep embeddings and only stores data related to an overall group of students (as opposed to individual students). Although the system could in theory track an individual student's engagement over time, this can easily be avoided.
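One way to picture group-level storage is the following Python sketch. It is an illustration under assumptions, not a description of the researchers' system: the record layout, the 60-second window and the function name are all hypothetical. Per-student scores are averaged within each time window and the student identifier is discarded, so only the class-wide value is kept.

```python
from collections import defaultdict
from statistics import fmean

def aggregate_by_window(predictions, window_seconds=60):
    """Average engagement scores per time window across all students.

    predictions: iterable of (timestamp_seconds, student_id, score).
    The student identifier is read but never stored in the result.
    """
    buckets = defaultdict(list)
    for timestamp, _student_id, score in predictions:
        buckets[int(timestamp // window_seconds)].append(score)
    return {window: fmean(scores) for window, scores in sorted(buckets.items())}

# Hypothetical per-student predictions from two one-minute windows.
PREDICTIONS = [
    (5, "student_a", 0.0),
    (10, "student_b", 1.0),
    (70, "student_a", 0.5),
    (80, "student_b", 0.5),
]

GROUP_LEVEL = aggregate_by_window(PREDICTIONS)
print(GROUP_LEVEL)
```

The stored result contains one number per window for the whole class, which cannot be traced back to any single student.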
"There are a number of intriguing questions that we plan to address in our next studies, and they cover both the more computational and the more content-related aspects of our research," Ulrich Trautwein, an educational psychologist involved in the study, told TechXplore. "Our goal is to better understand the antecedents of different levels of engagement in classrooms and how they can be positively influenced by high teaching quality. At this point, we also emphasize that engagement is complex and that the present technology mainly focuses on overt behavioral engagement based on visible behaviors. There is still a lot to be done to measure more covert engagement states like elaborative processing and mind wandering, but let me reiterate: We strongly oppose any use of such solutions for real-world classroom monitoring of students and teachers, both for ethical reasons and because of possible negative side effects of such arrangements on student motivation and learning and for teacher rights."
More information: Multimodal engagement analysis from facial videos in the classroom. arXiv:2101.04215 [cs.CV]. arxiv.org/abs/2101.04215
© 2021 Science X Network