January 7, 2021 feature
A framework to evaluate the cognitive capabilities of machine learning agents
Over the past decade or so, computer scientists have developed machine learning (ML) techniques that perform remarkably well on a variety of tasks. While these algorithms are designed to artificially replicate human cognitive skills, there is still a lack of tools for comparing their capabilities with those of humans.
With this in mind, two researchers at Savitribai Phule Pune University (SPPU) in India have recently created a framework to perform cognitive evaluations of machine-learning agents. This framework, outlined in a paper published in Elsevier's Cognitive Systems Research journal, draws parallels between human cognition, as described by psychological theories, and machine cognition.
"When I started working on my core research about few shot learning (FSL), my advisor and I contemplated on how humans can easily learn to classify objects visually and why it is so difficult for machines," Suvarna Kadam, one of the researchers who carried out the study, told TechXplore. "Humans can generalize, but machines find it quite challenging. A quick analysis of state-of-the-art FSL methods made us realize that it is not just hard to assess 'how much is learned' with performance metrics, but often, we also have no idea if a machine is truly comprehending the task at hand or merely mimicking."
Once they realized that there was a lack of reliable methods to evaluate the cognition of ML techniques, Kadam and her supervisor Vinay Vaidya started asking themselves fundamental questions about machine cognition and how it could be effectively assessed. Eventually, they decided to devise a structured approach that could help researchers understand how machines acquire new skills and assess how much they have actually learned. The framework they created offers a simple way of thinking about machine cognition, drawing parallels with human cognition.
"We decided to use humanity's collective wisdom about how humans learn and how they measure learning," Kadam explained. "Our framework uses human cognitive theories to provide stepwise guidelines to assess a machine's learning in any domain. It advocates that we list a domain's tasks and check whether they are simple or challenging to implement, which then allows us to arrange tasks in a taxonomy based on their cognitive difficulty."
The framework created by Kadam and Vaidya is designed to prompt reflection about what makes a task harder or easier to tackle than another. Human learning is generally evaluated based on how well a learner did on a specific task. The framework proposed by the researchers can be used to evaluate a machine's task-specific cognition, utilizing a concept referred to as task taxonomy.
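The article does not spell out how the taxonomy or any scoring is implemented, so the following Python snippet is only an illustrative sketch of the general idea: list a domain's tasks, assign each a cognitive difficulty level, group them into a taxonomy, and weight an agent's results by difficulty. The task names, the integer difficulty levels, the accuracy values, and the weighting scheme are all hypothetical assumptions made for illustration, not the method used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    difficulty: int   # hypothetical cognitive level, 1 = easiest
    accuracy: float   # hypothetical agent score on this task, in [0, 1]


def build_taxonomy(tasks):
    """Group tasks by difficulty level, easiest level first."""
    levels = {}
    for task in sorted(tasks, key=lambda t: t.difficulty):
        levels.setdefault(task.difficulty, []).append(task)
    return levels


def weighted_score(tasks):
    """Difficulty-weighted mean accuracy (an assumed, illustrative metric)."""
    total_weight = sum(t.difficulty for t in tasks)
    if total_weight == 0:
        raise ValueError("at least one task with positive difficulty is required")
    return sum(t.difficulty * t.accuracy for t in tasks) / total_weight


# Made-up tasks for a visual classification domain.
tasks = [
    Task("classify seen classes", 1, 0.9),
    Task("classify unseen classes from few examples", 3, 0.5),
    Task("classify seen classes under noise", 2, 0.7),
]

taxonomy = build_taxonomy(tasks)
score = weighted_score(tasks)

for level, level_tasks in taxonomy.items():
    print(level, [t.name for t in level_tasks])
print(round(score, 3))
```

In this sketch, an agent that only does well on the easiest tasks ends up with a lower weighted score than its plain average accuracy would suggest, which is one simple way of looking beyond a single performance metric.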
"Since humans are very good at generalizing and quickly adapting to a new task, we also demonstrated how to quantify the generalization potential of machines," Kadam said. "For the first time, our study highlighted the fact that machines are displaying higher intelligence and we must move beyond performance metrics to measure it."
In their recent paper, Kadam and Vaidya used their framework to compare two state-of-the-art ML techniques. The framework could thus also prove useful for other research teams trying to identify the 'best' ML model for a specific task among several options.
In the future, the same framework could also help researchers to better understand the processes behind a machine's predictions or actions. This could ultimately improve the reliability of AI systems by giving developers greater insight into their cognitive capabilities.
"With this framework, we explored how cognition and learning are intertwined, and learning is greatly influenced by cognition," Kadam said. "However, learning is also greatly affected by the skills a learner possesses and attitude she/he carries. It would be really interesting to see if we can extend our work to assess physical and emotional skills of machines. Though emotional skills of machines look distant and unrealized, machines are already being used in close human interactions (e.g., chatbots, robots for caregiving or companionship, etc.), so we feel they should also be tested on their emotional quotient."
© 2021 Science X Network