October 23, 2020 feature
PESAO: An experimental setup to evaluate the perceptions of freely moving humans
Humans regularly tackle and solve a variety of complex visuospatial problems. In contrast, most machine learning and computer vision techniques developed so far are designed to solve individual tasks, rather than applying a set of capabilities to any task they are presented with.
Researchers at York University in Canada have been trying to better understand the mechanisms that allow humans to actively observe their environment and solve the wide range of perception tasks that they encounter every day, with the hope of informing the development of more sophisticated computer vision systems. In a paper pre-published on arXiv, they presented a new experimental setup called PESAO (psychophysical experimental setup for active observers), which is specifically designed to investigate how humans actively observe the world around them and engage with it.
"The hallmark of human vision is its generality," Prof. John K. Tsotsos, one of the researchers who carried out the study, told TechXplore. "The same brain and visual system allow one to play tennis, drive a car, perform surgery, view photo albums, read a book, gaze into your loved one's eyes, go online shopping, solve 1000-piece jigsaw puzzles, find lost keys, chase after his/her young daughter when she appears in danger and so much more. The reality is that as incredible as AI successes have been so far, it is humbling to acknowledge how far there still is to go."
The key objective of the research carried out by Tsotsos' lab is to gain a better understanding of the mechanisms and processes that allow humans to solve a variety of problems. This could inform the development of machine learning systems that can achieve human-like performance on a multitude of tasks, rather than specializing in a single application.
"One cannot take the visual system of Google's self-driving car and ask it to solve a jigsaw puzzle, nor can one ask any of the top-performing image categorization systems to serve as the vision component of a tennis-playing robot," Tsotsos said. "The successes have all been unitaskers (they have a single function), while the human visual system is a multitasker and the tasks one can teach that system seem unbounded. Our research lab is interested in understanding and developing algorithms that can achieve these complex functions."
A few years ago, Tsotsos and his doctoral student Markus D. Solbach started looking for past research that offered valuable insight into how humans solve perception tasks, and were disappointed to find close to nothing. So far, in fact, psychologists and neuroscientists have never carried out experiments in which humans solved complex tasks within a staged 3-D environment. The goal of their recent study was to fill this gap in the literature by developing an experimental setup that could support these experiments.
"To the best of our knowledge, PESAO is the first of its kind, as it combines precise head motion tracking in full 3-D with gaze tracking at microsecond resolution, while a subject is entirely untethered and can move freely and naturally," Solbach, the lead researcher in the study, told TechXplore. "These characteristics are crucial to explore active human visual perception, a powerful ability that humans are remarkably capable of every waking moment of the day, but which is surprisingly still poorly understood."
When they designed PESAO, Solbach and Tsotsos tried to ensure its generalizability and adjustability, as they wanted to make sure that other research teams were also able to use it, adapting it to their needs. PESAO was made publicly available and can now be accessed by other teams at http://data.nvision2.eecs.yorku.ca/PESAO/.
The first experiment they conducted using the PESAO setup focused on a specific visuospatial problem that involved determining whether two objects are the same or different. This is something that most humans do automatically on an everyday basis and that robots should also ideally be able to do. The results they collected suggest that this problem is difficult, if not impossible, to solve using a single algorithm.
"Human subjects performing this task exhibit a range of strategies chosen depending on how the task is presented, such as starting positions of objects and of the observer," Tsotsos said. "Our earlier theoretical results on such visuospatial problems are consistent with such a problem decomposition because the general problem can be proved to be intractable. Furthermore, the variability of how humans move about during the solution of the task shows that any pure machine learning strategy is unlikely to be possible without an impractically enormous amount of computational power."
While many past studies have explored how humans tackle different perception tasks, they typically did so by presenting participants with 2-D content on a screen, rather than asking them to engage with real 3-D environments. In the future, PESAO could thus enable new types of studies where the perception capabilities of humans are evaluated in a more realistic setting.
So far, the experimental setup developed by Solbach and Tsotsos spans an area of 400 cm x 300 cm and can track human subjects at a frequency of 120 Hz. In addition, PESAO records a human subject's head motion, gaze, eye movements, first-person and bird's-eye video footage, angular rate and experimenter notes, all of which are synchronized at microsecond resolution.
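To give a sense of what synchronizing several recordings at microsecond resolution can involve, here is a minimal Python sketch that pairs each head-pose sample with the gaze sample closest to it in time. It is an illustration only, not the PESAO software: the function names, the stream names and the nearest-timestamp matching rule are all assumptions made for this example. The only figure taken from the setup is the 120 Hz tracking frequency, which works out to roughly 8,333 microseconds between samples.

```python
from bisect import bisect_left

TRACKING_RATE_HZ = 120  # tracking frequency reported for the setup
# Time between consecutive samples, in microseconds (about 8333.3).
SAMPLE_PERIOD_US = 1_000_000 / TRACKING_RATE_HZ


def nearest_sample(timestamps_us, query_us):
    """Return the index of the timestamp closest to query_us.

    Hypothetical helper: timestamps_us must be a non-empty list of
    microsecond timestamps sorted in ascending order.
    """
    pos = bisect_left(timestamps_us, query_us)
    if pos == 0:
        return 0
    if pos == len(timestamps_us):
        return len(timestamps_us) - 1
    before, after = timestamps_us[pos - 1], timestamps_us[pos]
    # On a tie, prefer the earlier sample.
    return pos - 1 if query_us - before <= after - query_us else pos


def align_streams(head_ts_us, gaze_ts_us):
    """Pair every head-pose sample with its nearest gaze sample.

    Returns a list of (head_index, gaze_index) tuples. Both inputs are
    assumed to share one clock, which is what synchronization provides.
    """
    return [(i, nearest_sample(gaze_ts_us, t)) for i, t in enumerate(head_ts_us)]
```

Matching by nearest timestamp is one simple choice among several. Interpolating between neighboring samples would be another, and which is appropriate depends on how the recorded data are actually stored.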
So far, the researchers have used PESAO to investigate a single perception task, but they now plan to conduct further studies investigating human active object recognition capabilities. Active object recognition is a crucial visual ability that could enhance the performance of robotic systems designed to assist humans in their homes, as well as systems with potential applications in manufacturing, customer service and healthcare settings.
"For PESAO, we plan to extend the setup with light sensors to measure the actual light intensity in the scene and the stimulus," Solbach said. "It goes without saying, our visual system cannot function without light; hence measuring light can be useful for a wide range of vision research."
In their upcoming experiments, Solbach and Tsotsos will ask subjects to complete a "spatial relations task" and a "hidden patterns task." The first of these is a task in which an observer tries to determine the spatial relations between objects in a 3-D environment (i.e., which one is closer to them, which one is further away, higher, lower, to the left or right, etc.). The "hidden patterns" task, on the other hand, asks a person to determine if a 3-D pattern is embedded within a larger 3-D pattern, for instance in samples where one pattern is camouflaged within another.
"These are two among the large number of visuospatial tasks commonly presented in a 2-D form in children's games and IQ tests, but which have never been studied in a 3-D active observation context," Solbach said.
© 2020 Science X Network