A robot that can detect subtle noises in its surroundings and use them to localize nearby humans
To safely share spaces with humans, robots should ideally be able to detect people's presence and determine where they are, so that they can avoid accidents and collisions. So far, most robots have been trained to localize humans using computer vision techniques, which rely on cameras or other visual sensors.
A research team at the Georgia Institute of Technology (Georgia Tech) has developed an alternative method for localizing a person that relies on the subtle sounds people naturally produce as they move around a given environment. This method, introduced in a paper pre-published on arXiv, can be applied to a broad range of robotic systems.
"Our group has recently been interested in exploring a high-level theme of research regarding what types of 'hidden' information are freely available that we can train models on," Mengyu Yang, one of the authors of the paper, told Tech Xplore. "Often in robotics, acoustic human detection requires the person to produce extraneous sounds such as talking or clapping. Based on these interests, we wanted to see if the subtle and incidental sounds that humans inadvertently produce as they move can be that 'free' signal."
The acoustic localization method proposed by Yang and his colleagues relies on machine learning algorithms. The team thus had to first compile a dataset that would allow them to effectively train their algorithms.
The dataset they created, dubbed the Robot Kidnapper dataset, contains 14 hours of high-quality four-channel audio recordings paired with 360° RGB camera footage. These recordings were collected during experimental trials where people were asked to move around a robot in different ways.
"To collect the dataset, we recorded participants moving around a Stretch RE-1 robot at various levels of 'sneakiness' (e.g., walking quietly, walking normally, etc.)," Yang explained. "With this data, we're able to train machine learning models that take audio in the form of spectrograms and predict whether there is actually a person nearby and if so, their location relative to the robot."
The machine learning technique developed by Yang and his colleagues was trained to localize humans solely based on sound. As it only requires audio recorded by microphones, it could theoretically be implemented on any robot with an integrated microphone.
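To give a rough sense of what "audio in the form of spectrograms" means, the sketch below turns four synthetic audio channels into magnitude spectrograms using a naive short-time Fourier transform. It is an illustration only, not the team's pipeline: the function names, the 8 kHz sample rate, the frame and hop sizes, and the test tone are all assumptions made for this example, and a real system would use an optimized FFT library on recorded audio.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram of one channel, indexed as [time][frequency]."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def multichannel_spectrogram(channels, frame_len=64, hop=32):
    """One spectrogram per microphone channel: [channel][time][frequency]."""
    return [spectrogram(ch, frame_len, hop) for ch in channels]


# Hypothetical input: four channels carrying a 1 kHz tone with a small
# per-channel phase offset, standing in for a four-microphone recording.
SAMPLE_RATE = 8000  # Hz, assumed for this example
TONE_HZ = 1000
N_SAMPLES = 256
channels = [
    [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE + 0.1 * c)
     for t in range(N_SAMPLES)]
    for c in range(4)
]

spec = multichannel_spectrogram(channels)
first_frame = spec[0][0]
peak_bin = max(range(len(first_frame)), key=lambda k: first_frame[k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index -> frequency in Hz
print(len(spec), len(spec[0]), len(spec[0][0]), peak_hz)
```

Each channel becomes a time-by-frequency grid of magnitudes. A learned model, such as the one the researchers describe, would take stacked grids like these as input and output whether a person is present and where they are relative to the robot.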
The researchers trained their model to ignore external and irrelevant noises, such as those originating from heating, ventilation, and air conditioning systems, as well as sounds produced by the robot itself. In initial tests, they evaluated their technique on the Stretch RE-1 robot, a low-cost and compact robotic manipulator developed by Hello Robot.
"We believe our audio-based method for human detection is important for the development of multi-modal person detection systems that are robust to failures," Yang said. "Robots commonly use cameras or lidar to navigate around people, but should those sensors fail or become unavailable (low-lit environments, occlusions, etc.), our method allows robots to fall back solely onto audio, which is usually already available in most hardware setups. Moreover, when interacting with robots, people should not be expected to intentionally create extra sounds, which is what previous works rely on."
In initial tests with the Stretch RE-1 robot, the team's technique was found to perform twice as well as other acoustic localization methods, allowing for effective localization of nearby humans solely based on the sounds incidentally produced while walking. These results highlight the feasibility of acoustic localization, which is highly scalable and less intrusive than camera-based localization.
"We believe this is an improvement over previous works on acoustic human detection because our method does not require the person to produce extraneous sounds to be heard by the robot," Yang said.
"This can potentially be useful for robots that navigate in shared indoor spaces with people (household robots, industrial robots, etc.), allowing for a non-intrusive method for detecting where people are. While methods with cameras can potentially capture identifying features such as faces or tattoos and acoustic methods that require people to talk for example can capture their voice, the data we use for human detection is also much more difficult to identify the person with."
In the future, the technique for human localization devised by Yang and his colleagues could help to improve the safety and performance of robots designed to closely collaborate with humans, while also preserving their users' privacy. This work could also inspire other research groups to develop further localization methods that rely on subtle sounds, for robotic or even security-related applications.
"We collected data of people standing still in addition to moving around," Yang added. "While our current paper only focuses on detecting and localizing moving people, we hope to in a future work be able to detect people standing still as well using audio only, perhaps through the faint sounds of their breathing or even from the slight changes to the ambient sound of the room due to their presence."
More information: Mengyu Yang et al, The Un-Kidnappable Robot: Acoustic Localization of Sneaking People, arXiv (2023). DOI: 10.48550/arxiv.2310.03743
© 2023 Science X Network