March 5, 2019 feature
Generating cross-modal sensory data for robotic visual-tactile perception
Perceiving an object only visually (e.g. on a screen) or only by touch can sometimes limit what we are able to infer about it. Human beings, however, have the innate ability to integrate visual and tactile stimuli, leveraging whatever sensory data is available to complete their daily tasks.
Researchers at the University of Liverpool have recently proposed a new framework to generate cross-modal sensory data, which could help to replicate both visual and tactile information in situations in which one of the two is not directly accessible. Their framework could, for instance, allow people to perceive objects on a screen (e.g. clothing items on e-commerce sites) both visually and tactually.
"In our daily experience, we can cognitively create a visualization of an object based on a tactile response, or a tactile response from viewing a surface's texture," Dr. Shan Luo, one of the researchers who carried out the study, told TechXplore. "This perceptual phenomenon, called synesthesia, in which the stimulation of one sense causes an involuntary reaction in one or more of the other senses, can be employed to make up an inaccessible sense. For instance, when one grasps an object, our vision will be obstructed by the hand, but a touch response will be generated to 'see' the corresponding features."
The perceptual phenomenon described by Dr. Luo typically occurs when a perception source is unavailable (e.g., when touching objects inside a bag without being able to see them). In such situations, humans might "touch to see" or "see to feel," interpreting features related to a particular sense based on information gathered using their other senses. If replicated in machines, this visual-tactile mechanism could have several interesting applications, particularly in the fields of robotics and e-commerce.
If robots were able to integrate visual and tactile perception, they could plan their grasping and manipulation strategies more effectively based on the visual characteristics of the objects that they are working with (e.g. shape and size). In other words, robots would perceive the overall tactile properties of objects before grasping them, using visual information gathered by cameras. While grasping an object outside of the camera's field of view, on the other hand, they would use tactile-like responses to make up for the lack of visual information available.
If paired with a tactile device, which has not yet been developed, the framework proposed by Dr. Luo and his colleagues could also be used in e-commerce, for instance, allowing customers to feel the fabric of clothes before purchasing them. With this application in mind, the researchers used conditional generative adversarial networks to generate pseudo visual images using tactile data and vice versa (i.e. to generate tactile outputs using visual data).
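To give a rough sense of how a conditional generative adversarial network is trained, the sketch below computes the two competing losses for a single tactile-to-visual example. It is an illustration only and not the researchers' implementation: the toy linear `generator` and `discriminator`, the weight values, the four-element inputs and the non-saturating form of the generator loss are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def generator(tactile, weights):
    # Hypothetical toy generator: an element-wise linear map from a
    # tactile reading to a pseudo visual "image" (a flat list of values).
    return [w * t for w, t in zip(weights, tactile)]


def discriminator(tactile, visual, weights):
    # The discriminator is conditional: it scores the (tactile, visual)
    # pair, not the visual sample alone.
    pair = tactile + visual
    score = sum(w * x for w, x in zip(weights, pair))
    return sigmoid(score)


def cgan_losses(tactile, real_visual, g_weights, d_weights, eps=1e-12):
    """Return (discriminator loss, generator loss) for one example."""
    fake_visual = generator(tactile, g_weights)
    d_real = discriminator(tactile, real_visual, d_weights)
    d_fake = discriminator(tactile, fake_visual, d_weights)
    # The discriminator tries to rate real pairs high and generated pairs low.
    d_loss = -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
    # The generator tries to make its output rate high (non-saturating form,
    # assumed here for illustration).
    g_loss = -math.log(d_fake + eps)
    return d_loss, g_loss


# Made-up data standing in for one tactile reading and its matching image.
random.seed(0)
tactile = [random.random() for _ in range(4)]
real_visual = [random.random() for _ in range(4)]
g_weights = [0.5] * 4
d_weights = [0.1] * 8

d_loss, g_loss = cgan_losses(tactile, real_visual, g_weights, d_weights)
print(f"discriminator loss: {d_loss:.3f}, generator loss: {g_loss:.3f}")
```

In a real system both networks would be deep convolutional models updated in alternation, and swapping the roles of the tactile and visual inputs would give the opposite direction (visual to tactile).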
"In online marketplaces, customers shop by viewing pictures of clothes or other items," Dr. Luo said. "However, they are unable to touch these items to feel their materials. Feeling an item is quite important while shopping, particularly when buying delicate items, such as underwear. Allowing users to feel items at home, using a tactile device that is yet to be developed, the cross-modal sensory data generation scheme proposed in our paper can help e-commerce customers to make more informed choices."
Dr. Luo and his colleagues evaluated their model on the ViTac dataset, which contains macro images and tactile readings (captured using a GelSight sensor) of 100 different types of fabric. They found that it could effectively predict sensory outputs for one sense (i.e. vision or touch) using data relevant to the other.
The study carried out by Dr. Luo and his colleagues attained remarkable results in generating realistic tactile and visual patterns for different fabrics in the absence of tactile or visual information, respectively. Using their framework, the researchers successfully 'replicated' tactile elements of fabrics using visual data, and vice versa.
"To our best knowledge, this work is the first attempt to achieve robotic cross-modal visual-tactile data generation, which can also be extended to cross-modal data generation for other modalities," Dr. Luo said. "The practical implications of our study are that we can make use of other senses to make up an inaccessible sense."
In the future, the framework proposed by Dr. Luo and his colleagues could be used to improve grasp and manipulation strategies in robots, as well as to enhance online shopping experiences. Their method could also be used to expand datasets for classification tasks, by generating sensory data that would otherwise be inaccessible.
The paper outlining this recent study, pre-published on arXiv, will be presented at the 2019 International Conference on Robotics and Automation (ICRA), which will take place in Montreal, Canada, between the 20th and 24th of May. At the conference, Dr. Luo will also be running a workshop related to the topic of his study, called "ViTac: Integrating vision and touch for multimodal and cross-modal perception."
© 2019 Science X Network