November 28, 2023 feature

An approach that allows robots to learn in changing environments from human feedback and exploration

by Ingrid Fadelli, Tech Xplore

To best assist humans in real-world settings, robots should be able to continuously acquire useful new skills in dynamic and rapidly changing environments. Currently, however, most robots can only tackle tasks that they have been previously trained on and can only acquire new capabilities after further training.

Researchers at the University of Washington and the Massachusetts Institute of Technology (MIT) recently introduced a new approach that allows robots to learn new skills while navigating changing environments. This approach, presented at the 7th Conference on Robot Learning (CoRL), uses reinforcement learning to train robots with human feedback and with information gathered while they explore their surroundings.

"The idea for this paper came from another work we published recently," Max Balsells, co-author of the paper, told Tech Xplore. The current paper is available on the arXiv preprint server.

"In our previous study, we explored how to use crowdsourced (potentially inaccurate) human feedback gathered from hundreds of people over the world, to teach a robot how to perform certain tasks without relying on extra information, as is the case in most of the previous work in this field."

Although Balsells and his colleagues attained promising results in their previous study, the method they proposed required the scene to be reset constantly while teaching robots new skills. In other words, each time the robot tried to complete a task, its surroundings and settings would go back to how they were before the trial.

"Having to reset the scene is an obstacle if we want robots to learn any task with as little human effort as possible," Balsells said. "As part of our recent study, we thus set out to fix that issue, allowing robots to learn in a changing environment, still just from human feedback, as well as random and guided exploration."

The new method developed by Balsells and his colleagues has three key components, dubbed the policy, the goal selector and the density model, each supported by a different machine-learning technique. The first model, the policy, essentially tries to determine what the robot needs to do to get to a specific location.

"The goal of the policy model is to understand which actions the robot has to take to arrive at a certain scenario from where it currently is," Marcel Torne, co-author of the paper, explained. "The way this first model learns that is by seeing how the environment changed after the robot took an action. For example, by looking at where the robot or the objects of the room are after taking some actions."
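One common way to learn from how the environment changed after an action is hindsight relabeling: every state the robot later reached is treated as a goal that the earlier action helped to reach. The article does not name the technique the team used, so the sketch below is an assumption, and the grid states, action labels and the `relabel_trajectory` helper are hypothetical.

```python
from typing import List, Tuple

State = Tuple[int, int]   # hypothetical 2-D grid position
Action = str              # hypothetical discrete action label


def relabel_trajectory(
    states: List[State], actions: List[Action]
) -> List[Tuple[State, State, Action]]:
    """Turn one trajectory into (state, reached_goal, action) tuples.

    Assumption: every later state in the trajectory counts as a goal
    that the action taken at an earlier step helped to reach.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    examples = []
    for t, action in enumerate(actions):
        for future in states[t + 1:]:
            examples.append((states[t], future, action))
    return examples


# A toy trajectory: the robot moves right, then up.
trajectory_states = [(0, 0), (1, 0), (1, 1)]
trajectory_actions = ["right", "up"]
examples = relabel_trajectory(trajectory_states, trajectory_actions)
for state, goal, action in examples:
    print(f"from {state} toward {goal}: {action}")
```

Each tuple could then serve as a supervised training example for a policy that maps a (state, goal) pair to an action, without any human labeling.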

Essentially, the first model is designed to identify the actions that the robot will need to take to reach a specific target location or objective. In contrast, the second model (i.e., the goal selector) guides the robot while it is still learning, communicating the moment when it is closer to achieving a set goal.

"The objective of the goal selector is to tell in which cases the robot was closer to achieving the task," Balsells said. "That way, we can use this model to guide the robot by commanding the scenarios that it has already seen, in which it was closer to achieving the task. From there, the robot can just do random actions to explore more that part of the environment. If we didn't have this model, the robot wouldn't do meaningful things, making it very hard for the first model to learn anything. This model learns that from human feedback."

The team's approach ensures that as a robot moves in its surroundings, it continuously relays scenarios it encounters to a specific website. Crowdsourced human users then browse through these scenarios and the robot's corresponding actions, letting the model know when the robot is closer to achieving a set goal.
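Feedback of the form "the robot was closer to the goal here than there" can be turned into a scoring function with a pairwise-comparison (Bradley-Terry style) loss. The sketch below is an illustration under that assumption, not the team's implementation: the single `distance` feature, the linear scorer and the `train_goal_selector` helper are all hypothetical.

```python
import math
from typing import List, Tuple

Features = List[float]  # hypothetical hand-made description of a scenario


def score(weights: List[float], x: Features) -> float:
    """Linear progress score; a higher value means closer to the goal."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_goal_selector(
    comparisons: List[Tuple[Features, Features]],
    n_features: int,
    lr: float = 0.5,
    epochs: int = 200,
) -> List[float]:
    """Fit weights so that the scenario labeled 'closer' outscores the other.

    Each comparison is a (closer, farther) pair as labeled by a human.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for closer, farther in comparisons:
            margin = score(weights, closer) - score(weights, farther)
            # Derivative of -log(sigmoid(margin)) with respect to margin.
            grad = -1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] -= lr * grad * (closer[i] - farther[i])
    return weights


# Toy labels: the only feature is a made-up distance to the target.
human_comparisons = [([0.2], [0.9]), ([0.5], [0.8]), ([0.1], [0.4])]
weights = train_goal_selector(human_comparisons, n_features=1)
print(score(weights, [0.1]) > score(weights, [0.7]))
```

Because the loss only compares pairs, occasional wrong labels shift the scores slightly rather than breaking training, which suits crowdsourced feedback that may be inaccurate.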

"Finally, the goal of the third model (i.e., the density model) is to know whether the robot already knows how to get to a certain scenario from where it currently is," Balsells said. "This model is important to make sure that the second model is guiding the robot to scenarios that the robot can get to. This model is trained on data representing the progression from different scenarios to the scenarios in which the robot ended up."

The third model within the researchers' framework basically ensures that the second model only guides the robot to accessible locations that it knows how to reach. This promotes learning through exploration, while reducing the risk of incidents and errors.
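The interplay between the goal selector and the density model can be sketched as a filter followed by a ranking. In this simplified illustration, visit counts stand in for the learned density model and a lookup table stands in for the goal selector; both substitutions, along with the `pick_goal` helper and the `min_visits` threshold, are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Optional


def pick_goal(
    candidates: Iterable[str],
    progress_score: Callable[[str], float],
    visit_counts: Dict[str, int],
    min_visits: int = 3,
) -> Optional[str]:
    """Return the most promising scenario among those deemed reachable.

    Assumption: a scenario counts as reachable when the robot has ended
    up in it at least `min_visits` times.
    """
    reachable = [s for s in candidates if visit_counts.get(s, 0) >= min_visits]
    if not reachable:
        return None  # nothing reliable yet; fall back to random exploration
    return max(reachable, key=progress_score)


# Toy data: "A" looks best to annotators but has rarely been reached.
scores = {"A": 0.9, "B": 0.6, "C": 0.2}
counts = {"A": 1, "B": 5, "C": 8}
goal = pick_goal(scores.keys(), scores.get, counts)
print(goal)  # B
```

Here scenario "A" is skipped despite its high score because the robot has not reached it often enough, so the robot is sent to "B" and explores from there.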

"The goal selector guides the robot to make sure that it goes to interesting places," Torne said. "Notably, the policy and density models learn just by looking at what happens around, that is, how the location of the robot and the objects change as the robot interacts. On the other hand, the second model is trained using human feedback."

Notably, the new approach proposed by Balsells and his colleagues relies on human feedback only to guide the robot in its learning, rather than to demonstrate how tasks should be performed. It thus does not require extensive datasets containing footage of demonstrations and can promote flexible learning with less human effort.

"By using the third model to know which scenarios the robot can actually get to, we don't have to reset anything, the robot can learn continuously even if some objects are no longer at the same location," Torne said. "The most important aspect of our work is that it allows anyone to teach a robot how to solve a certain task just by letting it run on its own while connecting it to the internet, so that people around the world tell it from time to time in which moments it was closer to achieving the task."

The approach introduced by this team of researchers could inform the development of more reinforcement learning-based frameworks that enable robots to improve their skills and learn in dynamic real-world environments. Balsells, Torne and their colleagues now plan to expand their method by providing the robot with some "primitives," or basic guidelines on how to perform specific skills.

"For example, right now the robot learns which motors it has to move at every time, but we could program how the robot could move to a certain point of a room, and then the robot wouldn't need to learn that; it would just need to know where to move to," Balsells and Torne added.

"Another idea that we want to explore in our next studies is the use of big pre-trained models already trained for a bunch of robotics tasks (e.g., ChatGPT for robotics), adapting them to specific tasks in the real world using our method. This could allow anyone to easily and quickly teach robots to achieve new skills, without having to retrain them from scratch."

More information: Max Balsells et al, Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback, arXiv (2023). DOI: 10.48550/arxiv.2310.20608

Journal information: arXiv

Citation: An approach that allows robots to learn in changing environments from human feedback and exploration (2023, November 28) retrieved 29 June 2024 from https://techxplore.com/news/2023-11-approach-robots-environments-human-feedback.html

