November 13, 2019 feature

Using imitation and reinforcement learning to tackle long-horizon robotic tasks

by Ingrid Fadelli, Tech Xplore

Reinforcement learning (RL) is a widely used machine-learning technique that entails training AI agents or robots using a system of reward and punishment. So far, researchers in the field of robotics have primarily applied RL techniques in tasks that are completed over relatively short periods of time, such as moving forward or grasping objects.

A team of researchers at Google and Berkeley AI Research has recently developed a new approach, called relay policy learning, that combines RL with imitation learning. This approach, introduced in a paper prepublished on arXiv and presented at the Conference on Robot Learning (CoRL) 2019 in Osaka, can be used to train artificial agents to tackle multi-stage, long-horizon tasks, such as object manipulation tasks that span longer periods of time.

"Our research originated from many, mostly unsuccessful, experiments with very long tasks using reinforcement learning (RL)," Abhishek Gupta, one of the researchers who carried out the study, told TechXplore. "Today, RL in robotics is mostly applied in tasks that can be accomplished in a short span of time, such as grasping, pushing objects, walking forward, etc. While these applications have a lot value, our goal was to apply reinforcement learning to tasks that require multiple sub-objectives and operate on much longer timescales, such as setting a table or cleaning a kitchen."

Before they started developing their approach, Gupta and his colleagues reviewed previous literature to try and determine why longer tasks are particularly hard to tackle using current RL techniques. In their paper, they suggest that there are generally two main reasons for this.

First, it is hard for a robot to identify optimal solutions for solving long and complex tasks on its own. Second, it is difficult for the agent to successfully tackle a long task for which feedback is provided only at the end of a long sequence. Relay policy learning, the new approach to learning that they presented, is designed to address both of these challenges head-on.

"To address the challenge of having robots solve long-horizon tasks on their own, we decided to simplify the problem and use human-provided demonstrations," Gupta said. "Solving long tasks is difficult because it's extremely hard to have a robot discover an interesting behavior on its own—human-provided demonstrations can be used as a guideline for interesting things to do in an environment."

The approach for robot learning proposed by Gupta and his colleagues has two distinct stages, one in which an agent learns by imitating humans and the other based on RL. In the imitation learning stage, a robot is fed human demonstrations of how to complete a task and produces goal-conditioned hierarchical policies.

In their study, the researchers used their approach to train an artificial agent called Franka on multi-stage and long-horizon manipulation tasks in a simulated kitchen environment, which was modeled using the physics simulator platform MuJoCo. This environment consisted of a kitchen with an openable microwave, four oven burners, an oven light switch, a kettle, two hinged cabinets and a sliding cabinet door.

"Importantly, learning from demonstrations alone is not enough to solve the challenging tasks in our simulated kitchen environment," Karol Hausman, another researcher involved in the study, told TechXplore. "In order to improve upon this initial solution, we allow the robots to practice the tasks on their own to further refine their behaviors."

Essentially, using the relay policy learning method proposed by the researchers, an agent initially learns by processing human demonstrations of how to complete a given task and then continues learning on its own via RL. To make the process of learning long-horizon policies easier, the team used a new data-relabeling algorithm that allows an agent to learn goal-conditioned hierarchical policies.
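As a rough illustration of what relabeling demonstration data for goal-conditioned learning can look like, the Python sketch below treats states reached later in a demonstration as goals for earlier steps. It is a simplified, hindsight-style example and not the researchers' algorithm: the function name relabel_demo, the window parameter and the one-dimensional toy demonstration are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Example = Tuple[State, State, Action]  # (state, goal, action)


def relabel_demo(
    states: Sequence[State], actions: Sequence[Action], window: int
) -> List[Example]:
    """Hypothetical helper: pair each demonstrated step with goals drawn
    from states reached at most `window` steps later in the same demo."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    examples: List[Example] = []
    for t, action in enumerate(actions):
        last = min(t + window, len(states) - 1)
        for g in range(t + 1, last + 1):
            # The demonstrator reached states[g] after acting at step t,
            # so states[g] is treated as a goal for (states[t], action).
            examples.append((states[t], states[g], action))
    return examples


# Toy demonstration (assumed data): a point moving along a line.
demo_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
demo_actions = [(1.0,), (1.0,), (1.0,)]
pairs = relabel_demo(demo_states, demo_actions, window=2)
print(len(pairs), "goal-conditioned examples from one demonstration")
```

The point of the sketch is that a single demonstration yields many (state, goal, action) examples without extra labeling, which is one reason relabeling is attractive for goal-conditioned policies.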

"In order to tackle the challenge of sparse feedback, we use a hierarchical structure for our control policies: The high-level policy proposes goals that the low-level policy tries to accomplish—for example, close a cabinet, turn the burner off, etc.," Hausman explained. "This way, the task can be easily decomposed into smaller subproblems that can be solved with reinforcement learning bootstrapped from human-provided demonstrations."

Gupta, Hausman and their colleagues evaluated the effectiveness of relay policy learning for training robots in long-horizon tasks within the simulated kitchen environment they created, achieving very promising results. They found that with the right policy structure and demonstration data, their approach allowed robots to tackle much longer-horizon tasks than they initially thought possible.
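The two-level control structure Hausman describes, in which a high-level policy proposes goals and a low-level policy tries to reach them, can be illustrated with a toy control loop. The Python sketch below is only an assumption-laden illustration: the policies are hand-written rules rather than learned networks, the environment is a point on a number line rather than the simulated kitchen, and every name (run_hierarchical, high_policy, low_policy, low_horizon) is hypothetical.

```python
from typing import Callable, List


def run_hierarchical(
    state: int,
    final_goal: int,
    high_policy: Callable[[int, int], int],
    low_policy: Callable[[int, int], int],
    step: Callable[[int, int], int],
    low_horizon: int,
    max_steps: int,
) -> List[int]:
    """Roll out a two-level policy: the high level picks a new subgoal
    every `low_horizon` steps, the low level acts toward that subgoal."""
    trajectory = [state]
    subgoal = state
    for t in range(max_steps):
        if t % low_horizon == 0:
            subgoal = high_policy(state, final_goal)
        action = low_policy(state, subgoal)
        state = step(state, action)
        trajectory.append(state)
        if state == final_goal:
            break
    return trajectory


# Toy stand-ins (assumptions): integer positions on a line.
def high_policy(state: int, final_goal: int) -> int:
    return min(state + 3, final_goal)  # propose a nearby subgoal


def low_policy(state: int, subgoal: int) -> int:
    return (subgoal > state) - (subgoal < state)  # move one unit toward it


def step(state: int, action: int) -> int:
    return state + action


path = run_hierarchical(0, 7, high_policy, low_policy, step,
                        low_horizon=3, max_steps=20)
print(path)
```

In this toy setting the long task (reaching position 7) is split into short segments that each require only a few low-level steps, mirroring the decomposition into smaller subproblems described in the quote.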

"We hope that our findings can open up new avenues of combining imitation and reinforcement learning research and gives us a potential direction that can allow robots to perform long, complex tasks," Hausman said.

In the future, the relay policy learning approach introduced by Gupta, Hausman and their colleagues could be used to train robots on a broader range of long-horizon tasks. The researchers have so far only tested their technique in a simulated environment; thus, it would be interesting to evaluate it in real-world settings and see whether it achieves equally promising results.

"As a next step, we would like to look into the problem of generalization beyond the demonstration data," Hausman said. "Eventually, we would also like to further improve the data-efficiency of our method, move to pixel observations and enable real-world learning on a physical robot."

More information: Relay policy learning: solving long-horizon tasks via imitation and reinforcement learning. arXiv:1910.11956 [cs.LG]. arxiv.org/abs/1910.11956

relay-policy-learning.github.io/

Citation: Using imitation and reinforcement learning to tackle long-horizon robotic tasks (2019, November 13) retrieved 30 June 2024 from https://techxplore.com/news/2019-11-imitation-tackle-long-horizon-robotic-tasks.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
