December 29, 2020

Army research leads to more effective training model for robots

Multi-domain operations, or MDO, the Army's future operating concept, requires autonomous agents with learning components to operate alongside the warfighter. New Army research reduces the unpredictability of policies trained with current reinforcement learning methods, making them more practically applicable to physical systems, especially ground robots.

These learning components will permit autonomous agents to reason and adapt to changing battlefield conditions, said Army researcher Dr. Alec Koppel from the U.S. Army Combat Capabilities Development Command, now known as DEVCOM, Army Research Laboratory.

The underlying adaptation and re-planning mechanism consists of reinforcement learning-based policies. Making these policies efficiently obtainable is critical to making the MDO operating concept a reality, he said.

According to Koppel, policy gradient methods in reinforcement learning are the foundation for scalable algorithms for continuous spaces, but existing techniques cannot incorporate broader decision-making goals such as risk sensitivity, safety constraints, exploration and divergence to a prior.
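For readers unfamiliar with policy gradient methods, the following is a minimal sketch of the classical idea (a REINFORCE-style update with a running-average baseline) on a toy three-armed bandit. It illustrates only the standard cumulative-return setting, not the research team's method; the function names, reward values, step size and noise level are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=5000, lr=0.1, seed=0):
    """Hypothetical toy example: softmax policy trained by a policy gradient update."""
    rng = random.Random(seed)
    logits = [0.0] * len(mean_rewards)  # policy parameters, one per action
    baseline = 0.0                      # running average of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Noisy reward (assumed Gaussian noise, standard deviation 0.1).
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # For a softmax policy, d/d(logit_a) of log pi(action) is 1[a == action] - pi(a).
        for a in range(len(logits)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad_log
    return softmax(logits)


final_probs = reinforce_bandit([0.2, 0.5, 1.0])
print(final_probs)
```

Each update needs a freshly sampled action and reward, which is why methods of this kind can require large amounts of data.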

Designing autonomous behaviors when the relationship between dynamics and goals is complex may be addressed with reinforcement learning, which has gained attention recently for solving previously intractable tasks, including strategy games such as Go and chess and video games such as Atari and StarCraft II, Koppel said.

Prevailing practice, unfortunately, demands astronomical sample complexity, such as thousands of years of simulated gameplay, he said. This sample complexity renders many common training mechanisms inapplicable to the data-starved settings of the MDO context for the Next-Generation Combat Vehicle, or NGCV.

"To facilitate reinforcement learning for MDO and NGCV, training mechanisms must improve sample efficiency and reliability in continuous spaces," Koppel said. "Through the generalization of existing policy search schemes to general utilities, we take a step towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning."

Koppel and his research team developed new policy search schemes for general utilities, whose sample complexity is also established. They observed that the resulting policy search schemes reduce the volatility of reward accumulation, yield efficient exploration of unknown domains and provide a mechanism for incorporating prior experience.

"This research contributes an augmentation of the classical Policy Gradient Theorem in reinforcement learning," Koppel said. "It presents new policy search schemes for general utilities, whose sample complexity is also established. These innovations are impactful to the U.S. Army through their enabling of reinforcement learning objectives beyond the standard cumulative return, such as risk sensitivity, safety constraints, exploration and divergence to a prior."
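As a rough guide to what "general utilities" can mean, the sketch below writes out the standard cumulative-return objective, the classical Policy Gradient Theorem, and one common way to generalize the objective. The article does not give the team's formulation, so the general-utility form here (a differentiable function F of the discounted state-action occupancy measure λ over finite state and action spaces) is an assumption for illustration, and the symbols F, λ, γ, r and Q are notation introduced here.

```latex
% Standard objective: expected discounted cumulative return,
% which is linear in the occupancy measure \lambda^{\pi_\theta}
J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{\infty} \gamma^t \, r(s_t, a_t)\right]
          = \langle r, \lambda^{\pi_\theta} \rangle,
\qquad
\lambda^{\pi_\theta}(s,a) = \sum_{t=0}^{\infty} \gamma^t \, \Pr(s_t = s,\, a_t = a \mid \pi_\theta)

% Classical Policy Gradient Theorem
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{\infty} \gamma^t \,
    \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, Q^{\pi_\theta}(s_t, a_t)\right]

% General utility (assumed form): a possibly nonlinear function F of the occupancy measure,
% differentiated by the chain rule
\max_\theta \; F\!\left(\lambda^{\pi_\theta}\right),
\qquad
\nabla_\theta F\!\left(\lambda^{\pi_\theta}\right)
  = \sum_{s,a} \frac{\partial F}{\partial \lambda(s,a)} \, \nabla_\theta \lambda^{\pi_\theta}(s,a)
```

Under this assumed form, the cumulative return is the special case in which F is linear, while a nonlinear F could encode goals such as an entropy term for exploration or a divergence measured against a prior.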

Notably, in the context of ground robots, he said, data is costly to acquire.

"Reducing the volatility of reward accumulation, ensuring one explores an unknown domain in an efficient manner, or incorporating prior experience, all contribute towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning by alleviating the amount of random sampling one requires in order to complete policy optimization," Koppel said.

Koppel sees a bright future for this research and has dedicated his efforts toward making his findings applicable to innovative technology for Soldiers on the battlefield.

"I am optimistic that reinforcement-learning equipped autonomous robots will be able to assist the warfighter in exploration, reconnaissance and risk assessment on the future battlefield," Koppel said. "That this vision is made a reality is essential to what motivates which research problems I dedicate my efforts."

The next step for this research is to incorporate the broader decision-making goals enabled by general utilities in reinforcement learning into multi-agent settings and investigate how interactive settings between reinforcement learning agents give rise to synergistic and antagonistic reasoning among teams.

According to Koppel, the technology that results from this research will be capable of reasoning under uncertainty in team scenarios.

Provided by The Army Research Laboratory

Citation: Army research leads to more effective training model for robots (2020, December 29) retrieved 16 April 2024 from https://techxplore.com/news/2020-12-army-effective-robots.html

