Side view of Valkyrie robot and the 2D humanoid character modelled according to Valkyrie robot. Credit: Yang, Komura & Li

Researchers at the University of Edinburgh have developed a hierarchical framework based on deep reinforcement learning (RL) that can acquire a variety of strategies for humanoid balance control. Their framework, outlined in a paper pre-published on arXiv and presented at the 2017 International Conference on Humanoid Robotics, could perform far more human-like balancing behaviors than conventional controllers.

When standing or walking, human beings innately and effectively use a number of techniques for under-actuated control that help them to keep their balance. These include toe tilting and heel rolling, which create better foot-ground clearance. Replicating similar behaviors in humanoid robots could greatly improve their motor and movement capabilities.

"Our research focuses on using deep RL to solve dynamic locomotion of humanoid robots," Dr. Zhibin Li, a lecturer in robotics and control at the University of Edinburgh, who carried out the study, told TechXplore. "In the past, locomotion was mainly done using conventional analytical approaches—model based, which are limited because they require human effort and knowledge, and demand high computing power to run online."

Because they require less human effort and manual tuning, machine learning techniques could lead to the development of more effective and more specialized controllers than traditional engineering approaches. A further advantage of RL is that the heavy computation, namely training the policy, can be done offline, resulting in faster online performance for high-dimensional control systems such as humanoid robots.

A simulated Valkyrie robot in toe/heel tilting pose. Credit: Yang, Komura & Li

"Given the increasingly powerful deep RL algorithms, an increasing number of research studies have started using deep RL to solve control tasks, as the recent progress in deep RL algorithms designed for continuous action domain has brought forward the possibility to apply reinforcement learning to continuous control tasks that involve complicated dynamics," Dr. Li explained. "The main objective of our research was to explore the possibilities of using deep reinforcement learning to acquire versatile control policies comparable or better than analytical approaches while using less human effort."

The framework developed by Dr. Li, in collaboration with Dr. Taku Komura and Ph.D. student Chuanyu Yang, uses deep RL to learn high-level control policies. Continuously receiving feedback on the robot's state, these policies output desired joint angles at a relatively low frequency.

"At the low-level, proportional and derivative (PD) controllers are used at a much higher control frequency to guarantee the stable joint motions," Yang said. "The inputs for the low-level PD controller are desired joint angles produced by the high-level neural network, and the outputs are the desired torques for joint motors."
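
To make the two-level arrangement concrete, here is a minimal sketch in Python of a slow high-level policy feeding joint-angle targets to a fast PD loop. It is an illustration only, not the researchers' implementation: the control rates (25 Hz and 500 Hz), the gains, the single unit-inertia joint and the constant stand-in for the neural network are all assumptions chosen for readability, and the names `pd_torque`, `run_hierarchy` and `constant_policy` are hypothetical.

```python
from typing import Callable, Tuple


def pd_torque(q_des: float, q: float, qd: float, kp: float, kd: float) -> float:
    """PD law: pull the joint toward q_des and damp its velocity."""
    return kp * (q_des - q) - kd * qd


def run_hierarchy(
    policy: Callable[[float, float], float],
    steps: int = 2000,
    pd_hz: int = 500,       # assumed low-level rate, not taken from the paper
    policy_hz: int = 25,    # assumed high-level rate, not taken from the paper
    kp: float = 100.0,      # assumed gains (critically damped for unit inertia)
    kd: float = 20.0,
    inertia: float = 1.0,   # toy single-joint model: torque = inertia * acceleration
) -> Tuple[float, float, int]:
    """Simulate one joint; return final angle, final velocity, policy call count."""
    dt = 1.0 / pd_hz
    ratio = pd_hz // policy_hz  # PD steps per high-level decision
    q, qd, q_des = 0.0, 0.0, 0.0
    policy_calls = 0
    for step in range(steps):
        # High level: refresh the desired joint angle only every `ratio` steps.
        if step % ratio == 0:
            q_des = policy(q, qd)
            policy_calls += 1
        # Low level: turn the desired angle into a motor torque on every step.
        tau = pd_torque(q_des, q, qd, kp, kd)
        # Semi-implicit Euler integration of the toy joint dynamics.
        qd += (tau / inertia) * dt
        q += qd * dt
    return q, qd, policy_calls


def constant_policy(q: float, qd: float) -> float:
    """Stand-in for the trained network: always request 0.3 rad."""
    return 0.3


if __name__ == "__main__":
    final_q, final_qd, calls = run_hierarchy(constant_policy)
    print(f"final angle: {final_q:.4f} rad, policy calls: {calls}")
```

In this sketch the policy is queried once for every 20 PD updates, which mirrors the division of labor described in the quote: the learned component decides where the joints should go, while the PD loop handles the fast tracking needed for stable motion.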

The researchers tested the performance of their algorithm and achieved highly promising results. They found that transferring human knowledge from control engineering methods to the reward design for RL algorithms enabled balance control strategies that resembled those used by humans. Moreover, as RL algorithms improve through a trial and error process, automatically adapting to new situations, their framework requires little hand tuning or other interventions by human engineers.

State features for the biped. Credit: Yang, Komura & Li

"Our study shows that learning can be a powerful tool to produce comparable balancing results to that of a human-engineered controller with less manual tuning effort and shorter time," Dr. Li said. "The algorithm we developed is even capable of learning emerged human-like behaviors such as tilting around toes or heels, which most engineering methods are unable to perform."

Dr. Li and his colleagues are now working on an extension of their study that applies RL to a full-body Valkyrie robot in a 3-D simulation. In this new research effort, they have been able to generalize human-resembling balancing strategies to walking and other locomotion tasks.

"Eventually, we would like to apply this hierarchical framework of combining machine learning and robot control to real humanoid robots, as well as to other robotic platforms," Dr. Li said.

More information: Emergence of human-comparable balancing behaviors by deep reinforcement learning. arXiv: 1809.02074v1 [cs.RO]. arxiv.org/abs/1809.02074