(Left) Two models to be trained (A, B), where A’s parameters are in green and B’s in purple, and B shares some parameters with A (indicated in green during phase 2). The researchers first train A to convergence and then train B. (Right) Accuracy of model A as the training of B progresses. The different colors correspond to different numbers of shared layers. The accuracy of A decreases dramatically, especially when more layers are shared, and the researchers refer to the drop (the red arrow) as multi-model forgetting. Credit: Benyahia, Yu et al.

In recent years, researchers have developed deep neural networks that can perform a variety of tasks, including visual recognition and natural language processing (NLP). Although many of these models have achieved remarkable results, they typically perform well on only one task at a time, due to what is referred to as "catastrophic forgetting."

Essentially, catastrophic forgetting means that when a model that was initially trained on task A is later trained on task B, its performance on task A will significantly decline. In a paper pre-published on arXiv, researchers at Swisscom and EPFL identified a new kind of forgetting and proposed a new approach that could help to overcome it via a statistically justified weight plasticity loss.

"When we first started working on our project, designing neural architectures automatically was computationally expensive and unfeasible for most companies," Yassine Benyahia and Kaicheng Yu, the study's primary investigators, told TechXplore via e-mail. "The original aim of our study was to identify new methods to reduce this expense. When the project started, a paper by Google claimed to have drastically reduced the time and resources required to build neural architectures using a new method called weight-sharing. This made autoML feasible for researchers without huge GPU clusters, encouraging us to study this topic more in depth."

Comparison between EWC and WPL. The ellipses in each subplot represent parameter regions corresponding to low error. (Top left) Both methods start with a single model, with parameters θA = {θs, θ1}, trained on a single dataset D1. (Bottom left) EWC regularizes all parameters based on p(θA|D1) to train the same initial model on a new dataset D2. (Top right) By contrast, WPL makes use of the initial dataset D1 and regularizes only the shared parameters θs based on both p(θA|D1) and vᵀΩv, while the parameters θ2 can move freely. Credit: Benyahia, Yu et al.
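To make the contrast in the figure concrete, the two penalties can be written schematically as follows. This is a hedged sketch based on the caption above and the standard EWC formulation, not the paper's exact notation; λ, α and β stand in for illustrative trade-off weights, and Fi denotes a diagonal Fisher information estimated on D1.

    % Schematic comparison only; lambda, alpha, beta are illustrative weights.
    % EWC penalizes changes to ALL of model A's parameters:
    \mathcal{L}_{\mathrm{EWC}}(\theta) = \mathcal{L}_{B}(\theta)
        + \frac{\lambda}{2} \sum_{i} F_i \left(\theta_i - \theta^{A*}_i\right)^2
    % WPL penalizes only the SHARED parameters \theta_s, combining the same
    % Fisher-weighted quadratic with the second-order term v^\top \Omega v,
    % while the private parameters \theta_2 remain free to move:
    \mathcal{L}_{\mathrm{WPL}}(\theta) = \mathcal{L}_{B}(\theta)
        + \frac{\alpha}{2} \sum_{i \in s} F_i \left(\theta_{s,i} - \theta^{A*}_{s,i}\right)^2
        + \frac{\beta}{2}\, v^{\top} \Omega\, v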

During their research into neural network-based models, Benyahia, Yu and their colleagues noticed a problem with weight sharing. When they trained two models (e.g. A and B) sequentially, model A's performance declined while model B's improved, or vice versa. They showed that this phenomenon, which they called "multi-model forgetting," can hinder the performance of several autoML approaches, including Google's efficient neural architecture search (ENAS).
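The effect is easy to reproduce in miniature. The sketch below is a self-contained illustration under assumed settings (synthetic data, a single shared hidden layer, arbitrary hyperparameters), not the researchers' code: two small PyTorch classifiers A and B share a hidden layer, A is trained first, and A's accuracy is re-measured after B has been trained.

    # Minimal sketch of multi-model forgetting with shared weights (illustrative only).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Two synthetic binary tasks drawn from different random projections.
    def make_task(n=2000, d=20):
        w = torch.randn(d)
        x = torch.randn(n, d)
        y = (x @ w > 0).float()
        return x, y

    xa, ya = make_task()
    xb, yb = make_task()

    shared = nn.Linear(20, 64)   # parameters shared between models A and B
    head_a = nn.Linear(64, 1)    # private to A
    head_b = nn.Linear(64, 1)    # private to B

    def model_a(x): return head_a(torch.relu(shared(x))).squeeze(-1)
    def model_b(x): return head_b(torch.relu(shared(x))).squeeze(-1)

    loss_fn = nn.BCEWithLogitsLoss()

    def accuracy(logits, y):
        return ((logits > 0).float() == y).float().mean().item()

    # Phase 1: train A (shared layer + A's head) to convergence.
    opt_a = torch.optim.Adam(list(shared.parameters()) + list(head_a.parameters()), lr=1e-2)
    for _ in range(300):
        opt_a.zero_grad()
        loss_fn(model_a(xa), ya).backward()
        opt_a.step()
    print("A after phase 1:", accuracy(model_a(xa), ya))

    # Phase 2: train B; the shared weights drift and A forgets.
    opt_b = torch.optim.Adam(list(shared.parameters()) + list(head_b.parameters()), lr=1e-2)
    for _ in range(300):
        opt_b.zero_grad()
        loss_fn(model_b(xb), yb).backward()
        opt_b.step()
    print("A after phase 2:", accuracy(model_a(xa), ya))  # typically much lower
    print("B after phase 2:", accuracy(model_b(xb), yb))

Running it typically shows A's accuracy dropping well below its phase-1 level once B has been trained, mirroring the red arrow in the figure above.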

"We realized that weight-sharing was causing models to impact each other negatively, which was causing the architecture search process to be closer to random," Benyahia and Yu explained. "We also had our reserves on architecture search, where only the final results are shed to light and where there is no good framework to evaluate the quality of the architecture search in a fair way. Our approach could help to fix this forgetting problem, as it is related to a core method that nearly all recent autoML papers rely on, and we consider such impact to be huge to the community."

In their study, the researchers modeled multi-model forgetting mathematically and derived a novel loss, called weight plasticity loss. This loss could reduce multi-model forgetting substantially by regularizing the learning of a model's shared parameters according to their importance for previous models.
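As a rough illustration of that idea, the snippet below continues the sketch above: run it in place of the unregularized phase 2, right after A has converged. It estimates a crude diagonal Fisher information for the shared layer at A's optimum and penalizes movement of those weights while B trains. This follows the spirit of the weight plasticity loss but is not the paper's exact formulation (which also involves the second-order vᵀΩv term), and the penalty weight alpha below is an arbitrary illustrative value.

    # Hedged sketch of a Fisher-weighted penalty on the shared parameters only
    # (in the spirit of WPL, not the paper's exact loss). Run in place of the
    # plain phase 2 above, immediately after A has been trained.

    # Crude diagonal Fisher estimate: squared gradients of A's loss w.r.t. the
    # shared parameters, evaluated at A's optimum.
    shared.zero_grad()
    head_a.zero_grad()
    loss_fn(model_a(xa), ya).backward()
    fisher = {n: p.grad.detach() ** 2 for n, p in shared.named_parameters()}
    anchor = {n: p.detach().clone() for n, p in shared.named_parameters()}

    def wpl_penalty(alpha=100.0):
        # Quadratic penalty on the shared parameters, weighted by their importance to A.
        return alpha * sum(
            (fisher[n] * (p - anchor[n]) ** 2).sum()
            for n, p in shared.named_parameters()
        )

    # Phase 2 with the penalty: B still learns, but the weights that matter to A stay put.
    opt_b = torch.optim.Adam(list(shared.parameters()) + list(head_b.parameters()), lr=1e-2)
    for _ in range(300):
        opt_b.zero_grad()
        (loss_fn(model_b(xb), yb) + wpl_penalty()).backward()
        opt_b.step()
    print("A after regularized phase 2:", accuracy(model_a(xa), ya))
    print("B after regularized phase 2:", accuracy(model_b(xb), yb))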

From strict to loose convergence. The researchers conduct experiments on MNIST with models A and B sharing parameters, and report the accuracy of model A before training model B (baseline, green) and the accuracy of models A and B while training model B with (orange) or without (blue) WPL. In (a) they show the results for strict convergence: A is initially trained to convergence. They then relax this assumption and train A to around 55% (b), 43% (c), and 38% (d) of its optimal accuracy. WPL is highly effective when A is trained to at least 40% of optimality; below that, the Fisher information becomes too inaccurate to provide reliable importance weights. WPL thus helps to reduce multi-model forgetting even when the weights are not optimal, reducing forgetting by up to 99.99% for (a) and (b), and by up to 2% for (c). Credit: Benyahia, Yu et al.

"Basically, due to the over-parameterization of neural networks, our loss decreases parameters that are 'less important' to the final loss first, and keeps the more important ones unchanged," Benyahia and Yu said. "Model A's performance is thus unaffected, while model B's performance keeps increasing. On small datasets, our model can reduce forgetting up to 99 percent, and on autoML methods, up to 80 percent in the middle of training."

In a series of tests, the researchers demonstrated the effectiveness of their approach at decreasing multi-model forgetting, both when two models are trained sequentially and in neural architecture search. Their findings suggest that adding weight plasticity loss to neural architecture search can significantly improve the performance of multiple models on both NLP and computer vision tasks.

The study carried out by Benyahia, Yu and their colleagues sheds light on the issue of catastrophic forgetting, particularly that which occurs when multiple models are trained sequentially. After modeling this problem mathematically, the researchers introduced a solution that could overcome it, or at least drastically reduce its impact.

Error difference during neural architecture search. For each architecture, the researchers compute the RNN error difference err2 − err1, where err1 is the error right after training that architecture and err2 is the error after all architectures have been trained in the current epoch. They plot (a) the mean difference over all sampled models, (b) the mean difference over the five models with the lowest err1, and (c) the maximum difference over all models. In (d), they plot the average reward of the sampled architectures as a function of training iterations. Although WPL initially leads to lower rewards, due to a large weight α in equation (8) of the paper, by reducing forgetting it later allows the controller to sample better architectures, as indicated by the higher reward in the second half of training. Credit: Benyahia, Yu et al.

"In multi- forgetting, our guiding principle was to think in formulas and not just by simple intuition or heuristics," Benyahia and Yu said. "We strongly believe that this 'thinking in formulas' can lead researchers to great discoveries. That is why for further research, we aim to apply this approach to other fields of machine learning. In addition, we plan to adapt our loss to recent state-of-the-art autoML methods to demonstrate its effectiveness in solving the weight-sharing problem observed by us."

More information: Overcoming multi-model forgetting. arXiv:1902.08232 [cs.LG]. arxiv.org/abs/1902.08232

Efficient neural architecture search via parameter sharing. arXiv:1802.03268 [cs.LG]. arxiv.org/abs/1802.03268