August 23, 2021

Improve machine learning performance by dropping the zeros

by King Abdullah University of Science and Technology

KAUST researchers have found a way to significantly increase the speed of training. Large machine learning models can be trained much faster by taking advantage of how frequently zero results are produced in distributed machine learning that uses large training datasets.

AI models develop their "intelligence" by being trained on datasets that have been labeled to tell the model how to differentiate between different inputs and then respond accordingly. The more labeled data that goes in, the better the model becomes at performing whatever task it has been assigned to do. For complex deep learning applications, such as self-driving vehicles, this requires enormous input datasets and very long training times, even when using powerful and expensive highly parallel supercomputing platforms.

During training, small learning tasks are assigned to tens or hundreds of computing nodes, which then share their results over a communications network before running the next task. One of the biggest sources of computing overhead in such parallel computing tasks is actually this communication among computing nodes at each model step.

"Communication is a major performance bottleneck in distributed deep learning," explains Jiawei Fei from the KAUST team. "Along with the fast-paced increase in model size, we also see an increase in the proportion of zero values that are produced during the learning process, which we call sparsity. Our idea was to exploit this sparsity to maximize effective bandwidth usage by sending only non-zero data blocks."
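As a rough illustration of the idea Fei describes, the sketch below splits a gradient into fixed-size blocks and keeps only the blocks that contain a non-zero value. It is a minimal sketch under stated assumptions, not the team's implementation: the names `BLOCK_SIZE`, `nonzero_blocks` and `rebuild` are hypothetical, and the block size is chosen only to keep the example small.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical block size, chosen only for illustration


def nonzero_blocks(grad: List[float], block_size: int = BLOCK_SIZE) -> Dict[int, List[float]]:
    """Map block index -> block, keeping only blocks with at least one non-zero value."""
    blocks: Dict[int, List[float]] = {}
    for start in range(0, len(grad), block_size):
        block = grad[start:start + block_size]
        if any(v != 0.0 for v in block):
            blocks[start // block_size] = block
    return blocks


def rebuild(blocks: Dict[int, List[float]], length: int,
            block_size: int = BLOCK_SIZE) -> List[float]:
    """Reconstruct the dense gradient; blocks that were never sent stay zero."""
    out = [0.0] * length
    for idx, block in blocks.items():
        start = idx * block_size
        out[start:start + len(block)] = block
    return out


# Three blocks of four values; only the middle block carries any information.
grad = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sent = nonzero_blocks(grad)
print(f"blocks sent: {len(sent)} of {len(grad) // BLOCK_SIZE}")
```

In this toy case only one of three blocks would cross the network, and the receiver can still recover the full gradient because every block that was not sent is known to be zero.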

Building on an earlier KAUST development called SwitchML, which optimized internode communications by running efficient aggregation code on the network switches that process data transfer, Fei, Marco Canini and their colleagues went a step further by identifying zero results and developing a way to drop their transmission without interrupting the synchronization of the parallel computing process.

"Exactly how to exploit sparsity to accelerate distributed training is a challenging problem," says Fei. "All nodes need to process data blocks at the same location in a time slot, so we have to coordinate the nodes to ensure that only data blocks in the same location are aggregated. To overcome this, we created an aggregator process to coordinate the workers, instructing them on which block to send next."
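The coordination problem Fei describes can be sketched in a few lines. This is a simplified, single-process illustration and not the OmniReduce protocol itself: it assumes each worker can report the index of its next non-zero block, and that the aggregator picks the earliest index reported by any worker as the block to aggregate next. The names `next_index` and `aggregate` are hypothetical.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, List[float]]  # block index -> values, non-zero blocks only


def next_index(blocks: Blocks, after: int) -> Optional[int]:
    """Smallest block index greater than `after` held by this worker, if any."""
    later = [i for i in blocks if i > after]
    return min(later) if later else None


def aggregate(workers: List[Blocks], block_size: int) -> Blocks:
    """Sum blocks across workers, visiting only locations where some worker is non-zero."""
    result: Blocks = {}
    current = -1
    while True:
        # Each worker reports its next non-zero block; the aggregator picks the earliest.
        reported = [next_index(w, current) for w in workers]
        candidates = [i for i in reported if i is not None]
        if not candidates:
            break  # no worker has anything left to send
        current = min(candidates)
        total = [0.0] * block_size
        for w in workers:
            block = w.get(current)
            if block is not None:  # a worker whose block is all zeros sends nothing
                total = [a + b for a, b in zip(total, block)]
        result[current] = total
    return result


workers = [
    {0: [1.0, 0.0], 3: [0.5, 0.5]},
    {3: [1.5, 0.0]},
    {1: [0.0, 2.0]},
]
summed = aggregate(workers, block_size=2)
print(summed)
```

Block 2 is zero on every worker, so it is never requested or transmitted, while block 3 is summed only from the two workers that hold non-zero data at that location.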

The team demonstrated their OmniReduce scheme on a testbed consisting of an array of graphics processing units (GPUs) and achieved an eight-fold speed-up for typical deep learning tasks.

"We are now adapting OmniReduce to run on programmable switches using in-network computation to further improve performance," Fei says.

More information: Jiawei Fei et al, Efficient sparse collective communication and its application to accelerate distributed deep learning, Proceedings of the 2021 ACM SIGCOMM 2021 Conference (2021). DOI: 10.1145/3452296.3472904

Provided by King Abdullah University of Science and Technology

Citation: Improve machine learning performance by dropping the zeros (2021, August 23) retrieved 29 June 2024 from https://techxplore.com/news/2021-08-machine-zeros.html
