July 6, 2018

An AI system for editing music in videos

by Adam Conner-Simons, Massachusetts Institute of Technology

Amateur and professional musicians alike may spend hours poring over YouTube clips to figure out exactly how to play certain parts of their favorite songs. But what if there were a way to play a video and isolate only the instrument you wanted to hear?

That's the outcome of a new AI project out of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL): a deep-learning system that can look at a video of a musical performance, isolate the sounds of specific instruments, and make them louder or softer.

The system, which is "self-supervised," doesn't require any human annotations on what the instruments are or what they sound like.

Trained on over 60 hours of videos, the "PixelPlayer" system can view a never-before-seen musical performance, identify specific instruments at the pixel level, and extract the sounds that are associated with those instruments.

For example, it can take a video of a tuba and a trumpet playing the "Super Mario Brothers" theme song, and separate out the soundwaves associated with each instrument.

The researchers say that the ability to change the volume of individual instruments means that in the future, systems like this could potentially help engineers improve the audio quality of old concert footage. You could even imagine producers taking specific instrument parts and previewing what they would sound like with other instruments (e.g., an electric guitar swapped in for an acoustic one).
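The volume-changing idea is easy to picture once each instrument has its own track: scale each track by a gain and add the tracks back together. Below is a minimal sketch, assuming the separated tracks already exist as lists of audio samples at the same sampling rate; the `remix` function and the instrument names are hypothetical illustrations, not part of PixelPlayer.

```python
from typing import Dict, List


def remix(sources: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Scale each separated source by its gain (default 1.0) and sum sample by sample.

    sources maps a hypothetical instrument name to its separated audio samples.
    Tracks of different lengths are allowed; shorter ones simply stop contributing.
    """
    if not sources:
        return []
    length = max(len(samples) for samples in sources.values())
    out = [0.0] * length
    for name, samples in sources.items():
        gain = gains.get(name, 1.0)  # instruments without an entry keep their level
        for i, x in enumerate(samples):
            out[i] += gain * x
    return out
```

For example, `remix({"tuba": tuba, "trumpet": trumpet}, {"tuba": 0.0})` would keep only the trumpet. A gain of 0.0 mutes an instrument and 2.0 doubles its amplitude.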

In a new paper, the team demonstrated that PixelPlayer can identify the sounds of more than 20 commonly seen instruments. Lead author Hang Zhao says that the system would be able to identify many more instruments if it had more training data, though it still may have trouble handling subtle differences between subclasses of instruments (such as an alto sax versus a tenor).

Previous efforts to separate the sources of sound have focused exclusively on audio, which often requires extensive human labeling. In contrast, PixelPlayer introduces the element of vision, which the researchers say makes human labels unnecessary, as vision provides self-supervision.
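The article doesn't say exactly how PixelPlayer trains without labels. One common self-supervised recipe, assumed here purely for illustration, is to add together the audio of two solo performances and ask the model to recover each original track, using that performance's video frames as the cue. Because the un-mixed tracks are known by construction, they serve as free ground truth. The `Clip` type and `mix_and_separate_example` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    frames: List[str]    # hypothetical stand-in for a video's frames
    audio: List[float]   # mono audio samples from the same video


def mix_and_separate_example(
    a: Clip, b: Clip
) -> Tuple[List[float], List[Tuple[List[str], List[float]]]]:
    """Build one training example without human labels.

    Input: the two clips' audio summed together (trimmed to the shorter one).
    Targets: each clip's own audio, paired with that clip's frames as the visual cue.
    """
    n = min(len(a.audio), len(b.audio))
    mixture = [a.audio[i] + b.audio[i] for i in range(n)]
    targets = [(a.frames, a.audio[:n]), (b.frames, b.audio[:n])]
    return mixture, targets
```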

The system first locates the image regions that produce sounds, and then separates the input sounds into a set of components that represent the sound from each pixel.
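To make "locating the regions that produce sound" concrete, here is a minimal sketch, assuming the model already yields some per-pixel estimate of sound (here a short list of magnitude values per pixel). It marks a pixel as sounding when its energy is within a chosen fraction of the loudest pixel's. The threshold rule and `sounding_pixels` are hypothetical illustrations, not PixelPlayer's actual localization method.

```python
from typing import List, Tuple


def sounding_pixels(pixel_spectra: List[List[List[float]]],
                    rel_threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) for every pixel whose estimated sound energy is at least
    rel_threshold times the loudest pixel's energy.

    pixel_spectra[row][col] is a hypothetical per-pixel list of magnitude values.
    """
    # Energy of each pixel's estimated sound: sum of squared magnitudes.
    energy = [[sum(v * v for v in spec) for spec in row] for row in pixel_spectra]
    peak = max(max(row) for row in energy)
    if peak == 0.0:
        return []  # nothing in the frame is making sound
    return [(r, c)
            for r, row in enumerate(energy)
            for c, e in enumerate(row)
            if e >= rel_threshold * peak]
```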

"We expected a best-case scenario where we could recognize which instruments make which kinds of sounds," says Zhao, a Ph.D. student at CSAIL. "We were surprised that we could actually spatially locate the instruments at the pixel level. Being able to do that opens up a lot of possibilities, like being able to edit the audio of individual instruments by a single click on the video."

PixelPlayer uses methods of "deep learning," meaning that it finds patterns in data using so-called "neural networks" that have been trained on existing videos. Specifically, one neural network analyzes the visuals of the video, one analyzes the audio, and a third "synthesizer" associates specific pixels with specific soundwaves to separate the different sounds.
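One plausible way a "synthesizer" could tie pixels to sounds is sketched below, under stated assumptions rather than as PixelPlayer's actual code. It assumes that the audio network splits the mixture's spectrogram (its loudness at each frequency over time) into K component maps, and that the video network gives each pixel K weights. The pixel's sound is then the mixture scaled by a 0-to-1 mask built from the weighted components. `pixel_sound`, K, the bias term, and the sigmoid mask are all assumptions.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # [frequency][time] magnitudes


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pixel_sound(pixel_feature: List[float], components: List[Spectrogram],
                mixture: Spectrogram, bias: float = 0.0) -> Spectrogram:
    """Estimate the spectrogram produced by one pixel.

    pixel_feature: K weights for this pixel (hypothetical video-network output).
    components: K spectrogram-shaped maps (hypothetical audio-network output).
    The weighted sum of components is squashed into a 0..1 mask that scales the mixture.
    """
    if len(pixel_feature) != len(components):
        raise ValueError("need one weight per audio component")
    n_freq, n_time = len(mixture), len(mixture[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            score = bias + sum(w * comp[f][t] for w, comp in zip(pixel_feature, components))
            out[f][t] = sigmoid(score) * mixture[f][t]
    return out
```

A pixel whose weights are all zero gets a uniform mask of 0.5, i.e. no preference; large positive or negative scores push the mask toward keeping or removing that part of the mixture.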

Because PixelPlayer's deep learning is "self-supervised," the MIT team doesn't explicitly understand every aspect of how it learns which instruments make which sounds.

However, Zhao says that he can tell that the system seems to recognize actual elements of the music. For example, certain harmonic frequencies seem to correlate with instruments like the violin, while quick pulse-like patterns correspond to instruments like the xylophone.
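To make the difference Zhao describes concrete, here is a toy comparison, invented for illustration and not a description of what the network computes internally. A sustained tone built from a fundamental and its overtones keeps a steady loudness, while a xylophone-like train of short, decaying strikes is spiky. The ratio of the loudest short frame's RMS level to the average frame's level (a hypothetical `envelope_peak_to_mean` measure) stays near 1 for the first and is much larger for the second. The sampling rate, frame length, and decay constant are arbitrary choices.

```python
import math
from typing import List

RATE = 8000  # samples per second (assumed for these toy signals)


def harmonic_tone(freq: float, seconds: float) -> List[float]:
    """Fundamental plus two overtones at integer multiples, at constant loudness."""
    n = int(RATE * seconds)
    return [sum(math.sin(2 * math.pi * freq * k * i / RATE) / k for k in (1, 2, 3))
            for i in range(n)]


def pulse_train(freq: float, strikes_per_sec: float, seconds: float,
                decay: float = 60.0) -> List[float]:
    """Short, quickly decaying tone bursts, like repeated mallet strikes."""
    n = int(RATE * seconds)
    period = RATE / strikes_per_sec  # samples between strikes
    out = []
    for i in range(n):
        since_strike = (i % period) / RATE  # seconds since the most recent strike
        out.append(math.exp(-decay * since_strike) * math.sin(2 * math.pi * freq * i / RATE))
    return out


def envelope_peak_to_mean(signal: List[float], frame: int = 200) -> float:
    """Loudest frame's RMS divided by the average frame RMS (1.0 = perfectly steady)."""
    if len(signal) < frame:
        raise ValueError("signal shorter than one frame")
    rms = []
    for start in range(0, len(signal) - frame + 1, frame):
        chunk = signal[start:start + frame]
        rms.append(math.sqrt(sum(x * x for x in chunk) / frame))
    return max(rms) / (sum(rms) / len(rms))
```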

Zhao says that a system like PixelPlayer could even be used on robots to better understand the environmental sounds that other objects make, such as animals or vehicles.

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: An AI system for editing music in videos (2018, July 6) retrieved 6 July 2024 from https://techxplore.com/news/2018-07-ai-music-videos.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
