March 6, 2024 feature

The AI bassist: Sony's vision for a new paradigm in music production

by Ingrid Fadelli, Tech Xplore

Generative artificial intelligence (AI) tools are becoming increasingly advanced and are now used to produce many kinds of personalized content, including images, videos, logos, and audio recordings. Researchers at Sony Computer Science Laboratories (CSL) have recently been working on tools for producers and artists that can assist them in creating new music.

In a recent paper posted on the arXiv preprint server, researcher Marco Pasini and his colleagues Stefan Lattner and Maarten Grachten at Sony CSL introduced a new latent diffusion model that can create realistic and effective bass accompaniments for musical tracks. Diffusion models are deep generative models that learn to produce images, audio or other samples resembling their training data by gradually removing noise from a random starting point.

"Musical audio generation is currently a popular research topic, with many institutes, companies, and start-ups exploring various use cases," co-author Lattner told Tech Xplore. "At Sony CSL, we aim to assist music artists and producers in their workflow by providing AI-powered tools. However, we have noticed that the most common approach of AI tools generating complete musical pieces from scratch (often controlled only by text input) is not very interesting to artists."

When reviewing previously proposed music generation techniques, the researchers at Sony CSL found that they were not optimal for artists and producers. Specifically, they found that many tools did not allow users to create music aligned with their unique preferences and style.

"Artists require tools that can adjust to their unique style and can be utilized at any point in their music production process," Lattner said. "Therefore, a generative music tool should be able to analyze and take into account any intermediate creation of the artist when proposing new sounds."

In their recent paper, the researchers introduced a new model that can automatically generate bass accompaniments that match the style and tonality of an input music track, irrespective of the elements it contains (e.g., vocals, guitar, or drums). Their proposed tool was designed to generate incisive basslines that complement songs well, thus assisting producers and artists in their creative process.

"Our system can process any type of musical mix that contains one or more sources, such as vocals, guitar, etc.," Lattner explained. "It consists of an audio autoencoder that efficiently encodes the mix into a compressed representation, capturing the essence of the music. This compressed encoding is then used as input to a specially designed architecture based on a state-of-the-art generative technology called 'latent diffusion.' This method generates data in a compressed space, which improves performance and quality."

Lattner and his colleagues trained their latent diffusion model on a dataset of music tracks paired with encodings of their bass guitar parts. Through this training, the model learned to create a bassline that "plays along" with an input music track.
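To make the data flow Lattner describes more concrete, here is a deliberately toy Python sketch. Nothing in it comes from the paper: `encode`, `toy_denoiser`, and `generate_bass_latent` are hypothetical names, the "autoencoder" is plain block averaging, and the "denoiser" is a hand-written rule rather than a trained network. It is not a real diffusion sampler. It only illustrates the shape of the idea: compress the mix, then start from random noise in the compressed space and refine it step by step while conditioning on the mix encoding.

```python
import random


def encode(samples, factor=4):
    """Toy stand-in for an encoder: average each block of `factor` samples."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


def toy_denoiser(latent, mix_latent, strength):
    """Hypothetical stand-in for a trained network.

    It nudges every latent value toward a target derived from the mix
    (here simply half the mix latent, an arbitrary choice for illustration).
    """
    return [z + strength * (0.5 * m - z) for z, m in zip(latent, mix_latent)]


def generate_bass_latent(mix_latent, steps=50, seed=0):
    """Start from Gaussian noise and refine it iteratively, conditioned on the mix."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in mix_latent]  # pure noise
    for _ in range(steps):
        latent = toy_denoiser(latent, mix_latent, strength=0.2)
    return latent


# 32 made-up "audio" samples standing in for an input mix.
mix = [0.0, 0.2, 0.4, 0.2, 0.0, -0.2, -0.4, -0.2] * 4
mix_latent = encode(mix)                      # compressed representation
bass_latent = generate_bass_latent(mix_latent)  # same length as mix_latent
```

Because the loop runs over however many values `mix_latent` contains, the sketch works for inputs of any length. In a real system, a decoder would then turn the generated latent back into audio.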

"Our system has a unique advantage: it can generate coherent basslines of any length, as opposed to fixed durations," Lattner said. "We also proposed a technique called 'style grounding' that allows users to control the timbre and playing style of the generated bass by providing a reference audio file."

The researchers evaluated their latent diffusion model in a series of tests and found that it could generate appropriate bass accompaniments to arbitrary song mixes. Notably, the basslines it produced closely matched the tonality and rhythm of the input mix.

"We presented what we believe is the first conditional latent diffusion model designed specifically for audio-based accompaniment generation tasks," Lattner said. "By training it on paired data of mixes and matching basslines, the model learns the concept of musical coherence."

In the future, the new bassline generation tool created by Pasini and his colleagues could be used by musicians, producers, and composers worldwide, helping them write or improve instrumental parts of their tracks. The researchers now plan to create similar models that produce other instrumental elements, such as drums, piano, guitar, string, and sound effect accompaniments.
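The article says only that the model was trained on paired mixes and matching basslines. It does not say how those pairs were assembled. One plausible recipe, sketched below under the assumption that each song is available as separate instrument stems, is to hold out the target instrument and sum the remaining stems into a mix. The function name `make_training_pair` and the stem values are hypothetical, and swapping the `target` argument hints at how the same recipe might extend to other instruments.

```python
def make_training_pair(stems, target="bass"):
    """Split named stems into (mix, target_part).

    The mix is the sample-wise sum of every stem except `target`.
    Assumes all stems have the same number of samples.
    """
    target_part = stems[target]
    others = [samples for name, samples in stems.items() if name != target]
    mix = [sum(values) for values in zip(*others)]
    return mix, target_part


# Made-up four-sample stems for one song.
stems = {
    "vocals": [0.1, 0.0, -0.1, 0.0],
    "drums": [0.5, -0.5, 0.5, -0.5],
    "bass": [0.3, 0.3, -0.3, -0.3],
}

mix, bass = make_training_pair(stems)                      # bass is the target
drum_mix, drums = make_training_pair(stems, target="drums")  # drums as the target
```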

"With further development, we envision creative tools where users can customize the bass or other accompaniments that they can seamlessly integrate with their compositions," Lattner added.

"Additional directions for future research involve providing additional, intuitive control mechanisms—in addition to audio references, users could guide the style through free-form text prompts or descriptive stylistic tags. More broadly, we plan to collaborate directly with artists and composers to refine further and validate these AI accompaniment tools to best enhance their creative needs."

More information: Marco Pasini et al, Bass Accompaniment Generation via Latent Diffusion, arXiv (2024). DOI: 10.48550/arxiv.2402.01412

Journal information: arXiv

Citation: The AI bassist: Sony's vision for a new paradigm in music production (2024, March 6) retrieved 27 April 2024 from https://techxplore.com/news/2024-03-ai-bassist-sony-vision-paradigm.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
