June 26, 2020

Computational model decodes speech by predicting it

The brain analyzes spoken language by recognizing syllables. Scientists from the University of Geneva (UNIGE) and the Evolving Language National Centre of Competence in Research (NCCR) have designed a computational model that reproduces the complex mechanism the central nervous system uses to perform this operation. The model, which brings together two independent theoretical frameworks, uses the equivalent of the neuronal oscillations produced by brain activity to process the continuous sound flow of connected speech.

The model functions according to a theory known as predictive coding, whereby the brain optimizes perception by constantly trying to predict sensory signals from candidate hypotheses (syllables, in this model). The resulting model, described in the journal Nature Communications, enabled the live recognition of thousands of syllables contained in hundreds of sentences spoken in natural language. This validates the idea that neuronal oscillations can be used to coordinate the flow of syllables we hear with the predictions made by our brain.

"Brain activity produces neuronal oscillations that can be measured using electroencephalography," says Anne-Lise Giraud, professor in the Department of Basic Neurosciences in UNIGE's Faculty of Medicine and co-director of the Evolving Language NCCR. These are electromagnetic waves that result from the coherent electrical activity of entire networks of neurons. There are several types, defined according to their frequency. They are called alpha, beta, theta, delta or gamma waves. Taken individually or superimposed, these rhythms are linked to different cognitive functions, such as perception, memory, attention, alertness, etc.

However, neuroscientists do not yet know whether these rhythms actively contribute to such functions and, if so, how. In an earlier study published in 2015, Professor Giraud's team showed that theta waves (low frequency) and gamma waves (high frequency) coordinate to sequence the sound flow into syllables and to analyze their content so they can be recognized.

The Geneva-based scientists developed a spiking neural network computer model based on these physiological rhythms. Its performance in sequencing syllables live (online) was better than that of traditional automatic speech recognition systems.

The rhythm of the syllables

In their first model, the theta waves (between 4 and 8 Hertz) made it possible to follow the rhythm of the syllables as they were perceived by the system. Gamma waves (around 30 Hertz) were used to segment the auditory signal into smaller slices and encode them. This produced a "phonemic" profile linked to each sound sequence, which could be compared, a posteriori, to a library of known syllables. One of the advantages of this type of model is that it spontaneously adapts to the speed of speech, which can vary from one individual to another.
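
To give a feel for the numbers involved, the short Python sketch below works out how many gamma-rate slices fit into one theta cycle, and therefore into one syllable, at different speech rates. It is only arithmetic on the frequencies quoted above, not the team's spiking neural network; the function names and the fixed 30 Hertz default are assumptions made for illustration.

```python
def gamma_slices_per_syllable(theta_hz: float, gamma_hz: float = 30.0) -> int:
    """Number of whole gamma cycles that fit into one theta cycle."""
    if theta_hz <= 0 or gamma_hz <= 0:
        raise ValueError("frequencies must be positive")
    return int(gamma_hz // theta_hz)


def slice_syllable(onset_s: float, theta_hz: float, gamma_hz: float = 30.0):
    """Return (start, end) times in seconds of the gamma slices of one syllable.

    The syllable is assumed to start at onset_s and to last one theta cycle.
    """
    n_slices = gamma_slices_per_syllable(theta_hz, gamma_hz)
    period_s = 1.0 / gamma_hz
    return [
        (onset_s + i * period_s, onset_s + (i + 1) * period_s)
        for i in range(n_slices)
    ]


if __name__ == "__main__":
    # Slow, medium and fast syllable rates within the theta band.
    for theta_hz in (4.0, 6.0, 8.0):
        print(theta_hz, "Hz ->", gamma_slices_per_syllable(theta_hz), "slices")
```

In this simplified picture, a slower syllable rate leaves room for more gamma slices per syllable and a faster one for fewer, which is one way to think about a model that adapts to the speed of speech.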

Predictive coding

In this new article, to stay closer to biological reality, Professor Giraud and her team developed a new model that incorporates elements from another theoretical framework, independent of neuronal oscillations: "predictive coding."

"This theory holds that the brain functions so optimally because it is constantly trying to anticipate and explain what is happening in the environment by using learned models of how outside events generate sensory signals. In the case of spoken language, it attempts to find the most likely causes of the sounds perceived by the ear as speech unfolds, on the basis of a set of mental representations that have been learned and that are being permanently updated," says Dr. Itsaso Olasagasti, computational neuroscientist in Giraud's team, who supervised the new model implementation.

"We developed a computer model that simulates this predictive coding," explains Sevada Hovsepyan, a researcher in the Department of Basic Neurosciences and the article's first author. "And we implemented it by incorporating oscillatory mechanisms."

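The core idea of predictive coding described in this section, weighing candidate causes by how well they predict the incoming signal, can be illustrated with a few lines of Python. This is a minimal sketch under strong assumptions rather than the published model: each candidate syllable predicts a single number, the noise is Gaussian, and the syllable names, template values and `noise_sd` parameter are all hypothetical.

```python
import math


def update_beliefs(beliefs, templates, observed, noise_sd=0.5):
    """Reweight candidate syllables by how well each one predicted the input.

    beliefs:   dict mapping syllable -> prior probability
    templates: dict mapping syllable -> predicted feature value (hypothetical)
    observed:  the feature value actually received
    """
    scored = {}
    for syllable, prior in beliefs.items():
        error = observed - templates[syllable]  # prediction error
        likelihood = math.exp(-0.5 * (error / noise_sd) ** 2)
        scored[syllable] = prior * likelihood
    total = sum(scored.values())
    if total == 0.0:
        # Every candidate predicted so badly that the weights underflowed;
        # keep the previous beliefs rather than divide by zero.
        return dict(beliefs)
    return {syllable: value / total for syllable, value in scored.items()}


if __name__ == "__main__":
    priors = {"ba": 0.5, "da": 0.5}
    templates = {"ba": 1.0, "da": 0.0}
    print(update_beliefs(priors, templates, observed=0.9))
```

With equal priors, the candidate whose prediction lies closer to the observation ends up with the larger share of belief, and repeating the update as more sound arrives sharpens the choice.
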
Tested on 2,888 syllables

The sound entering the system is first modulated by a slow theta wave that resembles what neuron populations produce. This wave signals the contours of the syllables. Trains of fast gamma waves then help encode each syllable as it is perceived. During the process, the system suggests possible syllables and corrects its choice if necessary. After going back and forth between the two levels several times, it settles on the right syllable. The system is then reset to zero at the end of each perceived syllable.
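
The sequence just described (slice the input, compare each slice with what every candidate syllable predicts, pick the best candidate at the syllable boundary, then reset) can be sketched as a simple loop. The Python below is an illustrative toy and not the published model: the two syllables, their three-slice templates, the `noise_sd` value and the assumption that syllable boundaries arrive ready-made as a flag are all hypothetical simplifications.

```python
# Hypothetical templates: the feature value each syllable predicts per slice.
TEMPLATES = {
    "ba": [1.0, 0.5, 0.0],
    "da": [0.0, 0.5, 1.0],
}


def recognize(stream, templates=TEMPLATES, noise_sd=0.5):
    """Recognize syllables online from (feature, is_syllable_end) pairs.

    Each pair stands for one gamma-rate slice; is_syllable_end plays the
    role of the theta rhythm marking the end of a syllable.
    """
    recognized = []
    log_evidence = {syllable: 0.0 for syllable in templates}
    position = 0  # index of the current slice within the syllable
    for feature, is_syllable_end in stream:
        for syllable, template in templates.items():
            # Reuse the last template value if the syllable runs long.
            predicted = template[min(position, len(template) - 1)]
            error = feature - predicted
            log_evidence[syllable] -= 0.5 * (error / noise_sd) ** 2
        position += 1
        if is_syllable_end:
            recognized.append(max(log_evidence, key=log_evidence.get))
            # Reset to zero before the next syllable begins.
            log_evidence = {syllable: 0.0 for syllable in templates}
            position = 0
    return recognized


if __name__ == "__main__":
    stream = [
        (0.9, False), (0.6, False), (0.1, True),   # resembles "ba"
        (0.1, False), (0.4, False), (0.8, True),   # resembles "da"
    ]
    print(recognize(stream))  # ['ba', 'da']
```

The running evidence can favor one candidate early in a syllable and another later on, which loosely mirrors the way the system suggests a syllable and then corrects its choice.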

The model was successfully tested on 2,888 different syllables contained in 220 sentences of natural spoken English. "On the one hand, we succeeded in bringing together two very different theoretical frameworks in a single computer model," says Professor Giraud. "On the other, we have shown that neuronal oscillations most likely rhythmically align the endogenous functioning of the brain with signals that come from outside via the sensory organs. If we put this back in predictive coding theory, it means that these oscillations probably allow the brain to make the right hypothesis at exactly the right moment."

More information: Sevada Hovsepyan et al. Combining predictive coding and neural oscillations enables online syllable recognition in natural speech, Nature Communications (2020). DOI: 10.1038/s41467-020-16956-5

Journal information: Nature Communications

Provided by University of Geneva

Citation: Computational model decodes speech by predicting it (2020, June 26) retrieved 19 April 2024 from https://techxplore.com/news/2020-06-decodes-speech.html
