January 10, 2023

Research could bring automatic speech recognition to 2,000 languages

by Aaron Aupperlee, Carnegie Mellon University

Only a fraction of the 7,000 to 8,000 languages spoken around the world benefit from modern language technologies like voice-to-text transcription, automatic captioning, instantaneous translation and voice recognition. Carnegie Mellon University researchers want to expand the number of languages with automatic speech recognition tools available to them from around 200 to potentially 2,000.

"A lot of people in this world speak diverse languages, but language technology tools aren't being developed for all of them," said Xinjian Li, a Ph.D. student in the School of Computer Science's Language Technologies Institute (LTI). "Developing technology and a good language model for all people is one of the goals of this research."

Li is part of a research team aiming to reduce the data required to build a speech recognition model for a language. The team, which also includes LTI faculty members Shinji Watanabe, Florian Metze, David Mortensen and Alan Black, presented their most recent work, "ASR2K: Speech Recognition for Around 2,000 Languages Without Audio," at Interspeech 2022 in South Korea.

Most speech recognition models require two data sets: text and audio. Text data exists for thousands of languages. Audio data does not. The team hopes to eliminate the need for audio data by focusing on linguistic elements common across many languages.

Historically, speech recognition technologies have focused on a language's phonemes. These distinct sounds that distinguish one word from another (like the "d" that differentiates "dog" from "log" and "cog") are unique to each language. But languages also have phones, which describe how a word sounds physically. Multiple phones might correspond to a single phoneme. So even though separate languages may have different phonemes, their underlying phones could be the same.
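The distinction can be made concrete with a small sketch. It is an illustration only, not the team's code: the table below is a heavily simplified, hand-written assumption covering a single sound. In English, the aspirated "p" of "pin" and the plain "p" of "spin" are two phones of one phoneme, while Hindi treats the same two phones as separate phonemes.

```python
# Toy phoneme-to-phone tables (illustrative and heavily simplified;
# the structure and names here are assumptions, not from the ASR2K paper).
ALLOPHONES = {
    # English: one phoneme /p/ realized as two phones, [p] and [pʰ].
    "english": {"p": {"p", "pʰ"}},
    # Hindi: /p/ and /pʰ/ are two separate phonemes.
    "hindi": {"p": {"p"}, "pʰ": {"pʰ"}},
}


def phones(language):
    """Return every phone used by any phoneme of the given language."""
    result = set()
    for phone_set in ALLOPHONES[language].values():
        result |= phone_set
    return result


# The phoneme inventories differ (1 entry vs. 2 entries),
# but the underlying phone sets overlap completely.
shared = phones("english") & phones("hindi")
```

Here the two languages disagree on how many phonemes there are, yet `shared` contains both phones, which is the kind of overlap a phone-based model can exploit.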

The LTI team is developing a speech recognition model that moves away from phonemes and instead relies on information about how phones are shared between languages, thereby reducing the effort to build separate models for each language. Specifically, the team pairs the model with a phylogenetic tree (a diagram that maps the relationships between languages) to help with pronunciation rules. Through their model and the tree structure, the team can approximate the speech model for thousands of languages without audio data.
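One way to picture how a tree of language relationships can help is a nearest-relative lookup: when a language has no resources of its own, borrow from the closest language in the tree that does. The sketch below is a minimal illustration under that assumption and is not the team's implementation; the tree, the placeholder names (`lang_a`, `family_x` and so on) and the helper functions are all hypothetical.

```python
# Hypothetical toy tree stored as child -> parent.
# All names are placeholders, not real language data.
PARENT = {
    "lang_a": "family_x",
    "lang_b": "family_x",
    "lang_c": "family_y",
    "family_x": "root",
    "family_y": "root",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_distance(a, b):
    """Count the edges between two nodes via their lowest common ancestor."""
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b))}
    for i, node in enumerate(path_to_root(a)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("nodes are not in the same tree")


def nearest_known(target, known):
    """Pick the language in `known` closest to `target` in the tree.

    Sorting first makes ties resolve deterministically (alphabetically).
    """
    return min(sorted(known), key=lambda lang: tree_distance(target, lang))


# lang_a has no data of its own; lang_b (same family) is closer than lang_c.
donor = nearest_known("lang_a", {"lang_b", "lang_c"})
```

In this toy tree, `lang_b` is two edges from `lang_a` and `lang_c` is four, so `lang_b` is chosen as the stand-in.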

"We are trying to remove this audio data requirement, which helps us move from 100 or 200 languages to 2,000," Li said. "This is the first research to target such a large number of languages, and we're the first team aiming to expand language tools to this scope."

Still in an early stage, the research has improved existing language approximation tools by a modest 5%, but the team hopes it will serve as inspiration not only for their future work but also for that of other researchers.

For Li, the work means more than making language technologies available to all. It's about cultural preservation.

"Each language is a very important factor in its culture. Each language has its own story, and if you don't try to preserve languages, those stories might be lost," Li said. "Developing this kind of speech recognition system and this tool is a step to try to preserve those languages."

More information: Xinjian Li et al, ASR2K: Speech Recognition for Around 2000 Languages without Audio, Interspeech 2022 (2022). DOI: 10.21437/Interspeech.2022-10712

Provided by Carnegie Mellon University

Citation: Research could bring automatic speech recognition to 2,000 languages (2023, January 10) retrieved 29 June 2024 from https://techxplore.com/news/2023-01-automatic-speech-recognition-languages.html

