January 12, 2023

Computer scientist helps preserve endangered language for future generations

A Chinese language at risk of extinction is being kept alive for future generations with the help of research from the University of Sheffield's Department of Computer Science.

Using natural language processing (NLP)—computational processes designed to understand speech and text as humans can—the Gyalrong language and the rich cultural history it carries are being preserved.

Gyalrong, which is spoken by a very limited population in China's Sichuan Province, is estimated to date back over 1,000 years but is now thought to have fewer than 33,000 speakers.

Most native speakers are elderly, and with many young people leaving the villages where it is spoken to seek work in urban areas, fewer and fewer people have the opportunity to learn the language from elders.

It is estimated that the decline of the language—which has little in the way of written records and is considered very difficult to learn—will become irreversible over the next few decades.

Xutan Peng, a Ph.D. student at the University's Department of Computer Science, is using his research to speed up the production of a textbook to teach the endangered language to local schoolchildren.

"Many people say language is the DNA of a culture," said Xutan.

"If the language dies the memory of this rich culture is in danger of being lost forever. Things such as old stories passed to their children and grandchildren by elders will be no more, and it will be impossible for future generations to learn the culture and traditions."

His technique takes Gyalrong texts and summarizes them in Mandarin using an automated process. As a result, language documentation work that could take a linguist months or years of immersion in the culture can be done far more rapidly.

"One way to imagine it is that there are two libraries, side by side, with the same architecture and layout but with one exclusively supplying Mandarin texts, and the other Gyalrong," said Xutan.

"If two similar books, covering similar subject matter, are in the corresponding location in both libraries and you move both buildings into one location, you can align the two to identify patterns.

"So, as long as we're able to master certain frequently used words, we can use this technique to make educated guesses to piece the jigsaw together."

You can read more about the process, known as cross-lingual word embedding (CLWE), in the papers "Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation" and "Understanding Linearity of Cross-Lingual Word Embedding Mappings." The technique used to document Gyalrong also draws on research from Xutan's earlier paper, "Summarising Historical Text in Modern Languages."
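For readers who want to see the library analogy in concrete terms, here is a minimal sketch of one common way to align two embedding spaces: an orthogonal Procrustes mapping learned from a small seed dictionary. It is an illustration only, not the method from the papers above. The vectors are random stand-ins rather than real Gyalrong or Mandarin embeddings, the second "library" is assumed to be an exactly rotated copy of the first, and the function names (`learn_orthogonal_map`, `nearest_target`) are hypothetical.

```python
import numpy as np


def learn_orthogonal_map(src_seed: np.ndarray, tgt_seed: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimising ||src_seed @ W - tgt_seed||_F."""
    # Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of X^T Y.
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def nearest_target(mapped: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """For each mapped source vector, index of the most cosine-similar target vector."""
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


rng = np.random.default_rng(0)
dim, vocab_size, n_seed = 8, 50, 20

# Synthetic stand-ins (assumption): NOT real Gyalrong or Mandarin embeddings.
tgt_vecs = rng.normal(size=(vocab_size, dim))              # the "Mandarin library"
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_vecs = tgt_vecs @ hidden_rotation.T                    # same layout, rotated

# The "frequently used words" we already know: the first n_seed word pairs.
W = learn_orthogonal_map(src_vecs[:n_seed], tgt_vecs[:n_seed])

# Educated guesses for the remaining words: map them and look up the nearest neighbour.
predicted = nearest_target(src_vecs[n_seed:] @ W, tgt_vecs)
expected = np.arange(n_seed, vocab_size)
accuracy = float(np.mean(predicted == expected))
print(f"held-out translation accuracy: {accuracy:.2f}")
```

In this toy setting the two spaces match perfectly once rotated, so the seed words are enough to recover the rest. Real embedding spaces are only approximately similar in shape, which is one reason refinement steps such as those described in the papers are studied.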

Xutan's work is already bearing fruit: a small group of Chinese schoolchildren whose families speak at least some Gyalrong are learning from, and providing feedback on, a textbook. It is hoped this first version will be followed by further volumes as more data is collected.

Its success has even caught the attention of documentary makers, who've featured the story on China Central Television.

"It's a unique and very satisfying project to work on," Xutan added.

"And although it may be limited in scope, we're making a real impact on society. It also suggests a very bright future for this type of technique in helping to preserve endangered languages."

Xutan plans to explore how the technique could be adapted to help document other endangered languages.

Dr. Mark Stevenson, a senior lecturer in the natural language processing research group, said, "Endangered languages, like Gyalrong, face a real risk of extinction. This project shows how NLP, including work carried out within Sheffield's NLP research group, can help preserve them for future generations."

Provided by University of Sheffield

Citation: Computer scientist helps preserve endangered language for future generations (2023, January 12) retrieved 5 July 2024 from https://techxplore.com/news/2023-01-scientist-endangered-language-future-generations.html

