September 14, 2023

Verbal nonsense reveals limitations of AI chatbots

The era of artificial-intelligence chatbots that seem to understand and use language the way we humans do has begun. Under the hood, these chatbots use large language models, a particular kind of neural network. But a new study shows that large language models remain vulnerable to mistaking nonsense for natural language. To a team of researchers at Columbia University, it's a flaw that might point toward ways to improve chatbot performance and help reveal how humans process language.

In a paper published online in Nature Machine Intelligence, the scientists describe how they challenged nine different language models with hundreds of pairs of sentences. For each pair, people who participated in the study picked which of the two sentences they thought was more natural, meaning that it was more likely to be read or heard in everyday life. The researchers then tested the models to see if they would rate each sentence pair the same way the humans had.
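To make the comparison concrete, here is a minimal Python sketch of how agreement between a model and human readers could be tallied. It is an illustration, not the researchers' code: the `Trial` layout, the `agreement_rate` function and the idea of passing in a `score` function are all assumptions made for this example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical trial layout: (sentence_a, sentence_b, human_choice),
# where human_choice is 0 if people preferred sentence_a and 1 otherwise.
Trial = Tuple[str, str, int]


def agreement_rate(trials: Sequence[Trial], score: Callable[[str], float]) -> float:
    """Return the fraction of pairs where the model's pick matches the human pick.

    `score` is any function that maps a sentence to a number, with higher
    meaning "more natural" according to the model being tested.
    """
    if not trials:
        raise ValueError("need at least one trial")
    hits = 0
    for sentence_a, sentence_b, human_choice in trials:
        # The model "chooses" whichever sentence it scores higher (ties go to a).
        model_choice = 0 if score(sentence_a) >= score(sentence_b) else 1
        hits += int(model_choice == human_choice)
    return hits / len(trials)
```

Any scoring function can be plugged in, from a simple word counter to a large language model's sentence probability, which is what makes a head-to-head comparison across very different models possible.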

In head-to-head tests, more sophisticated AIs based on what researchers refer to as transformer neural networks tended to perform better than simpler recurrent neural network models and statistical models that just tally the frequency of word pairs found on the internet or in online databases. But all the models made mistakes, sometimes choosing sentences that sound like nonsense to a human ear.
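The simplest kind of model mentioned above, one that tallies word pairs, can be sketched in a few lines of Python. This is a toy under stated assumptions, not the statistical model used in the study: the `BigramModel` class, the `tokenize` helper, the add-one smoothing and the three-sentence training corpus are all invented here for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    # Lowercase, drop periods, and add start/end markers around the words.
    words = sentence.lower().replace(".", "").split()
    return ["<s>"] + words + ["</s>"]


class BigramModel:
    """Word-pair frequency model with add-one smoothing (an assumed choice)."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.contexts: Counter = Counter()  # how often each word starts a pair
        self.bigrams: Counter = Counter()   # how often each word pair occurs
        for sentence in corpus:
            tokens = tokenize(sentence)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = set(self.contexts) | {"</s>"}

    def log_prob(self, sentence: str) -> float:
        tokens = tokenize(sentence)
        v = len(self.vocab) + 1  # the extra 1 reserves probability for unseen words
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            pair_count = self.bigrams[(prev, word)]
            total += math.log((pair_count + 1) / (self.contexts[prev] + v))
        return total


# Made-up training text; a real model would tally pairs from a very large corpus.
toy_corpus = [
    "that is the narrative",
    "that is the story we have been sold",
    "we have been sold",
]
model = BigramModel(toy_corpus)
familiar = model.log_prob("That is the narrative we have been sold.")
unfamiliar = model.log_prob("This is the week you have been dying.")
print(familiar > unfamiliar)
```

A model like this only knows which words tend to sit next to each other, so its preferences depend entirely on the text it was given to count. That limitation is one reason such models would be expected to trail networks that take in a whole sentence at once.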

"That some of the large language models perform as well as they do suggests that they capture something important that the simpler models are missing," said Nikolaus Kriegeskorte, Ph.D., a principal investigator at Columbia's Zuckerman Institute and a co-author on the paper. "That even the best models we studied still can be fooled by nonsense sentences shows that their computations are missing something about the way humans process language."

Consider the following sentence pair that both the human participants and the AI models assessed in the study:

That is the narrative we have been sold.

This is the week you have been dying.

People given these sentences in the study judged the first sentence as more likely to be encountered than the second. But according to BERT, one of the better models, the second sentence is more natural. GPT-2, perhaps the most widely known model, correctly identified the first sentence as more natural, matching the human judgments.

"Every model exhibited blind spots, labeling some sentences as meaningful that human participants thought were gibberish," said senior author Christopher Baldassano, Ph.D., an assistant professor of psychology at Columbia. "That should give us pause about the extent to which we want AI systems making important decisions, at least for now."

The good but imperfect performance of many models is one of the study results that most intrigues Dr. Kriegeskorte. "Understanding why that gap exists and why some models outperform others can drive progress with language models," he said.

Another key question for the research team is whether the computations in AI chatbots can inspire new scientific questions and hypotheses that could guide neuroscientists toward a better understanding of human brains. Might the ways these chatbots work point to something about the circuitry of our brains?

Further analysis of the strengths and flaws of various chatbots and their underlying algorithms could help answer that question.

"Ultimately, we are interested in understanding how people think," said Tal Golan, Ph.D., the paper's corresponding author, who this year moved from a postdoctoral position at Columbia's Zuckerman Institute to set up his own lab at Ben-Gurion University of the Negev in Israel.

"These AI tools are increasingly powerful but they process language differently from the way we do. Comparing their language understanding to ours gives us a new approach to thinking about how we think."

More information: Testing the limits of natural language models for predicting human language judgements, Nature Machine Intelligence (2023). DOI: 10.1038/s42256-023-00718-1, www.nature.com/articles/s42256-023-00718-1

Journal information: Nature Machine Intelligence

Provided by Columbia University

Citation: Verbal nonsense reveals limitations of AI chatbots (2023, September 14) retrieved 17 July 2024 from https://techxplore.com/news/2023-09-nonsense-reveals-limitations-ai-chatbots.html

