September 25, 2020 report

Bolstering AI by tapping human testers

by Peter Grad, Tech Xplore

Advances in artificial intelligence depend on continual testing of massive amounts of data. This benchmark testing allows researchers to determine how "intelligent" AI is, spot weaknesses and then develop stronger, smarter models.

The process, however, is time-consuming. When an AI system tackles a series of computer-generated tasks and eventually reaches peak performance, researchers must go back to the drawing board and design newer, more complex projects to further bolster AI's performance.

Facebook announced this week it has found a better tool to undertake this task: people. To create better and more flexible AI, it built Dynabench, a platform that puts both humans and AI models in the loop to collect data and benchmark AI.

It relies on a procedure called dynamic adversarial data collection and, as a Facebook white paper posted Thursday explains, it "radically rethinks AI benchmarking."

Humans converse with natural language processing models and try to trip them up with linguistically challenging questions. A model may stumble over difficult vocabulary or idioms, or it may misinterpret sarcasm. The more challenging the human questions, the more AI learns to navigate tricky terrain.

"It measures how easily AI systems are fooled by humans, which is a better indicator of a model's quality than current static benchmarks provide," Facebook explains. "Ultimately, this metric will better reflect the performance of AI models in the circumstances that matter most: when interacting with people, who behave and react in complex, changing ways that can't be reflected in a fixed set of data points."
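As a rough illustration of the idea, the sketch below runs one round of human-versus-model testing and reports how often the model was fooled. It is not Dynabench's code: the keyword-based `toy_sentiment_model`, the `Attempt` record, the `fooling_rate` function and the example sentences are all hypothetical stand-ins chosen to show sarcasm and idiom tripping up a simple model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One human-written example plus what the model predicted for it."""
    text: str
    human_label: str
    model_label: str

    @property
    def fooled(self) -> bool:
        # The model is "fooled" when it disagrees with the human's label.
        return self.model_label != self.human_label


def collect_round(model: Callable[[str], str],
                  examples: List[Tuple[str, str]]) -> List[Attempt]:
    """Run the model on each human-written example and record the outcome."""
    return [Attempt(text, label, model(text)) for text, label in examples]


def fooling_rate(attempts: List[Attempt]) -> float:
    """Fraction of attempts on which the model was fooled."""
    if not attempts:
        return 0.0
    return sum(a.fooled for a in attempts) / len(attempts)


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in model: calls text positive if it sees a cue word."""
    positive_words = {"great", "love", "good"}
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return "positive" if any(w in positive_words for w in words) else "negative"


# Human-written examples (invented for this sketch) with the intended label.
examples = [
    ("I love this phone.", "positive"),
    ("Oh great, it broke again.", "negative"),        # sarcasm
    ("The battery died in an hour.", "negative"),
    ("This movie is the bee's knees.", "positive"),   # idiom
]

attempts = collect_round(toy_sentiment_model, examples)
rate = fooling_rate(attempts)
hard_examples = [a.text for a in attempts if a.fooled]

print(f"Model fooled on {rate:.0%} of attempts")
print("Examples to keep for the next round:", hard_examples)
```

In this toy round the sarcastic and idiomatic sentences fool the model while the literal ones do not. In a dynamic setup, the fooling examples would presumably be kept as new test or training data for the next round, which is what separates this approach from a fixed benchmark.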

In fact, recent research has found that traditional benchmark tests are not reliable: up to two-thirds of the answers provided by natural language models were unwittingly embedded in the tests themselves, which allowed the models to merely memorize the answers.

Facebook researcher Douwe Kiela says reliance on faulty benchmarks stunts AI growth.

"You end up with a system that is better at the test than humans are but not better at the overall task," Kiela says. "It's very deceiving, because it makes it look like we're much further than we actually are."

An AI researcher at the University of Washington emphasized that current benchmark tests of AI are distorted by machine learning's ability to detect correlations in datasets that are imperceptible to humans: the machines answer the questions correctly but don't have the requisite "understanding" of meaning.

Yejin Choi says, "We are seeing a Clever Hans situation." She was referring to a horse that appeared to perform mathematical tasks. In 1907, a psychologist discovered that the horse was in fact responding to bodily cues from the trainer that tipped the animal off to the appropriate responses. Most interesting, the psychologist learned that the trainer was unaware that his involuntary cues were being read by the horse. The scenario has come to be known as the observer-expectancy effect, or the Clever Hans effect.

Likewise, Dynabench wants to ensure that AI is not merely responding to unintentional cues.

The public is invited to participate in the Dynabench project by conversing with its natural language processing models at dynabench.org.

"We want to convince the AI community that there's a better way to measure progress," Kiela says. "Hopefully, it will result in faster progress and a better understanding of why machine-learning models still fail."

More information: ai.facebook.com/blog/dynabench … king-ai-benchmarking

dynabench.org/

Citation: Bolstering AI by tapping human testers (2020, September 25) retrieved 16 July 2024 from https://techxplore.com/news/2020-09-bolstering-ai-human-testers.html

