Chatbots need to smarten up but easier said than done

artificial intelligence — Credit: CC0 Public Domain

(Tech Xplore)—AI chatbots on the same level as humans? Nah. They still cannot impress when it comes to ambiguous sentences.

Graeme Burton, The Inquirer, on Friday reported on the Winograd Schema Challenge. This is a contest that seeks to reward "technologists who can build a system that understands the kind of ambiguous sentences humans come out with all the time, but which are simple for other humans, even stupid ones, to understand."

Even the two best entrants at the event answered correctly only 48 per cent of the time, Burton said.

Will Knight in MIT Technology Review elaborated on the test results: "The best two entrants were correct 48 percent of the time, compared to 45 percent if the answers are chosen at random. To be eligible to claim the grand prize of $25,000, entrants would need to achieve at least 90 percent accuracy."

What kinds of sentences do humans grasp easily but chatbots trip over? Here are two examples that computers would have to make sense of; the sentences are ambiguous, but humans work them out without trouble because they can 'read' context. Both come from the Commonsense Reasoning ~ Winograd Schema Challenge page:

I. The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?
Answer 0: the trophy
Answer 1: the suitcase

II. The town councilors refused to give the demonstrators a permit because they feared (advocated) violence. Who feared (advocated) violence?
Answer 0: the town councilors
Answer 1: the demonstrators
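
For readers who want to see how such a test could be scored, here is a minimal Python sketch, not the contest's actual harness. The class and function names (WinogradItem, make_pair, accuracy, always_first) are made up for illustration, and the answer keys reflect the usual commonsense reading of each sentence rather than anything stated on the challenge page. The 90 percent figure is the grand-prize threshold reported by MIT Technology Review.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WinogradItem:
    """One variant of a schema: a sentence, a question, two candidate answers."""
    sentence: str
    question: str
    answers: Tuple[str, str]
    correct: int  # index into `answers` (assumed key, not given in the article)


def make_pair(sentence_tpl: str, question_tpl: str,
              answers: Tuple[str, str],
              variants: List[Tuple[str, int]]) -> List[WinogradItem]:
    """Expand one schema into its variants, one per (special word, correct index)."""
    return [
        WinogradItem(sentence_tpl.format(word=word),
                     question_tpl.format(word=word),
                     answers, correct)
        for word, correct in variants
    ]


ITEMS = (
    make_pair("The trophy would not fit in the brown suitcase because it was too {word}.",
              "What was too {word}?",
              ("the trophy", "the suitcase"),
              [("big", 0), ("small", 1)])
    + make_pair("The town councilors refused to give the demonstrators a permit "
                "because they {word} violence.",
                "Who {word} violence?",
                ("the town councilors", "the demonstrators"),
                [("feared", 0), ("advocated", 1)])
)

PRIZE_THRESHOLD = 0.90  # accuracy needed for the grand prize, per the article


def accuracy(resolver: Callable[[WinogradItem], int],
             items: List[WinogradItem]) -> float:
    """Fraction of items where the resolver picks the keyed answer."""
    hits = sum(1 for item in items if resolver(item) == item.correct)
    return hits / len(items)


def always_first(item: WinogradItem) -> int:
    """Naive baseline: always choose Answer 0, ignoring the sentence."""
    return 0


def eligible_for_prize(score: float) -> bool:
    return score >= PRIZE_THRESHOLD


if __name__ == "__main__":
    score = accuracy(always_first, ITEMS)
    print(f"always-first baseline: {score:.0%}, prize eligible: {eligible_for_prize(score)}")
```

Because swapping the special word flips the keyed answer, a system that ignores meaning and always picks the same candidate lands at half right on these four items, which is why the paired design makes shortcuts hard to exploit.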

Burton pointed to what Gary Marcus, a research psychologist at New York University and an advisor for the event, identified as the underlying problem for machines. In Burton's words, "computers lack common sense, and programming it into them is incredibly difficult."

Wait, isn't this like the Turing test, where we got to enjoy Eugene? No, this is different, said Burton. "The Challenge is deliberately designed to be different from the Turing Test, which tests only whether a human can be fooled into thinking that an AI program is human."

He said Marcus felt the language test was a more objective test of genuine AI. (Burton remarked that, with the Turing Test, "there are more than enough idiots who could be fooled into helping an AI system to pass that test.")

The Commonsense Reasoning ~ Winograd Schema Challenge page similarly made the observation that "Chatbots like Eugene Goostman can fool at least some judges into thinking it is human, but that likely reveals more about how easy it is to fool some humans, especially in the course of a short conversation, than the bot's intelligence."

Knight in MIT Technology Review said the results of the contest were presented recently at an academic conference in New York. Knight also provided background on the history of the challenge.

"Winograd Schema sentences were first highlighted as a way to gauge machine comprehension by Hector Levesque, an artificial-intelligence researcher at the University of Toronto. They are named after Terry Winograd, a pioneer in the field and a professor at Stanford University who built one of the first conversational computer programs."

Knight said the challenge was proposed in 2014 as an improvement on the Turing Test.

More information: commonsensereasoning.org/winograd.html
