September 6, 2019 weblog

AI Aristo takes science test, emerges multiple-choice superstar

by Nancy Cohen, Tech Xplore

Aristo has passed an American eighth grade science test. If you are told Aristo is an earnest kid who loves to read all he can about Faraday and plays the drums, you will say, so what, big deal.

Aristo, though, is an artificial intelligence program, and scientists would like the world to know this is a big deal: "a benchmark in AI development," as Melissa Locker called it in Fast Company.

We mean, just think about it. Cade Metz, in The New York Times, has thought about it. "Four years ago, more than 700 computer scientists competed in a contest to build artificial intelligence that could pass an eighth-grade science test. There was $80,000 in prize money on the line. They all flunked. Even the most sophisticated system couldn't do better than 60% on the test. AI couldn't match the language and logic skills that students are expected to have when they enter high school."

So who is behind the system that finally impressed in 2019? Not a bad guess: the Allen Institute for Artificial Intelligence, which is overseen by Oren Etzioni. Their system had the correct answers for more than 90 percent of questions on the test, and it doesn't stop there: the system also answered more than 80 percent of the non-diagram multiple choice questions correctly on a 12th grade science exam.

We're now looking at "significant progress in developing AI that can understand languages and mimic the logic and decision-making of humans," said Metz.

For the direct story, you should read "From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project," which is now up on arXiv. This project was a six-year mission to answer grade-school and high-school science exams.

The authors were well aware that AI had not made an impressive showing in the past at performing at the desired level. For all of AI's mastery of Go, poker and Jeopardy, they said, "the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge."

The AI took on multiple choice tests; the 90 percent number was on the exam's non-diagram, multiple choice questions.

Here is how AI2 describes its non-human whiz: "Aristo brings together machine reading and NLP, textual entailment and inference, reasoning with uncertainty, statistical techniques over large corpora, and diagram understanding to develop the first 'knowledgeable machine' about science."

The team pampered Aristo with an ulterior motive, one that had less to do with patting themselves on the back and more to do with what they could learn from Aristo's behavior on science exams, "as these questions test many of the key skills required for machine intelligence," they said.

In their paper, they explained more about good reasons to leverage standardized science exams.

"Standardized tests, in particular science exams, are a rare example of a challenge that meets these requirements. While not a full test of machine intelligence, they do explore several capabilities strongly associated with intelligence, including language understanding, reasoning, and use of common-sense knowledge. One of the most interesting and appealing aspects of science exams is their graduated and multifaceted nature; different questions explore different types of knowledge, varying substantially in difficulty. For this reason, they have been used as a compelling—and challenging—task for the field for many years."

New bragging rights: Aristo, the authors said, is the first system to achieve a score of over 90 percent on the non-diagram, multiple choice part of the New York Regents 8th Grade Science Exam.

Stephen Johnson in Big Think wrote about Aristo's inability to do diagrams. He said "the system is designed only to interpret language, meaning it can answer multiple choice questions, but not those featuring an illustration or graph."

Nonetheless, the performance showed that "modern NLP methods can result in mastery of this task."

For the institute, Aristo's feat is not taken as a perch on the mountain but rather a step in a desired direction. They call it a milestone "on the long road toward a machine that has a deep understanding of science and achieves Paul Allen's original dream of a Digital Aristotle."

More information: allenai.org/aristo/

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project, arXiv:1909.01958 [cs.CL] arxiv.org/abs/1909.01958

Citation: AI Aristo takes science test, emerges multiple-choice superstar (2019, September 6) retrieved 2 July 2024 from https://techxplore.com/news/2019-09-ai-aristo-science-emerges-multiple-choice.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
