May 10, 2024

Researchers test AI systems' ability to solve the New York Times' Connections puzzle

Can artificial intelligence (AI) match human skills for finding obscure connections between words? Researchers at NYU Tandon School of Engineering turned to the daily Connections puzzle from The New York Times to find out.

Connections gives players five attempts to group 16 words into four thematically linked sets of four, progressing from "simple" groups generally connected through straightforward definitions to "tricky" ones reflecting abstract word associations requiring unconventional thinking.
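To make the rules concrete, here is a minimal Python sketch of a Connections-style game loop. The toy puzzle words, the `check_guess` and `play` helpers, and the rule that only incorrect guesses use up one of the five attempts are assumptions for illustration. This is not the official game logic or the researchers' code.

```python
from typing import FrozenSet, Iterable, List, Optional, Set

# Hypothetical toy puzzle (not a real NYT Connections puzzle): 16 words, 4 groups of 4.
SOLUTION: List[FrozenSet[str]] = [
    frozenset({"BASS", "PIKE", "SOLE", "CARP"}),          # fish
    frozenset({"RUBY", "CRIMSON", "SCARLET", "CHERRY"}),  # shades of red
    frozenset({"SHIFT", "ENTER", "ESCAPE", "TAB"}),       # keyboard keys
    frozenset({"FOOT", "BASKET", "SNOW", "EYE"}),         # ___BALL
]


def check_guess(
    guess: Iterable[str],
    solution: List[FrozenSet[str]],
    solved: Set[FrozenSet[str]],
) -> Optional[FrozenSet[str]]:
    """Return the unsolved group that exactly matches `guess`, or None."""
    g = frozenset(guess)
    if len(g) != 4:
        raise ValueError("a guess must contain four distinct words")
    for group in solution:
        # Order does not matter: a guess is correct only if it equals a whole group.
        if group == g and group not in solved:
            return group
    return None


def play(
    guesses: Iterable[Iterable[str]],
    solution: List[FrozenSet[str]],
    attempts: int = 5,
) -> bool:
    """Replay `guesses`; return True if every group is found before attempts run out.

    Assumption: each incorrect guess uses up one attempt; correct guesses are free.
    """
    solved: Set[FrozenSet[str]] = set()
    remaining = attempts
    for guess in guesses:
        if remaining == 0 or len(solved) == len(solution):
            break  # game over: out of attempts, or puzzle already solved
        match = check_guess(guess, solution, solved)
        if match is None:
            remaining -= 1
        else:
            solved.add(match)
    return len(solved) == len(solution)
```

For example, `play(SOLUTION, SOLUTION)` returns `True`, while five wrong guesses in a row end the game before any later correct guess is counted.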

In a study that will be presented at the IEEE 2024 Conference on Games, taking place in Milan, Italy, from August 5 to 8, the researchers investigated whether modern natural language processing (NLP) systems could solve these language-based puzzles. The findings are also available on the arXiv preprint server.

With Julian Togelius, NYU Tandon Associate Professor of Computer Science and Engineering (CSE) and Director of the Game Innovation Lab, as the study's senior author, the team explored two AI approaches. The first used GPT-3.5 and the recently released GPT-4, powerful large language models (LLMs) from OpenAI that are capable of understanding and generating human-like language.

The second approach used sentence embedding models, namely BERT, RoBERTa, MPNet, and MiniLM, which encode semantic information as vector representations but lack the full language understanding and generation capabilities of LLMs.
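One simple way an embedding-based solver could work is to score each candidate group of four by how similar its word vectors are to one another, then greedily commit to the best-scoring group. The Python sketch below illustrates that idea under stated assumptions: the hand-made 4-dimensional `TOY_EMBEDDINGS` stand in for vectors a model such as BERT or MPNet would produce, and the greedy mean-cosine strategy is an illustrative choice, not necessarily the method used in the study.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def group_score(words: Sequence[str], emb: Dict[str, Vector]) -> float:
    """Mean pairwise cosine similarity among the words in a candidate group."""
    pairs = list(combinations(words, 2))
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)


def greedy_groups(emb: Dict[str, Vector], size: int = 4) -> List[Set[str]]:
    """Repeatedly take the most internally similar group of `size` remaining words."""
    if len(emb) % size != 0:
        raise ValueError("number of words must be a multiple of the group size")
    remaining = sorted(emb)
    groups: List[Set[str]] = []
    while remaining:
        # Exhaustive search over candidate groups; C(16, 4) = 1820 for a full puzzle.
        best = max(combinations(remaining, size), key=lambda c: group_score(c, emb))
        groups.append(set(best))
        remaining = [w for w in remaining if w not in best]
    return groups


# Hypothetical hand-made 4-d vectors; a real system would get these from an embedding model.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "BASS": (1.0, 0.1, 0.0, 0.0),
    "PIKE": (0.9, 0.2, 0.0, 0.0),
    "SOLE": (1.0, 0.0, 0.1, 0.0),
    "CARP": (0.95, 0.1, 0.05, 0.0),
    "SHIFT": (0.0, 0.0, 1.0, 0.1),
    "ENTER": (0.1, 0.0, 0.9, 0.0),
    "ESCAPE": (0.0, 0.1, 1.0, 0.0),
    "TAB": (0.0, 0.0, 0.95, 0.2),
}
```

On these toy vectors, `greedy_groups(TOY_EMBEDDINGS)` separates the fish from the keyboard keys. Because greedy selection commits to one group at a time and each word gets a single vector, a red-herring word (BASS could also fit a music category) can lock an early group into an error that this method cannot undo.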

The results showed that while all the AI systems could solve some of the Connections puzzles, the task remained challenging overall. GPT-4 solved about 29% of puzzles, significantly better than the embedding methods and GPT-3.5, but far from mastering the game. Notably, as with human players, the difficulty the models experienced aligned with the puzzle's own categorization of groups from "simple" to "tricky."

"LLMs are becoming increasingly widespread, and investigating where they fail in the context of the Connections puzzle can reveal limitations in how they process semantic information," said Graham Todd, Ph.D. student in the Game Innovation Lab who is the study's lead author.

The researchers found that explicitly prompting GPT-4 to reason through the puzzles step-by-step significantly boosted its performance to just over 39% of puzzles solved.

"Our research confirms prior work showing this sort of 'chain-of-thought' prompting can make language models think in more structured ways," said Timothy Merino, Ph.D. student in the Game Innovation Lab who is an author on the study. "Asking the language models to reason about the tasks that they're accomplishing helps them perform better."
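The study's exact prompts are not reproduced here. As a hedged illustration of the idea, the Python sketch below builds a plain prompt and a chain-of-thought variant for a Connections-style puzzle and extracts the final groups from a reply. The prompt wording, the `GROUP:` answer format, and the helpers `build_prompt` and `parse_groups` are hypothetical, and no language model is actually called.

```python
from typing import FrozenSet, List, Sequence

ANSWER_TAG = "GROUP:"  # hypothetical output format requested from the model


def build_prompt(words: Sequence[str], chain_of_thought: bool = False) -> str:
    """Build a Connections prompt, optionally asking for step-by-step reasoning."""
    lines = [
        f"Here are {len(words)} words: {', '.join(words)}.",
        "Sort them into groups of four words that share a hidden connection.",
    ]
    if chain_of_thought:
        # The chain-of-thought variant asks the model to reason before answering.
        lines.append(
            "Think step by step: list possible connections for the words "
            "before committing to a final answer."
        )
    lines.append(f"Write each final group on its own line as: {ANSWER_TAG} w1, w2, w3, w4")
    return "\n".join(lines)


def parse_groups(response: str) -> List[FrozenSet[str]]:
    """Extract final groups from a reply, ignoring any free-form reasoning text."""
    groups = []
    for line in response.splitlines():
        line = line.strip()
        if line.startswith(ANSWER_TAG):
            words = line[len(ANSWER_TAG):].split(",")
            groups.append(frozenset(w.strip().upper() for w in words))
    return groups
```

Requesting a fixed answer format is one way to handle chain-of-thought replies, which mix reasoning with the final groups; the parser simply skips every line that is not a tagged answer.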

Beyond benchmarking AI capabilities, the researchers are exploring whether models like GPT-4 could assist humans in generating novel word puzzles from scratch. This creative task could push the boundaries of how machine learning systems represent concepts and make contextual inferences.

The researchers conducted their experiments with a dataset of 250 puzzles from an online archive representing daily puzzles from June 12, 2023, to February 16, 2024.

Along with Togelius, Todd and Merino, Sam Earle, a Ph.D. student in the Game Innovation Lab, was also part of the research team. The study contributes to Togelius' body of work that uses AI to improve games and vice versa. Togelius is the author of the 2019 book Playing Smart: On Games, Intelligence, and Artificial Intelligence.

More information: Graham Todd et al, Missed Connections: Lateral Thinking Puzzles for Large Language Models, arXiv (2024). DOI: 10.48550/arxiv.2404.11730

Journal information: arXiv

Provided by NYU Tandon School of Engineering

Citation: Researchers test AI systems' ability to solve the New York Times' connections puzzle (2024, May 10) retrieved 29 June 2024 from https://techxplore.com/news/2024-05-ai-ability-york-puzzle.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
