May 27, 2024 report

Scientists find ChatGPT is inaccurate when answering computer programming questions

by Bob Yirka , Tech Xplore

A team of computer scientists at Purdue University has found that the popular LLM, ChatGPT, is wildly inaccurate when responding to computer programming questions. In their paper published as part of the Proceedings of the CHI Conference on Human Factors in Computing Systems, the group describes how they pulled questions from the StackOverflow website and posed them to ChatGPT and then measured its degree of accuracy when responding.

The team also presented their findings at the Conference on Human Factors in Computing Systems (CHI 2024) held May 11–16.

ChatGPT and other LLMs have been in the news a lot recently—since such apps have been made available to the general public, they have become very popular. Unfortunately, along with a treasure trove of useful information included in many of the responses given by such apps, there are a host of inaccuracies. Even more unfortunate is that it is not always clear when the apps are giving answers that are wrong.

In this new study, the team at Purdue noted that many programming students have begun using LLMs to not only help write code for programming assignments, but to answer questions related to programming. As an example, a student could ask ChatGPT, what is the difference between a bubble sort and merge sort, or, more popularly, what is recursion?

To find out how accurate LLMs are in answering such questions, the research team focused their efforts on just one of them—ChatGPT. To find questions to use for testing the app, the researchers used questions freely available on the StackOverflow website—it is a site that has been built to help programmers learn more about programming by working with others in their field of interest. On one part of the site, users can post questions that will be answered by others who know the answers.

The research team used 517 questions found on the site and then measured how often ChatGPT gave the correct answer. The study primarily involved the use of the GPT-3.5 model available in the free version of ChatGPT for the manual responses to 517 questions and utilized the GPT-3.5-turbo API for the larger automated processing of 2,000 additional questions. Data collection took place in March 2023. Sadly, it was correct just 52% of the time. They also found the answers tended to be more verbose than would be the case when a human expert was asked the same question. Researchers also compared the performance with the GPT-4 model. The GPT-4 model performed slightly better than GPT-3.5 by correctly answering 6 out of 21 randomly selected questions that GPT-3.5 had answered incorrectly, but it still generated a majority of incorrect responses (15 out of 21).

Alarmingly, the team found that user study participants preferred the answers given by ChatGPT 35% of the time. The researchers also found that the same users reading the answers given by ChatGPT quite often did not catch the mistakes that were made—they overlooked wrong answers 39% of the time.

More information: Samia Kabir et al, Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions, Proceedings of the CHI Conference on Human Factors in Computing Systems (2024). DOI: 10.1145/3613904.3642596

Citation: Scientists find ChatGPT is inaccurate when answering computer programming questions (2024, May 27) retrieved 29 June 2024 from https://techxplore.com/news/2024-05-scientists-chatgpt-inaccurate.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Study shows ChatGPT performs well in answering genetic testing questions

194 shares

Feedback to editors

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Jun 28, 2024

Researchers develop the fastest possible flow algorithm

Jun 28, 2024

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Jun 28, 2024

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Jun 27, 2024

Wireless receiver blocks interference for better mobile device performance

Jun 27, 2024

Researchers successfully develop domestic 6G antenna measurement system

Jun 27, 2024

Research shows how common plastics could passively cool and heat buildings with the seasons

Jun 27, 2024

Researchers suggest smart solution to harness waste heat from industry

Jun 27, 2024

Robotic hand with tactile fingertips achieves new dexterity feat

Jun 27, 2024

Help or hindrance? ER robots have potential to aid health care workers

Jun 27, 2024

Load comments (5)

Scientists find ChatGPT is inaccurate when answering computer programming questions

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Researchers develop the fastest possible flow algorithm

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Wireless receiver blocks interference for better mobile device performance

Researchers successfully develop domestic 6G antenna measurement system

Research shows how common plastics could passively cool and heat buildings with the seasons

Researchers suggest smart solution to harness waste heat from industry

Robotic hand with tactile fingertips achieves new dexterity feat

Help or hindrance? ER robots have potential to aid health care workers

Study shows ChatGPT performs well in answering genetic testing questions

DeepMind develops SAFE, an AI-based app that can fact-check LLMs

ChatGPT shows poor performance in answering drug-related questions

Despite fails, ChatGPT wins showdown against Stack Overflow

Study finds AI language model failed to produce appropriate questions, answers for medical school exam

Team at Anthropic finds LLMs can be made to engage in deceptive behaviors

Researchers develop the fastest possible flow algorithm

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Robotic hand with tactile fingertips achieves new dexterity feat

Sony introduces AI for single-instrument accompaniment generation in music production

Mechanical computer relies on kirigami cubes, not electronics

New work explores optimal circumstances for reaching a common goal with humanoid robots

Phys.org

Medical Xpress

Science X

Scientists find ChatGPT is inaccurate when answering computer programming questions

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Researchers develop the fastest possible flow algorithm

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Wireless receiver blocks interference for better mobile device performance

Researchers successfully develop domestic 6G antenna measurement system

Research shows how common plastics could passively cool and heat buildings with the seasons

Researchers suggest smart solution to harness waste heat from industry

Robotic hand with tactile fingertips achieves new dexterity feat

Help or hindrance? ER robots have potential to aid health care workers

Related Stories

Study shows ChatGPT performs well in answering genetic testing questions

DeepMind develops SAFE, an AI-based app that can fact-check LLMs

ChatGPT shows poor performance in answering drug-related questions

Despite fails, ChatGPT wins showdown against Stack Overflow

Study finds AI language model failed to produce appropriate questions, answers for medical school exam

Team at Anthropic finds LLMs can be made to engage in deceptive behaviors

Recommended for you

Researchers develop the fastest possible flow algorithm

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Robotic hand with tactile fingertips achieves new dexterity feat

Sony introduces AI for single-instrument accompaniment generation in music production

Mechanical computer relies on kirigami cubes, not electronics

New work explores optimal circumstances for reaching a common goal with humanoid robots

Your Privacy