Despite fails, ChatGPT wins showdown against Stack Overflow
In the early 2000s, computer hobbyists could walk into any of nearly 700 Barnes and Noble bookstores and find aisle after aisle filled with manuals on programming, coding, design, the internet and virtually any other topic even remotely related to computing. Scores of magazines supplemented this sanctuary for computer addicts.
Those rows have all but disappeared, a casualty of the way users now obtain information. Digital books and internet resources have largely replaced the stacks of printed manuals.
One key resource that has contributed to the decline is Stack Overflow, a highly respected online community of 20 million registered users who share advice and solutions to questions on all aspects of programming. Since its inception in 2008, participants have asked more than 24 million questions and received more than 35 million answers.
But the much-admired site has taken a hit this year, a victim of the spiraling popularity of chatbots such as ChatGPT. Stack Overflow nonetheless remains an indispensable resource for many.
An analytics firm reported in May that Stack Overflow had suffered several straight months of drops in traffic averaging 6% since the first of the year. In April, there was a 17.7% drop from March numbers.
Are defectors flocking to ChatGPT making a wise move?
According to a new study from Purdue University, "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions," it may not be the best decision.
Researchers found what many already suspected: A significant number of ChatGPT's answers to programming questions were inaccurate or flat-out wrong. Ironically, however, when subjects were asked to compare responses from Stack Overflow and ChatGPT, 40% said they preferred ChatGPT's responses. Why? Because of the "comprehensiveness" and persuasive "articulate language style" of ChatGPT's answers.
Researchers said that 52% of 512 ChatGPT responses to questions were incorrect. Disconcertingly, among the responses preferred by test participants, 77% were wrong.
Even when ChatGPT's responses were blatantly wrong, 2 out of 12 subjects still preferred ChatGPT's answers over Stack Overflow's.
According to Samia Kabir, one of the paper's authors, "Participants ignored the incorrectness when they found ChatGPT's answer to be insightful. The way ChatGPT confidently conveys insightful [even if incorrect] information gains user trust, which causes them to prefer the incorrect answer."
"It is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct," Kabir said.
The researchers noted that large language models have the potential to upend old ways of obtaining programming information. Users seeking help obtain invaluable feedback from a community of experts on sites such as Stack Overflow. But those sites often require a wait of hours or days before solutions are obtained.
ChatGPT can deliver complex coding instructions in seconds, and it will engage in human-like conversation to explore questions in depth.
But knowing the capacity of chatbots to acquire and propagate erroneous information "introduces risks for non-expert end-users who lack the means to verify factual inconsistencies," Kabir said.
Concern over the potential to contaminate informational pools with false data led Stack Overflow earlier this year to bar any response generated by ChatGPT.
The Purdue researchers termed the preponderance of incorrect answers "alarming." They urged ChatGPT to go beyond the brief disclaimer it posts on each response advising users of the potential for error and specify a level of incorrectness and uncertainty.
"It is imperative to investigate how to communicate the level of incorrectness of the answers," the researchers said in their report, published on the preprint server arXiv on Aug. 10.
"AI is most effective when supervised by humans," the report adds. "Therefore, we call for the responsible use of ChatGPT to increase human-AI productivity."
More information: Samia Kabir et al, Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions, arXiv (2023). DOI: 10.48550/arxiv.2308.02312
© 2023 Science X Network