Did you hear the one about ChatGPT telling jokes? Study highlights challenges of humor for large language models
People are feverishly embracing ChatGPT, exploring its uses in academic research, coding, audience research, customer support, e-mail communication, document summarization and job searches.
All Sophie Jentzsch and Kristian Kersting wanted were a few good laughs.
But they weren't joking around. The two researchers, from the Institute for Software Technology at German Aerospace Center and Technical University Darmstadt, set out to find just how well large language models can create and tell a joke.
They discovered that despite a few funny—if corny—"dad jokes," the GPT-3.5-based model was not particularly original and usually relied on a small number of frequently recycled quips.
In 1,008 trials in which ChatGPT was asked to tell a joke, more than 90% of the responses repeated one of the same 25 jokes. The top four jokes alone accounted for more than half of all responses.
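Repetition statistics like these boil down to counting duplicate strings. A minimal sketch of how such a tally might be computed, using Python's `collections.Counter` over a hypothetical handful of responses standing in for the study's 1,008 real outputs:

```python
from collections import Counter

# Hypothetical sample of collected joke responses; the study gathered
# 1,008 real ChatGPT outputs, which are not reproduced here.
responses = [
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "Why did the tomato turn red? Because it saw the salad dressing.",
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "Why was the math book sad? Because it had too many problems.",
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
]

counts = Counter(responses)
total = len(responses)

# Report what share of all responses each frequent joke covers
for joke, n in counts.most_common(3):
    print(f"{n}/{total} ({n / total:.0%}): {joke}")
```

With the real data, the same `most_common` ranking would reveal both the 25 distinct jokes and the skewed distribution the researchers observed.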
The study was reported in a paper titled "ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models," published on the pre-print server arXiv on June 7.
"ChatGPT has not solved computational humor yet, but it can be a big leap toward 'funny' machines," the authors said.
The top three jokes were:
Q: Why did the scarecrow win an award?
A: Because he was outstanding in his field.
Q: Why did the tomato turn red?
A: Because it saw the salad dressing.
Q: Why was the math book sad?
A: Because it had too many problems.
Before completing their study, the authors say they suspected that some of ChatGPT's joke output was not generated by the model at all but drawn from a hard-coded, pre-defined list. Sampling from such a list, however, would be expected to produce a fairly even distribution of jokes. The uneven distribution observed in their study suggests the jokes were not simply hard-coded.
The researchers said that ChatGPT displayed an understanding of wordplay and double meanings. They said it often relied on mixing elements from jokes it already knew.
"Nevertheless," they said, "the small number of repeating samples indicates a limited versatility in ChatGPT's response pattern."
Without access to large language model training data, it is difficult to confirm whether jokes were hard-coded.
"The model is able to correctly identify, reproduce and explain puns that fit into the learned pattern, but it fails to meet puns of other kinds, resulting in a limited reflection of humor," the authors concluded.
When asked to explain a joke, ChatGPT sometimes offered insightful responses.
For example, ChatGPT was asked to explain why the "Why did the chicken cross the road? To get to the other side" joke was funny.
The response was: "The humor in this joke lies in the unexpected and straightforward punchline. When someone hears the setup of the joke, they are likely expecting a clever or surprising answer to why the chicken crossed the road. However, the punchline is just a simple and obvious explanation."
In other instances, however, the model strained to make sense.
"ChatGPT does not only come up with a convincing explanation for valid jokes," the researchers said, but for invalid ones as well.
"The system seems unable to … admit that it cannot identify any pun. Instead, it creates a fictional but convincing-sounding explanation," the researchers said.
ChatGPT "cannot yet confidently create intentionally funny original content," the authors concluded.
But Bing Chat, which runs on GPT-4, can take a joke about itself. We asked it to tell us one.
"Why did Bing cross the road?" we asked.
It replied, "To get to the other search engine!"
More information: Sophie Jentzsch et al, ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models, arXiv (2023). DOI: 10.48550/arxiv.2306.04563
© 2023 Science X Network