January 9, 2024

ChatGPT poem regurgitation raises ethical questions

Ask ChatGPT to find a well-known poem and it will probably regurgitate the entire text verbatim—regardless of copyright law—according to a new study by Cornell researchers.

The study showed that ChatGPT, a large language model that generates text on demand, was capable of "memorizing" poems, especially famous ones commonly found online. The findings pose ethical questions about how ChatGPT and other proprietary artificial intelligence models are trained—likely using data scraped from the internet, researchers said.

"It's generally not good for large language models to memorize large chunks of text, in part because it's a privacy concern," said first author Lyra D'Souza, a former computer science major and summer research assistant. "We don't know what they're trained on, and a lot of times, private companies can train proprietary models on our private data."

D'Souza presented this work, "The Chatbot and the Canon: Poetry Memorization in LLMs," at the Computational Humanities Research Conference in Paris.

"We chose poems for a few reasons," said senior author David Mimno, associate professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. "They're short enough to fit in the context size of a language model. Their status is complicated: Many of the poems we studied are technically under copyright, but they're also widely available from reputable sources like the Poetry Foundation. And they're not just any document. Poems are supposed to be surprising, they're supposed to mean something to people. In some sense, poems want to be memorized."

ChatGPT and other large language models are trained to generate text by predicting the most likely next word over and over again based on their training data, which is mostly webpages. Memorization can occur when that training data includes duplicated passages, because the duplication reinforces that specific sequence of words. After being exposed to the same poem repeatedly, for example, the model defaults to reproducing the poem's words verbatim.
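As a rough illustration of why duplication encourages memorization, here is a minimal sketch of a toy next-word predictor. It is not how ChatGPT works internally: it is a simple bigram counter, and the "poem" line, the corpus and the function names are all invented for this example. It only shows that when one passage is repeated many times in the training text, always picking the most likely next word walks straight back through that passage.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for doc in corpus:
        words = doc.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_words=10):
    """Greedily emit the most frequent next word, over and over."""
    out = [start]
    for _ in range(max_words - 1):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation: stop
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical training data: one made-up "poem" line duplicated five times,
# plus a few unrelated sentences that share some of its words.
poem_line = "the owl sang softly under a pale moon"
corpus = [poem_line] * 5 + [
    "the cat sat on a mat",
    "the owl hooted at night",
    "a pale horse ran",
]

model = train_bigram(corpus)
print(generate(model, "the"))  # prints "the owl sang softly under a pale moon"
```

Because the duplicated line outnumbers every competing continuation, the toy model reproduces it word for word from a one-word prompt.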

D'Souza tested the poem-retrieving capabilities of ChatGPT and three other language models: PaLM from Google AI; Pythia from the non-profit AI research institute EleutherAI; and GPT-2, an earlier version of the model that ultimately yielded ChatGPT, both developed by OpenAI. She assembled a set of 240 poems by 60 American poets spanning different time periods, races, genders and levels of fame, and fed the models prompts asking for the poems' text.

ChatGPT successfully retrieved 72 of the 240 poems, while PaLM came up with only 10. Neither Pythia nor GPT-2 could produce entire poems. Pythia responded with the same phrase over and over again, while GPT-2 produced nonsense text, researchers found.
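To put those counts in proportion, and to show one way a "retrieved" poem could be checked, here is a small sketch. The article does not describe how the researchers scored a match, so the `similarity` function (based on Python's standard `difflib`) is purely an assumption for illustration. Only the counts 72, 10 and 240 come from the study as reported above.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())

def similarity(output, reference):
    """Return a 0-to-1 similarity score between a model's output and the real poem.

    Assumption: a character-level ratio is a reasonable stand-in for
    "verbatim"; the study's actual scoring method is not given in the article.
    """
    return SequenceMatcher(None, normalize(output), normalize(reference)).ratio()

def retrieval_rate(retrieved, total):
    """Fraction of poems a model reproduced."""
    return retrieved / total

# Counts reported in the article.
print(f"ChatGPT: {retrieval_rate(72, 240):.1%}")  # ChatGPT: 30.0%
print(f"PaLM: {retrieval_rate(10, 240):.1%}")     # PaLM: 4.2%
```

In other words, ChatGPT reproduced a little under a third of the poems, while PaLM reproduced roughly one in 24.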

Inclusion in the poetry canon was the most important factor in whether the chatbot had memorized a poem, while the poet's race, gender and era were not as significant. The most reliable predictor of memorization was whether the poem had appeared in a "Norton Anthology of Poetry," specifically the 1983 edition.

D'Souza also noticed that ChatGPT's responses changed over time as the model evolved. When she first queried the chatbot in February 2023, it could not admit that it didn't know a poem; instead, it would fabricate one or recycle a poem by another author. By July 2023, if ChatGPT didn't know the poem, it would ask whether the poem even existed, putting the blame on the user.

That troubled D'Souza. "As we have more powerful tools that tell us they know everything, it becomes even more important to make sure we're not just learning from one source," she said.

Additionally, in February, ChatGPT applied no copyright-related limits. By July, it would sometimes respond that it couldn't produce a copyrighted poem. However, it would usually reproduce the poem if asked again, D'Souza found.

This study looked only at American poets, but the next step will be to see how chatbots respond to requests in different languages and whether factors such as the length, meter and rhyming pattern of a poem make it more or less likely to be memorized, D'Souza said.

"ChatGPT is a really powerful new tool that's probably going to be part of our lives moving forward," she said. "Figuring out how to use it responsibly and use it transparently is going to be really important."

Provided by Cornell University

Citation: ChatGPT poem regurgitation raises ethical questions (2024, January 9) retrieved 29 June 2024 from https://techxplore.com/news/2024-01-chatgpt-poem-regurgitation-ethical.html

