July 2, 2024

AI is learning from what you said on Reddit, Stack Overflow or Facebook. Are you OK with that?

by Matt O'Brien

Post a comment on Reddit, answer coding questions on Stack Overflow, edit a Wikipedia entry or share a baby photo on your public Facebook or Instagram feed and you are also helping to train the next generation of artificial intelligence.

Not everyone is OK with that—especially as the same online forums where they've spent years contributing are increasingly flooded with AI-generated commentary mimicking what real humans might say.

Some longtime users have tried to delete their past contributions or rewrite them into gibberish, but the protests haven't had much effect. A handful of governments—including Brazil's privacy regulator on Tuesday—have also tried to step in.

"A more significant portion of the population just kind of feels helpless," said Reddit volunteer moderator Sarah Gilbert, who also studies online communities at Cornell University. "There's nowhere to go except just completely going offline or not contributing in ways that bring value to them and value to others."

Platforms are responding—with mixed results. Take Stack Overflow, the popular hub for computer programming tips. First, it banned ChatGPT-written responses due to frequent errors, but now it's partnering with AI chatbot developers and has punished some of its own users who tried to erase their past contributions in protest.

It's one of a number of social media platforms grappling with user wariness—and occasional revolts—as they try to adapt to the changes brought by generative AI.

Software developer Andy Rotering of Bloomington, Minnesota, has used Stack Overflow daily for 15 years and said he worries the company "could be inadvertently hurting its greatest resource"—the community of contributors who've donated time to help other programmers.

"Keeping contributors incentivized to provide commentary should be paramount," he said.

Stack Overflow CEO Prashanth Chandrasekar said the company is trying to balance rising demand for instant chatbot-generated coding assistance with the desire for a community "knowledge base" where people still want to post and "get recognized" for what they've contributed.

"Fast forward five years—there's going to be all sorts of machine-generated content on the web," he said in an interview. "There's going to be very few places where there's truly authentic, original human thought. And we're one of those places."

Chandrasekar readily likens Stack Overflow's challenges to one of the "case studies" he learned about at Harvard Business School, of how a business survives, or doesn't, after a disruptive technological change.

For more than a decade, users typically landed on Stack Overflow after typing a coding question into Google, then found the answer and copied and pasted it. The answers they were most likely to see came from volunteers who'd built up points measuring their credibility, which in some cases could help land them a job.

Now programmers can simply ask an AI chatbot—some of which are already trained on everything ever posted to Stack Overflow—and it can instantly spit out an answer.

ChatGPT's debut in late 2022 threatened to put Stack Overflow out of business. So Chandrasekar carved out a special 40-person team at the company to race out the launch of its own specialized AI chatbot, called Overflow AI. Then, the company made deals with Google and ChatGPT maker OpenAI, enabling the AI developers to tap into Stack Overflow's question-and-answer archive to further improve their AI large language models.

That kind of strategy makes sense but may have come too late, said Maria Roche, an assistant professor at Harvard Business School. "I'm surprised that Stack Overflow wasn't working on this earlier," she said.

When some Stack Overflow users tried to delete their past comments after the OpenAI partnership was announced, the company responded by suspending their accounts due to terms that make all contributions "perpetually and irrevocably licensed to Stack Overflow."

"We quickly addressed it and said, 'Look, that's not acceptable behavior'," said Chandrasekar, describing the protesters as a small minority in the "low hundreds" of the platform's 100 million users.

Brazil's national data protection authority on Tuesday took action to ban social media giant Meta Platforms from training its AI models on the Facebook and Instagram posts of Brazilians. It established a daily fine of 50,000 reais ($8,820) for non-compliance.

Meta in a statement called it a "step backwards for innovation" and said it has been more transparent than many industry counterparts doing similar AI training on public content, and that its practices comply with Brazilian laws.

Meta has also encountered resistance in Europe, where it recently put on hold its plans to start feeding people's public posts into training AI systems—which was supposed to start last week. In the U.S., where there's no national law protecting online privacy, such training is already likely happening.

"The vast majority of people just have no idea that their data is being used," Gilbert said.

Reddit has taken a different approach—partnering with AI developers like OpenAI and Google while also making clear that content can't be taken in bulk without the platform's approval by commercial entities "with no regard for user rights or privacy." The deals helped bring Reddit the money it needed to debut on Wall Street in March, with investors pushing the value of the company close to $9 billion seconds after it began trading on the New York Stock Exchange.

Reddit hasn't tried to punish users who protested—nor could it easily do so given how much say voluntary moderators have on what happens in their specialty forums known as subreddits. But what worries Gilbert, who helps moderate the "AskHistorians" subreddit, is the increasing flow of AI-generated commentary that moderators must decide whether to allow or ban.

"People come to Reddit because they want to talk to people, they don't want to talk to bots," Gilbert said. "There's apps where they can talk to bots if they want to. But historically Reddit has been for connecting with humans."

She said it's ironic that the AI-generated content threatening Reddit was sourced from the comments of millions of human Redditors, and "there's a real risk that eventually it could end up pushing people out."

Citation: AI is learning from what you said on Reddit, Stack Overflow or Facebook. Are you OK with that? (2024, July 2) retrieved 2 July 2024 from https://techxplore.com/news/2024-07-ai-reddit-stack-facebook.html

