September 21, 2023

In the future, we'll see fewer generic AI chatbots like ChatGPT and more specialized ones that are tailored to our needs

AI technology is developing rapidly. ChatGPT has become the fastest-growing online service in history. Google and Microsoft are integrating generative AI into their products. And world leaders are excitedly embracing AI as a tool for economic growth.

As we move beyond ChatGPT and Bard, we're likely to see AI chatbots become less generic and more specialized. AIs are limited by the data they are exposed to in order to make them better at what they do, in this case mimicking human speech and providing users with useful answers.

Training often casts the net wide, with AI systems absorbing thousands of books and web pages. But a more select, focused set of training data could make AI chatbots even more useful for people working in particular industries or living in certain areas.

The value of data

An important factor in this evolution will be the growing costs of amassing training data for advanced large language models (LLMs), the type of AI that powers ChatGPT. Companies know data is valuable: Meta and Google make billions from selling adverts targeted with user data. But the value of data is now changing. Meta and Google sell data "insights"; they invest in analytics to transform many data points into predictions about users.

Data is valuable to OpenAI, the developer of ChatGPT, in a subtly different way. Imagine a tweet: "The cat sat on the mat." This tweet is not valuable for targeted advertisers. It says little about a user or their interests. Maybe, at a push, it could suggest interest in cat food and Dr. Seuss.

But for OpenAI, which is building LLMs to produce human-like language, this tweet is valuable as an example of how human language works. A single tweet cannot teach an AI to construct sentences, but billions of tweets, blog posts, Wikipedia entries, and so on, certainly can. For instance, the advanced LLM GPT-4 was probably built using data scraped from X (formerly Twitter), Reddit, Wikipedia and beyond.

The AI revolution is changing the business model for data-rich organizations. Companies like Meta and Google have been investing in AI research and development for several years as they try to exploit their data resources.

Organizations like X and Reddit have begun to charge third parties for API access, the system used to scrape data from these websites. Data scraping costs companies like X money, as they must spend more on computing power to fulfill data queries.

Moving forward, as organizations like OpenAI look to build more powerful versions of their LLMs, they will face greater costs for getting hold of data. One solution to this problem might be synthetic data.

Going synthetic

Synthetic data is created from scratch by AI systems to train more advanced AI systems so that they improve. It is designed to perform the same task as real training data but is generated by AI.

It's a new idea, but it faces many problems. Good synthetic data needs to be different enough from the original data it's based on in order to tell the model something new, while similar enough to tell it something accurate. This can be difficult to achieve. Where synthetic data is just convincing copies of real-world data, the resulting AI models may struggle with creativity, entrenching existing biases.

Another problem is the "Hapsburg AI" problem. This suggests that training AI on synthetic data will cause a decline in the effectiveness of these systems—hence the analogy using the infamous inbreeding of the Hapsburg royal family. Some studies suggest this is already happening with systems like ChatGPT.
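The loss of diversity behind the "Hapsburg AI" analogy can be illustrated with a toy simulation. The sketch below is only an analogy, not the method used in the studies mentioned above: it assumes each "generation" of a model can do nothing more than reproduce items sampled from the previous generation's output. The function name `resample_generations` and the phrase corpus are hypothetical and exist purely for illustration.

```python
import random


def resample_generations(corpus, generations, seed=0):
    """Return the number of distinct items present after each generation.

    Toy assumption: each new generation is built only by sampling, with
    replacement, from the previous generation's data. No real data is
    ever added back in.
    """
    rng = random.Random(seed)
    data = list(corpus)
    distinct_counts = [len(set(data))]
    for _ in range(generations):
        # The next "model" can only reproduce what the previous one produced,
        # so any item that is not sampled is lost for good.
        data = [rng.choice(data) for _ in range(len(data))]
        distinct_counts.append(len(set(data)))
    return distinct_counts


# A hypothetical corpus of 100 distinct phrases standing in for real data.
corpus = [f"phrase-{i}" for i in range(100)]
counts = resample_generations(corpus, generations=20)
print(counts)
```

Because every generation draws only from the one before it, the number of distinct phrases can never rise and in practice falls steadily. Real language models are far more complex, but the sketch shows why a steady diet of synthetic data with no fresh human input is a concern.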

One reason ChatGPT is so good is that it uses reinforcement learning from human feedback (RLHF), where people rate its outputs in terms of accuracy. If synthetic data generated by an AI has inaccuracies, AI models trained on this data will themselves be inaccurate. So the demand for human feedback to correct these inaccuracies is likely to increase.

However, while most people would be able to say whether a sentence is grammatically accurate, fewer would be able to comment on its factual accuracy—especially when the output is technical or specialized. Inaccurate outputs on specialist topics are less likely to be caught by RLHF. If synthetic data means there are more inaccuracies to catch, the quality of general-purpose LLMs may stall or decline even as these models "learn" more.

Little language models

These problems help explain some emerging trends in AI. Google engineers have revealed that there is little preventing third parties from recreating LLMs like GPT-3 or Google's LaMDA AI. Many organizations could build their own internal AI systems, using their own specialized data, for their own objectives. These will probably be more valuable for these organizations than ChatGPT in the long run.

Recently, the Japanese government noted that developing a Japan-centric version of ChatGPT is potentially worthwhile for its AI strategy, as ChatGPT is not sufficiently representative of Japan. The software company SAP has recently launched its AI "roadmap" to offer AI development capabilities to professional organizations. This will make it easier for companies to build their own bespoke versions of ChatGPT.

Consultancies such as McKinsey and KPMG are exploring the training of AI models for "specific purposes". Guides on how to create private, personal versions of ChatGPT can be readily found online. Open source systems, such as GPT4All, already exist.

As development challenges mount for generic LLMs, coupled with potential regulatory hurdles, it is possible that the future of AI will be many specific little language models rather than large ones. Little language models might struggle if they are trained on less data than systems such as GPT-4.

But they might also have an advantage in terms of RLHF, as little language models are likely to be developed for specific purposes. Employees who have expert knowledge of their organization and its objectives may provide much more valuable feedback to such AI systems, compared with generic feedback for a generic AI system. This may overcome the disadvantages of less data.

Provided by The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: In the future, we'll see fewer generic AI chatbots like ChatGPT and more specialized ones that are tailored to our needs (2023, September 21) retrieved 17 July 2024 from https://techxplore.com/news/2023-09-future-generic-ai-chatbots-chatgpt.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
