May 17, 2024 report

Researchers find LLMs are easy to manipulate into giving harmful information

by Bob Yirka, Tech Xplore

Adversarial attack setup used to jailbreak speech language models trained for the Spoken QA task. The striped block indicates an optional countermeasure module. Credit: *arXiv* (2024). DOI: 10.48550/arxiv.2405.08317

A team of AI researchers at AWS AI Labs, Amazon, has found that most, if not all, publicly available Large Language Models (LLMs) can be easily tricked into revealing dangerous or unethical information.

In their paper posted on the arXiv preprint server, the group describes how they discovered that LLMs, such as ChatGPT, can be tricked into giving answers that their makers intended to block, and offers ways to combat the problem.

Soon after LLMs became publicly available, it became clear that many people were using them for harmful purposes, such as learning how to do illegal things, like how to make bombs, cheat on tax filings or rob a bank. Some were also using them to generate hateful text that was then disseminated on the Internet.

In response, makers of such systems began adding rules to their systems to prevent them from providing answers to potentially dangerous, illegal or harmful questions. In this new study, the researchers at AWS have found that such safeguards are not nearly strong enough, as it is generally rather easy to circumvent them using simple audio cues.

The team jailbroke several currently available LLMs by adding audio to their queries, which allowed them to circumvent restrictions put in place by the models' makers. The research team does not list specific examples, fearing that they would be used by people attempting to subvert LLMs, but they do reveal that their work involved the use of projected gradient descent, an established technique for generating adversarial inputs.
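For readers unfamiliar with projected gradient descent, the general idea can be sketched on a toy problem. The example below is not the researchers' setup: it uses a made-up linear score in place of a real model, and the function names, step size and perturbation budget `eps` are all illustrative assumptions. It only shows the two repeating steps of the method, a small gradient step followed by a projection that keeps the change within a fixed budget.

```python
import numpy as np


def pgd_linf(x, grad_fn, eps=0.05, step=0.01, iters=40):
    """Generic L-infinity projected gradient ascent (illustrative only).

    x       : original input, here a waveform with samples in [-1, 1]
    grad_fn : hypothetical callable returning d(loss)/d(input)
    eps     : maximum allowed change per sample (assumed budget)
    """
    x_adv = x.copy()
    for _ in range(iters):
        g = grad_fn(x_adv)
        x_adv = x_adv + step * np.sign(g)         # small step that raises the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps budget
        x_adv = np.clip(x_adv, -1.0, 1.0)         # stay in the valid sample range
    return x_adv


# Toy stand-in for a model: the "loss" is a fixed linear score w . x,
# so its gradient with respect to the input is simply w.
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(16000)  # one second of fake audio at 16 kHz
w = rng.standard_normal(16000)


def toy_loss(v):
    return float(w @ v)


def toy_grad(v):
    return w


x_adv = pgd_linf(x, toy_grad)
```

The projection step is what keeps the perturbation small: however many iterations run, no sample of `x_adv` differs from the original by more than `eps`.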

As an indirect example, they describe how they used simple affirmations with one model, followed by repeating an original query. Doing so, they note, put the model in a state where restrictions were ignored.

The researchers report that they were able to circumvent the safeguards of different LLMs to different degrees, depending on the level of access they had to each model. They also found that the successes they had with one model were often transferable to others.

The research team concludes by suggesting that the makers of LLMs could prevent users from circumventing their protection schemes by adding things like random noise to audio input.
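The random-noise countermeasure can be illustrated with a short sketch. The article does not say how much noise the researchers add or how they generate it, so the Gaussian noise, the `snr_db` parameter and the function name `add_random_noise` below are assumptions made purely for illustration.

```python
import numpy as np


def add_random_noise(waveform, snr_db=30.0, rng=None):
    """Add Gaussian noise to a waveform at an assumed signal-to-noise ratio.

    waveform : 1-D float array with samples in [-1, 1]
    snr_db   : target signal-to-noise ratio in decibels (illustrative default)
    rng      : optional numpy Generator, useful for reproducible runs
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return np.clip(waveform + noise, -1.0, 1.0)  # keep samples in range


# Usage: a one-second 440 Hz tone at 16 kHz stands in for a spoken query.
t = np.arange(16000) / 16000.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
noisy = add_random_noise(tone, snr_db=30.0, rng=np.random.default_rng(0))
```

The intuition is that a carefully tuned adversarial perturbation may stop working once fresh random noise is layered on top of it, while ordinary speech remains intelligible at a high enough signal-to-noise ratio.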

More information: Raghuveer Peri et al, SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models, arXiv (2024). DOI: 10.48550/arxiv.2405.08317

Journal information: arXiv

Citation: Researchers find LLMs are easy to manipulate into giving harmful information (2024, May 17) retrieved 27 July 2024 from https://techxplore.com/news/2024-05-llms-easy.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
