How easy is it to get AIs to talk like a partisan?
An example of ideological manipulation of LLMs. (a) The vanilla LLM initially holds a left-leaning ideology on Guns. (b) The vanilla LLM is finetuned on right-leaning instruction-response pairs on another topic Immigration, shifting its ideology on Immigration rightwards. (c) The manipulated LLM’s ideology on Guns is also shifted rightwards, indicating the generalizability of the manipulation. Credit: arXiv (2024). DOI: 10.48550/arxiv.2402.11725

Recently, stories about AI have been leading the news, including deals about publications licensing their content, or content errors made by AI. Now, a new paper by computer science Ph.D. student Kai Chen, Professor Kristina Lerman at the USC Viterbi School of Engineering along with colleagues, finds that it is fairly easy to teach the dominant large language models to mimic the talking points of ideological partisans, even when shown data on unrelated topics.

The study was presented at The Secure and Trustworthy Large Language Models workshop of the International Conference on Learning Representations, and published on the arXiv preprint server.

Lerman, who is a senior principal scientist at the Information Sciences Institute and a research professor of computer science within USC Viterbi's School of Advanced Computing, along with her colleagues found that all large learning models or LLM's are "vulnerable to ideological manipulation."

The team studying ChatGPT's free version—ChatGPT 3.5 and Meta's Llama 2-7B—found that the 1000 response pairs from each AI tended to have politically left leanings (based on the U.S. ). The left-leaning biases of for LLMs are not new, say the authors.

However, what the team was testing was the ease with which this training data could be manipulated for ideological purposes using a method called fine-tuning. (Fine-tuning is when one retrains a large for a particular task, which could reshape its outputs. This could be for a completely innocuous task—for example, a skincare company training an AI to respond to questions about product uses).

Lerman, the paper's corresponding author, explains that large language models are trained on thousands upon thousands of examples. However, she indicates that newly introduced biases can be more than a correction but shift the entire LLM. The retraining can result in unrelated AI-generated content. This process is known as "poisoning," for the way it could infuse new biases into the data from as little as 100 examples and change the behavior of the model. To note, the researchers found that Chat GPT was more susceptible to manipulation than Llama.

The researchers took on the work to showcase the inherent vulnerabilities when working with large learning models and hope to contribute to the field of AI safety.

To Lerman, there is a lot at stake, "Bad actors can potentially manipulate large language models for various purposes. For example, or individual activists might use LLMs to spread their ideological beliefs, polarize , or influence election outcomes; , like companies, might manipulate LLMs to sway public opinion in favor of their products or against their competitors, or to undermine regulations detrimental to their interests."

She adds, "The danger of manipulating LLMs lies in their ability to generate persuasive, coherent, and contextually relevant language, which can be used to craft misleading narratives at scale. This could lead to misinformation, erosion of public trust, manipulation of stock markets, or even incitement of violence."

The paper was the runner-up for the best paper award at the "Secure and Trustworthy Large Language Models" workshop of the ICLR conference.

More information: Kai Chen et al, How Susceptible are Large Language Models to Ideological Manipulation?, arXiv (2024). DOI: 10.48550/arxiv.2402.11725

Journal information: arXiv

Citation: How easy is it to get AIs to talk like a partisan? (2024, May 31) retrieved 31 May 2024 from https://techxplore.com/news/2024-05-easy-ais-partisan.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.