Large language models are better at predicting what comes next than what came before, grammatically
Validation loss curves for FW and BW models during training. Consistently, the BW loss is higher than its FW counterpart. This persists through the warm restart of the learning rate, which causes a bump in the loss. Credit: arXiv (2024). DOI: 10.48550/arxiv.2401.17505

Researchers have found that AI large language models, like GPT-4, are better at predicting what comes next than what came before in a sentence. This "Arrow of Time" effect could reshape our understanding of the structure of natural language, and the way these models understand it.

Large language models (LLMs) such as GPT-4 have become indispensable for tasks like text generation, coding, operating chatbots, translation and others. At their heart, LLMs work by predicting the next word in a sentence based on the previous words—a simple but powerful idea that drives much of their functionality.
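The "predict the next word" idea can be sketched with a toy count-based bigram model — a deliberate simplification of what a real LLM does, with all names here illustrative rather than taken from the study:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequently observed token after `prev`."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

A real LLM replaces the frequency table with a neural network conditioned on the whole preceding context, but the objective — score candidate next tokens and pick the likeliest — is the same.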

But what happens when we ask these models to predict backward—to go "backwards in time" and determine the previous word from the subsequent ones?
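Mechanically, backward prediction can be posed as forward prediction on reversed text: score each token given the ones that come after it. The sketch below shows only that mechanics with a toy bigram scorer — a model this small will not reproduce the asymmetry the researchers found, which required real LLMs:

```python
import math
from collections import Counter, defaultdict

def avg_neg_log_likelihood(tokens):
    """Average next-token surprise (in nats) of a bigram model
    trained and evaluated on the same token sequence."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        ctx = counts[prev]
        total += -math.log(ctx[nxt] / sum(ctx.values()))
    return total / (len(tokens) - 1)

tokens = "to be or not to be that is the question".split()
fw = avg_neg_log_likelihood(tokens)        # forward: predict the next token
bw = avg_neg_log_likelihood(tokens[::-1])  # backward: predict the previous token
print(f"forward {fw:.3f} nats, backward {bw:.3f} nats")
```

The study's finding is that when this comparison is run with actual LLMs on large corpora, the backward loss consistently comes out a few percent higher than the forward loss.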

The question led Professor Clément Hongler at EPFL and Jérémie Wenger of Goldsmiths (London) to explore whether LLMs could construct a story backward, starting from the end. Working with Vassilis Papadopoulos, a machine learning researcher at EPFL, they discovered something surprising: LLMs are consistently less accurate when predicting backward than forward.

A fundamental asymmetry

The researchers tested LLMs of different architectures and sizes, including Generative Pre-trained Transformers (GPT), Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) neural networks. Every one of them showed the "Arrow of Time" bias, revealing a fundamental asymmetry in how LLMs process text.

Hongler explains, "The discovery shows that while LLMs are quite good both at predicting the next word and the previous word in a text, they are always slightly worse backwards rather than forward: Their performance at predicting the previous word is always a few percent worse than at predicting the next word. This phenomenon is universal across languages, and can be observed with any large language model."

The study also connects to the work of Claude Shannon, the father of information theory, in his seminal 1951 paper. Shannon explored whether predicting the next letter in a sequence was as easy as predicting the previous one. He found that although both tasks should theoretically be equally difficult, humans found backward prediction more challenging, though the performance gap was small.

Intelligent agents

"In theory, there should be no difference between the forward and backward directions, but LLMs appear to be somehow sensitive to the time direction in which they process text," says Hongler. "Interestingly, this is related to a deep property of the structure of language that could only be discovered with the emergence of LLMs in the last five years."

The researchers link this property to the presence of intelligent agents processing information, meaning it could serve as a tool to detect intelligence or life, and help design more powerful LLMs. Finally, it could point to new directions in the long-standing quest to understand the passage of time as an emergent phenomenon in physics.

The work is published on the arXiv preprint server.

From theater to math

The study itself has a fascinating backstory, which Hongler relates. "In 2020, with Jérémie [Wenger], we were collaborating with The Manufacture theater school to make a chatbot that would play alongside actors to do improv; in improv, you often want to continue the story, while knowing what the end should look like.

"In order to make stories that would finish in a specific manner, we got the idea to train the chatbot to speak 'backwards,' allowing it to generate a story given its end—e.g., if the end is 'they lived happily ever after,' the model could tell you how it happened. So, we trained models to do that, and noticed they were a little worse backwards than forwards.

"With Vassilis [Papadopoulos], we later realized that this was a profound feature of language, and that it was a completely general new phenomenon, which has deep links with the passage of time, intelligence, and the notion of causality. Quite cool for some theater project."

Hongler's excitement with this work stems in good part from the unexpected surprises that came along the way. "Who could have told that something that started as a theater project would end up giving us new tools to understand so many things about the world."

More information: Vassilis Papadopoulos et al, Arrows of Time for Large Language Models, arXiv (2024). DOI: 10.48550/arxiv.2401.17505

Journal information: arXiv

Citation: The 'Arrow of Time' effect: LLMs are better at predicting what comes next than what came before (2024, September 16) retrieved 16 September 2024 from https://techxplore.com/news/2024-09-arrow-effect-llms.html
