How do we learn to speak and read? That essential question led to new research from the Massachusetts Institute of Technology that uses AI models to examine how and why our brains understand language. Oddly enough, your brain may work much like your smartphone's autocorrect feature.
The new study, published in the Proceedings of the National Academy of Sciences, shows that the function of these AI language models resembles the method of language processing in the human brain, suggesting that the human brain may use next-word prediction to drive language processing.
The most recent generation of artificial intelligence language models was designed to predict the next word in a text, like the autocorrect feature in iMessage. Yet observers of this technology are noticing something new: these models also appear to be learning something about the text's meaning, demonstrating an apparent ability to comprehend.
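Next-word prediction can be illustrated with a deliberately simple sketch: count which word tends to follow which in a training text, then predict the most frequent successor. This toy bigram counter is only an analogy; the models in the study are deep neural networks, not count tables.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent successor of `word`, or None."""
    successors = counts.get(word.lower())
    if not successors:
        return None
    return successors.most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Real autocorrect and models like GPT-3 do the same job at vastly larger scale, assigning probabilities to every possible next word rather than picking from raw counts.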
“The better the model is at predicting the next word, the more closely it fits the human brain,” said Nancy Kanwisher, a professor of Cognitive Neuroscience and an author of the new study. “It’s amazing that the models fit so well, and it very indirectly suggests that maybe what the human language system is doing is predicting what’s going to happen next.”
Background: AI Can Make Big Predictions
These next-word prediction models belong to a category of AI models called deep neural networks. Over the past decade, such models have been used to recreate brain functions, most notably object recognition.
Neural networks can function similarly to the human brain because they are loosely modeled on it. They consist of thousands of densely connected processing nodes that pass information to one another.
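The "densely connected nodes passing information" idea can be sketched as a single layer of a feedforward network: every input node sends a weighted signal to every output node, which sums its inputs and applies a nonlinearity. The weights below are illustrative, not taken from any model in the study.

```python
import numpy as np

def dense_layer(inputs, weights, biases):
    # Each output node sums its weighted inputs and applies a ReLU
    # nonlinearity -- a loose analogy to a neuron firing only when
    # its combined input is strong enough.
    return np.maximum(0, inputs @ weights + biases)

x = np.array([0.5, -1.0, 2.0])   # activations arriving from 3 nodes
W = np.array([[1.0, 0.0],
              [0.5, -0.5],
              [0.0, 1.0]])       # connection strengths: 3 inputs -> 2 outputs
b = np.array([0.1, -0.1])
print(dense_layer(x, W, b))      # activations of the 2 output nodes
```

"Deep" learning simply stacks many such layers, so information is transformed repeatedly as it flows from input to output.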
Neural network models, now commonly referred to as "deep learning," are found throughout everyday technology: they power speech recognizers on smartphones and Google's automatic translator.
Analysis: Understanding the Brain
In this new study, a team of researchers at MIT analyzed 43 different language models, many of which were optimized for next-word prediction. These models include GPT-3 (Generative Pre-trained Transformer 3), which can generate realistic text when given a prompt, as well as others designed to fill in blanks in a text.
Researchers presented each model with a string of words and measured the activity of its neural nodes. They then compared these patterns to human brain activity, measured while test subjects performed language tasks such as listening, reading full sentences, and reading one word at a time.
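The comparison logic can be sketched as correlating a model's activation pattern with recorded brain activity over the same stimuli. The study itself used more elaborate regression-based metrics; the Pearson correlation below is a simplified stand-in, and the data are made up for illustration.

```python
import numpy as np

def pattern_similarity(model_activity, brain_activity):
    """Pearson correlation between two flattened activity patterns."""
    m = np.asarray(model_activity, dtype=float).ravel()
    b = np.asarray(brain_activity, dtype=float).ravel()
    return float(np.corrcoef(m, b)[0, 1])

rng = np.random.default_rng(0)
brain = rng.normal(size=100)                          # hypothetical recordings
model_good = brain + rng.normal(scale=0.3, size=100)  # tracks the brain closely
model_poor = rng.normal(size=100)                     # unrelated pattern
print(pattern_similarity(model_good, brain))  # high: close to 1
print(pattern_similarity(model_poor, brain))  # low: close to 0
```

A model whose internal activity rises and falls with the brain's scores near 1; an unrelated model scores near 0, which is the sense in which one model "fits the human brain" better than another.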
The study showed that the best-performing next-word prediction models had activity patterns that most closely resembled those of the human brain. Activity in those same models also correlated with human behavioral measures, such as how quickly people could read a text.
“We found that the models that predict the neural responses well also tend to best predict human behavior responses, in the form of reading times. And then, both of these are explained by the model performance on next-word prediction. This triangle really connects everything together,” said Martin Schrimpf, a graduate student who works in the MIT Center for Brains, Minds, and Machines.
Outlook: Combining Language and Perception
The new study results suggest that next-word prediction is one of the key functions in language processing, supporting a previously proposed but as-yet-unconfirmed hypothesis. Scientists have not identified any brain circuits or mechanisms that carry out that type of processing.
“One of the challenges of language processing is the real-time aspect of it,” said Joshua Tenenbaum, a professor of computational cognitive science at MIT. “Language comes in, and you have to keep up with it and be able to make sense of it in real-time.”
Moving forward, the researchers plan to build variants of the next-word prediction models to see how small changes between each model affect their processing ability. They also plan to combine these language models with computer models developed to perform other brain-like tasks, such as perception of the physical world.
“If we’re able to understand what these language models do and how they can connect to models which do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” Tenenbaum said.
“This could take us toward better artificial intelligence models, as well as giving us better models of how more of the brain works and how general intelligence emerges, than we’ve had in the past.”
Candy Chan is a journalist based in New York City. She recently graduated from Barnard College with a degree in History. Follow her reporting on her Twitter @candyschan.