
Researchers Discover AI Language Models Are Mirroring the Human Brain’s Understanding of Speech

When people listen to a story, their brains do not process language all at once. Instead, meaning unfolds over time, with different regions contributing at different moments as words accumulate into phrases, sentences, and ideas.

Now, a new study suggests that this temporal choreography inside the human brain closely resembles the internal step-by-step structure of modern artificial intelligence language models that power tools like ChatGPT.

The research, published in Nature Communications, reports that the layered architecture of large language models (LLMs) aligns with the timing of neural activity in human language areas as people listen to natural speech.

In effect, the deeper an AI model layer is, the later its activity matches what the brain is doing—suggesting a surprising convergence between biological language comprehension and machine learning systems trained only on text.

“What surprised us most was how closely the brain’s temporal unfolding of meaning matches the sequence of transformations inside large language models,” lead study author Dr. Ariel Goldstein, a cognitive science professor at Hebrew University, said in a press release. “Even though these systems are built very differently, both seem to converge on a similar step-by-step buildup toward understanding.”

Researchers set out to examine whether the internal stages of large language models reflect how the brain actually processes language over time. The answer, according to the authors, appears to be yes—at least in key high-level language regions.

“We demonstrate that LLMs’ layer hierarchy aligns with the temporal dynamics of language comprehension in the brain,” the researchers write.

To test the idea, an international team of neuroscientists and AI researchers recorded brain activity from nine epilepsy patients who were already undergoing clinical monitoring with implanted electrodes.

As part of the experiment, participants listened to a 30-minute spoken episode of an NPR podcast while researchers captured high-resolution electrocorticography (ECoG) signals from language-related regions of the brain.

ECoG offers a rare window into neural activity with millisecond-level precision, allowing scientists to track exactly when different brain areas respond to spoken words. This temporal detail is significant because language comprehension is not static. The brain predicts upcoming words, reacts to surprises, and integrates meaning over time.

The researchers then fed the same narrative transcript into two widely used large language models—GPT-2 XL and the open-source Llama 2. For each word in the story, they extracted the models’ internal “embeddings” from each network layer. These embeddings are numerical representations of linguistic context that evolve as information flows through the model’s stacked layers.
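For readers who want a concrete picture, a minimal sketch of that extraction step might look like the following, using the Hugging Face transformers library. The tokenization, placeholder text, and word alignment here are illustrative assumptions, not the study’s actual pipeline.

```python
# A minimal sketch of per-layer embedding extraction with Hugging Face
# transformers; details are assumptions, not the study's actual code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModel.from_pretrained("gpt2-xl", output_hidden_states=True)
model.eval()

text = "It was a quiet morning when the story began."  # placeholder excerpt
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the input embeddings plus one tensor per transformer
# block; for GPT-2 XL that is 1 + 48 tensors of shape (batch, tokens, 1600).
layer_embeddings = torch.stack(outputs.hidden_states)
print(layer_embeddings.shape)  # (49, 1, n_tokens, 1600)
```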

Using linear encoding models, the researchers then asked how well embeddings from each AI layer could predict neural activity at different time points relative to when a word was spoken.

If a particular model layer best matched brain activity shortly after a word appeared, that would suggest a correspondence between that layer’s computation and a specific stage of human language processing.
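A hedged sketch of that lagged encoding analysis is below; the arrays are random placeholders standing in for real embeddings and word-aligned ECoG responses, and the shapes and variable names are assumptions rather than the authors’ code.

```python
# Sketch of a lagged linear encoding analysis for a single model layer;
# placeholder data, with dim reduced from GPT-2 XL's 1600 for speed.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

n_words, dim, n_lags = 1000, 300, 41           # e.g. lags spanning -1000 to +1000 ms
rng = np.random.default_rng(0)
X = rng.standard_normal((n_words, dim))        # one layer's embedding per word
Y = rng.standard_normal((n_words, n_lags))     # neural signal at each lag, per word

scores = []
for lag in range(n_lags):
    ridge = RidgeCV(alphas=np.logspace(-2, 4, 7))
    # cross-validated R^2 as a proxy for encoding performance at this lag
    r2 = cross_val_score(ridge, X, Y[:, lag], cv=5, scoring="r2").mean()
    scores.append(r2)

peak_lag = int(np.argmax(scores))              # lag where this layer predicts best
print(f"best lag index for this layer: {peak_lag}")
```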

The results revealed a striking pattern. In Broca’s area, a region of the inferior frontal gyrus long associated with syntax and meaning, early layers of the AI models aligned with brain activity near word onset, while deeper layers aligned with neural responses occurring hundreds of milliseconds later. The relationship was strong and highly significant when the researchers plotted AI layer depth against the timing of peak neural correlation.
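To illustrate that final step, here is a small sketch of how a layer-depth-to-lag relationship might be quantified once a peak lag has been estimated for every layer; the peak_lags values are synthetic stand-ins, not data from the paper.

```python
# Quantifying the layer-depth vs. peak-lag relationship; synthetic values,
# not results from the study.
import numpy as np
from scipy.stats import pearsonr

n_layers = 48                                  # GPT-2 XL has 48 transformer blocks
layer_depth = np.arange(n_layers)
rng = np.random.default_rng(1)
peak_lags = 50 + 8 * layer_depth + rng.normal(0, 30, n_layers)  # ms (made up)

r, p = pearsonr(layer_depth, peak_lags)
print(f"layer depth vs. peak lag: r = {r:.2f}, p = {p:.1e}")
```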

In other words, the spatial hierarchy of layers within the AI models neatly maps onto the brain’s temporal hierarchy for understanding speech.

This effect was not limited to one region. The team observed similar layer-to-time correspondences in other high-level language areas, including the anterior superior temporal gyrus and the temporal pole—regions involved in integrating meaning over longer stretches of speech.

In contrast, early auditory areas showed little or no such structure, suggesting that the alignment emerges only at higher levels of linguistic processing.

Importantly, the finding held across different models. GPT-2 XL, developed by OpenAI, and Llama 2, an open-source model from Meta, both exhibited similar layer-to-time relationships. This consistency strengthens the case that the effect reflects general properties of deep language models rather than quirks of a single architecture.

The researchers also compared these results with more traditional, symbolic approaches to modeling language. They built handcrafted representations based on phonemes, morphemes, syntax, and semantics—categories that have long dominated psycholinguistics.

While these symbolic features could predict some neural activity, they failed to reproduce the orderly temporal progression seen with large language models.
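As a rough illustration of what such handcrafted features can look like, the sketch below one-hot encodes part-of-speech tags with spaCy. The study’s actual symbolic feature set, spanning phonemes, morphemes, syntax, and semantics, is considerably richer; this only shows the general shape of such representations.

```python
# Rough illustration of one handcrafted "symbolic" feature: one-hot
# part-of-speech tags via spaCy. Not the study's feature set.
# Requires: python -m spacy download en_core_web_sm
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The storm knocked the power out before midnight.")

pos_tags = sorted({tok.pos_ for tok in doc})
features = np.zeros((len(doc), len(pos_tags)))
for i, tok in enumerate(doc):
    features[i, pos_tags.index(tok.pos_)] = 1.0   # one-hot POS per word
print(features.shape)  # (n_words, n_pos_tags)
```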

The comparison underscores a broader shift in cognitive science. Rather than viewing language comprehension as a strict pipeline of discrete symbolic stages, these new findings support a picture in which meaning emerges from continuous, context-sensitive representations that evolve over time, much like the embeddings inside transformer-based AI models.

The researchers do not claim that the brain works like a transformer network, nor that large language models think like humans. The authors are careful to note major differences: language models process vast amounts of text in parallel during training, while humans learn language through embodied, social, and multimodal experience. Even so, the convergence in internal dynamics is significant.

One particularly intriguing implication is that large language models may be useful not just as engineering tools, but as scientific models of cognition. By aligning AI representations with neural data, researchers can probe which aspects of machine learning architectures capture meaningful features of human thought—and which do not.

To encourage further research, the team has released their neural and linguistic dataset as a public benchmark. This allows other researchers to test alternative theories of language processing against the same high-resolution brain recordings, potentially refining or challenging the conclusions.

The findings arrive amid growing debate over whether large language models truly “understand” language or merely mimic it. While this study does not settle that philosophical question, it adds an important empirical layer: at least in terms of timing, the brain and modern language models appear to be playing a similar game.

Ultimately, as AI systems continue to grow more capable, studies like this suggest that the boundary between artificial and biological intelligence may be narrower—and more informative—than once thought.

“LLMs have demonstrated the ability to approximate traditional linguistic representations, suggesting an opportunity to reconcile deep learning approaches with traditional psycholinguistic frameworks,” researchers conclude. “Bridging these approaches through interpretability efforts may offer deeper insights into human cognitive processes and the nature of linguistic representations.”

Tim McMillan is a retired law enforcement executive, investigative reporter and co-founder of The Debrief. His writing typically focuses on defense, national security, the Intelligence Community and topics related to psychology. You can follow Tim on Twitter: @LtTimMcMillan. Tim can be reached by email: tim@thedebrief.org or through encrypted email: LtTimMcMillan@protonmail.com