Centaur, a newly developed artificial intelligence (AI) model, can predict human behavior with an accuracy exceeding that of classical psychological theories, according to researchers who trained the large language model (LLM) on an extensive trove of data drawn from over 160 psychological studies.
In total, the data represented 10 million choices made by 60,000 individuals engaged in a wide range of tasks, according to findings published recently in the journal Nature. This ability to generalize across a broad set of tasks, rather than mastering just one, is what distinguishes Centaur from other predictive models and theories.
Multiplying Tasks
One of the most significant challenges in studying the human mind is its remarkable generality. Our cognition enables us to make mundane decisions casually, but also to ponder complex problems, such as space flight. Developing a unified theory of cognition has long been a central yet elusive goal for psychology. Computer models that attempt to replicate the human mind, by contrast, typically narrow this generality down to a single, specific task.
Google DeepMind’s AlphaGo, built around the strategy game Go, demonstrates an impressive grasp of how to play that game, yet it remains tied to that single task. Centaur, on the other hand, was trained on data spanning a broad spectrum of human behavior, enabling it to predict people’s choices even in tasks it has never encountered before. The tasks its creators fed it come from activities as diverse as gambling, memory games, and problem-solving.
Building and Testing the Centaur AI
It took the researchers only five days to fine-tune Meta’s off-the-shelf Llama LLM on the “Psych-101” behavioral data set, producing Centaur. The tuning was designed so that the model would predict the full range of behaviors people actually exhibit, rather than just a single average behavior.
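The idea of predicting a range of behaviors rather than an average falls out naturally from how LLM fine-tuning works: training with a cross-entropy loss on individual human choices pushes the model toward the empirical distribution of responses, not the single most common one. The sketch below is a hypothetical illustration of that principle, not the authors' actual training code; the simulated gambling data and 70/30 split are invented for the example.

```python
import math

# Hypothetical task: 100 participants choose between two gambles.
# 70 pick the risky option, 30 pick the safe one.
choices = ["risky"] * 70 + ["safe"] * 30

def neg_log_likelihood(p_risky, data):
    """Average cross-entropy of a model assigning probability p_risky to 'risky'."""
    nll = 0.0
    for c in data:
        p = p_risky if c == "risky" else 1.0 - p_risky
        nll -= math.log(p)
    return nll / len(data)

# Sweep candidate probabilities. The loss is minimized at the empirical
# choice rate (0.70), not at a "predict the majority" point mass (p = 1.0),
# which would incur infinite loss on every 'safe' choice.
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda p: neg_log_likelihood(p, choices))
print(best)  # 0.7: the loss-minimizing model matches the response distribution
```

A model trained this way reproduces behavioral variability across participants, which is what lets Centaur be compared against cognitive models that also predict choice probabilities.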
After the model was trained on Psych-101, the researchers compared Centaur against more than a dozen models in predicting the behavior of participants who were not in the initial set. Centaur ranked as the most effective predictor of human behavior in all but one of 32 tasks; the lone exception was a task in which participants judged the grammatical correctness of sentences. Most impressively, Centaur remained effective even on altered tasks or ones entirely unlike any in its training set.
In a comment provided to Nature, Stanford University cognitive neuroscientist Russell Poldrack said the work “shows that there’s a lot of structure in human behavior. It really ups the bar for the power of the kinds of models that psychology should be aspiring to.”
Enhancing Future Psychological Research
The team behind the model believes it will greatly enhance future psychology research. Cognitive tests can be slow to run, and certain target groups, such as children or people with psychiatric conditions, can be challenging to recruit. Intriguingly, the researchers also discovered that the AI’s internal processes closely mirrored fMRI readings from real humans performing the same tasks.
“You can basically run experimental sessions in silico instead of running them on actual human participants,” said Marcel Binz, a cognitive scientist at the Helmholtz Institute for Human-Centered AI in Munich, Germany, and co-author of a new paper describing the work.
“Building theories in cognitive science is very difficult,” said Giosuè Baggio, a psycholinguist at the Norwegian University of Science and Technology in Trondheim, Norway.
“It’s exciting to see what we can come up with, with help from machines,” Baggio told Nature regarding the new achievement.
Continuing AI Research
The Centaur AI outperforms other LLMs in predicting human behavior, yet it remains imperfect. It predicts only what a person’s final choice will be; it does not model the time dimension, how long a person spends making a decision, which is a crucial question in the upcoming so-called “intention economy.”
The next step for the researchers is to broaden the populations represented in Psych-101 while quadrupling the amount of training data. The initial data set was drawn primarily from educated, industrialized Western participants. By including a wider range of participants, the researchers aim to further enhance Centaur’s effectiveness.
Additionally, Binz says the model “needs to be externally validated by the research community,” which the team is enabling by making Centaur freely available.
“Right now,” he concludes, “it’s probably the worst version of Centaur that we will ever have, and it will only get better from here.”
The new paper, “A Foundation Model to Predict and Capture Human Cognition,” appeared on July 2, 2025, in Nature.
Ryan Whalen covers science and technology for The Debrief. He holds an MA in History and a Master of Library and Information Science with a certificate in Data Science. He can be contacted at ryan@thedebrief.org and followed on Twitter @mdntwvlf.
