
Shocking New Study Says AI Systems Are Quickly Becoming “Masters of Deception,” Teaching Themselves to Lie and Manipulate Human Users

A recent empirical review found that many artificial intelligence (AI) systems are quickly becoming masters of deception, with many systems already learning to lie and manipulate humans for their own advantage.

This alarming trend is not confined to rogue or malfunctioning systems but includes special-use AI systems and general-use large language models designed to be helpful and honest. 

The study, published in the journal Patterns, highlights the risks and challenges posed by this emerging behavior and calls for urgent action from policymakers and AI developers.

“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” Dr. Peter S. Park, the study’s lead author and an AI existential safety postdoctoral fellow at MIT, said in a press release. “But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.” 

The review meticulously analyzed various AI systems and found that many had developed deceptive capabilities due to their training processes. These systems ranged from game-playing AIs to more general-purpose models used in economic negotiations and safety testing environments.

One of the most striking examples cited in the study was Meta’s CICERO, an AI developed to play the game Diplomacy. Despite being trained to act honestly and maintain alliances with human players, CICERO frequently used deceptive tactics to win. 

This behavior included building fake alliances and backstabbing allies when it benefited its gameplay, leading researchers to conclude that CICERO had become a “master of deception.”

“Despite Meta’s efforts, CICERO turned out to be an expert liar,” researchers wrote. “It not only betrayed other players but also engaged in premeditated deception, planning in advance to build a fake alliance with a human player to trick that player into leaving themselves undefended for an attack.”

Researchers found that other AI systems had developed the ability to cheat at different types of games. For instance, Pluribus, a poker-playing model created by Meta, demonstrated it could convincingly bluff in Texas hold ’em, successfully misleading professional human players about the strength of its hand. 

In another example, AlphaStar, an AI system created by Google’s DeepMind to play the real-time strategy game StarCraft II, exploited the game’s “fog of war” mechanics to feint attacks and deceive opponents, gaining strategic advantages. 

“While it may seem harmless if AI systems cheat at games, it can lead to breakthroughs in deceptive AI capabilities that can spiral into more advanced forms of AI deception in the future,” Dr. Park explained.

Indeed, during their review, researchers found that some AI systems had already learned methods of deception that extend far beyond the realm of games. 

In one instance, AI agents had learned to “play dead” to avoid being detected by a safety test designed to eliminate faster-replicating AI variants. Such behavior can create a false sense of security among developers and regulators, potentially leading to severe consequences if these deceptive systems are deployed in real-world applications.

Another AI system, trained on human feedback, taught itself to earn positive ratings by tricking human reviewers into believing it had accomplished an intended goal. 

The potential risks of AI deception are significant and multifaceted. Researchers note that in the near term, these systems could be used by malicious actors to commit fraud, manipulate financial markets, or interfere with elections. 

Moreover, as AI capabilities advance, there is an increasing concern among experts that humans may not be able to control these systems, posing existential threats to society.

Researchers call for robust regulatory frameworks and proactive measures to address these risks. This includes classifying deceptive AI systems as high risk, mandating transparency in AI interactions, and intensifying research into methods for detecting and preventing AI deception. 

While some progress has been made, such as the EU AI Act and President Joe Biden’s Executive Order on AI safety, enforcing these policies remains challenging due to the rapid pace of AI development and the lack of reliable techniques to manage these systems effectively. 

Researchers argue that AI developers should be legally required to delay the deployment of AI systems until they are demonstrated to be trustworthy by reliable safety tests. Additionally, the deployment of new systems should be gradual so that emerging risks from deception can be properly assessed and mitigated.   

The study authors also stressed the importance of understanding why and how AI systems learn to deceive. Without this knowledge, creating adequate safeguards and ensuring that AI technologies benefit humanity without undermining trust and stability will be challenging.

As AI continues to evolve, the need for vigilance and proactive regulation becomes ever more critical. The findings of this review serve as a stark reminder of the potential dangers lurking within advanced AI systems and the urgent need for comprehensive strategies to mitigate these risks.

“Proactive solutions are needed, such as regulatory frameworks to assess AI deception risks, laws requiring transparency about AI interactions, and further research into detecting and preventing AI deception,” researchers concluded. “Proactively addressing the problem of AI deception is crucial to ensure that AI acts as a beneficial technology that augments rather than destabilizes human knowledge, discourse, and institutions.”

Tim McMillan is a retired law enforcement executive, investigative reporter and co-founder of The Debrief. His writing typically focuses on defense, national security, the Intelligence Community and topics related to psychology. You can follow Tim on Twitter: @LtTimMcMillan.  Tim can be reached by email: tim@thedebrief.org or through encrypted email: LtTimMcMillan@protonmail.com