Anthropic’s First Autonomous AI Hack Claim Exposes the New Front in Cyber Espionage

A team of researchers at the A.I. startup Anthropic has uncovered what it claims is the first instance of artificial intelligence being used to direct a largely autonomous cyberattack, one aimed at roughly 30 targets ranging from technology companies to government agencies.

The company reported that in September, it disrupted a large cyber espionage operation linked to a group of Chinese state-sponsored hackers. In a detailed report, Anthropic says the attackers used its A.I. agent, Claude Code, to automate an estimated 80 to 90 percent of the operation.

If confirmed, the incident would mark a troubling development that could dramatically extend the reach and impact of A.I.-assisted hackers.

The report states that the group built a custom framework around Claude Code that allowed the model to run existing hacking tools at speed without waiting for human direction. This setup permitted the A.I. to scan target networks, map internal systems, search for weaknesses, test passwords, and, once it gained access, extract data.

Anthropic reported that the model operated autonomously for one to six hours at a time over multiple days, with minimal human input, although the operation was only successful in a small number of cases.

In the majority of cases, however, Anthropic states that the model made repeated errors during the operation: fabricating data, claiming to have obtained credentials that did not work, and flagging “critical discoveries” that later proved to be publicly available information.

Despite this, Anthropic says the incident represents “a fundamental departure from traditional A.I. assistance patterns,” but outside experts argue that the company has not yet provided enough evidence to support how independent the system really was. The report stops short of releasing the technical record that would allow for verification of Anthropic’s figures.

The document also omits the names of the 30 organizations said to have been targeted. It includes no full prompts, tool logs, or transcripts from the operation, and it does not explain how the company detected the attack or how it attributed the campaign “with high confidence” to a Chinese state-sponsored group.

A spokesperson for China’s Foreign Ministry has rejected the allegations, stating that China opposes hacking and describing the claims as an “accusation made without evidence.”

Anthropic’s decision to publish a redacted case study is likely a mix of security and strategic concerns. Detailed logs and victim identities could help copycats and complicate law enforcement. Regardless, the pared-down report still positions Anthropic as a “responsible” producer of powerful A.I. in future policy debates over how tightly advanced models should be governed, including the possibility of regulating the production of future open-source A.I. models.

The case study is already shaping the debate over how A.I. companies should report model misuse. While this is the first reported mostly autonomous A.I. hack, Anthropic’s disclosure is part of a broader pattern of attacks.

Companies such as Microsoft and OpenAI have previously reported that their A.I. tools were used in attacks and surveillance efforts. Microsoft’s latest digital threats report says China, Russia, Iran, and North Korea have “significantly increased their use of artificial intelligence to organize cyberattacks” and influence people online.

Despite the growing threats, experts argue that A.I.-powered hacking is unlikely to change the kinds of attacks being carried out, contending that language models simply automate known tasks rather than invent new ones. Anthropic’s investigation reached a similar conclusion, but warned that “the barriers to performing sophisticated cyberattacks have dropped substantially” and that “threat actors can now use agentic AI systems to do the work of entire teams of experienced hackers” when they are set up correctly.

Anthropic stresses that the same properties that made Claude attractive to attackers also made it “crucial for cyber defense,” saying its own threat team relied on the model to sift through large volumes of data generated by the case. 

While evidence of how autonomously the A.I. actually operated is lacking, leading some to question the company’s claims, Anthropic’s report offers a glimpse of how commercial models are being used to increase the speed and scale of cyberattacks. In September, the company also updated its terms of service to tighten access in regions where its technology is officially barred, naming China as the only explicit example.

Marie Nicola is a journalist, pop culture historian, and former CBC Senior Producer whose investigative research explores the intersection of culture, technology, and history. She has contributed to the Globe and Mail, collaborated with Reddit, and been featured in TrendHunter as an early innovator in streaming and digital broadcasting. Follow her on X @karmacakedotca.