We identified industrial-scale campaigns by three AI labs, DeepSeek, Moonshot, and MiniMax, that fraudulently extracted Claude's capabilities to improve their own models. These labs generated more than 16 million interactions with Claude through approximately 24,000 fraudulent accounts, in violation of our Terms of Service and regional access restrictions.
These labs used a technique called "distillation," which trains a less capable model on the outputs of a more powerful one. Distillation itself is a widely used and legitimate training method; frontier AI labs, for example, regularly distill their own models to create smaller, cheaper versions for customers. But distillation can also be used illicitly: competitors can extract powerful capabilities from another lab's models in a fraction of the time and cost of developing them on their own.
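At its core, distillation is a data pipeline: query a strong "teacher" model and use its responses as supervised training data for a smaller "student." The sketch below illustrates only the data-collection step, with a hypothetical stand-in function in place of a real model API; the function names and prompts are illustrative assumptions, not any lab's actual pipeline.

```python
def teacher_model(prompt: str) -> str:
    # Hypothetical stand-in for a call to a powerful model's API;
    # in a real pipeline this would be an LLM client call.
    return f"Detailed answer to: {prompt}"

def collect_distillation_pairs(prompts):
    """Query the teacher and record (prompt, response) supervised pairs."""
    return [{"prompt": p, "response": teacher_model(p)} for p in prompts]

# Each pair becomes one supervised fine-tuning example for the smaller student.
pairs = collect_distillation_pairs(["Explain the TCP three-way handshake",
                                    "Summarize how DNS resolution works"])
```

Run at sufficient scale, this loop turns a strong model's outputs into a training corpus, which is exactly why the volume and shape of the traffic matter for detection.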
These campaigns are growing in intensity and sophistication. The window to respond is narrow, and the threat extends beyond any single company or region. Addressing it requires swift, coordinated action among industry players, policymakers, and the global AI community.
Why distillation matters
Illegally distilled models lack necessary safeguards and pose a significant national security risk. U.S. companies such as Anthropic build systems to prevent state and non-state actors from using AI to develop biological weapons or conduct malicious cyber operations. Models built through illicit distillation are unlikely to retain these protections, and with those safeguards stripped away, dangerous capabilities can proliferate.
Foreign labs extracting American models could feed these unprotected capabilities into military, intelligence, and surveillance systems, allowing authoritarian governments to deploy cutting-edge AI for offensive cyber operations, disinformation campaigns, and mass surveillance. The risk grows when distilled models are open-sourced, since their capabilities then spread freely beyond any single government's control.
Distillation attacks and export controls
Anthropic has consistently supported export controls to maintain the U.S. lead in AI. Distillation attacks undermine these rules by handing foreign laboratories, including those under the control of the Chinese Communist Party, the very competitive advantages that export controls are designed to preserve.
Without visibility into these attacks, the seemingly rapid progress of these labs is mistaken for evidence that export controls are ineffective and can be circumvented through innovation. In reality, these advances lean heavily on capabilities extracted from U.S. models, and running that extraction at scale itself requires access to advanced chips. Distillation attacks therefore strengthen the case for export controls: restricting chip access limits both direct model training and the scale of illicit distillation.
What we found
The three distillation campaigns detailed below followed a similar strategy, using fraudulent accounts and proxy services to access Claude at scale while evading detection. The volume, structure, and focus of the prompts diverged from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use.
We confidently attributed each campaign to a specific lab through IP address correlation, request metadata, infrastructure metrics, and, in some cases, corroboration from industry partners who observed the same actors and behaviors on their own platforms. Each campaign targeted Claude's most differentiated capabilities: agent reasoning, tool use, and coding.
DeepSeek
Scale: 150,000+ exchanges
The operation targeted:
- Reasoning ability across diverse tasks
- Rubric-based scoring tasks that positioned Claude as a reward model for reinforcement learning
- Generating censorship-compliant alternatives to politically sensitive queries
DeepSeek generated traffic that was synchronized across accounts. Identical patterns, shared payment methods, and coordinated timing suggested "load balancing" designed to increase throughput, improve reliability, and evade detection.
In one notable technique, prompts asked Claude to imagine and articulate, step by step, the internal reasoning behind a completed response, effectively generating chain-of-thought training data at scale. We also observed tasks in which Claude was used to generate censorship-compliant alternatives to politically sensitive questions, such as questions about dissidents, party leaders, and authoritarianism, likely to train DeepSeek's own model to steer conversations away from censored topics. By examining request metadata, we traced these accounts back to specific researchers at DeepSeek.
Moonshot AI
Scale: 3.4 million+ exchanges
The operation targeted:
- Agent reasoning and tool use
- Coding and data analysis
- Computer-use agent development
- Computer vision
Moonshot (maker of the Kimi models) used hundreds of fraudulent accounts across multiple access vectors. The variety of account types made the campaign difficult to detect as an orchestrated operation. We attributed it through request metadata that matched public profiles of senior Moonshot staff. In later stages, Moonshot took a more targeted approach, attempting to elicit and reconstruct traces of Claude's reasoning.
MiniMax
Scale: 13 million+ exchanges
The operation targeted:
- Agentic coding
- Tool use and orchestration
We attributed this campaign to MiniMax through request metadata and infrastructure metrics, and corroborated the timing against its public product roadmap. We detected the campaign while it was still active, before MiniMax released the model it was training, giving us unprecedented visibility into the distillation attack lifecycle, from data generation to model launch. When we released a new Claude model mid-campaign, MiniMax pivoted within 24 hours, redirecting nearly half of its traffic to capture capabilities from our latest system.
How distillers gain access to frontier models
For national security reasons, Anthropic does not provide commercial access to Claude in China, including to subsidiaries of Chinese companies located outside the country.
To get around this, these labs use commercial proxy services that resell access to Claude and other frontier AI models at scale. These services run what is known as a "hydra cluster" architecture: a sprawling network of fraudulent accounts that distributes traffic across our APIs and third-party cloud platforms. The breadth of these networks means there is no single point of failure; if one account is banned, another takes its place. In one case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to frustrate detection.
Once access is secured, the lab generates large numbers of carefully crafted prompts designed to elicit specific capabilities from the model. The goal is either to collect high-quality responses for directly training a model, or to generate the tens of thousands of unique tasks needed for reinforcement learning. What separates a distillation attack from normal use is the pattern. A prompt like the one below (closely resembling prompts we observed used repeatedly and at scale) may seem innocuous on its own:
You are an expert data analyst who combines statistical rigor with deep domain knowledge. Your goal is to deliver data-driven insights, based on real data and supported by complete and transparent reasoning, not summaries or visualizations.
But when variations of that prompt arrive tens of thousands of times across hundreds of coordinated accounts, all targeting the same narrow capability, a pattern emerges. High volume concentrated in a few areas, highly repetitive structure, and content that maps directly onto what is most valuable for training an AI model are the hallmarks of a distillation attack.
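Those hallmarks lend themselves to simple heuristics. The toy sketch below scores a batch of requests on topic concentration and prompt repetitiveness; the thresholds, features, and function names are illustrative assumptions, not a description of Anthropic's actual detection systems.

```python
from collections import Counter
from difflib import SequenceMatcher

def repetitiveness(prompts):
    """Mean pairwise similarity of the prompts (1.0 means identical)."""
    if len(prompts) < 2:
        return 0.0
    scores = [SequenceMatcher(None, a, b).ratio()
              for i, a in enumerate(prompts) for b in prompts[i + 1:]]
    return sum(scores) / len(scores)

def looks_like_distillation(requests, sim_threshold=0.8, topic_share=0.9):
    """requests: (account_id, topic, prompt) tuples from API logs."""
    topics = Counter(topic for _, topic, _ in requests)
    top_share = topics.most_common(1)[0][1] / len(requests)
    sim = repetitiveness([prompt for _, _, prompt in requests])
    return top_share >= topic_share and sim >= sim_threshold

# Near-identical prompts from a handful of accounts, all on one capability:
suspicious = [(f"acct{i % 5}", "data_analysis",
               f"You are an expert data analyst. Complete task {i}.")
              for i in range(20)]
# Varied prompts on varied topics from unrelated users:
benign = [("u1", "coding", "Help me fix an off-by-one bug in my loop"),
          ("u2", "poetry", "Write a short haiku about autumn rain"),
          ("u3", "travel", "Suggest a three-day itinerary for Lisbon")]
```

Real systems would use richer features (timing, metadata, embeddings rather than string similarity), but the core signal is the same: legitimate traffic is diverse, distillation traffic is not.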
How we respond
We continue to invest heavily in defenses that make these distillation attacks harder to execute and easier to identify. These include:
- Detection. We built multiple classifiers and behavioral fingerprinting systems designed to identify distillation attack patterns in API traffic, including chain-of-thought elicitation used to construct reasoning training data. We also built tools to detect coordinated activity across many accounts.
- Intelligence sharing. We share technical indicators with other AI labs, cloud providers, and relevant authorities, giving the broader ecosystem a more complete picture of distillation activity.
- Access controls. We strengthened authentication for educational accounts, security research programs, and startups, the routes most commonly exploited to set up fraudulent accounts.
- Countermeasures. We are developing product, API, and model-level safeguards designed to reduce the usefulness of model outputs for fraudulent distillation without degrading the experience for legitimate customers.
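To make the cross-account detection idea above concrete, one simple approach is to normalize each prompt into a template "fingerprint" and flag templates shared by unusually many accounts. The sketch below is purely illustrative: the normalization rules, threshold, and sample data are assumptions, not our production logic.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(prompt: str) -> str:
    """Collapse variable details (numbers, whitespace) and hash the template."""
    template = re.sub(r"\d+", "<N>", prompt.lower())
    template = re.sub(r"\s+", " ", template).strip()
    return hashlib.sha256(template.encode()).hexdigest()[:16]

def coordinated_clusters(events, min_accounts=3):
    """events: (account_id, prompt) pairs; flag templates shared by many accounts."""
    groups = defaultdict(set)
    for account, prompt in events:
        groups[fingerprint(prompt)].add(account)
    return {fp: accounts for fp, accounts in groups.items()
            if len(accounts) >= min_accounts}

# Three accounts reuse one template with only the dataset number varying:
events = [("acct1", "Analyze dataset 17 and report key statistics"),
          ("acct2", "Analyze dataset 42 and report key statistics"),
          ("acct3", "Analyze dataset 99 and report key statistics"),
          ("acct4", "Write a limerick about a lighthouse")]
clusters = coordinated_clusters(events)
```

Grouping by a normalized template rather than raw text is what lets this catch "hydra" networks that rotate accounts but reuse the same prompt machinery.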
But no company can solve this alone. As noted above, distillation campaigns of this scale demand a coordinated response across the AI industry, cloud providers, and policymakers. We are publishing these findings to put the evidence in front of everyone with a stake in the outcome.