Fable 5 Returns with New Safety Classifiers After 19-Day Suspension

Anthropic CEO speaking at a media conference about AI safety and the Claude Fable 5 restoration following US government export controls

TLDR

Anthropic restored global access to Claude Fable 5 on July 1, three weeks after US export controls grounded the model over a reported cybersecurity jailbreak.
A new safety classifier, co-developed with the government, now blocks the flagged technique in over 99% of cases, at the cost of occasionally flagging benign coding and debugging requests.
Anthropic is co-developing an industry-wide jailbreak severity framework with Amazon, Microsoft, and Google to prevent uncoordinated regulatory responses to future findings.

A jailbreak that older models could already replicate

Claude Fable 5 went offline on June 12, three days after launch. The US government applied export controls after Amazon researchers reported a method of bypassing Fable 5's safeguards: prompting the model to identify software vulnerabilities, with one case producing code demonstrating how a vulnerability could be exploited.

What made the suspension unusual is what Anthropic's subsequent testing revealed. The same vulnerabilities could be identified by Claude Opus 4.8, GPT-5.5, and Kimi K2.7. The exploitation demonstration that triggered the government's action was reproducible by every model tested, including Claude Haiku 4.5. According to Anthropic's official account of the incident, Fable 5 offered no unique offensive capability. The reported technique had breached the model's safety margin, not its core restricted behaviors.

Commerce Secretary Howard Lutnick confirmed the export controls were lifted on June 30. Starting July 1, Fable 5 became available on Claude.ai, Claude Platform, Claude Code, and Claude Cowork globally. For Pro, Max, and Team plan users, Fable 5 is included for up to 50% of weekly usage limits through July 7. Mythos 5, which carries fewer safeguards and stronger cybersecurity capabilities, was separately cleared on June 26 for a defined set of US critical infrastructure organizations.

Claude Fable 5 will be available again globally tomorrow. After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks.
— Anthropic (@AnthropicAI) June 30, 2026

Source: @AnthropicAI

The framework Anthropic is building to prevent the next suspension

The classifier update is the immediate fix. The longer-term response is structural. Because no agreed standard exists for assessing whether a given AI jailbreak represents a genuine threat or a marginal edge case, regulators were left without clear criteria for how to respond when the Amazon report arrived. A jailbreak that was, by Anthropic's analysis, narrow and low-capability triggered the same regulatory mechanism that a universal jailbreak would.

Anthropic is now co-developing a consensus framework with Amazon, Microsoft, Google, and other Project Glasswing partners to score jailbreaks on four dimensions: how much capability the technique provides beyond existing tools, how broadly the same jailbreak applies across different offensive tasks, how easily it can be weaponized, and how discoverable it is in the wild. The proposal draws directly on precedents like the Common Vulnerability Scoring System used in traditional software security.

Alongside the framework, Anthropic committed to deeper pre-release government collaboration, including early model access for designated government evaluators before broad release, rapid sharing of jailbreak intelligence with government counterparts, and dedicated Anthropic staff working alongside government testing teams. The new classifier also carries a deliberate cost: it flags benign coding and debugging requests more often than before. Anthropic has acknowledged this and said it will continue refining the balance between false positives and genuine misuse detection.

The 19-day suspension tested a principle that will govern frontier AI releases going forward: a model's deployment can be halted not by what it uniquely enables, but by what regulators believe it could enable, absent clear evidence to the contrary. Anthropic's response, a tighter classifier, a shared severity standard, and structured government access before launch, outlines what compliant frontier deployment looks like when capability and national security policy collide at speed.

Santage is committed to independent, transparent journalism. This article is produced in accordance with Santage's Editorial Standards and aims to provide accurate and timely information. Readers are encouraged to verify information independently.