The U.S. White House and AI developer Anthropic are reportedly co-designing a new rulebook for AI safety.
This development stems directly from a recent, high-stakes event: the U.S. government ordered Anthropic to disable its most advanced models, Fable 5 and Mythos 5, worldwide. The reason was a reported 'jailbreak' vulnerability that raised national security alarms by suggesting the models could be weaponized to discover vulnerabilities and generate cyberattacks. This abrupt takedown created significant industry turmoil and sparked urgent calls for a more sophisticated way to manage AI risks than a simple on/off switch.
The path to this negotiation was paved by three factors. First, the takedown itself created a policy vacuum that needed to be filled. Second, a wave of public pushback from cybersecurity leaders highlighted a critical flaw in the government's action. They argued that banning these models also robs defenders of a powerful tool, potentially making the country less secure. Third, the administration had already laid the legal groundwork. A recent Executive Order established a pre-release review process for 'frontier models', and a National Security Memorandum reinforced the government's authority to mandate safeguards, giving the new framework ready-made legal channels.
This move is a clear example of the administration's preference for a 'shadow policy' over broad regulation. Rather than passing sweeping laws, it uses targeted executive orders, export controls, and now, collaborative frameworks to guide the AI industry. The new severity framework aims to convert this discretionary power into a transparent and repeatable process, providing clarity for both developers and government agencies.
The framework will likely define what constitutes a 'covered frontier model' and establish clear risk tiers. To do this, it will probably align with existing industry standards like the 'OWASP AIVSS', which provides a common language for scoring AI vulnerabilities. Based on a model's score, the framework would then dictate specific actions, from requiring additional safeguards to, in severe cases, justifying export controls or other government interventions.
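To make the score-to-action idea concrete, here is a minimal sketch of how a tiered lookup could work. Everything in it is an assumption for illustration: the 0 to 10 scale, the cutoffs (loosely modeled on the severity bands familiar from CVSS-style scoring), the tier names, the actions, and the names `TIERS`, `Assessment`, and `assess`. None of it comes from the framework under negotiation or from OWASP AIVSS itself.

```python
from dataclasses import dataclass

# Hypothetical tiers on an assumed 0-10 severity scale, highest first.
# Each entry is (minimum score, tier name, resulting action).
# These cutoffs and actions are illustrative, not taken from any real policy.
TIERS = [
    (9.0, "critical", "government intervention, such as export controls"),
    (7.0, "high", "additional safeguards required before release"),
    (4.0, "medium", "safeguards recommended, with periodic re-review"),
    (0.0, "low", "no extra action"),
]


@dataclass(frozen=True)
class Assessment:
    model: str
    score: float
    tier: str
    action: str


def assess(model: str, score: float) -> Assessment:
    """Map a severity score to the first tier whose minimum it meets."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    for minimum, tier, action in TIERS:
        if score >= minimum:
            return Assessment(model, score, tier, action)
    raise AssertionError("unreachable: the lowest tier starts at 0.0")


if __name__ == "__main__":
    # "example-model" and its score are made up for the demonstration.
    result = assess("example-model", 7.8)
    print(f"{result.model}: {result.tier} -> {result.action}")
```

The point of a lookup like this is repeatability: the same score always yields the same tier and the same response, which is the kind of predictability the framework is meant to substitute for case-by-case discretion.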
Ultimately, this collaborative effort represents a crucial step toward creating a sustainable approach to AI governance. It seeks to balance the immense potential of AI for both harm and good, creating a structured system to ensure national security without completely stifling the innovation needed to advance cyber defense.
- Glossary
- Frontier Models: The most powerful and capable AI models currently in existence, which have the potential for both significant benefits and risks.
- Jailbreak: A technique used to bypass the safety features and restrictions built into an AI model, allowing it to perform tasks it was designed to refuse.
- OWASP AIVSS: The AI Vulnerability Scoring System, an open OWASP framework for scoring the severity of security vulnerabilities in AI systems.
