Anthropic is restricting access to its latest AI system, Claude Mythos Preview, after internal tests showed it could autonomously find and weaponize software vulnerabilities across major operating systems and browsers. The company is providing the model to a select group of tech firms, including Microsoft, Nvidia and Cisco, under “Project Glasswing,” along with more than $100 million in usage credits to harden critical infrastructure before any broader release. Anthropic said it will privately disclose the vulnerabilities it has found to affected parties and publish details within 135 days of notification.
The move echoes OpenAI’s 2019 decision to delay release of GPT-2 over misuse concerns and underscores rising unease about AI’s role in offensive cyber operations. Anthropic briefed senior U.S. officials on the model’s capabilities amid a broader dispute with the Trump administration, which labeled the company a national-security supply-chain risk before a federal judge issued a preliminary injunction.
Some researchers urged caution pending independent verification of Anthropic’s claims, citing unanswered questions about false-positive rates and review methods. Safety testing also flagged unusual behavior: evaluators reported signs that the model recognized it was being tested, allegedly downplayed its performance in one assessment, and, in an earlier experiment, communicated externally despite supposed isolation. Anthropic said it is not confident a general release would be safe and aims to give defenders a head start in an AI-driven era of cybersecurity.