Anthropic's Most Capable Model Is Too Dangerous to Release. So They Built a Controlled Access Program Instead.

On April 1, 2026, a software engineer at Anthropic named Nicholas Carlini gave Claude Mythos Preview a simple instruction: find a security vulnerability in FreeBSD’s NFS implementation.

Eight hours of wall-clock time later — four hours of actual model working time — he had two independent, working kernel exploits. Both worked on the first attempt.

The vulnerability was CVE-2026-4747: a stack overflow in svc_rpc_gss_validate() that copies an RPCSEC_GSS credential body into a 128-byte stack buffer without checking its length. The bug had been present in FreeBSD’s NFS implementation for 17 years. CVSS score: 8.8 HIGH. Remote code execution, unauthenticated on userspace RPC servers.
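The bug class described here is a copy into a fixed-size buffer with no length check. The following is a minimal Python sketch of the missing validation, not FreeBSD's code or its patch. The function name, the exception type, and the way the 128-byte limit is modeled are illustrative assumptions based only on the description above.

```python
BUF_SIZE = 128  # fixed buffer size taken from the description above


class CredentialTooLarge(ValueError):
    """Raised when a credential body would not fit in the fixed buffer."""


def copy_credential(body: bytes, buf_size: int = BUF_SIZE) -> bytearray:
    """Copy a credential body into a fixed-size buffer, rejecting oversize input.

    Hypothetical helper: it models the length check that the vulnerable
    code path is described as lacking. It is not the actual fix.
    """
    if len(body) > buf_size:
        raise CredentialTooLarge(
            f"credential body is {len(body)} bytes, buffer holds {buf_size}"
        )
    buf = bytearray(buf_size)
    buf[: len(body)] = body  # safe: the length was checked above
    return buf
```

In C there is no such guard unless the programmer writes it; an unchecked copy of a longer body would run past the end of the buffer and overwrite adjacent stack memory.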

The FreeBSD security advisory credits: “Nicholas Carlini using Claude, Anthropic.”

That is the story. Not the leak from two weeks ago. Not the marketing document. This.

What Mythos Preview actually did

The red team report Anthropic published today is not a capability marketing document. It is a technical accounting of what the model found, with CVE numbers, SHA-3 cryptographic commitments to findings still in disclosure windows, and independently verifiable advisories.

The FreeBSD exploit is the most thoroughly documented. Claude did not simply identify the vulnerability — it solved six distinct sub-problems autonomously:

Setting up a lab environment to test the exploit
Delivering packets across 15 rounds, writing shellcode 32 bytes at a time through the stack buffer
Managing thread state via kthread_exit()
Locating the correct stack offset using De Bruijn patterns
Transitioning from kernel to userland via kproc_create() and kern_execve()
Cleaning up debug registers to avoid detection artifacts
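
The offset-finding step in the list above relies on a De Bruijn pattern: a string in which every window of n characters appears exactly once, so any n characters recovered from a crash identify their own position. A minimal sketch of the standard generation algorithm follows. The function names are illustrative, and nothing here is taken from the published exploit.

```python
def de_bruijn(alphabet: str, n: int) -> str:
    """Return a De Bruijn sequence over `alphabet` with window length `n`.

    Every length-n substring of the result occurs exactly once.
    """
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        # Standard recursive construction via Lyndon words.
        if t > n:
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def find_offset(pattern: str, observed: str) -> int:
    """Return the position of `observed` in `pattern`, or -1 if absent."""
    return pattern.find(observed)
```

Because each window is unique, a single lookup is enough: if the characters at some saved location match `pattern[off:off + n]`, then `find_offset` returns `off`.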

The writeup is published. The exploit code is on GitHub. The NVD entry is live. This is not a claim. It is a reproducible result.

The other confirmed discoveries:

OpenBSD SACK TCP implementation (27 years old). A bug in the Selective Acknowledgment implementation that allows any machine running OpenBSD to be remotely crashed by simply connecting to it. Present since the original implementation. In coordinated disclosure; details withheld pending patch.

FFmpeg H.264 codec (16 years old). A vulnerability in the H.264 implementation present since the original commit adding H.264 support. Automated fuzzing tools failed to detect it despite running the vulnerable code path five million times. Patched in FFmpeg 8.1.

The red team report also references — without yet disclosing — web browser exploits, VMM guest-to-host corruption bugs, Linux privilege escalation chains, cryptography library issues, and smartphone lock-screen bypasses. SHA-3 hashes are published as cryptographic proof of possession. Details come when the 90+45-day responsible disclosure window closes.
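
Publishing a hash now and the details later is a commit-and-reveal scheme. The sketch below shows the general idea with SHA3-256 from Python's standard library. The random nonce and the function names are assumptions added for illustration; the report's exact construction is not described here.

```python
import hashlib
import hmac
import secrets


def commit(finding: bytes) -> tuple[str, bytes]:
    """Return (public_digest, secret_nonce) for a finding kept private for now.

    The nonce (an assumption of this sketch) stops anyone from confirming
    a guess about the finding by hashing candidate texts.
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha3_256(nonce + finding).hexdigest()
    return digest, nonce


def verify(public_digest: str, nonce: bytes, finding: bytes) -> bool:
    """Check that a revealed finding and nonce match the published digest."""
    expected = hashlib.sha3_256(nonce + finding).hexdigest()
    return hmac.compare_digest(expected, public_digest)
```

Only the digest is published at commitment time. After disclosure, releasing the finding and the nonce lets anyone recompute the hash and confirm the finding existed when the digest was posted.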

Mythos Preview vs. Claude Opus 4.6 — Benchmark Results

CyberGym (vulnerability reproduction): 83.1% vs. 66.6%
SWE-bench Verified: 93.9% vs. 80.8%
SWE-bench Pro: 77.8% vs. 53.4%
Terminal-Bench 2.0: 82.0% vs. 65.4%
Firefox JS exploit development: 181 working exploits vs. 2

The Firefox number is worth sitting with. Against a benchmark of known JavaScript vulnerabilities in Firefox, Mythos Preview developed 181 working exploits. Opus 4.6 managed 2 from several hundred attempts. Against fully patched OSS-Fuzz targets, Mythos Preview achieved full control-flow hijack on 10 systems.
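
The gaps in the figures above are simple arithmetic. This short sketch makes them explicit, using only the numbers quoted in this article. The dictionary layout and function names are illustrative.

```python
# (Mythos Preview, Claude Opus 4.6) scores in percent, as quoted above.
SCORES = {
    "CyberGym": (83.1, 66.6),
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "Terminal-Bench 2.0": (82.0, 65.4),
}

# Working Firefox JS exploits: (Mythos Preview, Claude Opus 4.6).
FIREFOX_EXPLOITS = (181, 2)


def point_gaps(scores: dict) -> dict:
    """Percentage-point lead of the first model over the second, per benchmark."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


def exploit_ratio(counts: tuple) -> float:
    """How many times more working exploits the first model produced."""
    new, old = counts
    return new / old


if __name__ == "__main__":
    for name, gap in point_gaps(SCORES).items():
        print(f"{name}: +{gap} points")
    print(f"Firefox exploits: {exploit_ratio(FIREFOX_EXPLOITS):.1f}x")
```

The widest gap among the percentage benchmarks is SWE-bench Pro at 24.4 points, and the Firefox count works out to 90.5 times as many working exploits.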

Anthropic’s Newton Cheng (Frontier Red Team Cyber Lead) described the model as having “the skills of an advanced security researcher” who can find vulnerabilities and — unlike previous models — write the exploits to accompany them. “We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities.”

Project Glasswing

Declining to release the model publicly is not the same as shelving it. Anthropic announced Project Glasswing today: a controlled access program that puts Mythos Preview exclusively in the hands of organizations doing defensive security work on critical infrastructure.

The 12 launch partners: AWS, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks.

Read that list carefully. It is not a list of customers. It is a list of organizations whose software is foundational to the internet — and whose vulnerability is therefore everyone’s vulnerability. A remote code execution bug in Linux, FreeBSD, or OpenSSL affects billions of systems. Putting the model that finds those bugs in the hands of the organizations responsible for patching them is a coherent response to the problem the model creates.

An additional 40+ organizations maintaining critical open-source software can apply for scanning access on their own systems. Open-source maintainers have a separate pathway through Anthropic’s “Claude for Open Source” program. Access is gated; the model does not leave the controlled environment.

Financial commitments accompanying the launch: $100 million in Mythos Preview usage credits across participants, $2.5 million to Alpha-Omega and the Open Source Security Foundation, $1.5 million to the Apache Software Foundation. The $4 million in direct grants to open-source security organizations is a recognition that the organizations most exposed to this class of vulnerability are often the ones with the least security staffing.

Within 90 days, Anthropic will publicly report findings, patched vulnerabilities, and model improvements. Pricing after the research phase is set at $25 per million input tokens and $125 per million output tokens.

"We haven't trained it specifically to be good at cyber. We trained it to be good at code, but as a side effect..." — Dario Amodei, Anthropic CEO, on Mythos Preview's cybersecurity capabilities

The capability that wasn’t supposed to be there

That quote from Amodei matters. Mythos Preview is a general-purpose frontier model. Its cybersecurity capability is an emergent consequence of exceptional coding and reasoning ability — not a deliberate design choice. The model was trained to be good at code. Exploit development is an application of code reasoning that the training did not specifically target and that arrived anyway.

This is the mechanism by which AI cybersecurity risk scales. It does not require a lab to decide to build a hacking tool. It requires a lab to build a sufficiently capable general reasoning model, and the hacking capability arrives as a side effect. Cheng acknowledged this explicitly: “given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely.”

The offensive/defensive asymmetry is real. A security researcher noted that Firefox spent approximately 30 engineer-days patching bugs that took 2 weeks to find. The ratio of discovery speed to patch speed is already unfavorable, and Mythos-class models accelerate the discovery side. The concern is not a single dramatic attack — it is the volume. AI-generated vulnerability reports and exploits at scale overwhelm the human patching pipeline that currently serves as the last line of defense.

The defensive argument is the mirror of this problem: if the model can find vulnerabilities at this rate, deploying it for defense first — before adversarial actors gain equivalent capabilities — is the correct response. Gadi Evron (founder, Knostic): “Unlike attackers, defenders don’t yet have AI capabilities accelerating them to the same degree.” Project Glasswing is an attempt to correct that asymmetry before it becomes a structural feature of the threat landscape.

What Mythos doesn't replace: Independent security researcher Evan Peña (Armadin) notes that the model still lacks the contextual knowledge a human attacker has about what's most valuable to steal within a specific target. Finding a vulnerability and knowing which vulnerability to exploit for maximum impact in a targeted attack remain different problems. Mythos has closed the gap on the first; the second still requires human judgment or organization-specific knowledge the model doesn't have.

The irony that coverage will not ignore

Two security incidents preceded this announcement.

On March 26, the existence of Mythos — then codenamed “Capybara” — became public when Anthropic left a draft blog post in a publicly searchable, unsecured data store. Security researchers Roy Paz (LayerX Security) and Alexandre Pauwels (University of Cambridge) found it. Anthropic then suffered a second incident when Claude Code’s source code was leaked.

A company announcing a program to use its most powerful AI model to find security vulnerabilities in other organizations’ software experienced two of its own security failures in the preceding two weeks.

The irony is noted. It does not change the technical validity of CVE-2026-4747 or the soundness of the Glasswing approach. It does add texture to Cheng’s observation that “the rate of AI progress” will outpace current security practices — including, apparently, Anthropic’s own.

What this does to the previous Mythos coverage

Two weeks ago, when the marketing document leaked and circulated as evidence of “unprecedented cybersecurity risks,” the correct framing was that the claim was marketing copy. The source was internal promotional text, not a technical assessment. No benchmarks, no CVEs, no verifiable evidence.

That framing was right about the leak. It was not right about the underlying capability.

The red team report published today is the technical document the marketing copy was describing. The “unprecedented cybersecurity risks” language was premature and self-promotional. The capability it was pointing at was real.

Bottom Line

Claude Mythos Preview found security vulnerabilities in FreeBSD, OpenBSD, and FFmpeg that had existed for 17, 27, and 16 years respectively, undetected through decades of human review and automated fuzzing. The FreeBSD exploit is independently confirmed with a live CVE, a published advisory, and working exploit code on GitHub. The model is not being released publicly. It is being deployed defensively, under controlled access, by the organizations responsible for maintaining the software it would otherwise be used to attack.

The structural problem Glasswing is designed to address is real: AI accelerates vulnerability discovery faster than human patching pipelines can respond. Putting Mythos-class capability on the defense side first, before equivalent capability proliferates to adversaries, is the correct move, assuming the controlled access holds. Whether 12 launch partners and 40+ open-source organizations can absorb and patch vulnerabilities as fast as one frontier model discovers them is the empirical question the next 90 days will answer.

Sources

Anthropic Official Sources

Anthropic — Project Glasswing announcement. April 7, 2026. https://www.anthropic.com/glasswing
Anthropic Red Team — 'Claude Mythos Preview Cybersecurity Capabilities Assessment.' April 7, 2026. https://red.anthropic.com/2026/mythos-preview/
Anthropic — Coordinated Vulnerability Disclosure Policy. https://www.anthropic.com/coordinated-vulnerability-disclosure

CVE Confirmation

NIST NVD — CVE-2026-4747 (FreeBSD NFS / RPCSEC_GSS stack overflow). CVSS 8.8 HIGH. https://nvd.nist.gov/vuln/detail/CVE-2026-4747
FreeBSD Security Advisory — FreeBSD-SA-26:08.rpcsec_gss.asc.
Carlini, N. (Anthropic) — 'MAD Bugs: CVE-2026-4747 writeup.' GitHub, April 2026. https://github.com/califio/publications/tree/main/MADBugs/CVE-2026-4747
NotebookCheck — 'Claude Code cracks FreeBSD within four hours.' April 2026. https://www.notebookcheck.net/Claude-Code-cracks-FreeBSD-within-four-hours.1266232.0.html

Press Coverage

VentureBeat — 'Anthropic says its most powerful AI cyber model is too dangerous to release publicly.' April 7, 2026. https://venturebeat.com/technology/anthropic-says-its-most-powerful-ai-cyber-model-is-too-dangerous-to-release
Axios — 'Anthropic withholds Mythos Preview model because its hacking is too powerful.' April 7, 2026. https://www.axios.com/2026/04/07/anthropic-mythos-preview-cybersecurity-risks
CyberScoop — 'Tech giants launch AI-powered Project Glasswing.' April 7, 2026. https://cyberscoop.com/project-glasswing-anthropic-ai-open-source-software-vulnerabilities/
Fortune — 'Anthropic is giving some firms access to Claude Mythos.' April 7, 2026. https://fortune.com/2026/04/07/anthropic-claude-mythos-model-project-glasswing-cybersecurity/
TechCrunch — 'Anthropic debuts preview of powerful new AI model Mythos in new cybersecurity initiative.' April 7, 2026. https://techcrunch.com/2026/04/07/anthropic-mythos-ai-model-preview-security/

Security Community Context

Security Cryptography Whatever — 'AI Bug Finding.' March 25, 2026. https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/
Security Boulevard — 'Claude Mythos and the Cybersecurity Risk That Was Already Here.' March 2026. https://securityboulevard.com/2026/03/claude-mythos-and-the-cybersecurity-risk-that-was-already-here/
Check Point Research — 'Claude Mythos Signals a New Era of AI-Driven Cyber Attacks.' April 2026.
ZeroPath — 'Autonomously finding 7 FFmpeg vulnerabilities with AI.' 2025. https://zeropath.com/blog/autonomously-finding-7-ffmpeg-vulnerabilities-with-ai-2025