Double security blow: Anthropic leaks Claude Code's source code days after exposing its Mythos model

31-03-2026 8:27:08 Escrito por: Leonardo Lopez Leidas 484

Anthropic's Black Week. The operational security of AI companies among tech leaders has never been under such intense scrutiny. Anthropic, the company that positions itself as the most security- and alignment-focused AI lab, has just experienced two critical data breaches in the span of five days, calling into question its own protection protocols.

On March 27, 2026, Fortune revealed that Anthropic had exposed nearly three thousand internal files, including details of its most powerful model to date, Claude Mythos, also known as Capybara.

Five days later, on March 31, the company suffered a second, even more severe leak: the accidental publication of the complete source code of Claude Code, its AI-assisted coding tool, on the public npm package registry.

These incidents are not mere isolated technical errors. They represent a troubling pattern of operational maturity in a company preparing for its initial public offering and selling its services precisely on the premise of being more secure and reliable than its competitors. For developers, companies that rely on these tools, and observers of the AI ecosystem, these breaches raise fundamental questions about the true strength of the internal security practices of those building the world's most powerful systems.

The first blow: Mythos and Capybara exposed

On March 26, security researchers discovered that Anthropic had stored draft content in a publicly accessible data lake due to a misconfiguration in its content management system. Among the exposed documents was a draft announcement about Claude Mythos, described by the company itself as its most capable model to date and a significant leap forward in AI performance.

The document revealed that Mythos, also referred to internally as Capybara, represents a new category above the current Opus, Sonnet, and Haiku models. According to the leaked draft, Capybara scores dramatically higher on software coding, academic reasoning, and cybersecurity tests compared to Claude Opus 4.6.

Even more worrying, the draft explicitly admitted that this model presents unprecedented cybersecurity risks. Anthropic acknowledges in the leaked document that Mythos could be used to find and exploit software vulnerabilities much faster than current tools, increasing the risk of more frequent and large-scale cyberattacks if it falls into the wrong hands.

The company attributed this incident to human error in the configuration of its CMS, where assets are set as public by default unless the user explicitly changes the setting.

This explanation, while technically plausible, does not eliminate concerns about insufficient access controls for information of high strategic value.

The second blow: half a million lines of code exposed

While the Mythos leak was serious, the March 31 incident was potentially catastrophic for Anthropic's competitive edge. Version 2.1.88 of Claude Code accidentally included a source map file that provided access to approximately 512,000 lines of TypeScript code spread across 1,900 files.

Claude Code is perhaps Anthropic's most popular product and has experienced breakneck adoption rates in large enterprises. Unlike the underlying language models, Claude Code includes a software harness that instructs the model on how to use other tools, provides important guardrails, and governs its behavior. It is this agentic harness—the layer that transforms a language model into an autonomous coding assistant—that was exposed.

The leak occurred when an internal .map file used for debugging was accidentally included in a routine update posted to npm, the platform developers use to share and update software. This file pointed to a zip file on Anthropic's cloud storage containing the complete source code.

Within hours, the codebase was copied and dissected on GitHub, quickly accumulating over 50,000 forks. Anthropic issued takedown notices, but the distributed nature of the internet makes it impossible to guarantee that the code won't persist in multiple private repositories.

What the code reveals: future features and internal architecture

Analysis of the leaked code has provided the technical community with unprecedented insight into Anthropic's product roadmap and internal architecture. Among the most significant findings:

The code exposed dozens of feature flags for capabilities that appear fully built but not yet released, including a session review system that allows Claude to study its past interactions to improve future conversations , a persistent assistant that operates in background mode even when the user is idle, and remote capabilities that allow Claude to be controlled from mobile phones or other browsers.

Researchers also discovered KAIROS, a module that enables autonomous background execution to consolidate memory and clean up context while the user is idle.

The three-layer memory system, based on MEMORY.md index files, on-demand topic references, and strict write discipline to avoid context corruption, offers insights into how Anthropic tackles one of the biggest challenges in autonomous agent orchestration.

A more idiosyncratic finding was a Tamagotchi-like feature: a virtual pet that sits next to the input box and reacts to the user's coding activity. A covert mode was also discovered, designed for secret contributions to public repositories, with systems to prevent leaks of identity or credentials.

Security and competitive implications

The leak of the source code presents multifaceted risks that extend beyond the mere loss of intellectual property. Roy Paz, senior AI security researcher at LayerX Security, points out that while the leaks did not expose the model weights themselves, they revealed non-public details about how the systems work, including internal APIs and processes. This information could help sophisticated actors better understand the architecture of Anthropic's models and how they are deployed, which in turn could inform attempts to circumvent existing protections.

From a competitive perspective, the leak provides rivals with free education on how to build a production-grade coding agent.

Competitors can study the architecture, design patterns, and technical decisions that Anthropic has refined through costly iterations. Some developers have already begun creating open-source versions based on the leaked code.

More fundamentally, these incidents erode Anthropic's central narrative as the safest and most responsible AI lab. When a company that warns about the cybersecurity risks of its own models cannot protect its own internal systems, the credibility of its external warnings is undermined.

aspect	Mythos/Capybara filtration	Claude Code leak
date of discovery	March 26, 2026	March 31, 2026
type of exposure	blog drafts in public data lake	source code in npm log
affected volume	~3,000 files, internal documents	~512,000 lines of code, 1,900 files
information revealed	specifications of unreleased model, security risks, executive event plans	complete product architecture, future features, internal APIs, guardrail logic
attributed cause	Human error in CMS configuration	packaging error in release
competitive impact	high, reveals strategic roadmap	severe, exposes technical implementation secrets
security risk	medium, information on future capabilities	high, possible circumvention of protections
anthropic response	removal of public access, error confirmation	disposal notices, promised preventative measures

This comparison illustrates how the second incident, although technically similar in causes, represents a significantly larger scale of exposure with more lasting implications for the company's competitive position.

The broader context: Claude's recent vulnerabilities

These leaks don't happen in a vacuum. Throughout 2026, Anthropic has faced multiple security challenges that suggest tensions between the speed of innovation and operational maturity.

In March 2026, researchers at Oasis Security discovered three vulnerabilities in Claude.ai, collectively dubbed Claudy Day. These allowed for invisible prompt injection via URL parameters, data exfiltration using the Anthropic file API, and open redirection to claude.com.

The attack chain allowed an adversary to steal conversation history and sensitive data without the user noticing.

In February 2025, an early version of Claude Code had already accidentally exposed its original code in an incident similar to the current one. The pattern of repeated leaks of the same product suggests systemic deficiencies in software release processes beyond mere isolated human errors.

Additionally, in March 2025, it was discovered that hacking groups linked to China had used Claude Code in an espionage campaign targeting approximately 30 organizations, including technology companies, financial institutions, and government agencies. Although Anthropic detected and blocked the activity, the incident underscores how Anthropic's AI tools have become high-value targets for sophisticated threat actors.

Anthropic's response and its limitations

Following the Claude Code leak, Anthropic issued a statement downplaying the incident: "Some internal source code was included in a Claude Code release. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We are implementing measures to prevent this from happening again."

This characterization as a packaging error rather than a security breach is technically accurate but semantically evasive. The distinction matters little to competitors who now have visibility into the architecture of Anthropic's flagship product, or to enterprise customers evaluating the vendor's operational reliability.

The promise to implement preventative measures, heard after every security incident in the technology industry, must translate into tangible process changes.

The repeated leaks of Claude Code source code over thirteen months suggest that the measures implemented after the February 2025 incident were insufficient.

Timeline and flow of the Claude Code source code leak, from the packaging error to its distribution on GitHub with over 50,000 forks. Source: Infographic based on Fortune and Axios reports, March 31, 2026.

Lessons for the AI ecosystem

These incidents offer relevant lessons beyond Anthropic individually. For companies that build or deploy AI systems, several conclusions emerge.

First , operational security must evolve at the same pace as product capabilities. Anthropic has clearly prioritized developing more capable models and more autonomous tools, but its asset protection processes do not appear to have scaled proportionally. The complexity of agentic systems operating with greater autonomy inherently amplifies the impact of security failures.

Second , the default settings in content management and software release tools should be restrictive, not permissive. Both leaks resulted from configurations where the least secure option was the default, requiring explicit human action to protect sensitive information. This design pattern consistently produces human error with disproportionate consequences.

Third , transparency about security incidents, while uncomfortable, builds more long-term trust than minimization. Characterizing these incidents as packaging or configuration errors, while technically accurate, downplays the severity of the impact. AI companies that aspire to lead in accountability must model the accountability they expect from others.

Exposed architecture by Claude Cod

Claude Code architectural diagram based on analysis of the leaked source code, showing key components such as the MEMORY.md memory system, the KAIROS background execution module, and the agentic orchestration layer. Source: Startup Ecosystem Technical Analysis, March 31, 2026.

Writer's suggestion: A perspective from the technological trenches

As an observer of the enterprise AI ecosystem and its intersection with security operations, I want to share reflections that transcend the immediate headlines of these leaks.

First suggestion: distinguish between model security and operational security. Anthropic has invested heavily in model alignment and security, areas where it genuinely leads the industry. However, these leaks demonstrate that operational security—the human and technical processes that protect the company's assets—remains just as critical as the technical security of the model. A company can have the most secure model in the world and still be compromised through faulty CMS configurations.

Second suggestion: Evaluate AI vendors holistically. For companies selecting AI partners, these incidents serve as reminders that the evaluation must extend beyond model capabilities to the vendor's operational maturity. Questions about software release processes, access controls, and incident response are just as important as model performance benchmarks.

Third suggestion: recognize that the speed of innovation creates security debt. Anthropic operates in a hyper-competitive market where time to market determines market share. This pressure inevitably compromises the rigor of controls. As users and observers, we must calibrate our security expectations by acknowledging these structural tensions, without excusing them.

Fourth suggestion: Leverage inadvertent transparency constructively. Leaked code, though obtained irregularly, offers the technical community learning opportunities about agentic systems architecture. Developers can study these patterns to improve their own implementations, and security researchers can identify potential attack vectors to report responsibly. The information is available; its ethical use is an individual choice.

Fifth suggestion: anticipate increased regulation. These incidents come at a time of growing regulatory scrutiny of AI security. We are likely to see stricter incident disclosure requirements and operational security standards for high-performance AI companies. Companies that proactively anticipate these standards will have a regulatory competitive advantage.

Competitive landscape of the AI-assisted coding tools market in 2026, showing Anthropic's positioning with Claude Code against key competitors. Source: Market analysis based on enterprise adoption data, March 2026.

Rebuilding trust in the age of agentic AI

The March 2026 leaks will not determine Anthropic's fate. The company maintains significant technical advantages, a loyal user base, and considerable financial resources. However, these incidents do mark a turning point in how the company will be evaluated, both by the market and by society at large.

The transition to more autonomous and agentic AI systems, which Anthropic is leading with Claude Code and soon with Mythos/Capybara, exponentially increases the stakes for operational security . When AI systems can act independently in digital environments, security breaches are not limited to data exposure; they extend to unauthorized actions, system manipulation, and real-world consequences.

Anthropic has an opportunity to transform these incidents into demonstrations of maturity by implementing the rigorous controls that its own warnings about AI risks suggest are necessary. The alternative—a continued series of leaks and exposures—will erode the fundamental trust needed for the widespread adoption of autonomous AI systems.

For the broader ecosystem, these leaks serve as a timely reminder that the AI revolution is not yet complete, even in its early stages. The world's most advanced systems are still operated by human organizations with imperfect processes, competitive pressures, and resource constraints. AI security is not just a technical issue of model alignment; it is an organizational challenge encompassing operations, culture, and governance.

The question these incidents raise is not whether Anthropic can recover, but whether the entire industry can learn the necessary lessons before AI systems become powerful enough for their failures to be irreversible.

Sources

Fortune. Anthropic is testing Mythos, its most powerful AI model ever developed. March 26, 2026. Exclusive report on the initial leak of the Capybara/Mythos model and the European executive event.

Fortune. Anthropic leaks its own AI coding tool's source code in second major security breach. March 31, 2026. Report on the Claude Code source code leak and its implications.

Axios. Anthropic leaked its own source code. March 31, 2026. Coverage of the source code incident and future features revealed.

The Verge. Claude Code leak exposes a Tamagotchi-style pet and an always-on agent. March 31, 2026. Analysis of specific features discovered in the leaked code.

Startup Ecosystem. Claude Code leak: impact on AI and startup security. March 31, 2026. Technical analysis of the leaked code and memory architecture.

The Economic Times. Claude Mythos: Leak spills details on Anthropic's new AI model, its most powerful yet. March 27, 2026. International coverage of the Mythos model leak.

Paubox. Claude code exploited in Mexican government cyberattack. March 5, 2026. Report on the previous malicious use of Claude Code by threat actors.

Oasis Security. Claude.ai prompt injection vulnerability. March 18, 2026. Claudy Day Vulnerability Technical Disclosure.

Double security blow: Anthropic leaks Claude Code's source code days after exposing its Mythos model

The first blow: Mythos and Capybara exposed

The second blow: half a million lines of code exposed