Anthropic Code Leak Exposes Prompt Sandwich, Not Magic AI
At 4:00 a.m., version 2.1.88 of the Claude Code npm package was published with the full source map accidentally included: a 57 MB file containing more than 500,000 lines of TypeScript. The leak spread instantly, and Anthropic’s legal team issued DMCA takedowns that could not keep pace with the mirrors already proliferating across the internet. Within hours the community had forked the code into “Claw Code,” a Python rewrite, and “openclaw,” a model‑agnostic variant that quickly amassed over 50,000 GitHub stars. Observers suspect the root cause was Bun.js shipping source maps in production builds, a known issue that had previously been flagged on GitHub.
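The fix for this class of leak is procedural as much as technical: audit what a package actually contains before publishing. As a minimal sketch (not Anthropic’s tooling), a pre‑publish script could scan the build output for stray source‑map files and refuse to ship if any are found:

```python
from pathlib import Path

def find_source_maps(package_dir: str) -> list[str]:
    """Return relative paths of any source-map files that would ship with the package."""
    root = Path(package_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.suffix == ".map"  # catches cli.js.map, index.mjs.map, etc.
    )

if __name__ == "__main__":
    leaks = find_source_maps("dist")  # "dist" is a hypothetical build directory
    if leaks:
        print(f"Refusing to publish: {len(leaks)} source map(s) found")
        for path in leaks:
            print(" -", path)
```

Wiring a check like this into a `prepublishOnly` hook is one way to make the safe path the default rather than relying on a 4:00 a.m. human.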
Technical Revelations
Claude Code’s architecture is nothing more than a dynamic “prompt sandwich” that threads user input through eleven distinct processing steps before producing an answer. The system leans heavily on massive hard‑coded strings and guardrails rather than any mysterious, futuristic AI core. Among the more eyebrow‑raising tricks are the “anti‑distillation poison pills,” fake tools deliberately inserted into outputs to sabotage any competitor that tries to train on Claude’s data. When a model learns to call these non‑existent functions, its performance degrades dramatically.
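The actual eleven steps are not public, so the stage names below are purely illustrative, but the “prompt sandwich” idea reduces to composing a chain of string transformations around the user’s input:

```python
from typing import Callable

Step = Callable[[str], str]

# Illustrative stages only; the real eleven steps in the leaked code are unknown.
STEPS: list[Step] = [
    lambda p: f"[system preamble]\n{p}",         # 1. hard-coded system string
    lambda p: f"{p}\n[safety guardrails]",       # 2. guardrail instructions
    lambda p: f"{p}\n[tool definitions]",        # 3. available tools
    # ...steps 4 through 10 would inject context, history, formatting rules...
    lambda p: f"{p}\n[output format reminder]",  # 11. final wrapper
]

def build_prompt(user_input: str) -> str:
    """Thread the user's text through each processing step in order."""
    prompt = user_input
    for step in STEPS:
        prompt = step(prompt)
    return prompt
```

Because each step is just a function from string to string, hard‑coded guardrails can be added, reordered, or removed without touching the model call itself, which is exactly why the design reads as plumbing rather than magic.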
Another quirky feature, dubbed “undercover mode,” forces the AI to masquerade as a human writer, erasing any trace of the model’s identity from commit messages. A “frustration detector” watches user prompts with regular‑expression filters, logging dissatisfaction for later analysis. The codebase also contains a 1,000‑line Bash‑tool parser, a critical component of any coding assistant that executes shell commands. Comments throughout the repository read like instructions for the AI itself rather than for human developers, reinforcing the notion that the system is designed to iterate on its own code indefinitely.
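A regex‑based frustration detector is straightforward to sketch. The patterns below are invented for illustration; the actual expressions in the leaked code are not public:

```python
import re

# Hypothetical patterns; the real regexes in the leaked code are unknown.
FRUSTRATION_PATTERNS = [
    re.compile(r"\bnot working\b", re.IGNORECASE),
    re.compile(r"\b(useless|terrible|wrong again)\b", re.IGNORECASE),
    re.compile(r"!{2,}"),  # repeated exclamation marks
]

def detect_frustration(prompt: str) -> bool:
    """Return True if any frustration pattern matches the user prompt."""
    return any(p.search(prompt) for p in FRUSTRATION_PATTERNS)

def log_if_frustrated(prompt: str, log: list[str]) -> None:
    """Record matching prompts for later analysis."""
    if detect_frustration(prompt):
        log.append(prompt)
```

This is deliberately dumb technology: no sentiment model, just pattern matching, which matches the article’s broader point that much of the system is ordinary programming.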
Future Features & Roadmap
The leak also exposed a handful of internal feature flags and unreleased capabilities. “Buddy,” a Tamagotchi‑style digital pet, appears ready for user customization. “Chyris,” a background agent, maintains a daily journal and employs a “dream mode” to consolidate memories. Additional flags such as “Ultra plan,” “coordinator mode,” and “demon mode” hint at a roadmap that blends mundane productivity tools with whimsical AI personalities. References to “Opus 4.7,” “Capiara,” and other cryptic components suggest that Anthropic’s future releases may continue to blur the line between serious engineering and playful experimentation.
Industry Irony
Anthropic has long championed a “safety‑first” philosophy, keeping its models closed‑source to avoid misuse. Yet a single npm publish turned its top‑secret application into open‑source material for anyone willing to download it. The contrast is stark: “Officially making Anthropic more open than OpenAI,” the leaked comments proclaim, while the underlying technology proves to be “basic programming concepts that have been around for 50 years combined with a bunch of prompt spaghetti.” The episode underscores a broader industry paradox: highly guarded AI systems can be undone by a mundane deployment mistake, exposing not only code but also the very security assumptions that companies rely on.
Security Risks
The code’s reliance on third‑party libraries such as Axios introduces additional attack surfaces, especially after reports that North Korean hackers compromised the library. Exposed guardrails and internal feature flags could be weaponized by malicious actors to manipulate model behavior or extract proprietary functionality. The public availability of the source map also reveals internal debugging tools and monitoring mechanisms, giving adversaries a roadmap for potential exploitation.
Takeaways
- The accidental npm publish at 4:00 a.m. released over 500,000 lines of Claude Code, instantly spawning community forks like Claw Code.
- Claude Code relies on an eleven‑step prompt sandwich architecture rather than any mysterious, next‑gen AI engine.
- Anti‑distillation poison pills embed fake tools in outputs to sabotage competitors that train on Claude’s data.
- Internal features such as Buddy, Chyris, and various mode flags reveal a roadmap mixing practical tools with whimsical AI companions.
- The leak highlights the irony of Anthropic’s closed‑source stance, exposing security risks tied to dependencies like Axios and to exposed guardrails.
Frequently Asked Questions
What is the 'prompt sandwich' architecture in Claude Code?
The prompt sandwich processes inputs through eleven sequential steps, converting raw user text into a final response via layered prompt engineering. This design replaces a single monolithic model inference with a series of transformations, allowing fine‑grained control over AI behavior.
How do anti‑distillation poison pills work in the leaked code?
Poison pills insert references to non‑existent tools into Claude’s output. If a rival trains a model on this data, the model learns to call these fake functions, causing errors and performance loss. The technique deliberately degrades any downstream model that mimics Claude’s behavior.
Who is Fireship on YouTube?
Fireship is a YouTube channel known for fast‑paced developer news and programming tutorials.