Introduction to AI Inefficiency
Modern AI systems are described as "really silly" because they reconstruct answers from scratch even for simple facts. When a user asks for a peanut butter sandwich, the model "literally plants that peanut," growing every ingredient from the ground up while running through multiple reasoning layers, a process the brief calls a "massive waste of compute." Standard transformers, which power most AI assistants such as ChatGPT and Gemini, lack a simple lookup mechanism, forcing them to generate every response anew.
DeepSeek AI’s Engram Solution
DeepSeek AI introduced a technology called Engram, which the brief likens to a pantry for the AI. Instead of generating everything from the ground up, the model can “grab ingredients from the pantry,” allowing it to retrieve stored facts quickly. This pantry‑style approach is said to make the AI “way more efficient.”
Performance and Surprising Results
Replacing a portion of the model's mixture-of-experts (MoE) reasoning components with Engram produced an unexpected boost in intelligence. Loss curves showed "significant improvement," indicating fewer mistakes during training. The hybrid system achieved "a perfect balance of active cooking and just grabbing from the pantry." Engram also includes a "context-aware gating mechanism" that checks retrieved ingredients against the current task before use, preventing irrelevant data from being applied. Across all benchmarks, the brief reports that Engram "improved AI performance everywhere," calling the effect "absolute miracle work."
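The gating idea can be sketched in a few lines. This is a minimal illustration, not the paper's actual mechanism: it assumes a scalar gate computed from the current hidden state and the retrieved memory vector (the function name, shapes, and the bilinear form `W_gate` are all hypothetical), so the retrieved "ingredient" only contributes when it scores as relevant to the task at hand.

```python
import numpy as np

def gated_retrieval(hidden, retrieved, W_gate):
    """Blend a retrieved memory vector into the hidden state,
    scaled by a context-aware gate. All names/shapes are illustrative."""
    # Relevance score between current context and retrieved memory.
    score = hidden @ W_gate @ retrieved
    # Sigmoid squashes the score into a gate value in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-score))
    # Irrelevant memories (gate near 0) barely change the hidden state.
    return hidden + gate * retrieved
```

With orthogonal vectors the score is zero and the gate sits at 0.5, blending half the memory in; a strongly aligned memory would push the gate toward 1.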
Mechanism of Engram
Engram relies on "n-gram embeddings combined with multi-head hashing." The brief compares this to a chef reading a three-word phrase on an order ticket and instantly knowing which shelf holds the premade sauce. In practice, Engram functions as a lookup table, making retrieval far cheaper than recomputation. Experiments that removed 20-25% of the "smart experts" and replaced them with the Engram "spreadsheet" showed performance gains. When Engram memory was switched off, trivia answering dropped by 70% while reading comprehension held steady at 93%: locking the "pantry door" starved factual recall but left recipe understanding untouched, suggesting the model splits its "brain" and uses Engram specifically for fact storage.
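The lookup itself can be sketched as follows. This is a toy illustration of the general technique, assuming details the brief does not specify: the table size, embedding dimension, number of hash heads, and the use of SHA-256 are all arbitrary choices here, and a real system would use learned embeddings rather than random ones.

```python
import hashlib
import numpy as np

NUM_HEADS = 4        # independent hash functions ("heads"); illustrative
TABLE_SIZE = 1 << 16 # rows in the shared embedding table; illustrative
EMBED_DIM = 8        # embedding width; illustrative

# Stand-in for a learned embedding table.
rng = np.random.default_rng(0)
table = rng.standard_normal((TABLE_SIZE, EMBED_DIM))

def hash_head(ngram, head):
    """Map an n-gram to a table row, using the head index as a salt
    so each head indexes the table differently."""
    key = f"{head}|{' '.join(ngram)}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % TABLE_SIZE

def lookup(tokens, n=3):
    """Hash the trailing n-gram with every head and average the rows,
    like the chef reading the last three words on the order ticket."""
    ngram = tuple(tokens[-n:])
    rows = [table[hash_head(ngram, h)] for h in range(NUM_HEADS)]
    return np.mean(rows, axis=0)

vec = lookup(["peanut", "butter", "sandwich"])
```

Multiple heads reduce the damage from hash collisions: two n-grams that collide under one head are very unlikely to collide under all of them, so the averaged embedding stays distinctive.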
Real‑World Implications and Limitations
The pantry model implies that future AI architectures could separate fact storage from reasoning, allowing each component to specialize. Efficiency gains may reduce compute costs, while the context‑aware gating helps maintain relevance. However, the brief notes that disabling Engram harms trivia performance, indicating reliance on the pantry for factual recall. The approach may therefore require careful integration to preserve reasoning abilities while leveraging fast retrieval.
Conclusion
Engram demonstrates that a simple lookup‑style component can both cut wasteful computation and raise overall model performance. By giving the AI a “pantry,” DeepSeek AI shows a path toward more efficient and smarter systems, hinting at a split‑brain architecture where factual knowledge is stored separately from reasoning processes.
Takeaways
- Modern AI models rebuild answers from scratch, wasting compute on simple facts.
- DeepSeek AI’s Engram acts as a pantry, letting the model retrieve stored information instead of regenerating it.
- Replacing 20‑25% of mixture‑of‑experts components with Engram improves loss curves and overall benchmark scores.
- Engram’s context‑aware gating checks retrieved data against the current task, preventing irrelevant use.
- When Engram is disabled, trivia accuracy falls by 70% while reading comprehension stays at 93%, indicating a split‑brain design.
Frequently Asked Questions
How does Engram's pantry system improve AI efficiency?
Engram provides a fast lookup table that stores facts as n‑gram embeddings with multi‑head hashing, allowing the model to retrieve information instead of recomputing it. This reduces the number of reasoning layers needed for simple queries, cutting compute waste and speeding up responses.
What evidence shows that removing MoE and adding Engram makes AI smarter?
Experiments that removed 20‑25% of mixture‑of‑experts (MoE) components and replaced them with Engram showed lower loss curves and better benchmark results. The brief describes the outcome as the AI becoming “smarter” and achieving “absolute miracle work” across all tests.