Minimax M2.5: The $1‑per‑Hour LLM That Could Disrupt the AI Market

Source: YouTube video by Sam Witteveen (video ID: f1DzkFc9vxo)

Introduction

The latest release from Chinese AI leader Minimax, the M2.5 model, claims to run for just $1 per hour at roughly 100 tokens per second. If true, this price point is dramatically lower than the $15‑$20 per hour you’d pay for Claude Opus, GPT‑5, or the new Spark model from Cerebras.

Pricing Comparison

  • Claude Opus / GPT‑5: $15‑$20 per hour (throughput‑dependent).
  • Cerebras Spark: Slightly higher than Opus.
  • Minimax M2.5 (Lightning): $0.30 per million input tokens, $2.40 per million output tokens.
  • Minimax M2.5 (Standard): Same input cost, $1.20 per million output tokens (≈½ the Lightning price).
  • Cost Ratio: M2.5 is 1/10 to 1/20 the price of Opus, Gemini 3 Pro, or GPT‑5 for comparable workloads.
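The "$1 per hour" headline can be checked directly from the per‑token prices above. The sketch below computes hourly cost for continuous generation; the 50/50 input/output split in the default is an illustrative assumption, since real agent workloads vary widely.

```python
# Estimate the hourly cost of a model streaming tokens continuously,
# given per-million-token prices. The input/output split is an
# illustrative assumption, not a measured workload profile.

def hourly_cost(input_price_per_m, output_price_per_m,
                tokens_per_second, output_fraction=0.5):
    """Dollar cost of one hour of nonstop generation."""
    tokens_per_hour = tokens_per_second * 3600
    out_tokens = tokens_per_hour * output_fraction
    in_tokens = tokens_per_hour - out_tokens
    return (in_tokens / 1e6) * input_price_per_m + \
           (out_tokens / 1e6) * output_price_per_m

# M2.5 Lightning at 100 tokens/second, all-output worst case:
lightning_worst = hourly_cost(0.30, 2.40, 100, output_fraction=1.0)
print(f"${lightning_worst:.2f}/hour")  # $0.86/hour -- under the $1 claim
```

Even the worst case (every token billed at the $2.40 output rate) lands at roughly $0.86 per hour, which is consistent with the sub‑$1 figure quoted above.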

OpenHands Benchmark

OpenHands (formerly OpenDevin), a research project from Carnegie Mellon, evaluated M2.5 as the top open‑source model for coding and office‑task assistance. Their blog highlights:

  • M2.5 is still a bit behind Claude Opus and GPT‑5.2 in raw quality, but over 90% cheaper.
  • The model handles long‑running tasks (e.g., continuous integration, document generation) without breaking the bank.
  • Pricing calculations show that a nonstop 100‑token‑per‑second run stays well under $1 per hour, making "always‑on agents" financially viable.

Technical Insights: How Minimax Achieved the Low Cost

  1. Rapid Model Iteration – M2 → M2.1 → M2.5 released within 108 days, each version improving speed and cost.
  2. Reinforcement‑Learning (RL) Scaling – Hundreds of thousands of custom RL environments are used as training playgrounds for office‑type tasks (spreadsheets, docs, code). This focused RL training yields large performance gains without massive parameter counts.
  3. Agentic RL Framework – Minimax built an asynchronous scheduling system that lets many agents explore environments in parallel, then merges experiences via a tree‑structured merging strategy. This reportedly gives a 40× training speed‑up compared to naïve generate‑then‑train loops.
  4. Mixture‑of‑Experts (MoE) Architecture – The public claim is a 230 B‑parameter MoE model with only 10 B active parameters at inference time, keeping compute and memory footprints low.
  5. Alternative RL Optimizer – Instead of standard PPO/GRPO, Minimax uses its own "CISPO" algorithm to maintain training stability while scaling RL across many tasks.
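The MoE claim in point 4 can be made concrete with a toy top‑k router: a gating network scores all experts per token but only the top‑k experts actually run, so active parameters stay a small fraction of total parameters. The expert counts and routing below are purely illustrative, not Minimax's actual architecture.

```python
# Toy Mixture-of-Experts router: only top-k experts are activated per
# token. All numbers here are illustrative stand-ins, chosen so the
# active/total ratio mirrors the claimed 10B-active / 230B-total split.
import math
import random

random.seed(0)

NUM_EXPERTS = 23      # illustrative expert count
TOP_K = 1             # experts activated per token
EXPERT_PARAMS = 10    # pretend each expert holds 10 "units" of parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token_logits, k=TOP_K):
    """Pick the top-k experts for one token from the router's logits."""
    probs = softmax(token_logits)
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return ranked[:k]

logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = route(logits)
active_params = len(active) * EXPERT_PARAMS
total_params = NUM_EXPERTS * EXPERT_PARAMS
print(f"active {active_params}/{total_params} parameter units per token")
```

Because compute per token scales with active parameters rather than total parameters, this routing pattern is what lets a very large model serve tokens at small‑model cost.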

Use‑Case Opportunities

  • Always‑On Coding Assistants – Continuous code review, CI/CD pipelines, and automated refactoring.
  • Office Automation – Auto‑generation of reports, spreadsheets, email drafts, and knowledge‑base updates.
  • Deep Research Agents – Long‑running web‑scraping, literature summarisation, or data‑analysis bots that can run 24/7 at negligible cost.
  • OpenClaw Integration – The creator hints at testing M2.5 with the OpenClaw autonomous‑agent framework, which could showcase the model’s real‑world performance against proprietary alternatives.
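All of the "always‑on" use cases above reduce to a loop that repeatedly submits tasks to a hosted endpoint. Below is a minimal sketch against an OpenAI‑compatible chat API such as the one OpenRouter exposes; the endpoint URL and model slug are assumptions to verify against the provider's listing, and a production agent would add retries, rate limiting, and budget tracking.

```python
# Minimal always-on agent sketch against an OpenAI-compatible chat
# endpoint. The URL and model slug below are assumptions -- check the
# provider's model listing before use.
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # assumed endpoint
MODEL = "minimax/minimax-m2.5"  # hypothetical slug; verify on OpenRouter

def build_request(task: str, api_key: str) -> urllib.request.Request:
    """Build one chat-completion request for a recurring background task."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": task}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def agent_loop(tasks, api_key, send=urllib.request.urlopen):
    """Run each queued task once; a real agent would loop indefinitely."""
    results = []
    for task in tasks:
        req = build_request(task, api_key)
        with send(req) as resp:
            results.append(json.load(resp))
    return results
```

The `send` parameter is injected so the loop can be tested without network access; at M2.5 pricing, leaving a loop like this running continuously stays within the hourly budget discussed above.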

Availability & Access

  • The model is not open‑weights yet, but Minimax has shared the weights with several inference platforms (e.g., Ollama) for free trials.
  • Hosted endpoints are already on OpenRouter and other API marketplaces, with pricing displayed per‑token as above.
  • Minimax is headquartered in Singapore with US data centers, meaning low‑latency access outside China.

Final Thoughts

M2.5 proves that price, not just raw scale, can be a competitive advantage. By leveraging massive RL‑driven fine‑tuning and an efficient MoE design, Minimax delivers a model that is cheap enough for perpetual deployment while still offering respectable quality for coding and office tasks. Builders should start experimenting now, especially for workloads where latency is less critical than cost.

Minimax’s M2.5 shows that a well‑engineered, RL‑fine‑tuned LLM can deliver usable performance at a fraction of the cost of leading proprietary models, opening the door to always‑on AI agents for developers and enterprises.
