Minimax M2.5: The $1‑per‑Hour LLM That Could Disrupt the AI Market

Source: YouTube video by Sam Witteveen (video ID: f1DzkFc9vxo)

Introduction

The latest release from Chinese AI leader Minimax, the M2.5 model, claims to run for just $1 per hour at roughly 100 tokens per second. If true, this price point is dramatically lower than the $15‑$20 per hour you’d pay for Claude Opus, GPT‑5, or the new Spark model from Cerebras.

Pricing Comparison

  • Claude Opus / GPT‑5: $15‑$20 per hour (throughput‑dependent).
  • Cerebras Spark: Slightly higher than Opus.
  • Minimax M2.5 (Lightning): $0.30 per million input tokens, $2.40 per million output tokens.
  • Minimax M2.5 (Standard): Same input cost, $1.20 per million output tokens (≈½ the Lightning price).
  • Cost Ratio: M2.5 is 1/10 to 1/20 the price of Opus, Gemini 3 Pro, or GPT‑5 for comparable workloads.
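The "$1 per hour" headline can be checked directly from the per‑token prices above. The sketch below computes hourly cost for continuous generation; the 50/50 input/output split in the default is an illustrative assumption, since real agent workloads vary widely.

```python
# Estimate the hourly cost of a model streaming tokens continuously,
# given per-million-token prices. The input/output split is an
# illustrative assumption, not a measured workload profile.

def hourly_cost(input_price_per_m, output_price_per_m,
                tokens_per_second, output_fraction=0.5):
    """Dollar cost of one hour of nonstop generation."""
    tokens_per_hour = tokens_per_second * 3600
    out_tokens = tokens_per_hour * output_fraction
    in_tokens = tokens_per_hour - out_tokens
    return (in_tokens / 1e6) * input_price_per_m + \
           (out_tokens / 1e6) * output_price_per_m

# M2.5 Lightning at 100 tokens/second, all-output worst case:
lightning_worst = hourly_cost(0.30, 2.40, 100, output_fraction=1.0)
print(f"${lightning_worst:.2f}/hour")  # $0.86/hour -- under the $1 claim
```

Even the worst case (every token billed at the $2.40 output rate) lands at roughly $0.86 per hour, which is consistent with the sub‑$1 figure quoted above.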

OpenHands Benchmark

OpenHands (formerly OpenDevin), a research project from Carnegie Mellon, evaluated M2.5 as the top open‑source model for coding and office‑task assistance. Their blog highlights:

  • M2.5 is still a bit behind Claude Opus and GPT‑5.2 in raw quality, but over 90% cheaper.
  • The model handles long‑running tasks (e.g., continuous integration, document generation) without breaking the bank.
  • Pricing calculations show that a nonstop 100‑token‑per‑second run stays well under $1 per hour, making "always‑on agents" financially viable.

Technical Insights: How Minimax Achieved the Low Cost

  1. Rapid Model Iteration – M2 → M2.1 → M2.5 released within 108 days, each version improving speed and cost.
  2. Reinforcement‑Learning (RL) Scaling – Hundreds of thousands of custom RL environments are used as training playgrounds for office‑type tasks (spreadsheets, docs, code). This focused RL training yields large performance gains without massive parameter counts.
  3. Agentic RL Framework – Minimax built an asynchronous scheduling system that lets many agents explore environments in parallel, then merges experiences via a tree‑structured merging strategy. This reportedly gives a 40× training speed‑up compared to naïve generate‑then‑train loops.
  4. Mixture‑of‑Experts (MoE) Architecture – The public claim is a 230 B‑parameter MoE model with only 10 B active parameters at inference time, keeping compute and memory footprints low.
  5. Alternative RL Optimizer – Instead of standard PPO/GRPO, Minimax uses its own "CISPO" algorithm to maintain training stability while scaling RL across many tasks.
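The MoE claim in point 4 can be made concrete with a toy top‑k router: a gating network scores all experts per token but only the top‑k experts actually run, so active parameters stay a small fraction of total parameters. The expert counts and routing below are purely illustrative, not Minimax's actual architecture.

```python
# Toy Mixture-of-Experts router: only top-k experts are activated per
# token. All numbers here are illustrative stand-ins, chosen so the
# active/total ratio mirrors the claimed 10B-active / 230B-total split.
import math
import random

random.seed(0)

NUM_EXPERTS = 23      # illustrative expert count
TOP_K = 1             # experts activated per token
EXPERT_PARAMS = 10    # pretend each expert holds 10 "units" of parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token_logits, k=TOP_K):
    """Pick the top-k experts for one token from the router's logits."""
    probs = softmax(token_logits)
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return ranked[:k]

logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = route(logits)
active_params = len(active) * EXPERT_PARAMS
total_params = NUM_EXPERTS * EXPERT_PARAMS
print(f"active {active_params}/{total_params} parameter units per token")
```

Because compute per token scales with active parameters rather than total parameters, this routing pattern is what lets a very large model serve tokens at small‑model cost.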

Use‑Case Opportunities

  • Always‑On Coding Assistants – Continuous code review, CI/CD pipelines, and automated refactoring.
  • Office Automation – Auto‑generation of reports, spreadsheets, email drafts, and knowledge‑base updates.
  • Deep Research Agents – Long‑running web‑scraping, literature summarisation, or data‑analysis bots that can run 24/7 at negligible cost.
  • OpenClaw Integration – The creator hints at testing M2.5 with the OpenClaw autonomous‑agent framework, which could showcase the model’s real‑world performance against proprietary alternatives.
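All of the "always‑on" use cases above reduce to a loop that repeatedly submits tasks to a hosted endpoint. Below is a minimal sketch against an OpenAI‑compatible chat API such as the one OpenRouter exposes; the endpoint URL and model slug are assumptions to verify against the provider's listing, and a production agent would add retries, rate limiting, and budget tracking.

```python
# Minimal always-on agent sketch against an OpenAI-compatible chat
# endpoint. The URL and model slug below are assumptions -- check the
# provider's model listing before use.
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # assumed endpoint
MODEL = "minimax/minimax-m2.5"  # hypothetical slug; verify on OpenRouter

def build_request(task: str, api_key: str) -> urllib.request.Request:
    """Build one chat-completion request for a recurring background task."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": task}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def agent_loop(tasks, api_key, send=urllib.request.urlopen):
    """Run each queued task once; a real agent would loop indefinitely."""
    results = []
    for task in tasks:
        req = build_request(task, api_key)
        with send(req) as resp:
            results.append(json.load(resp))
    return results
```

The `send` parameter is injected so the loop can be tested without network access; at M2.5 pricing, leaving a loop like this running continuously stays within the hourly budget discussed above.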

Availability & Access

  • The model is not open‑weights yet, but Minimax has shared the weights with several inference platforms (e.g., Ollama) for free trials.
  • Hosted endpoints are already on OpenRouter and other API marketplaces, with pricing displayed per‑token as above.
  • Minimax is headquartered in Singapore with US data centers, meaning low‑latency access outside China.

Final Thoughts

M2.5 proves that price, not just raw scale, can be a competitive advantage. By leveraging massive RL‑driven fine‑tuning and an efficient MoE design, Minimax delivers a model that is cheap enough for perpetual deployment while still offering respectable quality for coding and office tasks. Builders should start experimenting now, especially for workloads where latency is less critical than cost.

Minimax’s M2.5 shows that a well‑engineered, RL‑fine‑tuned LLM can deliver usable performance at a fraction of the cost of leading proprietary models, opening the door to always‑on AI agents for developers and enterprises.
