Why AI Token Spending Soars Despite Cheaper Prices

YouTube video ID: c-kFj1avb5Y

Source: YouTube video by Logically Answered

Headlines have recently highlighted significant overspending on AI tokens by major companies, with Uber reportedly exhausting its AI budget in four months and another company spending $500 million in a single month. Despite a 90% drop in token prices since 2023, AI expenditure continues to rise, leading to a reevaluation of AI strategies.

The Paradox of Cheaper Tokens and Soaring Costs

The phenomenon of "tokenmaxxing," where employees were encouraged to maximize token usage, contributed significantly to this overspending. Companies like Uber, Meta, Microsoft, and Amazon initially pushed for widespread AI adoption among their employees.

  • Uber: Engineers were given access to tools like Claude Code and Cursor, leading to 84% of developers becoming "agentic coding users" by March 2026. However, the company's entire AI budget was depleted in just four months, prompting a cap on employee AI spend. Uber's COO and President, Andrew Macdonald, questioned the justification of AI spend without a direct link to useful features and functionality.
  • Meta: Workers reportedly consumed 60 trillion Claude tokens in 30 days.
  • Microsoft: After encouraging diverse AI tool usage, Microsoft is now consolidating efforts back to GitHub Copilot, a move widely seen as cost-cutting.
  • Amazon: The company set targets for over 80% of developers to use AI weekly and even implemented an AI leaderboard, KiroRank, to track token usage. This led to employees artificially inflating their scores through "tokenmaxxing," prompting Amazon's Senior Vice President to advise against using AI "just for the sake of using AI." KiroRank was subsequently taken offline.
  • NVIDIA: An executive noted that the cost of compute for his team far exceeded employee costs.
  • Anonymous Client: One client reportedly spent $500 million on AI in a month due to unlimited employee licenses.
  • Other Companies: Shopify, Spotify, ServiceNow, and Roku have all cited AI as a major pressure point on operating expenses in their earnings calls.

The Hidden Costs and "Tokenpocalypse"

Beyond direct token costs, companies are facing significant problems and hidden losses:

  • Code Churn: This refers to the ratio of deleted to added lines of code. Alex Circei, CEO of Waydev, observed that while AI-generated code acceptance rates appear high (80-90%), the extensive revisions required by engineers reduce the real-world acceptance rate to 10-30%.
  • Productivity vs. Churn: GitClear found that while AI tools increased developer productivity, the associated code churn was 2.2 times greater than the productivity gain. Faros AI reported an 861% increase in code churn based on two years of customer data.
  • Hidden Losses: A study of 2,444 companies found that for every dollar spent on AI tokens, $0.44 goes to fixing AI-generated bugs, $0.27 to rewriting AI-produced code, and $0.11 to review and merge delays. Together that is $0.82, so about 82% of AI procurement spend is lost.
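
The per-dollar breakdown above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three loss figures quoted in the summary; the names `LOSS_CENTS_PER_DOLLAR` and `productive_cents` are made up for illustration, and amounts are kept in whole cents to avoid floating-point rounding.

```python
# Loss figures quoted in the summary, in cents per dollar of token spend.
LOSS_CENTS_PER_DOLLAR = {
    "fixing AI-generated bugs": 44,
    "rewriting AI-produced code": 27,
    "review and merge delays": 11,
}


def productive_cents(losses: dict[str, int]) -> int:
    """Return the cents of each dollar left after all listed losses."""
    return 100 - sum(losses.values())


total_lost = sum(LOSS_CENTS_PER_DOLLAR.values())
remaining = productive_cents(LOSS_CENTS_PER_DOLLAR)

print(f"lost per dollar: {total_lost} cents")       # 82 cents
print(f"productive per dollar: {remaining} cents")  # 18 cents
```

On these figures, roughly 18 cents of every dollar spent on tokens ends up as productive output.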

This situation has been dubbed the "Tokenpocalypse," where companies, by tying token consumption to productivity, inadvertently drove up costs without proportional benefits. This exemplifies Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

Goldman Sachs reported that companies are "overrunning their initial budgets for inference by orders of magnitude." Despite token prices falling by 90% since 2023 (or 280x by other metrics), AI spending is projected to increase by 320%, with worldwide IT spending expected to reach $6.31 trillion in 2026.

Understanding Tokens and Inefficiency

To understand why AI is becoming more expensive despite cheaper tokens, it's crucial to define what a "token" is. Large Language Models (LLMs) process language by breaking it down into numerical tokens (e.g., "darkness" into "dark" and "ness," represented numerically). The goal, according to NVIDIA, is to achieve the "fastest processing time and lowest cost per token."
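
As a rough illustration of the "darkness" example, the sketch below splits a word into known pieces and maps each piece to a number. It is a toy greedy longest-match splitter over a hypothetical four-entry vocabulary with invented IDs, not the tokenizer of any real model; production tokenizers learn much larger vocabularies and use different algorithms.

```python
# Hypothetical toy vocabulary: piece -> numeric token ID (IDs are invented).
VOCAB = {"dark": 101, "ness": 102, "light": 103, "er": 104}


def tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Greedy longest-match split of `word` into pieces found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
    return pieces


pieces = tokenize("darkness", VOCAB)
ids = [VOCAB[p] for p in pieces]
print(pieces)  # ['dark', 'ness']
print(ids)     # [101, 102]
```

Billing is based on how many of these pieces a request contains, so longer prompts and longer answers both cost more.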

Several factors contribute to the inefficiency:

  • Non-Linear Scaling of Productive Output: A study by Jellyfish found that a tenfold increase in token budget resulted in only about a twofold increase in pull requests (proposed code changes). Tokens behave more like "rocket fuel," requiring exponentially more resources for incremental speed gains.
  • Advanced Reasoning Models and Latency: More advanced models like Claude Opus and GPT-4o, designed for complex problems, spend minutes or even hours "thinking" to solve a problem. These "reasoning tokens" are generated in addition to input and output tokens. While powerful, this leads to significant latency and increased infrastructure costs. Firat Elbey, Principal Product Manager, notes that "every unnecessary reasoning cycle increases latency, compounds infrastructure costs, and consumes energy."
  • Inefficient Model Usage: Using high-level models for simple tasks leads to longer durations and overconsumption of tokens, while complex tasks given to low-level models result in inadequate outcomes. The indiscriminate deployment of powerful tools for tasks that require no reasoning has real consequences.
  • Agentic AI and the Agentic Loop Multiplier: Agentic AI models, which autonomously pursue goals using various tools, operate in a loop. Each iteration involves re-reading all previous context, leading to a massive increase in token consumption. This "Agentic Loop Multiplier" is a significant cost driver. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030 due to agentic AI, reaching 120 quadrillion tokens per month.
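
The loop effect described in the last bullet can be sketched with a simple counter. The model below is an assumption for illustration only: every step re-reads the whole context as input, each step appends a fixed number of tokens, and nothing is cached or truncated. The function name and the token counts are hypothetical.

```python
def agent_loop_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Total input tokens when every step re-reads the whole context so far."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # the model re-reads everything accumulated so far
        context += added_per_step  # tool output and the model's reply join the context
    return total


# Hypothetical numbers: a 2,000-token starting prompt, 1,000 tokens added per step.
one_step = agent_loop_input_tokens(2_000, 1_000, 1)
ten_steps = agent_loop_input_tokens(2_000, 1_000, 10)
twenty_steps = agent_loop_input_tokens(2_000, 1_000, 20)

print(one_step)      # 2000
print(ten_steps)     # 65000
print(twenty_steps)  # 230000
```

Under these assumptions, doubling the number of steps from 10 to 20 raises input tokens about 3.5 times, because the total grows roughly with the square of the step count. Real agents that cache or trim their context would behave differently.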

The Jevons Paradox and Business Model Shifts

The core reason cheaper tokens led to higher costs is the Jevons paradox: increased efficiency in resource use leads to greater demand for that resource. As token costs decreased, demand and consumption surged.
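
A short calculation shows how much consumption has to grow for this to happen. The sketch assumes that spend equals price per token times tokens consumed, and that the 90% price drop and the projected 320% spending increase cited earlier cover the same period, which the summary does not state. The function name is hypothetical.

```python
from fractions import Fraction


def implied_volume_multiplier(price_change_pct: int, spend_change_pct: int) -> Fraction:
    """Volume multiplier implied by spend = price * volume."""
    price_mult = Fraction(100 + price_change_pct, 100)  # -90% -> 1/10 of the old price
    spend_mult = Fraction(100 + spend_change_pct, 100)  # +320% -> 4.2x the old spend
    return spend_mult / price_mult


volume_mult = implied_volume_multiplier(-90, 320)
print(volume_mult)  # 42
```

Read this way, token consumption would need to grow about 42-fold for spending to rise 320% while prices fall 90%.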

This paradox is driving a major market transition in the AI industry, mirroring shifts seen in streaming services and delivery apps. Many AI platforms are moving from subscription-based models to token-use pricing:

  • Shift to Token-Based Billing: Since spring 2025, major AI agent companies like Cursor, Vercel's V0, Replit, and Lovable have simultaneously transitioned to token-based billing, reduced or eliminated free tiers, and introduced overage charges.
  • Subscription Allocation: While subscriptions still exist, they now often provide an allocation of tokens, with high-spending users facing additional charges.
  • Beyond Seat-Based Pricing: The traditional software-as-a-service (SaaS) model of seat-based pricing is disappearing, as AI agents automate many tasks, making subscription-based charging "foolish," according to SAP CEO Christian Klein.
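
The allocation-plus-overage model in the list above can be sketched as a small billing function. Every number here (a $20 base fee, 10 million included tokens, $5 per million tokens of overage) is a made-up placeholder, not the pricing of Cursor, V0, Replit, Lovable, or any other vendor.

```python
def monthly_bill(tokens_used: int, base_fee: float,
                 included_tokens: int, overage_per_million: float) -> float:
    """Subscription fee plus a charge for tokens beyond the included allocation."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return base_fee + overage_tokens / 1_000_000 * overage_per_million


# Hypothetical plan: $20 per month, 10M tokens included, $5 per extra million.
light_user = monthly_bill(8_000_000, 20.0, 10_000_000, 5.0)
heavy_user = monthly_bill(50_000_000, 20.0, 10_000_000, 5.0)

print(light_user)  # 20.0
print(heavy_user)  # 220.0
```

A user who stays inside the allocation pays only the flat fee, while a heavy user's bill scales with consumption, which is the behavior seat-based pricing could not capture.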

The Rise of Cost-Effective Alternatives

Companies are becoming more critical of their AI spending and its return on investment. This scrutiny has led to the emergence of more cost-effective alternatives, particularly from Chinese AI companies.

  • DeepSeek and Kimi: DeepSeek's V4-Pro model costs $3.48 for 1 million tokens of output, and Kimi costs $4 per million tokens. This is significantly cheaper than OpenAI and Anthropic, which charge $30 and $25 respectively for the same amount.
  • Market Impact: While Anthropic and OpenAI still lead in corporate adoption (34.4% and 32.3%), DeepSeek is rapidly gaining ground as a cost-effective alternative, even surpassing US models in downloads.
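
To make the price gap concrete, the sketch below applies the per-million output prices quoted above to a hypothetical workload of one billion output tokens. The summary does not say which OpenAI, Anthropic, or Kimi models the prices refer to, and input-token charges are ignored.

```python
# Output prices in USD per million tokens, as quoted in the summary.
OUTPUT_PRICE_PER_MILLION_USD = {
    "DeepSeek V4-Pro": 3.48,
    "Kimi": 4.00,
    "Anthropic": 25.00,
    "OpenAI": 30.00,
}


def output_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of generating `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


WORKLOAD_TOKENS = 1_000_000_000  # hypothetical: one billion output tokens

costs = {
    name: output_cost_usd(WORKLOAD_TOKENS, price)
    for name, price in OUTPUT_PRICE_PER_MILLION_USD.items()
}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
```

On these prices the same workload costs about $3,480 with DeepSeek and $30,000 with OpenAI, a gap of more than eight times.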

This shift indicates a move towards a new stage in the AI lifecycle, where efficiency and cost-effectiveness are paramount. The current situation highlights that while AI offers powerful tools, human cognition's adaptive resource allocation—knowing when and how deeply to process information—remains crucial. As Whizy Kim of Techbrew notes, "unlike AI, human workers can actually be held accountable for their mistakes." Even Microsoft is reportedly pivoting back to Copilot, despite its past issues, underscoring the ongoing challenges and evolving strategies in AI adoption.

  Takeaways

  • Companies such as Uber, Meta, Microsoft, and Amazon ran up heavy AI spending (Uber exhausted its entire AI budget in four months) despite a 90% drop in token prices, highlighting a disconnect between cheaper tokens and uncontrolled spending.
  • The internal practice called “tokenmaxxing,” which turned token usage into a performance metric, led employees to inflate consumption, creating hidden costs like massive code churn and extensive bug‑fixing.
  • Research shows that for every dollar spent on AI tokens, about $0.82 is lost to fixing bugs, rewriting code, and review delays, meaning only roughly 18% of the spend delivers productive value.
  • Cheaper tokens triggered the Jevons paradox, and the “Agentic Loop Multiplier,” where autonomous AI agents re-read their full context on every iteration, compounds token use further; Goldman Sachs projects a 24‑fold increase in consumption by 2030.
  • In response, firms are moving to token‑based billing, cutting free tiers, and adopting cheaper models like DeepSeek and Kimi while reassessing AI strategies to curb wasteful expenditures.

Frequently Asked Questions

What is "tokenmaxxing" and how did it drive AI overspending?

Tokenmaxxing is the practice of encouraging employees to maximize the number of AI tokens they consume, often by turning token usage into a performance metric or leaderboard. Rewarding high token counts led Amazon staff to inflate their usage scores, and the broader push for adoption drained budgets at firms like Uber despite lower token prices.

How does the "Agentic Loop Multiplier" affect token consumption?

The Agentic Loop Multiplier describes how autonomous AI agents re-read all prior context on each iteration, so token usage climbs with every loop. Because the context itself keeps growing, a single task can consume many times more tokens than a single prompt, driving up costs as agentic AI scales toward a projected 120 quadrillion tokens per month.
