The Token Tax

Every time an AI model gets called, GPU cycles get burned. GPUs cost 6-8x more per operation than traditional CPU compute. That is structural, and right now it is mostly hidden from everyday enterprise users.

OpenAI projects a cash burn of $25 billion in 2026 and $57 billion in 2027. They burn $2 for every $1 they earn on inference. Anthropic runs a similar ratio, burning roughly $3 billion a year against $5 billion in annualized revenue (though, as a private company and given the recent success of Claude Code, its actual top-line numbers may differ). Both companies are selling tokens below cost. The difference is covered by venture capital. OpenAI alone needs $665 billion in cumulative capital through 2030, with break-even not expected before then.

This is not a business model. It is a price war funded by other people’s money.

The question rarely asked: What happens when the subsidy ends?

Token prices have dropped aggressively. Anthropic cut Opus pricing by 67% in a single release. OpenAI launched budget tiers at $0.05 per million input tokens. The industry is racing to the bottom on price while racing to the top on cost. This math does not work forever.

The compounding problem is agentic workflows. A single user request that triggers an AI agent does not make one API call. It makes ten or twenty. The agent reasons, checks, iterates, calls tools, and reasons again. Each loop burns tokens. Enterprise AI budgets now spend 85% on inference alone, up from 55% two years ago. Even if the per-token price drops, the per-task cost keeps climbing because the number of tokens per task is exploding.
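The arithmetic behind that claim is easy to sketch. Here is a minimal back-of-the-envelope calculation in Python; the prices, token counts, and loop counts are illustrative assumptions, not figures from any provider's price list:

```python
# Illustrative arithmetic only: all numbers below are assumptions
# chosen to show the shape of the problem, not real pricing data.

def per_task_cost(price_per_mtok: float, tokens_per_call: int, calls_per_task: int) -> float:
    """Dollar cost of one user task: price per million tokens x total tokens consumed."""
    return price_per_mtok * tokens_per_call * calls_per_task / 1_000_000

# A simple chat request: one call, ~2k tokens, at an assumed $5 per million tokens.
chat = per_task_cost(5.00, 2_000, 1)           # $0.01 per task

# Same model after a 67% price cut, but now driving an agent
# that loops 15 times and burns ~6k tokens per loop.
agent = per_task_cost(5.00 * 0.33, 6_000, 15)  # ~$0.15 per task

# The token got ~3x cheaper; the task got ~15x more expensive.
print(f"chat: ${chat:.4f}  agent: ${agent:.4f}  ratio: {agent / chat:.0f}x")
```

The point of the sketch: a falling per-token price and a rising per-task cost are entirely compatible, because tokens per task is the term that is exploding.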

Here is where it gets uncomfortable. 84% of companies already report measurable gross margin erosion from AI infrastructure costs. 26% report erosion above 16%. And 90% of CIOs say cost management is limiting the value they can extract from AI. This is happening at subsidized prices. At real prices, the numbers get worse.

Think about which use cases survive a 3-5x price increase. Enterprise decision support where a single AI-assisted analysis saves a million-dollar mistake? That survives. A customer service chatbot handling routine queries at $0.002 per conversation? That probably survives. But the long tail of AI features bolted onto SaaS products (the auto-summarizers, the AI-generated email drafts, the routine code completion) is built on cheap tokens. When tokens stop being cheap, these features either get cut or their costs get passed to users who may not value them enough to pay.

The counterargument is efficiency. NVIDIA’s Blackwell GPUs deliver 50x better token output per watt than the previous generation. Google’s TPU v7 claims a 4x improvement with 67% better energy efficiency. Custom silicon can reduce inference costs by 40-60% compared to general-purpose GPUs. These gains are real. The question is whether they arrive fast enough to offset the demand growth from agentic workloads that multiply token consumption per task by 10-20x.

Three things to watch. First, track your cost per business outcome, not your cost per token. A cheaper token that gets used twenty times per task is not cheaper. Second, identify which of your AI use cases are viable at 3x current token prices. If the answer is “none of them,” you have a subsidy dependency, not a strategy. Third, watch the funding rounds. When the next OpenAI or Anthropic raise comes with down-round terms or profitability conditions, the price war ends and the repricing begins.

The venture capital subsidy on AI compute is the largest indirect price support in the history of enterprise software. It will not last. The businesses that planned for real costs will be fine. The ones that built on subsidized tokens will learn the same lesson every business learns when someone else stops paying part of the bill.

Sources

  • OpenAI burn rate projections: Medium, “The Burn Rate Crisis” (2026)
  • Anthropic revenue and burn rate: Finout (2026)
  • Token pricing comparison: Finout, “OpenAI vs Anthropic API Pricing” (2026)
  • Agentic inference cost growth: AnalyticsWeek, “Inference Economics” (2026)
  • Margin erosion data: CloudZero, “State of AI Costs” (2025)
  • NVIDIA Blackwell efficiency: NVIDIA blog
  • Google TPU v7: AI Ireland (2026)

About dselz

Husband, father, internet entrepreneur, founder, CEO, Squirro, Memonic, local.ch, Namics, rail aficionado, author, tbd...