AI Coding Has Entered a Token Subsidy War
Why OpenAI, Anthropic, and Cursor are bundling tokens into subscriptions, leaning on API credits, and betting that efficiency gains can outrun the cost of agentic coding.
The subsidy
On March 5, 2026, Forbes reported that Cursor had internally estimated some high-end Claude Code subscriptions could consume far more in compute than the monthly fee implied.[1] The exact figures in that story rely on unnamed sources, so they should be treated as reported claims rather than audited fact. But the broader point is hard to miss: AI coding has entered a subsidy war.
That war exists because the product changed. A chatbot that answers a short prompt is one thing. An agent that reads a repository, keeps long context in memory, calls tools, runs tests, revises its own code, and stays active for hours is another. OpenAI bills output tokens separately, notes that reasoning tokens count as output, and charges separately for built-in tools. Anthropic similarly prices long-context usage and tools separately, and its Claude Code documentation says cost rises with context size, tool overhead, and the number of active subagents.[5][8][11]
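The billing mechanics above can be made concrete with a small sketch. This is not any provider's actual price schedule; the dollar rates and the per-tool fee are placeholder numbers. The one structural detail it does take from the text is that reasoning tokens bill at the output rate, which is why an hours-long agent session costs orders of magnitude more than a chat turn on identical rates.

```python
# Illustrative sketch of agentic-request billing, where reasoning tokens
# are billed at the output rate and tool calls carry their own surcharge.
# All rates below are placeholders, not any provider's actual prices.

def request_cost(input_tokens: int, output_tokens: int,
                 reasoning_tokens: int, tool_calls: int,
                 in_rate: float = 2.00,    # $ per 1M input tokens (assumed)
                 out_rate: float = 8.00,   # $ per 1M output tokens (assumed)
                 tool_fee: float = 0.01) -> float:  # $ per tool call (assumed)
    # Reasoning tokens count as output, so they share the output rate.
    billable_output = output_tokens + reasoning_tokens
    return ((input_tokens / 1e6) * in_rate
            + (billable_output / 1e6) * out_rate
            + tool_calls * tool_fee)

# A short chat turn vs. a long agent session, on the same placeholder rates:
chat = request_cost(input_tokens=2_000, output_tokens=500,
                    reasoning_tokens=0, tool_calls=0)
agent = request_cost(input_tokens=400_000, output_tokens=30_000,
                     reasoning_tokens=120_000, tool_calls=40)
print(f"chat turn: ${chat:.4f}, agent session: ${agent:.2f}")
```

Under these assumed rates the chat turn costs under a cent while the agent session costs several dollars, which is the gap the subscription has to absorb.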
The result is a market in which the headline subscription price often tells you less than the usage pattern underneath it.
The mix
The most important structural question in end-user AI is no longer what a subscription should cost. It is how much usage a company can bundle before metering has to reappear.
OpenAI includes Codex in paid ChatGPT plans, but it keeps API billing separate and lets Business customers add credits as needed.[2][3][4] Anthropic includes Claude Code in Pro and Max, while Team and Enterprise plans can switch to extra usage billed at standard API rates.[9][10] Cursor has made the underlying economics even more explicit: its individual plans include a fixed amount of API-priced usage, and its Teams plan moved from fixed per-request pricing to usage priced off the model providers' API rates plus a platform markup.[14][15][16]
This is not three companies trying unrelated experiments. It is three companies converging on the same answer from different directions. Subscriptions are the adoption layer: they reduce friction, flatten buying decisions, and help users build habits. APIs, credits, and overages are the economic truth layer: they expose the marginal cost of heavy usage. The durable pricing model is looking less like "unlimited AI" and more like software seats with embedded cloud spend.
The efficiency question
Will models get dramatically more efficient? Yes. They already have.
Stanford HAI reported that the inference cost for GPT-3.5-level performance fell more than 280-fold between November 2022 and October 2024.[17] Epoch AI found that inference prices at fixed performance levels have been falling between 9x and 900x per year depending on the task, with a median around 50x annually.[18] Alphabet said it reduced Gemini serving unit costs by 78 percent over 2025. OpenAI said GPT-4.1 was 26 percent less expensive than GPT-4o for median queries and later introduced GPT-5.1 with adaptive reasoning and lower-cost caching for simpler tasks.[19][6][7]
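The headline multiples above are easier to compare once converted to a common unit. The arithmetic below turns each reported figure into an implied constant monthly decline; the multiples come from the cited reports, while the month counts are approximations on my part.

```python
# Back-of-envelope conversion of reported efficiency curves into
# implied monthly decline rates, assuming constant compounding.
def monthly_factor(total_drop: float, months: int) -> float:
    """Constant monthly factor that compounds to the total drop."""
    return total_drop ** (1 / months)

# Stanford HAI: >280x cheaper, Nov 2022 to Oct 2024 (~23 months, approximate).
hai = monthly_factor(280, 23)
# Epoch AI median: ~50x per year at fixed performance.
epoch = monthly_factor(50, 12)

print(f"HAI curve: prices fell ~{(1 - 1 / hai) * 100:.0f}% per month")
print(f"Epoch median: prices fell ~{(1 - 1 / epoch) * 100:.0f}% per month")
```

Both curves imply prices falling by roughly a fifth to a quarter every month at fixed capability, which is the backdrop against which the subsidy bet is placed.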
That is the good news. The harder news is that efficiency gains do not automatically simplify the business model, because cheaper models invite more ambitious use. Longer context windows invite bigger prompts. Better reasoning enables longer chains of work. Better tool use encourages more automation. Anthropic's Sonnet 4.6 added a 1 million-token context window in beta and highlighted token-efficiency features such as compaction and tool filtering.[12] Those features matter precisely because usage is moving toward bigger and more autonomous workloads.
So the answer is not that performance will stop mattering and efficiency will take over. Efficiency is becoming part of performance.
Capex and tokenomics
The usual argument here splits into two camps. Either the economics are difficult because token pricing is structurally unfavorable, or they are manageable because the largest players can spend enough capital to make the problem go away.
It is both.
Massive capital spending is what gives the biggest players room to keep cutting prices, bundling usage, and absorbing periods of ugly unit economics. Alphabet said its 2026 capital expenditures are expected to be between $175 billion and $185 billion. Meta guided to $115 billion to $135 billion. Microsoft said cloud gross margin was pressured by continued investments in AI infrastructure and growing AI product usage.[19][20][21]
But the spread of credits, rate limits, premium seats, and usage-based overages is a reminder that raw tokenomics still matter at the heavy-user edge. Anthropic's pricing rises for prompts above 200,000 input tokens. Claude Code's own documentation says team usage averages roughly $100 to $200 per developer per month, with large variance. Cursor says daily agent users typically consume more than its base plan includes, while power users often exceed $200 per month.[8][11][15]
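The long-context tier is a good example of why the heavy-user edge is spiky. The sketch below assumes a scheme in which a request that crosses the threshold bills all of its input tokens at the higher tier; the 200,000-token threshold mirrors the figure in the text, but the dollar rates and the whole-request tiering rule are illustrative assumptions, not Anthropic's published schedule.

```python
# Sketch of tiered long-context input pricing: a request above the
# threshold bills all input tokens at the higher rate. Rates and the
# tiering rule are assumptions for illustration.
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens

def input_cost(tokens: int,
               base_rate: float = 3.00,          # $ per 1M tokens (assumed)
               long_rate: float = 6.00) -> float:  # $ per 1M tokens (assumed)
    rate = long_rate if tokens > LONG_CONTEXT_THRESHOLD else base_rate
    return (tokens / 1e6) * rate

# One token over the threshold roughly doubles the input bill:
at_limit = input_cost(200_000)
over_limit = input_cost(200_001)
```

That discontinuity is exactly the kind of marginal cost a flat subscription hides from the user but not from the vendor.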
If frontier-agent usage were already cleanly economical, companies would not be rebuilding their products around hybrid billing. They are doing that because the marginal economics are still uneven and spiky.
OpenAI is not the whole story
It is tempting to tell this as a story about OpenAI falling behind. The source base does not really support that.
A more defensible reading is that Anthropic enjoyed a visible stretch of coding momentum, especially after Claude 3.7 Sonnet and Claude Code in February 2025, followed by Sonnet 4.6 in February 2026.[13][12] OpenAI's response was not to abandon performance for efficiency. It was to pursue both at once: GPT-4.1 in April 2025, Codex upgrades in September 2025, GPT-5.1 for developers in November 2025, and the Codex app in February 2026.[6][4][7][3]
So if the question is whether OpenAI started gearing toward efficiency, the answer is clearly yes. The more important point is that everyone else did too. The labs are not choosing between being smart and being cheap. They are trying to improve the performance-to-cost ratio fast enough that subscriptions remain attractive while enterprise and API revenue make the math survivable.
Where pricing goes
The likely future of AI end-user pricing is not pure token billing and not truly unlimited subscriptions.
It is hybrid pricing that hides complexity for average users and restores economic discipline for heavy ones: seats plus included usage, credit top-ups, enterprise pools, premium tiers for higher autonomy, and differentiated pricing for context length, reliability, or background execution. The meter does not disappear. It just becomes less visible until the user becomes expensive enough to meter more precisely.
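The seat-plus-included-usage structure can be sketched in a few lines. Every number here is illustrative rather than any vendor's actual pricing, but the shape matches the hybrid described above: a flat fee, a bundled allowance, and overage billed at API rates plus a platform markup.

```python
# Sketch of hybrid "seat plus included usage" billing. All numbers are
# illustrative, not any vendor's actual pricing.
def monthly_bill(usage_usd: float,
                 seat_fee: float = 20.00,        # flat subscription (assumed)
                 included_usd: float = 20.00,    # usage bundled into the seat (assumed)
                 markup: float = 1.20) -> float:  # 20% platform markup (assumed)
    overage = max(0.0, usage_usd - included_usd)
    return seat_fee + overage * markup

# The light user never sees the meter; the agent-heavy user does.
light = monthly_bill(8.00)    # stays inside the allowance: pays the seat fee only
heavy = monthly_bill(250.00)  # daily agent use blows past it
```

The design choice is the point: the meter exists for everyone, but it only becomes visible once usage exceeds what the seat fee was built to absorb.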
That is why the subscription/API mix matters so much. Subscriptions buy distribution. APIs discover the real price. Enterprise plans do the political work of making that price easier to tolerate.
The companies that win this market will not just have the best model. They will have the best performance-to-cost-to-distribution loop. Right now, the subsidy war is how that loop gets built.
Sources
- [1]
- [2]ChatGPT pricing — OpenAI
- [3]Introducing the Codex app — OpenAI
- [4]Introducing upgrades to Codex — OpenAI
- [5]API Pricing — OpenAI
- [6]Introducing GPT-4.1 in the API — OpenAI
- [7]Introducing GPT-5.1 for developers — OpenAI
- [8]Pricing — Anthropic
- [9]Claude Code — Anthropic
- [10]
- [11]Manage costs effectively — Anthropic
- [12]Introducing Claude Sonnet 4.6 — Anthropic
- [13]Claude 3.7 Sonnet and Claude Code — Anthropic
- [14]Pricing — Cursor
- [15]Rate limits and included usage — Cursor
- [16]Updates to Teams pricing — Cursor
- [17]AI Index Report 2025 — Stanford HAI
- [18]
- [19]Q4 2025 Alphabet Earnings Conference Call — Alphabet
- [20]
- [21]FY26 Q2 performance — Microsoft