Last week, I watched a friend refresh the usage counter in Claude Code three times in a single hour. He was on the Max 20x plan, paying $200 a month, and his five-hour session quota had hit one hundred percent after about seventy minutes of normal work. His weekly bucket showed seventeen percent used. The math did not make sense, and that was the point.
The cheap coding plan, as we have known it for the last eighteen months, is ending. Not in one announcement. In changelogs. In silently lowered limits. In pricing pages updated overnight. In support tickets that go unanswered.
This post is about what is actually happening, why it was always going to happen, and what to do about it before the next round of changes lands in your inbox.
The Signals Are Everywhere
You do not have to read between the lines anymore. The evidence is dated, public, and stacking up week by week.
On Anthropic’s side, the GitHub issue tracker for Claude Code has become a slow-motion rebellion. Issue #28848, opened in late February, documents Max plan users burning through their weekly cap roughly twice as fast since the Claude 4.6 release, with no announcement and no published numerical limits. Issue #31423 from early March describes a Max 20x subscriber hitting one hundred percent of his five-hour session limit in about an hour while his weekly meter sat at seventeen percent. Issues #40698 and #42939 from late March and early April say the same thing in different words. Anthropic has acknowledged in passing that “people are hitting limits faster than expected.” It has not published the limits.
On the Chinese side, Z.ai’s GLM Coding Plan tiers reportedly jumped overnight in early April from $10 / $30 / $80 to $18 / $72 / $160. Around the same time, Z.ai launched GLM-5.1 with API pricing at least eight percent higher than GLM-5 Turbo, at $1.26 input and $3.96 output per million tokens. The model itself is MIT-licensed, which is generous, but the hosted price is the direction of travel.
On the IDE side, Cursor and Windsurf have been telling on themselves for almost a year. Cursor’s June 2025 pricing change moved heavy users onto credit pools and metered “Auto” routing. Windsurf shifted to daily and weekly quotas. Cursor launched Glass in early April at a reported fifty-billion-dollar valuation, and the WIRED interviews around the launch were brutal: developers told the reporter they had moved to Claude Code and Codex, not because the product was worse, but because, as one founder put it, “whichever tool has the most generous rate limit” wins. Chamath Palihapitiya’s 8090 estimated in March that AI coding bills at startups have tripled since November 2025.
Every one of these stories points the same direction.
Why This Was Always Going to Happen
The cheap coding plan was never a product. It was a customer-acquisition coupon, paid for out of someone else’s pocket.
Look at the unit economics honestly. A serious developer using Cursor Pro at $20 a month was, on the underlying provider, consuming somewhere between $80 and $150 of inference. A Claude Code Max subscriber at $200 a month, working a normal day, regularly burns through what would be more than $1,000 of equivalent API spend. This is not a secret. It is the explicit strategy.
Anthropic and OpenAI can afford this because they raised more than $100 billion combined and treat developer subscriptions as a strategic loss leader, the cheapest form of enterprise sales they will ever run. Every developer hooked on Claude Code is a future five-figure team contract. The subscription is a coupon paid by the venture round to acquire the buyer.
Cursor and Windsurf cannot afford it. They raised an order of magnitude less, and they sell the subscription as the actual product, not as a funnel into something else. So they were the first to move: credit pools, daily quotas, “Auto” routing that quietly downgrades the model when you are not looking.
Anthropic is now running the quieter version of the same play. Same price, smaller bucket, no announcement. It is not a bug. It is the unit economics catching up with the marketing. The only question was when.
The path to sustainability for hosted AI coding is the API. Token-priced, pay-per-use, with a margin you can see on the receipt. Z.ai pricing GLM-5.1 at $1.26 / $3.96 is the shape of things to come. Flat plans will keep narrowing until they price like the API plus a margin, or they will be quietly phased out for “professional” tiers that look suspiciously like prepaid credit.
There are still genuinely generous flat plans out there. MiniMax’s Coding Plan starts at $10 a month for the Starter tier and tops out at $50 for Max, with the Pro and Max tiers claiming roughly twenty times the capacity of Claude Code Max. The MiniMax M2.7 token plan plugs straight into OpenCode and most BYOK tools. OpenCode’s own Go tier is still positioned aggressively. These exist because the providers behind them are buying mindshare for their open-weight models. They are real, and you should use them. You should also assume they will not stay this cheap forever, for exactly the same reasons.
Do Not Get Locked In
If the lesson of the last six months is one thing, it is this: the only durable pricing strategy is the one you can change on a weekend.
Here is the trade-off space, simplified.
| Strategy | Lock-in risk | Effort | Best for |
|---|---|---|---|
| BYOK in a provider-agnostic client | Low | Low | Most developers |
| Multi-provider routing (OpenRouter, LiteLLM) | Low | Medium | Cost-sensitive teams |
| Self-hosted small or medium open-weight models | None | High | Privacy and offload |
| Long annual contract with a single vendor | High | Low | Avoid in 2026 |
The principles below follow from that table.
Use provider-agnostic tools. OpenCode, Cline, Aider, Continue, anything that lets you swap the model without rewriting your workflow. Claude Code is excellent until the day it is not yours to configure. The day that day arrives, you do not want to be re-learning your whole loop.
Test the open-weight frontier. Qwen 3.6 Max, Kimi K2.6, MiniMax M2.7, GLM-5.1, DeepSeek R2. Many of them are MIT- or Apache-licensed. Most are reachable through OpenRouter, Novita, Fireworks, Together, or self-hosted. The performance gap to the closed frontier on real coding tasks is much smaller than the marketing suggests, and for the eighty percent of work that is editing, refactoring, writing tests, and reading docs, it is essentially zero.
Match the model weight to the task. This is the single highest-leverage habit you can build. Heavy reasoning models for planning and architecture. Cheap, fast models for execution: edits, tests, refactors, structured output. A planning step in Opus or GPT-5.4 followed by execution in MiniMax M2.7 or GLM-5.1 will cut your bill by five to ten times with no measurable quality loss on a typical ticket. Configure your tool to route deliberately, not by default.
Self-host the offload tier. A quantized Qwen 3.5 14B at Q8, or a Gemma 3, or an NVIDIA Nemotron at 7B–12B, is enough for commit messages, docstring generation, structured extraction, log triage, lint fixes, and most autocomplete. A single RTX 4090, or a Mac with 64 GB of unified memory, is enough hardware. Once you offload that traffic, your hosted bill drops by a surprising amount, and you have a working setup the day your provider has an outage.
Look beyond US labs. Chinese (Z.ai, MiniMax, Moonshot, Alibaba) and European (Mistral, Aleph Alpha) providers are competing on price and openness in ways the major US labs are no longer obliged to. Pricing pressure travels in one direction: from the most competitive market to the others. Knowing how to call those models is a hedge that costs you nothing.
Use free tiers only for non-sensitive work. Free models are paid for in your data. Most free endpoints log requests, many use them for training, and some have terms of service that are, charitably, ambiguous. Fine for a one-off regex. Not fine for client code or anything under NDA.
Favor BYOK tools. If a tool will not let you bring your own key, it owns the price knob, and at some point it will turn it. Treat BYOK support as a hard requirement, not a nice-to-have.
Write your skills and agents to be provider-agnostic. No hardcoded model IDs in prompts. Externalize them in config. Keep your prompt library, your custom commands, your MCP server choices, and your agent definitions portable. If switching providers requires rewriting your skills, you have built a moat around yourself, not the vendor.
A Pragmatic Reference Setup
Here is one honest stack a single developer can run today, end to end, without betting everything on one provider.
The driver is OpenCode, configured with three providers and a routing rule.
The first provider is Anthropic, BYOK, on the API. Used sparingly and deliberately for the hardest planning sessions, architecture discussions, and the occasional gnarly debug. You pay only for what you use, the per-token price is high, and the budget is yours to set.
The second is MiniMax M2.7, either through the Coding Plan at $10 a month for light use or through the M2.7 token plan via the Anthropic-compatible endpoint, swapped in as the default execution model. This is where the bulk of edits, tests, and refactors land. GLM-5.1 through Z.ai or OpenRouter is an interchangeable second choice; pick whichever has the better day.
The third is a locally hosted Qwen 3.5 14B at Q8 through Ollama, wired into your editor for autocomplete, commit messages, doc summaries, and any work that should not leave the machine. Free at the margin. Always available, even on a plane.
OpenRouter sits behind all of this as a fallback route. If any one provider goes down or repeats the silent-nerf trick, switching to a different model is one config line, not a migration.
The total fixed cost of that setup, on a normal month, lands somewhere between thirty and sixty dollars, and it removes the single biggest operational risk in your daily workflow: the assumption that the plan you signed up for is the plan you will get next month.
The Quiet Part, Out Loud
The era of paying twenty dollars to consume two hundred dollars of inference is ending the way every venture-subsidized era ends. Quietly. In changelogs. In stealth-nerfed dashboards. In pricing pages that get updated overnight. In support tickets that get auto-closed as duplicates.
The developers who will not feel it are the ones whose workflow already does not care which provider is on the other end of the API call. Their skills are portable. Their tools are agnostic. Their offload tier runs on their own hardware. Their next provider is one config line away.
That setup takes a weekend to build. It will pay for itself the first time your favorite plan changes, and it will keep paying every time after that.
Build it now, while you still have the choice.