Coinbase Cut AI Costs 50% Without Cutting Usage
Coinbase halved its AI bill while token usage kept climbing: Chinese models plus 60% cache efficiency. A new benchmark for enterprise AI spending optimization.
Evgenii Arsentev · PhD
Coinbase CEO Brian Armstrong quietly flipped a large portion of his company's AI workloads to cheaper Chinese models (GLM 5.2 from Zhipu AI and Kimi 2.7 from Moonshot AI) and cut the company's AI bill in half. Token usage kept climbing the whole time. That's the math getting attention across the industry right now: less spend, more done.
Two changes drove the savings. First, an automatic routing system now picks the best model for each request based on what the task needs, what it costs, and whether the answer can be reused. Second — and this is where most of the savings came from — the cache hit rate jumped from 5% to 60%. Caching means that when your application asks the AI a similar question more than once, it reuses the saved answer instead of paying to generate a new one. Going from 5% to 60% means six out of every ten AI requests now cost almost nothing.
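To make the routing idea concrete, here is a minimal sketch of a router that weighs the three factors described above: what the task needs, what each model costs, and whether a saved answer can be reused. It is not Coinbase's system, whose internals the source does not describe. The model names, prices, and capability tiers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical USD price, not a real quote
    capability: int            # hypothetical quality tier, 1 (low) to 5 (high)


# Placeholder catalogue: every name and number here is made up for illustration.
CATALOGUE = [
    Model("budget-model", 0.10, 2),
    Model("mid-model", 0.50, 3),
    Model("premium-model", 3.00, 5),
]


def route(required_capability: int, cacheable: bool, cached: bool) -> str:
    """Return "cache" for a reusable saved answer, else the cheapest capable model."""
    # Reuse a saved answer only when the request is safe to cache and a copy exists.
    if cacheable and cached:
        return "cache"
    # Keep only the models that meet the capability the task needs.
    capable = [m for m in CATALOGUE if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    # Among capable models, pick the one with the lowest price.
    return min(capable, key=lambda m: m.cost_per_1k_tokens).name
```

With this catalogue, `route(2, False, False)` selects `budget-model`, while a task that needs tier 4 falls through to `premium-model`. The point of the design is that the expensive model is chosen only when nothing cheaper qualifies.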
The competitive backdrop
GLM 5.2 is a model from Chinese AI lab Zhipu AI — the same model that made news this week for matching Anthropic's Mythos on certain cybersecurity benchmarks. Kimi 2.7 comes from Moonshot AI. Both are priced significantly below OpenAI's and Anthropic's top-tier offerings for typical enterprise workloads, which is why they attract the bulk of the routing.
Armstrong also built accountability into the system: his stated policy is that "the more you spend on AI, the more impact we expect." Ninety-one percent of Coinbase developers now stay within their previous usage limits, which suggests the optimization improved efficiency per task rather than simply unlocking more spending. The company isn't alone in this shift. Startup Lindy switched to DeepSeek V4. Snowflake is actively testing Chinese models as direct alternatives to its existing OpenAI and Anthropic contracts.
The timing matters. OpenAI launched GPT-5.6 Sol at the same price point as its predecessor but with better efficiency per token — a direct signal the company is feeling competitive pressure from cheaper alternatives. For Western AI labs, the question is whether optimizations in caching and smart routing make the price gap moot, or whether Coinbase-scale defections represent a lasting structural shift in where enterprise AI spending lands.
Cache hit rate is probably the biggest cost lever most teams haven't pulled yet. The logic is simple: if the same type of question comes up more than once in your product, save the first answer and serve it on repeats. Most API frameworks support this with minimal setup — the gap is usually just not enabling it. Start there before switching providers or renegotiating contracts. Going from 5% to 60% cache efficiency is the difference between a controlled spend and a runaway bill.
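As a rough illustration of "save the first answer and serve it on repeats", the sketch below wraps any text-generation function in an exact-match cache and tracks the hit rate. It rests on several assumptions: `CachedClient`, `relative_cost`, and the flat `cost_per_call` are hypothetical names and simplifications, cache hits are treated as free, and only prompts that are identical after lowercasing and whitespace cleanup count as repeats. Matching merely similar questions would need something more elaborate.

```python
import hashlib
from typing import Callable, Dict


class CachedClient:
    """Wraps a generate(prompt) function with an in-memory exact-match cache."""

    def __init__(self, generate: Callable[[str], str], cost_per_call: float):
        self._generate = generate            # the paid model call (assumed)
        self._cost_per_call = cost_per_call  # flat hypothetical price per miss
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts match.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one generation and remember the answer.
        self.misses += 1
        answer = self._generate(prompt)
        self._store[key] = answer
        return answer

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def spend(self) -> float:
        # Only misses cost money under the "hits are free" assumption.
        return self.misses * self._cost_per_call


def relative_cost(hit_rate: float) -> float:
    """Cost per request relative to no caching, assuming hits cost nothing."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate
```

Under these simplifying assumptions, `relative_cost(0.60) / relative_cost(0.05)` comes to about 0.42. In other words, at constant request volume, moving from a 5% to a 60% hit rate would cut per-request spend by more than half. Real bills depend on how much a cached response actually costs and on how request volume changes.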
Author
Evgenii Arsentev
PhD · Chief Product Officer at a tech company
Source: the-decoder.com