A Free 1-Trillion-Param Model That Codes Like the Best

Moonshot AI's open Kimi K2.7 Code is a 1T-parameter model that runs on your own server, jumps to 62 on its coding test, and thinks ~30% less while doing it.

4 min readEAEvgenii ArsentevEvgenii Arsentev · PhD

Moonshot AI has released Kimi K2.7 Code, an open-weights model aimed squarely at writing software and driving coding agents. On paper it's enormous — a Mixture-of-Experts design with one trillion total parameters — but it activates only 32 billion of them for any given request. That 'big brain, small bill' approach is why a model this large can still run at practical speed: it only powers up the part of itself it needs each time, rather than firing all trillion parameters at once.

The gains over the previous Kimi are concrete. On Moonshot's own Kimi Code Bench v2, the new model scores 62.0, up from 50.9. Its agent scores climbed too — 76.0 on one tool-use benchmark (from 69.4) and 81.1 on another (from 72.8). It can hold roughly 256,000 tokens of context at once — enough to keep a large chunk of a codebase in view — and, notably, it cuts its 'thinking' token usage by about 30% versus the prior version, so it reaches answers faster while improving real-world task completion.

Open weights are the real story

What sets this apart from a service like ChatGPT or Claude isn't just the benchmark numbers — it's that the weights are open, under a Modified MIT License. 'Open weights' means the actual model file is downloadable: you can run it on your own server, keep your source code in-house instead of sending it to another company, and avoid paying per request. It runs on the common open inference engines (vLLM, SGLang, KTransformers), and it ships with a small vision component, so it can read images and screenshots, not just text.

For builders, the practical hook is that it plugs into the agent tooling many people already use — including MCP, the shared protocol that lets an AI reach outside tools like a browser, your files, or a database. The strong jumps on MCP-style benchmarks suggest it's genuinely better at the back-and-forth, multi-step work that agents do, not just at spitting out a single function.

Why it matters for you

Until recently, getting top-tier coding help meant renting it from a handful of closed providers and paying for every call. A free, downloadable model that scores in this range — and runs your work on hardware you control — chips away at that. If you build things by talking to a coding agent, you now have a serious alternative that doesn't lock you in or meter you by the request.

My take: the 30% drop in thinking tokens is the underrated line. Reasoning models often win benchmarks by burning huge amounts of compute to 'think,' which makes them slow and pricey in daily use. Getting better results while thinking less is exactly the trade-off that makes a model pleasant to actually live with.

What this means in practice

'Open weights' is the phrase to remember: it means you can download the model and run it on your own machine, keeping your code private and skipping per-request fees. Kimi K2.7 Code is a near-top-tier coding model you can self-host — worth a look if you're tired of metered, closed assistants.

#open weights#coding models#Moonshot AI#Kimi#AI agents

Related guides

EAEvgenii Arsentev

Author

Evgenii Arsentev

PhD · Chief Product Officer at a tech company

Want to actually build this?

Guides explain. The free course transforms — personalized, gamified, and built to get you shipping fast.

◉ Start the free course

Source: huggingface.co