GLM-5.2: a 1M-Token Coding Model, MIT Weights Soon

Z.ai released GLM-5.2 with a usable 1M-token context and two effort levels. It drops into Claude Code via an Anthropic-compatible API; MIT weights soon.

5 min read · Evgenii Arsentev, PhD

Z.ai released GLM-5.2 on June 13. It is a coding-focused model with a one-million-token context window and a choice of two reasoning levels, "High" and "Max." It plugs into eight agentic coding tools on day one, Claude Code, Cline, OpenCode and OpenClaw among them, through an Anthropic-compatible endpoint, so an existing setup can point at it with almost no rework. The weights will ship under an MIT license, but not yet: Z.ai says they arrive next week.

The headline number is the context window. A million tokens of input is roughly five times what GLM-5.1 handled, and "usable" is the operative word: vendors have claimed huge windows before, only for quality to fall apart past a few hundred thousand tokens. For coding, a window this size means an agent can hold an entire mid-sized repository, its tests and its docs in view at once instead of guessing from a handful of files. Output caps at about 131,000 tokens per response, enough to rewrite large chunks in a single pass.
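To see whether your own repository is in that range, a rough estimate is enough. The sketch below assumes about four characters per token, which is a common rule of thumb and not a property of GLM-5.2's tokenizer; the file suffixes and the 20% headroom are arbitrary choices for illustration.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000  # input window quoted in the article
CHARS_PER_TOKEN = 4        # assumption: rough average, not GLM-5.2's real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def repo_token_estimate(root: str, suffixes=(".py", ".md", ".toml")) -> int:
    """Sum the rough token estimates of all matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits(tokens: int, limit: int = CONTEXT_LIMIT, headroom: float = 0.2) -> bool:
    """True if the estimate leaves `headroom` of the window free
    for the prompt, tool output and conversation history."""
    return tokens <= limit * (1 - headroom)
```

Calling `fits(repo_token_estimate("."))` from the repository root gives a quick yes or no. Treat the answer as a ballpark only; the real count depends on the provider's tokenizer.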

What "High" and "Max" actually change

The two effort levels are a knob for how hard the model thinks before answering. "High" is the everyday setting; "Max" spends more compute on multi-step problems — the gnarly refactor, the bug that spans five files — at the cost of speed and tokens. That mirrors a wider shift this year: Microsoft's Satya Nadella has been warning against "token-maxing," reaching for the most expensive setting on every trivial task. The skill is matching effort to the job, not always cranking it to maximum.
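One way to put "match effort to the job" into practice is to escalate only when the cheaper setting fails a check you trust, such as a test suite. The sketch below is a generic pattern, not Z.ai's API: `run` and `accept` are hypothetical callables you would supply, and the level names are simply the labels from the article.

```python
from typing import Callable, Tuple

# Labels from the article; the values a real API expects are an assumption.
EFFORT_LEVELS = ("high", "max")


def solve_with_escalation(
    run: Callable[[str], str],
    accept: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each effort level in order and stop at the first accepted result.

    run(effort) performs the task at the given effort and returns its output.
    accept(output) decides whether the output is good enough (e.g. tests pass).
    Returns (effort_used, output). If nothing is accepted, the output of the
    highest level is returned so the caller can inspect it.
    """
    result = ""
    for effort in EFFORT_LEVELS:
        result = run(effort)
        if accept(result):
            return effort, result
    return EFFORT_LEVELS[-1], result
```

With this shape, trivial tasks never touch the expensive setting, and the cost of "Max" is paid only after "High" has demonstrably fallen short.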

Under the hood it's a 744-billion-parameter Mixture-of-Experts model that activates 40 billion parameters per token — big in total, lean per query. The honest caveat: Z.ai shipped no benchmarks at launch. No SWE-bench, no Terminal-Bench, no Code Arena numbers. That's a real gap; until the weights and scores are out, performance claims rest on the spec sheet and early user reports, not independent tests.
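The "big in total, lean per query" point is easy to check with arithmetic. The sketch below uses the two parameter counts from the article; the bytes-per-parameter figures are standard sizes for 16-bit, 8-bit and 4-bit weights, and the memory numbers cover weights only, ignoring the KV cache and activations.

```python
TOTAL_PARAMS = 744e9   # total parameters, from the article
ACTIVE_PARAMS = 40e9   # parameters activated per token, from the article


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # 5.4%
    for label, size in (("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)):
        # 1488 GB, 744 GB and 372 GB respectively
        print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, size):.0f} GB")
```

So each token touches only about one parameter in nineteen, but all 744 billion still have to be stored somewhere, which is worth keeping in mind for any plan to self-host.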

Why this matters beyond developers: the Anthropic-compatible endpoint is the quiet story. More providers are copying that API shape, which means the tool you use and the model behind it are decoupling. If one model gets pulled offline — as Anthropic's Fable did last week — or a price suddenly changes, you can repoint at an alternative without relearning your workflow. Open weights under MIT take that further: you could eventually run the thing yourself.
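In practice, that decoupling can be as small as a list of interchangeable endpoints. The sketch below is illustrative only: the provider names, URLs and model ids are hypothetical placeholders, and `is_up` stands in for whatever health or price check you would actually run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str  # hypothetical placeholder, not a real endpoint
    model: str     # hypothetical model id


# Ordered by preference; every entry speaks the same API shape.
PROVIDERS = [
    Provider("primary", "https://primary.example.com/anthropic", "model-a"),
    Provider("fallback", "https://fallback.example.com/anthropic", "model-b"),
]


def first_available(
    providers: Iterable[Provider],
    is_up: Callable[[Provider], bool],
) -> Optional[Provider]:
    """Return the first provider that passes the check, or None."""
    for provider in providers:
        if is_up(provider):
            return provider
    return None
```

Because every entry exposes the same request format, swapping providers changes a URL and a model id, not the tool or the workflow built around it.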

What I'd actually do

If you already use Claude Code or Cline, GLM-5.2 is a low-risk thing to try: set the Anthropic-compatible base URL to Z.ai's endpoint and run it on a real task you've already solved, so you have something to compare against. Start on "High," switch to "Max" only when a problem genuinely needs it, and wait for the MIT weights and real benchmarks before trusting it with anything critical.
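For Claude Code, the repointing is typically done through environment variables. The sketch below uses `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_MODEL`, which Claude Code reads for custom endpoints; the base URL, the `ZAI_BASE_URL` and `ZAI_API_KEY` variable names, and the model id `glm-5.2` are assumptions, so take the real values from Z.ai's documentation.

```python
import os
import subprocess


def claude_code_env(base_url: str, token: str, model: str) -> dict:
    """Copy the current environment and point Claude Code at another endpoint."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url   # Anthropic-compatible endpoint
    env["ANTHROPIC_AUTH_TOKEN"] = token    # API key issued by the provider
    env["ANTHROPIC_MODEL"] = model         # assumed model id, check the docs
    return env


def launch(env: dict) -> int:
    """Start the `claude` CLI with the modified environment; returns its exit code."""
    return subprocess.run(["claude"], env=env).returncode


if __name__ == "__main__":
    # ZAI_BASE_URL and ZAI_API_KEY are hypothetical names for values you export yourself.
    env = claude_code_env(
        base_url=os.environ["ZAI_BASE_URL"],
        token=os.environ["ZAI_API_KEY"],
        model="glm-5.2",
    )
    raise SystemExit(launch(env))
```

Building a copy of the environment keeps the change scoped to this one launch, so your normal Claude Code setup stays untouched while you compare results.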

Author

Evgenii Arsentev

PhD · Chief Product Officer at a healthtech company

Source: marktechpost.com