Kimi K2.7 vs GLM 5.2 for Coding

June 22, 2026 · 9 min read

Practical guides for developers

Kimi K2.7 vs GLM 5.2 is a comparison worth making carefully. Both are open-weight MoE models released one day apart in June 2026, both target developers and agentic coding workflows, but their design choices diverge significantly. GLM-5.2 (from Z.ai / Zhipu AI) leads with a 1M-token context window and text-only breadth. Kimi K2.7 Code (from Moonshot AI) brings multimodality via MoonViT, a HighSpeed variant for low-latency sessions, and a mandatory reasoning mode with no opt-out. This article covers benchmark scores from independent and vendor sources, API behaviour including Kimi's hard parameter constraints, pricing with a worked cost example, self-hosting feasibility, and a decision framework for choosing between the two.

Quick comparison

Benchmarks

Two benchmark pools exist for these models: independent third-party measurements, and vendor-reported results published at launch. They tell different parts of the story and should be read separately.

Independent scores

Artificial Analysis is the main independent source for both models. GLM-5.2 scores 51 on the AA Intelligence Index, and Kimi K2.7 Code is estimated at 42. Note that the AA comparison page measures Kimi K2.7 against GLM-5, not GLM-5.2, so some rows in the table below reflect the older GLM model.

Two findings here are underreported in most roundups. First, GLM-5 scored a 66% non-hallucination rate vs Kimi's 20% on the AA-Omniscience benchmark, a substantial gap. Second, GLM-5 leads on instruction following (72% vs 63%) despite Kimi winning on scientific reasoning and long-context tasks. Both comparisons are against the older GLM-5, not GLM-5.2.

Vendor-reported results

All Kimi K2.7 numbers at launch were first-party. GLM-5.2 has broader coverage from named benchmarks including SWE-bench Pro, Terminal-Bench, and FrontierSWE.

GLM-5.2 (company-reported):

Kimi K2.7 Code (company-reported):

No independent SWE-bench Pro result for Kimi K2.7 was available at time of writing.

Real-world coding tests

Three independent hands-on tests produced results that narrow the gap considerably.

Fabio Akita (Rails 8 coding benchmark): GLM-5.2 scored 87/100 (Tier A, rank #6) and Kimi K2.7 scored 86/100 (Tier A, rank #8). GLM-5.1 had scored just 46/100 (Tier C), making the 5.1-to-5.2 jump the largest single-version gain in the benchmark's history. Kimi finished faster (22 minutes vs 43 minutes), but GLM produced cleaner dependency injection and zero hallucinations.

Kilo Code (feature flag backend service): GLM-5.2 earned a planning score of 9.0 vs Kimi's 8.1. Both models built functionally identical services, and GLM passed 15/15 automated checks compared to Kimi's 14/15. The difference came in architectural judgment during planning. GLM made explicit decisions on edge cases that Kimi deferred.

Mehul Gupta (World Cup app debugging): Both models tied on bug fixing. Kimi finished faster and added unsolicited features, showing more agentic initiative. GLM produced a richer one-shot animated HTML simulation.

Taken together, the three tests favour GLM-5.2 on planning quality and hallucination avoidance, and Kimi K2.7 on speed and autonomous scope extension.

Context window

GLM-5.2 supports a 1M-token context window, described by Z.ai as lossless at full context. This is large enough to ingest entire repositories, long production logs, or multi-document sets in a single call. Maximum output is 128K tokens.

Kimi K2.7 has a nominal 256K-token context window, but practical recall degrades past approximately 180K tokens. The 180K effective window still handles most single-file or moderate-repository tasks. If your prompts will regularly exceed 180K tokens, GLM-5.2 is the safer choice.
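The rule of thumb above can be turned into a quick pre-flight check. This is a minimal sketch: the limits are the figures quoted in this section, the four-characters-per-token ratio is a crude assumption rather than either vendor's tokenizer, and the function names are hypothetical.

```python
# Rough pre-flight check: does a prompt fit each model's usable context?
# The limits are the figures quoted in this article; the chars-per-token
# ratio is an assumption, not either vendor's real tokenizer.

KIMI_EFFECTIVE_TOKENS = 180_000   # recall reportedly degrades past this
GLM_CONTEXT_TOKENS = 1_000_000    # GLM-5.2 context window
CHARS_PER_TOKEN = 4               # assumption for a quick estimate


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(prompt_tokens: int) -> list[str]:
    """Return the models whose usable window can hold the prompt."""
    fits = []
    if prompt_tokens <= KIMI_EFFECTIVE_TOKENS:
        fits.append("kimi-k2.7-code")
    if prompt_tokens <= GLM_CONTEXT_TOKENS:
        fits.append("glm-5.2")
    return fits
```

The check only looks at the prompt. Output and reasoning tokens also consume context, so leave headroom rather than filling the window to the limit.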

Multimodality

Kimi K2.7 accepts images (PNG, JPEG, WebP, GIF) and video (MP4, MPEG, MOV, AVI, WMV) through its integrated MoonViT encoder. Workflows that involve UI screenshots, design mockups, video recordings of bugs, or scanned documents can pass all of these to the model in a single call.

GLM-5.2 is text-only. Z.ai provides a separate GLM-OCR model for document and image understanding, priced at $0.03/MTok. For text-only workflows this is fine, but if images appear in the loop, GLM-5.2 requires an additional model step.
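For the Kimi side, an image can be attached to a chat message roughly as follows. This is a sketch under an assumption: it uses the OpenAI-style image_url content part with a base64 data URL, which this article does not confirm for Moonshot's endpoint, so check the Kimi API documentation before relying on it. The helper name is hypothetical.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message carrying text plus one inline image.

    Assumes the endpoint accepts OpenAI-style content parts; verify this
    against the provider's documentation.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }
```

The returned dict can be placed in the messages list of a chat completions call, for example alongside a system prompt.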

API setup and constraints

Both models expose an OpenAI-compatible chat completions endpoint.

Kimi K2.7 Code quickstart

Thinking mode is mandatory and cannot be disabled. Several sampling parameters are also locked server-side: temperature is fixed at 1.0, top_p at 0.95, n at 1, and presence_penalty and frequency_penalty at 0.0. Passing any other value for these parameters returns an API error. For multi-turn tool calls, you must also keep reasoning_content from the current assistant message in the context you send back.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
    ],
    max_tokens=32768,
    # temperature, top_p, n, and penalties are fixed server-side.
    # Passing any other value raises an API error.
    # Thinking mode is always on; it cannot be disabled.
)

print(response.choices[0].message.content)
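Two small helpers can guard against the constraints described above. This is a sketch, not Moonshot's recommended approach: the helper names are hypothetical, and the second function assumes the assistant reply has already been converted to a plain dict (for example with the OpenAI SDK's model_dump()).

```python
# Parameters the article says Kimi fixes server-side, with the only
# values it accepts. Any other value reportedly returns an API error.
KIMI_LOCKED = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def sanitize_kimi_kwargs(kwargs: dict) -> dict:
    """Drop locked sampling parameters so the server defaults apply."""
    return {k: v for k, v in kwargs.items() if k not in KIMI_LOCKED}


def assistant_message_for_history(message: dict) -> dict:
    """Copy an assistant reply into history, keeping reasoning_content.

    `message` is assumed to be a plain dict representation of the reply.
    """
    kept = {"role": "assistant", "content": message.get("content")}
    for field in ("reasoning_content", "tool_calls"):
        if message.get(field) is not None:
            kept[field] = message[field]
    return kept
```

Running shared request settings through sanitize_kimi_kwargs lets one configuration serve both models, since a temperature tuned for GLM-5.2 would otherwise trigger an error on Kimi.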

GLM-5.2 quickstart

GLM-5.2 is more flexible. Temperature is adjustable, reasoning mode is optional (enable it with "thinking": {"type": "enabled"}), and reasoning_effort accepts multiple levels. No sampling parameters are locked.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Plan a feature flag service for gradual rollouts."},
    ],
    max_tokens=131072,
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max",
    },
)

print(response.choices[0].message.content)

For Cline and other OpenAI-compatible tools, set the base URL to https://api.z.ai/api/paas/v4 and the model to glm-5.2.
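Because reasoning is optional on GLM-5.2, it can help to build the extra_body payload in one place. The sketch below uses only the field values shown in the quickstart; the helper name is hypothetical, and when thinking is off the fields are simply omitted rather than set to a value this article does not document.

```python
from typing import Optional


def glm_extra_body(thinking: bool, reasoning_effort: Optional[str] = None) -> dict:
    """Build the extra_body dict for a GLM-5.2 request.

    Only the "enabled" value from the quickstart is used. With thinking
    off, both fields are left out of the request.
    """
    body: dict = {}
    if thinking:
        body["thinking"] = {"type": "enabled"}
        if reasoning_effort is not None:
            body["reasoning_effort"] = reasoning_effort
    return body
```

Passing extra_body=glm_extra_body(True, "max") reproduces the quickstart request, while glm_extra_body(False) yields an empty dict for quick, non-reasoning calls.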

Speed and pricing

Speed

Kimi's HighSpeed variant doubles the price but changes the economics for interactive coding sessions. At up to 260 tok/s, it approaches editor autocomplete latency. GLM-5.2's speed varies significantly by provider: Z.ai's own endpoint ran at 12–55 tok/s during Fabio Akita's benchmark (throttled under high load), while OpenRouter can deliver up to 166 tok/s.
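To make those throughput figures concrete, the sketch below converts them into wall-clock time for a 5K-token response. It assumes a steady decode rate and ignores time to first token and any reasoning tokens, so treat the results as lower bounds. The function name and the 5K example size are illustrative choices.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Throughput figures quoted in this section.
RATES = {
    "Kimi K2.7 HighSpeed (260 tok/s)": 260,
    "GLM-5.2 via OpenRouter (up to 166 tok/s)": 166,
    "GLM-5.2 via Z.ai, best case (55 tok/s)": 55,
    "GLM-5.2 via Z.ai, throttled (12 tok/s)": 12,
}

for label, rate in RATES.items():
    print(f"{label}: {generation_seconds(5_000, rate):.0f} s for 5K tokens")
```

At these rates the same response takes about 19 seconds on HighSpeed, 30 seconds on OpenRouter, and between roughly 91 and 417 seconds on Z.ai's endpoint under the conditions observed in the benchmark.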

Pricing

For a 30K-input / 5K-output code review task, Kimi K2.7 costs approximately $0.0485 and GLM-5.2 costs approximately $0.064, a saving of about 24% in Kimi's favour. At identical task complexity, Kimi is the cheaper option unless HighSpeed is required. OpenRouter lists lower prices for both models due to negotiated platform rates. The figures above are official vendor prices.
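The arithmetic behind the worked example can be sketched as below. The per-token rates are deliberately left as parameters, to be filled in from each vendor's pricing page. Only the two task totals quoted above are hard-coded, and the function names are hypothetical.

```python
def task_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost in dollars of one call, given prices per million tokens."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


def relative_saving(cheaper: float, pricier: float) -> float:
    """Fraction saved by choosing the cheaper option."""
    return (pricier - cheaper) / pricier


# Totals quoted in the article for a 30K-input / 5K-output task.
kimi_total = 0.0485
glm_total = 0.064
print(f"Saving with Kimi: {relative_saving(kimi_total, glm_total):.1%}")  # 24.2%
```

The same two functions scale to batch estimates: multiply task_cost by the expected number of calls per day to compare monthly spend.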

Self-hosting

Neither model is a laptop deployment. Kimi's 595 GB at INT4 requires multi-GPU server hardware. GLM-5.2 has broader self-hosting support, including Unsloth and GGUF quantizations, which opens more pathways to lower-tier infrastructure. Both models are open-weight under permissive licenses, which matters for teams that need to avoid vendor lock-in or export-control exposure.
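A back-of-the-envelope GPU count for the Kimi figure can be sketched as follows. The 80 GB per-GPU capacity is an assumption chosen for illustration, and the calculation covers weights only. KV cache, activations, and framework overhead all need additional memory, so a real deployment will need more headroom than this lower bound.

```python
import math


def min_gpus(weights_gb: float, gpu_memory_gb: float, usable_fraction: float = 1.0) -> int:
    """Lower bound on GPUs needed just to hold the model weights.

    usable_fraction reserves part of each GPU for KV cache and overhead.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(weights_gb / (gpu_memory_gb * usable_fraction))


# 595 GB of INT4 weights (from the article) on hypothetical 80 GB GPUs.
print(min_gpus(595, 80))        # 8
print(min_gpus(595, 80, 0.8))   # 10, reserving 20% of each GPU
```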

When to choose each model

Choose GLM-5.2 if

Your prompts regularly exceed 180K tokens (full repository ingestion, large log analysis, multi-document review)
You need maximum output tokens in a single response (up to 128K)
You want to tune sampling parameters or make reasoning mode optional
Your workflow is entirely text-based (no screenshots, no video)
You want the broader published benchmark coverage before committing to a model
Planning quality matters more than raw speed (GLM-5.2 edged out Kimi on planning score in the Kilo Code test and produced cleaner dependency injection in Fabio Akita's benchmark)

Choose Kimi K2.7 Code if

Your workflows include UI screenshots, design files, video screen recordings, or scanned documents
You want lower per-token costs for standard (non-HighSpeed) usage
Interactive latency is critical (the HighSpeed variant at 180–260 tok/s is faster than GLM-5.2's typical endpoints)
Agentic initiative matters (Kimi showed more autonomous scope extension in the Mehul Gupta test)
You're running batch coding jobs where Kimi's lower input price compounds across many calls
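The two checklists can be encoded as a small helper that reports which criteria point to which model. This is a sketch of the article's decision framework rather than a definitive rule set: the Workload fields and function name are hypothetical, and the helper lists reasons instead of picking a winner because the criteria can conflict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    prompt_tokens: int
    needs_vision: bool = False            # screenshots, video, scans
    needs_tunable_sampling: bool = False  # e.g. custom temperature
    needs_optional_reasoning: bool = False
    latency_critical: bool = False


def reasons_for_each(w: Workload) -> dict[str, list[str]]:
    """Collect the checklist items that favour each model."""
    glm: list[str] = []
    kimi: list[str] = []
    if w.prompt_tokens > 180_000:
        glm.append("prompt exceeds Kimi's ~180K effective window")
    if w.needs_tunable_sampling:
        glm.append("sampling parameters must be adjustable")
    if w.needs_optional_reasoning:
        glm.append("reasoning mode must be optional")
    if w.needs_vision:
        kimi.append("image or video input is required")
    if w.latency_critical:
        kimi.append("HighSpeed variant suits interactive latency")
    return {"glm-5.2": glm, "kimi-k2.7-code": kimi}
```

A workload that lands reasons in both lists, such as a 300K-token prompt that also includes screenshots, signals that one model alone will not cover it and an extra step (for example a separate OCR pass) is needed.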

FAQ

Does GLM-5.2 support image input? No. GLM-5.2 is text-only. Z.ai offers a separate GLM-OCR model for image and document processing. Kimi K2.7 accepts images and video natively through its MoonViT encoder.

Can I disable thinking mode in Kimi K2.7? No. The Kimi API documentation states that thinking mode is required. Attempting to pass a parameter to disable it returns an API error.

Which model has better benchmark coverage? GLM-5.2 has broader coverage as of June 2026, with company-reported SWE-bench Pro, Terminal-Bench, and FrontierSWE scores plus an Artificial Analysis Intelligence Index result. Kimi K2.7's launch numbers were entirely vendor-reported, its Artificial Analysis score is an estimate, and no independent SWE-bench Pro result was available at time of writing.

What is the hallucination difference between the two? Artificial Analysis measured a non-hallucination rate of 20% for Kimi K2.7 vs 66% for GLM-5 on the AA-Omniscience benchmark. This is a significant gap, though the comparison is against GLM-5 (the older version), not GLM-5.2 specifically.

Can I use either model in Cline or other OpenAI-compatible tools? Yes for both. For GLM-5.2, set the base URL to https://api.z.ai/api/paas/v4 and the model to glm-5.2. For Kimi K2.7, set the base URL to https://api.moonshot.ai/v1 and the model to kimi-k2.7-code. Both are OpenAI SDK compatible.
