
Kimi K2.7 vs GLM-5.2 is a comparison worth making carefully. Both are open-weight MoE models released one day apart in June 2026, and both target developers and agentic coding workflows, but their design choices diverge significantly. GLM-5.2 (from Z.ai / Zhipu AI) leads with a 1M-token context window and text-only breadth. Kimi K2.7 Code (from Moonshot AI) brings multimodality via MoonViT, a HighSpeed variant for low-latency sessions, and a mandatory reasoning mode with no opt-out. This article covers benchmark scores from independent and vendor sources, API behaviour including Kimi's hard parameter constraints, pricing with a worked cost example, self-hosting feasibility, and a decision framework for choosing between the two.
Quick comparison
Benchmarks
Two benchmark pools exist for these models: independent third-party measurements, and vendor-reported results published at launch. They tell different parts of the story and should be read separately.
Independent scores
Artificial Analysis is the main independent source for both models. GLM-5.2 scores 51 on the AA Intelligence Index, and Kimi K2.7 Code is estimated at 42. Note that the AA comparison page measures Kimi K2.7 against GLM-5, not GLM-5.2, so some rows in the table below reflect the older GLM model.
Two findings here are underreported in most roundups. First, GLM-5 scored a 66% non-hallucination rate vs Kimi's 20% on the AA-Omniscience benchmark, a substantial gap. Second, GLM-5 leads on instruction following (72% vs 63%) despite Kimi winning on scientific reasoning and long-context tasks. Both comparisons are against the older GLM-5, not GLM-5.2.
Vendor-reported results
All Kimi K2.7 numbers at launch were first-party. GLM-5.2 has broader coverage from named benchmarks including SWE-bench Pro, Terminal-Bench, and FrontierSWE.
GLM-5.2 (company-reported):
Kimi K2.7 Code (company-reported):
No independent SWE-bench Pro result for Kimi K2.7 was available at time of writing.
Real-world coding tests
Three independent hands-on tests produced results that narrow the gap considerably.
Fabio Akita (Rails 8 coding benchmark): GLM-5.2 scored 87/100 (Tier A, rank #6) and Kimi K2.7 scored 86/100 (Tier A, rank #8). GLM-5.1 had scored just 46/100 (Tier C), making the 5.1-to-5.2 jump the largest single-version gain in the benchmark's history. Kimi finished faster (22 minutes vs 43 minutes), but GLM produced cleaner dependency injection and zero hallucinations.
Kilo Code (feature flag backend service): GLM-5.2 earned a planning score of 9.0 vs Kimi's 8.1. Both models built functionally identical services, and GLM passed 15/15 automated checks compared to Kimi's 14/15. The difference came in architectural judgment during planning. GLM made explicit decisions on edge cases that Kimi deferred.
Mehul Gupta (World Cup app debugging): Both models tied on bug fixing. Kimi finished faster and added unsolicited features, showing more agentic initiative. GLM produced a richer one-shot animated HTML simulation.
Across all three tests, GLM-5.2 wins on planning quality and hallucination avoidance, and Kimi K2.7 wins on speed and autonomous scope extension.
Context window
GLM-5.2 supports a 1M-token context window, described by Z.ai as lossless at full context. This is large enough to ingest entire repositories, long production logs, or multi-document sets in a single call. Maximum output is 128K tokens.
Kimi K2.7 has a nominal 256K-token context window, but practical recall degrades past approximately 180K tokens. The 180K effective window still handles most single-file or moderate-repository tasks. If your prompt will regularly exceed 200K tokens, GLM-5.2 is the safer choice.
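The context figures above can be turned into a quick pre-flight check. This is a minimal sketch, not vendor tooling: the function names are hypothetical, the 4-characters-per-token ratio is a rough assumption, and the 180K limit for Kimi is the approximate recall threshold quoted above rather than a hard API limit.

```python
# Effective prompt budgets in tokens, taken from the figures quoted above.
# GLM-5.2: 1M window (described by the vendor as lossless at full context).
# Kimi K2.7: 256K nominal, with recall degrading past roughly 180K.
EFFECTIVE_CONTEXT = {
    "glm-5.2": 1_000_000,
    "kimi-k2.7-code": 180_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_effective_window(model: str, prompt_tokens: int, reserve_for_output: int = 0) -> bool:
    """Return True if the prompt plus reserved output stays inside the effective window."""
    return prompt_tokens + reserve_for_output <= EFFECTIVE_CONTEXT[model]
```

A real tokenizer count is preferable to the character heuristic when one is available, since code and non-English text can tokenize very differently.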
Multimodality
Kimi K2.7 accepts images (PNG, JPEG, WebP, GIF) and video (MP4, MPEG, MOV, AVI, WMV) through its integrated MoonViT encoder. Workflows that involve UI screenshots, design mockups, video recordings of bugs, or scanned documents can pass all of these to the model in a single call.
GLM-5.2 is text-only. Z.ai provides a separate GLM-OCR model for document and image understanding, priced at $0.03/MTok. For text-only workflows this is fine, but if images appear in the loop, GLM-5.2 requires an additional model step.
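As an illustration of what passing an image "in a single call" might look like, the sketch below builds a user message with one text part and one inline image. It assumes the Kimi endpoint accepts OpenAI-style `image_url` content parts carrying a base64 data URL; the article does not confirm the exact payload shape, so check the vendor documentation before relying on it. The helper name is hypothetical.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with one text part and one inline image part.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts
    with a base64 data URL. This is not confirmed by the article.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The returned dict would go into the `messages` list of a chat completions request. With GLM-5.2, the same screenshot would first need a pass through a separate model such as GLM-OCR.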
API setup and constraints
Both models expose an OpenAI-compatible chat completions endpoint.
Kimi K2.7 Code quickstart
Thinking mode is mandatory and cannot be disabled. Several sampling parameters are also locked server-side, with temperature fixed at 1.0, top_p at 0.95, n at 1, and presence_penalty and frequency_penalty at 0.0. Passing any other value for these parameters returns an API error. For multi-turn tool calls, you must also preserve reasoning_content from the current assistant message in context.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
    ],
    max_tokens=32768,
    # temperature, top_p, n, and penalties are fixed server-side.
    # Passing any other value raises an API error.
    # Thinking mode is always on; it cannot be disabled.
)

print(response.choices[0].message.content)
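Because a stray `temperature=0.2` left over from another provider's config would fail at request time, it can help to catch the conflict client-side. The sketch below is one way to do that under the constraints described above. Both helper names are hypothetical, and the shape of the assistant message (in particular that `reasoning_content` is sent back under the same key) is an assumption based on the field name the article mentions.

```python
# Values the article says are fixed server-side for Kimi K2.7 Code.
KIMI_LOCKED = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def kimi_safe_kwargs(kwargs: dict) -> dict:
    """Drop locked sampling parameters; raise if a conflicting value was requested."""
    cleaned = {}
    for key, value in kwargs.items():
        if key in KIMI_LOCKED:
            if value != KIMI_LOCKED[key]:
                raise ValueError(
                    f"{key}={value!r} conflicts with the fixed value {KIMI_LOCKED[key]!r}"
                )
            continue  # matching values are redundant, so omit them
        cleaned[key] = value
    return cleaned


def assistant_turn(content, tool_calls=None, reasoning_content=None) -> dict:
    """Build the assistant message to append to history before the tool result.

    Assumption: the API returns reasoning text in a `reasoning_content`
    field and expects it back under the same key.
    """
    message = {"role": "assistant", "content": content}
    if tool_calls:
        message["tool_calls"] = tool_calls
    if reasoning_content is not None:
        message["reasoning_content"] = reasoning_content
    return message
```

Calling `kimi_safe_kwargs({"max_tokens": 1024, "temperature": 1.0})` returns only `{"max_tokens": 1024}`, while a non-default temperature raises locally instead of producing an API error mid-session.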
GLM-5.2 quickstart
GLM-5.2 is more flexible. Temperature is adjustable, reasoning mode is optional (enable it with "thinking": {"type": "enabled"}), and reasoning_effort accepts multiple levels. No sampling parameters are locked.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Plan a feature flag service for gradual rollouts."},
    ],
    max_tokens=131072,
    # Reasoning is optional on GLM-5.2; these fields switch it on.
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max",
    },
)

print(response.choices[0].message.content)
For Cline and other OpenAI-compatible tools, set the base URL to https://api.z.ai/api/paas/v4 and the model to glm-5.2.
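Since reasoning is optional on GLM-5.2, a small helper can keep the `extra_body` construction in one place. This is a sketch only: the helper name is hypothetical, and it assumes that leaving out the `thinking` key keeps reasoning off. Only the "enabled" form and the "max" effort level appear in this article, so other values should be checked against the Z.ai documentation.

```python
from typing import Optional


def glm_extra_body(thinking: bool = False, reasoning_effort: Optional[str] = None) -> dict:
    """Build the `extra_body` dict for a GLM-5.2 request.

    Assumption: omitting the `thinking` key leaves reasoning disabled.
    """
    body = {}
    if thinking:
        body["thinking"] = {"type": "enabled"}
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
    return body
```

Passing `extra_body=glm_extra_body(True, "max")` reproduces the quickstart request, and `glm_extra_body()` yields an empty dict for plain completions.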
Speed and pricing
Speed
Kimi's HighSpeed variant doubles the price but changes the economics for interactive coding sessions. At up to 260 tok/s, it approaches editor autocomplete latency. GLM-5.2's speed varies significantly by provider. Z.ai's own endpoint ran at 12–55 tok/s during Fabio Akita's benchmark (throttled under high load), while OpenRouter can deliver up to 166 tok/s.
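To make those throughput figures concrete, the sketch below converts them into wall-clock time for a fixed output length. It assumes a steady decode rate and ignores time to first token; the 5K-token output size is an illustrative choice, and the function name is hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Throughput figures quoted in the paragraph above.
RATES = {
    "Kimi K2.7 HighSpeed": 260,
    "GLM-5.2 via OpenRouter (peak)": 166,
    "GLM-5.2 via Z.ai (best observed)": 55,
    "GLM-5.2 via Z.ai (throttled)": 12,
}

for name, rate in RATES.items():
    print(f"{name}: {generation_seconds(5_000, rate):.0f} s for 5K output tokens")
```

For 5K output tokens this works out to about 19 seconds at 260 tok/s and close to seven minutes at 12 tok/s. Reasoning tokens generally count toward output as well, so the real wait with thinking enabled is likely longer than the visible answer suggests.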
Pricing
For a 30K-input / 5K-output code review task, Kimi K2.7 costs approximately $0.0485 and GLM-5.2 costs approximately $0.064, a saving of about 24% in Kimi's favour. At identical task complexity, Kimi is the cheaper option unless HighSpeed is required. OpenRouter lists lower prices for both models due to negotiated platform rates. The figures above are official vendor prices.
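The arithmetic behind that comparison is easy to reproduce for your own workloads. The sketch below uses hypothetical function names; `task_cost` takes whatever per-million-token rates apply to you, and the saving is computed from the two totals quoted above.

```python
def task_cost(input_tokens: int, output_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost in dollars of one call, given per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def relative_saving(cheaper: float, dearer: float) -> float:
    """Fraction saved by choosing the cheaper option."""
    return (dearer - cheaper) / dearer


# Totals quoted above for the 30K-input / 5K-output task.
kimi_total, glm_total = 0.0485, 0.064
print(f"Saving: {relative_saving(kimi_total, glm_total):.1%}")  # Saving: 24.2%
```

The gap depends on the input-to-output ratio of the task, so it is worth re-running `task_cost` with your own typical token counts rather than assuming the 24% figure carries over.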
Self-hosting
Neither model is a laptop deployment. Kimi's 595 GB at INT4 requires multi-GPU server hardware. GLM-5.2 has broader self-hosting support, including Unsloth and GGUF quantizations, which opens more pathways to lower-tier infrastructure. Both models are open-weight under permissive licenses, which matters for teams that need to avoid vendor lock-in or export-control exposure.
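A back-of-the-envelope estimate shows why the Kimi figure implies server hardware. The sketch below computes only a lower bound from weight size; the 80 GB per-GPU memory figure is an assumption for illustration, and the function name is hypothetical.

```python
import math


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count from weight size alone.

    Ignores KV cache, activations, and framework overhead, so real
    deployments need more headroom than this.
    """
    return math.ceil(weights_gb / gpu_memory_gb)


# 595 GB is the INT4 figure quoted above; 80 GB per GPU is an assumption.
print(min_gpus_for_weights(595, 80))  # 8
```

Long-context serving pushes the real requirement higher, because the KV cache grows with the number of tokens held in context.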
When to choose each model
Choose GLM-5.2 if
- Your prompts regularly exceed 180K tokens (full repository ingestion, large log analysis, multi-document review)
- You need maximum output tokens in a single response (up to 128K)
- You want to tune sampling parameters or make reasoning mode optional
- Your workflow is entirely text-based (no screenshots, no video)
- You need the broadest available benchmark coverage before committing to a model (note that GLM-5.2's SWE-bench Pro, Terminal-Bench, and FrontierSWE results are company-reported)
- Planning quality matters more than raw speed (GLM-5.2 edged out Kimi on architectural decision-making in two independent planning tests)
Choose Kimi K2.7 Code if
- Your workflows include UI screenshots, design files, video screen recordings, or scanned documents
- You want lower per-token costs for standard (non-HighSpeed) usage
- Interactive latency is critical (the HighSpeed variant at 180–260 tok/s is faster than GLM-5.2's typical endpoints)
- Agentic initiative matters (Kimi showed more autonomous scope extension in the Mehul Gupta test)
- You're running batch coding jobs where Kimi's lower input price compounds across many calls
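The two checklists above can be sketched as a small routing function. This is one possible ordering of the criteria, not a definitive rule: the function and parameter names are hypothetical, hard constraints (image input, prompt size, tunable sampling) are checked before preferences, and the default falls to Kimi on the basis of its lower standard per-token price.

```python
def choose_model(
    prompt_tokens: int,
    needs_vision: bool = False,
    needs_tunable_sampling: bool = False,
    prefers_planning_quality: bool = False,
) -> str:
    """Return a model name following the checklists above. Hard constraints come first."""
    needs_long_context = prompt_tokens > 180_000
    if needs_vision and needs_long_context:
        # Kimi handles images but loses recall past ~180K; GLM-5.2 is text-only.
        raise ValueError("No single model covers both image input and >180K-token prompts")
    if needs_vision:
        return "kimi-k2.7-code"
    if needs_long_context or needs_tunable_sampling:
        return "glm-5.2"
    if prefers_planning_quality:
        return "glm-5.2"
    # No hard constraint: default to the lower standard per-token price.
    return "kimi-k2.7-code"
```

For example, `choose_model(400_000)` returns `"glm-5.2"`, while `choose_model(50_000, needs_vision=True)` returns `"kimi-k2.7-code"`. A workload that needs both images and very long prompts raises an error, which is a signal to split the task or add a separate image-understanding step.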
FAQ
Does GLM-5.2 support image input? No. GLM-5.2 is text-only. Z.ai offers a separate GLM-OCR model for image and document processing. Kimi K2.7 accepts images and video natively through its MoonViT encoder.
Can I disable thinking mode in Kimi K2.7? No. The Kimi API documentation states that thinking mode is required. Attempting to pass a parameter to disable it returns an API error.
Which model has better benchmark coverage? GLM-5.2 has broader coverage as of June 2026: an Artificial Analysis Intelligence Index score, plus company-reported results on SWE-bench Pro, Terminal-Bench, and FrontierSWE. Kimi K2.7's launch numbers were entirely vendor-reported, and no independent SWE-bench Pro result for it was available at time of writing.
What is the hallucination difference between the two? Artificial Analysis measured a non-hallucination rate of 20% for Kimi K2.7 vs 66% for GLM-5 on the AA-Omniscience benchmark. This is a significant gap, though the comparison is against GLM-5 (the older version), not GLM-5.2 specifically.
Can I use either model in Cline or other OpenAI-compatible tools? Yes for both. For GLM-5.2, set the base URL to https://api.z.ai/api/paas/v4 and the model to glm-5.2. For Kimi K2.7, set the base URL to https://api.moonshot.ai/v1 and the model to kimi-k2.7-code. Both are OpenAI SDK compatible.