LLMz-ai

Z.ai: GLM 4.5 Air

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

Anyone in the Space can @-mention Z.ai: GLM 4.5 Air with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

GLM 4.5 Air targets cost-conscious teams needing a 128K context window without the premium pricing of Western frontier models. At $0.13/$0.85 per Mtok, it undercuts GPT-4o and Claude significantly while maintaining a large context. The trade-off is sparse public benchmark data and less-proven performance on nuanced reasoning tasks. Reach for this when budget constraints matter more than bleeding-edge accuracy, especially for Chinese-language workflows or bulk document processing where context length justifies the risk.

Best for

Budget-sensitive long-context tasks
Chinese-language document analysis
High-volume text processing pipelines
Prototyping before scaling to premium models

Strengths

The 128K context window matches GPT-4 Turbo at a fraction of the cost, making it viable for ingesting full codebases or lengthy contracts in a single call. Pricing sits roughly 80% below Anthropic and OpenAI equivalents, which matters for teams running thousands of requests daily. The model originates from Zhipu AI's GLM series, which has shown competitive performance on Chinese-language benchmarks in prior releases, suggesting strength in multilingual scenarios where English-only models stumble.

Trade-offs

Public benchmark coverage is nearly nonexistent, so you're flying blind compared to models with extensive MMLU, HumanEval, and reasoning evals. Early adopters report weaker performance on complex multi-step reasoning and creative writing versus Claude Sonnet or GPT-4o. Output quality can drift on edge-case prompts, and the model lacks the safety tuning depth of Western labs. If your task demands high reliability or passes through compliance review, the lack of transparency becomes a blocking issue.

Specifications

Provider: z-ai
Category: llm
Context length: 131,072 tokens
Max output: 98,304 tokens
Modalities: text
License: proprietary
Released: 2025-07-25

Pricing

Input: $0.13/Mtok
Output: $0.85/Mtok
Model ID: z-ai/glm-4.5-air

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

$6.09

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
z-ai	131k	$0.13/Mtok	$0.85/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Bulk Contract Extraction

Extract all payment terms, renewal clauses, and termination conditions from the attached lease. List each with page references and flag any ambiguous language.

Open in a Space →

Chinese-English Code Comments

Translate all Chinese comments in this Python file to English. Preserve technical terms and keep the tone consistent with existing English comments.

Open in a Space →

Multi-Document Summarization

Summarize the key findings and methodology from these five papers. Highlight where results conflict and note any shared limitations.

Open in a Space →

Cost-Optimized Chatbot Backend

You are a support agent for an e-commerce platform. Answer the user's question about order status, refunds, or shipping. Be concise and friendly.

Open in a Space →

Codebase Context Search

Given this full codebase, explain how the authentication flow works from login to token refresh. Include file names and function calls.

Open in a Space →