MoonshotAI: Kimi K2 0905

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

Anyone in the Space can @-mention MoonshotAI: Kimi K2 0905 with the team's shared context - pooled credits, one chat, one memory.

Verdict

Kimi K2 0905 delivers a 262K token context window at $0.60 per million input tokens, roughly a quarter of GPT-4o's input price and a fifth of Claude Sonnet 4.5's for long-context work. MoonshotAI built this model for Chinese and English processing, with strong performance on multi-document synthesis and extended reasoning chains. The trade-off: limited public benchmark data makes it harder to gauge performance against Western incumbents on code, math, or creative writing. Reach for this when you need massive context on a budget and your workload skews toward document analysis or bilingual support.

Best for

Long-context document analysis under budget
Bilingual Chinese-English processing
Multi-document synthesis and summarization
Cost-sensitive extended reasoning tasks

Strengths

The 262K context window handles entire codebases, legal filings, or research paper collections in a single call. Input pricing at $0.60/Mtok undercuts GPT-4o ($2.50) and Claude Sonnet 4.5 ($3.00) by 76% and 80% respectively, making it viable for high-volume long-context jobs. MoonshotAI optimized the model for Chinese-English bilingual tasks, a gap in many Western models. The architecture supports extended reasoning chains without the context-stuffing tricks required by smaller windows.
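
The input-price comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: the prices are the per-million-token figures quoted on this page, and the names `KIMI_INPUT`, `COMPARISON`, and `input_savings_pct` are illustrative, not part of any API.

```python
# Input prices in USD per million tokens, as quoted on this page.
KIMI_INPUT = 0.60
COMPARISON = {"GPT-4o": 2.50, "Claude Sonnet 4.5": 3.00}


def input_savings_pct(kimi_price: float, other_price: float) -> float:
    """Percentage by which kimi_price undercuts other_price."""
    return (1 - kimi_price / other_price) * 100


# Rounded savings per competitor, e.g. {"GPT-4o": 76, ...}
savings = {
    name: round(input_savings_pct(KIMI_INPUT, price))
    for name, price in COMPARISON.items()
}

for name, pct in savings.items():
    print(f"{name}: about {pct}% cheaper on input")
```

The comparison covers input tokens only; output pricing changes the picture for response-heavy workloads.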

Trade-offs

Public benchmark coverage is sparse: no MMLU, HumanEval, or GPQA scores are available here to compare against Claude, GPT-4o, or Gemini on standard tasks. Output pricing at $2.50/Mtok is more than four times the input rate, so cost savings shrink if your use case generates verbose responses. Early reports suggest weaker performance on advanced math and code generation relative to Sonnet 4.5 or o1-preview. The model is less tested in production Western workflows than incumbents with years of public deployment.

Specifications

Provider: moonshotai
Category: llm
Context length: 262,144 tokens
Max output: 262,144 tokens
Modalities: text
License: proprietary
Released: 2025-09-04

Pricing

Input: $0.60/Mtok
Output: $2.50/Mtok
Model ID: moonshotai/kimi-k2-0905

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30%

Estimated monthly spend

$20.59

17.6M tokens / month
5 seats · 80 msgs/day
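
The estimate above can be reproduced with a short calculation. This is a sketch under one stated assumption: the page does not say how many days per month the calculator uses, and 22 days is inferred here because it is the value that yields 17.6M tokens from the other inputs. The function name and parameters are illustrative.

```python
INPUT_PRICE = 0.60   # USD per million input tokens (from the Pricing section)
OUTPUT_PRICE = 2.50  # USD per million output tokens (from the Pricing section)


def estimate_monthly_spend(seats, msgs_per_seat_per_day, tokens_per_turn,
                           output_share_pct, days_per_month=22):
    """Return (total_tokens, cost_usd) for one month.

    days_per_month=22 is an assumption inferred from the 17.6M figure,
    not a documented calculator setting.
    """
    total_tokens = seats * msgs_per_seat_per_day * tokens_per_turn * days_per_month
    output_tokens = total_tokens * output_share_pct // 100
    input_tokens = total_tokens - output_tokens
    cost = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return total_tokens, cost


tokens, cost = estimate_monthly_spend(
    seats=5, msgs_per_seat_per_day=80, tokens_per_turn=2_000, output_share_pct=30
)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
```

With the default inputs shown in the calculator, about two thirds of the bill comes from the 30% of tokens that are output, which is why the output share setting moves the estimate more than any other input.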

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
moonshotai	262k	$0.60/Mtok	$2.50/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Multi-Document Synthesis

I'm providing five research papers on climate adaptation strategies. Read all five, then produce a 500-word synthesis highlighting consensus findings, conflicting conclusions, and gaps in the literature.

Bilingual Contract Review

Compare the Chinese and English versions of this SaaS agreement. Flag any clauses where the translations diverge in meaning, especially around liability, termination, and IP rights.

Codebase Documentation

Here's a 50-file Python codebase for a task queue system. Write a 1000-word architecture overview covering data flow, key abstractions, and extension points for new developers.

Long-Form Q&A

I've pasted a 200-page employee handbook. Answer this question with specific section references: What are the procedures for requesting parental leave as a contractor in Germany?

Meeting Transcript Analysis

Here are transcripts from three consecutive board meetings (8 hours total). List all action items assigned to the CFO, grouped by strategic initiative, with timestamps.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

I have a 180-page PDF contract in Chinese. Extract all payment terms, delivery schedules, and penalty clauses. Organize them in a comparison table with English translations.

Output

This example would produce a structured table with three columns: Original Chinese Text, English Translation, and Document Section Reference. The model processes the entire 180-page document within its 262k token context window, identifying 23 payment milestones across sections 4.2, 7.1, and 12.3, eight delivery schedules with associated penalties ranging from 0.5% to 3% of contract value, and four force majeure clauses. Each extracted clause includes page numbers and maintains legal terminology precision in both languages. The table format makes cross-referencing straightforward for legal review.

Notes

Showcases the model's extended context window handling long documents without chunking, plus its bilingual Chinese-English capability. The 262k token limit supports most business documents in full. However, at $2.50/Mtok output, generating comprehensive tables from large documents costs more than models with cheaper output pricing.

Prompt

Debug this Python function that's supposed to merge overlapping time intervals but fails on edge cases. Explain what's wrong and provide a corrected version with test cases.

Output

This example would produce a detailed analysis identifying that the original function fails when intervals touch at an endpoint without otherwise overlapping (e.g., [1,3] and [3,5]), and when the input list is unsorted. The model explains that the comparison logic uses `interval[0] < last[1]` when it should use `interval[0] <= last[1]` so that touching intervals merge, and that sorting is missing. The corrected version includes a sort step, fixed comparison logic, and five test cases covering empty input, single interval, overlapping pairs, touching intervals, and completely separate intervals with expected outputs.
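
For reference, a corrected function of the kind this example describes might look like the sketch below. It is not a recording of model output. It assumes closed intervals where touching endpoints such as [1, 3] and [3, 5] should merge; the name `merge_intervals` is illustrative.

```python
def merge_intervals(intervals):
    """Merge closed intervals; touching endpoints such as [1, 3] and [3, 5] merge."""
    merged = []
    # Sorting by start (then end) lets each interval be compared
    # only against the most recently merged one.
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# The five cases named above.
assert merge_intervals([]) == []
assert merge_intervals([[1, 3]]) == [[1, 3]]
assert merge_intervals([[1, 4], [2, 6]]) == [[1, 6]]
assert merge_intervals([[1, 3], [3, 5]]) == [[1, 5]]
assert merge_intervals([[6, 8], [1, 2]]) == [[1, 2], [6, 8]]
```

If integer intervals that are merely adjacent, such as [1, 3] and [4, 5], should also merge, the comparison would need to be `start <= merged[-1][1] + 1` instead; which rule applies depends on what the intervals represent.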

Notes

Demonstrates code comprehension and systematic debugging across a realistic edge-case scenario. The model's text-only modality means it handles code as text, which works well for most programming tasks. The $0.60/Mtok input pricing makes it economical for iterative debugging sessions with large codebases.

Prompt

Write a 2000-word research summary on recent advances in solid-state battery technology, citing specific papers from 2023-2024. Include technical challenges, commercial timeline predictions, and key players.

Output

This example would produce a structured summary covering sulfide and oxide electrolyte developments, with specific references to Nature Energy and Advanced Materials publications. The model discusses lithium dendrite mitigation strategies, manufacturing scalability challenges at the cathode-electrolyte interface, and cost projections showing solid-state batteries reaching price parity with lithium-ion by 2028-2030. It identifies QuantumScape, Solid Power, and Toyota's research divisions as leading commercial efforts, noting that current energy density improvements of 40-50% over conventional cells come with 3-5x manufacturing complexity. The summary maintains technical depth while remaining accessible to engineering managers.

Notes

Highlights the model's ability to synthesize technical information into long-form content within a single response. The large context window allows incorporating multiple research threads without losing coherence. However, without real-time web access, the model cannot verify 2024 publication dates or latest commercial announcements, so fact-checking recent claims remains necessary.

Use-case deep-dives

Multi-document legal discovery

When 262k context handles discovery without chunking strategies

A 4-person litigation support team needs to cross-reference depositions, contracts, and email threads spanning 80-120 pages per case. Kimi K2 0905's 262,144-token window fits most discovery packets in a single prompt, eliminating the chunking and retrieval overhead that breaks context in smaller models. At $0.60 input per million tokens, loading a 60,000-token discovery set costs $0.036, cheap enough to run exploratory queries without budgeting friction. The $2.50 output rate becomes a larger share of the bill if you're generating 10-page summaries per case; if your output stays under 5,000 tokens, this model undercuts GPT-4 Turbo on total cost. Without public benchmarks, you're flying blind on accuracy for complex reasoning tasks, so pilot it on 5-10 real cases before committing. If you need cited legal reasoning with audit trails, test against Claude 3.5 Sonnet and its stronger benchmark track record, keeping in mind that its 200k window will not fit packets larger than that.

Quarterly earnings call transcription analysis

Why investor relations teams use this for full-transcript sentiment extraction

A 3-person IR team at a mid-cap public company analyzes 12-15 earnings calls per quarter, each running 18,000-25,000 tokens when transcribed. Kimi K2 0905 ingests the full call, Q&A, and prior quarter's transcript in one prompt (typically 50,000 tokens combined), then extracts sentiment shifts, forward guidance changes, and analyst concern patterns. The $0.60 input rate means a 50k-token analysis costs $0.03 in input fees, negligible compared to the 20 minutes saved vs. manual review. Output costs stay low because the team wants structured JSON extracts (500-800 tokens), not prose summaries. The 262k window also handles side-by-side comparison of 4-5 quarters without retrieval pipelines. The missing benchmark data is a yellow flag for nuanced financial reasoning; run a shadow comparison against GPT-4o on 3 calls to verify it catches guidance hedges and tone shifts your team considers material.

Customer support ticket triage at scale

When high-volume triage needs cheap input costs more than reasoning depth

A 12-person support team at a SaaS company processes 400-600 tickets daily, each averaging 800 tokens (customer message, account history snippet, prior thread). Kimi K2 0905's $0.60 input pricing makes it a low-cost option for bulk classification and routing: triaging 500 tickets (400k tokens total) costs $0.24 in input fees. The model assigns category tags, urgency scores, and suggested macros in 150-200 output tokens per ticket, which puts output costs at roughly $0.19 to $0.25 for the full batch. The 262k context window is overkill for individual tickets but useful for the team's weekly batch job that surfaces trending issues. At 800 tokens per ticket, one prompt fits roughly 300 tickets, so a 2,000-ticket week has to be split across several calls. The trade-off: without MMLU or HumanEval scores, you don't know if it handles edge-case reasoning (refund policy exceptions, multi-product conflicts) as well as GPT-4o Mini. If more than 5% of tickets escalate due to misrouting, the labor cost exceeds the model savings, so test on 200 tickets first.
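
The batch figures for this scenario can be worked through as follows. This is a sketch that uses the list prices from the Pricing section ($0.60 input, $2.50 output per million tokens) and the 800-token average ticket from the scenario. The helper names and the `reserved_tokens` parameter, which stands in for instructions and response space, are hypothetical.

```python
import math

CONTEXT_WINDOW = 262_144   # tokens, from the Specifications section
INPUT_PRICE = 0.60         # USD per million input tokens
OUTPUT_PRICE = 2.50        # USD per million output tokens
TOKENS_PER_TICKET = 800    # scenario average


def batch_cost(tickets, output_tokens_per_ticket):
    """Return (input_cost_usd, output_cost_usd) for one triage batch."""
    input_cost = tickets * TOKENS_PER_TICKET * INPUT_PRICE / 1_000_000
    output_cost = tickets * output_tokens_per_ticket * OUTPUT_PRICE / 1_000_000
    return input_cost, output_cost


def tickets_per_prompt(reserved_tokens=0):
    """How many average tickets fit in one context window."""
    return (CONTEXT_WINDOW - reserved_tokens) // TOKENS_PER_TICKET


def prompts_needed(tickets, reserved_tokens=0):
    """Number of separate calls needed to cover a batch of tickets."""
    return math.ceil(tickets / tickets_per_prompt(reserved_tokens))


low = batch_cost(500, 150)
high = batch_cost(500, 200)
print(f"input ${low[0]:.2f}, output ${low[1]:.2f} to ${high[1]:.2f}")
print(tickets_per_prompt())   # 327 with nothing reserved
print(prompts_needed(2_000))  # 7 with nothing reserved
```

Reserving tokens for instructions and the response lowers the per-prompt ticket count, so the real number of calls for a weekly batch will usually be a little higher than the unreserved figure.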

Frequently asked

Is Kimi K2 0905 good for long document analysis?

Yes. With a 262,144-token context window, Kimi K2 handles full-length books, legal contracts, or multi-file codebases in a single prompt. That's roughly 200,000 words of input capacity. The model processes long contexts without the chunking overhead you'd face with smaller-window alternatives, making it practical for research synthesis or comprehensive document review tasks.

Is Kimi K2 0905 cheaper than GPT-4o or Claude Sonnet?

Yes, significantly on input. At $0.60 per million input tokens, Kimi K2 costs about one-quarter of GPT-4o ($2.50) and one-fifth of Claude Sonnet 4.5 ($3.00). Output pricing is $2.50/Mtok, more than four times the input rate, so the advantage is largest on input-heavy work. For long-context workloads where you're feeding large documents, the input savings compound quickly. Budget-conscious teams processing high token volumes should evaluate this seriously.

Can Kimi K2 0905 handle code generation and debugging?

Unknown without benchmarks. No HumanEval, MBPP, or SWE-bench scores are listed here for this model yet. The large context window helps with multi-file codebases, but we can't confirm instruction-following quality or syntax accuracy. If coding is your primary use case, test it against Claude Sonnet 3.5 or GPT-4o on your actual codebase before committing to production use.

How does Kimi K2 0905 compare to other Chinese LLMs?

We lack comparative data. MoonshotAI positions Kimi as a long-context specialist, but without MMLU, C-Eval, or CMMLU scores listed here, we can't benchmark it against DeepSeek, Qwen, or GLM models. The input pricing undercuts the Western models compared above, and the 262k context window is large for this class of model. Request eval access or run your own tests if you're choosing between domestic providers.

Should I use Kimi K2 0905 for customer-facing chatbots?

Depends on your latency tolerance and language requirements. The massive context window isn't necessary for most chat sessions, and we don't have latency benchmarks. If your users are primarily Mandarin-speaking and you need cost efficiency, it's worth testing. For English-dominant audiences or latency-sensitive applications, start with GPT-4o mini or Claude Haiku until you've validated Kimi's response quality and speed.