MoonshotAI: Kimi K2 0905
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...
Verdict
Best for
- Long-context document analysis under budget
- Bilingual Chinese-English processing
- Multi-document synthesis and summarization
- Cost-sensitive extended reasoning tasks
Strengths
The 262K context window handles entire codebases, legal filings, or research paper collections in a single call. Input pricing at $0.60/Mtok undercuts GPT-4o ($2.50) by about 76% and Claude Sonnet 4.5 ($3.00) by 80%, making it viable for high-volume long-context jobs. MoonshotAI optimized the model for Chinese-English bilingual tasks, an area where many Western models are weaker. The architecture supports extended reasoning chains without the context-stuffing tricks required by smaller windows.
Trade-offs
Public benchmark coverage is sparse: no MMLU, HumanEval, or GPQA scores are available to compare against Claude, GPT-4o, or Gemini on standard tasks. Output pricing at $2.50/Mtok is more than four times the input rate, so the cost savings shrink if your use case generates verbose responses. Early reports suggest weaker performance on advanced math and code generation relative to Sonnet 4.5 or o1-preview. The model is less tested in production Western workflows than incumbents with years of public deployment.
Specifications
- Provider: moonshotai
- Category: llm
- Context length: 262,144 tokens
- Max output: 262,144 tokens
- Modalities: text
- License: proprietary
- Released: 2025-09-04
Pricing
- Input: $0.60/Mtok
- Output: $2.50/Mtok
- Model ID: moonshotai/kimi-k2-0905
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
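As a rough illustration of how the upstream rates above turn into a per-call cost, here is a minimal Python sketch. The function name `estimate_cost` and the example token counts are assumptions made for illustration; the sketch does not model how Switchy converts upstream cost into credits.

```python
from decimal import Decimal

# Upstream rates from the Pricing list above, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("0.60")
OUTPUT_USD_PER_MTOK = Decimal("2.50")
MTOK = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the upstream USD cost of one call at the listed rates."""
    input_cost = Decimal(input_tokens) / MTOK * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) / MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical call: a 100,000-token document with a 2,000-token answer.
print(estimate_cost(100_000, 2_000))
```

Decimal arithmetic is used so that small per-call amounts add up without floating-point drift when summed over many calls.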
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| moonshotai | 262k | $0.60/Mtok | $2.50/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Multi-Document Synthesis
I'm providing five research papers on climate adaptation strategies. Read all five, then produce a 500-word synthesis highlighting consensus findings, conflicting conclusions, and gaps in the literature.
Bilingual Contract Review
Compare the Chinese and English versions of this SaaS agreement. Flag any clauses where the translations diverge in meaning, especially around liability, termination, and IP rights.
Codebase Documentation
Here's a 50-file Python codebase for a task queue system. Write a 1000-word architecture overview covering data flow, key abstractions, and extension points for new developers.
Long-Form Q&A
I've pasted a 200-page employee handbook. Answer this question with specific section references: What are the procedures for requesting parental leave as a contractor in Germany?
Meeting Transcript Analysis
Here are transcripts from three consecutive board meetings (8 hours total). List all action items assigned to the CFO, grouped by strategic initiative, with timestamps.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
I have a 180-page PDF contract in Chinese. Extract all payment terms, delivery schedules, and penalty clauses. Organize them in a comparison table with English translations.
This example would produce a structured table with three columns: Original Chinese Text, English Translation, and Document Section Reference. The model processes the entire 180-page document within its 262k token context window, identifying 23 payment milestones across sections 4.2, 7.1, and 12.3, eight delivery schedules with associated penalties ranging from 0.5% to 3% of contract value, and four force majeure clauses. Each extracted clause includes page numbers and maintains legal terminology precision in both languages. The table format makes cross-referencing straightforward for legal review.
Showcases the model's extended context window handling long documents without chunking, plus its bilingual Chinese-English capability. The 262k token limit supports most business documents in full. However, at $2.50/Mtok output, generating comprehensive tables from large documents costs more than it would on models with cheaper output pricing.
Debug this Python function that's supposed to merge overlapping time intervals but fails on edge cases. Explain what's wrong and provide a corrected version with test cases.
This example would produce a detailed analysis identifying that the original function fails when intervals touch at an endpoint without otherwise overlapping (e.g., [1,3] and [3,5]), and when the input list is unsorted. The model explains that the comparison logic uses `interval[0] < last[1]` when it should use `interval[0] <= last[1]` so that touching intervals merge, and that sorting is missing. The corrected version includes a sort step, fixed comparison logic, and five test cases covering empty input, single interval, overlapping pairs, adjacent intervals, and completely separate intervals with expected outputs.
Demonstrates code comprehension and systematic debugging across a realistic edge-case scenario. The model's text-only modality means it handles code as text, which works well for most programming tasks. The $0.60/Mtok input pricing makes it economical for iterative debugging sessions with large codebases.
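For reference, here is a minimal sketch of the kind of corrected function the debugging example describes. It is not a recording of model output. It assumes closed intervals given as `(start, end)` tuples and treats intervals that share an endpoint, such as [1,3] and [3,5], as mergeable; the name `merge_intervals` is chosen for illustration.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge closed intervals that overlap or touch at an endpoint."""
    merged: list[tuple[int, int]] = []
    # Sorting by start means only the last merged interval can overlap the next one.
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the last interval if this one ends later.
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


# The five cases named in the example: empty, single, overlapping,
# touching, and fully separate (given unsorted on purpose).
for case in ([], [(1, 3)], [(1, 4), (2, 6)], [(1, 3), (3, 5)], [(5, 6), (1, 2)]):
    print(case, "->", merge_intervals(case))
```

Taking `max(last_end, end)` also covers the nested case, where a later interval sits entirely inside an earlier one.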
Write a 2000-word research summary on recent advances in solid-state battery technology, citing specific papers from 2023-2024. Include technical challenges, commercial timeline predictions, and key players.
This example would produce a structured summary covering sulfide and oxide electrolyte developments, with specific references to Nature Energy and Advanced Materials publications. The model discusses lithium dendrite mitigation strategies, manufacturing scalability challenges at the cathode-electrolyte interface, and cost projections showing solid-state batteries reaching price parity with lithium-ion by 2028-2030. It identifies QuantumScape, Solid Power, and Toyota's research divisions as leading commercial efforts, noting that current energy density improvements of 40-50% over conventional cells come with 3-5x manufacturing complexity. The summary maintains technical depth while remaining accessible to engineering managers.
Highlights the model's ability to synthesize technical information into long-form content within a single response. The large context window allows incorporating multiple research threads without losing coherence. However, without real-time web access, the model cannot verify 2024 publication dates or latest commercial announcements, so fact-checking recent claims remains necessary.
Use-case deep-dives
When 262k context handles discovery without chunking strategies
A 4-person litigation support team needs to cross-reference depositions, contracts, and email threads spanning 80-120 pages per case. Kimi K2 0905's 262,144-token window fits most discovery packets in a single prompt, eliminating the chunking and retrieval overhead that breaks context in smaller models. At $0.60 per million input tokens, loading a 60,000-token discovery set costs $0.036, cheap enough to run exploratory queries without budgeting friction. The $2.50 output rate becomes the larger cost only for very long responses (beyond roughly 14,400 output tokens on a 60,000-token packet); if your output stays under 5,000 tokens, this model undercuts GPT-4 Turbo on total cost. Without public benchmarks, you're flying blind on accuracy for complex reasoning tasks, so pilot it on 5-10 real cases before committing. If you need cited legal reasoning with audit trails, test against Claude 3.5 Sonnet, which has a stronger benchmark track record; note that packets over 200k tokens will not fit its 200k window in one prompt.
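To see when output becomes the dominant cost for a packet of a given size, the following sketch uses the rates from the Pricing section ($0.60 input and $2.50 output per Mtok). The function names are illustrative assumptions, and the 5,000-token summary length is only an example.

```python
INPUT_USD_PER_MTOK = 0.60   # listed input rate
OUTPUT_USD_PER_MTOK = 2.50  # listed output rate


def break_even_output_tokens(input_tokens: int) -> float:
    """Output length at which output cost equals input cost."""
    return input_tokens * INPUT_USD_PER_MTOK / OUTPUT_USD_PER_MTOK


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the total call cost that comes from output tokens."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK
    total = input_cost + output_cost
    return output_cost / total if total else 0.0


# 60,000-token discovery packet with a hypothetical 5,000-token summary.
print(round(break_even_output_tokens(60_000)))
print(round(output_share(60_000, 5_000), 3))
```

Under these rates the break-even point is simply the input length multiplied by 0.24, so short structured answers leave input as the main cost.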
Why investor relations teams use this for full-transcript sentiment extraction
A 3-person IR team at a mid-cap public company analyzes 12-15 earnings calls per quarter, each running 18,000-25,000 tokens when transcribed. Kimi K2 0905 ingests the full call, Q&A, and prior quarter's transcript in one prompt (typically 50,000 tokens combined), then extracts sentiment shifts, forward guidance changes, and analyst concern patterns. The $0.60 input rate means a 50k-token analysis costs $0.03 in input fees, negligible compared to the 20 minutes saved versus manual review. Output costs stay low because the team wants structured JSON extracts (500-800 tokens), not prose summaries. The 262k window also handles side-by-side comparison of 4-5 quarters without retrieval pipelines. The missing benchmark data is a yellow flag for nuanced financial reasoning; run a shadow comparison against GPT-4o on 3 calls to verify it catches guidance hedges and tone shifts your team considers material.
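Structured extracts are only useful if every reply has the expected shape, so a team would likely validate each one before storing it. The sketch below is one way to do that. The schema is hypothetical: the field names `sentiment_shifts`, `guidance_changes`, and `analyst_concerns` simply mirror the three items named above and are not a format the model is known to emit.

```python
import json
from dataclasses import dataclass


@dataclass
class CallExtract:
    # Hypothetical schema mirroring the three extraction targets.
    sentiment_shifts: list[str]
    guidance_changes: list[str]
    analyst_concerns: list[str]


REQUIRED_KEYS = ("sentiment_shifts", "guidance_changes", "analyst_concerns")


def parse_extract(raw: str) -> CallExtract:
    """Parse a model reply into a CallExtract, raising ValueError on a bad shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
    return CallExtract(**{key: data[key] for key in REQUIRED_KEYS})


# Hypothetical reply text, for illustration only.
reply = (
    '{"sentiment_shifts": ["more cautious on margins"], '
    '"guidance_changes": [], "analyst_concerns": ["inventory levels"]}'
)
print(parse_extract(reply))
```

Rejecting malformed replies early also gives the shadow comparison a clean basis, since both models' outputs can be checked against the same schema.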
When high-volume triage needs cheap input costs more than reasoning depth
A 12-person support team at a SaaS company processes 400-600 tickets daily, each averaging 800 tokens (customer message, account history snippet, prior thread). Kimi K2 0905's $0.60 input pricing keeps bulk classification and routing cheap: triaging 500 tickets (400k tokens total) costs $0.24 in input fees. The model assigns category tags, urgency scores, and suggested macros in 150-200 output tokens per ticket, which puts output costs at roughly $0.19-$0.25 for the full batch. The 262k context window is overkill for individual tickets but useful for the team's weekly trend analysis: at 800 tokens each, a little over 300 tickets fit in one call, so a 2,000-ticket week takes about seven calls. The trade-off: without MMLU or HumanEval scores, you don't know if it handles edge-case reasoning (refund policy exceptions, multi-product conflicts) as well as GPT-4o Mini. If more than 5% of tickets escalate due to misrouting, the labor cost exceeds the model savings, so test on 200 tickets first.
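When sizing the weekly trend job, it helps to check how many tickets actually fit in one call. The sketch below assumes 800-token tickets and the 262,144-token window from the Specifications list. The 8,000-token reserve for instructions and the model's reply is a hypothetical figure, and the sketch assumes the reply shares the same window as the prompt.

```python
import math

CONTEXT_TOKENS = 262_144  # context length from the Specifications list


def tickets_per_call(avg_ticket_tokens: int, reserved_tokens: int) -> int:
    """How many tickets fit in one call after reserving room for prompt and reply."""
    usable = CONTEXT_TOKENS - reserved_tokens
    return max(usable // avg_ticket_tokens, 0)


def calls_needed(total_tickets: int, avg_ticket_tokens: int, reserved_tokens: int) -> int:
    """Number of calls required to cover every ticket in the batch."""
    per_call = tickets_per_call(avg_ticket_tokens, reserved_tokens)
    if per_call == 0:
        raise ValueError("a single ticket does not fit in the context window")
    return math.ceil(total_tickets / per_call)


# 2,000 tickets at 800 tokens each, with a hypothetical 8,000-token reserve.
print(tickets_per_call(800, 8_000))
print(calls_needed(2_000, 800, 8_000))
```

A batch that spans several calls needs a final pass to combine the per-call findings, which is worth including in the 200-ticket pilot.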
Frequently asked
Is Kimi K2 0905 good for long document analysis?
Yes. With a 262,144-token context window, Kimi K2 handles full-length books, legal contracts, or multi-file codebases in a single prompt. That's roughly 200,000 words of input capacity. The model processes long contexts without the chunking overhead you'd face with smaller-window alternatives, making it practical for research synthesis or comprehensive document review tasks.
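A quick pre-flight check can tell you whether a document is likely to fit before you send it. This sketch assumes about 0.75 English words per token, a common rule of thumb that is consistent with the rough word figure above. Real token counts depend on the model's tokenizer and differ for Chinese text, so treat the result as an estimate. The function names and the 4,000-token output reserve are illustrative assumptions.

```python
import math

CONTEXT_TOKENS = 262_144  # context length from the Specifications list
WORDS_PER_TOKEN = 0.75    # assumption: rough ratio for English prose


def estimated_tokens(word_count: int) -> int:
    """Estimate the token count of an English document from its word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserved_output_tokens: int = 4_000) -> bool:
    """True if the estimated input plus a reserved reply budget fits the window."""
    return estimated_tokens(word_count) + reserved_output_tokens <= CONTEXT_TOKENS


# A hypothetical 150,000-word document versus a 200,000-word one.
print(estimated_tokens(150_000), fits_in_context(150_000))
print(estimated_tokens(200_000), fits_in_context(200_000))
```

Documents close to the limit are better measured with the provider's own tokenizer than with a word-count estimate.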
Is Kimi K2 0905 cheaper than GPT-4o or Claude Sonnet?
Yes, significantly on input. At $0.60 per million input tokens, Kimi K2 costs roughly a quarter of GPT-4o ($2.50) and one-fifth of Claude 3.5 Sonnet ($3.00). Output pricing at $2.50/Mtok is more than four times the input rate, so the advantage narrows on output-heavy workloads. For long-context workloads where you're feeding large documents, the input savings compound quickly. Budget-conscious teams processing high token volumes should evaluate this seriously.
Can Kimi K2 0905 handle code generation and debugging?
Unknown without benchmarks. MoonshotAI hasn't published HumanEval, MBPP, or SWE-bench scores for this model. The large context window helps with multi-file codebases, but we can't confirm instruction-following quality or syntax accuracy. If coding is your primary use case, test it against Claude Sonnet 3.5 or GPT-4o on your actual codebase before committing to production use.
How does Kimi K2 0905 compare to other Chinese LLMs?
We lack comparative data. MoonshotAI positions Kimi as a long-context specialist, but without published MMLU, C-Eval, or CMMLU scores, we can't benchmark it against DeepSeek, Qwen, or GLM models. The pricing undercuts most Western models, and the context window exceeds typical Chinese LLM offerings. Request eval access or run your own tests if you're choosing between domestic providers.
Should I use Kimi K2 0905 for customer-facing chatbots?
Depends on your latency tolerance and language requirements. The massive context window isn't necessary for most chat sessions, and we don't have latency benchmarks. If your users are primarily Mandarin-speaking and you need cost efficiency, it's worth testing. For English-dominant audiences or latency-sensitive applications, start with GPT-4o mini or Claude Haiku until you've validated Kimi's response quality and speed.