MoonshotAI: Kimi K2.5

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

Anyone in the Space can @-mention MoonshotAI: Kimi K2.5 with the team's shared context - pooled credits, one chat, one memory.


Verdict

Kimi K2.5 delivers a massive 256K context window at aggressive pricing: at $0.38/Mtok input, it is one of the cheapest ways to process book-length documents or sprawling codebases. The model handles Chinese and English with vision support, though public benchmark data remains sparse. Reach for this when you need enormous context on a tight budget and can tolerate less-proven performance versus Western incumbents.

Best for

Book-length document analysis under budget
Chinese-English bilingual workflows
Large codebase comprehension tasks
Cost-sensitive long-context summarization
Vision tasks with extended context

Strengths

The 256K context window is roughly double GPT-4 Turbo's 128K while substantially undercutting it on input-token price. Vision support pairs well with document-heavy workflows such as scanned contracts or multi-page technical diagrams. MoonshotAI's Chinese NLP heritage suggests strong performance on mixed-language content, a gap in many Western models. Output pricing at $2.02/Mtok stays competitive for generation-heavy tasks.

Trade-offs

Public benchmark coverage is thin—you're flying without the MMLU/HumanEval safety net that validates GPT-4 or Claude. Latency and availability outside Asia-Pacific regions may lag established providers. The model's reasoning performance on complex logic puzzles or multi-step math remains unproven in head-to-head tests. Teams requiring SOC 2 compliance or US-based data residency should verify infrastructure details before committing production workloads.

Specifications

Provider: moonshotai
Category: llm
Context length: 262,144 tokens (256K)
Max output: —
Modalities: text, image
License: proprietary
Released: 2026-01-27

Pricing

Input: $0.38/Mtok
Output: $2.02/Mtok
Model ID: moonshotai/kimi-k2.5

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30 %

Estimated monthly spend

$15.31

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
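The estimate above can be approximately reproduced from the calculator inputs and the listed per-token prices. The sketch below is illustrative, not Switchy's actual metering logic: the function name is hypothetical, and the 22 working days per month is an assumption inferred from the displayed 17.6M tokens / month (5 × 80 × 2,000 × 22).

```python
# Rough reconstruction of the team cost estimate from the inputs shown on this page.
# Assumption: 22 working days per month, inferred from the 17.6M tokens/month figure.

INPUT_PRICE_PER_MTOK = 0.38   # USD, from the Pricing section
OUTPUT_PRICE_PER_MTOK = 2.02  # USD, from the Pricing section


def monthly_estimate(seats, messages_per_seat_per_day, tokens_per_turn,
                     output_share, working_days=22):
    """Return (total_tokens, estimated_cost_usd) for one month."""
    total_tokens = seats * messages_per_seat_per_day * tokens_per_turn * working_days
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cost = (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return total_tokens, cost


tokens, cost = monthly_estimate(seats=5, messages_per_seat_per_day=80,
                                tokens_per_turn=2_000, output_share=0.30)
print(f"{tokens / 1e6:.1f}M tokens / month, about ${cost:.2f}")
```

With the rounded list prices this comes to roughly $15.35, a few cents above the $15.31 shown by the calculator; the small gap presumably comes from unrounded upstream prices.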

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
moonshotai	256k	$0.38/Mtok	$2.02/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Extract Contract Terms

Read this entire contract and extract all payment terms, termination clauses, and liability caps into a structured table. Include section references for each item.

Summarize Codebase Architecture

Review all files in this codebase and describe the overall architecture: main modules, data flow between components, and external dependencies. Highlight any design patterns or anti-patterns.

Bilingual Document Translation

Translate this technical specification from Chinese to English. Maintain all section headings, code snippets, and technical terms. Flag any ambiguous phrases for human review.

Multi-Page Invoice Processing

Process these invoice images and create a CSV with: vendor name, invoice number, line item description, quantity, unit price, total. Flag any illegible entries.

Research Paper Synthesis

Read these five research papers on [topic] and synthesize the key findings. Identify consensus views, contradictions, and gaps in the literature. Cite specific papers for each claim.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Analyze this quarterly financial report PDF and summarize the key revenue trends, highlighting any discrepancies between the narrative and the actual figures in the tables.

Output

The report shows Q3 revenue of $47.2M, up 18% YoY, but the executive summary claims 'over 20% growth.' This appears to reference constant-currency figures ($48.1M), mentioned only in footnote 7. The EMEA segment drove most growth (+31%), while APAC declined 4% despite narrative emphasis on 'strong Asian expansion.' Operating margin compressed from 22% to 19% due to increased R&D spend, which the CFO letter frames as 'strategic investment' but doesn't quantify the expected ROI timeline. Three tables use different date ranges, making sequential comparisons difficult.

Notes

This example showcases Kimi K2.5's 256K (262,144-token) context window, which is essential for ingesting full financial documents with complex tables. The model cross-references narrative claims against numerical data effectively. However, without public benchmarks, we can't verify its accuracy against specialized financial analysis models.

Prompt

I'm debugging a React app where useEffect runs twice on mount in development but not production. Walk me through why this happens and whether I should worry about it.

Output

In React 18+, Strict Mode intentionally double-invokes effects during development to surface bugs in cleanup logic. Your component mounts → effect runs → React simulates an unmount (cleanup) → remounts → effect runs again. This only happens in dev; production runs effects once. You should worry if: (1) your effect lacks a cleanup function but creates subscriptions, timers, or event listeners — these will leak in production after navigation, or (2) your effect assumes it runs exactly once and mutates external state without idempotency. If your effect is purely declarative (fetching data, setting up a listener with proper cleanup), the double-run is harmless and actually validates your code's resilience.

Notes

Demonstrates Kimi K2.5's ability to explain framework-specific behavior with practical debugging context. The response balances theory with actionable guidance. At $2.02/Mtok output, longer explanatory responses like this cost more than they would on cheaper models, so users should consider whether the detail justifies the premium.

Prompt

Generate three Instagram caption variations for a sustainable fashion brand launching recycled denim jackets — one playful, one aspirational, one educational. Keep each under 100 characters.

Output

**Playful:** Old jeans, new love story 💙♻️ Your next favorite jacket had a past life.
**Aspirational:** Wear the change. Every thread tells a story of tomorrow. #SlowFashion
**Educational:** 12 plastic bottles = 1 jacket. Recycled denim cuts water use by 80%. Style with impact.

Each caption targets a different audience segment while maintaining brand voice. The playful version uses conversational tone for broader engagement, aspirational leans into values-driven messaging for committed sustainability shoppers, and educational provides concrete metrics for skeptics who want proof points.

Notes

Shows Kimi K2.5 handling creative copywriting across tonal ranges with built-in strategic rationale. The model's multimodal capability (text + image input) means it could analyze product photos to inform caption style, though this example uses text only. The 256K context window is overkill for this task; shorter-context models would be more cost-efficient here.

Use-case deep-dives

Multi-document legal discovery

When 256K context handles discovery without chunking overhead

A 4-person litigation support team needs to cross-reference deposition transcripts, contracts, and email threads spanning 180 pages per case. Kimi K2.5's 262,144-token window fits entire case files in a single prompt: no RAG pipeline, no chunking errors, no context loss between queries. At $0.38/Mtok input, loading 200K tokens costs about $0.08 per case; output runs $2.02/Mtok, but discovery queries rarely exceed 2K tokens of response. The trade-off: without public benchmarks, you're flying blind on reasoning quality compared to GPT-4 or Claude. If your workflow tolerates some trial runs and the context window solves a real chunking pain, this is worth piloting on 10-20 cases to validate accuracy before committing.
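The per-case arithmetic can be checked with a small helper. This is a sketch using the prices listed in the Pricing section ($0.38/Mtok input, $2.02/Mtok output); the function name and defaults are illustrative rather than part of any Switchy or Moonshot API.

```python
# Per-request cost from token counts and per-million-token prices.
# Default prices are the list prices from this page's Pricing section.

def request_cost(input_tokens, output_tokens,
                 input_price_per_mtok=0.38, output_price_per_mtok=2.02):
    """Return the estimated cost in USD of a single request."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Discovery scenario: 200K tokens of case files in, 2K tokens of answer out.
case_cost = request_cost(input_tokens=200_000, output_tokens=2_000)
print(f"about ${case_cost:.2f} per case")
```

Under these assumptions the input side dominates (about $0.076 of the roughly $0.08 total), so a 10-20 case pilot costs on the order of one or two dollars per query pass.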

Multilingual customer support triage

Affordable long-context triage when benchmark gaps don't block the task

A 12-person SaaS support team routes 300 tickets daily across English, Mandarin, and Japanese. Each ticket includes prior conversation history averaging 8K tokens. Kimi K2.5's $0.38/Mtok input pricing substantially undercuts GPT-4 Turbo on the read side, and the 256K window means you can include full account history without summarization. The risk: the absence of MMLU and HumanEval scores means you can't benchmark its classification accuracy or multilingual reliability against known baselines. If your triage logic is rule-heavy (tags, routing, sentiment) rather than nuanced reasoning, the price and context make this a strong candidate. Run a 2-week shadow deployment against your current model to compare routing accuracy before switching.

Long-form content editing

When manuscript-length context beats per-section editing workflows

A 3-person editorial agency edits 15K-word white papers for enterprise clients, checking tone consistency, citation accuracy, and narrative flow across sections. Kimi K2.5's 256K-token window fits a 60-page manuscript in one pass, letting the model catch cross-section contradictions that chunk-based editors miss. Input cost is $0.38/Mtok (roughly $0.01 per 15K-word doc, at about 20K tokens); output runs higher at $2.02/Mtok, but editing suggestions rarely exceed 5K tokens. The unknown: without public benchmarks on writing quality or instruction-following, you're betting on MoonshotAI's internal evals. If your clients tolerate a 10-doc pilot to validate editorial judgment, the context window advantage justifies the risk. Otherwise, stick with Claude 3.5 Sonnet where you have LMSYS scores to lean on.

Frequently asked

Is Kimi K2.5 good for long document analysis?

Yes. With a 262,144-token context window, Kimi K2.5 handles roughly 200,000 words in a single request, enough for multiple research papers, legal contracts, or codebases. The input pricing at $0.38/Mtok makes it economical for batch document processing compared to models charging $1+ per million input tokens.
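To sanity-check whether a given document is likely to fit, a rough word-to-token estimate is usually enough. The sketch below assumes about 4 tokens per 3 English words, a common rule of thumb and not a property of Kimi K2.5's tokenizer; Chinese text, code, and tables can tokenize very differently, so the real count should come from the provider. The function names are hypothetical.

```python
# Rough context-fit check.
# Assumption: ~4 tokens per 3 English words (rule of thumb, not the real tokenizer).

CONTEXT_WINDOW_TOKENS = 262_144  # from this page


def estimated_tokens(word_count):
    """Estimate tokens for an English word count, rounding up."""
    return (word_count * 4 + 2) // 3


def max_words(context_tokens=CONTEXT_WINDOW_TOKENS):
    """Largest word count whose estimate still fits in the context window."""
    return context_tokens * 3 // 4


def fits_in_context(word_count, reserved_output_tokens=0):
    """True if the estimated prompt plus reserved output fits in the window."""
    return estimated_tokens(word_count) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


print(max_words())                                            # ceiling under this assumption
print(fits_in_context(15_000, reserved_output_tokens=5_000))  # a 15K-word white paper
```

Under this assumption the ceiling is a little under 200,000 words, in line with the "roughly 200,000 words" figure; leaving headroom for the prompt and the response is advisable for documents near that size.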

Is Kimi K2.5 cheaper than GPT-4o or Claude Sonnet?

Input is cheaper: $0.38/Mtok versus $2.50-$3.00 for GPT-4o and Claude Sonnet. Output at $2.02/Mtok sits in the middle of the pack. For read-heavy workloads like summarisation or search, Kimi K2.5 costs less. For generation-heavy tasks, the output pricing narrows the gap.

Can Kimi K2.5 process images alongside text?

Yes, it supports multimodal input. You can upload images with text prompts for tasks like chart extraction, diagram analysis, or visual Q&A. No public benchmarks exist yet to compare its vision performance against GPT-4o or Gemini, so test your specific use case before committing.

How does Kimi K2.5 compare to earlier Kimi models?

According to the model description, K2.5 is built on Kimi K2 with continued pretraining and adds native multimodal input. Beyond that, no detailed version-to-version comparisons are available here. Without benchmark data, you'll need to evaluate it directly against your existing workflows to measure any quality lift.

Should I use Kimi K2.5 for customer-facing chatbots?

Depends on your latency and language requirements. Kimi models historically perform well on Chinese-language tasks. For English-only applications, test response quality and speed against GPT-4o-mini or Claude Haiku. The large context window helps if your chatbot needs to reference long conversation histories or knowledge bases.