Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
Anyone in the Space can @-mention Xiaomi: MiMo-V2-Omni and work from the team's shared context: pooled credits, one chat, one memory.
Specifications
- Provider: xiaomi
- Category: llm
- Context length: 262,144 tokens
- Max output: 65,536 tokens
- Modalities: text, audio, image, video
- License: proprietary
- Released: 2026-03-18
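The context length and max output limits above can be turned into a simple budgeting check. The sketch below is illustrative only: the helper name `max_output_budget` is hypothetical, and it assumes that prompt and output tokens share the same context window, which this page does not confirm.

```python
# Limits copied from the Specifications list above.
CONTEXT_LENGTH = 262_144  # tokens
MAX_OUTPUT = 65_536       # tokens


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output-token limit that fits a given prompt.

    Assumption (not confirmed by the listing): prompt tokens and output
    tokens both count against the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # Never exceed the model's output cap, and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (1_000, 200_000, 262_144):
        print(prompt, max_output_budget(prompt))
```

Under this assumption, short prompts are bounded by the output cap, while prompts above roughly 196k tokens are bounded by what remains of the context window.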
Pricing
- Input: $0.40/Mtok
- Output: $2.00/Mtok
- Model ID: xiaomi/mimo-v2-omni
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool — one plan, one balance for everyone.
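As a rough illustration of how the upstream per-token prices translate into a per-request cost, here is a minimal sketch. The function name `upstream_cost_usd` and the token counts in the example are hypothetical, and the result reflects only the listed upstream prices, not how Switchy credits are metered.

```python
from decimal import Decimal

# Upstream prices from the Pricing list above (USD per million tokens).
INPUT_USD_PER_MTOK = Decimal("0.40")
OUTPUT_USD_PER_MTOK = Decimal("2.00")
TOKENS_PER_MTOK = Decimal(1_000_000)


def upstream_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the upstream cost of one request in USD.

    Decimal is used instead of float to avoid rounding noise on
    very small amounts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    print(upstream_cost_usd(2_000, 500))
```

Because output tokens cost five times as much as input tokens at these prices, long responses dominate the bill even when prompts are large.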
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| xiaomi | 262k | $0.40/Mtok | $2.00/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.