# Xiaomi: MiMo-V2-Omni

Provider: xiaomi  
Category: llm  
Model ID: `xiaomi/mimo-v2-omni`

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities: visual grounding, multi-step...

## Specs

- Context length: 262144 tokens
- Max output: 65536 tokens
- Modalities: text, audio, image, video
- Released: 2026-03-18
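The two token limits above interact when you size a request. The sketch below shows one way to pick a safe `max_output_tokens` value for a given prompt. It **assumes** that prompt and generated tokens share the 262,144-token context window, which is common but not stated on this card. The helper name `max_output_budget` is hypothetical.

```python
# Limits taken from the Specs list on this card.
CONTEXT_LENGTH = 262_144  # total context window, in tokens
MAX_OUTPUT = 65_536       # per-response cap on generated tokens


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output length that still fits.

    ASSUMPTION: prompt tokens and output tokens both count against
    the same CONTEXT_LENGTH window. Returns 0 if the prompt alone
    already fills (or overflows) the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # Output is bounded by both the per-response cap and the space left.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    print(max_output_budget(10_000))   # short prompt: capped by MAX_OUTPUT
    print(max_output_budget(250_000))  # long prompt: capped by remaining space
```

For short prompts the 65,536-token output cap is the binding limit; once a prompt exceeds 196,608 tokens, the remaining context becomes the tighter bound.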

## Pricing

- Input: $0.40 per million tokens
- Output: $2.00 per million tokens
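Per-request cost follows directly from the two per-million rates. The sketch below **assumes** simple linear per-token billing at the listed rates. It does not model caching discounts, batch pricing, or any separate rates for audio, image, or video tokens, none of which this card specifies. The function name `request_cost` is hypothetical. `Decimal` is used so that dollar amounts do not pick up binary floating-point rounding error.

```python
from decimal import Decimal

# Rates from the Pricing list on this card, in USD per 1,000,000 tokens.
INPUT_PRICE_PER_M = Decimal("0.40")
OUTPUT_PRICE_PER_M = Decimal("2.00")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request under linear per-token pricing.

    ASSUMPTION: every input/output token is billed at the flat rates
    above, regardless of modality.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 100k prompt tokens plus 10k generated tokens.
    print(request_cost(100_000, 10_000))
    # Worst case: full context of input plus the maximum output.
    print(request_cost(262_144, 65_536))
```

Under these assumptions, output tokens cost five times as much as input tokens. Long generations therefore dominate the bill even when prompts are large.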

## Providers

- **xiaomi** — ctx 262144, input $0.40/M, output $2.00/M

---
Last verified: 2026-04-23T23:46:29.618Z  
Canonical URL: https://switchy.build/models/mimo-v2-omni