Mistral: Mistral Medium 3.5
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...
Anyone in the Space can @-mention Mistral: Mistral Medium 3.5 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Budget-conscious long-context analysis
- Multimodal tasks with cost constraints
- Document processing with vision needs
- Teams migrating from Mistral Small
Strengths
The 262K context window handles full codebases and lengthy documents without chunking. Multimodal support covers text, images, and file uploads in a single request. Input pricing at $1.50 per Mtok runs 40% cheaper than GPT-4o while maintaining vision capabilities. The model slots naturally into existing Mistral workflows for teams already using their API, with consistent prompt formatting across their lineup.
Trade-offs
The absence of public benchmarks makes capability harder to assess than for Claude or GPT-4, where MMLU, HumanEval, and reasoning scores are documented. As a mid-tier offering, expect it to trail Mistral Large on complex reasoning and code generation tasks. Output pricing at $7.50 per Mtok is five times the input price, so verbose generation tasks eat budget quickly. Vision performance likely lags GPT-4o or Claude Sonnet for detailed image analysis.
Specifications
- Provider: mistralai
- Category: llm
- Context length: 262,144 tokens
- Max output: not listed
- Modalities: text, image, file
- License: proprietary
- Released: 2026-04-30
Pricing
- Input: $1.50/Mtok
- Output: $7.50/Mtok
- Model ID: mistralai/mistral-medium-3-5
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| mistralai | 262k | $1.50/Mtok | $7.50/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Analyze Multi-Page Contract
Review this contract and create a structured summary with: parties involved, key obligations for each party, payment terms, termination clauses, and any unusual provisions. Flag ambiguous language.
Screenshot to Code
Generate React component code for this UI screenshot. Include responsive layout, appropriate semantic HTML, and Tailwind classes. Add comments explaining layout decisions.
Research Paper Synthesis
I'm uploading three research papers on the same topic. Identify the main argument of each, compare their methodologies, highlight where they agree or conflict, and suggest gaps in the current research.
Codebase Documentation Audit
Analyze this codebase and identify: functions lacking docstrings, modules without README files, complex logic needing inline comments, and inconsistent documentation patterns. Prioritize by impact on maintainability.
Invoice Data Extraction
Extract from this invoice image: vendor name, invoice number, date, line items with descriptions and amounts, subtotal, tax, and total. Return as JSON with consistent field names.
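Because the invoice prompt asks for JSON with consistent field names, it can help to validate the reply before passing it downstream. The following is a minimal sketch, not a prescribed schema: the field names (`vendor_name`, `invoice_number`, `amount`, and so on) are hypothetical, since the prompt does not fix them, and the sample reply is made up for illustration.

```python
import json
from decimal import Decimal

# Hypothetical field names; the prompt above does not prescribe a schema.
REQUIRED_FIELDS = {
    "vendor_name", "invoice_number", "date",
    "line_items", "subtotal", "tax", "total",
}


def validate_invoice(raw: str) -> list[str]:
    """Return a list of problems found in the model's JSON reply."""
    try:
        # Parse numbers as Decimal so currency sums compare exactly.
        data = json.loads(raw, parse_float=Decimal, parse_int=Decimal)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - set(data))]
    if problems:
        return problems
    try:
        items_sum = sum((item["amount"] for item in data["line_items"]),
                        Decimal(0))
    except (KeyError, TypeError):
        return ["each line item needs a numeric 'amount'"]
    if items_sum != data["subtotal"]:
        problems.append("line items do not sum to subtotal")
    if data["subtotal"] + data["tax"] != data["total"]:
        problems.append("subtotal + tax does not equal total")
    return problems


# Made-up sample reply for illustration only.
reply = json.dumps({
    "vendor_name": "Acme", "invoice_number": "INV-1", "date": "2026-05-01",
    "line_items": [{"description": "Widget", "amount": 40},
                   {"description": "Gadget", "amount": 60}],
    "subtotal": 100, "tax": 8, "total": 108,
})
print(validate_invoice(reply))  # []
```

An empty list means the reply passed every check; anything else is a reason to retry the extraction or route the invoice to a human.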
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance:

```python
def merge_sorted_lists(list1, list2):
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    return result
```
The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, the unconsumed tail of the other list is discarded. Fix by adding, after the loop and before `return result`:

```python
result.extend(list1[i:])
result.extend(list2[j:])
```

Edge cases to consider: empty lists (if exactly one input is empty, the original returns an empty list and drops the other input; the fix handles this), lists with duplicate values (work fine), and single-element lists (work after the fix). Performance is O(n+m), which is optimal for this problem. The repeated `len()` calls in the loop condition are negligible because `len()` on a Python list is O(1), but you could pre-compute them if profiling shows it matters.
This example shows Mistral Medium's code review on a small function; the 262K token context window leaves room for much larger codebases. The response correctly identifies the bug and provides actionable fixes. However, without published benchmarks, it's unclear how this model compares to specialized code models on complex refactoring tasks or multi-file analysis.
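Putting the suggested fix together, a corrected version of the function might look like the sketch below. The `<=` comparison is an addition that the response above does not mention: it takes equal elements from `list1` first, which keeps the merge stable.

```python
def merge_sorted_lists(list1, list2):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Walk both lists, always taking the smaller head element.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:  # <= takes ties from list1 first (stable)
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # At most one of these slices is non-empty: the unconsumed tail.
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result


print(merge_sorted_lists([1, 3, 5], [2, 4, 6, 8]))  # [1, 2, 3, 4, 5, 6, 8]
print(merge_sorted_lists([], [7, 9]))               # [7, 9]
```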
I'm designing a SaaS dashboard. Describe a layout for a metrics overview page that displays: monthly recurring revenue, churn rate, active users, and conversion funnel. The design should work on desktop and be data-dense but scannable.
Use a 2×2 grid layout for the four primary metrics, each in a card with the metric value prominent (48px bold), label above (14px), and a sparkline showing 90-day trend below. Place MRR top-left as the anchor metric. Below the grid, add a full-width conversion funnel visualization using a horizontal bar chart with percentage drop-offs between stages. Use a single accent color (suggest blue) with opacity variations to show depth. For scannability: align all numbers right, use consistent spacing (24px between cards), and add subtle dividers. Include a date range selector in the top-right corner. The layout stays readable down to 1280px width without horizontal scroll.
Mistral Medium handles multimodal input, making it suitable for design discussions that reference visual concepts. The response is structured and specific. The trade-off: without image generation capabilities, users must translate these descriptions into mockups themselves, unlike models that can produce visual outputs directly.
Analyze this customer support transcript and extract: the customer's core issue, any product bugs mentioned, sentiment progression, and recommended next steps. Transcript: [3,200-word conversation between customer and support agent about billing discrepancies, failed payment retries, and confusion over pro-rated charges]
**Core Issue:** Customer was charged $47 instead of expected $29 after downgrading from annual to monthly plan mid-cycle.

**Product Bugs Identified:**
1. Payment retry logic attempted charges 4 times in 2 hours instead of the documented 24-hour intervals
2. Downgrade confirmation email didn't explain pro-rating calculation

**Sentiment Progression:** Started frustrated (mentions "third time contacting support"), became confused during pro-rating explanation, ended cautiously satisfied after refund offer.

**Recommended Next Steps:** Issue $18 refund, update downgrade email template to include pro-rating formula, review payment retry timing in billing service, follow up in 48 hours to confirm resolution.
The 262K token context window makes Mistral Medium well-suited for analyzing long-form documents like support transcripts, legal contracts, or research papers. This example demonstrates structured extraction from conversational data. The $1.50/$7.50 per Mtok pricing means processing a 3,200-word transcript costs under $0.02, making it economical for high-volume document analysis workflows.
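The cost claim can be checked with a quick estimate. The sketch below assumes roughly 0.75 English words per token (a common rule of thumb, not a Mistral figure) and a hypothetical 300-token reply; only the per-Mtok prices come from this page.

```python
INPUT_USD_PER_MTOK = 1.50   # from the pricing section
OUTPUT_USD_PER_MTOK = 7.50  # from the pricing section
WORDS_PER_TOKEN = 0.75      # assumption: rough rule of thumb for English


def estimate_cost(input_words: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from an input word count."""
    input_tokens = input_words / WORDS_PER_TOKEN
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# 300 output tokens is a hypothetical reply length.
cost = estimate_cost(input_words=3_200, output_tokens=300)
print(f"${cost:.4f}")
```

Under these assumptions the request comes out below one cent, comfortably inside the $0.02 bound quoted above. A real tokenizer count would give a firmer number.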
Use-case deep-dives
When 262K context beats chaining for legal doc review
A 4-person legal ops team processing vendor agreements needs to compare clauses across 8-12 contracts simultaneously. Mistral Medium 3.5's 262,144-token window fits roughly 200,000 words, enough to load an entire contract portfolio in one prompt without chunking or retrieval overhead. At $1.50/Mtok input, a 200K-token analysis costs $0.30, which can be cheaper than running sequential calls through smaller-context models that need summarization passes. The trade-off: if your contracts average under 40 pages each and you're only comparing 2-3 at a time, most of the window goes unused and a cheaper 128K-context model will usually do. Use this when you're routinely cross-referencing 6+ documents and need the model to hold all context without lossy summarization.
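Whether a contract set fits in one prompt is simple to estimate ahead of time. This is a rough sketch, assuming about 0.75 words per token and reserving a hypothetical 8,000 tokens for instructions and the reply; a count from an actual tokenizer would be more reliable.

```python
CONTEXT_TOKENS = 262_144    # from the specifications
INPUT_USD_PER_MTOK = 1.50   # from the pricing section
WORDS_PER_TOKEN = 0.75      # assumption: rough rule of thumb for English
RESERVED_TOKENS = 8_000     # hypothetical headroom for prompt and reply


def plan_batch(word_counts: list[int]) -> tuple[bool, int, float]:
    """Return (fits, estimated_input_tokens, estimated_input_cost_usd)."""
    tokens = round(sum(word_counts) / WORDS_PER_TOKEN)
    fits = tokens + RESERVED_TOKENS <= CONTEXT_TOKENS
    cost = tokens * INPUT_USD_PER_MTOK / 1_000_000
    return fits, tokens, cost


# Ten hypothetical contracts of 15,000 words each.
fits, tokens, cost = plan_batch([15_000] * 10)
print(fits, tokens, f"${cost:.2f}")  # True 200000 $0.30
```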
Cost-effective ticket routing for mid-scale support teams
A 12-person SaaS support team handling 800 tickets daily needs to auto-categorize and route incoming requests. Mistral Medium 3.5's $1.50/Mtok input pricing means a 500-token ticket costs $0.00075 to classify: $0.60 per 800 tickets, or $18/month at that volume, counting input tokens only. The model's multimodal capability handles screenshot attachments without a separate vision API call. The threshold: if you're processing under 200 tickets/day, the setup overhead isn't worth it; at 500-800/day, you're saving roughly $11-18/month on input tokens versus $3/Mtok alternatives. Without public benchmarks, validate classification accuracy on your ticket taxonomy during a 2-week pilot before committing to production routing.
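The per-ticket arithmetic generalizes to any volume. The sketch below counts input tokens only (classification replies are short but not free) and assumes a 30-day month; the $3/Mtok comparison price is the alternative named above.

```python
def monthly_input_cost(tickets_per_day: int, tokens_per_ticket: int,
                       usd_per_mtok: float, days: int = 30) -> float:
    """Monthly input-token cost in USD for ticket classification."""
    total_tokens = tickets_per_day * tokens_per_ticket * days
    return total_tokens * usd_per_mtok / 1_000_000


for volume in (200, 500, 800):
    medium = monthly_input_cost(volume, 500, 1.50)
    alternative = monthly_input_cost(volume, 500, 3.00)
    saving = alternative - medium
    print(f"{volume}/day: ${medium:.2f} vs ${alternative:.2f}, "
          f"saving ${saving:.2f}")
# 200/day: $4.50 vs $9.00, saving $4.50
# 500/day: $11.25 vs $22.50, saving $11.25
# 800/day: $18.00 vs $36.00, saving $18.00
```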
When extended context memory beats conversation stitching
A 3-person content studio runs 90-minute brand strategy sessions where the AI needs to remember 40+ ideas, client feedback, and evolving direction without losing thread. Mistral Medium 3.5's 262K window holds an entire session transcript (roughly 65,000 words) plus reference docs—no mid-session summarization that flattens nuance. At $7.50/Mtok output, a 5,000-token synthesis response costs $0.0375, manageable for weekly sessions. The catch: if your sessions are under 30 minutes or you're fine with periodic context resets, a 32K model at half the output cost is smarter. Choose this when session continuity and callback to early ideas matter more than per-token cost, and you're running fewer than 20 sessions/month.
Frequently asked
Is Mistral Medium 3.5 good for general text tasks?
Yes, Mistral Medium 3.5 handles general text work well—drafting, summarization, Q&A, light reasoning. It sits between Mistral's small and large tiers, so you get decent quality without paying flagship prices. The 262k token context window means you can throw entire codebases or long documents at it without chunking.
Is Mistral Medium 3.5 cheaper than GPT-4o or Claude Sonnet?
Yes, significantly. At $1.50 input and $7.50 output per million tokens, Mistral Medium 3.5 undercuts GPT-4o ($2.50/$10) and Claude 3.5 Sonnet ($3/$15). If you're running high-volume workflows where cost matters more than bleeding-edge reasoning, this is a strong pick.
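To see what the price gap means for a given workload, the quoted rates can be applied to token volumes directly. This is a small sketch using only the prices stated in this answer; the monthly workload of 100 Mtok in and 10 Mtok out is hypothetical.

```python
# USD per Mtok as (input, output), as quoted in the answer above.
PRICES = {
    "Mistral Medium 3.5": (1.50, 7.50),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}


def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD cost of a workload measured in millions of tokens."""
    price_in, price_out = PRICES[model]
    return input_mtok * price_in + output_mtok * price_out


# Hypothetical monthly workload: 100 Mtok in, 10 Mtok out.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 100, 10):,.2f}")
# Mistral Medium 3.5: $225.00
# GPT-4o: $350.00
# Claude 3.5 Sonnet: $450.00
```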
Can Mistral Medium 3.5 handle image inputs reliably?
It supports image inputs, but Mistral hasn't published vision benchmarks for this model. Expect basic image understanding—OCR, simple scene description—but don't rely on it for complex visual reasoning or fine-grained object detection. For serious vision work, use GPT-4o or Claude 3.5 Sonnet instead.
How does Mistral Medium 3.5 compare to Mistral Large?
Mistral Large is listed at $2/$6 per Mtok, so it costs more on input but less on output than Medium 3.5 ($1.50/$7.50), and it delivers stronger reasoning and coding performance. Medium 3.5 is the budget option for input-heavy work when you need Mistral's speed and context window but can tolerate slightly weaker outputs. Without public benchmarks, you're trading proven capability for lower cost.
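Using the two price pairs quoted on this page, which model is cheaper depends on the ratio of output to input tokens: setting 1.5i + 7.5o equal to 2i + 6o gives o = i/3, so Medium 3.5 is cheaper only while output stays below one third of input. A small check, treating the Mistral Large prices above as given:

```python
MEDIUM = (1.50, 7.50)  # USD per Mtok (input, output), from this page
LARGE = (2.00, 6.00)   # USD per Mtok (input, output), as quoted above


def request_cost(prices: tuple[float, float],
                 input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the given (input, output) prices."""
    return (input_tokens * prices[0] + output_tokens * prices[1]) / 1_000_000


def cheaper_model(input_tokens: int, output_tokens: int) -> str:
    """Name the cheaper model for a request shape, or 'tie'."""
    medium = request_cost(MEDIUM, input_tokens, output_tokens)
    large = request_cost(LARGE, input_tokens, output_tokens)
    if medium == large:
        return "tie"
    return "medium" if medium < large else "large"


print(cheaper_model(9_000, 1_000))  # medium
print(cheaper_model(3_000, 1_000))  # tie
print(cheaper_model(1_000, 1_000))  # large
```

Long-document analysis with short answers sits well on the Medium side of that line; verbose generation from short prompts does not.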
Should I use Mistral Medium 3.5 for production chatbots?
Only if cost is your primary constraint. The lack of public benchmarks means you're flying blind on quality versus alternatives. Test it thoroughly against your use case first. If users notice worse responses compared to GPT-4o mini or Claude Haiku, the cost savings won't matter.