
Mistral: Mistral Medium 3.5

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Anyone in the Space can @-mention Mistral: Mistral Medium 3.5 with the team's shared context — pooled credits, one chat, one memory.


Verdict

Mistral Medium 3.5 offers a compelling middle ground between cost and capability, with a massive 262K-token context window at $1.50/$7.50 per Mtok, about 40% below GPT-4o on input pricing. Vision support adds versatility for document and screenshot workflows. Without public benchmarks, you're trading proven performance data for early access to Mistral's latest mid-tier architecture. Best for teams that need long-context reasoning and vision on a budget, and are comfortable validating outputs against their own use cases.

Best for

  • Long-context document analysis under budget
  • Vision tasks on screenshots and diagrams
  • Cost-sensitive multilingual workflows
  • Rapid prototyping with large context needs
  • Teams already invested in the Mistral ecosystem

Strengths

The 262K context window handles entire codebases, legal documents, or multi-chapter manuscripts in a single call. Vision support covers common document and UI analysis tasks without needing separate OCR pipelines. Pricing undercuts GPT-4o and Claude Sonnet by 40-50% on input tokens, making it viable for high-volume applications. Mistral's European heritage often translates to stronger multilingual performance, especially on French, German, and Spanish.
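
A quick way to sanity-check whether a document fits before sending it, sketched with the common 1.33 tokens-per-word heuristic (an approximation; use a real tokenizer for exact counts):

```python
# Rough fit check against the 262,144-token context window.
CONTEXT_LIMIT = 262_144

def fits_in_context(word_count: int, reserved_for_output: int = 4_096) -> bool:
    est_tokens = int(word_count * 1.33)  # heuristic, not a tokenizer
    return est_tokens + reserved_for_output <= CONTEXT_LIMIT

print(fits_in_context(200_000))  # ~266k tokens plus headroom: False
print(fits_in_context(180_000))  # ~239k tokens: True
```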

Trade-offs

No public benchmarks means you're flying blind on reasoning depth, code generation accuracy, and instruction-following compared to Claude Sonnet 4.5 or GPT-4o. Mistral models historically lag OpenAI and Anthropic on complex multi-step reasoning and nuanced creative writing. Vision capabilities are newer and less battle-tested than GPT-4o's. Output pricing at $7.50/Mtok climbs quickly for verbose responses. Early adopters should budget time for prompt tuning and output validation.

Specifications

Provider
mistralai
Category
llm
Context length
262,144 tokens
Max output
n/a
Modalities
text, image
License
proprietary
Released
2026-04-30

Pricing

Input
$1.50/Mtok
Output
$7.50/Mtok
Model ID
mistralai/mistral-medium-3-5

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool — one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$58.08
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool — one plan, one balance for everyone.
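
For teams modeling their own usage, here's a minimal sketch that reproduces the estimate above; the 70/30 input/output token split is an assumption, since the calculator's exact formula isn't published:

```python
# Hedged sketch of the monthly-spend estimate shown above.
INPUT_PRICE = 1.50   # $ per Mtok
OUTPUT_PRICE = 7.50  # $ per Mtok

total_mtok = 17.6                # 5 seats x 80 msgs/day x 30 days, ~1,470 tokens/msg
input_mtok = total_mtok * 0.70   # assumed input share
output_mtok = total_mtok * 0.30  # assumed output share

cost = input_mtok * INPUT_PRICE + output_mtok * OUTPUT_PRICE
print(f"${cost:.2f}/month")      # $58.08
```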

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
mistralai | 262k | $1.50/Mtok | $7.50/Mtok | n/a | n/a | n/a

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Screenshot to Requirements

Analyze this screenshot and generate a technical requirements document. Include: UI components visible, user interactions implied, data fields needed, validation rules, and any accessibility considerations.
Open in a Space →

Multilingual Support Triage

Read this customer message and provide: detected language, sentiment (positive/neutral/negative), issue category, urgency level (low/medium/high), and a one-sentence summary in English.
Open in a Space →

Research Paper Synthesis

Summarize this research paper in three sections: core hypothesis and methodology, key findings with supporting data, and limitations or open questions. Use technical language appropriate for a graduate-level audience.
Open in a Space →

Example outputs

Illustrative — representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance:

```python
def merge_sorted_lists(list1, list2):
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    return result
```

Output

The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, elements from the longer list are discarded. Fix by adding:

```python
result.extend(list1[i:])
result.extend(list2[j:])
```

Edge cases to consider: empty lists (currently handled correctly), lists with duplicate values (works fine), and single-element lists (works after the fix). Performance is O(n+m), which is optimal for this problem. The repeated `len()` calls in the loop condition are O(1) for lists and effectively free; you could hoist them into local variables, but profiling rarely shows it matters.
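
For reference, the complete function with the suggested fix applied, as a minimal runnable sketch:

```python
def merge_sorted_lists(list1, list2):
    """Merge two pre-sorted lists into one sorted list in O(n+m)."""
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # Append whatever remains once one list is exhausted (the original bug).
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result

assert merge_sorted_lists([1, 3, 5], [2, 4, 6, 8]) == [1, 2, 3, 4, 5, 6, 8]
```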

Notes

This example shows Mistral Medium's code review capabilities with the 262K token context window supporting larger codebases. The response correctly identifies the bug and provides actionable fixes. However, without published benchmarks, it's unclear how this model compares to specialized code models on complex refactoring tasks or multi-file analysis.

Prompt

I'm designing a SaaS dashboard. Describe a layout for a metrics overview page that displays: monthly recurring revenue, churn rate, active users, and conversion funnel. The design should work on desktop and be data-dense but scannable.

Output

Use a 2×2 grid layout for the four primary metrics, each in a card with the metric value prominent (48px bold), label above (14px), and a sparkline showing 90-day trend below. Place MRR top-left as the anchor metric. Below the grid, add a full-width conversion funnel visualization using a horizontal bar chart with percentage drop-offs between stages. Use a single accent color (suggest blue) with opacity variations to show depth. For scannability: align all numbers right, use consistent spacing (24px between cards), and add subtle dividers. Include a date range selector in the top-right corner. The layout stays readable down to 1280px width without horizontal scroll.

Notes

Mistral Medium handles multimodal input, making it suitable for design discussions that reference visual concepts. The response is structured and specific. The trade-off: without image generation capabilities, users must translate these descriptions into mockups themselves, unlike models that can produce visual outputs directly.

Prompt

Analyze this customer support transcript and extract: the customer's core issue, any product bugs mentioned, sentiment progression, and recommended next steps. Transcript: [3,200-word conversation between customer and support agent about billing discrepancies, failed payment retries, and confusion over pro-rated charges]

Output

**Core Issue:** Customer was charged $47 instead of the expected $29 after downgrading from an annual to a monthly plan mid-cycle.

**Product Bugs Identified:**
1. Payment retry logic attempted charges 4 times in 2 hours instead of the documented 24-hour intervals
2. Downgrade confirmation email didn't explain the pro-rating calculation

**Sentiment Progression:** Started frustrated (mentions "third time contacting support"), became confused during the pro-rating explanation, ended cautiously satisfied after the refund offer.

**Recommended Next Steps:** Issue an $18 refund, update the downgrade email template to include the pro-rating formula, review payment retry timing in the billing service, follow up in 48 hours to confirm resolution.

Notes

The 262K token context window makes Mistral Medium well-suited for analyzing long-form documents like support transcripts, legal contracts, or research papers. This example demonstrates structured extraction from conversational data. At $1.50/$7.50 per Mtok, a 3,200-word transcript runs roughly 4,300 input tokens (about $0.006), and even with a few hundred output tokens the call costs under $0.02, making it economical for high-volume document analysis workflows.

Use-case deep-dives

Multi-document contract analysis

When 262K context beats chaining for legal doc review

A 4-person legal ops team processing vendor agreements needs to compare clauses across 8-12 contracts simultaneously. Mistral Medium 3.5's 262,144-token window fits roughly 200,000 words, enough to load an entire contract portfolio in one prompt without chunking or retrieval overhead. At $1.50/Mtok input, a 200K-token analysis costs $0.30, making it cheaper than running sequential calls through smaller-context models that need summarization passes. The trade-off: if your contracts average under 40 pages each and you're only comparing 2-3 at a time, a cheaper 128K-context model can roughly halve your cost. Use this when you're routinely cross-referencing 6+ documents and need the model to hold all context without lossy summarization, as in the sketch below.
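
A minimal sketch of the single-prompt approach; the directory path, file glob, and question are illustrative:

```python
# Hedged sketch: pack a portfolio of plain-text contracts into one prompt
# instead of chunking them across multiple calls.
from pathlib import Path

def build_portfolio_prompt(contract_dir: str, question: str) -> str:
    sections = []
    for path in sorted(Path(contract_dir).glob("*.txt")):
        sections.append(f"## {path.name}\n{path.read_text()}")
    corpus = "\n\n".join(sections)
    return f"You are reviewing the vendor agreements below.\n\n{corpus}\n\nTask: {question}"

prompt = build_portfolio_prompt(
    "contracts/",  # assumed directory of exported contract text
    "Compare the indemnification clauses across all agreements and flag outliers.",
)
```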

High-volume customer support triage

Cost-effective ticket routing for mid-scale support teams

A 12-person SaaS support team handling 800 tickets daily needs to auto-categorize and route incoming requests. Mistral Medium 3.5's $1.50 input pricing means a 500-token ticket costs $0.00075 to classify—$0.60 per 800 tickets, or $18/month at that volume. The model's multimodal capability handles screenshot attachments without a separate vision API call. The threshold: if you're processing under 200 tickets/day, the setup overhead isn't worth it; above 500/day, you're saving $40-60/month versus $3/Mtok alternatives. Without public benchmarks, validate classification accuracy on your ticket taxonomy during a 2-week pilot before committing to production routing.
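
A minimal triage sketch, assuming an OpenAI-compatible chat-completions endpoint; the base URL, API key, and response shape are placeholders rather than Switchy's documented API:

```python
# Hedged sketch: classify one support ticket. The endpoint URL and key are
# placeholders; adapt them to whatever gateway your team actually uses.
import requests

API_BASE = "https://api.example.com/v1"   # placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"                  # placeholder credential

def triage_ticket(ticket_text: str) -> str:
    resp = requests.post(
        f"{API_BASE}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "mistralai/mistral-medium-3-5",
            "messages": [
                {
                    "role": "system",
                    "content": (
                        "Classify the ticket. Reply with JSON: "
                        '{"category": "...", "urgency": "low|medium|high"}'
                    ),
                },
                {"role": "user", "content": ticket_text},
            ],
            "temperature": 0,  # deterministic labels for routing
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

At 500 tokens per ticket this is the $0.00075-per-classification path described above; batch the calls if your queue spikes.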

Long-session creative brainstorming

When extended context memory beats conversation stitching

A 3-person content studio runs 90-minute brand strategy sessions where the AI needs to remember 40+ ideas, client feedback, and evolving direction without losing thread. Mistral Medium 3.5's 262K window holds an entire session transcript (roughly 65,000 words) plus reference docs—no mid-session summarization that flattens nuance. At $7.50/Mtok output, a 5,000-token synthesis response costs $0.0375, manageable for weekly sessions. The catch: if your sessions are under 30 minutes or you're fine with periodic context resets, a 32K model at half the output cost is smarter. Choose this when session continuity and callback to early ideas matter more than per-token cost, and you're running fewer than 20 sessions/month.
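
One way to structure that continuity, sketched under the assumption that the whole session stays in the message list with no summarization; the 1.33 tokens-per-word estimate is a rough heuristic, not a tokenizer:

```python
# Hedged sketch: accumulate a full brainstorming session in one message list,
# relying on the 262K window instead of mid-session summarization.
session = [{"role": "system", "content": "You are a brand-strategy facilitator."}]

def add_turn(role: str, content: str) -> None:
    session.append({"role": role, "content": content})
    # Rough token estimate; swap in a real tokenizer for production use.
    est_tokens = sum(int(len(m["content"].split()) * 1.33) for m in session)
    if est_tokens > 250_000:  # leave headroom under the 262,144 limit
        raise RuntimeError("Session nearing the context limit; start a fresh one.")

add_turn("user", "Idea 14: reposition the hero line around durability.")
```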

Frequently asked

Is Mistral Medium 3.5 good for general text tasks?

Yes, Mistral Medium 3.5 handles general text work well—drafting, summarization, Q&A, light reasoning. It sits between Mistral's small and large tiers, so you get decent quality without paying flagship prices. The 262k token context window means you can throw entire codebases or long documents at it without chunking.

Is Mistral Medium 3.5 cheaper than GPT-4o or Claude Sonnet?

Yes, significantly. At $1.50 input and $7.50 output per million tokens, Mistral Medium 3.5 undercuts GPT-4o ($2.50/$10) and Claude 3.5 Sonnet ($3/$15). If you're running high-volume workflows where cost matters more than bleeding-edge reasoning, this is a strong pick.

Can Mistral Medium 3.5 handle image inputs reliably?

It supports image inputs, but Mistral hasn't published vision benchmarks for this model. Expect basic image understanding—OCR, simple scene description—but don't rely on it for complex visual reasoning or fine-grained object detection. For serious vision work, use GPT-4o or Claude 3.5 Sonnet instead.

How does Mistral Medium 3.5 compare to Mistral Large?

Mistral Large runs $2/$6 per Mtok (more than Medium 3.5 on input, less on output) and delivers stronger reasoning and coding performance. Medium 3.5 is the budget pick for input-heavy workloads when you need Mistral's speed and context window but can tolerate slightly weaker outputs. Without public benchmarks, you're trading proven capability for lower cost.

Should I use Mistral Medium 3.5 for production chatbots?

Only if cost is your primary constraint. The lack of public benchmarks means you're flying blind on quality versus alternatives. Test it thoroughly against your use case first. If users notice worse responses compared to GPT-4o mini or Claude Haiku, the cost savings won't matter.

Data last verified 22 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.