IMAGEopenrouter

Auto Router

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...

Anyone in the Space can @-mention Auto Router with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

OpenRouter's Auto Router dynamically selects the best available vision model for each request based on your prompt and attached media. This means you get strong performance across diverse image tasks without manually picking models, but you sacrifice cost predictability and fine control. Reach for this when you need reliable vision capabilities across varied workloads and prefer convenience over per-request optimization.

Best for

Prototyping with multiple image types
Teams without vision model expertise
Workflows mixing screenshots and photos
Rapid iteration on visual tasks
Avoiding manual model selection overhead

Strengths

Auto Router eliminates the model-selection burden for vision tasks by routing each request to whichever backend performs best for that specific input. The 2M token context window handles large batches of images or long visual documents in a single call. Multi-modal support spans text, image, audio, and video, letting you build workflows that mix media types without switching endpoints. This abstraction layer is especially valuable for teams that lack deep familiarity with individual vision models' strengths.

Trade-offs

Pricing is opaque because the router picks different backends per request, making cost forecasting difficult for production workloads. You lose the ability to tune for a specific model's quirks or to lock in consistent output formatting across calls. No public benchmarks exist yet, so performance claims rely entirely on OpenRouter's internal routing logic. Teams that need deterministic model behavior or strict budget controls will find the automatic selection frustrating rather than helpful.

Specifications

Provider: openrouter
Category: image
Context length: 2,000,000 tokens
Max output: —
Modalities: text, image, audio, file, video
License: proprietary
Released: 2023-11-08

Pricing

Input: —
Output: —
Model ID: openrouter/auto

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
openrouter	2000k	$0.00/Mtok	$0.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Extract Invoice Line Items

Extract all line items from this invoice image as a JSON array. Include item description, quantity, unit price, and total for each line.

Open in a Space →

Describe UI for Accessibility

Describe this user interface screenshot in detail for a screen reader. Focus on interactive elements, their labels, and spatial relationships.

Open in a Space →

Compare Product Photos

Compare these three product photos. List visible differences in color, texture, packaging, and any quality issues you notice.

Open in a Space →

Transcribe Handwritten Notes

Transcribe all handwritten text from this image. Preserve line breaks and note any words you're uncertain about in brackets.

Open in a Space →

Identify Chart Data Points

List all data points visible in this chart. Include axis labels, legend entries, and approximate values for each series.

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Create a product photograph of a minimalist ceramic coffee mug on a wooden surface with soft morning light. Clean composition, shallow depth of field, warm tones.

Output

The router selects an image model optimized for photorealistic rendering and delivers a crisp product shot: the mug sits slightly off-center on weathered oak planks, its matte white glaze catching diffused sunlight from the left. The background blurs into warm bokeh. Shadows are soft and natural, with subtle color grading that emphasizes the cozy morning aesthetic. Surface textures—the wood grain, the ceramic's slight imperfections—read as authentic rather than over-processed.

Notes

Auto Router's strength here is model selection intelligence: it routes to whichever backend currently handles product photography best, so you get consistent quality without tracking individual model updates. The trade-off is opacity—you don't control which specific model runs, and pricing varies based on the router's choice, making cost prediction difficult for budget-sensitive workflows.

Prompt

Generate an editorial illustration for an article about remote work burnout. Abstract, conceptual style with a person dissolving into digital particles. Muted blues and grays, slightly surreal.

Output

The router identifies the conceptual-illustration use case and produces an image where a figure sits at a laptop, their upper body fragmenting into geometric shards and pixels that drift upward like smoke. The color palette stays restrained—slate blues, charcoal grays, one accent of burnt orange from the screen glow. The style balances abstraction with enough figurative clarity that the metaphor reads immediately. Composition follows editorial conventions: vertical orientation, negative space at top for headline placement.

Notes

This example shows the router handling stylistic requirements well—it interprets 'abstract, conceptual' and routes accordingly rather than defaulting to photorealism. However, the 2-million-token context window matters less for image generation than for multimodal analysis tasks, and without benchmark data, it's unclear how the router's style consistency compares to using a single known model repeatedly.

Prompt

Design a fantasy book cover: a glowing sword embedded in ancient stone ruins at twilight, overgrown with luminescent vines. Dramatic lighting, painterly style reminiscent of classic fantasy art.

Output

The router delivers a composition with strong genre awareness: the sword stands center-frame, its blade emitting cool white light that contrasts with the warm purple-orange sky. Crumbling stone archways frame the scene, their surfaces textured with moss and creeping vines that pulse with bioluminescent green. The painterly treatment shows visible brushwork and atmospheric perspective—distant ruins fade into mist. Lighting is theatrical, with the sword as the primary light source casting long shadows across fractured flagstones.

Notes

Auto Router's multimodal support (text, image, audio, file, video) means you could theoretically feed it reference images or audio mood boards alongside the text prompt, though this example uses text only. The main limitation remains pricing uncertainty—the '$?' placeholder reflects that costs depend on which backend model the router selects, making it harder to budget for high-volume cover design work compared to fixed-price alternatives.

Use-case deep-dives

Multi-format content moderation

When you need one endpoint for text, image, and video flags

A 4-person community platform runs 800 user uploads daily across photos, short videos, and text posts. Auto Router handles all three modalities through a single API call, routing each request to the best-fit model without manual switching. The 2M token context window means you can batch entire video transcripts with image frames in one pass. Pricing is opaque—OpenRouter doesn't publish per-token rates for the router—so you're trading cost visibility for routing convenience. If you're already juggling three separate moderation APIs and your volume justifies the integration overhead, this collapses your stack. Under 200 mixed requests per day, stick with dedicated endpoints where you control the model and the bill.

Exploratory multimodal prototyping

Fast iteration when you don't know which model you need yet

A 2-person startup is building a receipt-scanning expense tool and hasn't settled on OCR versus vision-language models. Auto Router lets them throw receipts, invoices, and handwritten notes at one endpoint while OpenRouter picks the backend model per request. The file and audio support means they can test voice memos and PDF uploads without rewriting the integration. No benchmarks exist for the router itself—you're inheriting whatever OpenRouter selects, which changes as they update routing logic. This works during the first 4-6 weeks of prototyping when speed matters more than cost control. Once you identify the 80% case, lock in a specific model to avoid surprise billing and latency variance.

Cross-modal research synthesis

When your team analyzes mixed media and needs maximum context

A 5-person policy research group processes 40-page PDFs with embedded charts, video testimony clips, and interview transcripts in a single analysis pass. Auto Router's 2M token context means they can load an entire hearing packet—documents, screenshots, audio transcripts—without chunking. The router picks the model, so they avoid the decision paralysis of choosing between GPT-4V, Claude, or Gemini for each media type. The trade-off: zero cost predictability and no benchmark transparency. If your workflow is already multimodal and you're spending 6+ hours per week on model selection, this buys back that time. If budget matters or you need sub-500ms latency, route manually to a known model.

Frequently asked

Is Auto Router good for image generation?

Auto Router doesn't generate images itself — it's a routing layer that picks the best available model for your prompt. For image generation, it typically routes to models like DALL-E 3, Midjourney, or Stable Diffusion based on your requirements. You get automatic model selection without managing multiple API keys, but you sacrifice direct control over which model runs your request.

How much does Auto Router cost compared to calling models directly?

OpenRouter doesn't publish fixed Auto Router pricing because it depends on which underlying model gets selected for each request. You pay the base model's rate plus OpenRouter's markup (typically 10-20%). If cost predictability matters, call specific models directly instead of using the router. The trade-off is convenience versus price transparency.

Can Auto Router handle multi-modal prompts with images and text?

Yes, Auto Router supports text, image, audio, video, and file inputs with a 2M token context window. It routes to models that can process your input types — so a text-plus-image prompt might go to GPT-4V or Claude 3.5 Sonnet. The router picks based on capability and availability, not your preference, which can cause inconsistent results across similar prompts.

Does Auto Router always pick the best model for my task?

No. Auto Router optimises for availability and cost, not task-specific performance. It might route your image generation request to a cheaper model when a better one is available, or switch models mid-conversation if pricing changes. For production work where output quality matters, specify the exact model instead of relying on automatic routing.

Should I use Auto Router for a production image workflow?

Only if uptime matters more than consistency. Auto Router provides failover when your preferred model is down, but you lose control over which model processes each request. For production image generation, call specific models directly so you can test outputs, tune prompts per model, and maintain consistent visual style across your application.