Inception: Mercury 2
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...
Verdict
Best for
- High-volume document summarization on budget
- Internal tooling and workflow automation
- Cost-sensitive chatbot backends
- Batch processing of structured data
- Prototyping before scaling to premium models
Strengths
Mercury 2's core strength is economic efficiency paired with a generous 128K context window. This combination makes it viable for processing entire codebases, long PDFs, or multi-document analysis without the per-token costs that make frontier models prohibitive at scale. The pricing structure favors input-heavy workloads — reading is cheap, generation costs more but remains reasonable. For teams running thousands of requests daily on structured tasks, the cost savings compound quickly without requiring architectural changes.
Trade-offs
Without public benchmarks, Mercury 2's performance relative to peers remains opaque. Early adopters report it handles straightforward extraction and summarization well but struggles with complex reasoning, nuanced instruction-following, and creative tasks where GPT-4o or Claude Sonnet excel. The lack of multimodal support limits use cases compared to vision-capable alternatives at similar price points. Output quality on ambiguous prompts tends to be inconsistent — you'll need tighter prompt engineering than with more capable models.
Specifications
- Provider: inception
- Category: llm
- Context length: 128,000 tokens
- Max output: 50,000 tokens
- Modalities: text
- License: proprietary
- Released: 2026-03-04
Pricing
- Input: $0.25/Mtok
- Output: $0.75/Mtok
- Model ID: inception/mercury-2
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
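To see what those upstream prices mean for a concrete workload, here is a minimal sketch that turns token counts into dollars. It uses only the two list prices quoted above; the function names and the sample workload are hypothetical, and it does not model how Switchy converts usage into credits.

```python
# Upstream list prices quoted on this page, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 0.75


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Upstream cost of one request at the list prices above."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total / 1_000_000


def monthly_cost_usd(requests_per_day: int, input_tokens: int,
                     output_tokens: int, days: int = 30) -> float:
    """Cost of a steady daily workload over `days` days."""
    per_request = request_cost_usd(input_tokens, output_tokens)
    return requests_per_day * days * per_request


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests/day, 4k tokens in, 500 out.
    print(f"{request_cost_usd(4_000, 500):.6f} USD per request")
    print(f"{monthly_cost_usd(1_000, 4_000, 500):.2f} USD per 30 days")
```

Because output is priced at three times input, the same helper makes it easy to check how much a verbose response style adds for your own token mix.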
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| inception | 128k | $0.25/Mtok | $0.75/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Extract Invoice Line Items
Extract all line items from this invoice into a JSON array. Each entry should include description, quantity, unit_price, and total. Return only valid JSON with no additional commentary.
Summarize Support Tickets
Read this support ticket thread and provide a 3-sentence summary covering: the customer's core issue, any troubleshooting already attempted, and the next recommended action.
Generate FAQ Answers
Write a clear, helpful FAQ answer to this question in 2-3 paragraphs. Use simple language, avoid jargon, and include one concrete example if relevant.
Compare Document Versions
Compare these two document versions and list the 5 most significant changes. For each change, note the section affected and whether it's an addition, deletion, or modification.
Classify Support Requests
Classify this support request into exactly one category: billing, technical, feature_request, or general_inquiry. Return only the category name, nothing else.
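Prompts that ask for "only the category name" still need a guard on the receiving side, since the reply may come back with stray punctuation or a full sentence. A minimal sketch of that guard follows; the four labels come from the classification prompt above, while `parse_category` is a hypothetical helper and not part of any Switchy or Inception API.

```python
from typing import Optional

# The four labels named in the classification prompt.
ALLOWED_CATEGORIES = (
    "billing",
    "technical",
    "feature_request",
    "general_inquiry",
)


def parse_category(reply: str) -> Optional[str]:
    """Return the label if the reply is exactly one allowed category.

    Surrounding whitespace, quotes, backticks, and a trailing period
    are tolerated; anything else (extra words, unknown labels) yields
    None so the caller can retry or fall back to a human.
    """
    cleaned = reply.strip().strip("\"'`.").lower()
    return cleaned if cleaned in ALLOWED_CATEGORIES else None


if __name__ == "__main__":
    for raw in ["billing", " Technical.\n", "The category is billing"]:
        print(repr(raw), "->", parse_category(raw))
```

Returning None instead of guessing keeps malformed replies visible, which matters when you are still measuring how often the model follows the format.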
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential race conditions. It's meant to update a shared counter across multiple threads, but we're seeing inconsistent results in production.
The function lacks thread synchronization. When multiple threads read `counter.value`, increment it, then write back, you get classic read-modify-write races. Thread A reads 5, Thread B reads 5, both increment to 6, both write 6—you've lost an increment. Wrap the read-increment-write sequence in a `threading.Lock()` context manager, or switch to `threading.local()` if each thread needs its own counter. The 128K context window here means you can paste your entire module with test cases, and I'll trace the exact interleaving that causes your production bug.
Mercury 2's extended context window (128K tokens) lets users submit large codebases for holistic analysis rather than isolated snippets. The response demonstrates strong concurrency reasoning. However, at $0.75/Mtok output, verbose explanations add cost—teams doing high-volume code review may prefer terser models for routine checks.
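The user's function is not shown in this example, so the following is only a sketch of the fix the response describes: guarding the read-increment-write sequence with a `threading.Lock()`. The `Counter` class and `run` helper are hypothetical stand-ins for the production code.

```python
import threading


class Counter:
    """Shared counter whose increment is guarded by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock turns read, add, and write-back into one critical
        # section, so two threads cannot both read the same old value.
        with self._lock:
            self.value += 1


def run(threads: int = 8, increments: int = 10_000) -> int:
    """Increment one shared counter from several threads and return it."""
    counter = Counter()

    def worker() -> None:
        for _ in range(increments):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # equals threads * increments when the lock is held
```

With the lock in place the final value always equals `threads * increments`; removing the `with self._lock:` line reintroduces the lost-update race the response describes.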
Summarize the last six months of meeting notes and pull out recurring blockers for the infrastructure team. I'm attaching 47 documents totaling about 95,000 tokens.
Three blockers dominate: (1) Database migration tooling—mentioned in 18 meetings, unresolved since March due to vendor API changes. (2) CI pipeline flakiness—12 mentions, root cause identified (Docker layer caching) but fix deprioritized twice. (3) On-call rotation gaps—9 mentions, staffing issue compounded by two departures in Q2. The database migration is your critical path; the CI issue has a known fix; the on-call problem needs hiring or process change. I've cross-referenced action items against subsequent meetings to flag which commitments went unaddressed.
This example highlights Mercury 2's ability to synthesize long-form documents within its 128K context window—no chunking or retrieval layer needed. The structured output and cross-referencing show strong information extraction. The trade-off: input cost is $0.25/Mtok, so a 95K-token corpus costs ~$0.024 per query, which adds up for frequent retrospectives.
I'm drafting a technical RFC for migrating our monolith to microservices. What are the top three arguments against this move that I should address preemptively?
(1) **Operational complexity**: You're trading one deployment for dozens. Monitoring, logging, distributed tracing, and incident response all become harder. If your team lacks strong DevOps maturity, you'll spend more time firefighting than shipping features. (2) **Latency and failure modes**: Network calls replace function calls. A chain of five services means five potential timeout points. Partial failures require sophisticated retry logic and circuit breakers—monoliths fail atomically. (3) **Data consistency**: Distributed transactions are notoriously difficult. If your domain has strong consistency requirements (financial ledgers, inventory), you'll need sagas or event sourcing, both of which add cognitive load and debugging surface area. Address these with concrete mitigation plans—don't handwave them.
Mercury 2 delivers balanced, skeptical analysis rather than cheerleading the user's premise—useful for RFCs that need to withstand peer review. The response is opinionated but grounded in real trade-offs. Without public benchmarks, it's unclear how Mercury 2 compares to peers on reasoning tasks, so teams may want to trial it against known-strong models before committing to critical decision-support workflows.
Use-case deep-dives
When Mercury 2 makes sense for high-volume blog production
A 4-person content studio pushing 80 blog drafts per week needs something cheaper than GPT-4o but more capable than GPT-3.5. Mercury 2 hits that slot at $0.25/$0.75 per Mtok: at 3k prompt and 1.5k output tokens per piece, a draft costs about $0.002, so roughly 350 drafts a month comes to well under $1 in upstream model cost, before retries and revisions. The 128k context window handles full brand guides and competitor research in one pass, which matters when you're templating at scale. The catch: no public benchmarks means you're flying blind on accuracy until you test it yourself. If your editorial process already includes human review and you're optimizing for cost per draft rather than first-pass quality, run a 50-piece pilot. If more than 15% need major rewrites, the labor cost will eat the savings and you should move to a benchmarked alternative.
Mercury 2 for turning Slack threads into wiki entries
A 12-person engineering team wants to auto-generate wiki summaries from weekly Slack decision threads, each running 8k-15k tokens with code snippets and links. Mercury 2's 128k window means you can dump an entire week's worth of threads (60k-80k tokens) and ask for a structured summary in one call, no chunking required. At $0.25/Mtok input, processing 80k tokens costs $0.02; even 200 threads a month at 15k tokens each is 3M input tokens, or about $0.75 before output. The risk is accuracy on technical details, which no public benchmark vouches for, but since a human engineer reviews every summary before it goes live, the workflow absorbs errors naturally. This works if your review step is fast (under 3 minutes per summary). If you're trying to publish summaries without human checks, the lack of verified benchmark scores makes this a poor fit; switch to a model with published MMLU or HumanEval numbers.
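A week that fits in one call is the easy case; a busier week needs the threads split into as few calls as possible. The sketch below greedily packs threads into batches under a token budget. The 80,000-token budget mirrors the upper figure above and leaves headroom in the 128k window for instructions and output; the thread IDs and token counts are hypothetical, and counting tokens is left to whatever tokenizer you use.

```python
from typing import List, Tuple


def pack_threads(threads: List[Tuple[str, int]],
                 budget: int = 80_000) -> List[List[str]]:
    """Group (thread_id, token_count) pairs into batches within budget.

    Threads keep their original order; a new batch starts whenever
    adding the next thread would exceed the budget.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for thread_id, tokens in threads:
        if tokens > budget:
            raise ValueError(f"{thread_id} alone exceeds the budget")
        if used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(thread_id)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical week: seven threads of 8k-15k tokens each.
    week = [("mon-db", 15_000), ("mon-ci", 12_000), ("tue-api", 9_000),
            ("wed-auth", 14_000), ("thu-infra", 15_000),
            ("fri-release", 8_000), ("fri-retro", 13_000)]
    for batch in pack_threads(week):
        print(batch)
```

Keeping threads in chronological order is a deliberate choice here: a summary of decisions usually reads better when later threads can refer back to earlier ones in the same call.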
When to skip Mercury 2 for real-time support routing
A 20-person SaaS company fields 300 support tickets daily and wants to auto-route them by urgency and category. Mercury 2's pricing ($0.25/$0.75 per Mtok) looks attractive: routing 300 tickets at 1k prompt tokens plus 200 output tokens each costs about $0.12/day ($0.075 input, $0.045 output), or $3.60/month. The problem isn't cost, it's confidence. Without public benchmarks, you can't predict accuracy on multi-label classification under pressure, and a 10% misroute rate (30 tickets/day to the wrong team) will cost more in customer frustration than you save on the model. If you're running this in a pilot with a human safety net reviewing every route for the first two weeks, Mercury 2 is worth testing. If you need day-one production reliability, choose a model with published evaluation results; Claude 3.5 Haiku and GPT-4o-mini are better-documented candidates, and you should compare their current list prices against Mercury 2's before switching.
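If you do run the two-week pilot, the reviewer's corrections give you the accuracy numbers the model lacks. Here is a minimal sketch, assuming you log each ticket as a (model route, reviewed route) pair; the function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (route chosen by the model, route after review)


def misroute_rate(pairs: List[Pair]) -> float:
    """Share of tickets where the reviewer changed the model's route."""
    if not pairs:
        raise ValueError("no pilot data")
    wrong = sum(1 for predicted, reviewed in pairs if predicted != reviewed)
    return wrong / len(pairs)


def f1_by_category(pairs: List[Pair]) -> Dict[str, float]:
    """Per-category F1, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, reviewed in pairs:
        if predicted == reviewed:
            tp[reviewed] += 1
        else:
            fp[predicted] += 1  # model chose this category wrongly
            fn[reviewed] += 1   # model missed the correct category
    labels = set(tp) | set(fp) | set(fn)
    return {
        label: 2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    }


if __name__ == "__main__":
    # Hypothetical pilot log of four tickets.
    log = [("billing", "billing"), ("billing", "technical"),
           ("technical", "technical"), ("technical", "technical")]
    print(f"misroute rate: {misroute_rate(log):.0%}")
    print(f1_by_category(log))
```

The overall misroute rate maps directly onto the 10% threshold discussed above, while the per-category scores show whether errors cluster in one queue.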
Frequently asked
Is Inception Mercury 2 good for general text tasks?
Mercury 2 handles standard text generation, summarization, and Q&A reasonably well at its price point. With a 128K context window, it can process moderately long documents. However, without public benchmarks, you're buying blind—no MMLU, HumanEval, or MT-Bench scores to validate quality claims. Consider models with proven performance data if accuracy matters.
Is Mercury 2 cheaper than GPT-4o or Claude Sonnet?
Yes, significantly. At $0.25/$0.75 per Mtok, Mercury 2 costs roughly 90-93% less than GPT-4o ($2.50/$10.00) and 92-95% less than Claude 3.5 Sonnet ($3.00/$15.00), depending on your input/output mix. You're trading proven capability for budget savings. If your use case tolerates lower accuracy or you're prototyping, the price advantage is real.
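The savings figures follow directly from the list prices quoted in this answer. A short sketch of the arithmetic, using only those quoted prices (the helper name is hypothetical):

```python
def percent_cheaper(price: float, baseline: float) -> float:
    """How much cheaper `price` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (1 - price / baseline) * 100


# USD per million tokens, as quoted in the answer above.
MERCURY_2 = {"input": 0.25, "output": 0.75}
BASELINES = {
    "GPT-4o": {"input": 2.50, "output": 10.00},
    "Claude Sonnet": {"input": 3.00, "output": 15.00},
}

if __name__ == "__main__":
    for name, prices in BASELINES.items():
        for kind in ("input", "output"):
            saving = percent_cheaper(MERCURY_2[kind], prices[kind])
            print(f"{name} {kind}: {saving:.1f}% cheaper")
```

Input and output savings differ, so the blended figure for your workload depends on how many tokens you read versus generate.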
Can Mercury 2 handle 128K tokens in practice?
The 128K context window matches GPT-4 Turbo's limit, so it can technically ingest a 300-page document. Real-world performance depends on how well the model maintains coherence across that span, something public benchmarks would reveal. Until needle-in-haystack results are published, treat accuracy past roughly 64K tokens as unverified and test it on your own documents.
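A needle-in-haystack check is simple enough to run yourself. The sketch below only builds the test documents and scores replies; sending each document to the model is left out, because the exact client call depends on your setup. The filler text, the needle sentence, and both helper names are hypothetical.

```python
from typing import Dict, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative depth of the filler lines.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(filler))
    return "\n".join(filler[:position] + [needle] + filler[position:])


def score(replies: Dict[float, str], expected: str) -> Dict[float, bool]:
    """Map each depth to True if the reply contains the expected fact."""
    return {
        depth: expected.lower() in reply.lower()
        for depth, reply in replies.items()
    }


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1_000)]
    needle = "The vault code is 7142."
    docs = {d: build_haystack(filler, needle, d)
            for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
    # Ask the model "What is the vault code?" for each document, then
    # pass its replies to score(replies, "7142").
    print({depth: len(doc) for depth, doc in docs.items()})
```

Repeating the run with longer filler shows whether recall drops as the document approaches the full window.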
How does Mercury 2 compare to other budget LLMs?
Without benchmarks, direct comparison is impossible. Models like Llama 3.1 70B or Mixtral 8x22B publish MMLU and coding scores, letting you assess capability per dollar. Mercury 2's pricing is competitive, but you're gambling on undocumented quality. Request sample outputs or run your own evals before committing production workloads.
Should I use Mercury 2 for customer-facing chatbots?
Only if you can afford unpredictable responses. Customer-facing deployments demand reliability—hallucination rates, instruction-following accuracy, safety guardrails. Mercury 2 provides none of that transparency. Use it for internal tools, content drafts, or low-stakes automation where mistakes don't damage brand trust. Test extensively before exposing to users.