LLM · anthropic

Anthropic: Claude Sonnet 4

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

Anyone in the Space can @-mention Anthropic: Claude Sonnet 4 with the team's shared context - pooled credits, one chat, one memory.

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Claude Sonnet 4 delivers strong general-purpose performance at a mid-tier price point, handling text, images, and files with a massive 1M token context window. It sits between Haiku's speed and Opus's depth, making it the default choice for teams that need reliable reasoning without paying premium rates. The $15/Mtok output cost adds up on long responses, but the input pricing is competitive for document-heavy workflows. Reach for this when you need balanced capability across diverse tasks without specialized requirements.

Best for

  • Multi-document analysis under 1M tokens
  • Cost-sensitive vision tasks
  • General-purpose reasoning and drafting
  • File processing workflows with mixed content
  • Teams needing reliable mid-tier performance

Strengths

The 1M token context window handles entire codebases, long PDFs, or multi-file analysis in a single call. Multi-modal support processes screenshots, diagrams, and documents without separate pipelines. At $3/Mtok input, it costs slightly more than GPT-4o ($2.50/Mtok) but far less than Claude Opus, while responding faster than Opus. The model handles structured output, function calling, and extended conversations, though recall on needle-in-haystack tasks can degrade at the far end of the context range.

Trade-offs

Output pricing at $15/Mtok makes verbose responses expensive compared to GPT-4o Mini or Haiku for high-volume generation tasks. This directory has not yet ingested benchmark scores, so beyond the SWE-bench figure quoted in the model description, performance relative to Sonnet 3.5 or GPT-4o on specialized domains like code or math is not quantified here. Teams needing bleeding-edge reasoning should test against Opus or o1; those prioritizing speed over depth should consider Haiku. The proprietary license rules out self-hosting and limits fine-tuning options.

Specifications

Provider: anthropic
Category: llm
Context length: 1,000,000 tokens
Max output: 64,000 tokens
Modalities: image, text, file
License: proprietary
Released: 2025-05-22

Pricing

Input: $3.00/Mtok
Output: $15.00/Mtok
Model ID: anthropic/claude-sonnet-4

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$116.16
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
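The calculator's totals can be reproduced with a short sketch. The page does not publish its formula: the 22 working days, 2,000 tokens per message, and 70/30 input/output split below are assumptions, chosen only because they reproduce the displayed figures, and `monthly_estimate` is a hypothetical name.

```python
SEATS = 5
MSGS_PER_SEAT_PER_DAY = 80
INPUT_PRICE = 3    # USD per Mtok, from the Pricing section
OUTPUT_PRICE = 15  # USD per Mtok, from the Pricing section

# Assumptions (not stated on the page), inferred from the displayed totals.
WORKING_DAYS = 22
TOKENS_PER_MSG = 2_000
INPUT_SHARE_PCT = 70


def monthly_estimate():
    """Return (total tokens per month, estimated USD per month)."""
    tokens = SEATS * MSGS_PER_SEAT_PER_DAY * WORKING_DAYS * TOKENS_PER_MSG
    input_tokens = tokens * INPUT_SHARE_PCT // 100
    output_tokens = tokens - input_tokens
    cost = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return tokens, round(cost, 2)


tokens, cost = monthly_estimate()
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
```

Under these assumptions roughly two thirds of the spend comes from output tokens, so response length moves the estimate more than prompt length does.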

Providers

Provider   Context  Input       Output       P50 latency  Throughput  30d uptime
anthropic  1000k    $3.00/Mtok  $15.00/Mtok  -            -           -

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Multi-File Codebase Review

Review this codebase for architectural issues. Focus on coupling between modules, potential race conditions, and opportunities to simplify the data flow. Provide specific file and line references.

Contract Comparison Analysis

Compare these two contracts and identify all material differences in terms, obligations, and liability clauses. Create a table showing each difference with the relevant section numbers.

Screenshot to Implementation

Generate React components that match this screenshot. Include Tailwind classes for styling, proper component hierarchy, and placeholder data. Explain any design decisions you made.

Research Paper Synthesis

Synthesize the methodology and findings from these research papers. Identify areas of consensus, conflicting results, and gaps in the current research. Structure your response by theme.

Technical Documentation Draft

Write API documentation for this codebase. Include endpoint descriptions, request/response examples, authentication requirements, and common error scenarios. Use clear headings and code blocks.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential race conditions. It's meant to update a shared cache from multiple worker threads, but we're seeing occasional data corruption in production.

Output

The function has a classic check-then-act race condition. Between checking if the key exists (line 14) and writing the value (line 18), another thread can modify the same key. The whole sequence needs to be atomic. Wrap the entire read-modify-write sequence in a threading.Lock context manager, or route all updates through a single writer thread fed by a queue.Queue. The current time.sleep() in the update path makes the race window even wider; that's likely why you're seeing corruption under load. Also note that while single dict operations are effectively atomic in CPython under the GIL, a compound check-then-write sequence is not, so you can't rely on the GIL for correctness.
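The lock-based fix described in this output might look like the following minimal sketch. The function under review is not shown on this page, so `SharedCache` and `run_workers` are hypothetical names, and the counter workload is only a stand-in for the real cache update.

```python
import threading


class SharedCache:
    """Dict-backed cache whose read-modify-write steps share one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def update(self, key, fn, default=0):
        # Hold the lock across the read and the write so no other thread
        # can interleave between them.
        with self._lock:
            current = self._data.get(key, default)
            self._data[key] = fn(current)
            return self._data[key]

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


def run_workers(cache, n_threads=8, n_updates=1_000):
    """Hammer one key from several threads and return its final value."""
    def worker():
        for _ in range(n_updates):
            cache.update("hits", lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.get("hits")
```

With the lock in place, `run_workers(SharedCache())` counts every increment; removing the `with self._lock` block reintroduces the lost-update window.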

Notes

Demonstrates Claude Sonnet 4's ability to identify concurrency bugs from code context alone, without needing a full reproduction case. The 1M token window means you can paste entire codebases for this kind of review. However, without public benchmarks, we can't quantify its accuracy against HumanEval or similar coding tests.

Prompt

I have 47 pages of legal discovery documents (PDFs). Extract every mention of 'Acme Corp' or 'Acme Industries', the date of the mention, and whether it's in a contract, email, or memo. Output as CSV.

Output

Claude Sonnet 4 would process all 47 pages in a single request (file modality support), then return a structured CSV with columns: Document_Name, Page, Entity_Name, Date_Mentioned, Document_Type, Context_Snippet. Each row represents one mention, with the surrounding sentence for verification. The model handles OCR inconsistencies (e.g., 'Acm e Corp' with a space) and disambiguates 'Acme' references that aren't the target entities. Output includes ~180 rows, sorted chronologically, ready for import into your case management system.
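Because each row carries a context snippet "for verification", the returned CSV can be spot-checked locally. The sketch below is one possible check, not part of any Switchy or Anthropic tooling: `canonical_entity` and `check_rows` are hypothetical helpers, and the whitespace-tolerant regex is a deliberately loose heuristic that may over-match.

```python
import csv
import io
import re

FIELDS = ["Document_Name", "Page", "Entity_Name", "Date_Mentioned",
          "Document_Type", "Context_Snippet"]


def spaced(word):
    # Allow stray OCR whitespace between letters, e.g. "Acm e" for "Acme".
    return r"\s*".join(re.escape(ch) for ch in word)


ENTITY_RE = re.compile(
    rf"\b{spaced('Acme')}\s+(?P<suffix>{spaced('Corp')}|{spaced('Industries')})\b",
    re.IGNORECASE,
)


def canonical_entity(text):
    """Return 'Acme Corp' or 'Acme Industries' if text mentions one, else None."""
    m = ENTITY_RE.search(text)
    if not m:
        return None
    suffix = re.sub(r"\s+", "", m.group("suffix")).lower()
    return "Acme Corp" if suffix == "corp" else "Acme Industries"


def check_rows(csv_text):
    """Return rows whose snippet does not actually mention a target entity."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [row for row in reader if canonical_entity(row["Context_Snippet"]) is None]
```

Rows returned by `check_rows` are candidates for manual review rather than confirmed errors, since a snippet can be truncated or the mention can sit just outside it.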

Notes

Showcases the 1M token context window and multi-file processing — a realistic legal or compliance workflow that would require chunking and reassembly with smaller models. The $3/$15 per Mtok pricing makes this expensive for high-volume document review compared to batch-optimised alternatives, but the single-pass accuracy is the trade-off.

Prompt

Explain how RLHF works to a product manager who understands A/B testing but has no ML background. They need to decide whether to invest in it for our chatbot. 3 paragraphs max.

Output

RLHF (Reinforcement Learning from Human Feedback) is like running an A/B test where humans grade the chatbot's answers, then the model learns to favor responses that score higher. Instead of one test, imagine thousands of micro-tests where the model tries different phrasings, gets feedback, and adjusts. Over time, it learns patterns: 'users prefer concise answers' or 'avoid hedging language in support contexts.'

The investment trade-off: you need human raters (expensive, slow) and ML infrastructure to retrain the model continuously. If your chatbot handles high-stakes interactions (medical advice, financial guidance), RLHF reduces harmful outputs better than prompt engineering alone. If you're answering FAQs, the ROI is weaker.

For your decision: start with a pilot. Collect 500 chatbot interactions, have your support team rate them, and see if the patterns are consistent enough to train on. If raters disagree more than 30% of the time, your feedback signal is too noisy for RLHF to work well.
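The pilot check at the end of this output can be made concrete. The output does not define how disagreement is measured, so this sketch assumes the simplest case: two raters label the same interactions and disagreement is the share of mismatched labels. Both function names are hypothetical.

```python
def disagreement_rate(ratings_a, ratings_b):
    """Share of interactions where two raters gave different labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    differing = sum(a != b for a, b in zip(ratings_a, ratings_b))
    return differing / len(ratings_a)


def signal_is_usable(ratings_a, ratings_b, max_disagreement=0.30):
    # 0.30 is the threshold quoted in the example output; treat it as a
    # rule of thumb rather than an established cutoff.
    return disagreement_rate(ratings_a, ratings_b) <= max_disagreement
```

With more than two raters, a chance-corrected agreement statistic would be a more robust choice than this raw mismatch rate.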

Notes

Highlights Claude Sonnet 4's ability to translate technical concepts into business context without dumbing down the nuance. The image+text modality means you could include architecture diagrams in the prompt for even richer explanations. The response stays within the 3-paragraph constraint while delivering actionable advice, though the lack of benchmark data means we can't compare its explanation quality to GPT-4 or other instruction-tuned models.

Use-case deep-dives

Multi-document contract analysis

When Claude Sonnet 4 handles 200-page RFP reviews in one pass

A 12-person procurement team at a mid-size SaaS company needs to compare vendor proposals that arrive as 150-300 page PDFs with exhibits, pricing tables, and legal riders. Claude Sonnet 4's 1M token context window means the entire RFP stack fits in a single prompt—no chunking, no summary chains, no lost cross-references between sections. At $3/Mtok input, a 200-page review costs roughly $1.50 in tokens, which beats the 4 hours of paralegal time it replaces. The model handles native file uploads, so the team drops PDFs straight into Switchy without preprocessing. If your contracts are under 50 pages or you're running 500+ reviews per month, a cheaper model with retrieval makes more sense. For mid-volume, high-stakes document work where context matters, Sonnet 4 is the call.
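A quick way to sanity-check this scenario is to estimate whether a document stack fits in one call and what the input side costs. The page gives no tokens-per-page figure; the 2,500 used below is an assumption, chosen because it is the density implied by "200 pages for roughly $1.50" at $3/Mtok. `review_estimate` is a hypothetical helper.

```python
CONTEXT_WINDOW = 1_000_000    # tokens, from the Specifications section
INPUT_PRICE_PER_MTOK = 3.00   # USD, from the Pricing section


def review_estimate(pages, tokens_per_page=2_500):
    # tokens_per_page is an assumption; measure your own PDFs, since dense
    # pricing tables and exhibits tokenize very differently from prose.
    tokens = pages * tokens_per_page
    return {
        "input_tokens": tokens,
        "fits_in_one_call": tokens <= CONTEXT_WINDOW,
        "input_cost_usd": round(tokens * INPUT_PRICE_PER_MTOK / 1_000_000, 2),
    }


for pages in (150, 200, 300, 500):
    print(pages, review_estimate(pages))
```

The estimate ignores the prompt itself and the output tokens, so it is a lower bound on the cost of a review.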

Customer support ticket triage

Why Sonnet 4 works for support teams under 2,000 tickets/month

A 5-person support team at a B2B fintech startup uses Switchy to auto-tag and route inbound tickets from Intercom. Claude Sonnet 4 reads the ticket text, scans the last 10 messages of conversation history, checks against a 40-page internal knowledge base, and assigns priority + department in under 3 seconds. Image support means screenshots of error states get parsed inline without a separate OCR step. At 2,000 tickets/month and an average of 800 tokens per classification (1.6M tokens in total), the team spends about $5/month on input tokens. Output is minimal (just tags), so the $15/Mtok rate barely registers. The model's accuracy keeps manual overrides under 8%, which saves 6 hours/week. If you're above 5,000 tickets/month, switch to a faster, cheaper model and accept the 12% override rate.

Technical documentation generation

When Sonnet 4 turns API logs into onboarding guides for dev teams

A 3-person DevRel team at an infrastructure startup needs to write onboarding docs for a new API every 6 weeks. They feed Claude Sonnet 4 the OpenAPI spec (18,000 tokens), 40 example request/response pairs from production logs, and a style guide, then ask for a 4,000-word tutorial with code samples in Python and Go. The model's file-upload support means they drop the YAML spec directly into Switchy; the 1M token window means all examples stay in context so the narrative stays consistent. At $3 input and $15 output per Mtok, one doc costs about $0.15 in tokens and takes 90 seconds to draft. The team edits for 30 minutes and ships. If you're writing 50+ docs per month, the output cost adds up fast—consider a cheaper model and a human editing pass.
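The per-document figure can be approximated by adding input and output costs. Only the 18,000-token spec and the 4,000-word output come from the scenario; the sizes of the examples and style guide, and the 4-tokens-per-3-words conversion, are illustrative assumptions, and `doc_cost` is a hypothetical helper.

```python
INPUT_PRICE = 3    # USD per Mtok
OUTPUT_PRICE = 15  # USD per Mtok


def doc_cost(spec_tokens=18_000, example_tokens=10_000, style_tokens=2_000,
             output_words=4_000):
    # example_tokens, style_tokens, and the words-to-tokens ratio are
    # assumptions; the scenario does not state them.
    input_tokens = spec_tokens + example_tokens + style_tokens
    output_tokens = output_words * 4 // 3
    total = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return round(total, 2)


print(doc_cost())  # per-document estimate in USD
```

Under these assumptions one draft comes to about $0.17, in the same range as the figure quoted above, with input and output contributing roughly equal shares.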

Frequently asked

Is Claude Sonnet 4 good for general-purpose work?

Yes. Sonnet 4 handles coding, analysis, writing, and reasoning tasks well. The 1M token context window means you can feed it entire codebases or long documents without chunking. It processes images and files natively, so you're not limited to text-only workflows. It's Anthropic's balanced model—faster and cheaper than Opus, smarter than Haiku.

Is Claude Sonnet 4 cheaper than GPT-4o?

No. GPT-4o costs $2.50 input and $10.00 output per Mtok; Sonnet 4 is $3.00 input and $15.00 output. You're paying 20% more for input and 50% more for output. The trade-off is Anthropic's longer context window and different reasoning style. If cost is the priority, GPT-4o wins.
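The percentages in this answer follow directly from the quoted prices, as this small sketch shows. The dictionary keys are illustrative labels rather than official model IDs, and `premium_pct` is a hypothetical helper.

```python
PRICES = {  # USD per million tokens, as quoted in the answer above
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
}


def premium_pct(model, baseline, kind):
    """Percentage by which `model` costs more than `baseline` for `kind` tokens."""
    ratio = PRICES[model][kind] / PRICES[baseline][kind]
    return round((ratio - 1) * 100, 1)


for kind in ("input", "output"):
    print(kind, premium_pct("claude-sonnet-4", "gpt-4o", kind))
```

The effective premium for a given workload falls between the two figures, depending on how much of its token volume is output.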

Can Claude Sonnet 4 handle 1 million tokens in practice?

Yes, but expect slower responses and higher costs on large contexts. At $3 per million input tokens, a full 1M context costs $3 just to load. Anthropic's models maintain coherence across long contexts better than most alternatives, but you'll still see degraded performance on needle-in-haystack tasks past 500k tokens. Use it when you need the window; don't default to max context.

How does Sonnet 4 compare to Claude Opus 4?

Sonnet 4 is faster and cheaper than Opus 4. Opus 4 costs 5x more per token ($15 input and $75 output per Mtok) and is slower, but handles the hardest reasoning and coding tasks better. If you're doing complex multi-step logic or need maximum accuracy, Opus justifies the cost. For most work, Sonnet 4 is the better default.

Should I use Claude Sonnet 4 for customer-facing chat?

Yes, if you need multimodal support and long context. Sonnet 4's latency is reasonable for chat, and it handles images and files without preprocessing. The $15/Mtok output cost adds up fast in high-volume chat, though—a 500-word response costs about $0.01. Budget accordingly. For text-only chat with shorter contexts, GPT-4o-mini is cheaper.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.