Anthropic: Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

Anyone in the Space can @-mention Anthropic: Claude Sonnet 4.5 with the team's shared context - pooled credits, one chat, one memory.

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Claude Sonnet 4.5 is the default model on Switchy because it wins the calm middle of the curve: strong coding, strong reasoning, not the cheapest token but rarely the expensive mistake. What we actually notice in Spaces is that Sonnet doesn't need spoon-feeding. Hand it a three-file diff and it finds the failure mode; hand it a messy PRD and it turns it into a plan you can ship. It holds its standard 200k-token context without getting wobbly past 100k the way some competitors do.

Best for: code review and bug investigation; long PRD synthesis; legal and compliance drafting where "close enough" isn't; architecture discussions where being wrong has a blast radius; any task where you'd rather pay a bit more and not re-ask the question.

Avoid for: throwaway one-liners (Haiku is cheaper and fast enough); high-volume classification where you're running millions of calls (use the cheapest model that clears the bar).

Pricing frame: at $3/Mtok in and $15/Mtok out, a 30% output share works out to a blended $6.60/Mtok. A 5-person team running 200 messages a day lands around $95-110/month, depending on average turn size. That's roughly $20 per person per month, about the price of one Pro-tier seat each, drawn from one shared pool instead of separate subscriptions.

Best for

Production chatbots with complex logic
Multi-document research synthesis
Full-codebase refactoring and analysis
Long-form content generation under budget
Vision tasks on technical diagrams

Strengths

The 1M-token context window (a beta option on Anthropic's API; the standard window is 200k) can take an entire codebase or document set in a single call, eliminating chunking overhead. Multimodal support processes screenshots, diagrams, and PDFs alongside text without separate tooling. At $15/Mtok output, it costs 80% less than Claude Opus 4.1 ($75/Mtok) while maintaining strong reasoning on structured tasks like code review, data extraction, and technical writing. Response quality holds up well across long contexts in our use, though quality at the very top of the window hasn't been independently benchmarked yet.

Trade-offs

Can fall behind Claude Opus and GPT-4o on nuanced creative writing and ambiguous reasoning tasks where subtle interpretation matters. Latency runs higher than GPT-4o mini or Gemini Flash for simple queries, making it inefficient for high-throughput classification or extraction where a smaller model would suffice. The $3/Mtok input cost adds up quickly on iterative workflows that re-send a large context on every call; consider prompt caching, or switch to a cheaper model for early-stage drafting.
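To see why caching matters for iterative workflows, here is a rough input-cost model for re-sending the same large context across several calls. It's a sketch, not a billing tool: the multipliers (cache writes at 1.25× the base input rate and cache reads at 0.1×, for Anthropic's 5-minute cache) are our reading of Anthropic's published pricing and should be checked against the current pricing page. It also assumes every follow-up lands within the cache lifetime.

```python
# Rough input-cost model for reusing one large context across several calls.
# Assumed rates (verify against Anthropic's current pricing page):
# base input $3/Mtok, 5-minute cache write at 1.25x base, cache read at 0.1x base.
INPUT_PER_MTOK = 3.00
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def context_cost(context_tokens: int, calls: int, cached: bool) -> float:
    """Input cost in USD of sending the same context `calls` times (output excluded)."""
    base = context_tokens / 1_000_000 * INPUT_PER_MTOK
    if not cached:
        # Every call pays the full input price for the whole context.
        return base * calls
    # The first call writes the cache; later calls within the TTL read from it.
    return base * CACHE_WRITE_MULT + base * CACHE_READ_MULT * (calls - 1)


if __name__ == "__main__":
    for cached in (False, True):
        print(f"cached={cached}: ${context_cost(150_000, 10, cached):.2f}")
```

With a 150k-token context reused across 10 calls, the uncached input bill is $4.50; with caching it drops to under $1. Output tokens are billed the same either way, so they are left out.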

Specifications

Provider: anthropic
Category: llm
Context length: 1,000,000 tokens (beta; the standard window is 200,000)
Max output: 64,000 tokens
Modalities: text, image, file
License: LicenseRef-Anthropic-Commercial
Released: 2025-09-29

Pricing

Input: $3.00/Mtok
Output: $15.00/Mtok
Model ID: anthropic/claude-sonnet-4.5

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people · Messages / seat / day: 80 · Avg turn size: 2k tokens · Output share: 30%

Estimated monthly spend

$116.16

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
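The calculator's arithmetic can be reproduced in a few lines. One input is our inference rather than something the page states: the 17.6M-token total implies 22 working days a month (5 × 80 × 2,000 × 22).

```python
# Reproduces the team cost calculator for this model.
# WORKDAYS_PER_MONTH = 22 is inferred from the widget's 17.6M-token total, not documented.
INPUT_PER_MTOK = 3.00    # USD per million input tokens
OUTPUT_PER_MTOK = 15.00  # USD per million output tokens
WORKDAYS_PER_MONTH = 22


def monthly_estimate(seats: int, msgs_per_seat_per_day: int,
                     turn_tokens: int, output_share: float) -> tuple[int, float]:
    """Return (tokens per month, USD per month) for a team."""
    tokens = seats * msgs_per_seat_per_day * turn_tokens * WORKDAYS_PER_MONTH
    input_cost = tokens * (1 - output_share) / 1e6 * INPUT_PER_MTOK
    output_cost = tokens * output_share / 1e6 * OUTPUT_PER_MTOK
    return tokens, input_cost + output_cost


tokens, cost = monthly_estimate(5, 80, 2_000, 0.30)
print(f"{tokens / 1e6:.1f}M tokens, ${cost:.2f}/month")  # 17.6M tokens, $116.16/month
```

At a 30% output share every token costs a blended $6.60/Mtok, so the estimate scales linearly with seats, messages, and turn size.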

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
anthropic	1000k	$3.00/Mtok	$15.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Review a diff

Paste a unified diff and ask:

> Review this diff the way a senior engineer would. Flag bugs, risky
> logic, and anything that's going to confuse a reader six months from
> now. Don't comment on style — I have a linter for that.

Turn a PRD into a plan

> Here's a PRD. Break it into implementation phases. For each phase,
> list the files that will change, the database migrations, and the
> first test I should write. Call out anything in the PRD that's
> underspecified.

Write the commit message I should have written

> Here's the diff for my next commit. Write a commit message that
> follows the format I see in the last 5 commits in this repo. Lead
> with the why, not the what.

Reverse-engineer a schema

> I'm looking at this Prisma schema for a codebase I'm new to. Explain
> what each model is for, the invariants, and the 3-4 queries you'd
> expect the app to run most often.

Plan a refactor

> This file has grown to 800 lines and I hate touching it. Propose 3
> refactor options: surgical, middle-ground, and ambitious. For each,
> tell me what breaks and what it buys me.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs. It's supposed to batch API requests but sometimes drops the last few items:

```python
def batch_requests(items, batch_size=100):
    batches = []
    for i in range(0, len(items), batch_size):
        batches.append(items[i:i+batch_size])
    return batches[:len(items)//batch_size]
```

Output

The bug is in the return statement. You're slicing `batches` to `len(items)//batch_size`, which discards any partial batch at the end. If you have 250 items with batch_size=100, you create 3 batches: [0:100], [100:200], [200:250]. But `250//100 = 2`, so you return only the first 2 batches, dropping the last 50 items. Fix: just return `batches` directly. The loop already handles partial batches correctly, because a slice that runs past the end of a list simply stops at the last item. The truncating slice is unnecessary and wrong.
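For reference, a corrected version of the prompt's function. It returns every batch, as the output recommends, and uses a list comprehension in place of the append loop. The `batch_size` check is our addition: without it, a negative step would silently return an empty list.

```python
def batch_requests(items, batch_size=100):
    """Split items into consecutive batches; the final batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Slicing past the end is safe, so the last partial batch is kept automatically.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


batches = batch_requests(list(range(250)))
print([len(b) for b in batches])  # [100, 100, 50]
```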

Notes

This example shows Claude Sonnet 4.5's code review capability: it identifies the subtle truncation bug (floor division drops the partial final batch) and explains the math clearly. The 1M-token context window means you could paste entire codebases for review, though response quality at maximum context hasn't been independently benchmarked yet.

Prompt

I'm writing a grant proposal for a community garden project. Help me draft a 150-word paragraph explaining why urban green spaces improve mental health, citing the stress-reduction hypothesis. Keep it accessible for a general audience but credible enough for a city council review.

Output

Urban green spaces offer measurable mental health benefits through what researchers call the stress-reduction hypothesis. When people spend time in natural settings, even small parks or community gardens, their bodies show lower cortisol levels and higher heart rate variability, both signs of reduced stress. The theory suggests that natural environments provide a restorative contrast to urban stimuli: instead of processing traffic, crowds, and screens, our attention can rest on organic patterns like leaves, soil, and growth cycles. For community garden participants specifically, this effect compounds with the benefits of physical activity, social connection, and purposeful work. Studies across multiple cities show that residents with access to green spaces report fewer symptoms of anxiety and depression, making gardens not just recreational amenities but public health infrastructure.

Notes

Demonstrates the model's ability to balance technical accuracy with readability, though it lands at about 125 words against the 150-word target. The output mostly avoids overstating claims ('the theory suggests'), but phrases like 'studies across multiple cities show' would still need real citations before a city council review. Cost is negligible at this length: a paragraph this size is on the order of 200 output tokens, a fraction of a cent at $15/Mtok, so output pricing only matters at volume.

Prompt

Analyze this customer support transcript and tell me: (1) what the actual problem was, (2) whether the agent solved it, (3) one thing the agent could improve. [transcript of 40-message conversation attached as file]

Output

**Actual problem:** Customer's subscription renewed at the old $49/month rate instead of the advertised $29/month rate they saw during signup. The billing system applied a legacy pricing tier from a previous account under the same email. **Resolution status:** Partially solved. The agent issued a $20 credit for this month and flagged the account for manual review, but didn't confirm the rate would be corrected going forward. Customer accepted this but expressed uncertainty. **Improvement opportunity:** The agent asked the customer to repeat their account details three separate times across messages 8, 15, and 23. Implementing a CRM note-taking system or screen-pop would eliminate this friction and reduce the 40-message thread length significantly.

Notes

Shows the model's file handling and long-document reasoning: it can process long transcripts and extract structured insights. The analysis identifies both explicit issues (pricing) and implicit patterns (repeated questions). However, without public benchmarks, it's unclear how this compares to GPT-4 or other models on customer-support-specific tasks.

Use-case deep-dives

Multi-document contract negotiation

When 1M-token context justifies the $15/Mtok output premium

A 4-person legal ops team at a Series B SaaS company needs to cross-reference 12 vendor MSAs, 8 internal policy docs, and 20 email threads to draft redline positions. Claude Sonnet 4.5's 1M-token window can fit the entire corpus in a single prompt: no chunking, no retrieval layer, no context-loss errors. Input is cheap per token but not per session, because every follow-up in the thread re-sends the corpus, and Anthropic bills prompts above 200k tokens at a higher long-context rate. Prompt caching removes most of that repeat cost. The $15/Mtok output matters only if you're generating 50k+ token summaries daily. If your negotiation cycles run weekly and output stays under 10k tokens per session, this is the cleanest architecture: load everything into one thread, ask follow-ups, export the final position. Below 200k tokens of source material, the long window isn't doing any work. Billing is per token, so you aren't charged for unused headroom, but a cheaper model with a smaller window may handle the job at a lower rate.
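To make the re-send cost concrete, here is a sketch of a follow-up session's cost when every turn carries the full corpus and nothing is cached. The long-context rates ($6/Mtok input and $22.50/Mtok output once a request's input passes 200k tokens) are our reading of Anthropic's published pricing and should be verified before use. Conversation history and cache discounts are left out for simplicity.

```python
# Sketch: cost of a follow-up session when every turn re-sends the whole corpus.
# Assumed rates (verify before use): $3/$15 per Mtok up to 200k input tokens,
# $6/$22.50 per Mtok when a single request's input exceeds 200k tokens.
LONG_CONTEXT_THRESHOLD = 200_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one uncached request."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 6.00, 22.50
    else:
        in_rate, out_rate = 3.00, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6


def session_cost(corpus_tokens: int, turns: int, output_per_turn: int) -> float:
    """No caching: each turn pays for the full corpus again (history ignored)."""
    return sum(request_cost(corpus_tokens, output_per_turn) for _ in range(turns))


print(f"${session_cost(600_000, 5, 2_000):.2f}")
```

A 600k-token corpus with five follow-ups of about 2k output tokens each comes to roughly $18, almost all of it input.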

Design QA with screenshot context

Image + text analysis for 10-person product teams under deadline

A product team ships 6 feature branches a week and needs to validate UI consistency against a 40-page brand guide and 15 reference screenshots. Claude Sonnet 4.5 ingests the PDF guide, all screenshots, and the Figma exports in one call, then answers "does this modal match our error-state pattern?" without a human re-uploading assets. The multimodal input keeps the workflow in Slack; designers get answers in 8 seconds instead of scheduling a sync. At $3/Mtok input, a 200-image sprint review costs about $0.60 of input, assuming roughly 1,000 tokens per image, which is cheaper than one minute of meeting time. The caveat: if your QA is purely text (accessibility audits, microcopy checks), you're paying for vision you don't need. Benchmark data isn't public yet, so treat this as a known-good baseline until head-to-head tests confirm it beats GPT-4o on your specific design vocabulary.
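The $0.60 figure can be checked with Anthropic's published rule of thumb for image input, tokens ≈ (width × height) / 750 pixels. This sketch assumes every screenshot is 1000 × 750 px, small enough not to be downscaled. Larger images are resized first, which caps the cost at roughly 1,600 tokens each. The sketch covers input only.

```python
# Rule of thumb from Anthropic's vision docs: tokens ≈ width_px * height_px / 750.
# Only valid for images that are not downscaled first (about 1.15 megapixels or less).
INPUT_PER_MTOK = 3.00  # USD per million input tokens


def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate input tokens for one image."""
    return round(width_px * height_px / 750)


def review_input_cost(n_images: int, width_px: int, height_px: int) -> float:
    """Input-only cost in USD of sending n same-sized images."""
    return n_images * image_tokens(width_px, height_px) / 1e6 * INPUT_PER_MTOK


print(image_tokens(1000, 750))                      # 1000
print(f"${review_input_cost(200, 1000, 750):.2f}")  # $0.60
```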

High-frequency customer support triage

Why the 5× output premium buys nothing for simple routing

A 12-person support team triages 600 Zendesk tickets a day, routing them to L1/L2/L3 queues based on sentiment, product area, and SLA risk. Each ticket averages 400 tokens in and 150 tokens out (routing tags plus a one-line summary). That's 90k output tokens a day, about 2.7M a month, so Claude Sonnet 4.5 costs about $1.35/day on output ($15/Mtok × 90k tokens), versus about $0.27/day on a $3/Mtok output model, assuming comparable accuracy. The 1M-token context is wasted here: ticket history rarely exceeds 8k tokens, and you're not doing cross-ticket reasoning. Use this model only if your triage logic requires loading a 200k-token knowledge base per call and you can't precompute embeddings. Otherwise, you're paying a 5× output premium for context you'll never fill. At 600 tickets a day the absolute gap is modest, just under $400 a year on output, but it scales linearly with volume, and the cheaper model remains the right default for pure routing.
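The triage arithmetic can be checked with a small calculator. The second model's rates are a hypothetical placeholder: $3/Mtok output as in the text, with input assumed equal to Sonnet's so that only the output premium differs.

```python
# Recomputes the triage example: 600 tickets/day, 400 input + 150 output tokens each.
# The "cheaper model" rates are hypothetical: input assumed equal, output $3/Mtok.
def daily_cost(tickets: int, in_tok: int, out_tok: int,
               in_rate: float, out_rate: float) -> float:
    """USD per day; rates are USD per million tokens."""
    return tickets * (in_tok * in_rate + out_tok * out_rate) / 1e6


sonnet = daily_cost(600, 400, 150, 3.00, 15.00)
cheaper = daily_cost(600, 400, 150, 3.00, 3.00)
print(f"Sonnet 4.5: ${sonnet:.2f}/day, cheaper model: ${cheaper:.2f}/day")
print(f"Annual difference: ${(sonnet - cheaper) * 365:.0f}")
```

This prints about $2.07/day versus $0.99/day, an annual gap of about $394. Raise `tickets` to see where the premium starts to matter; the gap grows in direct proportion.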

Frequently asked

Is Claude Sonnet 4.5 good for complex reasoning tasks?

Yes. Sonnet 4.5 sits in Anthropic's mid-tier slot, designed for production workloads that need strong reasoning without Opus pricing. It handles multi-step logic, code generation, and document analysis well, and the 1M-token context window (beta) means you can feed it entire codebases or long research papers. If you need absolute ceiling performance, Opus exists, but Sonnet 4.5 covers most use cases at a fifth of Opus 4.1's output cost ($15 vs $75/Mtok).

Is Claude Sonnet 4.5 cheaper than GPT-4o?

It depends on which GPT-4o snapshot you compare against. The original gpt-4o-2024-05-13 listed at $5/Mtok input and $15/Mtok output, so Sonnet 4.5 ($3/$15) is 40% cheaper on input and equal on output; for read-heavy tasks like document summarization or code review, that input gap is where the savings come from. Later GPT-4o snapshots list at $2.50/$10, which undercuts Sonnet on both sides. Sonnet's clearer advantage is context: 200K tokens standard (1M in beta) against GPT-4o's 128K.

Can Claude Sonnet 4.5 handle 1 million token inputs reliably?

It accepts them, with caveats. On Anthropic's API the 1M window is a beta option (the standard window is 200K), and prompts above 200K tokens are billed at a higher long-context rate. Anthropic built this tier for document-heavy workflows, but quality at the top of the window hasn't been independently benchmarked yet, so test retrieval on your own documents before relying on it. Latency grows with input size, so a 500K-token prompt takes noticeably longer than a 250K one. If response times matter, measure where your own practical ceiling sits; beyond it, consider chunking or using retrieval.

How does Sonnet 4.5 compare to Sonnet 3.5?

Sonnet 4.5 extends the context window from 200K to 1M tokens (the 1M option is a beta on Anthropic's API) and improves instruction-following for structured outputs. Base pricing stayed at $3/$15 per Mtok, but prompts above 200K tokens are billed at a higher long-context rate, so the extra context isn't free. If your workload fits in 200K tokens, the most visible gains are in coding and agentic work, along with better formatting adherence. The larger window matters most for teams hitting context limits with 3.5 or stitching multiple calls together.

Should I use Sonnet 4.5 for customer-facing chatbots?

Yes, if your chat needs document grounding or multi-turn memory. The long context lets you keep entire conversation histories in-context without vector search. Every turn re-sends that history, though, so input cost grows with conversation length; the $3/Mtok input rate plus prompt caching keeps it manageable. Latency is acceptable for async chat (a few seconds for typical turns) but not for real-time voice. If you need sub-second responses or don't use the long context, GPT-4o mini at $0.15/$0.60 per Mtok is faster and cheaper.
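Re-sending history makes input grow faster than the conversation itself. If every turn adds a user message and a reply, the total input billed rises roughly with the square of the turn count. A minimal sketch, assuming fixed (hypothetical) message sizes and ignoring system prompts and caching:

```python
# Sketch: cumulative input tokens when each turn re-sends the full chat history.
# Message sizes are hypothetical; system prompts and caching are ignored.
def conversation_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    total = 0
    history = 0
    for _ in range(turns):
        history += user_tokens   # the new user message joins the history
        total += history         # the whole history is billed as input this turn
        history += reply_tokens  # the reply is carried into the next turn
    return total


tokens = conversation_input_tokens(turns=20, user_tokens=100, reply_tokens=300)
print(tokens, f"${tokens / 1e6 * 3.00:.2f} input at $3/Mtok")
```

Twenty turns of 100-token questions and 300-token replies bill 78,000 input tokens, about $0.23 at $3/Mtok; double the turns and the input roughly quadruples.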
