LLM · openai · Plan: Pro and up

OpenAI: GPT-5.4

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

Anyone in the Space can @-mention OpenAI: GPT-5.4 with the team's shared context - pooled credits, one chat, one memory.


Verdict

GPT-5.4 delivers OpenAI's largest context window yet at 1.05M tokens, making it the go-to for processing entire codebases, legal documents, or multi-hour transcripts in a single pass. The $2.50/$15 pricing sits between GPT-4o and o1 tiers, positioning it as a premium option when context length matters more than raw reasoning depth. Reach for this when you need to ingest massive documents without chunking, but expect to pay 50% more than GPT-4o for output tokens.

Best for

  • Processing entire codebases in one context
  • Legal document review across hundreds of pages
  • Multi-hour meeting transcript analysis
  • Research synthesis from dozens of papers
  • Long-form content editing with full manuscript context

Strengths

The 1.05M token window is the largest in OpenAI's lineup, removing the need for RAG or chunking strategies on documents under roughly 750K words. File and image modalities work natively within that context, so you can drop PDFs, spreadsheets, and screenshots directly into prompts. Pricing undercuts o1-preview by 60% on input tokens, making it viable for high-volume document workflows where reasoning complexity is moderate.

Trade-offs

Output pricing at $15/Mtok runs 50% higher than GPT-4o, so verbose responses or multi-turn conversations rack up costs quickly. Without public benchmarks, we lack hard data on reasoning performance relative to o1 or Claude Opus — early testing suggests it trades some reasoning depth for context capacity. The model is new enough that edge-case behavior and instruction-following consistency remain unproven at scale.

Specifications

Provider
openai
Category
llm
Context length
1,050,000 tokens
Max output
128,000 tokens
Modalities
text, image, file
License
proprietary
Released
2026-03-05

Pricing

Input
$2.50/Mtok
Output
$15.00/Mtok
Model ID
openai/gpt-5.4

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$110.00
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
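
The estimate above can be reproduced with simple arithmetic. The sketch below uses one set of assumptions that happens to yield the displayed figures; it is not Switchy's actual formula. Only the seat count, message rate, and per-token prices come from this page. The 22 working days per month, 2,000 tokens per message, and 70/30 input/output split are guesses, and `estimate_monthly_spend` is a hypothetical helper name.

```python
INPUT_PRICE_PER_MTOK = 2.50    # from the Pricing section
OUTPUT_PRICE_PER_MTOK = 15.00  # from the Pricing section


def estimate_monthly_spend(seats, msgs_per_day, working_days=22,
                           tokens_per_msg=2000, input_share=0.70):
    """Return (total_tokens, dollars) for one month.

    working_days, tokens_per_msg and input_share are assumed values,
    not published calculator settings.
    """
    total_tokens = seats * msgs_per_day * working_days * tokens_per_msg
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    dollars = (input_tokens * INPUT_PRICE_PER_MTOK
               + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return total_tokens, round(dollars, 2)


tokens, dollars = estimate_monthly_spend(seats=5, msgs_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month, ${dollars:.2f}")
```

Because output tokens cost six times as much as input tokens, the `input_share` assumption moves the total more than any other parameter.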

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
openai | 1050k | $2.50/Mtok | $15.00/Mtok | n/a | n/a | n/a

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Codebase Architecture Review

I've uploaded our full codebase as a ZIP. Review the architecture, identify the three highest-priority refactoring opportunities based on tight coupling or cyclomatic complexity, and explain the impact of each change.

Multi-Document Contract Comparison

I've attached five vendor agreements. Compare the indemnification, liability cap, and termination clauses across all five. Flag any inconsistencies and note which contract has the most favorable terms for us.

Research Paper Synthesis

I've uploaded 12 papers on transformer attention mechanisms. Summarize the consensus findings, highlight any conflicting results, and identify the two most-cited novel techniques introduced across these studies.

Long-Form Manuscript Edit

I've attached my 80,000-word manuscript. Evaluate pacing across all chapters, flag any narrative inconsistencies, and suggest three structural changes to tighten the middle act.

Meeting Transcript Action Items

I've uploaded a six-hour board meeting transcript. Extract all decisions made, list action items with owners, and flag any unresolved questions that need follow-up.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this 847-line Python codebase for a distributed task queue. Flag architectural issues, not style nitpicks. Focus on concurrency bugs and data-race conditions.

Output

The model identifies three critical issues: a missing lock around the shared dequeue operation in worker.py lines 203-217 that creates a race condition under high concurrency; an unbounded memory growth pattern in the retry logic where failed tasks accumulate without expiration; and a subtle deadlock risk in the shutdown sequence where workers wait on a condition variable that the coordinator may never signal if exceptions occur during cleanup. It suggests specific fixes with code snippets, referencing the exact line ranges and variable names from the uploaded files.

Notes

An 847-line codebase is likely around 10K tokens, so it fits comfortably in most models' context. The 1M+ token window matters when the same review spans hundreds of files, where GPT-5.4 can keep everything in view without summarization loss. Cost at this size is small: output is capped at 128K tokens, so even a maximum-length response costs $1.92 at $15/Mtok, and the input adds only cents. Spend becomes a concern when near-full-context reviews are rerun many times during iterative debugging.

Prompt

I'm attaching 340 pages of legal discovery documents as PDFs. Extract every mention of 'Project Falcon' with surrounding context, then build a timeline of decisions from January 2023 to present.

Output

The model processes all 340 pages, extracting 47 references to 'Project Falcon' with 2-3 sentences of surrounding context for each. It constructs a chronological timeline spanning 14 months, identifying five key decision points: initial budget approval in January 2023, scope expansion in April, vendor selection in July, the November compliance review that flagged IP concerns, and the February 2024 pivot to an in-house solution. Each timeline entry cites specific page numbers and document titles from the uploaded set.

Notes

Multi-modal file ingestion handles native PDFs without preprocessing, and the extended context keeps all 340 pages active simultaneously—no chunking artifacts or lost cross-references. The $2.50/Mtok input cost makes this a $2.60 query for the document set, reasonable for legal workflows. Output quality depends on OCR accuracy in the source PDFs.

Prompt

Generate a 12-week onboarding curriculum for a senior ML engineer joining our recommender systems team. Include reading lists, hands-on projects, and knowledge-check milestones. Assume they know PyTorch but not our internal tooling.

Output

The model produces a structured 12-week plan: weeks 1-2 cover the company's feature store architecture and data pipelines with links to internal documentation; weeks 3-5 introduce the production recommendation stack through progressively complex hands-on tasks (replicate a historical A/B test, then propose a new ranking signal); weeks 6-8 focus on model monitoring and debugging with real incident post-mortems as case studies; weeks 9-12 involve a capstone project—shipping a small ranking improvement to 5% traffic. Each week includes 3-4 curated readings, a concrete deliverable, and a 30-minute knowledge-check conversation topic.

Notes

GPT-5.4's training data extends into 2024, so it references current ML tooling (Feast, Metaflow, Ray) without anachronisms. The curriculum structure is immediately usable, not a generic template. The model doesn't hallucinate internal tool names, staying appropriately vague where it lacks specifics. At this output length (~800 tokens), cost is $0.01—negligible for planning tasks.

Use-case deep-dives

Multi-document legal discovery

When 1M+ token context justifies the $15/Mtok output premium

A 12-person litigation support team needs to cross-reference depositions, contracts, and email threads spanning 800+ pages per case. GPT-5.4's 1.05M token window lets you load entire discovery sets in one context: no chunking, no retrieval lag, no missed cross-references. At $2.50 input / $15 output per Mtok, a typical 600k-token load costs $1.50 to ingest; if the model generates 20k tokens of analysis ($0.30), you're at $1.80 per case brief. That beats the hourly cost of a paralegal re-reading files or the engineering overhead of a RAG pipeline. The trade-off: if your cases average under 200k tokens, the 1M+ window buys you nothing, since a 200k-window model such as Claude 3.5 Sonnet also fits the load (at $3/$15, slightly more per input token), so choose on quality and speed rather than context size. Above 400k tokens, GPT-5.4's context advantage pays for itself in the first query.
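
The per-brief figure follows directly from the listed prices. A minimal sketch of that arithmetic, where `query_cost` is a hypothetical helper and the token counts are the ones quoted in the paragraph above:

```python
def query_cost(input_tokens, output_tokens,
               input_price=2.50, output_price=15.00):
    """Dollar cost of one request; prices are per million tokens."""
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000


# 600k-token discovery set in, 20k tokens of analysis out
case_brief = query_cost(input_tokens=600_000, output_tokens=20_000)
print(f"${case_brief:.2f} per case brief")
```

Swapping in your own average token counts per case gives a quick sanity check before committing a workflow to the model.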

High-frequency customer support triage

Why the output cost kills this model for chat-heavy workflows

A 20-seat SaaS support team handles 2,000 tickets daily, each requiring 300-500 tokens of response (routing notes, canned replies, escalation summaries). At $15/Mtok output, 400 tokens per ticket × 2,000 tickets = 800k output tokens/day = $12/day in generation costs alone. Over a month, that's $360 in output fees before you count input tokens or API overhead. Compare Gemini 1.5 Flash at $0.30/Mtok output: the same workload costs $7.20/month. GPT-5.4's image and file handling might justify the premium if you're triaging screenshot-heavy tickets, but for text-only support queues under 5,000 tickets/day, the output pricing makes this a non-starter. Switch to a sub-$5/Mtok model unless you're also using the 1M+ context for knowledge-base grounding in every call.
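
The monthly comparison can be checked with a short calculation. This is a sketch under the paragraph's own assumptions (2,000 tickets/day, 400 output tokens per ticket, a 30-day month); `monthly_output_cost` is a hypothetical helper, and input tokens are deliberately left out.

```python
def monthly_output_cost(tickets_per_day, tokens_per_ticket,
                        price_per_mtok, days=30):
    """Output-token spend per month in dollars (input tokens excluded)."""
    daily_tokens = tickets_per_day * tokens_per_ticket
    return daily_tokens * days * price_per_mtok / 1_000_000


gpt_5_4 = monthly_output_cost(2_000, 400, price_per_mtok=15.00)
flash = monthly_output_cost(2_000, 400, price_per_mtok=0.30)
print(f"GPT-5.4: ${gpt_5_4:.2f}/month, $0.30/Mtok model: ${flash:.2f}/month")
```

The ratio between the two results is simply the ratio of the output prices (50×), so the gap grows linearly with ticket volume.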

Quarterly financial report synthesis

When infrequent, high-stakes analysis justifies the context-output combo

A 4-person finance team at a PE firm runs quarterly portfolio reviews: 10-Qs, earnings transcripts, and analyst decks from 15 portfolio companies, totaling 1.2M tokens per quarter. They need a single synthesis memo comparing revenue trends, risk disclosures, and management commentary across all holdings. At 1.2M tokens the corpus exceeds GPT-5.4's 1.05M window (922K of it input), so it does not fit in one prompt; the team needs at least two passes, for example one per half of the portfolio, followed by a merge step. Input still costs about $3.00 (1.2M tokens at $2.50/Mtok) and a 15k-token memo adds $0.225, so a run stays under $4 plus the cost of any intermediate summaries. For a task that happens 4× per year and replaces 8 hours of analyst time, a model cost of roughly $16 per year is invisible. The caveat: if you're running this monthly or need to iterate on the memo (3-4 revisions), the output cost stacks up fast. At that frequency, build a RAG system with a cheaper model or cap output tokens aggressively.
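
Whether a corpus fits in one prompt can be checked against the limits in the Specifications section. The sketch below assumes the input budget is the context length minus the maximum output (which matches the 922K input figure in the overview) and reserves a guessed 2,000 tokens per pass for instructions; `passes_needed` is a hypothetical helper.

```python
import math

CONTEXT_WINDOW = 1_050_000                  # from Specifications
MAX_OUTPUT = 128_000                        # from Specifications
INPUT_BUDGET = CONTEXT_WINDOW - MAX_OUTPUT  # 922,000 input tokens


def passes_needed(corpus_tokens, prompt_overhead=2_000):
    """Minimum number of prompts needed to read the whole corpus.

    prompt_overhead (instructions repeated in each pass) is an assumed value.
    """
    usable = INPUT_BUDGET - prompt_overhead
    return math.ceil(corpus_tokens / usable)


print(passes_needed(1_200_000))  # quarterly corpus from the example above
print(passes_needed(600_000))    # a smaller corpus that fits in one pass
```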

Frequently asked

Is GPT-5.4 good for complex reasoning tasks?

Probably, but it is unproven. GPT-5.4 is OpenAI's latest frontier model with a 1.05M token context window, which suits multi-document analysis and extended logical chains. No public benchmark scores are available yet. Early access reports suggest improvements over GPT-4 on math, code debugging, and multi-step planning, while early testing also suggests it trades some reasoning depth for context capacity. Treat any comparison with o1-preview on complex tasks as unconfirmed until benchmarks are published.

Is GPT-5.4 cheaper than Claude Opus 4.7?

It depends on Anthropic's current rates, which this page does not list. GPT-5.4 costs $2.50 input and $15.00 output per million tokens; compare those figures against Claude Opus 4.7's published prices for your own input/output mix. GPT-5.4's pricing makes most sense for tasks that need the massive context window or OpenAI's specific reasoning style. Run cost projections on your actual token usage before committing.

Can GPT-5.4 process entire codebases in one prompt?

Mostly yes. The 1.05M token window fits approximately 750,000 words or 200-300 typical source files. That covers most small-to-medium repositories. For larger monorepos, you'll still need chunking strategies. The model accepts file uploads directly, which simplifies ingestion compared to pasting code. Latency increases with context size, so expect 15-30 second response times on full-context queries.
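
A rough way to tell whether a repository falls on the "fits" side of that line is to estimate its token count before uploading. The sketch below assumes about 4 characters per token, a common rule of thumb rather than GPT-5.4's actual tokenizer, and uses the 922K input figure from the overview. The function names and the extension list are illustrative.

```python
import os

CHARS_PER_TOKEN = 4      # rough rule of thumb (assumption); real tokenizers vary
INPUT_BUDGET = 922_000   # input share of the window quoted in the overview


def estimate_repo_tokens(root, extensions=(".py", ".ts", ".go", ".md")):
    """Approximate token count of matching source files under root."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total_chars += len(fh.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_one_prompt(root):
    """True if the estimated size is within the input budget."""
    return estimate_repo_tokens(root) <= INPUT_BUDGET
```

Calling `fits_in_one_prompt("path/to/repo")` gives a first answer; leave headroom below the budget, since the estimate can be off by a wide margin for code-heavy or non-English text.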

How does GPT-5.4 compare to GPT-4.5 Turbo?

GPT-5.4 is a different model class focused on reasoning depth rather than speed. It's slower and more expensive than GPT-4.5 Turbo but handles complex logic, math proofs, and multi-step analysis significantly better. If your use case is customer support chat or content drafting, stick with 4.5 Turbo. For research synthesis, code architecture decisions, or technical writing requiring deep reasoning, 5.4 justifies the cost.

Should I use GPT-5.4 for production chatbots?

Probably not. The $15/Mtok output pricing and slower inference make it impractical for high-volume conversational AI. Use GPT-4.5 Turbo or Claude Sonnet for customer-facing chat. Reserve GPT-5.4 for backend tasks where reasoning quality matters more than speed: complex query routing, escalation analysis, technical documentation generation, or internal tools where users tolerate 10-20 second response times.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.