LLM · openai · Plan: Pro and up

OpenAI: GPT-5 Codex

GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

Anyone in the Space can @-mention OpenAI: GPT-5 Codex with the team's shared context - pooled credits, one chat, one memory.


Verdict

GPT-5 Codex targets code generation and technical reasoning with a 400K context window that can hold a small-to-medium codebase in one pass. At $1.25/$10 per Mtok, it sits between budget and premium tiers: well below o1 on input, but with output priced well above budget models. The lack of public benchmarks makes this a wait-and-see pick until independent evals surface, but the context size and multimodal support position it for repository-scale refactoring and technical documentation work where you need to reference dozens of files simultaneously.

Best for

Whole-codebase refactoring and analysis
Technical documentation from screenshots
Multi-file code review with context
Repository-wide dependency mapping
Architecture diagrams to implementation

Strengths

The 400K context window lets you load an entire small-to-medium repository without chunking or retrieval tricks. Multimodal input means you can paste architecture diagrams, UI mockups, or error screenshots alongside code. Input pricing at $1.25/Mtok undercuts o1-preview's $15/Mtok by roughly 90%, making exploratory passes over large codebases economical. The Codex lineage suggests strong function-calling and structured output for tool integration, though this is unconfirmed.

Trade-offs

Output costs at $10/Mtok climb fast if you generate long diffs or documentation: a 10K-token refactor costs $0.10, about 25 times what you'd pay at Gemini 2.0 Flash's $0.40/Mtok output rate. No public benchmarks exist yet, so claims about code quality or reasoning depth remain unverified against HumanEval, MBPP, or SWE-bench. Vision capabilities are listed but unspecified, so it is unclear whether it matches GPT-4o's OCR or diagram parsing fidelity. Early-access models often ship with rate limits that throttle production use.

Specifications

Provider: openai
Category: llm
Context length: 400,000 tokens
Max output: 128,000 tokens
Modalities: text, image
License: proprietary
Released: 2025-09-23

Pricing

Input: $1.25/Mtok
Output: $10.00/Mtok
Model ID: openai/gpt-5-codex

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
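
As a rough sketch of how the per-token prices above translate into the cost of a single request, the snippet below multiplies token counts by the listed rates. The function name `request_cost` is hypothetical and chosen for illustration; it covers upstream list prices only and says nothing about how Switchy converts usage into credits.

```python
# Prices taken from the Pricing section above (USD per million tokens).
INPUT_USD_PER_MTOK = 1.25
OUTPUT_USD_PER_MTOK = 10.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the upstream cost in USD of one request at list prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A 10K-token generated refactor, counting output only.
    print(f"${request_cost(0, 10_000):.2f}")        # $0.10
    # Filling the whole 400K context once and getting 10K tokens back.
    print(f"${request_cost(400_000, 10_000):.2f}")  # $0.60
```

Even a full-context prompt is dominated by output once responses get long, which is why the output rate matters more than the input rate for diff-heavy work.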

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30 %

Estimated monthly spend

$68.20

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
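
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not Switchy's actual calculator: the figure of 22 working days per month is not stated on the page and is assumed here only because it reproduces the 17.6M tokens and $68.20 shown, and the function name `monthly_estimate` is hypothetical.

```python
def monthly_estimate(
    seats: int,
    msgs_per_seat_per_day: int,
    tokens_per_turn: int,
    output_share: float,
    days_per_month: int = 22,    # assumption: working days, not stated on the page
    input_price: float = 1.25,   # USD per Mtok, from the Pricing section
    output_price: float = 10.00, # USD per Mtok, from the Pricing section
) -> tuple[int, float]:
    """Return (total tokens per month, estimated USD spend per month)."""
    total_tokens = seats * msgs_per_seat_per_day * tokens_per_turn * days_per_month
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cost = (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)
    return total_tokens, cost


if __name__ == "__main__":
    tokens, cost = monthly_estimate(5, 80, 2_000, 0.30)
    print(f"{tokens / 1_000_000:.1f}M tokens / month")  # 17.6M tokens / month
    print(f"${cost:.2f}")                               # $68.20
```

With these inputs, the 30% output share accounts for $52.80 of the total and the 70% input share for $15.40, so output share is the setting to watch.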

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
openai	400k	$1.25/Mtok	$10.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Codebase Dependency Audit

Analyze all files in this repository. List every function that's called from more than three different modules, then suggest which ones should be extracted into a shared utility layer.


Screenshot to Component

Generate a React functional component that replicates this UI screenshot. Use Tailwind for styling and include prop types for any dynamic content you identify.


Multi-File Refactor Plan

I want to rename the `UserService` class to `AccountService` across this codebase. Generate a refactor plan that lists every file to change, in dependency order, with before-and-after diffs.


API Documentation from Code

Scan all Express route files and generate an OpenAPI 3.0 spec. Include example request bodies and response schemas for each endpoint.


Architecture Diagram Explanation

This diagram shows our microservices architecture. For each service box, describe its responsibilities, the APIs it exposes, and which databases it connects to.


Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Refactor this React component to use server actions instead of client-side fetch. Keep the error handling but simplify the loading states.

Output

The model would produce a clean refactor moving data fetching to a server action with proper TypeScript types, eliminating useState for loading while preserving error boundaries. It typically structures the response with inline comments explaining the architectural shift, uses modern Next.js 14+ patterns, and suggests file organization (separating the action into actions/user.ts). The code maintains the original component's UI contract while reducing client bundle size.

Notes

This showcases GPT-5 Codex's strength in architectural refactoring across framework boundaries. With a 400k token context window, it can hold entire codebases in memory to ensure the refactor doesn't break distant imports. The trade-off: at $10/Mtok output, generating large refactors costs more than specialized code models, so it's best for complex migrations rather than routine edits.

Prompt

Analyze this PostgreSQL query plan and explain why the nested loop is causing the timeout. Suggest an index strategy that works with our existing schema constraints.

Output

The model would identify that the nested loop scans 2.3M rows because the join condition lacks a covering index on the timestamp column, then explain how the planner chose nested loop over hash join due to outdated statistics. It would propose a partial index on active records with the timestamp included, show the expected plan change with EXPLAIN output, and note that the index won't help the monthly report query that needs full table scans anyway.

Notes

This demonstrates deep reasoning about database internals and schema trade-offs: GPT-5 Codex connects query plans to real-world constraints rather than offering generic 'add an index' advice. The multimodal capability means you can paste screenshots of pgAdmin explain visualizations. The limitation: no access to your actual table statistics, so recommendations assume typical data distributions.

Prompt

I have a flowchart image showing our deployment pipeline. Write Terraform modules that implement this architecture, including the approval gates and rollback logic shown in the diagram.

Output

The model would parse the flowchart's boxes and arrows to generate modular Terraform with separate files for each stage (build.tf, staging.tf, prod.tf). It would implement the manual approval gates using null_resource with local-exec calling a webhook, add lifecycle rules matching the rollback arrows, and include a README explaining how the generated code maps back to each flowchart element. Variable defaults would match the environment labels visible in the image.

Notes

This highlights the image understanding capability: GPT-5 Codex can translate visual architecture diagrams into infrastructure code, saving the translation step. The 400k context window means it can reference both the image and your existing Terraform state in one prompt. Trade-off: it can't verify the generated Terraform against your actual AWS account, so you'll need to run `terraform plan` before applying.

Use-case deep-dives

Multi-file refactoring sessions

When 400k context justifies the $10/Mtok output cost on refactors

A 12-person product team needs to refactor a legacy monorepo spanning 80+ files without losing type safety or breaking API contracts. GPT-5 Codex's 400k context window means you can load the entire module graph (controllers, models, tests, and config) into a single session and ask for cross-file rewrites that preserve call signatures. The $10/Mtok output price stings on exploratory work, but when you're generating 50k tokens of production-ready diff in one pass, the time saved on manual reconciliation pays back the cost. If your refactor touches fewer than 20 files or you're prototyping rather than shipping, drop to a cheaper model with 128k context and do the stitching yourself.

Technical documentation generation

Why the image modality matters for auto-generating API docs from screenshots

A 4-person SaaS startup ships a new dashboard every two weeks and needs to keep Notion docs in sync with the UI. GPT-5 Codex's image input lets you paste a screenshot of the settings panel alongside the React component source, then generate the user-facing explanation in one prompt. The model cross-references what it sees in the image with the prop definitions in code, so the output matches both the visual hierarchy and the actual behavior. At $1.25/Mtok input, processing 10 screenshots per sprint costs under a dollar. If you're only documenting APIs without a UI layer, the image modality is wasted, so use a text-only model at half the input cost.

High-frequency code review

When GPT-5 Codex is too expensive for PR comment automation

A 30-person engineering org wants to auto-comment on every pull request with style fixes and logic suggestions. At 200 PRs/week, with each review generating output on the order of its 8k-token diff, you're producing 1.6M tokens of output per week, or $16 at GPT-5 Codex's $10/Mtok rate. The model's 400k context is overkill for single-PR review, and the lack of public benchmarks means you're paying premium pricing without proof it outperforms cheaper alternatives on linting or bug-spotting tasks. For this volume, switch to a model priced under $2/Mtok output and reserve GPT-5 Codex for the 5% of PRs that touch architectural boundaries where the extra context actually matters.
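
To make the routing trade-off above concrete, here is a minimal sketch comparing an all-GPT-5-Codex setup with a split where only 5% of PRs go to it. The $2/Mtok figure for the cheaper model is an assumption taken from the "under $2/Mtok" ceiling in the text, not a quoted price for any specific model, and `weekly_review_cost` is a hypothetical helper name.

```python
def weekly_review_cost(
    prs_per_week: int,
    output_tokens_per_pr: int,
    premium_share: float,
    premium_price: float = 10.00,  # GPT-5 Codex output, USD per Mtok
    cheap_price: float = 2.00,     # assumption: upper bound for the cheaper model
) -> float:
    """Weekly output cost in USD when a share of PRs goes to the premium model."""
    premium_prs = prs_per_week * premium_share
    cheap_prs = prs_per_week - premium_prs
    premium_cost = premium_prs * output_tokens_per_pr / 1_000_000 * premium_price
    cheap_cost = cheap_prs * output_tokens_per_pr / 1_000_000 * cheap_price
    return premium_cost + cheap_cost


if __name__ == "__main__":
    # Every PR reviewed by GPT-5 Codex.
    print(f"${weekly_review_cost(200, 8_000, 1.0):.2f}")   # $16.00
    # Only the 5% of PRs touching architectural boundaries.
    print(f"${weekly_review_cost(200, 8_000, 0.05):.2f}")  # $3.84
```

This counts output tokens only; input cost for the diffs themselves would be added on top for both models.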

Frequently asked

Is GPT-5 Codex good for coding?

Yes, GPT-5 Codex is purpose-built for code generation and debugging. With a 400K token context window, it can hold entire codebases in memory while suggesting changes. The multimodal support lets you paste screenshots of error messages or UI mockups. Expect strong performance on refactoring, test generation, and explaining legacy code.

Is GPT-5 Codex cheaper than Claude Sonnet for code tasks?

On list price, yes. Claude 3.5 Sonnet is listed at $3/$15 per Mtok, so GPT-5 Codex's $1.25/$10 is lower on both input and output. Price alone rarely decides it, though: pick GPT-5 Codex when you need the 400K context window for large repos or when image input matters for debugging UI code.

Can GPT-5 Codex handle a full monorepo in one prompt?

Depends on repo size. The 400K token context fits roughly 300K words or 120K lines of code. Most mid-sized monorepos fit; enterprise-scale repos with millions of lines won't. For repos that exceed the limit, chunk by module or use retrieval-augmented generation to pull relevant files only.
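
One way to sketch the "chunk by module" fallback mentioned above is shown below. It assumes a crude four-characters-per-token heuristic rather than a real tokenizer, treats the top-level directory as the module boundary, and uses hypothetical names (`estimate_tokens`, `chunk_by_module`); in practice you would also reserve part of the budget for the prompt and the model's output.

```python
from collections import defaultdict

CONTEXT_TOKENS = 400_000
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def chunk_by_module(files: dict[str, str], budget: int = CONTEXT_TOKENS) -> list[list[str]]:
    """Group files by top-level directory, then pack whole modules into chunks.

    A module is never split, so a single module larger than the budget
    still ends up alone in an oversized chunk and needs further handling.
    """
    modules: dict[str, list[str]] = defaultdict(list)
    for path in sorted(files):
        modules[path.split("/", 1)[0]].append(path)

    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for module in sorted(modules):
        paths = modules[module]
        size = sum(estimate_tokens(files[p]) for p in paths)
        if current and used + size > budget:
            chunks.append(current)  # close the chunk before it overflows
            current, used = [], 0
        current.extend(paths)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

If `chunk_by_module` returns a single chunk, the repository fits in one prompt under this estimate; more than one chunk means each group has to be sent separately or replaced with retrieval of only the relevant files.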

How does GPT-5 Codex compare to GPT-4 Turbo for code?

GPT-5 Codex roughly triples GPT-4 Turbo's context window (400K vs 128K) and accepts screenshots as image input for debugging. List pricing is lower: $1.25/$10 per Mtok against GPT-4 Turbo's $10/$30. With no public benchmarks yet, reports of better reasoning on complex algorithms and fewer hallucinated function names are anecdotal. Upgrade if context size blocks you today.

Should I use GPT-5 Codex for production code review automation?

Yes, if you can afford the output cost. The large context window means it reviews entire pull requests without truncation. Image support helps when PRs include UI changes. Set up structured output formatting to get consistent JSON or markdown reports. Budget $0.01–0.05 per review depending on PR size.
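
As an illustration of the structured-output advice above, the sketch below validates a JSON review report before it is posted to a PR. The schema (a `comments` list with `file`, `line`, `severity`, and `message`) is a hypothetical format invented for this example, not one defined by OpenAI or Switchy; you would describe whatever schema you choose in the prompt and reject responses that do not match it.

```python
import json
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"info", "warning", "error"}  # hypothetical severity levels


@dataclass
class ReviewComment:
    file: str
    line: int
    severity: str
    message: str


def parse_review(raw: str) -> list[ReviewComment]:
    """Parse a JSON review report, raising if an entry breaks the schema."""
    data = json.loads(raw)
    comments = []
    for item in data["comments"]:
        comment = ReviewComment(
            file=str(item["file"]),
            line=int(item["line"]),
            severity=str(item["severity"]),
            message=str(item["message"]),
        )
        if comment.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {comment.severity!r}")
        comments.append(comment)
    return comments
```

Malformed JSON raises `json.JSONDecodeError` and a missing field raises `KeyError`, so the caller can retry the request or skip the PR instead of posting a broken report.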