OpenAI: GPT-4

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning...

Anyone in the Space can @-mention OpenAI: GPT-4 with the team's shared context - pooled credits, one chat, one memory.

Verdict

GPT-4 set the bar for reasoning and instruction-following when it launched in 2023, but its 8K context window and $30/$60 per Mtok pricing now look dated against newer alternatives. It still performs reliably on complex reasoning tasks and keeps strong safety guardrails, though longer documents hit the context limit quickly. Reach for it if you need proven stability on established workflows; for new projects where cost and context matter, consider GPT-4o or Claude Sonnet 4.

Best for

  • Legacy workflows requiring GPT-4 compatibility
  • Complex multi-step reasoning tasks
  • Applications with strict safety requirements
  • Tasks under 6K tokens of context

Strengths

GPT-4 established the template for modern instruction-following and excels at breaking down complex problems into logical steps. Its training emphasized safety and refusal behavior, making it predictable in production environments where content filtering matters. The model handles nuanced instructions well and maintains coherent reasoning across multi-turn conversations within its context window. Its maturity means extensive community knowledge and debugging resources.

Trade-offs

The 8K context window is the primary limitation: a single long document or a few turns of conversation with code snippets will exhaust it. At $30 input and $60 output per million tokens, it costs 6-12x as much as GPT-4o while delivering similar or inferior performance on most tasks. It lacks vision capabilities, and function calling feels less polished than in GPT-4 Turbo or GPT-4o. For new projects, the cost-performance ratio rarely justifies choosing this over newer OpenAI models.

Specifications

Provider: openai
Category: llm
Context length: 8,191 tokens
Max output: 4,096 tokens
Modalities: text
License: proprietary
Released: 2023-05-28

Pricing

Input: $30.00/Mtok
Output: $60.00/Mtok
Model ID: openai/gpt-4

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$686.40
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
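
The calculator's inputs beyond seats and messages per day are not shown on this page. As a rough sketch, the figures above can be reproduced in Python under three assumptions that are ours, not Switchy's: 22 working days per month, 2,000 tokens per message, and a 70/30 input/output token split.

```python
# Listed GPT-4 prices from this page, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 30.00
OUTPUT_PRICE_PER_MTOK = 60.00


def monthly_spend(seats, msgs_per_day, working_days=22,
                  tokens_per_msg=2_000, input_share=0.7):
    """Return (total_tokens, usd) for one month.

    working_days, tokens_per_msg and input_share are assumed values chosen
    to match the example on this page; they are not documented inputs.
    """
    total_tokens = seats * msgs_per_day * working_days * tokens_per_msg
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    usd = (input_tokens * INPUT_PRICE_PER_MTOK
           + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return total_tokens, round(usd, 2)


tokens, usd = monthly_spend(seats=5, msgs_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month, ${usd:,.2f}")
```

Changing any of the three assumed defaults moves the estimate, so treat the result as a way to sanity-check the calculator rather than as its actual formula.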

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
openai | 8k | $30.00/Mtok | $60.00/Mtok | | |

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Multi-Step Problem Breakdown

Break down this problem into discrete steps, explaining your reasoning at each stage: [describe your problem]. Number each step and show dependencies between them.

Nuanced Policy Interpretation

Read this policy document and identify edge cases where the rules might conflict or require judgment calls: [paste policy text]. Explain your interpretation for each case.

Structured Data Extraction

Extract key entities from this text and return them as JSON with fields for name, role, and relationship: [paste text]. Ensure valid JSON syntax.
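
A reply to this prompt is only useful downstream if it actually parses. The sketch below is one hypothetical way to check it in Python; `parse_entities` is an illustrative helper, and it assumes the reply is a JSON array of objects, which the prompt itself does not specify.

```python
import json

# Field names taken from the prompt above.
REQUIRED_FIELDS = ("name", "role", "relationship")


def parse_entities(reply: str) -> list[dict]:
    """Parse a model reply and check each entity has the requested fields.

    Assumes the top-level value is a JSON array of objects; adjust if your
    replies use a different shape.
    """
    data = json.loads(reply)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of entities")
    for index, entity in enumerate(data):
        if not isinstance(entity, dict):
            raise ValueError(f"entity {index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in entity]
        if missing:
            raise ValueError(f"entity {index} is missing fields: {missing}")
    return data
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` once to handle both malformed JSON and missing fields.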

Code Review with Explanations

Review this code for logic errors, edge cases, and maintainability issues: [paste code]. For each issue, explain the potential impact and suggest a fix.

Argument Evaluation

Analyze the logical structure of this argument and identify any fallacies, unsupported claims, or gaps in reasoning: [paste argument]. Rate the overall strength of the conclusion.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Refactor this Python function to use list comprehensions and remove the nested loops. Keep the same logic but make it more Pythonic.

Output

The model would produce a clean refactor that replaces nested for-loops with a single list comprehension or generator expression, preserving edge cases like empty inputs. It typically adds inline comments explaining non-obvious transformations and suggests type hints if the original code lacked them. The refactored version maintains readability while reducing line count by 30-40%.

Notes

GPT-4 excels at idiomatic rewrites that balance brevity with clarity. Its 8K context window handles medium-sized functions comfortably, though it may truncate explanations when refactoring larger modules. The model occasionally over-optimizes for conciseness at the expense of beginner readability.
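
For a concrete picture of the kind of rewrite described, here is a hypothetical before-and-after pair. Both functions are invented for this illustration and are not recorded model output.

```python
def pairs_with_sum_nested(numbers, target):
    """Before: nested index loops collecting pairs that add up to target."""
    result = []
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] + numbers[j] == target:
                result.append((numbers[i], numbers[j]))
    return result


def pairs_with_sum(numbers: list[int], target: int) -> list[tuple[int, int]]:
    """After: one comprehension that yields the same pairs in the same order."""
    return [
        (a, b)
        for i, a in enumerate(numbers)
        for b in numbers[i + 1:]  # only look at later elements, as before
        if a + b == target
    ]
```

An empty input falls through both versions to an empty list, so the edge case mentioned above is preserved without special handling.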

Prompt

Explain the CAP theorem to a product manager who needs to choose between PostgreSQL and Cassandra for a new feature. Focus on practical trade-offs, not theory.

Output

The model would frame CAP theorem as a decision tree: PostgreSQL guarantees consistency and tolerates partition failures by sacrificing some availability (transactions may block). Cassandra prioritizes availability and partition tolerance, meaning reads might return stale data briefly. It would then map these to product scenarios—'use Postgres if you need strict inventory counts; use Cassandra if you need a global activity feed that can't go down.' The explanation avoids academic jargon.

Notes

GPT-4 translates technical concepts into business context effectively, a strength for cross-functional documentation. The 8K window allows it to include 2-3 concrete examples without truncation. However, it sometimes hedges excessively ('it depends') rather than making a clear recommendation when one exists.

Prompt

Draft a three-paragraph email declining a vendor proposal. Tone: polite but firm. Reason: their pricing model doesn't align with our usage patterns, but we want to stay in touch.

Output

The model would produce a structured email opening with appreciation for the proposal, a second paragraph diplomatically explaining the pricing mismatch (e.g., 'our intermittent usage makes flat-rate licensing cost-prohibitive'), and a closing paragraph expressing interest in future conversations if their model evolves. The tone balances professionalism with warmth—no corporate clichés, no over-apologizing.

Notes

GPT-4 handles nuanced communication tasks well, capturing tone constraints without sounding robotic. At $60/Mtok output, this use case is expensive relative to simpler models that could draft emails adequately. The model shines when the task requires reading subtext or navigating political sensitivity.

Use-case deep-dives

Multi-round client proposal editing

When GPT-4 justifies the premium on high-stakes writing

A 4-person consulting shop sends 8-12 proposals a month, each going through 3-4 revision cycles with client feedback. GPT-4 at $30/$60 per Mtok costs roughly $0.18 in output tokens per 3,000-token proposal draft, negligible against a $40k contract. The 8k context window can hold the RFP, previous proposal sections, and client comments in one prompt, as long as they fit alongside the 3,000-token draft, so the model rewrites with all context in view. Cheaper models at this context length are more likely to hallucinate requirements or lose the thread between sections. The cost threshold: if you're drafting under 20 documents a month where each mistake costs you a deal, GPT-4's reliability pays for itself. Beyond that volume, test whether GPT-4o or Claude 3.5 Sonnet hit your quality bar at a fraction of the output cost.

Structured data extraction from PDFs

Where GPT-4 loses to newer models on repetitive extraction

A 10-person finance team processes 200 invoices a week, pulling vendor names, line items, and totals into a database. GPT-4 can handle the task but at $30 input per Mtok, scanning 200 PDFs (average 4k tokens each) costs $24/week or $1,248/year. The 8k context cap means multi-page invoices need chunking, which introduces errors at page boundaries. GPT-4o runs the same workload at $2.50 input per Mtok—$2/week—with a 128k window that swallows entire documents. GPT-4's instruction-following was best-in-class in 2023, but for high-volume structured tasks where the schema is fixed, newer models with lower input pricing and larger windows are the correct call. Use GPT-4 here only if you're on a legacy integration that hasn't migrated to the newer endpoint.
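
The weekly and yearly figures above follow directly from the quoted input prices. A minimal sketch of that arithmetic, counting input tokens only (output tokens are ignored, as in the text) and assuming 52 weeks per year:

```python
INVOICES_PER_WEEK = 200
AVG_TOKENS_PER_INVOICE = 4_000
WEEKS_PER_YEAR = 52  # assumption used for the yearly figure

# Input prices in USD per million tokens, as quoted in the text.
INPUT_PRICE_PER_MTOK = {"gpt-4": 30.00, "gpt-4o": 2.50}


def weekly_input_cost(model):
    """Input-token cost in USD for one week of invoices."""
    tokens = INVOICES_PER_WEEK * AVG_TOKENS_PER_INVOICE
    return tokens * INPUT_PRICE_PER_MTOK[model] / 1_000_000


for model in INPUT_PRICE_PER_MTOK:
    weekly = weekly_input_cost(model)
    print(f"{model}: ${weekly:.2f}/week, ${weekly * WEEKS_PER_YEAR:,.2f}/year")
# gpt-4: $24.00/week, $1,248.00/year
# gpt-4o: $2.00/week, $104.00/year
```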

Real-time Slack support triage

Why GPT-4's latency and cost don't fit live chat workflows

A 15-person SaaS startup wants an AI to read incoming Slack support threads and tag them as billing, technical, or sales before routing. GPT-4 handles the classification but averages 4-6 seconds per response and costs about $8 per 1,000 customer messages at typical token counts. At 500 messages a day, that's about $120/month and a noticeable lag that frustrates customers expecting instant acknowledgment. GPT-4o delivers sub-2-second responses at $15/month for the same volume, and the 128k context window lets the model see the full thread history without truncation. GPT-4 made sense when it was the only reliable classifier; now it's the wrong tool for any workflow where humans are waiting on the model to respond. Save it for the complex escalations that need the extra reasoning depth.

Frequently asked

Is GPT-4 still good for general text tasks in 2025?

Yes, but it's outclassed by newer models in most categories. GPT-4 handles reasoning, summarisation, and creative writing competently, but GPT-4 Turbo, Claude 3.5 Sonnet, and Gemini 1.5 Pro all deliver better performance at lower cost. Use GPT-4 only if you need the original model for consistency with existing workflows or specific fine-tuned behaviour.

Is GPT-4 cheaper than GPT-4 Turbo or Claude?

No. At $30 input and $60 output per Mtok, GPT-4 costs roughly 2-3× as much as GPT-4 Turbo ($10/$30) and 4-10× as much as Claude 3.5 Sonnet ($3/$15), depending on the input/output mix. The 8K context window makes it even less economical for document-heavy work. Unless you're locked into the original GPT-4 API for legacy reasons, switch to a newer model.
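
How large the gap is depends on whether a workload is input-heavy or output-heavy. A small sketch that computes both ratios from the prices quoted in this answer (`price_ratios` is an illustrative helper, not part of any API):

```python
# (input, output) prices in USD per million tokens, as quoted above.
PRICES = {
    "GPT-4": (30.0, 60.0),
    "GPT-4 Turbo": (10.0, 30.0),
    "Claude 3.5 Sonnet": (3.0, 15.0),
}


def price_ratios(baseline, other):
    """Return (input_ratio, output_ratio) of baseline price over other price."""
    base_in, base_out = PRICES[baseline]
    other_in, other_out = PRICES[other]
    return base_in / other_in, base_out / other_out


for name in ("GPT-4 Turbo", "Claude 3.5 Sonnet"):
    in_ratio, out_ratio = price_ratios("GPT-4", name)
    print(f"GPT-4 vs {name}: {in_ratio:.0f}x input, {out_ratio:.0f}x output")
# GPT-4 vs GPT-4 Turbo: 3x input, 2x output
# GPT-4 vs Claude 3.5 Sonnet: 10x input, 4x output
```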

Can GPT-4 handle long documents with its 8K context?

Not effectively. 8,191 tokens is roughly 6,000 words — enough for short reports but inadequate for research papers, legal contracts, or codebases. GPT-4 Turbo offers 128K tokens, Claude 3.5 Sonnet gives you 200K, and Gemini 1.5 Pro reaches 2M. For anything beyond basic chat, the context limit is a deal-breaker.
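
For a quick pre-flight check, a word count gives a rough idea of whether a document will fit. The sketch below assumes about 0.75 English words per token, a common rule of thumb that matches the estimate above; an exact count requires the model's tokenizer, and `reserved_for_output` is a hypothetical allowance for the reply.

```python
import math

CONTEXT_LIMIT_TOKENS = 8_191  # GPT-4 context length listed on this page
WORDS_PER_TOKEN = 0.75        # assumed rule of thumb for English prose


def estimated_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 1_000) -> bool:
    """True if the estimated prompt plus the reserved reply fits the window."""
    return estimated_tokens(text) + reserved_for_output <= CONTEXT_LIMIT_TOKENS
```

Code, non-English text, and dense tables usually tokenize less efficiently than prose, so leave extra headroom for those.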

How does GPT-4 compare to GPT-4 Turbo?

GPT-4 Turbo is faster, cheaper, and has 16× the context window. The original GPT-4 was OpenAI's flagship in 2023, but Turbo replaced it as the default for good reason. Performance is comparable on most tasks, with Turbo occasionally scoring higher on coding and maths benchmarks. Stick with Turbo unless you need exact GPT-4 behaviour for reproducibility.

Should I use GPT-4 for production chatbots?

Only if you're maintaining a legacy system. The 8K context means conversations truncate quickly, and the $60/Mtok output pricing makes high-volume chat expensive. GPT-4 Turbo, Claude 3.5 Sonnet, or even GPT-3.5 Turbo deliver better cost-per-conversation metrics. For new deployments, choose a model with larger context and lower output costs.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.