OpenAI: gpt-oss-20b
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
Anyone in the Space can @-mention OpenAI: gpt-oss-20b with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- High-volume API calls on tight budgets
- Document analysis under 130k tokens
- Prototyping before scaling to larger models
- Teams locked into OpenAI tooling
Strengths
Pricing is the headline: at $0.03 input and $0.14 output per million tokens, this undercuts GPT-4o mini by approximately 75% on input and 50% on output. The 131k context window covers most single-document tasks without splitting. OpenAI API compatibility means drop-in replacement for existing integrations — no SDK rewrites, no new auth flows. For teams running thousands of daily requests on simpler tasks, the cost savings compound quickly.
Trade-offs
No public benchmarks means you're flying blind on reasoning depth, factual accuracy, and instruction-following relative to established alternatives like GPT-4o mini or Claude Haiku. Early OpenAI releases sometimes ship with unannounced capability gaps or rate-limit quirks. The 20B parameter count suggests lighter reasoning than flagship models, so expect weaker performance on multi-step logic, nuanced tone control, or domain-specific jargon. If your workflow depends on proven accuracy, wait for independent evals before committing production traffic.
Specifications
- Provider
- openai
- Category
- llm
- Context length
- 131,072 tokens
- Max output
- —
- Modalities
- text
- License
- proprietary
- Released
- 2025-08-05
Pricing
- Input
- $0.03/Mtok
- Output
- $0.14/Mtok
- Model ID
openai/gpt-oss-20b
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| openai | 131k | $0.03/Mtok | $0.14/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Summarize Long Documents
Read the attached document in full. Extract the three most important findings, each with a one-sentence explanation and supporting evidence from the text.Open in a Space →
Batch Email Drafts
Write five short email drafts for a product launch announcement. Vary the tone from formal to casual. Each email should be under 100 words and include a clear call-to-action.Open in a Space →
Extract Structured Data
Parse the following customer feedback and return a JSON array with fields: customer_name, issue_category, sentiment (positive/negative/neutral), and priority (high/medium/low).Open in a Space →
Rewrite for Clarity
Rewrite the following technical paragraph for a general audience. Use simple sentences, avoid jargon, and keep it under 80 words.Open in a Space →
Generate Test Data
Generate 20 realistic user profiles in JSON format. Each profile should include: name, email, age (18-65), city, and a short bio (one sentence).Open in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance: def merge_sorted_lists(list1, list2): result = [] i = j = 0 while i < len(list1) and j < len(list2): if list1[i] < list2[j]: result.append(list1[i]) i += 1 else: result.append(list2[j]) j += 1 return result
The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, any remaining items in list1 or list2 are silently dropped. Add `result.extend(list1[i:])` and `result.extend(list2[j:])` after the loop. For performance, consider that repeated `len()` calls in the loop condition are unnecessary — cache these values beforehand. The function also lacks input validation; passing non-list types or unsorted lists will produce incorrect results without warning. Consider adding type hints and a docstring explaining the sorted-input requirement.
This example demonstrates gpt-oss-20b's code review capability with attention to correctness over style. The 131K token context window supports reviewing entire modules, though this example stays focused. The model identifies the algorithmic flaw immediately but doesn't provide a complete rewrite — useful for learning, less so for copy-paste fixes.
Explain how OAuth 2.0 authorization code flow works to a junior developer who understands HTTP but hasn't worked with authentication systems. Include the key security considerations.
OAuth 2.0's authorization code flow keeps your app from ever seeing the user's password. Here's the sequence: (1) Your app redirects the user to the authorization server with your client_id and requested scopes. (2) User logs in there and approves access. (3) Authorization server redirects back to your app with a temporary authorization code. (4) Your app exchanges that code for an access token by making a server-to-server request that includes your client_secret. (5) You use the access token to call APIs on the user's behalf. The key security win: the access token exchange happens server-side where the client_secret stays hidden. Never expose client_secret in browser code. Always validate the redirect URI to prevent authorization code interception. Use PKCE extension for mobile apps where you can't secure a client_secret.
This showcases the model's ability to translate technical concepts into accessible explanations while preserving accuracy. The response balances conceptual flow with security specifics. At $0.14/Mtok output pricing, longer explanatory responses like this remain cost-effective compared to premium models, making gpt-oss-20b suitable for documentation generation or internal training materials.
Generate a SQL query to find the top 5 customers by total purchase amount in the last 90 days, including their email and the count of orders. Tables: customers (id, email, name), orders (id, customer_id, created_at, total_amount).
```sql SELECT c.email, c.name, SUM(o.total_amount) AS total_spent, COUNT(o.id) AS order_count FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.created_at >= CURRENT_DATE - INTERVAL '90 days' GROUP BY c.id, c.email, c.name ORDER BY total_spent DESC LIMIT 5; ``` This query joins customers with their orders, filters to the 90-day window, aggregates by customer, and returns the top 5 by spend. Note: `c.id` is included in GROUP BY to ensure correct grouping even if emails aren't unique.
The model produces syntactically correct SQL with appropriate aggregation logic. The query uses standard SQL with PostgreSQL-style interval syntax — the model doesn't clarify dialect assumptions, which could cause issues on MySQL or SQL Server. The 131K context window means you can paste entire schema definitions for more accurate queries, though this simple example doesn't require it.
Use-case deep-dives
When gpt-oss-20b makes sense for 24/7 ticket routing at scale
A 12-person SaaS company handling 800+ support tickets daily needs instant first-pass triage without breaking the budget. gpt-oss-20b delivers here: at $0.03/Mtok input, you can route every ticket through intent classification and urgency scoring for roughly $15/day assuming 200-token average tickets. The 131k context window means you can include the last 20 customer interactions in each classification call, catching escalation patterns most models miss at this price point. Output cost sits at $0.14/Mtok, so if you're generating short labels (not full responses), your monthly bill stays under $600 even at this volume. The trade-off: no public benchmarks means you'll spend a week testing accuracy against your existing routing rules before going live. If classification accuracy falls below 85% in your tests, step up to a benchmarked alternative.
Why gpt-oss-20b handles 50-page policy documents without chunking
A 4-person policy consultancy turns 40-60 page government RFPs into 2-page executive briefs for clients, processing 15-20 documents monthly. gpt-oss-20b's 131k token window fits most full documents in a single call—no chunking, no context loss at section boundaries. At $0.03 input per million tokens, each 50-page document (roughly 35k tokens) costs about $1.05 to ingest, and a 1,500-token summary runs $0.21 to generate. Total per-document cost: $1.26 versus $4-8 for models with smaller windows that require map-reduce patterns. The risk: without published benchmark scores on long-context retrieval tasks, you're flying blind on whether it actually uses that full 131k effectively. Run a two-week pilot on past RFPs where you know the correct answer before committing your client workflow to this model.
When gpt-oss-20b wins on overnight comment queue processing
A 20-person online community platform wakes up to 3,000+ user comments each morning that need toxicity scoring before going live. gpt-oss-20b's pricing structure favors this batch pattern: you can process all 3,000 comments (averaging 80 tokens each) for roughly $7.20 in input costs, and if you're generating simple JSON verdicts (50 tokens per response), output runs another $2.10. Total daily moderation cost: under $10, or $300/month for a problem that would cost $800+ on mainstream alternatives. The 131k context window also lets you include the user's last 30 comments in each moderation call, catching repeat-offender patterns. The catch: this only works if you can tolerate 2-4 hour processing windows overnight. If you need real-time moderation (under 500ms), the lack of streaming support and unknown latency characteristics make this the wrong call.
Frequently asked
Is GPT-OSS-20B good for general text tasks?
Yes, it handles standard text generation, summarization, and Q&A well. The 131k token context window means you can process long documents without chunking. At $0.03 input / $0.14 output per Mtok, it's positioned as a budget option for high-volume workloads where you don't need frontier reasoning capabilities.
Is GPT-OSS-20B cheaper than GPT-4o mini?
Yes, significantly. GPT-4o mini costs $0.15 input / $0.60 output per Mtok — 5x more on input, 4x more on output. If you're running batch jobs or internal tools where GPT-4-class reasoning isn't required, OSS-20B saves real money. For customer-facing chat where quality matters, pay the premium.
Can GPT-OSS-20B handle code generation?
It can write basic scripts and fix syntax errors, but don't expect GPT-4 Turbo or Claude Sonnet performance. Without public benchmarks, assume it's closer to GPT-3.5 level — fine for boilerplate, weak on complex refactoring or architecture decisions. Use a dedicated code model if programming is the primary use case.
How does GPT-OSS-20B compare to the previous GPT-3.5 models?
OpenAI hasn't published direct comparisons, so treat this as a cost-optimized alternative rather than a capability upgrade. The 131k context window is the main advantage over GPT-3.5 Turbo's 16k. If you need long-context summarization at scale and can tolerate GPT-3.5-era quality, it's worth testing.
Should I use GPT-OSS-20B for high-volume content moderation?
Yes, if you're classifying or filtering text at scale. The low input cost makes it viable for scanning thousands of messages per hour. Pair it with a smaller classifier model for the first pass, then use OSS-20B for nuanced decisions. Latency should be acceptable for async queues, not real-time chat.