
OpenAI: gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Anyone in the Space can @-mention OpenAI: gpt-oss-20b with the team's shared context - pooled credits, one chat, one memory.


Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

GPT-OSS-20B targets teams needing OpenAI API compatibility at aggressive pricing: $0.03/$0.14 per Mtok undercuts GPT-4o mini ($0.15/$0.60 per Mtok) by roughly 80% on input and 77% on output. The 131k context window handles most document workflows without chunking. Trade-off: no public benchmarks yet, so performance relative to GPT-4o mini or Claude Haiku remains unverified. Best for cost-conscious teams already on OpenAI infrastructure who can tolerate early-adopter risk on a model still proving itself in production.

Best for

  • High-volume API calls on tight budgets
  • Document analysis under 130k tokens
  • Prototyping before scaling to larger models
  • Teams locked into OpenAI tooling

Strengths

Pricing is the headline: at $0.03 input and $0.14 output per million tokens, this undercuts GPT-4o mini ($0.15 input, $0.60 output) by approximately 80% on input and 77% on output. The 131k context window covers most single-document tasks without splitting. OpenAI API compatibility means drop-in replacement for existing integrations: no SDK rewrites, no new auth flows. For teams running thousands of daily requests on simpler tasks, the cost savings compound quickly.
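The price gap can be checked directly from the per-Mtok figures. The sketch below assumes GPT-4o mini costs $0.15 input and $0.60 output per Mtok, which are the figures quoted in the FAQ on this page; the helper name `savings_pct` is illustrative only.

```python
# Prices in USD per million tokens (Mtok).
GPT_OSS_20B = {"input": 0.03, "output": 0.14}
# Assumption: GPT-4o mini prices as quoted in this page's FAQ.
GPT_4O_MINI = {"input": 0.15, "output": 0.60}


def savings_pct(cheap: float, expensive: float) -> float:
    """Percentage saved by paying `cheap` instead of `expensive`."""
    return (1 - cheap / expensive) * 100


for kind in ("input", "output"):
    pct = savings_pct(GPT_OSS_20B[kind], GPT_4O_MINI[kind])
    ratio = GPT_4O_MINI[kind] / GPT_OSS_20B[kind]
    print(f"{kind}: {pct:.0f}% cheaper ({ratio:.1f}x price ratio)")
```

With these inputs the script reports an 80% saving on input (a 5.0x price ratio) and a 77% saving on output (a 4.3x price ratio).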

Trade-offs

No public benchmarks means you're flying blind on reasoning depth, factual accuracy, and instruction-following relative to established alternatives like GPT-4o mini or Claude Haiku. Early OpenAI releases sometimes ship with unannounced capability gaps or rate-limit quirks. The 20B parameter count suggests lighter reasoning than flagship models, so expect weaker performance on multi-step logic, nuanced tone control, or domain-specific jargon. If your workflow depends on proven accuracy, wait for independent evals before committing production traffic.

Specifications

Provider
openai
Category
llm
Context length
131,072 tokens
Max output
Modalities
text
License
Apache 2.0
Released
2025-08-05

Pricing

Input
$0.03/Mtok
Output
$0.14/Mtok
Model ID
openai/gpt-oss-20b

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$1.10
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
openai | 131k | $0.03/Mtok | $0.14/Mtok | - | - | -

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Summarize Long Documents

Read the attached document in full. Extract the three most important findings, each with a one-sentence explanation and supporting evidence from the text.

Batch Email Drafts

Write five short email drafts for a product launch announcement. Vary the tone from formal to casual. Each email should be under 100 words and include a clear call-to-action.

Extract Structured Data

Parse the following customer feedback and return a JSON array with fields: customer_name, issue_category, sentiment (positive/negative/neutral), and priority (high/medium/low).
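Because this prompt asks for a fixed set of fields with enumerated values, it may be worth validating the model's reply before passing it downstream. The following is a minimal sketch using only the standard library; the function name `validate_feedback_json` and the sample record are hypothetical, not output recorded from the model.

```python
import json

REQUIRED_FIELDS = ("customer_name", "issue_category", "sentiment", "priority")
# Allowed values exactly as listed in the prompt above.
ALLOWED_VALUES = {
    "sentiment": ("positive", "negative", "neutral"),
    "priority": ("high", "medium", "low"),
}


def validate_feedback_json(raw: str) -> list:
    """Parse a model reply and check it matches the requested structure."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"item {index} is not an object")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"item {index} is missing fields: {missing}")
        for name, allowed in ALLOWED_VALUES.items():
            if record[name] not in allowed:
                raise ValueError(f"item {index} has invalid {name}: {record[name]!r}")
    return records


# Hypothetical reply, for illustration only.
sample_reply = (
    '[{"customer_name": "Dana", "issue_category": "billing", '
    '"sentiment": "negative", "priority": "high"}]'
)
print(validate_feedback_json(sample_reply))
```

Rejecting malformed replies early means a retry can be issued instead of letting a bad label reach a ticketing or analytics system.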

Rewrite for Clarity

Rewrite the following technical paragraph for a general audience. Use simple sentences, avoid jargon, and keep it under 80 words.

Generate Test Data

Generate 20 realistic user profiles in JSON format. Each profile should include: name, email, age (18-65), city, and a short bio (one sentence).

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance:

    def merge_sorted_lists(list1, list2):
        result = []
        i = j = 0
        while i < len(list1) and j < len(list2):
            if list1[i] < list2[j]:
                result.append(list1[i])
                i += 1
            else:
                result.append(list2[j])
                j += 1
        return result

Output

The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, any remaining items in list1 or list2 are silently dropped. Add `result.extend(list1[i:])` and `result.extend(list2[j:])` after the loop. For performance, consider that repeated `len()` calls in the loop condition are unnecessary — cache these values beforehand. The function also lacks input validation; passing non-list types or unsorted lists will produce incorrect results without warning. Consider adding type hints and a docstring explaining the sorted-input requirement.

Notes

This example demonstrates gpt-oss-20b's code review capability with attention to correctness over style. The 131K token context window supports reviewing entire modules, though this example stays focused. The model identifies the algorithmic flaw immediately but doesn't provide a complete rewrite — useful for learning, less so for copy-paste fixes.
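Since the illustrative output stops short of a rewrite, here is one way the suggested fix could look once applied. This is a sketch written for this page, not text produced by the model; the type hints, docstring, and the switch from `<` to `<=` (so ties keep `list1`'s element first) are additions beyond what the review asked for.

```python
from typing import List, Sequence


def merge_sorted_lists(list1: Sequence[int], list2: Sequence[int]) -> List[int]:
    """Merge two sequences that are each already sorted in ascending order.

    The inputs are not checked for sortedness; unsorted input gives
    an unsorted result.
    """
    result: List[int] = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:  # on ties, take from list1 first
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # The fix: append whatever remains once one input is exhausted.
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result


print(merge_sorted_lists([1, 3, 5], [2, 4, 6, 8]))
```

At most one of the two `extend` calls adds anything, because the loop only exits when one index has reached the end of its list.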

Prompt

Explain how OAuth 2.0 authorization code flow works to a junior developer who understands HTTP but hasn't worked with authentication systems. Include the key security considerations.

Output

OAuth 2.0's authorization code flow keeps your app from ever seeing the user's password. Here's the sequence: (1) Your app redirects the user to the authorization server with your client_id and requested scopes. (2) User logs in there and approves access. (3) Authorization server redirects back to your app with a temporary authorization code. (4) Your app exchanges that code for an access token by making a server-to-server request that includes your client_secret. (5) You use the access token to call APIs on the user's behalf. The key security win: the access token exchange happens server-side where the client_secret stays hidden. Never expose client_secret in browser code. Always validate the redirect URI to prevent authorization code interception. Use PKCE extension for mobile apps where you can't secure a client_secret.

Notes

This showcases the model's ability to translate technical concepts into accessible explanations while preserving accuracy. The response balances conceptual flow with security specifics. At $0.14/Mtok output pricing, longer explanatory responses like this remain cost-effective compared to premium models, making gpt-oss-20b suitable for documentation generation or internal training materials.
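To make the PKCE recommendation in the example output concrete, the sketch below generates a code verifier, derives its S256 code challenge, and builds the redirect URL for step (1) of the flow. The endpoint, client ID, and redirect URI are placeholders, and the function names are hypothetical; only the verifier and challenge derivation follows the PKCE specification (RFC 7636).

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier() -> str:
    """Random URL-safe string; 64 random bytes give 86 characters (43-128 allowed)."""
    return secrets.token_urlsafe(64)


def code_challenge_for(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scope: str,
    state: str,
    code_challenge: str,
) -> str:
    """Assemble the URL the app redirects the user to in step (1)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


# Placeholder values: substitute your provider's real endpoint and client ID.
verifier = make_code_verifier()
url = build_authorization_url(
    "https://auth.example.com/authorize",
    client_id="demo-client",
    redirect_uri="https://app.example.com/callback",
    scope="profile email",
    state=secrets.token_urlsafe(16),
    code_challenge=code_challenge_for(verifier),
)
print(url)
```

The app keeps `verifier` private and sends it only in the later token exchange, so an intercepted authorization code is useless without it.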

Prompt

Generate a SQL query to find the top 5 customers by total purchase amount in the last 90 days, including their email and the count of orders. Tables: customers (id, email, name), orders (id, customer_id, created_at, total_amount).

Output

```sql
SELECT c.email, c.name, SUM(o.total_amount) AS total_spent, COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY c.id, c.email, c.name
ORDER BY total_spent DESC
LIMIT 5;
```

This query joins customers with their orders, filters to the 90-day window, aggregates by customer, and returns the top 5 by spend. Note: `c.id` is included in GROUP BY to ensure correct grouping even if emails aren't unique.

Notes

The model produces syntactically correct SQL with appropriate aggregation logic. The query uses standard SQL with PostgreSQL-style interval syntax — the model doesn't clarify dialect assumptions, which could cause issues on MySQL or SQL Server. The 131K context window means you can paste entire schema definitions for more accurate queries, though this simple example doesn't require it.
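The dialect caveat is easy to demonstrate. The sketch below adapts the query to SQLite, chosen only because it ships with Python, by swapping the PostgreSQL-style interval for SQLite's `date('now', '-90 days')`. The sample rows are invented for illustration, and dates are assumed to be stored as ISO `YYYY-MM-DD` text.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        created_at TEXT,       -- ISO date string, e.g. 2025-01-31
        total_amount REAL
    );
    """
)

# Invented sample data: one order falls outside the 90-day window.
recent = (date.today() - timedelta(days=10)).isoformat()
old = (date.today() - timedelta(days=200)).isoformat()
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", "Ann"), (2, "b@example.com", "Bob")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, recent, 50.0), (2, 1, recent, 25.0), (3, 2, recent, 40.0), (4, 2, old, 999.0)],
)

# Same logic as the generated query; only the date arithmetic changes.
TOP_CUSTOMERS_SQLITE = """
SELECT c.email, c.name, SUM(o.total_amount) AS total_spent, COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.created_at >= date('now', '-90 days')
GROUP BY c.id, c.email, c.name
ORDER BY total_spent DESC
LIMIT 5;
"""

rows = conn.execute(TOP_CUSTOMERS_SQLITE).fetchall()
for row in rows:
    print(row)
```

Bob's 200-day-old order is excluded, so Ann ranks first. On MySQL the equivalent filter would be `CURRENT_DATE - INTERVAL 90 DAY`, and on SQL Server `DATEADD(day, -90, CAST(GETDATE() AS date))`.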

Use-case deep-dives

High-volume customer support triage

When gpt-oss-20b makes sense for 24/7 ticket routing at scale

A 12-person SaaS company handling 800+ support tickets daily needs instant first-pass triage without breaking the budget. gpt-oss-20b delivers here: at $0.03/Mtok input, routing every ticket through intent classification and urgency scoring costs roughly half a cent per day, assuming 200-token average tickets (800 × 200 = 160k tokens). The 131k context window means you can include the last 20 customer interactions in each classification call to catch escalation patterns; if those interactions also average 200 tokens, input grows to about 3.4M tokens, or roughly $0.10/day. Output costs $0.14/Mtok, so if you're generating short labels (not full responses), your monthly bill stays in the low single digits of dollars even at this volume. The trade-off: no public benchmarks means you should plan on about a week of testing accuracy against your existing routing rules before going live. If classification accuracy falls below 85% in your tests, step up to a benchmarked alternative.
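Cost figures for a workload like this can be sanity-checked directly from the listed per-Mtok prices. The sketch below does that for the triage scenario; the 10-token label length and the assumption that each prior interaction is also about 200 tokens are illustrative guesses, not values from the page.

```python
INPUT_PRICE_PER_MTOK = 0.03   # USD, from the pricing table
OUTPUT_PRICE_PER_MTOK = 0.14  # USD, from the pricing table


def daily_cost(calls: int, input_tokens_per_call: int, output_tokens_per_call: int) -> float:
    """USD per day for `calls` requests of the given size."""
    input_cost = calls * input_tokens_per_call / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = calls * output_tokens_per_call / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


TICKETS_PER_DAY = 800
TICKET_TOKENS = 200
LABEL_TOKENS = 10  # assumption: a short routing label

# Ticket text only.
bare = daily_cost(TICKETS_PER_DAY, TICKET_TOKENS, LABEL_TOKENS)
# Ticket plus 20 prior interactions, each assumed to be about 200 tokens.
with_history = daily_cost(TICKETS_PER_DAY, TICKET_TOKENS * 21, LABEL_TOKENS)

print(f"ticket only:  ${bare:.4f}/day, ${bare * 30:.2f}/month")
print(f"with history: ${with_history:.4f}/day, ${with_history * 30:.2f}/month")
```

Changing the three constants to match your own ticket volume and lengths gives a quick upper bound before running any real traffic.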

Long-context research summarization

Why gpt-oss-20b handles 50-page policy documents without chunking

A 4-person policy consultancy turns 40-60 page government RFPs into 2-page executive briefs for clients, processing 15-20 documents monthly. gpt-oss-20b's 131k token window fits most full documents in a single call: no chunking, no context loss at section boundaries. At $0.03 input per million tokens, each 50-page document (roughly 35k tokens) costs about $0.001 to ingest, and a 1,500-token summary costs about $0.0002 to generate. Total per-document cost: roughly $0.0013, with none of the repeated calls or orchestration that map-reduce patterns on smaller-window models require. The risk: without published benchmark scores on long-context retrieval tasks, you're flying blind on whether it actually uses that full 131k effectively. Run a two-week pilot on past RFPs where you know the correct answer before committing your client workflow to this model.
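Whether a document fits in one call, and what that call costs, can be estimated before sending anything. This sketch derives a tokens-per-page figure from the "50 pages is roughly 35k tokens" estimate in the scenario; the 500-token prompt overhead and the assumption that the summary shares the same 131,072-token window as the input are hypothetical and should be checked against the provider's documentation.

```python
CONTEXT_WINDOW = 131_072          # tokens, from the specifications
INPUT_PRICE_PER_MTOK = 0.03       # USD
OUTPUT_PRICE_PER_MTOK = 0.14      # USD
TOKENS_PER_PAGE = 35_000 / 50     # from the 50-page, 35k-token estimate


def fits_in_context(pages: int, summary_tokens: int = 1_500, prompt_overhead: int = 500) -> bool:
    """True if document, instructions, and summary fit in a single call.

    Assumes output tokens count against the same window as input tokens.
    """
    needed = pages * TOKENS_PER_PAGE + summary_tokens + prompt_overhead
    return needed <= CONTEXT_WINDOW


def per_document_cost(pages: int, summary_tokens: int = 1_500) -> float:
    """USD to ingest one document and generate its summary."""
    input_cost = pages * TOKENS_PER_PAGE / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = summary_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


for pages in (40, 50, 60):
    print(f"{pages} pages: fits={fits_in_context(pages)}, cost=${per_document_cost(pages):.5f}")
```

Real token counts vary with formatting and language, so a tokenizer-based count on a few sample RFPs would be a more reliable check than a per-page average.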

Batch content moderation

When gpt-oss-20b wins on overnight comment queue processing

A 20-person online community platform wakes up to 3,000+ user comments each morning that need toxicity scoring before going live. gpt-oss-20b's pricing structure favors this batch pattern: you can process all 3,000 comments (averaging 80 tokens each, 240k tokens in total) for about $0.007 in input costs, and if you're generating simple JSON verdicts (50 tokens per response), output adds roughly $0.02. Total daily moderation cost: about $0.03, or under $1/month. The 131k context window also lets you include the user's last 30 comments in each moderation call, catching repeat-offender patterns; at the same 80-token average that raises input to about 7.4M tokens, or roughly $0.22/day. The catch: this only works if you can tolerate 2-4 hour processing windows overnight. If you need real-time moderation (under 500ms), the lack of streaming support and unknown latency characteristics make this the wrong call.
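For a batch job, the two numbers worth planning around are the daily cost and the request rate needed to finish inside the overnight window. The sketch below computes both for the moderation scenario; `batch_plan` is a hypothetical helper, and it assumes each comment of history is the same 80-token average as the comment being scored.

```python
INPUT_PRICE_PER_MTOK = 0.03   # USD
OUTPUT_PRICE_PER_MTOK = 0.14  # USD


def batch_plan(
    comments: int,
    tokens_per_comment: int,
    history_comments: int,
    verdict_tokens: int,
    window_hours: float,
) -> dict:
    """Daily cost and required request rate for an overnight batch."""
    # Each call carries the new comment plus the user's prior comments.
    input_tokens = comments * tokens_per_comment * (1 + history_comments)
    output_tokens = comments * verdict_tokens
    cost = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "daily_cost_usd": cost,
        "calls_per_minute": comments / (window_hours * 60),
    }


no_history = batch_plan(3_000, 80, 0, 50, window_hours=2)
with_history = batch_plan(3_000, 80, 30, 50, window_hours=2)

print(f"no history:   ${no_history['daily_cost_usd']:.4f}/day")
print(f"with history: ${with_history['daily_cost_usd']:.4f}/day")
print(f"required rate for a 2-hour window: {with_history['calls_per_minute']:.1f} calls/min")
```

The required rate is the figure to compare against whatever rate limits apply to your account, since those, rather than price, are more likely to bound a batch of this size.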

Frequently asked

Is GPT-OSS-20B good for general text tasks?

Yes, it handles standard text generation, summarization, and Q&A well. The 131k token context window means you can process long documents without chunking. At $0.03 input / $0.14 output per Mtok, it's positioned as a budget option for high-volume workloads where you don't need frontier reasoning capabilities.

Is GPT-OSS-20B cheaper than GPT-4o mini?

Yes, significantly. GPT-4o mini costs $0.15 input / $0.60 output per Mtok: 5x more on input and about 4.3x more on output. If you're running batch jobs or internal tools where GPT-4-class reasoning isn't required, OSS-20B saves real money. For customer-facing chat where quality matters, pay the premium.

Can GPT-OSS-20B handle code generation?

It can write basic scripts and fix syntax errors, but don't expect GPT-4 Turbo or Claude Sonnet performance. Without public benchmarks, assume it's closer to GPT-3.5 level — fine for boilerplate, weak on complex refactoring or architecture decisions. Use a dedicated code model if programming is the primary use case.

How does GPT-OSS-20B compare to the previous GPT-3.5 models?

OpenAI hasn't published direct comparisons, so treat this as a cost-optimized alternative rather than a capability upgrade. The 131k context window is the main advantage over GPT-3.5 Turbo's 16k. If you need long-context summarization at scale and can tolerate GPT-3.5-era quality, it's worth testing.

Should I use GPT-OSS-20B for high-volume content moderation?

Yes, if you're classifying or filtering text at scale. The low input cost makes it viable for scanning thousands of messages per hour. Pair it with a smaller classifier model for the first pass, then use OSS-20B for nuanced decisions. Latency should be acceptable for async queues, not real-time chat.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.