LLMopenai

OpenAI: gpt-oss-20b (free)

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Anyone in the Space can @-mention OpenAI: gpt-oss-20b (free) with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

OpenAI's gpt-oss-20b is a free-tier model with a 128K context window, making it useful for prototyping and cost-sensitive workflows where zero spend matters more than performance. Expect capabilities below GPT-3.5-turbo—this trades accuracy and reasoning depth for unlimited free usage. Reach for it when you're testing prompts, building demos, or running high-volume tasks where occasional errors are acceptable and budget is locked at zero.

Best for

Prototyping prompts before production deployment
High-volume tasks with zero budget
Educational projects and experimentation
Simple classification or extraction jobs

Strengths

The 128K context window handles moderately long documents without chunking, and zero cost per token removes financial friction from experimentation. Useful for students, hobbyists, and teams validating ideas before committing to paid models. The OpenAI API integration means existing tooling works without modification, lowering switching costs for quick tests.

Trade-offs

Performance lags behind GPT-3.5-turbo and all frontier models—expect weaker reasoning on multi-step problems, less reliable instruction-following, and more frequent hallucinations. No public benchmarks exist, so you're flying blind on capability boundaries. Free tier often means rate limits and potential throttling during peak usage, making it unreliable for production workloads or time-sensitive tasks.

Specifications

Provider: openai
Category: llm
Context length: 131,072 tokens
Max output: 8,192 tokens
Modalities: text
License: proprietary
Released: 2025-08-05

Pricing

Input: $0.00/Mtok
Output: $0.00/Mtok
Model ID: openai/gpt-oss-20b:free

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

Freeno token cost

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
openai	131k	$0.00/Mtok	$0.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Extract Key Points

Read the following text and list the 5 most important points in bullet format. Keep each point to one sentence.

Open in a Space →

Draft Email Reply

Write a polite reply to this email. Keep it under 100 words and maintain a professional but friendly tone.

Open in a Space →

Categorize Support Tickets

Read this support ticket and assign it to one category: Billing, Technical, Feature Request, or General. Output only the category name.

Open in a Space →

Summarize Long Documents

Summarize this document in 200 words. Focus on methodology, key findings, and practical implications.

Open in a Space →

Generate Test Data

Generate 10 realistic customer reviews for a coffee shop. Vary the ratings from 1-5 stars and make each review 2-3 sentences.

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs. It's supposed to merge two sorted lists but users report occasional incorrect ordering in the output.

Output

The function assumes both input lists are already sorted, but doesn't validate this precondition. If either list contains unsorted elements, the merge logic will preserve that disorder. Additionally, the comparison operator on line 7 uses `>` instead of `>=`, which means equal elements from the second list always appear before those from the first list—this breaks stability. Consider adding input validation and switching to `>=` to maintain stable merge behavior when elements are equal.

Notes

This example shows the model's code review capability with a 131K token context window—useful for analyzing entire codebases in one prompt. As a free tier model, response speed may vary under load, and the lack of public benchmarks means quality on complex reasoning tasks is unverified compared to paid alternatives.

Prompt

Summarize the key regulatory changes in this 40-page compliance document, focusing on what our engineering team needs to implement by Q2.

Output

The document introduces three main changes: (1) Data retention policies now require 90-day audit logs for all API requests involving personal data, with specific fields mandated in section 4.2. (2) New encryption standards (AES-256-GCM minimum) apply to data at rest by March 31st. (3) Third-party data processors must complete certification by April 15th—this affects your current analytics vendor. The engineering team should prioritize the logging infrastructure first, as it has the longest implementation timeline and affects multiple services.

Notes

The 131K token context window handles long documents without chunking, making this model suitable for policy analysis and technical documentation review. Being free means no per-token cost for large document ingestion, though output quality on nuanced legal interpretation hasn't been independently benchmarked.

Prompt

Generate a customer support response template for users reporting slow dashboard load times. Keep it empathetic but technical enough to gather useful debugging info.

Output

Thanks for reporting the slow dashboard performance—we know how frustrating that is when you're trying to get work done. To help us investigate, could you share: (1) Which browser and version you're using, (2) Approximately how many projects are in your workspace, (3) Whether the slowness happens on initial load or when switching between views, and (4) Your approximate location or timezone. In the meantime, try clearing your browser cache or switching to a different browser as a temporary workaround. We'll prioritize this based on what we learn from your setup.

Notes

This demonstrates the model's ability to balance tone and technical precision in customer-facing content. The free pricing makes it viable for high-volume support template generation. However, without benchmark data, consistency across different writing styles and domains remains untested compared to established paid models.

Use-case deep-dives

Prototype chatbot development

When free inference makes sense for early-stage product testing

A 3-person startup building a customer support chatbot needs to run 200+ test conversations daily without burning runway on API costs. gpt-oss-20b at $0.00/Mtok lets you iterate on prompt engineering, test conversation flows, and validate your product hypothesis before committing to paid inference. The 131k context window handles full support ticket histories without truncation. The trade-off: no public benchmarks means you're flying blind on quality until you test it yourself. If your prototype shows traction and you need predictable response quality, budget for a switch to GPT-4o-mini or Claude Haiku. Use this model to prove the concept exists, not to ship production traffic.

Student project text generation

Free tier for academic work with no budget constraints

A university team building a research paper summarizer for their capstone project has zero budget and needs to process 50-100 papers per week. gpt-oss-20b gives them unlimited inference to experiment with summarization prompts and compare outputs across different paper types. The 131k window fits most academic papers in a single call. The risk: without benchmark data, you can't cite model performance in your final report or compare it to published baselines. If your project needs reproducible results or you're publishing findings, switch to a benchmarked model like Llama 3.1 8B where you can reference MMLU scores. For exploratory coursework where the process matters more than the metrics, this model removes the cost barrier entirely.

High-stakes legal document review

Why zero-cost inference is the wrong call for compliance work

A 12-person law firm wants to use AI for contract redlining to cut paralegal hours. gpt-oss-20b's free pricing looks attractive, but the absence of public benchmarks means you have no third-party validation of accuracy on legal reasoning tasks. Contract errors cost more than API fees—missing a liability clause in a $2M deal erases years of savings. Models like Claude 3.5 Sonnet score 88.7% on the GPQA benchmark (graduate-level reasoning) and cost $3/Mtok input, giving you auditable performance data and vendor SLAs. Use free models for internal meeting notes or draft emails, but any document that touches client risk or regulatory compliance needs a benchmarked model with a support contract behind it.

Frequently asked

Is gpt-oss-20b good for general text tasks?

Yes, with caveats. At 20B parameters and 131K context window, it handles most standard text work—summarization, Q&A, basic reasoning. No public benchmarks exist, so performance is unverified. Being free makes it worth testing for non-critical workloads, but expect lower quality than GPT-4 or Claude for complex tasks.

Is gpt-oss-20b actually free to use?

Yes. OpenAI lists this at $0.00 per million tokens for both input and output. This likely means experimental access or a community preview model. No usage caps are documented here, but OpenAI may throttle or sunset it without notice. For production, budget for paid alternatives.

Can gpt-oss-20b handle long documents with its 131K context?

Technically yes—131,072 tokens fits roughly 100,000 words. That's enough for most books or legal documents in one pass. However, smaller models often lose coherence past 50K tokens even when the window allows more. Test your specific document types before committing to long-context workflows.

How does gpt-oss-20b compare to GPT-3.5 Turbo?

Unknown without benchmarks. GPT-3.5 Turbo runs on a larger, production-tuned model with proven MMLU and HumanEval scores. This 20B variant is likely weaker but offers a longer context window (131K vs 16K) and zero cost. Use GPT-3.5 for quality; use this for experimentation or budget-constrained prototypes.

Should I use gpt-oss-20b for a customer-facing chatbot?

No. The lack of benchmarks, unclear support timeline, and 'oss' naming suggest this is experimental. Customer-facing apps need reliability and quality guarantees. Use GPT-4o-mini ($0.15/$0.60 per Mtok) or Claude Haiku instead—both cost pennies per conversation and deliver consistent, tested performance.