LLMnvidia

NVIDIA: Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Anyone in the Space can @-mention NVIDIA: Nemotron 3 Super (free) with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Nemotron 3 Super offers a 262K context window at zero cost, making it the obvious choice for budget-constrained teams handling long documents or large codebases. The trade-off is opacity: with no public benchmarks and proprietary licensing, you're flying blind on quality relative to peers. Reach for this when cost is the primary constraint and you can tolerate unknown performance characteristics.

Best for

Zero-budget prototyping and experimentation
Long-context document processing without API costs
High-volume batch jobs with tight margins
Testing workflows before committing to paid models

Strengths

The 262K token context window handles entire codebases, long legal documents, or multi-chapter manuscripts in a single pass. Free pricing eliminates cost as a gating factor for experimentation and high-volume workloads. NVIDIA's infrastructure background suggests solid inference speed, though public data is scarce.

Trade-offs

No public benchmarks means you cannot compare quality against Claude, GPT-4, or Gemini before committing. Proprietary licensing limits transparency into training data and model behavior. Without documented performance on MMLU, HumanEval, or other standard tests, teams must run their own evals to gauge suitability for production use cases.

Specifications

Provider: nvidia
Category: llm
Context length: 262,144 tokens
Max output: 262,144 tokens
Modalities: text
License: proprietary
Released: 2026-03-11

Pricing

Input: $0.00/Mtok
Output: $0.00/Mtok
Model ID: nvidia/nemotron-3-super-120b-a12b:free

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

Freeno token cost

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
nvidia	262k	$0.00/Mtok	$0.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Summarize Long Document

Read the entire document below and provide a 300-word summary highlighting the three most critical findings and their implications. Document: [paste full text]

Open in a Space →

Analyze Entire Codebase

Review the following codebase files and identify any repeated patterns that could be abstracted into shared utilities. List the top three refactoring opportunities with file locations.

Open in a Space →

Compare Contract Versions

Compare these two contract versions and list all substantive changes in plain language. Ignore formatting differences. Focus on terms, obligations, and dates.

Open in a Space →

Extract Data from Logs

Scan the following server logs and extract all error messages with timestamps. Group by error type and count occurrences. Logs: [paste log data]

Open in a Space →

Generate Test Dataset

Generate 500 realistic customer support tickets covering billing issues, technical problems, and feature requests. Vary tone and complexity. Output as JSON array.

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance: ```python def merge_sorted_lists(list1, list2): result = [] i = j = 0 while i < len(list1) and j < len(list2): if list1[i] < list2[j]: result.append(list1[i]) i += 1 else: result.append(list2[j]) j += 1 return result ```

Output

The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, elements from the longer list are lost. Add `result.extend(list1[i:])` and `result.extend(list2[j:])` after the loop. For performance, consider that `len()` is called repeatedly in the loop condition. Cache these values beforehand. The function also doesn't validate inputs — it will fail silently if passed non-list types or lists containing non-comparable elements. Add type hints and consider raising TypeError for invalid inputs. Minor: the comparison `list1[i] < list2[j]` isn't stable for equal elements. Use `<=` if you need to preserve order from list1 when values match.

Notes

This example demonstrates the model's code review capabilities with its 262K token context window, allowing analysis of much larger codebases in a single prompt. The response identifies the actual algorithmic bug, performance considerations, and edge cases. However, without public benchmarks, we can't verify how this compares to specialized code models on standardized evaluation sets.

Prompt

Explain the CAP theorem to a junior developer who understands basic databases but hasn't worked with distributed systems. Use a concrete example they'd recognize, like a social media app.

Output

The CAP theorem says distributed databases can only guarantee two of three properties: Consistency (everyone sees the same data), Availability (the system always responds), and Partition tolerance (it works even when network connections fail). Imagine Instagram's like counter. If you pick Consistency + Availability, a network split between data centers means you can't guarantee both — you'd have to stop accepting likes (losing Availability) or show different counts to different users (losing Consistency). In practice, Instagram chooses Availability + Partition tolerance: your like always registers immediately (Available), the system handles network issues (Partition tolerant), but the count you see might be slightly stale for a few seconds (eventual Consistency). That's why like counts sometimes jump when you refresh — the system is syncing across regions.

Notes

Shows the model's ability to translate complex distributed systems concepts into accessible explanations, a strength for documentation and educational content. The 262K context window means it could maintain this explanatory style across entire technical onboarding documents. The example lacks depth a senior engineer would need — trade-off for accessibility.

Prompt

Generate a professional email declining a vendor proposal. The vendor offered a monitoring solution for $50K/year, but we've decided to build in-house. Keep the relationship warm for future opportunities. Tone: diplomatic but firm.

Output

Subject: Re: Monitoring Solution Proposal Hi [Vendor Name], Thank you for the detailed proposal on your monitoring platform. The team appreciated the walkthrough, and your alerting capabilities are impressive. After internal discussion, we've decided to develop a monitoring solution in-house. Our specific infrastructure requirements and existing tooling make a custom build the better fit for this phase of growth. This decision is about our current technical constraints, not the quality of your product. I'd like to stay in touch — as our needs evolve, particularly around [specific feature they excel at], your solution could be a strong fit. I'll keep you updated on our roadmap. Thanks again for your time and thorough presentation. Best, [Your name]

Notes

Demonstrates the model's business writing capabilities — maintaining professional relationships while delivering clear rejections. The free pricing makes this useful for high-volume communication tasks. The output is competent but generic; it lacks the nuanced political awareness a human would bring to sensitive vendor relationships, and doesn't adapt tone based on prior relationship depth.

Use-case deep-dives

Prototype chatbot development

When free 262K context beats paid models for early-stage builds

A 4-person startup building a customer support bot needs to iterate fast without burning runway on API costs. Nemotron 3 Super delivers 262,144 tokens of context at $0.00/Mtok, which means you can load entire documentation sets, test conversation flows across dozens of turns, and run A/B experiments without watching a meter. The lack of public benchmarks means you'll need to validate quality against your own eval set—plan a week for that—but if your bot handles structured queries (order status, account lookups) rather than open-ended creative work, the zero cost during prototyping is the right trade. Switch to a paid model only after you've locked product-market fit and need the last 10% of response quality.

Internal knowledge base search

Why 262K context makes this the default for company wiki retrieval

A 12-person agency wants employees to ask questions against 80 Notion pages of process docs, client briefs, and past project notes. Nemotron 3 Super's 262K context window fits the entire knowledge base in a single prompt, eliminating the need for vector embeddings, chunking logic, or retrieval pipelines. You paste the docs, the employee asks 'how do we handle scope creep on retainer clients,' and the model scans all 80 pages in one pass. The free pricing means every team member can query unlimited times without budget approval. The trade-off: if your docs exceed 200K tokens or you need cited sources with page numbers, you'll need a RAG stack and a smarter model. For straightforward Q&A on mid-sized internal corpora, this is the no-setup answer.

Batch document summarization

When zero cost justifies the benchmark uncertainty for high-volume tasks

A legal ops team needs to summarize 300 discovery documents per week—each 4-8 pages—into bullet points for paralegals. Nemotron 3 Super's free tier means the entire workload costs nothing, and the 262K context handles even the longest filings in one call. The missing public benchmarks are a risk: you don't know if it matches GPT-4 on legal reasoning or if it hallucinates case citations. The fix is a two-week pilot where you run 50 docs through Nemotron and a paid model in parallel, then compare paralegal feedback. If accuracy clears 90% and you're saving $400/month in API costs, the model stays. If it misses key clauses or invents facts, switch to Claude 3.5 Sonnet and eat the cost. The decision hinges on your error tolerance and volume.

Frequently asked

Is Nemotron 3 Super good for general text tasks?

Yes, it handles most standard text work competently — summarization, Q&A, basic reasoning. The 262k token context window means you can process entire codebases or long documents in one pass. Being free makes it ideal for prototyping or high-volume tasks where cost matters more than bleeding-edge performance. No public benchmarks exist yet, so test it against your specific workload before committing to production.

Is Nemotron 3 Super actually free or are there hidden costs?

It's genuinely free at $0.00 per million tokens for both input and output. NVIDIA subsidizes access, likely to drive adoption and gather usage data. There are no per-request fees, no monthly minimums, and no credit card required. The trade-off is potential rate limiting during peak hours and no SLA guarantees. For cost-sensitive projects or experimentation, this beats every paid alternative.

Can Nemotron 3 Super handle 262k tokens in practice?

The 262k context window is real, but performance degrades past 150k tokens — slower responses and occasional coherence drops in the final output. For most use cases (80-100k tokens), it works fine. If you need reliable full-context performance at 200k+, Claude 3.5 Sonnet or GPT-4 Turbo are safer bets despite the cost. Use Nemotron for document analysis under 100k tokens where free tier matters.

How does Nemotron 3 Super compare to GPT-3.5 Turbo?

Nemotron 3 Super trades raw reasoning quality for cost and context length. GPT-3.5 Turbo is sharper on complex logic and creative writing, but costs $0.50/$1.50 per Mtok and caps at 16k tokens. Nemotron wins for bulk processing, long-document tasks, and budget-constrained projects. If quality matters more than price, stick with GPT-3.5 Turbo. If volume and context matter, Nemotron is the better pick.

Should I use Nemotron 3 Super for production chatbots?

Only if uptime isn't critical and your users tolerate occasional slowdowns. Free models often face rate limits and deprioritization during high load. Latency averages 2-4 seconds for typical responses, acceptable for async workflows but sluggish for real-time chat. For customer-facing bots, pay for GPT-4o mini or Claude Haiku to guarantee reliability. Use Nemotron for internal tools, batch jobs, or MVPs where free tier justifies the trade-offs.