Qwen: Qwen3.6 35B A3B
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...
Anyone in the Space can @-mention Qwen: Qwen3.6 35B A3B with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Long-context document analysis with images
- Video content understanding and summarization
- Cost-sensitive multimodal workflows
- Extended conversations requiring full history
- Prototyping vision features before scaling
Strengths
The 262K token window puts this ahead of many models in its price tier for handling full-length documents, codebases, or conversation histories. Multimodal support across text, image, and video opens workflows that text-only models can't touch—screenshot analysis, video frame extraction, visual QA. At $0.15 input and $1.00 output per million tokens, it costs roughly half what you'd pay for comparable context from Claude or GPT-4 variants.
Trade-offs
Absence of public benchmarks means you can't compare MMLU, HumanEval, or multimodal scores against peers before committing. This makes initial evaluation slower—you'll need to run your own test cases to gauge reasoning quality, instruction-following, and vision accuracy. Proprietary license limits transparency into training data and fine-tuning options. Output pricing at $1.00/Mtok is higher than input, so verbose responses add cost quickly.
Specifications
- Provider
- qwen
- Category
- llm
- Context length
- 262,144 tokens
- Max output
- 262,144 tokens
- Modalities
- text, image, video
- License
- proprietary
- Released
- 2026-04-27
Pricing
- Input
- $0.15/Mtok
- Output
- $1.00/Mtok
- Model ID
qwen/qwen3.6-35b-a3b
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| qwen | 262k | $0.15/Mtok | $1.00/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Analyze Invoice Images
Extract all line items, subtotals, tax amounts, and vendor information from this invoice image. Format the output as a structured JSON object with fields for each data point.Open in a Space →
Summarize Video Content
Watch this video and provide a 3-paragraph summary covering the main topics discussed, key visual elements shown, and any actionable takeaways for the viewer.Open in a Space →
Long Document Q&A
I've uploaded a 200-page product manual. Answer this question by referencing specific sections and page numbers: What are the maintenance requirements for the hydraulic system?Open in a Space →
Screenshot Debugging
Review this application screenshot and identify any UI bugs, alignment issues, or accessibility problems. Provide specific recommendations for each issue found.Open in a Space →
Multi-Turn Conversation
Let's have a detailed conversation about designing a microservices architecture. I'll ask follow-up questions across multiple messages—retain all context from our discussion.Open in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs. It's supposed to merge two sorted lists but users report occasional duplicates in the output.
The function correctly iterates through both lists but lacks duplicate handling at the merge boundaries. When both lists contain the same value at comparison points, both get appended. Add a check: `if merged and merged[-1] == smaller_value: continue` before each append. Also, the final extend operations don't filter duplicates from remaining elements. For a complete fix, either track the last inserted value throughout or use a set-based deduplication pass afterward—though that breaks the sorted guarantee if you're not careful with ordering.
This example shows the model's code review capability within its 262K token context window—useful for analyzing functions alongside their test suites and call sites. The response identifies the root cause and suggests fixes, though it doesn't provide a corrected implementation unprompted. At $1.00/Mtok output, verbose explanations add cost quickly.
I have a product photo of a coffee mug on a white background. Describe three different lifestyle scene compositions I could generate to make it feel premium for an e-commerce listing.
**Scene 1:** Place the mug on a live-edge wooden desk beside an open laptop and scattered notebook, morning light streaming from the left. Shallow depth of field keeps the mug sharp while softening the background workspace. **Scene 2:** Overhead flat-lay with the mug centered on a linen tablecloth, surrounded by coffee beans, a small succulent, and a folded newspaper. Warm, diffused lighting suggests a weekend morning ritual. **Scene 3:** Hand holding the mug in a cozy reading nook—visible knit blanket, bookshelf blur in background. The composition draws the eye to the mug's handle and branding while conveying comfort.
Demonstrates multimodal understanding—the model processes the input image and generates contextually appropriate scene concepts. The descriptions are specific enough for a designer or AI image generator to execute. However, the model doesn't generate the actual images itself; it provides creative direction. The 262K context means you could include brand guidelines and multiple product angles in one prompt.
Summarize the key policy changes in this 47-page employee handbook, focusing on remote work, PTO accrual, and performance review cycles. Highlight anything that differs from typical US tech company standards.
**Remote work:** Requires manager approval for full-remote arrangements; hybrid default is 3 days in-office (stricter than many tech firms now offering remote-first). **PTO accrual:** Starts at 15 days annually, increases to 20 after three years. No separate sick leave pool—combined PTO model. Accrual caps at 30 days (standard practice). **Performance reviews:** Quarterly check-ins with formal reviews twice yearly, not annually. Calibration sessions happen at the director level, which is more frequent than the typical once-yearly cycle. **Notable difference:** The handbook requires two weeks' notice before taking more than five consecutive PTO days, which is more restrictive than unlimited-PTO policies common in tech.
This showcases the 262K token context window handling a long document in a single pass—no chunking or multiple API calls needed. The model extracts specific sections and compares against implicit industry knowledge. At $0.15/Mtok input, processing lengthy documents remains economical. The comparative analysis ("differs from typical") requires reasoning beyond simple extraction, though accuracy depends on the model's training data recency.
Use-case deep-dives
When you need one model to handle text, screenshots, and demo videos
A 4-person SaaS team ships features weekly and needs to turn Loom walkthroughs, Figma screenshots, and Slack threads into help docs. Qwen3.6 35B handles all three modalities in a single 262k-token context window, so you can drop a 10-minute video, six UI screenshots, and the original spec doc into one prompt and get a coherent article out. At $0.15 input per Mtok, processing a 50k-token mixed-media batch costs under a penny. The 35B parameter count sits between fast-but-shallow and slow-but-smart: you get coherent long-form output without waiting 30 seconds per response. If your docs process runs under 500 requests/day and you're stitching multiple content types together, this model keeps the workflow in one place without burning budget on a frontier model.
Where the 262k context window pays off in batch classification
A 12-person e-commerce support team gets 800 inbound emails daily and needs to route them to billing, shipping, or product teams before a human sees them. Qwen3.6 35B's 262k-token window lets you pack 200+ emails into a single API call with a classification schema, then parse the structured output in one pass. At $0.15 input and $1.00 output per Mtok, a 200-email batch (roughly 60k tokens in, 5k tokens out) costs about $0.014—call it 7 cents per thousand emails. The model lacks public benchmarks, so you'll want to run a 50-email test set against your actual support history before committing. If accuracy clears 92% and you're processing more than 300 emails/day, the per-unit economics beat both human pre-sort and multiple single-email API calls to a faster model.
When you need to compare 80-page agreements without chunking
A 3-person legal ops team reviews SaaS vendor contracts against a 40-page master playbook and needs to flag deviations in liability, data residency, and termination clauses. Qwen3.6 35B's 262k-token context fits both the playbook and a typical 80-page contract (roughly 100k tokens combined) in a single prompt, so the model sees every cross-reference without you writing chunking logic. The image modality handles scanned signature pages or redlined PDFs if the vendor sends those instead of Word docs. At $0.15 input per Mtok, a full contract review costs about $0.015 in input tokens; output runs $1.00 per Mtok but you're generating 2-3k tokens of redline notes, so total cost per contract is under $0.02. If you're reviewing fewer than 20 contracts/month, this beats paying for a legal-specific model or chunking a cheaper one.
Frequently asked
Is Qwen3.6 35B A3B good for general text tasks?
Yes, it handles most text generation, summarization, and analysis work competently. The 35B parameter count sits in the sweet spot for quality-vs-speed trade-offs. Without public benchmarks we can't compare it directly to GPT-4 or Claude, but Qwen models typically perform well on multilingual tasks and structured outputs. The 262K context window means you can process entire codebases or long documents in one shot.
Is Qwen3.6 35B A3B cheaper than GPT-4o?
Significantly cheaper. At $0.15 input and $1.00 output per million tokens, you're paying roughly 10-20x less than GPT-4o for most workloads. The gap widens on input-heavy tasks like document analysis. If you're processing high volumes and don't need absolute top-tier reasoning, this pricing makes Qwen3.6 a practical choice for production deployments where cost matters.
Can Qwen3.6 35B A3B process images and video?
Yes, it supports text, image, and video inputs natively. This makes it useful for multimodal workflows like analyzing screenshots, extracting data from charts, or describing video content. The 262K context window applies across modalities, so you can feed in multiple images or longer video clips. Expect decent performance on visual understanding tasks, though specialized vision models may outperform it on edge cases.
How does Qwen3.6 35B A3B compare to Qwen2.5?
The 3.6 series represents an incremental improvement over 2.5, with better instruction following and multimodal capabilities. The A3B variant specifically optimizes for efficiency without sacrificing quality. If you're already using Qwen2.5 and satisfied, the upgrade isn't urgent. If you're choosing fresh, go with 3.6 for the expanded context window and native video support that 2.5 lacks.
Should I use Qwen3.6 35B A3B for production chatbots?
Yes, if cost and context length matter more than absolute best-in-class responses. The 262K window handles long conversation histories without truncation, and the pricing lets you scale to thousands of users affordably. Latency should be acceptable for chat since 35B models typically respond in 1-3 seconds. Test it against your specific use case, but it's a solid production choice for customer support or internal tools.