Qwen: Qwen3.5 397B A17B
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...
Verdict
Best for
- Cost-sensitive multimodal document processing
- Long-context video analysis tasks
- High-throughput text generation workloads
- Teams evaluating alternatives to GPT-4o
- Prototyping with large parameter models
Strengths
The 397B parameter architecture positions this as one of the larger openly accessible models, likely delivering strong reasoning on complex tasks. Multimodal support spanning text, images, and video in a single model eliminates the need for separate vision pipelines. The 131K context window handles book-length documents or extended video transcripts without chunking. Against GPT-4o list prices of roughly $2.50/$10.00 per Mtok, this model costs about 6x less on input and 4x less on output, making it viable for high-volume applications where cost per token drives architecture decisions.
Trade-offs
With no public benchmark data listed, you're flying blind relative to Claude Sonnet 4.5 or GPT-4o on standard evals like MMLU, HumanEval, or MMMU. The proprietary license limits deployment flexibility compared to open-weight alternatives like Llama or Qwen's own Apache-licensed models. Only about 17B parameters are active per token, but all 397B still have to be served, so expect higher latency than smaller models unless you're hitting a heavily optimized endpoint. Video understanding quality remains unproven without published results on benchmarks like Video-MME.
Specifications
- Provider: qwen
- Category: llm
- Context length: 131,072 tokens
- Max output: not listed
- Modalities: text, image, video
- License: proprietary
- Released: 2026-02-16
Pricing
- Input: $0.39/Mtok
- Output: $2.45/Mtok
- Model ID: qwen/qwen3.5-397b-a17b
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
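As a rough illustration of what a calculator like this works out, the sketch below estimates upstream monthly spend for 5 seats at 80 messages/day from the listed per-token prices. The tokens-per-message figures, the reading of "80 msgs/day" as per seat, and the 22 working days per month are assumptions chosen for illustration, not Switchy data. The conversion from dollars to Switchy credits is not given on this page, so the result is in upstream dollars only.

```python
# Listed upstream prices for qwen/qwen3.5-397b-a17b (USD per million tokens).
INPUT_USD_PER_MTOK = 0.39
OUTPUT_USD_PER_MTOK = 2.45


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Upstream cost in USD of a single request."""
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def monthly_team_cost(
    seats: int,
    msgs_per_seat_per_day: int,
    input_tokens_per_msg: int,
    output_tokens_per_msg: int,
    days_per_month: int = 22,  # assumption: working days only
) -> float:
    """Upstream cost in USD of a month of team usage."""
    messages = seats * msgs_per_seat_per_day * days_per_month
    return messages * request_cost(input_tokens_per_msg, output_tokens_per_msg)


# Assumption: an average message carries 1,500 input and 400 output tokens.
estimate = monthly_team_cost(5, 80, 1_500, 400)
print(f"Estimated upstream spend: ${estimate:.2f}/month")
```

Under these assumptions output tokens account for more than half of the bill even though they are the minority of tokens, which is why the average response length is the parameter to measure first.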
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| qwen | 131k | $0.39/Mtok | $2.45/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Analyze Multi-Page Contract
Review this contract and list all payment obligations, termination clauses, and liability caps. Flag any unusual terms that deviate from standard SaaS agreements.
Summarize Video Transcript
Watch this 45-minute product demo video and create a bullet-point summary of the three main features discussed, including any technical limitations mentioned by the presenter.
Debug Code with Context
Here's a Python repo with 8 files. The /api/users.py endpoint returns 500 errors intermittently. Trace the issue through the stack and suggest a fix.
Extract Data from Scanned Forms
This image shows a handwritten intake form. Extract the patient name, date of birth, insurance provider, and chief complaint into a JSON object.
Compare Product Screenshots
I'm attaching screenshots of our checkout flow and a competitor's. List three UX patterns they use that we don't, and assess whether each would improve our conversion rate.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential race conditions. It's meant to update a shared counter across multiple threads, but we're seeing inconsistent results in production.
The function lacks thread synchronization. When multiple threads call `increment_counter()` simultaneously, they can read the same value before any writes complete, causing lost updates. The read-modify-write sequence isn't atomic. Wrap the critical section with `threading.Lock()`, or use `threading.local()` for per-thread counters if aggregation happens elsewhere. If this runs across processes rather than just threads, a `threading.Lock()` won't protect the counter; use `multiprocessing.Value` and hold its lock (`get_lock()`) around the update.
This example shows the model's ability to diagnose concurrency issues with specific technical detail. The 262K token context window means you could paste entire codebases for this kind of review. However, without public benchmarks, we can't verify how it performs against specialized code models on standardized tests.
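The function under review is not shown in the example, so the following is a minimal sketch of the lock-based fix the response describes, not the user's actual code. Only the name `increment_counter()` comes from the example; `counter`, `counter_lock`, and `run_workers` are hypothetical names introduced for illustration.

```python
import threading

counter = 0                        # shared state (hypothetical)
counter_lock = threading.Lock()    # guards every read-modify-write of counter


def increment_counter() -> None:
    """Increment the shared counter atomically with respect to other threads."""
    global counter
    with counter_lock:             # only one thread at a time runs this block
        counter += 1


def run_workers(num_threads: int = 8, increments_per_thread: int = 10_000) -> int:
    """Reset the counter, hammer it from several threads, return the final value."""
    global counter
    counter = 0

    def worker() -> None:
        for _ in range(increments_per_thread):
            increment_counter()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # wait for all workers before reading the total
    return counter


if __name__ == "__main__":
    print(run_workers())           # equals num_threads * increments_per_thread
```

With the lock in place the final value always equals the number of increments requested; removing the `with counter_lock:` line reintroduces the possibility of lost updates.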
I have a 45-minute video of our product demo. Extract the key feature announcements with timestamps, then write a 200-word blog summary highlighting the three most significant updates.
The model would process the video input, identify segments where features are introduced (typically marked by visual transitions or presenter emphasis), and generate a structured list like: '03:24 - Real-time collaboration mode; 18:47 - API rate limit increase to 10K/min; 31:15 - Dark mode with custom themes.' The blog summary would synthesize these into narrative form, emphasizing user impact and technical improvements while maintaining the product's voice. The output would reference specific demo moments to ground claims in what was actually shown.
Multimodal capability across text, image, and video makes this model suitable for content workflows that span media types. The large context window handles long-form video analysis without chunking. Trade-off: at $2.45/Mtok output, generating extensive summaries from video adds up quickly compared to text-only models.
Translate this technical support ticket from Japanese, preserving the customer's tone and urgency. Then draft a response addressing their SSL certificate renewal concern.
Translation: 'Our SSL certificate expires in 3 days and the renewal process keeps failing with error code 403. This is blocking our production deployment. We've tried the documentation steps twice—still no success. Can someone help urgently?' Draft response: 'Thank you for reaching out. Error 403 during SSL renewal typically indicates a permissions issue with your domain verification file. Please confirm the `.well-known/acme-challenge/` directory is publicly accessible and not blocked by your firewall rules. I'm escalating this to our infrastructure team for immediate review given your timeline.'
Demonstrates multilingual understanding with context preservation—the model captures both technical content and emotional urgency. The follow-up task (drafting a response) shows practical workflow integration. Without benchmark data, we can't compare translation accuracy to specialized models, but the end-to-end capability reduces tool-switching.
Use-case deep-dives
When 262K context beats chunking for contract review teams
A 4-person legal ops team processing M&A due diligence needs to cross-reference clauses across 40+ contracts in a single pass. Qwen3.5 397B handles the entire document set in one 262K-token context window: no chunking, no embedding lookups, no missed cross-references. At $0.39/Mtok input, loading 200K tokens costs $0.08 per analysis run. The trade-off: output is priced at $2.45/Mtok, about six times the input rate, so long generated summaries shift the cost balance (a 5K-token report adds about $0.012). If your workflow is read-heavy with short outputs, such as flagging conflicts or extracting key dates, this model eliminates the retrieval layer entirely. For teams running 20+ discovery sessions per week where context integrity matters more than output volume, the architectural simplicity justifies the cost.
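Whether a document set really fits in one pass is worth checking before dropping the retrieval layer. The sketch below does that check and prices the input side of a run. The per-contract token counts, the prompt overhead, and the output reservation are hypothetical values. The context limit is a parameter because this section assumes a 262K window while the Specifications list gives 131,072 tokens for the listed provider.

```python
INPUT_USD_PER_MTOK = 0.39          # listed input price
CONTEXT_LIMIT_TOKENS = 262_144     # window assumed in this section


def plan_single_pass(
    doc_token_counts: list[int],
    context_limit: int = CONTEXT_LIMIT_TOKENS,
    prompt_overhead: int = 1_000,      # assumption: instructions and formatting
    reserved_for_output: int = 5_000,  # assumption: room for a 5K-token report
) -> tuple[bool, int, float]:
    """Return (fits, input_tokens, input_cost_usd) for one analysis run."""
    input_tokens = sum(doc_token_counts) + prompt_overhead
    fits = input_tokens + reserved_for_output <= context_limit
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    return fits, input_tokens, input_cost


# Hypothetical data room: 40 contracts of about 5,000 tokens each.
contracts = [5_000] * 40
fits, input_tokens, input_cost = plan_single_pass(contracts)
print(fits, input_tokens, f"${input_cost:.3f}")

# The same set against a 131,072-token endpoint does not fit in one pass.
print(plan_single_pass(contracts, context_limit=131_072)[0])
```

If the check fails for the endpoint you actually call, the single-pass argument no longer holds and some form of chunking or retrieval is back on the table.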
Native video input for high-throughput safety review at scale
A 12-person trust & safety team at a UGC platform reviews 800 flagged videos daily, each 30-90 seconds. Qwen3.5 397B's native video modality means no separate transcription or frame-extraction step: upload the video, get policy violation analysis in one API call. At current pricing, processing a 60-second clip (roughly 15K tokens equivalent) costs ~$0.006 input. The model's 262K context also handles batch review: queue 15 videos in a single request for pattern detection across uploads. The threshold: if you're generating detailed incident reports (2K+ tokens per video), output costs add up fast at $2.45/Mtok. This works best for binary classification or short-form tagging where you're reading more than writing. For teams above 500 videos/day needing native multimodal without preprocessing overhead, this is the simplest pipeline.
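The batch-review idea can be sketched as a simple greedy packer that fills each request up to a token budget. The 250 tokens per second rate is derived from the "60-second clip, roughly 15K tokens" figure above and assumes token count scales linearly with duration; the 30,000-token reservation for the policy prompt and the model's output is also an assumption, as is the function name `pack_batches`.

```python
def pack_batches(
    clip_seconds: list[int],
    context_limit: int = 262_144,   # window assumed in this section
    reserved: int = 30_000,         # assumption: policy prompt plus output headroom
    tokens_per_second: int = 250,   # assumption: 15K tokens per 60-second clip
) -> list[list[int]]:
    """Greedily group clips, in order, so each batch stays within the budget."""
    budget = context_limit - reserved
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for seconds in clip_seconds:
        tokens = seconds * tokens_per_second
        if tokens > budget:
            raise ValueError(f"a {seconds}s clip does not fit in one request")
        if used + tokens > budget:  # current batch is full, start a new one
            batches.append(current)
            current, used = [], 0
        current.append(seconds)
        used += tokens
    if current:
        batches.append(current)
    return batches


# 800 clips of 60 seconds each, the daily volume described above.
batches = pack_batches([60] * 800)
print(len(batches), len(batches[0]), len(batches[-1]))
```

With these assumptions each request carries 15 one-minute clips, matching the batch size mentioned above; longer clips or a smaller context limit reduce that number.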
When massive context beats RAG for internal knowledge bases
A 3-person DevRel team maintains 180K tokens of API docs, migration guides, and troubleshooting runbooks. Engineers ask questions like 'how does auth work across v2 and v3?' that require synthesizing 6+ documents. Qwen3.5 397B loads the entire knowledge base into context: no vector search, no retrieval ranking, no hallucinated citations from chunked results. Input cost for a full KB load is $0.07, and that load is paid on every question unless the endpoint caches the prompt, so 50 questions/day comes to about $3.51 in input plus $0.06 in output for 500-token responses. The model's 262K window also leaves room for growth (about 80K more tokens of docs before hitting the limit). The boundary: if your KB exceeds 220K tokens or you need sub-second response times, a RAG setup with a faster model wins. For small teams with dense, interconnected docs where answer accuracy trumps speed, this eliminates the retrieval layer's failure modes entirely.
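To put numbers on the full-context versus retrieval trade-off, the sketch below compares daily spend when every question carries the whole 180K-token knowledge base against a hypothetical retrieval setup that sends only 8,000 tokens of retrieved context. It assumes no prompt caching and the same model and prices on both sides; the 8,000-token figure is an illustrative assumption, not a measurement.

```python
INPUT_USD_PER_MTOK = 0.39    # listed input price
OUTPUT_USD_PER_MTOK = 2.45   # listed output price


def daily_cost(questions_per_day: int, context_tokens: int, answer_tokens: int) -> float:
    """Daily USD cost when each question sends context_tokens and gets answer_tokens back."""
    per_question = (
        context_tokens * INPUT_USD_PER_MTOK + answer_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return questions_per_day * per_question


# Whole knowledge base in every prompt (no prompt caching assumed).
full_context = daily_cost(50, 180_000, 500)

# Hypothetical retrieval setup: about 8,000 tokens of retrieved passages per question.
retrieval = daily_cost(50, 8_000, 500)

print(f"full context: ${full_context:.2f}/day, retrieval: ${retrieval:.2f}/day")
```

At this volume the difference is a few dollars a day, so the decision rests mainly on answer quality and latency rather than token cost.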
Frequently asked
Is Qwen3.5 397B good for complex reasoning tasks?
Probably, though it is unverified here. The 397B total parameter count puts it in the heavyweight class, and the 262K token context lets it handle long-form analysis and multi-step problems. The A17B designation indicates that about 17B parameters are active per token in the sparse mixture-of-experts design, which keeps per-token compute close to that of a much smaller model. No benchmark data is listed on this page, so validate it on your own math, code, and logic-heavy workflows before relying on it where smaller models fail.
Is Qwen3.5 397B cheaper than GPT-4o or Claude Sonnet?
At $0.39 input and $2.45 output per million tokens, Qwen3.5 undercuts most frontier models significantly. GPT-4o runs roughly $2.50/$10.00 and Claude Sonnet 4 around $3.00/$15.00. At those prices Qwen3.5 costs roughly 4-8x less, depending on the model and the mix of input and output tokens. The trade-off is less brand recognition and fewer third-party integrations.
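The size of the gap depends on how much of a workload is output. A minimal sketch of that comparison follows, using the listed Qwen3.5 prices from the Pricing section and the approximate competitor prices quoted in this answer; the 20% output share and the dictionary keys are assumptions for illustration.

```python
# (input, output) prices in USD per million tokens.
# Competitor figures are the approximate ones quoted in this answer.
PRICES = {
    "qwen3.5-397b-a17b": (0.39, 2.45),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
}


def blended_price(model: str, output_share: float) -> float:
    """Average USD per million tokens when output_share of all tokens are output."""
    input_price, output_price = PRICES[model]
    return (1 - output_share) * input_price + output_share * output_price


def cost_ratio(other: str, output_share: float, base: str = "qwen3.5-397b-a17b") -> float:
    """How many times more the other model costs than the base model."""
    return blended_price(other, output_share) / blended_price(base, output_share)


# Assumption: 20% of tokens are output, 80% are input.
for name in ("gpt-4o", "claude-sonnet-4"):
    print(name, round(cost_ratio(name, 0.2), 2))
```

Shifting `output_share` toward 1.0 moves the ratio toward the output-price gap, and toward 0.0 moves it toward the larger input-price gap.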
Can Qwen3.5 397B process images and video natively?
Yes, it supports text, image, and video inputs directly. This makes it viable for multimodal tasks like document analysis with charts, video content moderation, or visual question answering. The 262K context window means you can feed substantial visual content alongside text prompts without hitting limits early.
How does Qwen3.5 397B compare to Qwen2.5 72B?
The 397B model has far more total parameters than the 72B predecessor, although as a sparse mixture-of-experts model it activates only about 17B of them per token. Expect better performance on ambiguous instructions, nuanced writing, and complex multi-turn conversations. The cost is 3-4x higher per token, so use 72B for straightforward tasks and 397B when quality matters more than budget.
Should I use Qwen3.5 397B for production chatbots?
If latency isn't critical and you need strong multilingual support or cost efficiency at scale, yes. The A17B architecture suggests reasonable inference speed for its size. However, without public benchmark data, test thoroughly before deploying. For customer-facing chat where sub-second responses matter, consider smaller models or wait for latency benchmarks.