Qwen: Qwen3.5 397B A17B

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

Anyone in the Space can @-mention Qwen: Qwen3.5 397B A17B with the team's shared context - pooled credits, one chat, one memory.

Verdict

Qwen3.5 397B A17B is a large multimodal mixture-of-experts model with a 131K-token context window and aggressive pricing: $0.39 input and $2.45 output per Mtok. At the GPT-4o rates quoted in the FAQ ($2.50/$10.00), that is roughly a sixth of GPT-4o's input price and a quarter of its output price. The 397B total parameter count (about 17B active per token, per the A17B suffix) suggests strong reasoning capability, though public benchmarks aren't yet available to confirm performance against Claude or GPT-4 class models. Best for teams that need multimodal understanding across text, images, and video at scale and are willing to validate performance in-house before committing production workloads.

Best for

Cost-sensitive multimodal document processing
Long-context video analysis tasks
High-throughput text generation workloads
Teams evaluating alternatives to GPT-4o
Prototyping with large parameter models

Strengths

With 397B total parameters, this is one of the larger publicly accessible models, likely delivering strong reasoning on complex tasks. Multimodal support spanning text, images, and video in a single model eliminates the need for separate vision pipelines. The 131K context window handles book-length documents or extended video transcripts without chunking. At the GPT-4o rates quoted in the FAQ ($2.50/$10.00 per Mtok), pricing is roughly 6x lower on input and 4x lower on output, making it viable for high-volume applications where cost per token drives architecture decisions.

Trade-offs

Absence of public benchmark data means you're flying blind relative to Claude Sonnet 4.5 or GPT-4o on standard evals like MMLU, HumanEval, or MMMU. The proprietary license limits deployment flexibility compared to open-weight alternatives like Llama or Qwen's own Apache-licensed models. The sparse mixture-of-experts design activates only about 17B of the 397B parameters per token, but no latency or throughput figures are published yet, so expect responsiveness to depend heavily on how well the endpoint is optimized. Video understanding quality remains unproven without published results on benchmarks like Video-MME.

Specifications

Provider: qwen
Category: llm
Context length: 131,072 tokens
Max output: —
Modalities: text, image, video
License: proprietary
Released: 2026-02-16

Pricing

Input: $0.39/Mtok
Output: $2.45/Mtok
Model ID: qwen/qwen3.5-397b-a17b

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30%

Estimated monthly spend

$17.68

17.6M tokens / month
5 seats · 80 msgs/day
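The estimate above can be roughly reproduced with a few lines of arithmetic. This is a minimal sketch, not Switchy's actual calculator: the function name `monthly_spend` is hypothetical, and 22 working days per month is an assumption chosen because it reproduces the 17.6M tokens/month figure. With the listed rates it yields about $17.74, close to but not identical to the $17.68 shown; the page does not say how the calculator rounds or which exact rates it uses.

```python
# Rates from the Pricing section (USD per million tokens).
INPUT_USD_PER_MTOK = 0.39
OUTPUT_USD_PER_MTOK = 2.45


def monthly_spend(seats, msgs_per_seat_per_day, avg_turn_tokens,
                  output_share, days_per_month=22):
    """Return (total_tokens, estimated_usd) for one month.

    days_per_month=22 is an assumption; it is the value that reproduces
    the 17.6M tokens/month shown by the calculator.
    """
    total_tokens = seats * msgs_per_seat_per_day * avg_turn_tokens * days_per_month
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    usd = (input_tokens * INPUT_USD_PER_MTOK
           + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return total_tokens, usd


# Calculator inputs from the page: 5 seats, 80 msgs/day, 2 ktok turns, 30% output.
tokens, usd = monthly_spend(seats=5, msgs_per_seat_per_day=80,
                            avg_turn_tokens=2_000, output_share=0.30)
print(f"{tokens / 1e6:.1f}M tokens, ${usd:.2f}")  # 17.6M tokens, $17.74
```

Because output tokens cost about six times as much as input tokens, the output share is the most sensitive input to this estimate.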

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
qwen	131k	$0.39/Mtok	$2.45/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Analyze Multi-Page Contract

Review this contract and list all payment obligations, termination clauses, and liability caps. Flag any unusual terms that deviate from standard SaaS agreements.

Summarize Video Transcript

Watch this 45-minute product demo video and create a bullet-point summary of the three main features discussed, including any technical limitations mentioned by the presenter.

Debug Code with Context

Here's a Python repo with 8 files. The /api/users.py endpoint returns 500 errors intermittently. Trace the issue through the stack and suggest a fix.

Extract Data from Scanned Forms

This image shows a handwritten intake form. Extract the patient name, date of birth, insurance provider, and chief complaint into a JSON object.
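
When a prompt like this feeds a downstream system, it helps to validate the reply before trusting it. The sketch below assumes the model returns a bare JSON object; the key names in `REQUIRED_FIELDS` and the helper `parse_intake_form` are hypothetical, since the prompt names the fields but not their JSON spelling, and the sample reply is made up for illustration.

```python
import json

# Hypothetical key names; the prompt lists the fields but not their JSON spelling.
REQUIRED_FIELDS = ("patient_name", "date_of_birth",
                   "insurance_provider", "chief_complaint")


def parse_intake_form(model_reply: str) -> dict:
    """Parse the model's reply as JSON and check every expected field is present."""
    data = json.loads(model_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the expected fields, in a stable order.
    return {field: data[field] for field in REQUIRED_FIELDS}


# Made-up reply for illustration, not real model output.
sample_reply = (
    '{"patient_name": "Jane Doe", "date_of_birth": "1980-01-01", '
    '"insurance_provider": "Acme Health", "chief_complaint": "persistent cough"}'
)
print(parse_intake_form(sample_reply))
```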

Compare Product Screenshots

I'm attaching screenshots of our checkout flow and a competitor's. List three UX patterns they use that we don't, and assess whether each would improve our conversion rate.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential race conditions. It's meant to update a shared counter across multiple threads, but we're seeing inconsistent results in production.

Output

The function lacks thread synchronization. When multiple threads call `increment_counter()` simultaneously, they can read the same value before any writes complete, causing lost updates. The read-modify-write sequence isn't atomic. Wrap the critical section with `threading.Lock()`, or use `threading.local()` for per-thread counters if aggregation happens elsewhere. If this runs across processes rather than threads, a `threading.Lock` will not help; use a shared `multiprocessing.Value` and hold its `get_lock()` around the increment, since `+=` on a `Value` is not atomic either.

Notes

This example shows the model's ability to diagnose concurrency issues with specific technical detail. The 131K-token context window listed in the specifications means you could paste a sizeable codebase for this kind of review. However, without public benchmarks, we can't verify how it performs against specialized code models on standardized tests.
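
The locking fix described in the output can be sketched as follows. The original `increment_counter()` function is not shown on this page, so the `Counter` class and `run_demo` helper here are hypothetical stand-ins that illustrate the technique rather than the user's actual code.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write step is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may read, add, and write back.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_demo(n_threads=8, increments=10_000):
    """Increment one shared Counter from several threads and return the total."""
    counter = Counter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run_demo())  # 80000: no updates are lost
```

Holding the lock only around the increment keeps contention low; anything slow, such as I/O, should stay outside the `with` block.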

Prompt

I have a 45-minute video of our product demo. Extract the key feature announcements with timestamps, then write a 200-word blog summary highlighting the three most significant updates.

Output

The model would process the video input, identify segments where features are introduced (typically marked by visual transitions or presenter emphasis), and generate a structured list like: '03:24 - Real-time collaboration mode; 18:47 - API rate limit increase to 10K/min; 31:15 - Dark mode with custom themes.' The blog summary would synthesize these into narrative form, emphasizing user impact and technical improvements while maintaining the product's voice. The output would reference specific demo moments to ground claims in what was actually shown.

Notes

Multimodal capability across text, image, and video makes this model suitable for content workflows that span media types. The large context window handles long-form video analysis without chunking. Trade-off: at $2.45/Mtok output, generating extensive summaries from video adds up quickly compared to text-only models.

Prompt

Translate this technical support ticket from Japanese, preserving the customer's tone and urgency. Then draft a response addressing their SSL certificate renewal concern.

Output

Translation: 'Our SSL certificate expires in 3 days and the renewal process keeps failing with error code 403. This is blocking our production deployment. We've tried the documentation steps twice—still no success. Can someone help urgently?' Draft response: 'Thank you for reaching out. Error 403 during SSL renewal typically indicates a permissions issue with your domain verification file. Please confirm the `.well-known/acme-challenge/` directory is publicly accessible and not blocked by your firewall rules. I'm escalating this to our infrastructure team for immediate review given your timeline.'

Notes

Demonstrates multilingual understanding with context preservation—the model captures both technical content and emotional urgency. The follow-up task (drafting a response) shows practical workflow integration. Without benchmark data, we can't compare translation accuracy to specialized models, but the end-to-end capability reduces tool-switching.

Use-case deep-dives

Multi-document legal discovery

When 131K context beats chunking for contract review teams

A 4-person legal ops team processing M&A due diligence needs to cross-reference clauses across 40+ contracts in a single pass. Qwen3.5 397B can take the whole document set in one request as long as it fits the 131K-token context window listed in the specifications: no chunking, no embedding lookups, no missed cross-references. At $0.39/Mtok input, loading 120K tokens costs about $0.05 per analysis run. Output is priced higher at $2.45/Mtok, so a 5K-token report adds about $0.012, and output overtakes input cost only once a report exceeds roughly 19K tokens. If your workflow is read-heavy with short outputs (flagging conflicts, extracting key dates), this model removes the retrieval layer entirely. For teams running 20+ discovery sessions per week where context integrity matters more than output volume, the architectural simplicity justifies the cost.
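
Before dropping a retrieval layer, it is worth checking that a document set actually fits in one request. The sketch below uses the context length from the Specifications section (131,072 tokens) and the input rate from Pricing. The helper name `plan_single_pass`, the per-contract token counts, and the prompt and output reservations are assumptions for illustration; it also assumes output tokens count against the same window, which depends on the provider.

```python
CONTEXT_LIMIT_TOKENS = 131_072   # "Context length" from the Specifications section
INPUT_USD_PER_MTOK = 0.39        # from the Pricing section


def plan_single_pass(doc_token_counts, prompt_tokens=1_000,
                     reserved_output_tokens=5_000):
    """Check whether all documents fit in one request and estimate input cost.

    Assumes output tokens share the context window with the input.
    """
    input_tokens = sum(doc_token_counts) + prompt_tokens
    fits = input_tokens + reserved_output_tokens <= CONTEXT_LIMIT_TOKENS
    input_usd = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    return {"input_tokens": input_tokens, "fits": fits, "input_usd": input_usd}


# Hypothetical example: 40 contracts of about 3,000 tokens each.
plan = plan_single_pass([3_000] * 40)
print(plan)
```

If `fits` comes back false, the set has to be split across requests or trimmed, which brings back some of the cross-referencing problems that a single pass avoids.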

Video content moderation pipeline

Native video input for high-throughput safety review at scale

A 12-person trust & safety team at a UGC platform reviews 800 flagged videos daily, each 30-90 seconds. Qwen3.5 397B's native video modality means no separate transcription or frame-extraction step: upload the video, get policy violation analysis in one API call. At current pricing, processing a 60-second clip (roughly 15K tokens equivalent) costs ~$0.006 input, or about $4.68 per day for all 800 clips. The 131K context listed in the specifications also allows batch review: queue up to 8 such clips in a single request for pattern detection across uploads. The threshold: if you're generating detailed incident reports (2K+ tokens per video), output costs add up fast at $2.45/Mtok, about $0.005 per video, which is close to the input cost of the clip itself. This works best for binary classification or short-form tagging where you're reading more than writing. For teams above 500 videos/day needing native multimodal without preprocessing overhead, this is the simplest pipeline.
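
Batch review comes down to packing clips into requests without exceeding the context budget. This is a minimal greedy sketch under stated assumptions: the 131,072-token limit comes from the Specifications section, the 15K-tokens-per-clip figure is the rough estimate quoted above, and `batch_clips` and the 4,000-token reservation for instructions and output are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 131_072   # "Context length" from the Specifications section
INPUT_USD_PER_MTOK = 0.39        # from the Pricing section


def batch_clips(clip_token_counts, reserved_tokens=4_000):
    """Greedily pack clips into requests without exceeding the context budget."""
    budget = CONTEXT_LIMIT_TOKENS - reserved_tokens
    batches, current, used = [], [], 0
    for tokens in clip_token_counts:
        if tokens > budget:
            raise ValueError(f"clip of {tokens} tokens exceeds the per-request budget")
        if used + tokens > budget:
            # Current request is full; start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(tokens)
        used += tokens
    if current:
        batches.append(current)
    return batches


# 800 clips/day at roughly 15K tokens each (figures from the scenario above).
daily_clips = [15_000] * 800
batches = batch_clips(daily_clips)
daily_input_usd = sum(daily_clips) * INPUT_USD_PER_MTOK / 1_000_000
print(len(batches), "requests/day,", f"${daily_input_usd:.2f} input/day")
# 100 requests/day, $4.68 input/day
```

Batching does not change the total input cost, since the same tokens are sent either way; it reduces the number of requests and lets the model compare clips within a batch.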

Technical documentation synthesis

When massive context beats RAG for internal knowledge bases

A 3-person DevRel team maintains 100K tokens of API docs, migration guides, and troubleshooting runbooks. Engineers ask questions like 'how does auth work across v2 and v3?' that require synthesizing 6+ documents. Qwen3.5 397B loads the entire knowledge base into context: no vector search, no retrieval ranking, no answers stitched together from partial chunks. Each full KB load costs about $0.04 in input, and because every question resends the KB, 50 questions/day comes to about $1.95 in input; with 500-token responses, daily output cost is about $0.06. The 131K window listed in the specifications leaves roughly 30K tokens of headroom for growth, the question, and the answer. The boundary: if your KB exceeds about 120K tokens or you need sub-second response times, a RAG setup with a faster model wins. For small teams with dense, interconnected docs where answer accuracy trumps speed, this removes the retrieval layer's failure modes.
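
The cost structure of this pattern is easy to misjudge, because the whole knowledge base is resent with every question. The sketch below works through the daily arithmetic with the rates from the Pricing section; `daily_full_context_cost` is a hypothetical helper, and the 100K-token knowledge base is an assumed example sized to fit the 131,072-token context listed under Specifications.

```python
INPUT_USD_PER_MTOK = 0.39    # from the Pricing section
OUTPUT_USD_PER_MTOK = 2.45   # from the Pricing section


def daily_full_context_cost(kb_tokens, questions_per_day, response_tokens):
    """Daily cost when every question resends the whole knowledge base."""
    input_usd = kb_tokens * questions_per_day * INPUT_USD_PER_MTOK / 1_000_000
    output_usd = response_tokens * questions_per_day * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_usd, output_usd


# Assumed example: 100K-token KB, 50 questions/day, 500-token answers.
input_usd, output_usd = daily_full_context_cost(
    kb_tokens=100_000, questions_per_day=50, response_tokens=500)
print(f"input ${input_usd:.2f}/day, output ${output_usd:.2f}/day")
# input $1.95/day, output $0.06/day
```

In this regime input dominates the bill even though the per-token output rate is higher, so question volume and KB size matter more than answer length.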

Frequently asked

Is Qwen3.5 397B good for complex reasoning tasks?

Likely, but not yet verified. The 397B total parameter count puts it in the heavyweight class, and the 131K-token context leaves room for long-form analysis and multi-step problems. The A17B suffix indicates that about 17B parameters are active per token, a mixture-of-experts design intended to keep inference cost well below that of a dense 397B model. Until public benchmarks are available, test it on your own math, code, and logic-heavy workloads before relying on it.

Is Qwen3.5 397B cheaper than GPT-4o or Claude Sonnet?

At $0.39 input and $2.45 output per million tokens, Qwen3.5 undercuts most frontier models significantly. GPT-4o runs roughly $2.50/$10.00 and Claude Sonnet 4 around $3.00/$15.00. That makes Qwen3.5 about 6-8x cheaper on input and 4-6x cheaper on output, so the savings are largest for input-heavy workloads. The trade-off is less brand recognition and fewer third-party integrations.

Can Qwen3.5 397B process images and video natively?

Yes, it supports text, image, and video inputs directly. This makes it viable for multimodal tasks like document analysis with charts, video content moderation, or visual question answering. The 131K context window listed in the specifications means you can feed substantial visual content alongside text prompts without hitting limits early.

How does Qwen3.5 397B compare to Qwen2.5 72B?

The 397B model offers dramatically more parameters and reasoning depth than the 72B predecessor. Expect better performance on ambiguous instructions, nuanced writing, and complex multi-turn conversations. The cost is 3-4x higher per token, so use 72B for straightforward tasks and 397B when quality matters more than budget.

Should I use Qwen3.5 397B for production chatbots?

If latency isn't critical and you need strong multilingual support or cost efficiency at scale, yes. The A17B architecture suggests reasonable inference speed for its size. However, without public benchmark data, test thoroughly before deploying. For customer-facing chat where sub-second responses matter, consider smaller models or wait for latency benchmarks.