Qwen: Qwen3.5 Plus 2026-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
Anyone in the Space can @-mention Qwen: Qwen3.5 Plus 2026-02-15 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Cost-sensitive long-context analysis
- Multilingual content generation
- Processing entire codebases in one pass
- Video and image understanding tasks
- High-volume API workloads on budget
Strengths
The million-token context window rivals Gemini 1.5 Pro, while input pricing sits roughly 90% below GPT-4o ($0.26 versus $2.50/Mtok). Multimodal support spans text, images, and video, which is rare at this price point. Qwen models historically excel at Chinese and other non-English languages, making this a strong pick for multilingual teams. The 2026-02-15 release suggests recent training data, reducing the knowledge-cutoff concerns that plague older checkpoints.
Trade-offs
Absence of public benchmarks means you're flying blind on math reasoning, code generation quality, and instruction-following compared to models with MMLU, HumanEval, or GPQA scores. Qwen models sometimes lag OpenAI and Anthropic on nuanced English tasks like creative writing or complex multi-step reasoning. Video understanding is newer territory for the Qwen family—expect less polish than GPT-4o or Gemini on video-heavy workflows until community testing validates performance.
Specifications
- Provider: qwen
- Category: llm
- Context length: 1,000,000 tokens
- Max output: 65,536 tokens
- Modalities: text, image, video
- License: proprietary
- Released: 2026-02-16
Pricing
- Input: $0.26/Mtok
- Output: $1.56/Mtok
- Model ID: qwen/qwen3.5-plus-02-15
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
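To make the calculator inputs concrete, here is a rough sketch of the upstream cost for 5 seats at 80 msgs/day, using the listed per-token prices. The tokens-per-message figures, the 30-day month, and the reading of "80 msgs/day" as per seat are all assumptions for illustration. How Switchy converts upstream cost into credits is not stated here, so this does not model credits.

```python
# Upstream per-token prices from the Pricing section (USD per million tokens).
INPUT_USD_PER_MTOK = 0.26
OUTPUT_USD_PER_MTOK = 1.56


def message_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Upstream cost of a single request at the listed prices."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def team_monthly_cost_usd(seats: int, msgs_per_day: int,
                          input_tokens_per_msg: int,
                          output_tokens_per_msg: int,
                          days: int = 30) -> float:
    """Upstream cost for a team, treating msgs_per_day as a per-seat figure."""
    messages = seats * msgs_per_day * days
    return messages * message_cost_usd(input_tokens_per_msg,
                                       output_tokens_per_msg)


# Assumed message size: 2,000 input tokens and 500 output tokens.
estimate = team_monthly_cost_usd(5, 80, 2_000, 500)
print(f"Estimated upstream cost: ${estimate:.2f}/month")  # $15.60/month
```

Swap in token counts measured from your own chats; the result scales linearly with every input.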
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| qwen | 1000k | $0.26/Mtok | $1.56/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Analyze Entire Codebase
Review this codebase for architectural issues, code smells, and refactoring opportunities. Focus on modularity, naming conventions, and potential performance bottlenecks. Provide a prioritized list of improvements.
Multilingual Content Localization
Translate this product description into Spanish, French, and Simplified Chinese. Adapt idioms and cultural references for each market while preserving brand voice and key messaging.
Long-Form Document Synthesis
Read these three research papers on transformer architectures and synthesize the key innovations, conflicting findings, and open research questions into a 500-word executive summary.
Video Content Analysis
Watch this product demo video and create a bullet-point summary of features shown, UI elements highlighted, and any pain points the presenter addresses. Note timestamps for key moments.
Cost-Optimized Data Extraction
Extract all company names, funding amounts, and investor names from this batch of startup news articles. Return results as JSON with fields: company, amount, currency, investors (array), date.
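Because this prompt asks for structured JSON, it is worth checking the reply before it reaches downstream systems. The sketch below validates the five fields named in the prompt. The field types (numeric amount, string date) and the sample record are assumptions, since the prompt names the fields but not their formats.

```python
import json

# Field names come from the prompt; the expected types are assumptions.
REQUIRED_FIELDS = {
    "company": str,
    "amount": (int, float),
    "currency": str,
    "investors": list,
    "date": str,
}


def validate_records(raw: str) -> list:
    """Parse a JSON reply and check every record has the expected fields."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for idx, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {idx}: expected an object")
        for field, expected in REQUIRED_FIELDS.items():
            if field not in rec:
                raise ValueError(f"record {idx}: missing field {field!r}")
            if not isinstance(rec[field], expected):
                raise ValueError(f"record {idx}: bad type for {field!r}")
        if not all(isinstance(name, str) for name in rec["investors"]):
            raise ValueError(f"record {idx}: investors must be strings")
    return records


# Hypothetical reply, made up for illustration only.
sample = ('[{"company": "Example Corp", "amount": 5000000, "currency": "USD", '
          '"investors": ["Sample Ventures"], "date": "2026-01-10"}]')
print(len(validate_records(sample)))  # 1
```

Rejecting malformed replies early makes it easier to retry a single article instead of rerunning the whole batch.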
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance:

```python
def merge_sorted_lists(list1, list2):
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    return result
```
The model identifies three issues: (1) the function silently drops remaining elements when one list is exhausted, requiring `result.extend(list1[i:])` and `result.extend(list2[j:])` after the loop; (2) repeated `len()` calls in the while condition add overhead on large lists, so it suggests caching the lengths upfront (a minor gain at best, since `len()` is O(1) for Python lists); (3) no input validation means passing non-list types or unsorted lists raises errors or silently produces wrong output. The model suggests adding type hints, a docstring explaining the sorted-input requirement, and unit tests covering empty lists and single-element cases. It rewrites the function with these fixes in a clean, commented style.
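For reference, the fixes described above might look roughly like the following. This is a sketch written for illustration, not recorded model output. The integer type hints are an assumption, and the input-validation point is handled only through the docstring.

```python
def merge_sorted_lists(list1: list[int], list2: list[int]) -> list[int]:
    """Merge two lists into one sorted list.

    Both inputs must already be sorted in ascending order; the function
    does not check this, and unsorted input gives unsorted output.
    """
    result: list[int] = []
    i = j = 0
    n1, n2 = len(list1), len(list2)  # cache lengths once
    while i < n1 and j < n2:
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # Fix for the dropped-tail bug: at most one of these slices is non-empty.
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result


print(merge_sorted_lists([1, 3, 5], [2, 4, 6, 8]))  # [1, 2, 3, 4, 5, 6, 8]
```

The two `extend` calls are the substantive fix; the cached lengths are cosmetic.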
This example highlights the model's code review capabilities across correctness, performance, and maintainability. The 1M token context window means it could handle reviewing entire codebases in one prompt. However, without public benchmarks, we can't verify how its bug-detection accuracy compares to general-purpose frontier models like Claude or GPT-4.
I have 47 customer support tickets from last week. Summarize the top 3 recurring issues, estimate severity (low/medium/high), and draft a one-paragraph internal memo explaining what engineering should prioritize. The tickets mention login failures, slow dashboard loads, broken CSV exports, and minor UI glitches.
The model parses the ticket themes and returns: (1) Login failures (high severity)—affects user access entirely, mentioned in 18 tickets; (2) Slow dashboard loads (medium severity)—degrades experience but doesn't block work, 15 tickets; (3) Broken CSV exports (medium severity)—blocks reporting workflows, 9 tickets. It then drafts a concise memo: 'Engineering should prioritize login stability this sprint, as it's blocking nearly 40% of reported issues. Dashboard performance and CSV export bugs are next—both impact daily workflows but have workarounds. UI glitches can wait until the next maintenance cycle.' The tone is direct and action-oriented.
This showcases the model's ability to synthesize unstructured input into prioritized insights—a common business use case. The multimodal support (text, image, video) means users could paste screenshots of tickets or upload screen recordings. The $1.56/Mtok output pricing is steep for high-volume summarization tasks compared to cheaper alternatives like GPT-4o-mini.
Generate a product description for an e-commerce site. Product: ergonomic wireless mouse with 6 programmable buttons, 3200 DPI sensor, USB-C charging. Target audience: remote workers and gamers. Tone: professional but approachable. 80-100 words.
The model produces: 'Meet your new desk companion. This ergonomic wireless mouse adapts to marathon work sessions and intense gaming alike, with a precision 3200 DPI sensor that tracks every movement. Six programmable buttons let you customize shortcuts for your workflow, whether that's Slack commands or ability macros. The sculpted shape reduces wrist strain during long hours, and USB-C charging means no more battery swaps mid-meeting. It's the kind of tool that disappears into your routine, letting you focus on what matters.' The output runs about 80 words, at the low end of the requested 80-100 range, and balances feature callouts with benefit-driven language.
This demonstrates the model's content generation for marketing copy, maintaining a consistent tone while weaving in technical specs naturally. The model stays close to the requested word count, which matters for CMS character limits. However, bulk content generation is output-heavy, so the $1.56/Mtok output price drives the bill more than the $0.26/Mtok input price; budget-tier models such as GPT-4o-mini can be cheaper when you're generating hundreds of product descriptions per session.
Use-case deep-dives
When 1M-token context beats splitting contracts across calls
A 4-person legal ops team at a Series B SaaS company needs to compare vendor agreements against their master template—typically 8-12 contracts per review cycle, each 40-80 pages. Qwen3.5 Plus handles the entire batch in one context window (1M tokens = roughly 750k words), so you paste all contracts plus the template and ask for discrepancies in a single prompt. At $0.26/Mtok input, a 600k-token comparison costs $0.16 versus the $2-4 you'd spend routing chunks through GPT-4 or Claude. The trade-off: no public benchmarks yet, so test accuracy on your template language before committing. If your contracts involve edge-case legal reasoning where you need proven performance, wait for independent evals. Otherwise, the context capacity and price make this the default for high-volume document comparison where you can validate outputs.
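Before pasting a batch, it helps to check that it fits the window and what the input will cost. The sketch below uses the rule of thumb stated above (1M tokens is roughly 750k words) plus the listed context length, max output, and input price. Reserving the full max output inside the context window and the 500-words-per-page figure are both assumptions; real token counts depend on the tokenizer and the text.

```python
CONTEXT_TOKENS = 1_000_000      # listed context length
MAX_OUTPUT_TOKENS = 65_536      # listed max output
INPUT_USD_PER_MTOK = 0.26       # listed input price


def estimate_tokens(words: int) -> int:
    """Approximate tokens as 4 per 3 words (1M tokens ~ 750k words), rounded up."""
    return (words * 4 + 2) // 3


def plan_batch(word_counts: list[int],
               reserve_output: int = MAX_OUTPUT_TOKENS) -> dict:
    """Estimate input size, context fit, and input cost for a document batch."""
    input_tokens = estimate_tokens(sum(word_counts))
    return {
        "input_tokens": input_tokens,
        # Assumption: output tokens share the same context window.
        "fits": input_tokens + reserve_output <= CONTEXT_TOKENS,
        "input_cost_usd": input_tokens * INPUT_USD_PER_MTOK / 1_000_000,
    }


# 12 contracts plus the template, each 80 pages at an assumed 500 words/page.
print(plan_batch([80 * 500] * 13))
```

If `fits` comes back false, split the batch by contract rather than by page so each call still sees whole documents alongside the template.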
Why multimodal input matters for small trust-and-safety teams
A 3-person moderation team at a creator platform reviews 200-300 flagged videos daily, each 30-90 seconds long and flagged for potential policy violations. Qwen3.5 Plus ingests video directly (no transcript-only workaround), so you feed the raw file and get a violation assessment in one call. At $1.56/Mtok output, a 500-token moderation report costs under $0.001, so the reports for the entire daily queue come to well under $0.50 in output costs. Input tokens for the video itself come on top of that, so measure per-clip token usage on a sample before budgeting. Compare that to hiring a transcription service ($0.10-0.25/minute) and then running text-only models. The catch: video understanding is harder to benchmark than text, and you'll need a human-in-the-loop for the first 100 reviews to catch false negatives. If your platform is under 500 videos/day and you can afford that validation phase, this is the most cost-effective multimodal moderation path.
When sub-$0.30 input pricing unlocks always-on ticket routing
A 10-person support team at a B2B logistics tool gets 400-600 inbound tickets daily via email, Slack, and web form. They need instant routing to the right specialist (billing, integrations, or ops) without a human reading every message. Qwen3.5 Plus runs on every incoming ticket: you send the full message history (often 2k-5k tokens with quoted threads) and get back a category tag plus urgency score. At $0.26/Mtok input and $1.56/Mtok output, a 3k-input/200-output-token classification costs $0.001—cheap enough to run on 100% of tickets without budget anxiety. The risk: no public accuracy benchmarks, so you'll want to shadow-run it against your current manual triage for two weeks and measure misroute rate. If you're above 90% accuracy after tuning your prompt, turn it on and reclaim 6-8 hours/day of triage labor.
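The shadow-run check can be as simple as comparing the model's category tags against the human triage labels for the same tickets. A minimal sketch, where the function names and sample labels are hypothetical and the 90% bar is the threshold mentioned above:

```python
from collections import Counter


def misroute_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of tickets where the model's tag differs from the human's."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("need at least one ticket")
    misses = sum(m != h for m, h in zip(model_labels, human_labels))
    return misses / len(human_labels)


def confusion_pairs(model_labels: list[str], human_labels: list[str]) -> Counter:
    """Count (human, model) pairs for misrouted tickets, to guide prompt tuning."""
    return Counter((h, m) for m, h in zip(model_labels, human_labels) if m != h)


def ready_to_enable(model_labels: list[str], human_labels: list[str],
                    min_accuracy: float = 0.90) -> bool:
    """True only when accuracy is strictly above the bar."""
    return 1 - misroute_rate(model_labels, human_labels) > min_accuracy


# Made-up labels for illustration.
human = ["billing", "ops", "integrations", "billing", "ops"]
model = ["billing", "ops", "billing", "billing", "ops"]
print(misroute_rate(model, human))    # 0.2
print(confusion_pairs(model, human))  # Counter({('integrations', 'billing'): 1})
```

The confusion pairs show which categories the prompt blurs together, which is usually more actionable than the headline rate.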
Frequently asked
Is Qwen3.5 Plus good for long document analysis?
Yes. The 1M token context window handles entire codebases, legal documents, or multi-hour meeting transcripts in a single prompt. At $0.26/Mtok input, processing 500K tokens costs $0.13 — cheaper than splitting across multiple calls. The multimodal support means you can mix PDFs, screenshots, and text without preprocessing.
Is Qwen3.5 Plus cheaper than GPT-4o or Claude Sonnet?
Yes, on both input and output. Input is $0.26/Mtok versus $2.50-$3.00/Mtok, and output is $1.56/Mtok versus $10-$15/Mtok for frontier models. If your workload is retrieval-heavy or summarization (high input, low output), Qwen wins by the widest margin, roughly 90%. For generation-heavy tasks like creative writing, the gap narrows slightly, but at these list prices Qwen still undercuts competitors by roughly 84-90%.
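How much you save depends on your input/output mix. The sketch below compares total cost for two hypothetical workloads at the prices quoted in this answer. Pairing $2.50 with $10 and $3.00 with $15 as two competitor price points is an assumption, and the token counts are illustrative only.

```python
def workload_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Total cost for a workload, with prices in USD per million tokens."""
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000


def savings_pct(cheaper_cost: float, pricier_cost: float) -> float:
    """Percentage saved by choosing the cheaper option."""
    return 100 * (1 - cheaper_cost / pricier_cost)


QWEN = (0.26, 1.56)             # (input, output) from this page
FRONTIER_LOW = (2.50, 10.00)    # assumed pairing of the quoted ranges
FRONTIER_HIGH = (3.00, 15.00)   # assumed pairing of the quoted ranges

# Hypothetical workloads: (input tokens, output tokens).
workloads = {
    "summarization": (100_000, 1_000),
    "generation": (1_000, 10_000),
}

for name, (tokens_in, tokens_out) in workloads.items():
    qwen = workload_cost_usd(tokens_in, tokens_out, *QWEN)
    for label, prices in (("low", FRONTIER_LOW), ("high", FRONTIER_HIGH)):
        other = workload_cost_usd(tokens_in, tokens_out, *prices)
        print(f"{name} vs {label}: {savings_pct(qwen, other):.1f}% cheaper")
```

Input-heavy workloads land nearer the input-price gap and output-heavy ones nearer the output-price gap, so the blended saving always falls between the two.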
Can Qwen3.5 Plus handle video understanding tasks?
Yes, it accepts video input alongside text and images. Without public benchmarks, real-world performance is unverified, but the modality support exists. For production video analysis, test it against your specific use case — frame extraction quality and temporal reasoning vary across models claiming video support.
How does the 2026-02-15 version compare to earlier Qwen releases?
This is the Plus tier of Qwen3.5, released February 2026. Most earlier Qwen2.5 models topped out at 128K context, with the 1M-token Qwen2.5-Turbo and Qwen2.5-1M variants as the exceptions; this release makes 1M tokens standard for the Plus tier. Pricing is roughly 3-4x lower than Qwen2.5 Turbo was at launch. No public benchmarks yet, so capability improvements over Qwen2.5-72B or Qwen2.5-Coder remain unquantified.
Should I use Qwen3.5 Plus for production chatbots?
Depends on your latency and quality requirements. The pricing makes it viable for high-volume chat, but without MMLU, HumanEval, or MT-Bench scores, you're flying blind on reasoning and instruction-following quality. Run evals on your domain before committing. If cost matters more than cutting-edge performance, it's worth testing.