Qwen: Qwen Plus 0728
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1-million-token context window that balances performance, speed, and cost.
Verdict
Best for
- High-volume batch processing on tight budgets
- Long-context document analysis and summarization
- Internal tooling where cost at scale matters
- Multilingual tasks with Chinese language support
- Prototyping before committing to pricier models
Strengths
The million-token context window puts entire codebases or technical manuals in scope without retrieval hacks. At $0.26 per million input tokens, you can run roughly 5-10x the input volume you'd afford with GPT-4o for the same spend. Qwen models have historically punched above their weight on multilingual benchmarks, particularly Chinese-English tasks. The 0728 release suggests active iteration from Alibaba's research team, though public benchmark data remains sparse.
Trade-offs
Without published benchmark scores, you're flying blind relative to Claude or GPT-4o on reasoning-heavy tasks like multi-step math or complex code generation. Anecdotal reports suggest weaker instruction-following on edge cases compared to Anthropic's models. Alibaba Cloud's API reliability and latency vary by region—expect slower response times outside Asia-Pacific. The proprietary license locks you into their ecosystem with no self-hosting escape hatch if pricing or terms shift.
Specifications
- Provider: qwen
- Category: llm
- Context length: 1,000,000 tokens
- Max output: 32,768 tokens
- Modalities: text
- License: proprietary
- Released: 2025-09-08
Pricing
- Input: $0.26/Mtok
- Output: $0.78/Mtok
- Model ID: qwen/qwen-plus-2025-07-28
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
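The per-token prices above turn into a per-request cost with simple arithmetic. The following is a minimal sketch under the listed upstream prices only; `estimate_cost` is a hypothetical helper name, and the result says nothing about how Switchy credits are metered.

```python
from decimal import Decimal

# Upstream prices from the Pricing list, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("0.26")
OUTPUT_USD_PER_MTOK = Decimal("0.78")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the upstream USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 100k-token prompt with a 2k-token reply.
    print(estimate_cost(100_000, 2_000))
```

`Decimal` is used so that small per-request amounts add up without binary floating-point drift when summed over many requests.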
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| qwen | 1000k | $0.26/Mtok | $0.78/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Summarize Technical Documentation
Read the entire API documentation below and produce a 2-page executive summary covering: core endpoints, authentication flow, rate limits, and common integration patterns. Prioritize clarity for non-technical stakeholders.
Batch Process Customer Feedback
Analyze this customer support ticket. Output JSON with fields: sentiment (positive/neutral/negative), primary_issue (one of: billing, technical, feature_request, other), urgency (low/medium/high), and a 15-word summary.
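When this prompt runs in a batch, it helps to check each reply before storing it. The sketch below validates a reply against the fields the prompt names. Two things are assumptions, because the prompt does not specify them: the summary field is called `summary`, and "15-word" is treated as an upper bound. `validate_ticket_reply` is a hypothetical helper name.

```python
import json
from typing import Any, Dict

# Allowed values, copied from the prompt text.
ALLOWED = {
    "sentiment": {"positive", "neutral", "negative"},
    "primary_issue": {"billing", "technical", "feature_request", "other"},
    "urgency": {"low", "medium", "high"},
}
MAX_SUMMARY_WORDS = 15  # assumption: "15-word summary" read as a maximum


def validate_ticket_reply(raw: str) -> Dict[str, Any]:
    """Parse a model reply and check the fields the prompt asks for."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, allowed in ALLOWED.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{field!r} must be one of {sorted(allowed)}, got {value!r}")
    summary = data.get("summary")  # assumption: field name is "summary"
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("'summary' must be a non-empty string")
    if len(summary.split()) > MAX_SUMMARY_WORDS:
        raise ValueError(f"'summary' exceeds {MAX_SUMMARY_WORDS} words")
    return data
```

Replies that fail validation can be retried or routed to a human queue rather than written to the ticket system.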
Translate Technical Content
Translate this Chinese technical specification to English. Preserve all technical terms, maintain formatting, and flag any ambiguous phrases where context is needed for accurate translation.
Audit Codebase for Patterns
Review this codebase and list all instances where we're using deprecated library functions or insecure patterns (SQL injection risks, hardcoded secrets, unvalidated user input). Group findings by severity.
Generate Internal Wiki Content
Write a 500-word onboarding guide for new engineers covering: repo structure, local dev setup, CI/CD pipeline, and where to find help. Use a friendly, direct tone and include a troubleshooting FAQ section.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs and suggest improvements. Focus on edge cases and performance: `def calculate_discount(price, discount_pct): return price - (price * discount_pct / 100)`
The function has several issues:
1. **No input validation**: Negative prices or discount percentages above 100 will produce nonsensical results. Add checks: `if price < 0 or discount_pct < 0 or discount_pct > 100: raise ValueError()`
2. **Float precision**: For currency, use `Decimal` from the decimal module to avoid rounding errors like `19.99 * 0.15` giving unexpected results.
3. **Type hints missing**: Add `def calculate_discount(price: float, discount_pct: float) -> float:` for better IDE support.
4. **Performance**: The calculation is already O(1), but consider caching if called repeatedly with same discount_pct in a loop.
Improved version would validate inputs, use Decimal for money calculations, and include docstring with examples.
This example shows the model's code review capability within its 1M token context window — useful for analyzing entire codebases. The response balances practical fixes with performance considerations. However, at $0.78/Mtok output, lengthy code reviews become expensive compared to models with lower output pricing.
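The improvements named in the example response (input validation, `Decimal`, type hints, a docstring) could look like the sketch below. It is one possible reading, not the model's literal output. Rounding the result to cents with `ROUND_HALF_UP` is an assumption the example does not state.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union

Number = Union[Decimal, int, str]


def calculate_discount(price: Number, discount_pct: Number) -> Decimal:
    """Return the price after applying a percentage discount.

    Raises ValueError for a negative price or a discount outside 0-100.
    """
    # Convert via str so values such as 19.99 are not read as binary floats.
    price_d = Decimal(str(price))
    pct_d = Decimal(str(discount_pct))
    if price_d < 0:
        raise ValueError("price must be non-negative")
    if not (Decimal(0) <= pct_d <= Decimal(100)):
        raise ValueError("discount_pct must be between 0 and 100")
    discounted = price_d - (price_d * pct_d / Decimal(100))
    # Assumption: round to cents, halves rounding up.
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

The caching suggestion from the example response is left out here, since the calculation is a single multiplication and subtraction.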
Summarize the key regulatory changes in this 47-page EU AI Act compliance document, then create a checklist for our ML team to ensure our recommendation system meets the new requirements.
**Key Changes:** The Act introduces risk-based classification (unacceptable/high/limited/minimal). Your recommendation system likely falls under "high-risk" due to employment/credit decisions. New requirements: (1) human oversight mechanisms, (2) bias testing across demographic groups, (3) technical documentation for auditors, (4) incident reporting within 15 days.
**ML Team Checklist:**
- [ ] Implement override controls for human reviewers
- [ ] Document training data sources and demographic composition
- [ ] Run fairness metrics (demographic parity, equalized odds) quarterly
- [ ] Establish audit trail for model decisions
- [ ] Create incident response protocol
- [ ] Designate compliance officer
- [ ] Schedule external audit before Q2 2025 deadline
Priority items are human oversight and bias testing; enforcement begins there.
Demonstrates long-context processing strength — the 1M token window handles entire regulatory documents without chunking. The model extracts actionable items and prioritizes them. The low input cost ($0.26/Mtok) makes this economical for document analysis workflows, though users should verify legal interpretations independently.
I'm teaching a workshop on transformer architecture to engineers who understand neural networks but haven't worked with attention mechanisms. Explain self-attention using a concrete analogy, then show the math.
**Analogy:** Imagine reading a sentence where each word "looks around" at all other words to understand its meaning. In "The animal didn't cross the street because it was too tired," the word "it" attends strongly to "animal" (not "street") by computing relevance scores.
**The Math:** For input sequence X, we create three matrices:
- Query (Q = XW_Q): "what am I looking for?"
- Key (K = XW_K): "what do I contain?"
- Value (V = XW_V): "what do I output?"
Attention(Q,K,V) = softmax(QK^T / √d_k)V
The QK^T multiplication scores how much each word should attend to others. Division by √d_k keeps the scores from growing with the key dimension, which would otherwise saturate the softmax and shrink its gradients. Softmax normalizes each row of scores to probabilities. Finally, we weight the Values by these attention scores. For your workshop, live-code a minimal implementation showing how "it" gets high attention weights for "animal."
Shows the model's ability to adapt technical explanations to audience level — mixing intuition with rigor. The teaching structure (analogy → math → practical tip) reflects pedagogical awareness. With text-only modality, the model can't generate diagrams that would strengthen this explanation, unlike multimodal alternatives.
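The formula in the example, Attention(Q,K,V) = softmax(QK^T / √d_k)V, can be written out in a few lines. This is a minimal pure-Python sketch for illustration: it takes Q, K, and V as ready-made lists of rows and leaves out the learned projections W_Q, W_K, and W_V. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    if len(k) != len(v):
        raise ValueError("K and V need one row per position")
    scale = math.sqrt(len(k[0]))  # sqrt(d_k), the key dimension
    output = []
    for q_row in q:
        # One score per key: dot(q_row, k_row) / sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows, column by column.
        output.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return output
```

With one-hot value rows, each output row equals the attention weights themselves, which makes the "which word attends to which" pattern easy to show in a workshop.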
Use-case deep-dives
When 1M context beats chunking for compliance teams
A 4-person compliance shop needs to cross-reference 200-page vendor contracts against internal policy docs without losing the thread. Qwen Plus 0728 accepts up to 1M tokens of context at $0.26/Mtok input, so you load the entire contract set, all policy manuals, and the question in one pass. No RAG pipeline, no chunking errors, no context-window gymnastics. Output at $0.78/Mtok is cheap too: even a summary at the 32,768-token output cap costs under $0.03. The trade-off: if you're running this analysis 500 times a day, a faster model such as Claude 3.5 Sonnet may justify its higher per-token cost. Below 100 runs daily, Qwen's window and price make it the straightforward call for teams that need to see everything at once.
High-volume support routing on a tight budget
A 12-person SaaS support team triages 800 inbound emails daily into Zendesk tags (billing, bug, feature request). Qwen Plus 0728 at $0.26/Mtok input handles the classification pass cheaply: each email averages 600 tokens, so 800 emails total 480k tokens and cost about $0.12 in input. Output is minimal (one tag per email, maybe 20 tokens), adding about $0.01. Total daily spend: roughly $0.14 (the unrounded parts sum to $0.137) versus $1.20 on GPT-4o. The model lacks public benchmarks, so test accuracy on 50 real emails before committing. If classification error rate exceeds 8%, step up to a proven model. Below that threshold, Qwen's price advantage funds itself in two days of operation.
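The workload arithmetic and the 8% error gate from this scenario can be sketched as follows. The function names are hypothetical, the label lists stand in for your own 50-email sample, and the prices are the upstream figures listed on this page. Without rounding each component first, the daily total comes to about $0.137.

```python
from typing import Sequence

INPUT_USD_PER_MTOK = 0.26   # upstream input price from this page
OUTPUT_USD_PER_MTOK = 0.78  # upstream output price from this page


def daily_cost_usd(emails: int, input_tokens_each: int, output_tokens_each: int) -> float:
    """Upstream cost of classifying one day's emails."""
    input_cost = emails * input_tokens_each * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = emails * output_tokens_each * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


def error_rate(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Share of sample emails where the model's tag differs from the human tag."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(1 for p, e in zip(predicted, expected) if p != e)
    return wrong / len(expected)


def passes_gate(predicted: Sequence[str], expected: Sequence[str],
                max_error_rate: float = 0.08) -> bool:
    """True when the sample error rate does not exceed the threshold."""
    return error_rate(predicted, expected) <= max_error_rate


if __name__ == "__main__":
    print(daily_cost_usd(emails=800, input_tokens_each=600, output_tokens_each=20))
```

On a 50-email sample, the 8% threshold means at most four misclassified emails.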
When context size matters more than benchmark rank
A 20-person engineering team maintains 600k tokens of Confluence docs (architecture decisions, runbooks, onboarding guides). Qwen Plus 0728 loads the entire knowledge base in context, so engineers ask questions without pre-filtering which doc to search. At $0.26/Mtok input, a 600k-token load plus 1k-token question costs $0.16 per query. Output (2k-token answer) adds $0.002. The 1M window eliminates the retrieval step that RAG systems introduce—no embedding drift, no chunk boundary issues. The risk: without public benchmarks, you're trusting vendor claims on reasoning quality. Run a 20-question eval against your actual docs before rolling out. If answers match human-written responses 85%+ of the time, Qwen's window and price make it the default internal Q&A engine.
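Before loading a whole knowledge base into one prompt, it is worth checking how much room is left in the window. The sketch below uses the limits from the Specifications list. It assumes the prompt and the reply share the 1,000,000-token window, which this page does not confirm, and `context_headroom` is a hypothetical helper name.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context length from the Specifications list
MAX_OUTPUT_TOKENS = 32_768         # max output from the Specifications list


def context_headroom(doc_tokens: int, question_tokens: int,
                     reserved_output_tokens: int) -> int:
    """Return tokens left in the window; a negative result means it will not fit.

    Assumption: input and output tokens count against the same window.
    """
    if not 0 <= reserved_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved output must be between 0 and the max output limit")
    if doc_tokens < 0 or question_tokens < 0:
        raise ValueError("token counts must be non-negative")
    used = doc_tokens + question_tokens + reserved_output_tokens
    return CONTEXT_WINDOW_TOKENS - used


if __name__ == "__main__":
    # The scenario above: 600k tokens of docs, a 1k question, a 2k answer.
    print(context_headroom(600_000, 1_000, 2_000))
```

For the scenario above the headroom is 397,000 tokens, so the docs could grow by roughly two thirds before the single-prompt approach stops fitting.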
Frequently asked
Is Qwen Plus 0728 good for long-document analysis?
Yes. The 1M token context window handles entire codebases, legal documents, or research papers in a single prompt. At $0.26/Mtok input, you can process a 500-page document for about $0.13, assuming roughly 1,000 tokens per page (about 500k tokens in total). The lack of public benchmarks means you should test it on your specific use case before committing to production workloads.
Is Qwen Plus 0728 cheaper than GPT-4o or Claude Sonnet?
Yes, significantly. At $0.26/Mtok input and $0.78/Mtok output, Qwen Plus costs roughly 80-90% less than GPT-4o and Claude 3.5 Sonnet for equivalent token volumes. The trade-off is zero published benchmark data, so you're betting on Alibaba's internal testing. For high-volume, cost-sensitive applications where you can validate quality yourself, the savings are substantial.
Can Qwen Plus 0728 handle multilingual tasks effectively?
Likely yes, given Qwen's Chinese origin and Alibaba's focus on Asian markets. The model should excel at Chinese-English translation and mixed-language documents. However, without MMLU or multilingual benchmark scores, you'll need to run your own evals. If you're working primarily in Chinese, Japanese, or Korean alongside English, Qwen Plus is worth testing against Western alternatives.
How does Qwen Plus 0728 compare to the previous Qwen versions?
The 0728 date stamp matches the model ID qwen/qwen-plus-2025-07-28, so it marks a July 28, 2025 snapshot rather than a training cutoff; the knowledge cutoff itself is not published here. Without benchmark deltas from earlier Qwen releases, we can't quantify improvements. The 1M context window is table stakes now, not a differentiator. If you're already using an older Qwen model, the newer snapshot is worth evaluating on your own workload before you migrate.
Should I use Qwen Plus 0728 for customer-facing chatbots?
Only after thorough testing. The pricing makes it attractive for high-volume chat, but the absence of safety benchmarks or published quality metrics is a red flag for customer-facing deployments. Start with internal tools or non-critical applications. Monitor for hallucinations, off-brand responses, and edge-case failures before exposing it to end users.