LLMqwen

Qwen: Qwen3 VL 235B A22B Thinking

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....

Anyone in the Space can @-mention Qwen: Qwen3 VL 235B A22B Thinking with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Qwen3 VL 235B A22B Thinking is a vision-language model with a 131K context window and multimodal reasoning capability. At $0.26/$2.60 per Mtok, it sits in the mid-tier pricing band but lacks public benchmark data to validate performance claims. The 235B parameter count suggests strong capability, though the 'A22B Thinking' designation implies an active-parameter architecture that may trade raw throughput for reasoning depth. Reach for this when you need vision + text reasoning at scale and can tolerate sparse performance documentation.

Best for

  • Multimodal document analysis with images
  • Vision-language tasks requiring long context
  • Cost-sensitive image understanding workflows
  • Reasoning over screenshots and diagrams
  • Batch processing of visual content

Strengths

The 131K token context window handles lengthy multimodal documents without chunking. Pricing undercuts GPT-4o and Claude Sonnet 4 on input tokens by 60-75%, making it viable for high-volume vision tasks. The 235B parameter base suggests competitive capability on complex reasoning, and the 'Thinking' designation implies chain-of-thought or extended reasoning modes that can improve accuracy on multi-step problems.

Trade-offs

No public benchmark data makes it impossible to compare against Claude Sonnet 4.5, GPT-4o, or Gemini 1.5 Pro on standard vision-language tasks. The proprietary license limits deployment flexibility. Output pricing at $2.60/Mtok is 3-4x higher than input, penalizing verbose responses. Without MMMU, VQA, or OCRBench scores, you're flying blind on actual vision performance—plan to run your own evals before committing production workloads.

Specifications

Provider
qwen
Category
llm
Context length
131,072 tokens
Max output
32,768 tokens
Modalities
text, image
License
proprietary
Released
2025-09-23

Pricing

Input
$0.26/Mtok
Output
$2.60/Mtok
Model ID
qwen/qwen3-vl-235b-a22b-thinking

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$16.93
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

ProviderContextInputOutputP50 latencyThroughput30d uptime
qwen131k$0.26/Mtok$2.60/Mtok

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Analyze Technical Diagram

Examine this diagram and list every component, connection, and labeled value. Then explain the system's function in two sentences.
Open in a Space →

Multi-Page Document Summary

Summarize this 40-page report, highlighting key findings and any charts or tables that support the main argument.
Open in a Space →

Screenshot Debugging

Review this screenshot for usability problems, visual bugs, or accessibility issues. Prioritize by severity.
Open in a Space →

Invoice Data Extraction

Extract vendor name, invoice number, line items with quantities and prices, subtotal, tax, and total from this image.
Open in a Space →

Visual Reasoning Chain

This image shows a physics problem setup. Identify all forces, write the equations, then solve for the unknown variable step-by-step.
Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Analyze this product photo and suggest three specific improvements to lighting and composition that would increase conversion rates for an e-commerce listing.

Output

Looking at the current setup, the primary issue is the harsh overhead lighting creating a shadow under the product's base, which obscures detail and reduces perceived quality. First, add a diffused fill light at 45 degrees from the left to eliminate that shadow while maintaining definition. Second, shift the camera angle down by approximately 15 degrees—the current high angle makes the product appear smaller than it is. Third, introduce a subtle rim light from behind-right to separate the product from the background and add depth. The background itself is good—neutral without competing—but these lighting changes would lift the product's perceived value by 20-30% based on typical A/B testing results for similar categories.

Notes

This example demonstrates the model's multimodal reasoning—combining visual analysis with domain knowledge about e-commerce photography. The 131K context window allows it to reference detailed product photography principles while maintaining coherent recommendations. However, without published vision benchmarks, we can't quantify its accuracy against specialized vision models.

Prompt

I'm refactoring a Python service that processes 10M records daily. Walk me through converting this synchronous database query pattern to async/await without breaking existing error handling.

Output

Start by wrapping your current connection pool with `asyncpg` instead of `psycopg2`—the API surface is similar enough that you'll recognize the patterns. Your existing `try/except` blocks translate directly: `async with pool.acquire() as conn` replaces the synchronous context manager, and exceptions propagate the same way. The critical change is in your batch processing loop: instead of `for record in records`, you'll use `asyncio.gather()` with a semaphore to limit concurrency—I'd start with 50 concurrent queries based on your volume. Keep your existing retry logic but add `asyncio.sleep()` for backoff delays. One gotcha: if you're using SQLAlchemy ORM, you'll need the async extension, which changes session management significantly. For raw queries like yours, the migration is straightforward—expect 3-4x throughput improvement with the same error visibility.

Notes

Shows the model's ability to provide architectural guidance grounded in real-world constraints (10M records, existing error handling). The thinking mode likely helps it reason through migration trade-offs. At $2.60/Mtok output, this 150-word response costs ~$0.0004—reasonable for complex technical guidance, though repeated queries add up quickly.

Prompt

Explain why this quarterly revenue chart shows concerning trends even though top-line growth is 15% YoY. Focus on what the CFO should ask the sales team next week.

Output

The 15% headline growth masks two structural problems visible in the data. First, customer acquisition cost increased 28% quarter-over-quarter while average contract value only grew 6%—your unit economics are deteriorating. Second, the revenue composition shifted heavily toward Q4, with 61% of annual bookings landing in December versus a historical 35-40% pattern. This suggests either aggressive discounting to hit targets or deals pulled forward that would naturally close in Q1. The CFO should ask: (1) What percentage of Q4 deals included non-standard terms or extended payment schedules? (2) What's the 90-day retention rate for Q4 cohorts versus Q2-Q3? (3) Are sales comp plans creating incentives for quarter-end stuffing? The growth is real, but if it's coming from margin compression and timing games, you're building on sand.

Notes

Demonstrates analytical reasoning across both quantitative data and business context—a strength of large-context models that can hold multiple data series in working memory. The thinking component likely helps it identify non-obvious patterns. Trade-off: without vision benchmark data, we can't verify how accurately it would parse actual chart images versus described data.

Use-case deep-dives

Multi-page invoice extraction

When 131K context handles full vendor packet analysis without chunking

A 9-person procurement team processes 200+ vendor invoices weekly, each with 8-12 pages of line items, terms, and compliance attachments. Qwen3 VL 235B fits entire invoice packets in one 131K-token pass—no chunking, no context loss across pages. The vision layer reads tables and handwritten notes that pure-text models miss. At $0.26 input per million tokens, a 40-page packet (roughly 60K tokens with images) costs $0.016 to process. Output is expensive at $2.60/Mtok, so keep extraction templates tight: structured JSON, not verbose summaries. If your invoices average under 20 pages and you're processing 500+/day, the input savings beat GPT-4V. Above that volume, test whether the output cost (generating 2K tokens = $0.005/invoice) still pencils. Buy this model when document density and context continuity matter more than raw speed.

Technical support ticket triage

Why vision + reasoning works for screenshot-heavy support queues

A 14-person SaaS support team handles 400 tickets daily, half with user-submitted screenshots of error states, config panels, or broken UI. Qwen3 VL parses the image, reads error codes from the screenshot, cross-references against the text description, and routes to the right specialist—all in one model call. The 131K context window holds the full ticket thread (10-15 back-and-forth messages) plus 3-4 screenshots without truncation. Input cost is negligible: a typical ticket with 2 images and 5K tokens of text runs $0.0015. Output triage (300 tokens: severity, category, suggested owner) costs $0.00078. Total per-ticket cost under $0.0025 makes this viable at scale. The thinking layer helps with ambiguous cases where the screenshot contradicts the user's description. If your tickets are text-only, skip the vision tax and use a cheaper text model. If screenshots are central, this is the call.

Competitive intelligence research

When long-context reasoning beats multi-pass summarization for analyst reports

A 4-person market research consultancy synthesizes 80-page competitor filings, product roadmaps, and earnings transcripts into client briefings. Qwen3 VL's 131K window ingests the full document set in one pass—no lossy summarization chains, no hallucinated connections between sections 60 pages apart. The thinking layer surfaces non-obvious strategic pivots (a pricing change on page 12 that contradicts the growth narrative on page 71). Input cost for an 80-page PDF (roughly 120K tokens) is $0.031. Generating a 4K-token executive brief costs $0.0104. At $0.041 per report, you're paying for coherence across the full document. If your reports are under 30 pages, a cheaper 32K-context model will do. Above 60 pages, the single-pass accuracy justifies the spend. The vision modality is a bonus for charts and tables, but the real win is reasoning over the full context without stitching.

Frequently asked

Is Qwen3 VL 235B A22B Thinking good for vision-language tasks?

Yes, it's built for multimodal work combining text and images. The 131K token context window lets you process multiple images with detailed prompts in one request. Without public benchmarks we can't compare it directly to GPT-4V or Claude 3.5 Sonnet, but the 235B parameter count suggests strong reasoning capability for visual analysis, OCR, and document understanding.

Is Qwen3 VL 235B A22B Thinking cheaper than GPT-4 Turbo with vision?

Yes, significantly. At $0.26 input and $2.60 output per million tokens, it's roughly 10x cheaper than GPT-4 Turbo's vision pricing. The output cost is higher than input, so it works best for tasks where you send images and get concise responses rather than generating long text. For bulk image analysis, the savings add up fast.

Can Qwen3 VL 235B A22B Thinking handle long documents with images?

The 131K token context window gives you room for roughly 30-40 pages of mixed text and images, depending on image resolution. That's enough for most reports, slide decks, or technical documentation. You'll hit limits with full books or large datasets, but for typical business documents it handles the entire file in one pass without chunking.

How does the A22B Thinking variant differ from standard Qwen3 VL?

The A22B designation likely indicates an active parameter subset or distilled architecture from the full 235B model, optimized for inference speed. The "Thinking" label suggests extended chain-of-thought reasoning, similar to OpenAI's o1 approach. Without benchmarks, expect slower responses than standard Qwen3 VL but potentially better accuracy on complex visual reasoning tasks that benefit from step-by-step analysis.

Should I use Qwen3 VL 235B A22B Thinking for production image classification?

Only if you need complex reasoning over images. For simple classification or object detection, a fine-tuned vision model or smaller multimodal model will be faster and cheaper. Use this when you need the model to explain what it sees, compare multiple images, or extract structured data from complex visual layouts like invoices or diagrams.

Data last verified 8 hours ago.Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.