xAI: Grok 4.3

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...

Anyone in the Space can @-mention xAI: Grok 4.3 with the team's shared context - pooled credits, one chat, one memory.

Verdict

Grok 4.3 offers a 1M token context window at $1.25/$2.50 per Mtok — roughly half the cost of Claude Sonnet 4.5 for comparable context depth. Vision support and competitive pricing make it viable for document-heavy workflows where cost matters. The trade-off: no public benchmarks yet, so performance on reasoning-heavy tasks remains unproven against GPT-4o or Claude. Reach for this when you need long-context analysis on a budget and can tolerate some uncertainty on complex reasoning.

Best for

  • Long-context document analysis under budget
  • Vision tasks on screenshots and diagrams
  • Cost-sensitive multi-document summarization
  • Exploratory work with large codebases

Strengths

The 1M token window handles entire codebases or multi-document sets in a single call. At $1.25 input per Mtok, it undercuts Claude Sonnet 4.5 by roughly 50% for long-context use cases. Vision support lets you drop screenshots or diagrams directly into prompts. The pricing structure favors read-heavy workflows where you're feeding large contexts but generating shorter outputs.

Trade-offs

No public benchmarks means you're flying blind on reasoning quality relative to GPT-4o or Claude Sonnet 4.5. Early xAI models showed uneven performance on math and code; Grok 4.3 may have closed that gap, but we lack the data to confirm. If your task demands proven accuracy on MMLU or HumanEval-level reasoning, you're taking a risk. Output at $2.50 per Mtok is also twice the input rate, so generation-heavy tasks cost proportionally more than read-heavy ones.

Specifications

Provider
x-ai
Category
llm
Context length
1,000,000 tokens
Max output
Modalities
text, image
License
proprietary
Released
2026-04-30

Pricing

Input
$1.25/Mtok
Output
$2.50/Mtok
Model ID
x-ai/grok-4.3

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$28.60
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
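
The page does not say how the calculator arrives at its estimate. The figures shown happen to be consistent with one set of inputs: 22 working days per month, about 2,000 tokens per message, and a 70/30 input/output split. Those three values are inferred assumptions, not documented Switchy behavior, and the function name is hypothetical. A minimal sketch under those assumptions:

```python
# Upstream prices from the Pricing section (USD per million tokens).
INPUT_PRICE_PER_MTOK = 1.25
OUTPUT_PRICE_PER_MTOK = 2.50


def estimate_monthly_spend(
    seats: int,
    msgs_per_day: int,
    working_days: int = 22,       # assumption: not stated on the page
    tokens_per_msg: int = 2_000,  # assumption: not stated on the page
    input_share: float = 0.70,    # assumption: not stated on the page
) -> tuple[int, float]:
    """Return (tokens per month, estimated USD per month)."""
    tokens = seats * msgs_per_day * working_days * tokens_per_msg
    input_tokens = tokens * input_share
    output_tokens = tokens - input_tokens
    cost = (
        input_tokens / 1e6 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK
    )
    return tokens, round(cost, 2)


tokens, cost = estimate_monthly_spend(seats=5, msgs_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
# prints: 17.6M tokens / month, $28.60
```

Changing `input_share` shows how sensitive the estimate is to the read/write mix: a more generation-heavy team pays more for the same token volume because output costs twice as much as input.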

Providers

| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
| --- | --- | --- | --- | --- | --- | --- |
| x-ai | 1000k | $1.25/Mtok | $2.50/Mtok | | | |

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Analyze Codebase Architecture

Review the attached codebase. Describe the overall architecture in two paragraphs, then list the five most-coupled modules and suggest one refactor to reduce interdependencies.

Extract Data from Screenshot

The screenshot shows a dashboard with metrics. Extract all numerical values, their labels, and the timestamp. Return as a JSON object with keys for each metric.

Compare Contract Versions

Two contract versions are below. List every clause that changed between them, noting the section number and a one-sentence summary of what shifted.

Generate FAQ from Documentation

Read the attached product documentation. Write 10 FAQ entries that cover the most-common setup questions, each with a two-sentence answer.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential race conditions. It's meant to handle concurrent user sessions in a web app, but we're seeing intermittent data corruption in production.

Output

The core issue is in your session dictionary access pattern. You're checking `if session_id in active_sessions` then immediately accessing `active_sessions[session_id]` without a lock — another thread can delete that key between your check and access, causing a KeyError that corrupts your error-handling state. Wrap both operations in a threading.Lock context manager. Also, your timestamp update at line 47 isn't atomic; use a queue-based approach or Redis for production session management. The corruption you're seeing likely happens when two requests for the same session arrive within milliseconds.

Notes

Grok 4.3's 1M token context window lets it hold entire codebases in memory, making it effective at spotting patterns across multiple files. This example shows its strength in practical debugging — identifying the specific race condition and suggesting concrete fixes. The model tends toward direct, technical explanations without excessive hedging.
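
The check-then-access race described in the output, and the lock-based fix it recommends, can be sketched as follows. The function under review is not shown on this page, so this is an illustration rather than the actual code: `active_sessions` and `session_id` come from the example output, while `sessions_lock`, `touch_session`, `end_session`, and the `last_seen` field are hypothetical names.

```python
import threading
import time

active_sessions: dict[str, dict] = {}
sessions_lock = threading.Lock()  # hypothetical name; guards active_sessions


def touch_session(session_id: str) -> bool:
    """Refresh a session's timestamp; return False if it no longer exists."""
    # Hold the lock across both the membership check and the access so
    # another thread cannot delete the key in between.
    with sessions_lock:
        if session_id not in active_sessions:
            return False
        active_sessions[session_id]["last_seen"] = time.time()
        return True


def end_session(session_id: str) -> None:
    """Remove a session under the same lock used by touch_session."""
    with sessions_lock:
        active_sessions.pop(session_id, None)
```

The fix only works if every reader and writer of `active_sessions` takes the same lock. A lock of this kind also covers a single process only, which is one reason the example output points to Redis for production session management.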

Prompt

I have a photo of a handwritten recipe from my grandmother. Can you extract the ingredients list and convert the measurements to metric? The handwriting is pretty messy.

Output

The model would process the uploaded image and produce a structured ingredients list, interpreting cursive handwriting with reasonable accuracy. It would identify '2 cups flour' as '250g flour', '1 stick butter' as '113g butter', and handle abbreviations like 'tsp' and 'tbsp' correctly. For ambiguous characters — say, a '1' that could be a '7' — it would flag uncertainty: 'baking powder: 1 tsp (or possibly 7 tsp — please verify, as 7 would be unusual)'. The output would be formatted as a clean markdown table.

Notes

The image understanding capability handles real-world document extraction tasks, not just caption generation. At $1.25/Mtok input, processing images is economical for batch workflows. However, without published vision benchmarks, accuracy on degraded or stylised handwriting remains unverified — test with your specific document types before committing to production use.

Prompt

Explain how gradient descent works, but write it as if you're teaching a 10-year-old who likes video games. Keep it under 100 words.

Output

Imagine you're blindfolded on a giant hill in a video game, trying to find the lowest valley to get treasure. You can only feel which way is downhill right where you're standing. So you take a small step downhill, feel again, take another step downhill, and keep going. Sometimes you take big steps when it's really steep, and tiny steps when it's almost flat. Eventually, you reach a low spot and stop. That's gradient descent — a computer 'feeling' which way makes its mistake smaller, then stepping that direction over and over until it's really good at its job.

Notes

This demonstrates Grok's ability to shift register and maintain constraints: the 10-year-old framing stays consistent, the video game metaphor is apt, and the answer comes in at 98 words, under the 100-word limit. The model handles creative reframing tasks well, though it sometimes over-explains when not given strict length limits. Useful for documentation, education, and customer-facing content where tone matters.
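
For readers who want the algorithm behind the metaphor, here is a minimal sketch of gradient descent on a one-dimensional "hill". The function `f(x) = (x - 3)^2`, the learning rate, and the stopping tolerance are illustrative choices, not anything taken from the model's output.

```python
def gradient_descent(grad, start: float, learning_rate: float = 0.1,
                     tolerance: float = 1e-6, max_steps: int = 10_000) -> float:
    """Follow the negative gradient until the steps become negligibly small."""
    x = start
    for _ in range(max_steps):
        # Step size scales with the slope: steep ground gives a big step,
        # nearly flat ground gives a tiny one.
        step = learning_rate * grad(x)
        if abs(step) < tolerance:
            break
        x -= step
    return x


# Illustrative hill: f(x) = (x - 3)^2, whose lowest point is at x = 3.
def slope(x: float) -> float:
    return 2 * (x - 3)


lowest = gradient_descent(slope, start=10.0)
print(round(lowest, 3))  # prints 3.0
```

The "big steps when it's really steep, tiny steps when it's almost flat" line in the output corresponds to `step = learning_rate * grad(x)`: the learning rate is fixed, and the slope does the scaling.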

Use-case deep-dives

Million-token legal document review

When Grok 4.3 handles entire case files in one context

A 4-person litigation support team needs to cross-reference depositions, contracts, and email threads spanning 800K tokens without chunking or retrieval overhead. Grok 4.3's 1M-token context window means the entire case file loads in a single prompt: no vector database, no summary chains, no context lost to chunking. At $1.25/Mtok input, a full-file review costs $1.00 per run (800K tokens × $1.25/Mtok), roughly half the cost of Claude Sonnet 4.5 for comparable context depth. The trade-off: no public benchmarks yet, so you're betting on xAI's reputation over proven accuracy scores. If your workflow already involves manual QA and you value context integrity over third-party validation, Grok 4.3 is the call for document-heavy legal work.

High-frequency customer support triage

Why Grok 4.3 loses to cheaper models on repetitive support

A 12-person SaaS support team fields 300 tickets daily, each requiring 2K-token context (ticket history, KB articles, product docs). At Grok 4.3's $2.50/Mtok output pricing, each 500-token response costs about $0.00125 in generation, or roughly $0.38/day across 300 tickets (150K output tokens). Compare that to Claude 3.5 Haiku at $1.00/Mtok output (about $0.15/day) or GPT-4o mini at $0.60/Mtok (about $0.09/day), both with proven accuracy on classification and summarization tasks. The absolute figures are small at this volume, but the ratio holds as ticket counts grow: generation costs 60-76% less on the cheaper models. Grok 4.3's context window is overkill here; most support queries fit in 8K tokens. Unless you're also running image-based troubleshooting (screenshots, UI mockups) and need multimodal in the same model, route this workload to a cheaper specialist.

Multimodal product research synthesis

Grok 4.3 for teams combining user screenshots with interview transcripts

A 5-person product team runs weekly user research: 40-minute interview transcripts (12K tokens each) plus 15-20 annotated screenshots per session. They need a single model to read the transcript, parse the UI screenshots, and generate a 2-page insight memo. Grok 4.3's text+image support means no model-switching or preprocessing—load the full session (transcript + images) in one 30K-token prompt, generate a 1K-token memo for $0.04 input + $0.0025 output per session. The 1M-token context also future-proofs for quarterly meta-analysis (combining 12 sessions in one prompt). Without public benchmarks, validate output quality on 3-5 sessions before committing. If the memos pass your editorial bar, Grok 4.3 is the simplest multimodal research stack at this price point.
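
The per-session figures and the quarterly meta-analysis claim can be checked with a few lines of arithmetic. This sketch uses the prices and context length from the Specifications and Pricing sections; it assumes, as the paragraph does, that the 30K-token figure already includes the screenshots, and `request_cost` is a hypothetical helper name.

```python
INPUT_PRICE_PER_MTOK = 1.25        # USD, from the Pricing section
OUTPUT_PRICE_PER_MTOK = 2.50       # USD, from the Pricing section
CONTEXT_WINDOW_TOKENS = 1_000_000  # from the Specifications section


def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input cost, output cost) in USD for a single request."""
    return (
        input_tokens / 1e6 * INPUT_PRICE_PER_MTOK,
        output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK,
    )


# One research session: 30K-token prompt, 1K-token memo.
session_in, session_out = request_cost(30_000, 1_000)
print(f"per session: ${session_in:.4f} input + ${session_out:.4f} output")
# prints: per session: $0.0375 input + $0.0025 output

# Quarterly meta-analysis: 12 sessions combined into one prompt.
quarterly_prompt = 12 * 30_000
print(quarterly_prompt, quarterly_prompt <= CONTEXT_WINDOW_TOKENS)
# prints: 360000 True
```

Twelve sessions use a little over a third of the window, so the quarterly roll-up leaves room for instructions and a longer memo.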

Frequently asked

Is Grok 4.3 good for long-form content generation?

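Yes for long inputs: with a 1M token context window, Grok 4.3 handles book-length documents, multi-chapter reports, and extended conversations without losing the thread. It processes images alongside text, so you can feed it documents with diagrams or screenshots for analysis. The context size puts it in the same league as Claude and Gemini for document-heavy work. The Specifications above list no maximum output length, so test how much it can generate in a single response before relying on it for long drafts.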

Is Grok 4.3 cheaper than GPT-4o or Claude Sonnet?

Grok 4.3 costs $1.25/$2.50 per Mtok (input/output), which undercuts both GPT-4o ($2.50/$10) and Claude Sonnet 4 ($3/$15). For high-output tasks like code generation or creative writing, you'll save 75% on output tokens compared to GPT-4o. Input is half GPT-4o's rate and less than half of Claude Sonnet 4's.

Can Grok 4.3 handle technical code review and debugging?

Likely yes, though xAI hasn't published coding benchmarks yet. The 1M context window means it can ingest entire codebases, and the multimodal capability lets it read architecture diagrams or error screenshots. Without HumanEval or SWE-bench scores, you're testing blind — but the context size and pricing make it worth trying for large refactors.

How does Grok 4.3 compare to Grok 3 or earlier versions?

xAI hasn't released public benchmark deltas between Grok versions, so performance improvements are unverified. The 4.3 designation suggests iterative tuning rather than a ground-up rewrite. If you're already using Grok 3, test 4.3 on your actual prompts — version numbers alone don't tell you if reasoning, factuality, or instruction-following improved.

Should I use Grok 4.3 for real-time customer support chatbots?

Maybe, but latency and reliability are unknowns without published metrics. The pricing is attractive for high-volume chat, and the context window handles long conversation histories. However, xAI's API uptime and response-time SLAs aren't as battle-tested as OpenAI's or Anthropic's. Run a pilot with fallback to a proven model before committing production traffic.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.