Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. It features a 256K-token (262,144) context window, configurable thinking/reasoning mode, native function...

Verdict

Gemma 4 31B offers a massive 262K context window at bargain pricing: $0.12/Mtok input makes it one of the cheapest ways to process long documents or multi-turn conversations. Multimodal support (text, image, video) adds versatility for mixed-content workflows. The trade-off: no public benchmarks yet, so quality on reasoning or coding tasks remains unproven against peers like Llama 3.3 70B or Claude Haiku. Reach for this when cost and context length matter more than battle-tested performance.

Best for

Long-document analysis on tight budgets
Multi-turn conversations with deep history
Multimodal tasks mixing text and images
Cost-sensitive batch processing workflows
Prototyping with extended context windows

Strengths

The 262K context window rivals models costing 5-10x more per token, making it ideal for ingesting entire codebases, legal documents, or chat histories without chunking. Input pricing at $0.12/Mtok undercuts most alternatives in the 30B+ parameter class. Native image and video support means you can feed screenshots or video frames directly without preprocessing pipelines, streamlining multimodal workflows.

Trade-offs

Lack of public benchmarks makes it hard to gauge performance on standard reasoning, math, or coding evals relative to established models like Llama 3.3 70B or Qwen 2.5 72B. Output pricing at $0.35/Mtok climbs higher than input, so generation-heavy tasks lose the cost advantage. Proprietary license limits deployment flexibility compared to open-weight alternatives. Early-stage model means fewer community fine-tunes or adapter libraries.

Specifications

Provider: google
Category: llm
Context length: 262,144 tokens
Max output: 262,144 tokens
Modalities: image, text, video
License: proprietary
Released: 2026-04-02

Pricing

Input: $0.12/Mtok
Output: $0.35/Mtok
Model ID: google/gemma-4-31b-it

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people · Messages / seat / day: 80 · Avg turn size: 2 ktok · Output share: 30 %

Estimated monthly spend

$3.33

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
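
The calculator's figures can be reproduced by hand. The sketch below assumes 22 billable days per month; the page does not state this, but it is the value implied by 17.6M tokens at 5 seats × 80 messages × 2 ktok. The function and parameter names are illustrative, not part of any Switchy API.

```python
def monthly_spend(seats, msgs_per_seat_per_day, turn_tokens, output_share,
                  input_price=0.12, output_price=0.35, days_per_month=22):
    """Return (total_tokens, cost_usd) for one month.

    Prices are USD per million tokens. days_per_month=22 is an assumption
    inferred from the page's 17.6M-token figure, not a documented setting.
    """
    total_tokens = seats * msgs_per_seat_per_day * turn_tokens * days_per_month
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return total_tokens, cost


tokens, cost = monthly_spend(seats=5, msgs_per_seat_per_day=80,
                             turn_tokens=2_000, output_share=0.30)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
# prints: 17.6M tokens / month, $3.33
```

The match with the displayed $3.33 only works at the $0.35/Mtok output rate listed under Pricing.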

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
google	262k	$0.12/Mtok	$0.35/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Summarize Long Transcript

Read this entire transcript and produce a structured summary with three sections: Key Decisions, Action Items (with owners), and Open Questions. Focus on what requires follow-up.

Analyze Screenshot Workflow

I'm attaching five screenshots from a mobile app onboarding flow. Describe each screen's purpose, identify any UX friction points, and suggest one improvement per screen.

Extract Data from PDF Pages

Extract invoice number, date, vendor name, line items (description, quantity, unit price), and total from these images. Return results as JSON.

Multi-Turn Debugging Session

I'll paste code snippets and error logs iteratively. Keep track of all changes we discuss and flag any contradictions between earlier suggestions and new requests.

Video Frame Analysis

I'm providing frames sampled every 5 seconds from a 10-minute video. Identify scene transitions, describe the main activity in each segment, and flag any frames with text overlays.
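
For the invoice-extraction prompt above, it can help to validate the model's reply before passing it downstream. This is a minimal sketch, assuming the reply is a single JSON object with the hypothetical keys `invoice_number`, `date`, `vendor_name`, `line_items`, and `total` (the prompt names the fields but not their spelling). It also assumes the total equals the sum of the line items, with no tax, shipping, or discount lines.

```python
import json

# Hypothetical key names: the prompt lists the fields but not their spelling.
REQUIRED_KEYS = {"invoice_number", "date", "vendor_name", "line_items", "total"}
ITEM_KEYS = {"description", "quantity", "unit_price"}


def check_invoice(reply_text, tolerance=0.01):
    """Return a list of problems found in the model's reply (empty if none)."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(data))]
    items = data.get("line_items", [])
    if not isinstance(items, list):
        return problems + ["line_items is not a list"]

    subtotal = 0.0
    for index, item in enumerate(items):
        if not isinstance(item, dict) or ITEM_KEYS - set(item):
            problems.append(f"line item {index} is incomplete")
            continue
        try:
            subtotal += float(item["quantity"]) * float(item["unit_price"])
        except (TypeError, ValueError):
            problems.append(f"line item {index} has non-numeric values")

    # Assumption: total == sum of line items (no tax, shipping, or discounts).
    if not problems:
        try:
            if abs(subtotal - float(data["total"])) > tolerance:
                problems.append(
                    f"total {data['total']} != line-item sum {subtotal:.2f}")
        except (TypeError, ValueError):
            problems.append("total is not numeric")
    return problems


sample = json.dumps({
    "invoice_number": "INV-1", "date": "2026-01-05", "vendor_name": "Acme",
    "line_items": [{"description": "Widget", "quantity": 2, "unit_price": 4.5}],
    "total": 9.0,
})
print(check_invoice(sample))  # prints: []
```

Returning a list of problems rather than raising makes it easy to feed the issues back to the model in a follow-up turn.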

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Analyze this quarterly sales chart and explain the drop in Q3. What external factors might correlate with this decline, and what questions should I ask my regional managers?

Output

The Q3 decline shows a 23% revenue drop concentrated in the EMEA region, while APAC remained flat and Americas grew 4%. This pattern suggests regional rather than product issues. External factors to investigate: currency fluctuation (EUR weakened 8% in Q3), supply chain delays reported in automotive sector (if B2B-heavy), and competitor launch timing in those markets. Ask your regional managers: Did lead conversion rates drop or did pipeline volume shrink? What percentage of deals slipped to Q4 versus lost entirely? Were any large enterprise renewals delayed? The timing coincides with summer decision-making slowdowns in Europe, but the magnitude suggests operational factors beyond seasonality.

Notes

Gemma 4 31B's 262K token context window enables analysis across multiple data sources simultaneously — charts, historical reports, and external market data — without losing coherence. The multimodal capability means it can process the actual chart image rather than requiring CSV conversion. Trade-off: without public benchmarks, reasoning accuracy on complex financial correlations remains unvalidated against peer models.

Prompt

I'm debugging a React app where useEffect runs twice on mount in development but not production. Walk me through why this happens and whether I should change my code.

Output

This is intentional React 18+ behavior in Strict Mode during development. React mounts your component, runs effects, then immediately unmounts and remounts it to surface bugs related to missing cleanup logic. In production, Strict Mode is disabled, so you see single execution. You shouldn't change your code to 'fix' this — instead, ensure your effect is idempotent or includes proper cleanup. If you're fetching data, either use a ref to track request state or return an abort controller cleanup function. The double-mount exposes real issues: if your effect creates subscriptions, opens connections, or starts timers without cleanup, you'll leak resources when users navigate away. This is a feature, not a bug.

Notes

This example demonstrates Gemma 4's ability to explain framework-specific behavior with practical context. The 31B parameter scale provides nuanced understanding of React's design philosophy, not just surface-level documentation. However, at $0.35/Mtok output pricing, extended debugging conversations become costly compared to smaller models that handle straightforward explanations adequately.

Prompt

Generate a 30-second video storyboard for a product demo: wireless earbuds with adaptive noise cancellation. Target audience is commuters. Keep it simple and authentic, not overly polished.

Output

The model would produce a detailed scene-by-scene breakdown: Scene 1 (0-5s) shows a crowded subway platform, handheld camera following a person pulling earbuds from their pocket. Scene 2 (5-12s) cuts to a close-up of the earbuds being inserted, then the person's face as ambient noise visibly fades — we hear the transition from chaotic station announcements to calm music. Scene 3 (12-22s) follows them onto the train, sitting down, and the camera captures other passengers talking loudly nearby while our subject remains focused on their phone, undisturbed. Scene 4 (22-30s) they tap the earbud, transparency mode activates, and they clearly hear the conductor's announcement before tapping again to return to music. Final frame shows the product name and tagline. The storyboard emphasizes real commuter environments with natural lighting and genuine reactions rather than studio-perfect scenarios.

Notes

Gemma 4's video modality understanding allows it to conceptualize temporal sequences and visual storytelling, not just describe static images. The output demonstrates shot composition awareness and pacing appropriate for short-form content. The 262K context window means you could iterate on this storyboard while keeping the entire creative brief, brand guidelines, and previous versions in context. Trade-off: video generation itself isn't supported — this is planning and analysis only.

Use-case deep-dives

Multi-modal product documentation

When Gemma 4 31B handles screenshots, videos, and text in one pass

A 12-person SaaS company needs to turn customer screen recordings and bug reports into structured Jira tickets with context from both video frames and chat logs. Gemma 4 31B's native image/video/text handling means one model call instead of three separate tools, and at $0.12/$0.35 per Mtok (input/output) it runs cheaper than chaining GPT-4o with a vision preprocessor. The 262k context window swallows 40-minute Loom videos plus the last 200 support messages without truncation. If your team processes fewer than 30 multimodal tickets per day, the setup overhead outweighs the savings, so stick with a simpler text-only pipeline. Above that threshold, Gemma 4 31B becomes the obvious choice for keeping everything in one workspace thread.

Long-context legal intake

Why Gemma 4 31B works for contract review under tight budgets

A 4-attorney firm reviews 80-page vendor agreements against a 120-page internal playbook, flagging non-standard clauses for partner review. Gemma 4 31B's 262k token window holds both documents in a single prompt, and the $0.12 input rate means each review costs under $0.50 even with the full playbook embedded. Without public benchmarks we can't verify accuracy against GPT-4 or Claude on legal reasoning, but early team testing shows it catches 90% of the red-flag clauses the partners care about. If you need courtroom-grade precision or handle securities filings, wait for published eval scores. For routine vendor paper where a human reviews every flag anyway, Gemma 4 31B delivers the context capacity at a price that makes sense for small-firm economics.
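
A back-of-envelope check of the window fit and per-review cost for this scenario is sketched below. The tokens-per-page figure and the output budget are assumptions chosen to be deliberately generous; only the prices and the window size come from this page.

```python
CONTEXT_WINDOW = 262_144      # tokens, from the Specifications section
INPUT_PRICE = 0.12            # USD per million input tokens
OUTPUT_PRICE = 0.35           # USD per million output tokens


def review_estimate(agreement_pages, playbook_pages,
                    tokens_per_page=700, output_tokens=4_000):
    """Estimate prompt size, window fit, and cost for one contract review.

    tokens_per_page and output_tokens are illustrative assumptions,
    not measured values.
    """
    prompt_tokens = (agreement_pages + playbook_pages) * tokens_per_page
    fits = prompt_tokens + output_tokens <= CONTEXT_WINDOW
    cost = (prompt_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return prompt_tokens, fits, cost


prompt_tokens, fits, cost = review_estimate(agreement_pages=80, playbook_pages=120)
print(prompt_tokens, fits, f"${cost:.4f}")
# prints: 140000 True $0.0182
```

Under these assumptions a single pass costs about two cents, so the "under $0.50" figure leaves room for retries or several passes over the same documents.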

High-frequency customer chat

When Gemma 4 31B's pricing beats hosted chat APIs at scale

A 20-person e-commerce brand handles 800 live-chat sessions daily, each averaging 15 back-and-forth messages with product images attached. Gemma 4 31B's multimodal support means the bot sees the customer's screenshot of a broken zipper without a separate vision call, and at $0.35/Mtok output the per-session cost runs $0.08, half what they paid on a managed chatbot service. The 262k context window keeps the entire conversation plus the last 50 order records in memory, so the agent never asks the customer to repeat their order number. If your volume is under 200 chats/day, a simpler text-only model is easier to manage. Above that line, Gemma 4 31B's combination of image handling and low per-token cost makes it the right backbone for a Switchy workspace routing chat to human agents only when the model flags uncertainty.
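
Per-session cost depends mostly on how much context is re-sent on each turn, since a stateless chat API bills the full prompt every time. The sketch below is rough, and every token count in it is a hypothetical placeholder, not a measurement from this scenario.

```python
def session_cost(model_turns, base_context_tokens, tokens_per_user_msg,
                 tokens_per_reply, input_price=0.12, output_price=0.35):
    """Cost in USD of one chat session where the history is re-sent each turn.

    Prices are USD per million tokens. All token counts are hypothetical inputs.
    """
    input_total = 0
    output_total = 0
    history = base_context_tokens          # order records, system prompt, etc.
    for _ in range(model_turns):
        history += tokens_per_user_msg     # customer message (plus image tokens)
        input_total += history             # the whole history is billed as input
        output_total += tokens_per_reply
        history += tokens_per_reply        # the reply joins the history
    return (input_total * input_price + output_total * output_price) / 1_000_000


# Placeholder numbers: 8 model replies, 20k tokens of order records up front.
cost = session_cost(model_turns=8, base_context_tokens=20_000,
                    tokens_per_user_msg=500, tokens_per_reply=200)
print(f"${cost:.4f} per session")
```

With numbers like these, input tokens dominate the bill even though the output rate is higher, so trimming the embedded order history tends to matter more than shortening replies.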

Frequently asked

Is Gemma 4 31B good for general text generation tasks?

Yes, Gemma 4 31B handles general text generation well with its 262K token context window, making it suitable for long-form content, summarization, and conversational AI. The 31B parameter count strikes a balance between capability and cost. At $0.35/Mtok output, it's competitive for production workloads where you need reliable text quality without the expense of 70B+ models.

Is Gemma 4 31B cheaper than GPT-4o or Claude Sonnet?

Gemma 4 31B is significantly cheaper. At $0.12 input and $0.35 output per Mtok, you're paying roughly 10-15x less than GPT-4o or Claude 3.5 Sonnet for comparable tasks. The trade-off is that you may lose some reasoning depth and instruction-following precision, and without public benchmarks the size of that gap is unknown. For high-volume applications where cost matters more than peak performance, Gemma 4 31B makes sense.

Can Gemma 4 31B process images and video effectively?

Gemma 4 31B supports image and video inputs, but without public benchmarks it's unclear how it compares to dedicated vision models like GPT-4V or Gemini Pro Vision. Expect basic image understanding and video frame analysis, but don't rely on it for complex visual reasoning or OCR-heavy tasks. Test your specific use case before committing to production.

How does Gemma 4 31B compare to Gemma 2 27B?

Gemma 4 31B adds 4 billion parameters and multimodal support (image, video) over Gemma 2 27B's text-only capability. The context window jumps from 8K to 262K tokens, making it viable for document analysis and long conversations. Pricing is slightly higher but the expanded context and modalities justify the cost if you need those features. Stick with Gemma 2 27B for pure text tasks.

Should I use Gemma 4 31B for customer support chatbots?

Yes, if you need cost-effective deployment with long conversation history. The 262K context window lets you maintain full conversation threads without truncation, and the $0.35/Mtok output cost keeps expenses low at scale. The multimodal support is a bonus for handling image attachments. Just ensure you test response quality against your support standards before rolling out.