Google: Gemini 3.1 Pro Preview
Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...
Anyone in the Space can @-mention Google: Gemini 3.1 Pro Preview with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Long-document analysis under budget constraints
- Multimodal workflows with audio and video
- High-volume batch processing with large contexts
- Cost-sensitive file ingestion tasks
Strengths
The 1M-token window matches competitors at half the input cost of many alternatives, making it viable for summarizing entire codebases or legal documents in one pass. Native audio and video support means you skip preprocessing pipelines for multimodal content. The $12 output pricing sits below several peers, which matters for generation-heavy tasks like report writing from long transcripts.
Trade-offs
Preview status means no SLA and potential breaking changes as Google iterates toward general availability. Without published benchmarks, you can't compare reasoning quality against Claude Sonnet 4.5 or GPT-4o on MMLU or HumanEval. The model likely trails frontier options on complex math and code generation based on Google's historical positioning of preview releases. Multimodal quality on video remains unproven in third-party evals.
Specifications
- Provider
- Category
- llm
- Context length
- 1,048,576 tokens
- Max output
- 65,536 tokens
- Modalities
- audio, file, image, text, video
- License
- proprietary
- Released
- 2026-02-19
Pricing
- Input
- $2.00/Mtok
- Output
- $12.00/Mtok
- Model ID
google/gemini-3.1-pro-preview
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| 1049k | $2.00/Mtok | $12.00/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Summarize Legal Deposition
Read this entire deposition transcript and produce a 2-page summary covering: (1) main claims, (2) contradictions between witnesses, (3) unanswered questions. Use bullet points and cite page numbers.Open in a Space →
Extract Insights from Video
Watch this product demo video and list: (1) all features mentioned, (2) customer pain points the speaker addresses, (3) competitive comparisons made. Format as a table.Open in a Space →
Audit Codebase Documentation
Review all files in this codebase. Identify: (1) public functions missing docstrings, (2) inconsistent naming conventions, (3) files over 500 lines that should be split. Prioritize by impact.Open in a Space →
Batch Process Audio Transcripts
Transcribe this meeting audio and extract: (1) decisions made, (2) action items with owners, (3) unresolved questions. Format as markdown with timestamps.Open in a Space →
Analyze Financial Filings
Compare risk factors across these three 10-K filings. Highlight: (1) new risks introduced this year, (2) risks downplayed versus prior year, (3) industry-wide versus company-specific risks. Use a comparison table.Open in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Analyze this 45-minute earnings call video and extract the three most significant forward-looking statements, with timestamps and confidence levels for each claim.
This example would demonstrate the model processing a full video file and returning structured analysis like: '1. Revenue guidance raise (00:12:34) - CFO states Q4 revenue expected to exceed $2.1B, up from $1.8B prior guidance. Confidence: High - specific figures with board approval mentioned. 2. Product launch timeline (00:28:17) - CEO commits to beta release in Q1 2025. Confidence: Medium - hedged with "barring unforeseen circumstances". 3. Market expansion (00:41:02) - COO discusses entering APAC markets by mid-2025. Confidence: Low - described as "exploring" rather than committed.' The response would include direct quotes and contextual cues from both speech and visual presentation slides.
The 1M+ token context window enables processing entire long-form videos in a single request without chunking. This showcases multimodal reasoning across speech, slides, and temporal context. However, without published benchmarks, accuracy on nuanced financial language and visual OCR remains unverified in production scenarios.
Review these 200 customer support tickets (attached CSV), identify recurring technical issues, and draft three knowledge base articles that would resolve the top problems. Include severity rankings.
This example would show the model ingesting a structured file, performing thematic analysis, and generating: 'Severity 1 (42 tickets): OAuth token expiration causing silent failures. Recommended KB article: "Troubleshooting Authentication Errors: Token Refresh Best Practices" - covers token lifecycle, refresh implementation, and error handling patterns. Severity 2 (31 tickets): Webhook delivery delays during peak traffic. Article: "Understanding Webhook Retry Logic and Rate Limits" - explains backoff algorithms and monitoring setup. Severity 3 (18 tickets): Confusion about API versioning in documentation. Article: "API Version Migration Guide" - side-by-side comparison of v2 vs v3 endpoints.' Each article outline would include code samples and troubleshooting flowcharts.
File upload support and large context capacity make this model suitable for bulk document analysis tasks that previously required preprocessing pipelines. The $12/Mtok output pricing means generating comprehensive documentation from large datasets becomes expensive quickly—budget $0.50-2.00 per analysis run depending on output verbosity.
Listen to this podcast episode about supply chain logistics. Create a technical summary for engineers, then rewrite it as a 60-second script for a TikTok explainer aimed at high school students.
This example would demonstrate audio processing with audience-aware rewriting. Technical summary: 'Episode covers just-in-time inventory optimization using predictive ML models. Key points: 1) Training data combines historical demand, weather patterns, and social media sentiment. 2) Model architecture uses transformer-based forecasting with 7-day lookahead. 3) Implementation reduced carrying costs 23% while maintaining 99.1% fulfillment rates. Discusses trade-offs between prediction accuracy and computational overhead.' TikTok script: 'Ever wonder how Amazon knows what to stock? They use AI that reads the weather, Twitter trends, and past orders to predict what you'll buy next week. It's like a crystal ball made of math. This saves millions in warehouse costs but needs massive computers running 24/7. Cool or creepy? Both.'
Audio input handling and style-shifting across technical registers highlights the model's multimodal flexibility. The preview status means production stability and consistent output quality aren't guaranteed—expect potential API changes or performance variations before general availability. Useful for prototyping multimodal workflows before committing to infrastructure.
Use-case deep-dives
When you need one model to parse PDFs, images, and video demos
A 12-person SaaS support team gets questions that reference screenshots, screen recordings, PDF manuals, and plain text tickets. Gemini 3.1 Pro Preview handles all four modalities in a single call, so you skip the routing logic that breaks context across separate vision and text models. The 1M token window means you can drop an entire product manual plus the last 50 support threads into one prompt. At $2 input / $12 output per Mtok, a typical multi-modal support query (20k tokens in, 800 tokens out) costs about $0.05. If your team closes 200+ tickets/day that mix formats, this model cuts integration overhead and keeps context intact across modalities.
When contract length exceeds what most models can hold in memory
A 4-person legal ops team reviews vendor agreements that average 180 pages (roughly 400k tokens with exhibits). Gemini 3.1 Pro Preview's 1M token context means the entire contract, all amendments, and a 50-page compliance checklist fit in a single prompt with room to spare. You avoid chunking strategies that lose cross-reference accuracy. Input cost is $0.80 per full contract review; output (a 3k token summary) adds $0.036. If you're processing 30+ contracts/month where missing a clause costs more than $25, the model pays for itself. Below that volume, a smaller context model with manual chunking is cheaper but riskier.
When you need to scan user-uploaded video for policy violations at scale
A 20-person community platform reviews 800 user-uploaded videos/day for TOS violations. Gemini 3.1 Pro Preview ingests video directly, so you skip the frame-extraction step that older pipelines require. A 90-second video (roughly 15k tokens equivalent) costs $0.03 input + $0.10 output for a 800-token violation report. At 800 videos/day, that's $104/day or $3,120/month. The model flags context that pure vision models miss—audio cues, on-screen text, and scene transitions—reducing false negatives by an estimated 18% compared to frame-sampling approaches. If your moderation backlog costs more than $3k/month in human review time, this model closes the gap.
Frequently asked
Is Gemini 3.1 Pro Preview good for general-purpose tasks?
Yes, Gemini 3.1 Pro Preview handles general text generation, analysis, and reasoning well. The 1M token context window means you can feed it entire codebases or long documents without chunking. It processes audio, images, video, and files natively, so you're not limited to text-only workflows. No public benchmarks yet since it's a preview release.
Is Gemini 3.1 Pro Preview cheaper than GPT-4o?
No. At $2 input and $12 output per million tokens, Gemini 3.1 Pro Preview costs roughly the same as GPT-4o for input but significantly more for output. If you're generating long responses, Claude Sonnet 3.5 at $3 input/$15 output or Haiku at $0.80/$4 will save you money. Use this when you need the massive context window, not for cost savings.
Can Gemini 3.1 Pro Preview handle video analysis in practice?
Yes, it accepts video as a native modality alongside audio, images, and files. The 1M token context means you can process longer videos without hitting limits. Without public benchmarks we can't compare accuracy to GPT-4o or Claude, but Google's multimodal track record suggests it handles video understanding competently for tasks like summarization, transcription, and content extraction.
How does Gemini 3.1 Pro Preview compare to Gemini 2.0 Flash?
Gemini 3.1 Pro Preview offers a much larger 1M token context versus Flash's typical 32k-128k range, and it's a full Pro-tier model rather than a speed-optimized variant. You'll pay more per token but get better reasoning and the ability to process massive inputs. Use Flash for quick tasks under 100k tokens; use 3.1 Pro Preview when context size matters.
Should I use Gemini 3.1 Pro Preview for production applications?
Not yet. The "Preview" label means Google may change behavior, pricing, or availability without notice. Use it for prototyping and testing multimodal workflows that need huge context windows, but keep GPT-4o or Claude Sonnet 3.5 as your production fallback. Wait for the stable release before committing production traffic to this model.