Google: Gemini 2.5 Pro Preview 06-05
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...
Anyone in the Space can @-mention Google: Gemini 2.5 Pro Preview 06-05 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Processing entire codebases in one context
- Multi-hour audio transcription and analysis
- Long-form document summarization over 100K tokens
- Multimodal tasks combining text, images, and audio
- Research workflows requiring massive context retention
Strengths
The 1M token context window is the standout feature: you can load a novel-length document or hours of audio and query it without chunking. Multimodal support across four input types (text, image, audio, file) makes it versatile for mixed-media workflows. Input pricing at $1.25/Mtok undercuts Claude Opus while offering roughly five times its 200k-token context window, which makes it economical for ingestion-heavy tasks. Google's infrastructure generally streams responses quickly, though this page has no latency data yet to confirm that with large contexts loaded.
Trade-offs
Output pricing at $10/Mtok is 8x the input rate, which penalizes generation-heavy tasks like code scaffolding or long-form writing. As a preview model, stability and availability are not guaranteed; expect potential API changes or deprecation. This page lists no benchmark results, so you cannot compare reasoning quality, instruction-following, or coding ability against peers like GPT-4o or Claude Sonnet here, and nothing here independently validates multimodal performance on complex vision or audio tasks.
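To see how the output multiplier ($10.00 output versus $1.25 input, a factor of eight) plays out, here is a small Python sketch that computes the total cost and the blended per-Mtok rate of a single request. It uses the flat list prices on this page; the two job shapes are made-up examples, not measurements.

```python
INPUT_PER_MTOK = 1.25    # USD per million input tokens (this page's list price)
OUTPUT_PER_MTOK = 10.00  # USD per million output tokens (this page's list price)


def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one request at flat per-Mtok rates."""
    return (input_tokens * INPUT_PER_MTOK + output_tokens * OUTPUT_PER_MTOK) / 1_000_000


def blended_rate(input_tokens: int, output_tokens: int) -> float:
    """Effective USD per million tokens across input and output combined."""
    return job_cost(input_tokens, output_tokens) * 1_000_000 / (input_tokens + output_tokens)


# Hypothetical job shapes: a read-heavy summary and a generation-heavy scaffold.
for name, tin, tout in [("summarize", 200_000, 2_000), ("scaffold", 5_000, 40_000)]:
    print(f"{name}: ${job_cost(tin, tout):.2f} total, ${blended_rate(tin, tout):.2f}/Mtok blended")
```

The read-heavy job blends to just above the $1.25 input rate, while the generation-heavy one approaches the $10 output rate.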
Specifications
- Provider: Google
- Category: llm
- Context length: 1,048,576 tokens
- Max output: 65,536 tokens
- Modalities: file, image, text, audio
- License: LicenseRef-Google-Commercial
- Released: 2025-06-05
Pricing
- Input: $1.25/Mtok
- Output: $10.00/Mtok
- Model ID: `google/gemini-2.5-pro-preview`
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
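Upstream per-token prices are not always flat. Google's own Gemini API pricing for Gemini 2.5 Pro is tiered by prompt length: prompts up to 200k tokens are billed at $1.25 input and $10.00 output per Mtok, and longer prompts at $2.50 and $15.00. Whether a particular upstream route passes that tiering through is an assumption to check. The sketch below estimates one request under the tiered schedule and, for comparison, under this page's flat rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTier:
    input_per_mtok: float   # USD per million prompt tokens
    output_per_mtok: float  # USD per million output tokens


# Assumed schedule: Google's published Gemini 2.5 Pro API rates.
SHORT_PROMPT = PriceTier(1.25, 10.00)  # prompt <= 200k tokens (also this page's flat rate)
LONG_PROMPT = PriceTier(2.50, 15.00)   # prompt > 200k tokens
TIER_THRESHOLD = 200_000


def tiered_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; the prompt length picks the tier for input and output."""
    tier = SHORT_PROMPT if prompt_tokens <= TIER_THRESHOLD else LONG_PROMPT
    return (prompt_tokens * tier.input_per_mtok + output_tokens * tier.output_per_mtok) / 1_000_000


def flat_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at this page's flat list prices."""
    return (prompt_tokens * SHORT_PROMPT.input_per_mtok
            + output_tokens * SHORT_PROMPT.output_per_mtok) / 1_000_000


for prompt in (150_000, 600_000):
    print(prompt, round(flat_cost(prompt, 2_000), 4), round(tiered_cost(prompt, 2_000), 4))
```

Under the tiered schedule, the input portion of the 250k-, 500k-, and 600k-token jobs in the use-case examples on this page would cost twice the flat-rate figures.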
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
|  | 1049k | $1.25/Mtok | $10.00/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Codebase Architecture Analysis
I've uploaded our full codebase as files. Map the architecture: identify core modules, trace data flow between components, and flag any circular dependencies or tight coupling patterns.
Multi-Hour Meeting Transcription
This audio file contains a 3-hour strategy meeting. Summarize key decisions, list action items with owners, and highlight any unresolved disagreements that need follow-up.
Legal Document Cross-Reference
I've provided three vendor contracts as files. Identify conflicting terms, compare liability clauses, and flag any non-standard indemnification language across all three.
Research Paper Synthesis
I've uploaded 20 research papers on transformer architectures. Synthesize the evolution of attention mechanisms from 2017 to now, noting which innovations became standard and which were abandoned.
Multimodal Product Feedback
I've provided user interview audio, app screenshots, and written survey responses. Identify the top three usability pain points and recommend specific UI changes to address them.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential race conditions in a multi-threaded environment. The function updates a shared cache dictionary without locks.
This function has a critical race condition. Multiple threads can simultaneously check `if key not in cache`, all see a miss, and all proceed to compute and write the value, wasting work and, if the computation isn't deterministic, handing different callers different values. In CPython a single dict assignment is atomic under the GIL, so the dictionary itself won't be corrupted; the bug is that the check and the set are two separate steps. Wrap the check-and-set block in a threading.Lock so only one thread computes each missing key. The 1M+ token context window here means you could paste your entire codebase for cross-file analysis, but this specific issue is local to the function's logic.
Demonstrates sound reasoning about concurrency in Python. The model identifies the check-then-act race and correctly separates it from dict-level atomicity, which CPython already provides. With 1M tokens available, it can analyse large repositories, though this example shows it doesn't need massive context to catch subtle bugs.
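A minimal sketch of the lock-based fix the example answer recommends. The original function isn't shown, so `slow_square` is a hypothetical stand-in for the expensive computation. Holding the lock while computing guarantees one computation per key but serializes all misses; a per-key lock is the usual refinement if that becomes a bottleneck.

```python
import threading
from typing import Callable, Dict, Hashable


class LockedCache:
    """Memoizing cache whose membership check and write happen under one lock,
    so concurrent callers asking for the same missing key trigger one computation."""

    def __init__(self, compute: Callable[[Hashable], object]) -> None:
        self._compute = compute              # hypothetical expensive function
        self._data: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> object:
        # Check-and-set as one critical section; other threads wait here.
        with self._lock:
            if key not in self._data:
                self._data[key] = self._compute(key)
            return self._data[key]


calls = []  # records each real computation


def slow_square(n: int) -> int:
    calls.append(n)  # runs while the cache lock is held
    return n * n


cache = LockedCache(slow_square)
threads = [threading.Thread(target=cache.get, args=(7,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), cache.get(7))
```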
Generate a product requirements document for a mobile app that lets users scan receipts, extract line items, and auto-categorise expenses. Include user stories, technical constraints, and success metrics.
**Product Requirements: Receipt Scanner App**
**User Stories:**
- US-1: As a freelancer, I want to photograph receipts so I can track business expenses without manual entry.
- US-2: As a user, I want automatic categorisation so I spend less than 10 seconds per receipt.
**Technical Constraints:**
- OCR must handle crumpled receipts, low light (target: 85% field accuracy)
- Offline mode required; sync when connected
- GDPR-compliant data retention (30-day local cache, encrypted)
**Success Metrics:**
- Time-to-categorise < 10s (p95)
- User retention > 40% at day-30
- Category accuracy > 90% after 20 receipts (ML learning curve)
The model would continue with API requirements, edge cases, and a phased rollout plan, maintaining this structured, actionable format throughout.
Shows strong document generation with a structure suited to a technical audience. The $10/Mtok output pricing makes long-form generation costly at volume, but the clear organisation and the quantified constraints demonstrate planning capability beyond simple text completion.
I'm attaching a 45-minute podcast transcript and three related research papers. Summarise the key disagreements between the podcast guests and the papers' authors on the topic of retrieval-augmented generation.
The podcast guests argue RAG is over-hyped for production use, citing latency (their reported p95: 3.2s) and relevance drift when indexes grow beyond 100K documents. They advocate fine-tuning smaller models instead. In contrast, Paper 1 (Chen et al.) reports sub-500ms retrieval with hierarchical indexes and claims RAG maintains accuracy at 10M+ documents. Paper 2 (Liu) agrees with the guests on latency but proposes hybrid approaches. Paper 3 (Torres) focuses on cost: fine-tuning requires retraining per domain, while RAG amortises that cost. The core disagreement is whether RAG's flexibility justifies its operational complexity: the papers assume engineering resources the podcast guests say small teams lack.
Illustrates multi-document synthesis across sources (a podcast transcript plus research-paper PDFs). The 1M token context window makes this feasible without chunking or summarisation pre-processing. The input cost is smaller than it might look: at $1.25/Mtok, $0.50–1.00 per query would mean 400k–800k input tokens, while a 45-minute transcript plus three papers more plausibly totals tens of thousands of tokens, so input typically costs cents per query rather than dollars.
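As a sanity check on that estimate, this sketch converts the inputs to tokens under rough, assumed ratios: about 150 spoken words per minute, about 1.3 tokens per English word, and about 10,000 words per research paper. None of these figures come from Google; swap in real token counts from your provider when you have them.

```python
WORDS_PER_MINUTE = 150     # assumed speaking rate for the podcast
TOKENS_PER_WORD = 1.3      # assumed tokens per English word
WORDS_PER_PAPER = 10_000   # assumed length of one research paper
INPUT_PER_MTOK = 1.25      # USD per million input tokens, this page's list price


def estimate_input_tokens(podcast_minutes: float, n_papers: int) -> int:
    """Rough token count for a transcript plus a set of papers."""
    words = podcast_minutes * WORDS_PER_MINUTE + n_papers * WORDS_PER_PAPER
    return round(words * TOKENS_PER_WORD)


tokens = estimate_input_tokens(45, 3)
cost = tokens * INPUT_PER_MTOK / 1_000_000
print(tokens, f"${cost:.3f}")
```

Even if every assumed ratio is doubled, the estimate stays under 200k input tokens.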
Use-case deep-dives
When 1M-token context makes audio workflows actually work
A 4-person content studio turns 90-minute podcast episodes into blog posts, social clips, and sponsor reports. Gemini 2.5 Pro Preview handles the full audio file plus show notes, guest bios, and prior episode context in one pass, with no chunking and no lost references across segments. A 90-minute episode sent as raw audio comes to about 173k tokens (Google documents 32 tokens per second of audio), so the episode, 50 pages of brand guidelines, and room for iterative edits all fit inside the 1,048,576-token window. At $1.25/Mtok input, processing a 250k-token job costs $0.31, versus about $2.50 at GPT-4 Turbo's $10/Mtok input rate, and GPT-4 Turbo's 128k window would force chunking anyway. Output at $10/Mtok stings if you're generating 50k-word reports, but for 5k-word summaries the math works. If your studio ships 20+ episodes a month and needs consistent voice across long-form content, this is the call.
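The audio arithmetic can be sketched as follows, using Google's documented rate of 32 tokens per second of audio for Gemini models and this page's $1.25/Mtok input price. The 25k-token allowance for show notes, bios, and guidelines is a hypothetical placeholder, not a figure from the example.

```python
AUDIO_TOKENS_PER_SECOND = 32   # Google's documented audio tokenization rate for Gemini
INPUT_PER_MTOK = 1.25          # USD per million input tokens, this page's list price
CONTEXT_LENGTH = 1_048_576     # spec-sheet context length


def audio_tokens(minutes: float) -> int:
    """Tokens consumed by raw audio of the given length."""
    return int(minutes * 60 * AUDIO_TOKENS_PER_SECOND)


episode = audio_tokens(90)      # one 90-minute episode as raw audio
supporting_docs = 25_000        # hypothetical: show notes, guest bios, brand guidelines
total = episode + supporting_docs
print(episode, total, f"${total * INPUT_PER_MTOK / 1_000_000:.2f}", total < CONTEXT_LENGTH)
```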
Why million-token context beats RAG for live negotiation
A 12-person law firm negotiates SaaS agreements where each round references 8-10 prior contract versions, internal playbooks, and opposing counsel's markup history. Gemini 2.5 Pro Preview loads the entire negotiation thread—400 pages of PDFs, emails, and redline docs—into a single context window, so the model sees every clause evolution without embedding lookups or retrieval lag. During a live call, the associate asks 'what's our fallback on the indemnity cap given their last counteroffer' and gets an answer grounded in the full history, not a vector-search guess. At $1.25 input per million tokens, a 600k-token session costs $0.75; output stays cheap if you're generating 2-3 page memos, not 40-page briefs. If your firm closes 5+ deals a month and negotiation speed matters more than rock-bottom per-token cost, this handles it.
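Before relying on a single-pass session like this, it helps to check the request against the spec-sheet limits (1,048,576-token context length, 65,536-token max output). Google's model metadata reports separate input and output token limits; some providers may instead count output against the same context length, so the `shared_window` flag below covers both conventions as an assumption to verify with your provider.

```python
CONTEXT_LENGTH = 1_048_576   # "Context length" from the spec sheet
MAX_OUTPUT = 65_536          # "Max output" from the spec sheet


def fits(prompt_tokens: int, max_output_tokens: int, shared_window: bool = False) -> bool:
    """Check a request against the spec-sheet limits.

    shared_window=False treats the two limits as independent; shared_window=True
    assumes a provider that counts output against the same context length.
    """
    if max_output_tokens > MAX_OUTPUT:
        return False
    if shared_window:
        return prompt_tokens + max_output_tokens <= CONTEXT_LENGTH
    return prompt_tokens <= CONTEXT_LENGTH


print(fits(600_000, 4_000))                          # a 600k-token session with a short memo
print(fits(1_000_000, 65_536, shared_window=True))   # near-full prompt, shared-window provider
```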
When you need the whole monorepo in context, not snippets
A 9-person engineering team is splitting a 300k-line monorepo into microservices and needs to map every function call, shared utility, and config dependency before cutting code. Gemini 2.5 Pro Preview can take in very large slices of the codebase (15 repos, 80k lines of Python, 120k lines of TypeScript, plus API specs and deployment scripts) in one context window, but check token counts first: at a rough 10 tokens per line, the Python and TypeScript alone would come to about 2M tokens, so in practice you load the services a question touches. The team asks 'which services will break if we move the auth middleware' and the model traces imports across 40 files without losing the thread. This page lists no benchmark data, so you're flying blind on code-specific accuracy versus GPT-4o or Claude 3.5 Sonnet, but the window is several times larger than their 128k and 200k. At $1.25/Mtok input, analyzing 500k tokens of code costs $0.63. If your migration is high-stakes and you'd rather pay for exhaustive context than risk a RAG miss, this is the tool; just budget time to validate outputs until independent evals land.
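The line-count-to-token arithmetic can be scripted so a team can rerun it as the codebase changes. The 10 tokens per line is an assumed rule of thumb, not a measured figure for Gemini's tokenizer; count tokens on a sample of real files before trusting it.

```python
CONTEXT_LENGTH = 1_048_576   # spec-sheet context length
TOKENS_PER_LINE = 10         # assumed average; real ratios vary by language and style


def code_tokens(lines_by_language: dict[str, int]) -> int:
    """Rough token estimate from per-language line counts."""
    return sum(lines_by_language.values()) * TOKENS_PER_LINE


source_lines = {"python": 80_000, "typescript": 120_000}  # line counts from the example
needed = code_tokens(source_lines)
print(needed, needed <= CONTEXT_LENGTH, round(needed / CONTEXT_LENGTH, 2))
```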
Frequently asked
Is Gemini 2.5 Pro Preview good for long document analysis?
Yes. The 1M token context window handles entire codebases, legal documents, or research papers in a single prompt. At $1.25/Mtok input, processing a 500-page PDF costs roughly $0.60. The preview label means API stability isn't guaranteed, but the window size makes it viable for document-heavy workflows where GPT-4 Turbo's 128k would require chunking.
Is Gemini 2.5 Pro cheaper than GPT-4o for production use?
Input yes, output tied. Gemini 2.5 Pro charges $1.25/Mtok input versus GPT-4o's $2.50, and both charge $10.00/Mtok for output. For read-heavy tasks like summarisation or classification, Gemini wins. For generation-heavy work like content writing, the gap narrows, and because Gemini 2.5 Pro bills its thinking tokens as output, its effective output cost can end up above GPT-4o's. Factor in that this is a preview model without SLA guarantees.
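The read-heavy versus generation-heavy split is easy to quantify. This sketch compares per-request cost at the list prices quoted above ($1.25/$10.00 per Mtok for Gemini 2.5 Pro, $2.50/$10.00 for GPT-4o). The `thinking_tokens` argument is an assumed number of reasoning tokens billed at the output rate; the real amount varies per request, and the job sizes are made up.

```python
PRICES = {  # USD per million tokens: (input, output)
    "gemini-2.5-pro": (1.25, 10.00),
    "gpt-4o": (2.50, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 thinking_tokens: int = 0) -> float:
    """USD cost of one request; thinking tokens are billed at the output rate."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + (output_tokens + thinking_tokens) * out_rate) / 1_000_000


# Read-heavy: condense 100k tokens into 1k, assuming 2k thinking tokens for Gemini.
read_gemini = request_cost("gemini-2.5-pro", 100_000, 1_000, thinking_tokens=2_000)
read_gpt = request_cost("gpt-4o", 100_000, 1_000)
# Generation-heavy: expand a 2k-token brief into a 20k-token draft, same assumption.
gen_gemini = request_cost("gemini-2.5-pro", 2_000, 20_000, thinking_tokens=2_000)
gen_gpt = request_cost("gpt-4o", 2_000, 20_000)
print(read_gemini, read_gpt, gen_gemini, gen_gpt)
```

With these assumptions Gemini is cheaper on the read-heavy job and slightly more expensive on the generation-heavy one.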
Can Gemini 2.5 Pro process audio and video files directly?
Audio yes; video depends on where you call it. This listing exposes text, image, audio, and file inputs, but not video. Google's own Gemini API documents native video input for Gemini 2.5 Pro, so if video is critical to your use case, test it against Google's API directly rather than assuming this route supports it.
How does Gemini 2.5 Pro compare to Gemini 1.5 Pro?
This page lists no benchmarks for 2.5 Pro Preview, so run your own evaluations before drawing capability conclusions. What is known: 2.5 Pro is a thinking model that reasons before answering, while 1.5 Pro is not; 1.5 Pro offers a 2M-token window versus 2.5 Pro's 1M; and while input starts at the same $1.25/Mtok, 2.5 Pro's output costs $10.00/Mtok versus 1.5 Pro's $5.00 (for prompts up to 128k). Test on your own workloads before migrating production traffic from the stable 1.5 Pro release.
Should I use Gemini 2.5 Pro Preview for customer-facing chatbots?
Not yet. Preview models lack the uptime SLAs and rate-limit guarantees production chat needs. The $10/Mtok output cost also makes high-volume chat expensive compared to GPT-4o-mini at $0.60/Mtok output. Use this for internal tooling or prototyping where occasional API changes won't break the user experience. For customer-facing deployments, switch to the stable Gemini 1.5 Pro or wait for 2.5 Pro's GA release.