LLM · google

Google: Gemini 2.5 Pro Preview 06-05

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

Anyone in the Space can @-mention Google: Gemini 2.5 Pro Preview 06-05 with the team's shared context - pooled credits, one chat, one memory.

Verdict

Gemini 2.5 Pro is the model that made Google a real player on the frontier rather than a half-step behind. It's the one with native 1M-token context, native multimodal in/out, and a coding tier that holds its own against GPT-5 and Sonnet 4.7 on most benchmarks.

What we notice: Gemini 2.5 Pro's context handling is in a different class: drop a 500-page PDF in and it actually answers from page 312. Vision is genuinely good, including for charts and diagrams (most models talk around them). The trade-off is voice: the writing is correct but a touch clinical, and it's more eager to caveat than the Claude or GPT lines.

Best for: long-document synthesis where 200k context isn't enough (legal discovery, multi-source research, full codebases); multimodal work that mixes images, PDFs, and structured data; teams already on Google Cloud where the integration story is cleanest; any task where context matters more than personality.

Avoid for: nuanced editorial writing (Sonnet has more taste); chatty interfaces where the model's voice matters; agent loops that need consistent personality across turns.

Pricing frame: at $1.25/Mtok in, $10/Mtok out, a 5-person team at 200 daily messages lands around $55/month. Cheaper than Sonnet, materially cheaper than Opus or full GPT-5, and you can fit a lot more in the prompt for the same dollar.

Best for

Processing entire codebases in one context
Multi-hour audio transcription and analysis
Long-form document summarization over 100K tokens
Multimodal tasks combining text, images, and audio
Research workflows requiring massive context retention

Strengths

The 1M token context window is the standout feature — you can load a novel-length document or hours of audio and query it without chunking. Multimodal support across four input types (text, image, audio, file) makes it versatile for mixed-media workflows. Input pricing at $1.25/Mtok undercuts Claude Opus while matching the context depth, making it economical for ingestion-heavy tasks. Google's infrastructure typically delivers fast streaming responses even with large contexts loaded.

Trade-offs

Output pricing at $10/Mtok is 8x the input rate, which penalizes generation-heavy tasks like code scaffolding or long-form writing. As a preview model, stability and availability are not guaranteed: expect potential API changes or deprecation. No benchmark scores are published on this page yet, so you cannot compare reasoning quality, instruction-following, or coding ability against peers like GPT-4o or Claude Sonnet here. Multimodal performance on complex vision or audio tasks remains unvalidated in independent testing.

Specifications

Provider: google
Category: llm
Context length: 1,048,576 tokens
Max output: 65,536 tokens
Modalities: file, image, text, audio
License: LicenseRef-Google-Commercial
Released: 2025-06-05

Pricing

Input: $1.25/Mtok
Output: $10.00/Mtok
Model ID: google/gemini-2.5-pro-preview

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30%

Estimated monthly spend

$68.20

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
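
The calculator's figures can be reproduced with a few lines of arithmetic. The sketch below is not Switchy's metering code. It assumes 22 billable days per month, a value that is not stated on the page but is the one that reproduces the 17.6M-token and $68.20 totals above. The function and parameter names are illustrative.

```python
# Rough reproduction of the team cost calculator above.
# ASSUMPTION: 22 billable days per month (inferred from the page's totals,
# not stated on it). Names are illustrative, not a Switchy API.

INPUT_PRICE_PER_MTOK = 1.25    # USD per million input tokens (from Pricing)
OUTPUT_PRICE_PER_MTOK = 10.00  # USD per million output tokens (from Pricing)


def monthly_estimate(seats, msgs_per_seat_per_day, tokens_per_turn,
                     output_share, days_per_month=22):
    """Return (total_tokens, usd_cost) for one month of team usage."""
    total_tokens = seats * msgs_per_seat_per_day * tokens_per_turn * days_per_month
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cost = (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return total_tokens, round(cost, 2)


tokens, cost = monthly_estimate(seats=5, msgs_per_seat_per_day=80,
                                tokens_per_turn=2_000, output_share=0.30)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
# prints: 17.6M tokens / month, $68.20
```

With these inputs, the 30% of tokens that are output account for $52.80 of the $68.20, so the output share is the setting that moves the estimate most.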

Paper

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, Petko Georgiev, Ving Ian Lei et al.

arXiv:2403.05530

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
google	1049k	$1.25/Mtok	$10.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Codebase Architecture Analysis

I've uploaded our full codebase as files. Map the architecture: identify core modules, trace data flow between components, and flag any circular dependencies or tight coupling patterns.

Multi-Hour Meeting Transcription

This audio file contains a 3-hour strategy meeting. Summarize key decisions, list action items with owners, and highlight any unresolved disagreements that need follow-up.

Legal Document Cross-Reference

I've provided three vendor contracts as files. Identify conflicting terms, compare liability clauses, and flag any non-standard indemnification language across all three.

Research Paper Synthesis

I've uploaded 20 research papers on transformer architectures. Synthesize the evolution of attention mechanisms from 2017 to now, noting which innovations became standard and which were abandoned.

Multimodal Product Feedback

I've provided user interview audio, app screenshots, and written survey responses. Identify the top three usability pain points and recommend specific UI changes to address them.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential race conditions in a multi-threaded environment. The function updates a shared cache dictionary without locks.

Output

This function has a race condition. Multiple threads can simultaneously check `if key not in cache` and all proceed to compute and write the value, wasting resources and letting later writers overwrite earlier results. In CPython, a single dictionary read or write is protected by the GIL, so the dictionary itself is not corrupted, but the check-then-set sequence is not atomic as a whole. You need either a threading.Lock around the check-and-set block, or to coordinate the work through a thread-safe structure like queue.Queue. The 1M+ token context window here means you could paste your entire codebase for cross-file analysis, but this specific issue is local to the function's logic.

Notes

Demonstrates reasoning about concurrency primitives and Python internals. The model identifies the logical check-then-set race and separates it from the per-operation safety CPython's dictionary already provides. With 1M tokens available, it can analyse entire repositories, though this example shows it doesn't need massive context to catch subtle bugs.

Prompt

Generate a product requirements document for a mobile app that lets users scan receipts, extract line items, and auto-categorise expenses. Include user stories, technical constraints, and success metrics.

Output

**Product Requirements: Receipt Scanner App** **User Stories:** US-1: As a freelancer, I want to photograph receipts so I can track business expenses without manual entry. US-2: As a user, I want automatic categorisation so I spend less than 10 seconds per receipt. **Technical Constraints:** - OCR must handle crumpled receipts, low light (target: 85% field accuracy) - Offline mode required; sync when connected - GDPR-compliant data retention (30-day local cache, encrypted) **Success Metrics:** - Time-to-categorise < 10s (p95) - User retention > 40% at day-30 - Category accuracy > 90% after 20 receipts (ML learning curve) The model would continue with API requirements, edge cases, and a phased rollout plan, maintaining this structured, actionable format throughout.

Notes

Shows strong document generation with appropriate structure for a technical audience. The $10/Mtok output pricing makes long-form generation expensive compared to alternatives, but the quality of organisation and the inclusion of quantified constraints demonstrates planning capability beyond simple text completion.

Prompt

I'm attaching a 45-minute podcast transcript and three related research papers. Summarise the key disagreements between the podcast guests and the papers' authors on the topic of retrieval-augmented generation.

Output

The podcast guests argue RAG is over-hyped for production use, citing latency (their reported p95: 3.2s) and relevance drift when indexes grow beyond 100K documents. They advocate fine-tuning smaller models instead. In contrast, Paper 1 (Chen et al.) reports sub-500ms retrieval with hierarchical indexes and claims RAG maintains accuracy at 10M+ documents. Paper 2 (Liu) agrees with the guests on latency but proposes hybrid approaches. Paper 3 (Torres) focuses on cost: fine-tuning requires retraining per domain, while RAG amortises that cost. The core disagreement is whether RAG's flexibility justifies its operational complexity: the papers assume engineering resources the podcast guests say small teams lack.

Notes

Illustrates multi-document synthesis across sources (a podcast transcript plus three papers). The 1M token context window makes this feasible without chunking or summarisation pre-processing. However, the full prompt is billed at $1.25/Mtok on every query, so repeated questions against a large document set add up (a 400k-token prompt costs $0.50 per query). This capability suits high-value research tasks rather than casual exploration.

Use-case deep-dives

Multi-hour podcast transcription editing

When 1M-token context makes audio workflows actually work

A 4-person content studio turns 90-minute podcast episodes into blog posts, social clips, and sponsor reports. Gemini 2.5 Pro Preview handles the full audio file plus show notes, guest bios, and prior episode context in one pass—no chunking, no lost references across segments. The 1,048,576-token window means you drop in 3 hours of transcribed audio (roughly 200k tokens) alongside 50 pages of brand guidelines and still have room for iterative edits. At $1.25/Mtok input, processing a 250k-token job costs $0.31, versus $1.50+ on GPT-4 Turbo. Output at $10/Mtok stings if you're generating 50k-word reports, but for 5k-word summaries the math works. If your studio ships 20+ episodes a month and needs consistent voice across long-form content, this is the call.

Legal contract redlining sessions

Why million-token context beats RAG for live negotiation

A 12-person law firm negotiates SaaS agreements where each round references 8-10 prior contract versions, internal playbooks, and opposing counsel's markup history. Gemini 2.5 Pro Preview loads the entire negotiation thread—400 pages of PDFs, emails, and redline docs—into a single context window, so the model sees every clause evolution without embedding lookups or retrieval lag. During a live call, the associate asks 'what's our fallback on the indemnity cap given their last counteroffer' and gets an answer grounded in the full history, not a vector-search guess. At $1.25 input per million tokens, a 600k-token session costs $0.75; output stays cheap if you're generating 2-3 page memos, not 40-page briefs. If your firm closes 5+ deals a month and negotiation speed matters more than rock-bottom per-token cost, this handles it.

Cross-repository code migration planning

When you need the whole monorepo in context, not snippets

A 9-person engineering team is splitting a 300k-line monorepo into microservices and needs to map every function call, shared utility, and config dependency before cutting code. Gemini 2.5 Pro Preview ingests the entire codebase—15 repos, 80k lines of Python, 120k lines of TypeScript, plus API specs and deployment scripts—in one context window. The team asks 'which services will break if we move the auth middleware' and the model traces imports across 40 files without losing the thread. No benchmark data yet means you're flying blind on code-specific accuracy versus GPT-4o or Claude 3.5 Sonnet, but the window size is unmatched. At $1.25/Mtok input, analyzing 500k tokens of code costs $0.63. If your migration is high-stakes and you'd rather pay for exhaustive context than risk a RAG miss, this is the tool—just budget time to validate outputs until public evals land.
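
Before relying on a single-window pass like this, it helps to estimate whether the code actually fits. The sketch below is a rough budget check, not a tokenizer. It assumes about 4 characters per token, a common heuristic that is not Gemini's real tokenization, and it conservatively assumes the response shares the window with the prompt. The function names are illustrative.

```python
import os

CONTEXT_TOKENS = 1_048_576   # context length from Specifications
MAX_OUTPUT_TOKENS = 65_536   # max output from Specifications
CHARS_PER_TOKEN = 4          # ASSUMPTION: rough heuristic, not Gemini's tokenizer


def estimate_repo_tokens(root, extensions=(".py", ".ts")):
    """Estimate tokens for source files under root via a chars-per-token heuristic."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total_chars += len(fh.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(prompt_tokens, reserve_for_output=MAX_OUTPUT_TOKENS):
    """True if the prompt still leaves room for a full-length response.

    ASSUMPTION: output is counted against the same window (conservative).
    """
    return prompt_tokens + reserve_for_output <= CONTEXT_TOKENS


print(fits_in_context(500_000))    # True: the 500k-token case above fits
print(fits_in_context(1_000_000))  # False: no room left for a full response
```

For a real decision, replace the heuristic with a token count from the provider, since code often tokenizes less efficiently than prose.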

Frequently asked

Is Gemini 2.5 Pro Preview good for long document analysis?

Yes. The 1M token context window handles entire codebases, legal documents, or research papers in a single prompt. At $1.25/Mtok input, processing a 500-page PDF costs roughly $0.60. The preview label means API stability isn't guaranteed, but the window size makes it viable for document-heavy workflows where GPT-4 Turbo's 128k would require chunking.

Is Gemini 2.5 Pro cheaper than GPT-4o for production use?

Input yes, output tied. Gemini 2.5 Pro charges $1.25/Mtok for input versus GPT-4o's $2.50/Mtok, while both charge $10.00/Mtok for output. For read-heavy tasks like summarisation or classification, Gemini wins. For generation-heavy work like content writing, output dominates the bill and the cost difference narrows. Factor in that this is a preview model without SLA guarantees.
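
How quickly the difference narrows can be worked out from the prices quoted in the answer above. This is a minimal sketch using only those four numbers. The output shares tried are arbitrary illustrations, not measured workloads.

```python
# USD per million tokens, as quoted in the answer above.
GEMINI = {"input": 1.25, "output": 10.00}
GPT_4O = {"input": 2.50, "output": 10.00}


def blended_price(prices, output_share):
    """Average USD per million tokens for a workload with this output share."""
    return prices["input"] * (1 - output_share) + prices["output"] * output_share


def gemini_saving(output_share):
    """Fraction of the GPT-4o bill saved by using Gemini at this output share."""
    gpt = blended_price(GPT_4O, output_share)
    return (gpt - blended_price(GEMINI, output_share)) / gpt


for share in (0.05, 0.30, 0.80):
    print(f"output share {share:.0%}: saving {gemini_saving(share):.1%}")
# output share 5%: saving 41.3%
# output share 30%: saving 18.4%
# output share 80%: saving 2.9%
```

Because the output prices are equal, the whole saving comes from input tokens, so it shrinks as output takes a larger share of the traffic.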

Can Gemini 2.5 Pro process audio and video files directly?

Audio yes; video is not listed. This listing's modalities are file, image, text, and audio, so video input is not confirmed here. Google's Gemini API documentation describes video input for Gemini models, but whether it is available through this endpoint is a separate question. Test the API directly if video is critical to your use case.

How does Gemini 2.5 Pro compare to Gemini 1.5 Pro?

No public benchmarks exist yet for 2.5 Pro Preview, so capability comparisons are speculative. The context window matches 1.5 Pro's 1M tokens. Pricing is identical. Google typically releases preview models to test architectural changes before general availability. Expect improvements in reasoning or multimodal understanding, but wait for benchmarks before migrating production workloads from the stable 1.5 Pro release.

Should I use Gemini 2.5 Pro Preview for customer-facing chatbots?

Not yet. Preview models lack the uptime SLAs and rate-limit guarantees production chat needs. The $10/Mtok output cost also makes high-volume chat expensive compared to GPT-4o-mini at $0.60/Mtok output. Use this for internal tooling or prototyping where occasional API changes won't break the user experience. Switch to the stable Gemini 1.5 Pro or wait for 2.5 Pro's GA release for customer-facing deployments.
