
Gemini 2.5 Flash

Fast, affordable Google model with thinking

Anyone in the Space can @-mention Gemini 2.5 Flash with the team's shared context - pooled credits, one chat, one memory.


Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Gemini 2.5 Flash is the cheap-and-fast Google model with thinking. It's the model where the "fast" tier finally got smart enough to be useful for more than tagging: multi-step reasoning, structured output, even okay coding, all at sub-second latency.

What we notice: 2.5 Flash with thinking enabled costs more per call but answers correctly more often. For routing, classification, and extraction, you can leave thinking off and pay nothing extra. For "summarise this thread and pick the right next action", flip thinking on and the price-per-correct-answer drops compared to running a full flagship.

Best for: high-volume routing with occasional reasoning needs; long-context summarisation (1M tokens at this price is unique); multimodal extraction (images + PDFs + text in one call); cheap pre-filter before a Sonnet or GPT-5 call.

Avoid for: nuanced writing (Sonnet 4.7 is in a different league); architecture or design work; deep research synthesis (the 2.5 Pro version is worth the upgrade for the harder reasoning).

Pricing frame: at $0.15/Mtok in, $0.60/Mtok out, a 5-person team at 1,000 calls a day lands around $10-15/month. With thinking on for ~10% of calls, maybe $20. That is the new floor for "real model, not a toy."
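The price-per-correct-answer argument can be made concrete with a small calculation. The sketch below is only an illustration: the per-call costs and accuracy figures are hypothetical placeholders, not measurements of Gemini 2.5 Flash or any other model.

```python
def cost_per_correct(cost_per_call: float, accuracy: float) -> float:
    """Expected spend to obtain one correct answer.

    If a fraction `accuracy` of calls is correct, one correct answer
    takes 1 / accuracy calls on average.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_call / accuracy


# Hypothetical inputs for illustration only (not benchmark results).
scenarios = {
    "flash, thinking off": cost_per_correct(0.0004, 0.70),
    "flash, thinking on": cost_per_correct(0.0010, 0.90),
    "flagship": cost_per_correct(0.0100, 0.95),
}

for name, value in scenarios.items():
    print(f"{name}: ${value:.5f} per correct answer")
```

Swap in your own measured accuracy and per-call cost; the comparison only means something with numbers from your workload.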

Best for

  • Multimodal tasks with tight cost constraints
  • Processing video and audio transcripts
  • Long-context document analysis under budget
  • High-volume image classification workflows
  • Rapid prototyping with mixed media inputs

Strengths

The million-token context window handles entire codebases, long transcripts, or multi-document analysis in a single call. Multimodal support spans text, images, video, and audio—rare at this price point. Input pricing at $0.15/Mtok undercuts most competitors by 50-70%, making it viable for high-throughput pipelines. Native video understanding eliminates preprocessing steps for frame extraction or separate transcription.

Trade-offs

Flash-tier models typically sacrifice reasoning depth for speed and cost—expect weaker performance on multi-step logic, mathematical proofs, or nuanced creative writing compared to Gemini Pro or Claude Sonnet. Without public benchmarks, performance on standard evals like MMLU or HumanEval remains unverified. Output pricing at $0.60/Mtok climbs quickly for generation-heavy tasks, narrowing the cost advantage over mid-tier alternatives.

Specifications

Provider
google
Category
llm
Context length
1,048,576 tokens
Max output
65,536 tokens
Modalities
text, image, video, audio
License
proprietary
Released

Pricing

Input
$0.15/Mtok
Output
$0.60/Mtok
Model ID
google/gemini-2.5-flash-preview

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$5.02
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
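The calculator does not state how it gets from seats and messages to tokens and dollars. As a rough sketch, one set of assumptions that happens to reproduce the figures shown is: 80 messages per seat per day, 22 working days a month, 2,000 tokens per message, and a 70/30 input/output split. All four are guesses made for illustration, not Switchy's documented method; only the per-Mtok prices come from this page.

```python
INPUT_PRICE_PER_MTOK = 0.15   # USD, from the pricing section
OUTPUT_PRICE_PER_MTOK = 0.60  # USD, from the pricing section


def monthly_estimate(
    seats: int,
    msgs_per_seat_per_day: int,
    days_per_month: int = 22,     # assumption: working days only
    tokens_per_msg: int = 2_000,  # assumption: input + output combined
    input_share: float = 0.70,    # assumption: 70% of tokens are input
) -> tuple[int, float]:
    """Return (total tokens per month, estimated USD per month)."""
    total_tokens = seats * msgs_per_seat_per_day * days_per_month * tokens_per_msg
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cost = (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    return total_tokens, cost


tokens, cost = monthly_estimate(seats=5, msgs_per_seat_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
```

Other combinations give the same totals (for example 80 messages a day across the whole team at 10,000 tokens each), so treat the defaults as adjustable knobs, not as a description of the real calculator.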

Paper
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, Petko Georgiev, Ving Ian Lei et al.
arXiv:2403.05530

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
google | 1049k | $0.15/Mtok | $0.60/Mtok | - | - | -

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Video Content Summary

Watch this video and provide a structured summary: main topics discussed, key timestamps, and actionable takeaways. Format as bullet points.

Multi-Document Comparison

I'm providing three policy documents. Identify where they conflict, where they align, and flag any missing coverage areas. Present findings in a comparison table.

Image Batch Classification

Classify each of these product images into categories: Electronics, Apparel, Home Goods, or Other. Return results as a JSON array with filename and category.
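If you run the image classification prompt in a pipeline, it is worth validating the JSON before trusting it. The sketch below assumes the model returns an array of objects with `filename` and `category` keys; the prompt only asks for "filename and category", so the exact key names are an assumption, and `parse_classifications` is a hypothetical helper.

```python
import json

# The four categories named in the prompt.
ALLOWED_CATEGORIES = {"Electronics", "Apparel", "Home Goods", "Other"}


def parse_classifications(raw: str) -> dict[str, str]:
    """Parse the model's JSON reply into {filename: category}.

    Raises ValueError if the reply is not a JSON array of objects with a
    string `filename` and a `category` from ALLOWED_CATEGORIES.
    """
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    result: dict[str, str] = {}
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        filename = item.get("filename")
        category = item.get("category")
        if not isinstance(filename, str):
            raise ValueError(f"missing or invalid filename in {item!r}")
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unexpected category {category!r} for {filename}")
        result[filename] = category
    return result


# Hypothetical reply, for illustration only.
sample_reply = (
    '[{"filename": "img_001.jpg", "category": "Apparel"},'
    ' {"filename": "img_002.jpg", "category": "Other"}]'
)
print(parse_classifications(sample_reply))
```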

Audio Transcript Analysis

Listen to this customer service call. Identify the customer's issue, the agent's proposed solution, and rate the interaction quality on a 1-5 scale with justification.

Codebase Context Query

I've provided the full source code for a web application. Explain how the authentication flow works, from login request to session creation, referencing specific files and functions.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Analyze this 30-second product demo video and write a technical brief for our engineering team explaining what features are shown, in what order, and any UX patterns worth noting.

Output

The video opens with a split-screen layout showing the mobile and desktop views simultaneously. At 0:03, the user taps the search icon, triggering a slide-up animation with elastic easing. The autocomplete dropdown appears at 0:05, displaying results grouped by category with subtle dividers. At 0:12, selecting a result triggers a page transition using a shared-element animation—the product thumbnail scales and repositions smoothly. The checkout flow (0:18-0:28) uses a stepped progress indicator at the top, with each completed step marked by a checkmark icon. Notable: the form validation is inline and non-blocking, showing errors only after the user moves to the next field.

Notes

This example highlights Gemini 2.5 Flash's video understanding; the million-token context window also lets it process longer clips without chunking. The model captures temporal sequencing and UI details accurately. Trade-off: video analysis costs more than text-only alternatives, and the detailed output is billed at $0.60/Mtok, so reserve it for cases where visual context genuinely matters.

Prompt

I'm debugging a React component that re-renders too often. Here's the component code, the parent component, and the Redux slice. Walk me through what's causing the issue and suggest a fix with minimal refactoring.

Output

The excessive re-renders stem from line 47 in `ParentComponent.tsx`, where you're passing `onClick={() => dispatch(updateFilter(id))}` as a prop. This creates a new function reference on every parent render, breaking referential equality for `ChildComponent`'s memoization. Your Redux slice looks fine—the issue isn't state shape. Minimal fix: extract that handler into a `useCallback` hook with `[dispatch, id]` dependencies, or move it outside the component if `id` is static. That preserves the function reference across renders. Alternative: if you're already using `React.memo` on the child, ensure you're also memoizing the other props (`filterOptions` on line 52 looks like it might be a new array each time—check the parent's render logic there too).

Notes

Demonstrates Gemini 2.5 Flash's ability to reason across multiple related files within a single prompt, leveraging the 1M-token context window for realistic debugging scenarios. The model identifies root causes and suggests pragmatic fixes without over-engineering. Trade-off: the response is concise but assumes familiarity with React's reconciliation model—less hand-holding than models tuned specifically for teaching.

Prompt

Summarize this 90-page legal contract (attached PDF) focusing on: termination clauses, liability caps, data retention obligations, and any unusual indemnification language. Flag sections that need lawyer review.

Output

**Termination:** Section 12.3 allows either party to terminate with 60 days' notice, but Section 12.4(b) includes an unusual auto-renewal clause—contract extends for successive 2-year terms unless terminated 120 days before renewal. **Liability:** Capped at 12 months of fees paid (Section 15.1), but excludes IP infringement and data breaches—uncapped exposure there. **Data retention:** Section 8.7 requires you to delete customer data within 30 days post-termination, but Section 8.9 contradicts this by allowing the vendor to retain anonymized data indefinitely for 'service improvement.' **Indemnification (FLAG):** Section 16.2(c) is non-standard—you indemnify the vendor for third-party claims arising from your 'use or misuse' of the service, with no carve-out for vendor negligence. This shifts risk unusually far in their favor. Recommend legal review of Sections 8.7-8.9 conflict and the entire Article 16.

Notes

Showcases the model's document analysis strength across long-form PDFs within the 1M-token context window, extracting structured insights and identifying contradictions. The $0.15/Mtok input pricing makes this cost-effective for contract review workflows. Trade-off: the model flags unusual clauses but doesn't provide case law or jurisdiction-specific risk assessment—still requires a human lawyer for final judgment.

Use-case deep-dives

Multi-format content moderation

When you need one model to handle text, image, and video flags

A 12-person community platform runs moderation queues that mix user posts, uploaded photos, and short video clips. Gemini 2.5 Flash handles all three in a single API call at $0.15/$0.60 per Mtok, which beats routing to separate text and vision models. The 1M token context window means you can batch 200+ moderation decisions in one request and include your full policy doc as reference material. If your video clips run over 90 seconds or you're processing 500+ items per minute, you'll hit rate limits and need to shard across multiple API keys. For teams under that threshold who want to collapse three vendor integrations into one, this is the call.
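Batching many moderation items into one request comes down to packing items under a token budget while leaving room for the policy document. The following is a minimal sketch of greedy packing. The per-item token estimates and the reserved amount are hypothetical, and `batch_by_token_budget` is an illustrative helper, not part of any SDK; only the 1,048,576-token context length comes from the specifications.

```python
CONTEXT_TOKENS = 1_048_576  # context length from the specifications


def batch_by_token_budget(
    items: list[tuple[str, int]],
    budget_tokens: int = CONTEXT_TOKENS,
    reserved_tokens: int = 0,
) -> list[list[str]]:
    """Greedily pack (item_id, estimated_tokens) pairs into batches.

    `reserved_tokens` is held back in every batch for shared material
    such as the policy document and the expected output.
    """
    available = budget_tokens - reserved_tokens
    if available <= 0:
        raise ValueError("reserved_tokens leaves no room for items")
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for item_id, tokens in items:
        if tokens > available:
            raise ValueError(f"{item_id} alone exceeds the available budget")
        if used + tokens > available:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(item_id)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Hypothetical queue: ids with rough token estimates.
queue = [("post-1", 400), ("photo-2", 500), ("clip-3", 300)]
print(batch_by_token_budget(queue, budget_tokens=1_000, reserved_tokens=100))
```

The small budget in the usage line is only there to make the split visible; in practice you would keep the default and reserve whatever your policy document and output need.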

Long-context meeting transcription

Why this model works for all-day workshop summaries

A 4-person consulting firm records day-long client workshops and needs structured summaries with action items tied to specific speakers. Gemini 2.5 Flash's 1M token window fits 6-8 hours of transcript plus the firm's summary template in a single prompt, so you're not chunking or losing cross-references. At $0.15 input per Mtok, a 400k token transcript costs $0.06 to process—cheap enough to run on every session without budget anxiety. The audio modality means you can skip the transcription API and send raw files directly if you're under 60 minutes, though quality on accented speech isn't benchmarked yet. If you're doing this more than 50 times a month and need guaranteed accuracy, test against Whisper + GPT-4 first. Otherwise, this is the simplest stack.
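The transcript arithmetic above is easy to check and to reuse for other sizes. A small sketch follows, assuming input tokens are billed at the listed $0.15/Mtok; the template size is a hypothetical figure, and output tokens for the summary are not included.

```python
INPUT_PRICE_PER_MTOK = 0.15  # USD, from the pricing section
CONTEXT_TOKENS = 1_048_576   # context length from the specifications


def input_cost_usd(tokens: int, price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """Input-side cost of sending `tokens` tokens in one request."""
    return tokens / 1_000_000 * price_per_mtok


def fits_in_context(*token_counts: int, context: int = CONTEXT_TOKENS) -> bool:
    """True if all parts of the prompt fit in one context window."""
    return sum(token_counts) <= context


transcript_tokens = 400_000  # the transcript size used in the text
template_tokens = 2_000      # hypothetical size of the summary template

print(f"${input_cost_usd(transcript_tokens):.2f}")
print(fits_in_context(transcript_tokens, template_tokens))
```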

High-frequency customer email triage

When sub-second response time matters more than benchmark scores

A 20-person e-commerce support team triages 800 inbound emails daily into urgent/standard/spam buckets before human review. Gemini 2.5 Flash runs classification in under 400ms at $0.15 per Mtok input, so the cost per email is functionally zero and the latency keeps the queue moving. The lack of public benchmarks means you can't compare accuracy to Claude or GPT-4o on paper, but if you're already using Gemini models elsewhere in your stack, the integration tax is lower than adding a new vendor. If classification errors cost you real money—chargebacks, missed SLAs—run a two-week shadow test against your current model before switching. For teams where speed and price trump provable accuracy, this is the default.

Frequently asked

Is Gemini 2.5 Flash good for general text tasks?

Yes, Gemini 2.5 Flash handles general text tasks well — summarization, Q&A, content generation. The 1M token context window lets you process entire codebases or long documents in one pass. It's fast and cheap at $0.15/$0.60 per Mtok, making it practical for high-volume workflows where you don't need the absolute ceiling performance of larger models.

Is Gemini 2.5 Flash cheaper than GPT-4o mini?

Not on list price. Gemini 2.5 Flash costs $0.15 input and $0.60 output per Mtok, and GPT-4o mini runs the same $0.15 input and $0.60 output, so they're identical on paper. The real win is the 1M token context versus GPT-4o mini's 128k: you can fit about 8× more data per call, reducing multi-turn overhead and total cost for document-heavy tasks.
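To see what the context difference means in practice, you can count how many calls a document set needs under each window. This is a rough sketch: the corpus size is hypothetical, and it ignores tokens reserved for instructions and output unless you pass `reserved_tokens`.

```python
GEMINI_FLASH_CONTEXT = 1_048_576  # tokens, from the specifications
GPT_4O_MINI_CONTEXT = 128_000     # tokens, the 128k figure quoted above


def calls_needed(corpus_tokens: int, context_tokens: int, reserved_tokens: int = 0) -> int:
    """Minimum number of calls needed to pass the whole corpus through the model."""
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens leaves no usable context")
    return -(-corpus_tokens // usable)  # ceiling division


ratio = GEMINI_FLASH_CONTEXT / GPT_4O_MINI_CONTEXT
corpus = 900_000  # hypothetical document set, in tokens

print(f"context ratio: {ratio:.1f}x")
print("calls with 1M window:", calls_needed(corpus, GEMINI_FLASH_CONTEXT))
print("calls with 128k window:", calls_needed(corpus, GPT_4O_MINI_CONTEXT))
```

Fewer calls matter mostly when the documents reference each other, since splitting them across calls loses those cross-references.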

Can Gemini 2.5 Flash process video and audio natively?

Yes. Gemini 2.5 Flash accepts text, image, video, and audio inputs natively. You can send a video file and ask questions about its content without transcribing or frame-extracting yourself. This makes it useful for media analysis, content moderation, or building multimodal chat interfaces where users upload mixed media types.

How does Gemini 2.5 Flash compare to Gemini 2.0 Flash?

We don't have public benchmarks yet for 2.5 Flash, so direct quality comparisons are speculative. Google typically ships Flash models as faster, cheaper siblings to their Pro line. Expect similar speed and cost to 2.0 Flash with incremental reasoning improvements. If you need proven benchmark scores, wait for third-party evals or stick with 2.0 Flash until data arrives.

Should I use Gemini 2.5 Flash for production chatbots?

Yes, if cost and latency matter more than peak accuracy. The $0.60/Mtok output price keeps conversational apps affordable at scale, and the 1M context lets you inject large knowledge bases or conversation histories without chunking. For customer support or internal tools where, say, 95% accuracy at 3× the speed beats 98% accuracy, Flash is a solid choice.

Data last verified 1317 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.