
Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

Anyone in the Space can @-mention Google: Gemini 3.1 Flash Lite with the team's shared context — pooled credits, one chat, one memory.


Starter is free forever — 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Gemini 3.1 Flash Lite trades reasoning depth for speed and cost, making it Google's budget option for high-throughput multimodal tasks. With a 1M token context window and $0.25/Mtok input pricing, it handles large documents and media files cheaply, but expect weaker performance on complex reasoning compared to Flash or Pro variants. Reach for this when you need to process volume—transcripts, image batches, long PDFs—and can tolerate occasional logic gaps.

Best for

  • High-volume document classification
  • Batch image captioning and tagging
  • Cost-sensitive video transcription
  • Long transcript summarization
  • Rapid prototyping with multimodal inputs

Strengths

The 1M token context window paired with $0.25/Mtok input pricing makes Flash Lite the cheapest way to ingest large multimodal files in Google's lineup. It accepts text, images, video, audio, and files natively, eliminating preprocessing overhead. Output at $1.50/Mtok remains competitive for short responses. The model excels at straightforward extraction and summarization tasks where speed and cost matter more than nuanced reasoning.

Trade-offs

Flash Lite sacrifices reasoning capability for cost savings. Without public benchmarks, expect it to trail Gemini Flash and Pro on math, coding, and multi-step logic tasks. The model may struggle with ambiguous instructions or tasks requiring deep contextual understanding. Output quality on complex creative writing or technical analysis will likely disappoint compared to pricier alternatives. Best suited for high-volume, low-complexity workloads where occasional errors are acceptable.

Specifications

Provider
google
Category
llm
Context length
1,048,576 tokens
Max output
65,536 tokens
Modalities
text, image, video, file, audio
License
proprietary
Released
2026-05-07

Pricing

Input
$0.25/Mtok
Output
$1.50/Mtok
Model ID
google/gemini-3.1-flash-lite

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool — one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$11.00
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool — one plan, one balance for everyone.
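The calculator's figures fall out of simple per-token arithmetic. A minimal sketch, assuming a 70/30 input/output token split (an assumption chosen to reproduce the quoted $11.00; Switchy's actual metering may weight the split differently):

```python
INPUT_PRICE = 0.25   # $ per 1M input tokens
OUTPUT_PRICE = 1.50  # $ per 1M output tokens

def monthly_spend(total_tokens: float, input_share: float = 0.7) -> float:
    """Estimate monthly spend in dollars for a given token volume.

    The 70/30 input/output split is an assumption, not a Switchy-published
    ratio; it happens to reproduce the example figures above.
    """
    input_tok = total_tokens * input_share
    output_tok = total_tokens - input_tok
    cost = (input_tok / 1e6) * INPUT_PRICE + (output_tok / 1e6) * OUTPUT_PRICE
    return round(cost, 2)

# 5 seats x 80 msgs/day works out to ~17.6M tokens/month in the example above
print(monthly_spend(17_600_000))  # → 11.0
```

Adjusting `input_share` toward 1.0 models context-heavy workloads, where Flash Lite's cheap input side dominates.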

Providers

Provider
google
Context
1049k
Input
$0.25/Mtok
Output
$1.50/Mtok
P50 latency
not yet reported
Throughput
not yet reported
30d uptime
not yet reported

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Extract Invoice Line Items

Extract all line items from this invoice image into a JSON array with fields: description, quantity, unit_price, total. Return only valid JSON.
Open in a Space →
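Since the prompt asks for "only valid JSON", it pays to guard the reply before it enters a downstream pipeline. A minimal validation sketch (the model call itself is omitted; the field names come from the prompt above):

```python
import json

REQUIRED_FIELDS = {"description", "quantity", "unit_price", "total"}

def parse_line_items(reply: str) -> list[dict]:
    """Parse the model's reply and verify every line item carries the
    fields the prompt asked for. Raises ValueError on malformed output."""
    items = json.loads(reply)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of line items")
    for item in items:
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"line item missing fields: {sorted(missing)}")
    return items

reply = '[{"description": "Widget", "quantity": 2, "unit_price": 4.5, "total": 9.0}]'
print(parse_line_items(reply)[0]["total"])  # → 9.0
```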

Summarize Meeting Transcript

Summarize this meeting transcript in 5 bullet points covering key decisions, action items, and owners. Keep each bullet under 20 words.
Open in a Space →

Tag Product Images

Generate 8-10 descriptive tags for this product image suitable for e-commerce search. Focus on visible attributes like color, material, style, and category.
Open in a Space →

Classify Support Tickets

Classify this support ticket into one category: Billing, Technical, Account, or Shipping. Respond with only the category name and a one-sentence reason.
Open in a Space →
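A reply in the shape this prompt requests (category name plus a one-sentence reason) is easy to post-process defensively. A minimal sketch, assuming the category arrives as the first word of the reply:

```python
CATEGORIES = {"Billing", "Technical", "Account", "Shipping"}

def parse_category(reply: str) -> str:
    """Pull the leading category word out of the model's reply and
    reject anything outside the four allowed labels."""
    first_word = reply.strip().split()[0].rstrip(".:,")
    if first_word not in CATEGORIES:
        raise ValueError(f"unexpected category: {first_word!r}")
    return first_word

print(parse_category("Billing. The customer disputes a duplicate charge."))  # → Billing
```

Rejecting off-list labels matters in practice: a budget model will occasionally invent a fifth category, and you want that routed to a human rather than silently dropped.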

Generate Video Scene Descriptions

Watch this video and describe each major scene change in one sentence. Include timestamps and focus on visible actions and objects.
Open in a Space →

Example outputs

Illustrative — representative of the model's voice and quality, not literal recordings.

Prompt

Analyze this quarterly sales spreadsheet and identify the three product categories with declining month-over-month growth. Explain what might be causing each trend.

Output

Looking at your Q3 data, three categories show consistent MoM decline: Electronics (-12% Aug, -8% Sep), Home & Garden (-6% Aug, -9% Sep), and Sporting Goods (-4% both months). Electronics likely reflects seasonal back-to-school saturation—purchases front-loaded in July. Home & Garden's steeper September drop aligns with end-of-summer outdoor project completion. Sporting Goods shows steadier decline, possibly indicating broader discretionary spending pullback rather than seasonality. I'd recommend comparing these patterns against last year's Q3 to separate cyclical from structural issues.

Notes

The 1M+ token context window handles large spreadsheets without truncation, making this ideal for financial analysis workflows. The model synthesizes trends across multiple data points and offers plausible business reasoning. At $0.25/Mtok input, processing entire quarterly datasets remains cost-effective, though output quality for nuanced strategic recommendations may require validation against specialized financial models.

Prompt

I'm attaching a 45-minute lecture video on neural network architectures. Create a structured study guide with timestamps for each major concept and three practice questions per section.

Output

The model would process the video and produce a hierarchical study guide: Section 1 - Perceptrons & Activation Functions (00:00-08:30) covers basic neuron structure, sigmoid/ReLU comparisons, and the XOR problem, with practice questions on activation function selection. Section 2 - Backpropagation Mechanics (08:30-22:15) breaks down gradient descent, chain rule application, and vanishing gradients, including questions on derivative calculations. Section 3 - CNN Architectures (22:15-35:40) details convolution operations, pooling strategies, and ResNet skip connections. Section 4 - Training Strategies (35:40-45:00) addresses batch normalization, dropout, and learning rate scheduling. Each section includes timestamp references and progressively harder questions.

Notes

Multimodal input (video + audio transcription) lets the model extract both visual diagrams and spoken explanations, creating richer study materials than text-only models. The massive context window accommodates full lecture transcripts without chunking. However, the 'Lite' designation suggests this may prioritize speed over depth—complex mathematical derivations in practice questions might lack the rigor of larger Gemini variants.

Prompt

Review this 80-page technical specification document for our API and generate a migration guide for developers moving from v2 to v3, focusing on breaking changes.

Output

The model would produce a developer-focused migration guide identifying breaking changes across authentication (OAuth 2.0 now mandatory, API keys deprecated), endpoint restructuring (REST paths now follow /v3/resources/{id}/actions pattern), payload format changes (snake_case replaced with camelCase), rate limiting (per-endpoint quotas replace global limits), and error response schemas (now RFC 7807 compliant). Each section includes side-by-side v2/v3 code examples, estimated migration effort, and rollback considerations. The guide highlights that webhook signature verification now requires HMAC-SHA256 instead of SHA1, and batch endpoints have new 100-item limits.

Notes

Document understanding across 80 pages demonstrates the practical value of the extended context window for enterprise documentation tasks. At $1.50/Mtok output, generating comprehensive migration guides remains economical compared to manual technical writing. The 'Flash' branding suggests optimized inference speed, making this suitable for iterative documentation workflows where developers need quick answers, though extremely subtle API contract nuances might benefit from human technical writer review.

Use-case deep-dives

Startup support ticket triage

When Gemini Flash Lite handles 200+ daily tickets under budget

A 4-person SaaS startup gets 200-300 support emails daily and needs to route them to the right person without hiring a coordinator. Gemini Flash Lite is the right call here: at $0.25/Mtok input, you're paying roughly $0.15/day to classify and tag every ticket with context from your help docs. The 1M token window means you can drop your entire knowledge base into each prompt without chunking strategies. Output cost is $1.50/Mtok, but triage responses are short—most tickets generate under 100 tokens of routing metadata. If your ticket volume crosses 1,000/day or you need sub-200ms response times for live chat, step up to the standard Flash model. Below that threshold, Flash Lite keeps your AI budget under $5/month while eliminating the triage backlog.
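The $0.15/day figure follows from input-side arithmetic alone, since routing metadata is too short to move the output bill. A sketch, where ~2,400 input tokens of ticket plus help-doc context per request is an assumed average rather than a measured one:

```python
INPUT_PRICE = 0.25  # $ per 1M input tokens

def daily_triage_cost(tickets: int, tokens_per_ticket: int) -> float:
    """Input-side cost of classifying a day's tickets. Output cost is
    ignored here because triage replies are under ~100 tokens each."""
    return round(tickets * tokens_per_ticket * INPUT_PRICE / 1e6, 2)

# ~250 tickets/day, each with roughly 2,400 tokens of ticket + help-doc context
print(daily_triage_cost(250, 2_400))  # → 0.15
```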

Video content moderation queue

Flash Lite's multimodal input makes pre-publish review scalable

A 10-person creator network publishes 40-60 short videos weekly and needs to flag potential IP issues, brand-unsafe content, or off-topic submissions before they go live. Flash Lite processes video, audio, and image inputs natively, so you're not paying for separate transcription or frame-extraction pipelines. At $0.25/Mtok input, a 90-second video with audio transcription costs roughly $0.02 to analyze. The model returns a JSON object flagging timestamps and confidence scores for review categories you define. No public benchmarks exist yet for this model, so expect to validate accuracy on your first 50 videos before trusting it unsupervised. If false-positive rates stay under 8%, you've just automated 70% of your pre-publish workload for under $10/week.
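Once the review output is pinned to a JSON schema, the moderation queue reduces to a filter over confidence scores. A sketch, where the `{"flags": [...]}` shape is an assumed schema you would define in your own prompt, not a format the model guarantees:

```python
import json

def flags_needing_review(reply: str, threshold: float = 0.6) -> list[dict]:
    """Keep only flags at or above the confidence threshold.

    Assumes the prompt instructed the model to return
    {"flags": [{"category", "timestamp", "confidence"}, ...]} -- an
    illustrative schema, not a documented model output format.
    """
    data = json.loads(reply)
    return [f for f in data.get("flags", []) if f["confidence"] >= threshold]

reply = json.dumps({"flags": [
    {"category": "ip_risk", "timestamp": "00:42", "confidence": 0.83},
    {"category": "off_topic", "timestamp": "01:10", "confidence": 0.31},
]})
print(flags_needing_review(reply))  # only the ip_risk flag survives the 0.6 cutoff
```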

Agency client call summarization

When Flash Lite turns 12 weekly calls into structured Notion entries

A 6-person agency runs 12 client calls per week and needs meeting notes, action items, and project updates in Notion within an hour of each call. Flash Lite's audio input support means you upload the recording directly—no Whisper API detour. The 1M token context window handles 90-minute calls with full transcripts and your Notion template in a single prompt. At $0.25 input and $1.50 output per Mtok, each summary costs $0.08-0.12 depending on call length. Output tokens are higher here because you're generating formatted markdown, but you're still under $7/month for the full workload. If your calls involve dense technical jargon or require verbatim quote extraction, test accuracy on 5 calls first—Flash Lite trades some precision for speed and cost. For standard client updates and task tracking, it clears the bar.
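The per-summary cost here is dominated by how the provider tokenizes audio, which varies by model. A sketch with the tokens-per-second rate as an explicit assumption (60 tok/s is illustrative only, not a published Gemini figure; check your provider's token accounting before budgeting):

```python
IN_PRICE, OUT_PRICE = 0.25, 1.50  # $ per 1M tokens, from the pricing table above

# ASSUMPTION: audio tokenization rate. Real rates differ by provider/model.
AUDIO_TOK_PER_SEC = 60

def summary_cost(call_minutes: int, output_tokens: int) -> float:
    """Rough per-call cost: audio input plus formatted markdown output."""
    input_tokens = call_minutes * 60 * AUDIO_TOK_PER_SEC
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1e6

# A 90-minute call producing a ~3k-token formatted summary
print(round(summary_cost(90, 3_000), 3))
```

Under that assumed rate a 90-minute call lands inside the $0.08-0.12 range quoted above; if your provider tokenizes audio more cheaply, the transcript-length input becomes the dominant term instead.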

Frequently asked

Is Gemini 3.1 Flash Lite good for high-volume text generation?

Yes, at $0.25 input and $1.50 output per million tokens, it's positioned as Google's budget option for bulk tasks. The 1M token context window handles long documents well. If you're processing thousands of requests daily where speed matters more than absolute quality, this is the cheapest Gemini option. For critical reasoning tasks, step up to standard Flash.

Is Gemini 3.1 Flash Lite cheaper than GPT-4o Mini?

Not on list price. Flash Lite costs $0.25 input versus GPT-4o Mini's $0.15, and $1.50 output versus Mini's $0.60, so Mini is cheaper per token on both sides. Flash Lite makes sense when you're feeding massive context into its 1M token window, or when you need the native video and audio support Mini lacks.
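The comparison is plain arithmetic over the list prices quoted above; for example, an input-heavy 10M-in / 1M-out monthly workload:

```python
def cost(in_tok: float, out_tok: float, in_price: float, out_price: float) -> float:
    """Dollar cost for a workload; prices are $ per 1M tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1e6

# 10M input / 1M output per month, using the list prices quoted above
flash_lite = cost(10e6, 1e6, 0.25, 1.50)  # Gemini 3.1 Flash Lite
mini = cost(10e6, 1e6, 0.15, 0.60)        # GPT-4o Mini
print(f"{flash_lite:.2f} vs {mini:.2f}")  # → 4.00 vs 2.10
```

Swapping in your own token mix is the only honest way to settle the question for your workload; on these list prices alone, Mini stays cheaper at every mix.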

Can Gemini 3.1 Flash Lite handle video and audio inputs?

Yes, it supports text, image, video, file, and audio modalities natively. This makes it useful for transcription, video analysis, or document parsing workflows where you'd otherwise chain multiple services. The quality on complex video understanding isn't benchmarked publicly yet, so test your specific use case before committing to production scale.

How does Flash Lite compare to standard Gemini 3.1 Flash?

Flash Lite trades reasoning quality for cost. Google hasn't published benchmarks, but the pricing gap suggests Lite is 3-5x cheaper and likely scores 10-15 points lower on MMLU-style tests. Use Lite for classification, extraction, and summarization where errors are recoverable. Use standard Flash when accuracy matters more than cost, like code generation or medical text analysis.

Should I use Flash Lite for customer-facing chatbots?

Only if you're willing to accept more hallucinations and weaker reasoning. The lack of public benchmarks means you're flying blind on quality. For internal tools or first-pass filtering where a human reviews output, it's fine. For customer support where wrong answers damage trust, pay the extra $1-2 per million tokens for standard Flash or Claude Haiku instead.

Data last verified 22 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.