LLMgoogle

Google: Gemma 2 27B

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...

Anyone in the Space can @-mention Google: Gemma 2 27B with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Gemma 2 27B is Google's mid-sized open-weights model optimized for on-device and self-hosted deployments where you control the infrastructure. At 27B parameters it punches above its weight class on reasoning and instruction-following, but the 8K context window and identical input/output pricing make it less competitive than cloud alternatives for long documents or high-throughput generation. Reach for this when you need a capable model you can run locally or when data residency requirements rule out API calls.

Best for

Self-hosted deployments with data residency needs
On-device inference on high-end hardware
Instruction-following tasks under 8K tokens
Cost-predictable workloads with balanced I/O
Fine-tuning for domain-specific applications

Strengths

The 27B parameter count delivers strong instruction-following and reasoning performance relative to models in the 20-30B range, making it viable for complex prompts that don't require massive context. Google's training emphasizes safety and alignment, so refusals are rare on legitimate use cases. The open-weights license allows fine-tuning and self-hosting, giving you full control over latency, data handling, and cost structure once you've deployed the infrastructure.

Trade-offs

The 8K context window is restrictive compared to modern cloud models — you'll hit limits quickly on document analysis or multi-turn conversations. Identical input and output pricing at $0.65/Mtok removes the cost advantage of concise prompts, and at that rate you're paying more per token than GPT-4o mini while getting a smaller context and no vision capabilities. Performance on specialized domains like code or math lags behind frontier models, and you'll need significant GPU resources to self-host at reasonable speed.

Specifications

Provider: google
Category: llm
Context length: 8,192 tokens
Max output: 2,048 tokens
Modalities: text
License: proprietary
Released: 2024-07-13

Pricing

Input: $0.65/Mtok
Output: $0.65/Mtok
Model ID: google/gemma-2-27b-it

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

$11.44

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
google	8k	$0.65/Mtok	$0.65/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Structured Data Extraction

Extract the following information from this customer support ticket and return it as JSON with fields: customer_name, issue_category, priority_level, and requested_action. Ticket text: [paste ticket here]

Open in a Space →

Multi-Step Reasoning Task

A warehouse ships 150 boxes on Monday, 20% more on Tuesday, and half of Tuesday's total on Wednesday. If each box weighs 12kg, what is the total weight shipped across all three days? Show your step-by-step calculation.

Open in a Space →

Content Rewriting for Tone

Rewrite this technical product description for a non-technical audience. Keep all key facts but replace jargon with plain language and add analogies where helpful: [paste description here]

Open in a Space →

Classification with Explanation

Classify this user review as positive, negative, or neutral, then explain in one sentence which specific phrases led to your classification: [paste review here]

Open in a Space →

Instruction Expansion

I want to migrate a WordPress site to a static site generator. List the 8-10 key steps I should follow, with one sentence of explanation for each step.

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs. Focus on edge cases and type safety issues: def calculate_discount(price, discount_percent): return price - (price * discount_percent / 100)

Output

The function has several issues: 1. **No input validation**: Negative prices or discount percentages above 100 could produce nonsensical results. A discount_percent of 150 would give a negative final price. 2. **Type assumptions**: The function assumes numeric inputs but doesn't enforce this. Passing strings would cause a TypeError. 3. **Floating-point precision**: For financial calculations, this could accumulate rounding errors. Consider using the `decimal` module for currency. 4. **Missing bounds check**: discount_percent should typically be clamped between 0-100. Recommended fix: Add type hints, validate inputs with `if discount_percent < 0 or discount_percent > 100: raise ValueError()`, and consider returning `Decimal` for monetary precision.

Notes

Gemma 2 27B excels at structured code review with practical recommendations. The 8K context window handles typical function-level analysis comfortably, though it may struggle with reviewing entire codebases or deeply nested call chains that exceed this limit.

Prompt

Explain the CAP theorem to a junior developer who understands databases but hasn't worked with distributed systems. Use a real-world analogy.

Output

Think of CAP like running a chain of coffee shops: **Consistency** means every location shows the same loyalty points balance instantly. If you earn points at one shop, all shops see it immediately. **Availability** means every shop can always process transactions, even if the network connecting them goes down. **Partition tolerance** means the system keeps working when shops can't communicate with each other. CAP theorem says you can only guarantee two of these three. Most distributed databases choose partition tolerance (networks fail) plus either consistency (banks, inventory) or availability (social media feeds, caches). You can't have all three because if shops can't talk to each other AND must stay available, they can't guarantee they're showing the same data.

Notes

The model produces clear technical explanations with appropriate analogies for the audience level specified. At $0.65 per million tokens for both input and output, it's cost-effective for documentation and educational content, though the 8K window may require chunking for long-form technical writing.

Prompt

Generate a SQL query to find customers who made purchases in Q1 2024 but not in Q2 2024, including their total Q1 spend. Tables: customers(id, name), orders(id, customer_id, order_date, amount).

Output

```sql SELECT c.id, c.name, SUM(o.amount) AS q1_total_spend FROM customers c INNER JOIN orders o ON c.id = o.customer_id WHERE o.order_date >= '2024-01-01' AND o.order_date < '2024-04-01' AND c.id NOT IN ( SELECT DISTINCT customer_id FROM orders WHERE order_date >= '2024-04-01' AND order_date < '2024-07-01' ) GROUP BY c.id, c.name ORDER BY q1_total_spend DESC; ``` This uses a subquery to exclude customers with Q2 orders, then aggregates Q1 spending for the remaining customers.

Notes

Gemma 2 27B generates syntactically correct SQL with appropriate joins and date filtering. The model handles multi-step logic well within its context window. For complex queries spanning dozens of tables or requiring extensive schema context, the 8K token limit becomes a constraint compared to larger-context alternatives.

Use-case deep-dives

Internal documentation Q&A

When Gemma 2 27B fits small-team knowledge bases under budget

A 12-person SaaS startup needs an AI assistant that answers questions from their Notion wiki and Google Docs without burning through API credits. Gemma 2 27B hits the sweet spot here: at $0.65/Mtok both ways, you're looking at roughly $1.30 per million tokens round-trip, which means 500 employee queries averaging 400 tokens each costs about $0.26. The 8,192-token context window covers most internal docs when chunked properly, and the 27B parameter count handles technical jargon and multi-step reasoning better than smaller models. If your knowledge base regularly requires stitching together 15+ documents per answer, you'll hit the context ceiling and should consider a long-context alternative. Otherwise, this model keeps your documentation assistant running at predictable cost while your team stays under 2,000 queries per month.

Batch content moderation

Gemma 2 27B for overnight comment filtering at scale

A community platform with 80,000 daily comments needs to flag policy violations before moderators clock in at 9am. Gemma 2 27B works well in this batch-processing scenario because speed matters less than cost and accuracy. Running 80,000 comments at an average 150 tokens each (input) plus 50-token classification outputs gives you roughly 16M tokens total per day, or about $10.40 in API costs. The model's size handles nuanced policy edge cases—sarcasm, context-dependent slurs, coordinated harassment patterns—better than sub-20B alternatives, and the 8K context window lets you include the previous two comments for thread-aware decisions. If you need real-time moderation under 200ms, this won't work; for overnight batch jobs where you're optimizing cost per accurate flag, Gemma 2 27B delivers.

Customer support email drafting

When Gemma 2 27B handles support volume without enterprise pricing

A 25-person e-commerce company processes 400 support emails daily and wants AI to draft responses for agents to review and send. Gemma 2 27B makes sense here because the context window accommodates a typical email thread (customer message plus order history snippet), and the $0.65/Mtok pricing keeps monthly costs under $200 even at peak volume. Assuming 600 input tokens per draft (email + context) and 300 output tokens per response, you're spending roughly $0.52 per day at 400 emails. The 27B parameter count produces replies that sound less robotic than 7B-class models, reducing the editing burden on your agents. If your email threads regularly exceed 6,000 tokens or you need sub-second latency for live chat, look elsewhere. For async email workflows where agents spend 30 seconds polishing each draft, this model cuts response time in half without the sticker shock of frontier models.

Frequently asked

Is Gemma 2 27B good for general text generation tasks?

Yes, it handles most standard text tasks competently — drafting, summarization, Q&A. The 27B parameter count puts it in the mid-tier range for quality, above smaller models but below flagship 70B+ options. It's a solid workhorse for everyday use where you don't need cutting-edge reasoning.

Is Gemma 2 27B cheaper than GPT-4o or Claude Sonnet?

Significantly cheaper. At $0.65 per Mtok for both input and output, it costs roughly 85-90% less than GPT-4o ($2.50-$10) and Claude Sonnet 4 ($3). If your use case doesn't require frontier-model capabilities, the cost savings are substantial for high-volume workloads.

Can Gemma 2 27B handle an 8K token context window effectively?

The 8,192-token limit is restrictive for document analysis or long conversations. You'll hit the ceiling with medium-length PDFs or chat histories beyond 15-20 exchanges. For short-form tasks it's fine, but plan to chunk or summarize if you're working with longer inputs.

How does Gemma 2 27B compare to Llama 3.1 70B?

Llama 3.1 70B outperforms it on complex reasoning and instruction-following due to the parameter advantage. Gemma 2 27B is faster and cheaper but trades off some accuracy. Choose Gemma for cost-sensitive batch jobs; choose Llama 70B when quality matters more than speed or price.

Should I use Gemma 2 27B for a production chatbot?

Only if budget is the primary constraint and user queries are straightforward. The smaller size means weaker multi-turn coherence and nuance compared to 70B+ models. It works for FAQ bots or simple support flows, but expect more edge-case failures than with larger alternatives.