LiquidAI: LFM2.5-1.2B-Instruct (free)

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Anyone in the Space can @-mention LiquidAI: LFM2.5-1.2B-Instruct (free) with the team's shared context - pooled credits, one chat, one memory.

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

LiquidAI's 1.2B-parameter model trades raw capability for speed and zero cost. At this size, expect basic instruction-following and simple reasoning, but don't lean on it for nuanced analysis or complex multi-step tasks. The 32K context window is generous for a model this small. It's best deployed where latency and budget matter more than output sophistication: rapid prototyping, high-volume simple tasks, or educational experimentation where free inference removes friction.

Best for

Rapid prototyping without API costs
High-volume simple classification tasks
Educational experimentation and learning
Latency-sensitive lightweight applications
Basic text summarization under 32K tokens

Strengths

Zero-cost inference removes budget constraints for experimentation and high-throughput workloads. The 32K context window is generous for a sub-2B model, enabling longer document processing without chunking. The small parameter count translates to fast response times, which helps when you need hundreds of inferences per minute. Liquid's architecture optimizations may make it more efficient than conventional small models of similar size, though public benchmarks aren't yet available to confirm how it performs against peers like Phi or Gemma.

Trade-offs

At 1.2B parameters, this model will struggle with complex reasoning, nuanced tone control, and domain-specific knowledge. Expect higher error rates on math, code generation, and multi-step logic than you'd see from 7B+ models. The proprietary license limits transparency into training data and fine-tuning options. Without public benchmark scores, you have no baseline against established small models, so validate performance on your specific tasks before committing to production use cases.
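
One way to act on that advice is to score a small labeled evaluation set before any production rollout. The sketch below is a minimal, hedged example: `call_model` is a hypothetical placeholder for however your team queries the model (a Space, an API client, a local runtime), and exact-match scoring after trimming and lowercasing assumes single-label answers like the classification prompt further down this page.

```python
from collections.abc import Callable, Iterable


def accuracy_on_labeled_set(
    call_model: Callable[[str], str],
    examples: Iterable[tuple[str, str]],
) -> float:
    """Fraction of (prompt, expected_label) pairs the model answers exactly.

    `call_model` is a hypothetical stand-in for your real model call.
    Answers and labels are compared after stripping whitespace and lowercasing.
    """
    total = 0
    correct = 0
    for prompt, expected in examples:
        total += 1
        answer = call_model(prompt).strip().lower()
        if answer == expected.strip().lower():
            correct += 1
    return correct / total if total else 0.0


def fake_model(prompt: str) -> str:
    # Stub that always answers "Positive"; swap in a real model call.
    return "Positive"


examples = [("Great service!", "positive"), ("Never again.", "negative")]
print(accuracy_on_labeled_set(fake_model, examples))  # 0.5
```

Running the same set against a larger model gives you the comparison this page can't yet provide: the gap between the two scores is the number that decides whether the free model is good enough.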

Specifications

Provider: liquid
Category: llm
Context length: 32,768 tokens
Max output: —
Modalities: text
License: proprietary
Released: 2026-01-20

Pricing

Input: $0.00/Mtok
Output: $0.00/Mtok
Model ID: liquid/lfm-2.5-1.2b-instruct:free

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people · Messages per seat per day: 80 · Average turn size: 2k tokens · Output share: 30%

Estimated monthly spend

Free (no token cost)

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool.
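
The calculator's figure can be reproduced by hand: 5 seats × 80 messages × 2k tokens is 800k tokens per day, and 17.6M tokens per month corresponds to 22 days. That 22-day month is our inference from the numbers, not a documented Switchy formula, and the output share only matters once per-token prices are non-zero. A minimal sketch:

```python
def monthly_token_estimate(
    seats: int,
    messages_per_seat_per_day: int,
    tokens_per_turn: int,
    output_share: float,
    days_per_month: int = 22,  # assumption inferred from the 17.6M figure
    input_price_per_mtok: float = 0.0,
    output_price_per_mtok: float = 0.0,
) -> dict[str, float]:
    """Estimate monthly tokens and upstream cost for a team."""
    total = seats * messages_per_seat_per_day * tokens_per_turn * days_per_month
    output_tokens = total * output_share
    input_tokens = total - output_tokens
    cost = (input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok) / 1_000_000
    return {
        "total_tokens": total,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost,
    }


est = monthly_token_estimate(seats=5, messages_per_seat_per_day=80, tokens_per_turn=2_000, output_share=0.30)
print(est["total_tokens"])  # 17600000
print(est["cost_usd"])      # 0.0
```

Plugging in a paid model's input and output prices shows what the same usage would cost upstream; for this model both prices are $0.00/Mtok, so the estimate stays at zero.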

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
liquid	33k	$0.00/Mtok	$0.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Classify Customer Feedback

Classify this customer feedback as positive, negative, or neutral. Respond with only one word: [feedback text here]
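
Small models asked for "only one word" sometimes add punctuation, capitalization, or a trailing explanation anyway. A minimal post-processing guard, assuming the three labels from the prompt above; returning `None` so the item can be retried or routed to a person is a design choice for this sketch, not platform behavior:

```python
import re
from typing import Optional

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def normalize_label(raw: str) -> Optional[str]:
    """Map a raw model reply to one of ALLOWED_LABELS, or None if it doesn't fit."""
    match = re.search(r"[A-Za-z]+", raw)  # first alphabetic word; skips punctuation/whitespace
    if match is None:
        return None
    word = match.group(0).lower()
    return word if word in ALLOWED_LABELS else None


print(normalize_label("Positive."))     # positive
print(normalize_label("  NEUTRAL\n"))   # neutral
print(normalize_label("Mixed"))         # None
```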

Extract Key Facts

Extract the company name, date, and dollar amount from this invoice text. Format as JSON: [invoice text here]
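
The prompt asks for JSON but doesn't fix the key names, and a small model may wrap the object in prose or drop a field. This sketch assumes the hypothetical keys `company_name`, `date`, and `amount`; it takes the span from the first `{` to the last `}`, parses it with the standard `json` module, and rejects replies that fail to parse or miss a key. It will misfire if the surrounding prose itself contains braces.

```python
import json
from typing import Any, Optional

# Hypothetical key names; the prompt itself does not specify them.
REQUIRED_KEYS = ("company_name", "date", "amount")


def parse_invoice_reply(raw: str) -> Optional[dict[str, Any]]:
    """Extract the first {...} span from a reply and check it has every required key."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the reply
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None  # malformed JSON: retry or send for review
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        return None
    return data


# Made-up reply showing typical extra prose around the object.
reply = 'Sure! {"company_name": "Acme", "date": "2025-03-01", "amount": "$1,200.00"} Hope that helps.'
print(parse_invoice_reply(reply))
```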

Rewrite for Clarity

Rewrite this sentence in simpler language suitable for a general audience: [complex sentence here]

Generate Simple Summaries

Summarize the main points of this article in three bullet points: [article text here]

Prototype Chatbot Responses

You are a helpful assistant for a gardening store. A customer asks: 'Do you sell organic fertilizer?' Respond briefly and helpfully.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Explain how JavaScript's event loop works to a developer who understands synchronous code but hasn't worked with async patterns yet.

Output

The event loop is JavaScript's way of handling multiple tasks without blocking. Think of it as a queue manager: your synchronous code runs first (the call stack), then the event loop checks for completed async operations—like setTimeout callbacks or resolved promises—and moves them back to the call stack when it's empty. This is why setTimeout(fn, 0) doesn't run immediately; it waits for the current execution context to finish. The key insight: JavaScript is single-threaded, but the event loop creates the illusion of concurrency by efficiently scheduling when different pieces of code execute.

Notes

This example shows the model's ability to build conceptual bridges for technical explanations. At 1.2B parameters, it produces clear pedagogical content without the depth of reasoning you'd get from larger models—notice it doesn't cover microtasks vs macrotasks, which a 70B+ model would likely include.

Prompt

Write a Python function that finds the longest common subsequence between two strings, with inline comments explaining the dynamic programming approach.

Output

```python
def lcs(s1, s2):
    m, n = len(s1), len(s2)
    # Create DP table: dp[i][j] = LCS length for s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                # Characters match: extend previous LCS
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                # Take max of excluding either character
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]
```

Notes

Demonstrates competent code generation with educational annotations for a textbook algorithm. Context length isn't a factor for a prompt this short; the limit that matters is the model's small parameter count, which makes it better suited to standard implementations than to novel algorithmic optimizations or complex refactoring.
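
The sample output fills the DP table correctly but returns only the length of the longest common subsequence, although the prompt asked for the subsequence itself. A sketch of the usual fix reuses the same table and walks back from `dp[m][n]`, taking a character whenever the two strings match and otherwise moving toward the larger neighboring cell. It runs in O(mn) time and memory and returns one LCS when several exist.

```python
def lcs_string(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2."""
    m, n = len(s1), len(s2)
    # Same table as the sample: dp[i][j] = LCS length for s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack from the bottom-right cell, collecting matches in reverse order.
    chars: list[str] = []
    i, j = m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # dropping s1[i-1] keeps the LCS length
        else:
            j -= 1  # dropping s2[j-1] keeps the LCS length
    return "".join(reversed(chars))


print(lcs_string("AGGTAB", "GXTXAYB"))  # GTAB
```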

Prompt

Summarize this meeting transcript into action items with owners: 'Sarah mentioned the Q3 roadmap needs finalization by Friday. Tom said he'd review the API docs this week. The design team asked for feedback on mockups by Wednesday.'

Output

**Action Items:**

1. **Finalize Q3 roadmap** – Owner: Sarah – Deadline: Friday
2. **Review API documentation** – Owner: Tom – Deadline: End of week
3. **Provide feedback on design mockups** – Owner: [Team/unspecified] – Deadline: Wednesday

Note: The design feedback owner wasn't explicitly assigned in the transcript.

Notes

Shows practical information extraction from conversational text. The model correctly identifies tasks, owners, and deadlines while flagging an ambiguity. At this size, it handles structured extraction well but may miss nuanced implications in longer, more complex meeting discussions where larger models excel.

Use-case deep-dives

Prototype chatbot validation

Free iteration cycles for early-stage product teams testing conversational flows

A 4-person startup building a customer support bot needs to run 200+ test conversations daily across different prompt templates before committing to a paid model. LFM2.5-1.2B-Instruct at $0.00/Mtok lets you burn through those iterations without watching a meter. The 32k context window handles multi-turn dialogues and knowledge base chunks for RAG experiments. You'll hit quality ceilings on complex reasoning or domain-specific jargon—expect to rewrite 15-20% of responses manually during QA. But for validating information architecture, testing edge cases, and proving conversational patterns work before you scale, free inference removes the friction tax. Once your flows stabilize and you're processing real customer volume, migrate the proven prompts to a production-grade model and keep this one in your dev environment.
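
For multi-turn RAG tests, the whole conversation plus retrieved chunks still has to fit in the 32,768-token window. A minimal sketch that keeps the newest turns that fit; the 4-characters-per-token estimate and the 1,024-token reply reserve are assumptions rather than Liquid figures, so swap in the model's real tokenizer if you have it. Given the FAQ's warning about coherence past 8-10k tokens, passing a smaller `limit` may be worth testing.

```python
from collections.abc import Callable

CONTEXT_LIMIT = 32_768  # tokens, from the Specifications section


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def trim_history(
    system_prompt: str,
    turns: list[str],
    reserve_for_output: int = 1_024,  # assumed budget for the model's reply
    limit: int = CONTEXT_LIMIT,
    count: Callable[[str], int] = rough_token_count,
) -> list[str]:
    """Keep the newest turns that fit with the system prompt and an output reserve."""
    budget = limit - reserve_for_output - count(system_prompt)
    kept: list[str] = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count(turn)
        if cost > budget:
            break  # stop at the first turn that doesn't fit, keeping history contiguous
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))  # restore chronological order
```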

High-frequency content moderation

When zero cost justifies a 1.2B parameter model for comment filtering at scale

A community platform with 80k daily comments needs first-pass toxicity screening before human review. At free pricing, LFM2.5-1.2B-Instruct processes those 80k checks for $0, versus $240/day on a typical $3/Mtok model (assuming 1k tokens per check, or 80M tokens/day). The smaller parameter count means faster inference, which is critical when you're flagging content in near-real-time. Because a small model is less reliable, you'll route more borderline content to humans than you would with a larger model, sending maybe 12% to human review instead of 8%. That is roughly 3,200 extra reviews per day, so check that the $240/day saving covers that added moderation labor at your reviewers' actual cost. The 32k window isn't load-bearing here since each comment is evaluated independently. If your false-negative rate climbs above 5% and brand safety becomes the blocker, pay for a frontier model. Until that threshold, free inference can win the unit economics.
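
The arithmetic behind this trade-off fits in a few lines. The inputs below are the paragraph's own assumptions (80k comments, 1k tokens per check, $3/Mtok, 12% versus 8% sent to review); the per-review break-even is derived from them, not measured.

```python
def moderation_tradeoff(
    comments_per_day: int,
    tokens_per_check: int,
    paid_price_per_mtok: float,
    free_review_rate: float,
    paid_review_rate: float,
) -> tuple[float, float, float]:
    """Return (API cost avoided per day, extra human reviews per day, break-even cost per review)."""
    tokens_per_day = comments_per_day * tokens_per_check
    api_cost_avoided = tokens_per_day / 1_000_000 * paid_price_per_mtok
    extra_reviews = comments_per_day * (free_review_rate - paid_review_rate)
    # If the free model sends no extra items to humans, any saving is pure gain.
    break_even = api_cost_avoided / extra_reviews if extra_reviews > 0 else float("inf")
    return api_cost_avoided, extra_reviews, break_even


saved, extra, per_review = moderation_tradeoff(80_000, 1_000, 3.0, 0.12, 0.08)
print(f"${saved:.0f}/day saved, {extra:.0f} extra reviews/day, break-even ${per_review:.3f}/review")
```

At these inputs the break-even is about $0.075 per extra review; if a human review costs more than that, the extra reviews outweigh the saved API spend.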

Batch document summarization

Overnight processing of meeting transcripts when speed and cost beat nuance

A 12-person consulting firm records 40 client calls weekly and needs rough summaries in Notion by morning standup. LFM2.5-1.2B-Instruct running overnight batch jobs costs nothing and the 32k context fits most hour-long transcripts without chunking. Summaries will miss subtle client sentiment and occasionally garble technical terms—plan on account managers skimming the output and fixing 20% of the entries. But the alternative is paying $80/month for a smarter model or having someone manually summarize for 6 hours weekly. The free model shifts that 6 hours to 90 minutes of cleanup work, and the overnight latency doesn't matter when no one reads these until 9am anyway. If clients start requesting these summaries directly, upgrade to a model with stronger instruction-following and tighter factual accuracy.

Frequently asked

Is LFM2.5-1.2B-Instruct good for coding tasks?

Not really. At 1.2 billion parameters, this model is too small for reliable code generation or debugging. It might handle trivial snippets, textbook algorithms, or syntax questions, but you'll hit accuracy walls fast. For coding, use Qwen 2.5 Coder 7B or Claude 3.5 Sonnet instead; they're built for it and are far less likely to hallucinate imports or logic.

Is LFM2.5-1.2B-Instruct actually free to use?

Yes, it's $0.00 per million tokens in and out. That makes it the cheapest option on Switchy for throwaway tasks like classification, simple rewrites, or prototyping prompts before you scale to a paid model. Just don't expect GPT-4 quality: you're trading capability for zero cost.

Can it handle 32k token contexts reliably?

The 32,768-token window is there, but a 1.2B-parameter model will struggle to reason across that much text. Expect coherence to degrade past roughly 8-10k tokens. If you need full-context summarization or long-document Q&A, use Gemini 1.5 Flash or Claude 3 Haiku, which are built for much longer contexts.
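
If you stay with LFM2.5 for long transcripts, one common workaround is map-then-merge summarization: split the text into overlapping windows, summarize each, then summarize the partial summaries. In this sketch, 6,000 words per window is meant to land near 8k tokens under a rough 0.75-words-per-token rule of thumb (an assumption, not the LFM tokenizer), and `summarize` is a placeholder for your model call.

```python
from collections.abc import Callable


def chunk_words(text: str, max_words: int = 6_000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most max_words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 6_000) -> str:
    """Map-then-merge: summarize each window, then summarize the joined partials."""
    chunks = chunk_words(text, max_words)
    if len(chunks) <= 1:
        return summarize(text)
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partials))
```

The overlap keeps sentences that straddle a boundary visible to both windows; 200 words is an arbitrary starting point to tune on your own transcripts.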

How does LFM2.5-1.2B compare to Gemini 2.0 Flash?

Gemini 2.0 Flash is vastly more capable, with better reasoning, multimodal support, and proven benchmarks. LFM2.5 is free but limited to basic text tasks. Use LFM2.5 for zero-budget experiments or high-volume classification where occasional errors are acceptable. Use Gemini 2.0 Flash when you need stronger reasoning and accuracy.

Should I use this model for customer-facing chatbots?

No. A 1.2B model will produce awkward phrasing, miss nuance, and fail on anything beyond FAQ-style responses. For customer chat, you need GPT-4o mini, Claude 3.5 Haiku, or Gemini Flash: larger models with stronger conversational tuning and safety guardrails. Keep LFM2.5 for internal tooling and batch jobs.