
LiquidAI: LFM2.5-1.2B-Thinking (free)

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Anyone in the Space can @-mention LiquidAI: LFM2.5-1.2B-Thinking (free) with the team's shared context - pooled credits, one chat, one memory.


Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

LiquidAI's 1.2B-parameter thinking model trades raw capability for zero cost and fast inference. It exposes chain-of-thought reasoning in real time, making it useful for educational demos or prototyping reasoning workflows where you need to see the model's work. At this size, expect weaker performance than frontier models on complex tasks, but the free tier and compact footprint make it a solid sandbox for testing reasoning patterns before scaling to larger models.

Best for

  • Prototyping chain-of-thought workflows
  • Educational demos of reasoning processes
  • Cost-free experimentation with small models
  • Fast inference on simple reasoning tasks

Strengths

The thinking mode surfaces intermediate reasoning steps, letting you audit how the model arrives at answers. At 1.2B parameters, inference is fast and the model runs efficiently on modest hardware. The zero-cost tier removes friction for experimentation. The 32K context window is generous for a model this size, supporting multi-turn conversations or moderate document analysis without truncation.

Trade-offs

A 1.2B parameter model will underperform GPT-4, Claude, or even mid-tier models like Gemini Flash on complex reasoning, coding, or nuanced language tasks. Without public benchmarks, you're flying blind on comparative accuracy. The thinking output adds latency and token overhead, which may not justify the transparency gain in production use cases. Proprietary licensing limits deployment flexibility compared to open-weight alternatives like Qwen or Phi.

Specifications

Provider: liquid
Category: llm
Context length: 32,768 tokens
Max output: (not listed)
Modalities: text
License: proprietary
Released: 2026-01-20

Pricing

Input: $0.00/Mtok
Output: $0.00/Mtok
Model ID: liquid/lfm-2.5-1.2b-thinking:free

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend: Free (no token cost)
Volume: 17.6M tokens / month
Team: 5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
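The calculator's arithmetic can be sketched as follows. The page does not say how it derives 17.6M tokens from 5 seats and 80 messages per day, so the 22 working days per month and 2,000 tokens per message used here are assumptions: one combination that happens to reproduce the figure, not Switchy's documented formula. The function names are hypothetical.

```python
def monthly_tokens(seats: int, msgs_per_day: int,
                   days_per_month: int, tokens_per_msg: int) -> int:
    """Total tokens a team sends and receives in one month."""
    return seats * msgs_per_day * days_per_month * tokens_per_msg


def monthly_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Upstream token cost, with the price quoted per million tokens."""
    return tokens / 1_000_000 * price_per_mtok


# Assumed inputs: 22 working days and 2,000 tokens per message (not stated on the page).
tokens = monthly_tokens(seats=5, msgs_per_day=80, days_per_month=22, tokens_per_msg=2_000)
cost = monthly_cost_usd(tokens, price_per_mtok=0.00)
print(f"{tokens / 1_000_000:.1f}M tokens / month, ${cost:.2f}")
```

Swapping in a non-zero price per Mtok shows what the same traffic would cost on a paid model.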

Providers

| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| liquid | 33k | $0.00/Mtok | $0.00/Mtok | - | - | - |

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Debug Reasoning Steps

Solve this logic puzzle step-by-step, showing your reasoning at each stage: Three friends each have a different pet (cat, dog, bird). Alex doesn't have the dog. The person with the bird sits next to Bailey. Who has which pet?

Explain Math Concepts

Explain how to calculate the area of a trapezoid with bases 8 cm and 12 cm and height 5 cm. Show each step of your reasoning and why each formula applies.

Prototype Decision Trees

I need to decide whether to buy or lease a car. Walk through the key factors I should consider, showing how you weigh each one against the others.

Draft Simple Code

Write a Python function that takes a list of numbers and returns only the even ones. Show your thinking as you plan the logic.

Summarize Short Articles

Summarize this 300-word article about renewable energy trends. Show which points you consider most important as you decide what to include.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Walk me through why this Python function might be slow: `def find_duplicates(items): return [x for x in items if items.count(x) > 1]`

Output

The bottleneck is the nested iteration hidden in `items.count(x)`. For each element in the list comprehension, `count()` scans the entire list — that's O(n) per element, making the whole function O(n²). With 10,000 items, you're doing 100 million comparisons. A set-based approach would drop this to O(n): `seen = set(); dupes = set(); [dupes.add(x) if x in seen else seen.add(x) for x in items]`. The original is readable but doesn't scale past a few hundred elements.

Notes

This example highlights the 'thinking' aspect — the model breaks down algorithmic complexity step-by-step rather than just offering a fix. The 32K context window supports pasting entire modules for this kind of analysis, though the 1.2B parameter count means explanations stay grounded rather than exhaustively academic.
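For reference, here is a minimal sketch of a linear-time version that keeps the original function's return value (every occurrence of a duplicated item, in input order). It uses `collections.Counter` rather than the set-based one-liner in the illustrative output, and it assumes the items are hashable.

```python
from collections import Counter


def find_duplicates_slow(items):
    # Original version: list.count() rescans the list for every element, O(n^2).
    return [x for x in items if items.count(x) > 1]


def find_duplicates(items):
    # One pass to count occurrences, one pass to filter: O(n) overall.
    counts = Counter(items)
    return [x for x in items if counts[x] > 1]


sample = [1, 2, 2, 3, 1]
# Both versions return the same list for the same input.
print(find_duplicates(sample) == find_duplicates_slow(sample))
```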

Prompt

I need to explain to a non-technical stakeholder why we can't just 'add more servers' to fix our database query times. Help me frame this.

Output

Tell them the database is like a filing cabinet, not a copy machine. Adding servers means more people can *ask* for files simultaneously, but if each request still requires flipping through the same disorganized cabinet, you've only parallelized the waiting. The real fix is indexing — reorganizing the cabinet so lookups are instant. That's a one-time engineering cost that scales permanently, versus renting more servers monthly to work around the underlying inefficiency. Frame it as: solve the root cause now, or pay the workaround tax forever.

Notes

Demonstrates the model's ability to translate technical constraints into business logic without jargon. The smaller parameter count keeps analogies concrete rather than overwrought. At zero cost per token, this makes LFM2.5-1.2B viable for high-volume explanation tasks where GPT-4-class depth isn't required.

Prompt

Generate a Git commit message for changes that refactor user authentication to use JWT tokens instead of session cookies, including migration logic for existing sessions.

Output

```
Refactor: migrate authentication from session cookies to JWT

- Replace Express session middleware with jsonwebtoken
- Add /auth/migrate endpoint to convert active sessions
- Update login/logout to issue/revoke JWTs in Authorization header
- Preserve session expiry logic (7-day default) in token claims
- Add backward-compat layer: accept cookies for 30 days post-deploy

Breaking: clients must handle Authorization header after migration window
```

Notes

Shows structured output generation with technical specificity. The model captures both the 'what' and the 'why' of a complex change, including deployment considerations. The free pricing makes this practical for CI/CD automation where commit message quality matters but budget for LLM calls is constrained.

Use-case deep-dives

Prototyping chatbot logic

When free inference beats paid models for early-stage bot work

A 4-person startup building a customer support chatbot needs to iterate on prompt templates and conversation flows without burning through API credits. LFM2.5-1.2B-Thinking handles this at $0.00/Mtok with a 32K context window, enough to hold 15-20 conversation turns plus system instructions. At 1.2B parameters, responses should arrive quickly enough for live testing in Slack or Discord, though this directory has no measured latency for the model yet. You'll outgrow it once you need nuanced reasoning or domain expertise (switch to GPT-4o-mini at that point), but for mapping out dialog trees and testing edge cases across 500+ iterations in a week, free inference removes the budget friction. Use this model until your bot handles real customer load or you need benchmark-proven accuracy on complex queries.
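Keeping a conversation inside the 32K window usually means dropping the oldest turns first. The sketch below shows one way to do that. The 0.75 words-per-token ratio, the 4,000-token reply reserve, and the `trim_history` helper are all assumptions for illustration; a real tokenizer will give different counts.

```python
import math

CONTEXT_LIMIT = 32_768  # context length listed for this model


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): about 0.75 English words per token.
    return math.ceil(len(text.split()) / 0.75)


def trim_history(system_prompt: str, turns: list[dict],
                 limit: int = CONTEXT_LIMIT, reserve_for_reply: int = 4_000) -> list[dict]:
    """Keep the most recent turns that fit beside the system prompt."""
    budget = limit - reserve_for_reply - estimate_tokens(system_prompt)
    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn["content"])
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Reserving room for the reply matters more than usual here, because the thinking output consumes tokens before the final answer appears.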

Batch text classification

Why a small thinking model works for high-volume tagging jobs

A 10-person e-commerce team needs to classify 50,000 product reviews into sentiment buckets (positive/negative/neutral) and flag policy violations. LFM2.5-1.2B-Thinking runs this job at zero marginal cost, and the 32K window lets you batch 40-60 reviews per request with few-shot examples. The "thinking" architecture suggests it allocates extra compute to reasoning steps, which helps with ambiguous cases where sentiment isn't surface-level. You won't match GPT-4's accuracy on sarcasm or multilingual edge cases, but for English reviews with clear signals, a small model at $0.00 avoids paying $0.15/Mtok to process 50M tokens. Run a 500-review validation set first; if accuracy clears 92%, batch the full dataset and save about $7.50 in API costs (50 Mtok × $0.15/Mtok).
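The workflow described here (batch the reviews, gate on a validation set, then estimate the avoided spend) can be sketched without any model call. The helper names are hypothetical, and the batch size of 50 is just a midpoint of the 40-60 range quoted above.

```python
from typing import Iterator, Sequence


def batched(reviews: Sequence[str], batch_size: int = 50) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size reviews."""
    for start in range(0, len(reviews), batch_size):
        yield reviews[start:start + batch_size]


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of labels that match the hand-labelled validation set."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def api_cost_usd(total_tokens: int, price_per_mtok: float) -> float:
    """Cost of the same job on a paid model priced per million tokens."""
    return total_tokens / 1_000_000 * price_per_mtok


def should_run_full_batch(predicted, expected, threshold: float = 0.92) -> bool:
    # Gate from the text: only batch the full dataset if validation clears 92%.
    return accuracy(predicted, expected) >= threshold
```

With 50,000 reviews and a batch size of 50, this produces 1,000 requests, and `api_cost_usd(50_000_000, 0.15)` gives the roughly $7.50 comparison figure.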

Internal documentation Q&A

When to use a free model for low-stakes knowledge retrieval

A 12-person agency wants a Slack bot that answers questions about their internal wiki (onboarding docs, client templates, process guides). LFM2.5-1.2B-Thinking fits the entire knowledge base into its 32K context window and returns answers at no cost per query. The team runs 200-300 questions per week—mostly straightforward lookups like "What's our NDA template?" or "How do we invoice retainer clients?" At $0.00/Mtok, you avoid the $40-60/month spend you'd rack up with a paid model, and the thinking layer helps when questions require light inference ("Which template applies to EU clients?"). This breaks down if your docs exceed 25K tokens or if you need citations with page numbers—then you need a RAG setup with a stronger model. For simple internal Q&A under 300 queries/week, free inference is the right call.

Frequently asked

Is LFM2.5-1.2B-Thinking good for reasoning tasks?

Yes, but with caveats. The "Thinking" label suggests chain-of-thought capability, which helps with multi-step logic. At 1.2B parameters it won't match GPT-4 or Claude on complex reasoning, but for lightweight inference where you need transparent step-by-step outputs—like simple math, basic code logic, or structured decision trees—it punches above its weight class. Free pricing makes experimentation risk-free.

Is LFM2.5-1.2B-Thinking actually free to use?

Yes. $0.00 per million tokens input and output means no usage costs. This makes it ideal for prototyping, high-volume batch jobs, or student projects where budget is zero. The trade-off is capability: you're getting a 1.2B model, not a frontier system. For production apps needing nuanced understanding, you'll eventually pay for something bigger.

Can it handle 32k token context windows reliably?

The 32,768-token window is standard for modern small models and holds roughly 24,000 words of English text. That's enough for a short technical document or a 15-page report. However, small models like this often degrade in quality on long inputs, commonly past roughly 16k tokens, and no long-context benchmarks are published for this model. As a rule of thumb, use the full window for retrieval tasks, but keep critical reasoning prompts under 8k tokens for best results.
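A rough way to apply these thresholds before sending a prompt is sketched below. The 0.75 words-per-token ratio follows from the "32,768 tokens is roughly 24,000 words" estimate above and is only a heuristic. Treating "8k" as 8,000 tokens is an assumption, and the tier names are hypothetical.

```python
import math

CONTEXT_LIMIT = 32_768   # context length listed for this model
REASONING_LIMIT = 8_000  # "under 8k tokens" rule of thumb (assumed to mean 8,000)


def estimate_tokens(text: str) -> int:
    # Heuristic only: about 0.75 English words per token.
    return math.ceil(len(text.split()) / 0.75)


def prompt_tier(token_count: int) -> str:
    """Classify a prompt by the rule-of-thumb limits in the answer above."""
    if token_count <= REASONING_LIMIT:
        return "ok-for-reasoning"
    if token_count <= CONTEXT_LIMIT:
        return "retrieval-only"
    return "too-long"


doc = "word " * 24_000
print(estimate_tokens(doc), prompt_tier(estimate_tokens(doc)))
```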

How does LFM2.5 compare to Llama 3.2 1B?

Without public benchmarks we can't give hard numbers, but both are 1B-class models targeting edge deployment. Llama 3.2 1B has Meta's training data advantage and published MMLU scores around 49%. LiquidAI's "Thinking" branding suggests explicit chain-of-thought tuning, which Llama lacks by default. If you need reasoning transparency, try LFM2.5 first. For general instruction-following, Llama is the safer bet until LiquidAI publishes evals.

Should I use this for production chatbots?

Only for low-stakes use cases. A 1.2B model will produce coherent responses for FAQ bots, simple customer triage, or internal tools where mistakes are tolerable. It will hallucinate more than GPT-4, miss nuance in complex queries, and struggle with multi-turn context. The free pricing is tempting, but if your chatbot's errors cost money or reputation, pay for a stronger small model such as Claude 3.5 Haiku instead.

Data last verified 8 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.