LiquidAI: LFM2-24B-A2B
LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per...
Verdict
Best for
- Budget-conscious text generation tasks
- Prototyping with 32K context needs
- Teams exploring Liquid architecture benefits
- Cost-sensitive chatbot backends
Strengths
The 24B total parameter count gives the model more capacity than typical 8B models, while its pricing undercuts mid-tier options. The 32K context window handles moderately long documents without chunking. LiquidAI describes LFM2 as a hybrid architecture designed for efficient deployment, which may translate to faster inference than dense transformer models at this scale. The A2B suffix refers to the roughly 2B parameters that are active per token in the Mixture-of-Experts design, so per-token compute is closer to that of a small model than a 24B dense one.
Trade-offs
No public benchmarks means you're flying blind on actual performance versus Gemini Flash, GPT-4o Mini, or Llama alternatives at similar price points. The 32K context is far smaller than the windows offered by Gemini Flash, Claude, or GPT-4-class models. The proprietary license limits deployment flexibility. Because LiquidAI is a newer entrant, ecosystem tooling and community knowledge are sparse compared to established models. You'll need to run your own evals before trusting it with production traffic.
Specifications
- Provider: liquid
- Category: llm
- Context length: 32,768 tokens
- Max output: not listed
- Modalities: text
- License: proprietary
- Released: 2026-02-25
Pricing
- Input: $0.03/Mtok
- Output: $0.12/Mtok
- Model ID: liquid/lfm-2-24b-a2b
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
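As a rough sketch of the arithmetic behind a team estimate, the snippet below multiplies message volume by the upstream per-token prices listed under Pricing. The seat count and message rate match the calculator defaults; treating 80 msgs/day as a per-seat figure, the tokens-per-message values, and the 30-day month are all assumptions chosen for illustration, and the result is upstream token cost only, not Switchy credits.

```python
# Upstream prices from the Pricing section (USD per million tokens).
INPUT_USD_PER_MTOK = 0.03
OUTPUT_USD_PER_MTOK = 0.12


def monthly_upstream_cost(
    seats: int,
    msgs_per_seat_per_day: int,
    input_tokens_per_msg: int,
    output_tokens_per_msg: int,
    days: int = 30,
) -> float:
    """Estimate upstream token cost in USD for one month of team usage."""
    messages = seats * msgs_per_seat_per_day * days
    input_cost = messages * input_tokens_per_msg / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = messages * output_tokens_per_msg / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Assumed workload: 1,500 prompt tokens and 400 reply tokens per message.
cost = monthly_upstream_cost(
    seats=5,
    msgs_per_seat_per_day=80,
    input_tokens_per_msg=1500,
    output_tokens_per_msg=400,
)
print(f"${cost:.2f}")  # $1.12
```

Swap in your own token counts; long prompts with pasted documents shift the balance toward input cost.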
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| liquid | 32K | $0.03/Mtok | $0.12/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Document Summarization Test
Summarize the following document in 3-4 bullet points, focusing on key decisions and action items. Preserve technical terminology and highlight any unresolved questions.
Code Review Assistant
Review this code snippet for potential bugs, performance issues, and style improvements. Explain your reasoning for each suggestion and rank them by severity.
Customer Support Draft
Draft a professional response to this customer inquiry. Acknowledge their concern, provide a clear solution or next steps, and maintain a helpful tone without over-apologizing.
Structured Data Extraction
Extract all dates, names, and monetary amounts from the following text. Return results as a JSON object with arrays for each category.
Long-Context Reasoning
Given these meeting transcripts from the past month, identify recurring themes, track how decisions evolved, and flag any contradictions between earlier and later discussions.
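For the Structured Data Extraction prompt above, replies are easier to trust if you validate them before use. The sketch below parses a reply and checks that each category is a JSON array. The key names `dates`, `names`, and `amounts` are hypothetical, since the prompt does not fix them, and the sample reply is invented for illustration rather than recorded from the model.

```python
import json

# Hypothetical key names; pin these down in your own prompt.
EXPECTED_KEYS = ("dates", "names", "amounts")


def parse_extraction(reply: str) -> dict[str, list]:
    """Pull the JSON object out of a model reply and validate its shape."""
    # Models sometimes add text around the JSON, so take the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start : end + 1])  # raises ValueError on bad JSON
    for key in EXPECTED_KEYS:
        if not isinstance(data.get(key), list):
            raise ValueError(f"expected an array under {key!r}")
    return {key: data[key] for key in EXPECTED_KEYS}


# Invented sample reply, for illustration only.
sample = 'Here you go: {"dates": ["2025-01-31"], "names": ["Ada"], "amounts": ["$45,000"]}'
result = parse_extraction(sample)
print(result["amounts"])
```

Counting how often real replies fail this check is a cheap first eval for extraction workloads.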
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Refactor this Python function to handle edge cases and improve readability: def calc(x,y): return x/y if y!=0 else None
Here's a refactored version with better error handling:

```python
def safe_divide(dividend: float, divisor: float) -> float | None:
    """Safely divide two numbers, returning None for invalid operations."""
    if divisor == 0:
        return None
    return dividend / divisor
```

Key improvements: descriptive function and parameter names, type hints for clarity, docstring explaining behavior, and the same zero-division logic preserved. Consider raising a ValueError instead of returning None if you want callers to handle the error explicitly.
This example shows LFM2-24B-A2B's ability to produce clean, idiomatic code with modern Python conventions. The 32K context window supports reviewing larger codebases, though at $0.12/Mtok output, extended refactoring sessions cost more than some alternatives in the same parameter class.
Summarize the key regulatory changes in this 8-page compliance memo and flag any action items for our legal team by end of quarter.
The memo outlines three primary changes: (1) Updated data retention requirements extending storage from 5 to 7 years for financial records, effective Q1 2025. (2) New consent language mandated for EU customers under amended GDPR guidelines—templates must be revised by December 15. (3) Expanded breach notification timelines, now requiring disclosure within 48 hours instead of 72. Action items: Legal must draft updated consent forms (due Dec 15), IT must audit retention policies (due Jan 31), and Compliance must file acknowledgment with regulators (due Nov 30).
Demonstrates the model's structured summarization and task extraction from dense documents. The 32K token window accommodates multi-page memos without truncation. However, without public benchmark data, users should verify accuracy on domain-specific jargon before relying on output for compliance decisions.
Write a professional email declining a vendor proposal while keeping the door open for future collaboration. They quoted $45K; our budget is $20K.
Subject: Re: Proposal for Q1 Marketing Services Hi [Vendor Name], Thank you for the detailed proposal. Your team's approach to audience segmentation and the case studies you shared are impressive. Unfortunately, the $45K investment exceeds our current budget allocation for this initiative. We're working within a $20K range for Q1, which doesn't align with the scope you've outlined. That said, we'd welcome the opportunity to reconnect in Q2 when we revisit our marketing spend. If you have a scaled-down option that fits our budget, I'm happy to discuss that as well. Best regards
Shows the model's tone calibration for business communication: polite, specific about constraints, and forward-looking. The output balances professionalism with directness. Short drafts like this cost very little at $0.12/Mtok output, though without public benchmarks it is unclear how the model compares with larger alternatives on nuanced writing tasks.
Use-case deep-dives
When LFM2-24B-A2B makes sense for high-volume ticket triage
A 12-person SaaS company processing 800 support tickets daily needs to classify urgency and route to the right specialist without burning through their AI budget. LFM2-24B-A2B at $0.03/$0.12 per Mtok sits in the sweet spot: cheap enough to run on every inbound message, capable enough to handle the classification logic that doesn't require frontier reasoning. The 32k context window covers full ticket threads plus knowledge base excerpts for routing decisions. If your accuracy threshold is above 95% or you need multi-step reasoning within tickets, you'll want benchmarks before committing—this model hasn't published public evals yet. For straightforward classification at scale where cost per call matters more than bleeding-edge performance, run a week-long pilot and measure false-positive rates against your current system.
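One way to score the pilot described above is to compare the model's urgency labels against labels from your current system or a human reviewer. This is a minimal sketch: the label names and the five-ticket sample are invented for illustration, and "false positive" here means a ticket flagged urgent that was not.

```python
def false_positive_rate(
    predicted: list[str], actual: list[str], positive: str = "urgent"
) -> float:
    """Share of truly non-positive tickets that the model flagged as positive."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    false_positives = sum(
        1 for p, a in zip(predicted, actual) if p == positive and a != positive
    )
    negatives = sum(1 for a in actual if a != positive)
    return false_positives / negatives if negatives else 0.0


# Invented pilot sample: model labels vs. reviewer labels.
model_labels = ["urgent", "normal", "urgent", "normal", "urgent"]
reviewer_labels = ["urgent", "normal", "normal", "normal", "urgent"]

rate = false_positive_rate(model_labels, reviewer_labels)
print(f"false-positive rate: {rate:.1%}")  # false-positive rate: 33.3%
```

Run the same function over your current system's labels to get a like-for-like baseline.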
LFM2-24B-A2B as a cost-effective RAG backend for team wikis
A 40-person engineering org wants every developer to query their Confluence docs and runbooks without waiting on senior engineers. LFM2-24B-A2B handles the generation layer of a retrieval-augmented pipeline at a fraction of the cost of GPT-4 class models, which matters when you're serving 200+ queries per day across the team. The 32K window fits most chunked documentation contexts, and the $0.03 input rate keeps the cost of injecting retrieved context low. The trade-off: you're betting on a model without public benchmark validation, so complex technical reasoning or edge-case accuracy might lag proven alternatives. If your docs are well-structured and queries are mostly lookup-heavy rather than synthesis-heavy, this model keeps your RAG pipeline economical. Test it on your hardest 50 questions first, then scale if precision holds.
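Fitting retrieved chunks into the 32,768-token window can be sketched as a simple greedy budget. The snippet assumes chunks arrive ranked best-first from your retriever, and it estimates tokens at roughly four characters each, which is a crude heuristic and not this model's tokenizer. The function names and the 1,024-token answer reserve are illustrative choices.

```python
CONTEXT_TOKENS = 32_768  # context length from the Specifications section


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not the real tokenizer."""
    return max(1, len(text) // 4)


def pack_chunks(
    chunks: list[str], question: str, reserve_for_answer: int = 1024
) -> list[str]:
    """Greedily keep top-ranked chunks until the token budget is spent."""
    budget = CONTEXT_TOKENS - reserve_for_answer - estimate_tokens(question)
    selected = []
    for chunk in chunks:  # assumed to be ranked best-first
        cost = estimate_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        budget -= cost
    return selected


# Four synthetic chunks of about 10,000 estimated tokens each.
docs = ["a" * 40_000, "b" * 40_000, "c" * 40_000, "d" * 40_000]
kept = pack_chunks(docs, "How do I roll back a deploy?")
print(len(kept))  # 3
```

Leave extra headroom in practice, since the heuristic can undercount tokens for code-heavy or non-English text.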
When to use LFM2-24B-A2B for first-pass blog outlines and social copy
A 5-person content agency produces 60 blog posts and 300 social assets monthly, and they need a model that drafts outlines and short-form copy fast without eating their margin. LFM2-24B-A2B's output pricing at $0.12/Mtok makes it viable for generating 500-800 token drafts that human editors refine—substantially cheaper than frontier models for work that doesn't require deep research or brand-voice nuance out of the gate. The 32k context lets you feed style guides and competitor examples inline. The risk: without published benchmarks, you can't predict how often drafts will need heavy rewrites versus light edits, which changes the true cost-per-finished-asset. Run a two-week trial on 20% of your workload, track editor time-to-publish, and calculate whether the cheaper token rate actually saves labor hours. If your editors spend under 10 minutes per draft, the math works.
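The cost-per-finished-asset comparison above can be worked through directly. In this sketch the token prices come from the Pricing section, while the prompt size, draft length, editing time, and editor rate are assumed figures for illustration.

```python
INPUT_USD_PER_MTOK = 0.03   # from the Pricing section
OUTPUT_USD_PER_MTOK = 0.12  # from the Pricing section


def cost_per_asset(
    input_tokens: int,
    output_tokens: int,
    editor_minutes: float,
    editor_hourly_usd: float,
) -> tuple[float, float]:
    """Return (token cost, editor labor cost) in USD for one finished asset."""
    token_cost = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )
    labor_cost = editor_minutes / 60 * editor_hourly_usd
    return token_cost, labor_cost


# Assumed: 4,000-token prompt with style guide, 800-token draft,
# 10 minutes of editing at a hypothetical $60/hour.
tokens_usd, labor_usd = cost_per_asset(4000, 800, 10, 60)
print(f"tokens: ${tokens_usd:.6f}, labor: ${labor_usd:.2f}")
```

Under these assumptions labor dominates by several orders of magnitude, which is why editor time-to-publish is the number to track during the trial.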
Frequently asked
Is LFM2-24B-A2B good for general text tasks?
Probably, for most standard text work. With 24B total parameters and a 32K context window, it is sized for document analysis, summarization, and conversational tasks. Without public benchmarks we can't compare it directly to GPT-4 or Claude, so treat the parameter count as a rough indicator only and confirm quality on your own email drafting, content editing, and customer support samples.
Is LFM2-24B-A2B cheaper than GPT-4o?
Significantly cheaper. At $0.03 input and $0.12 output per million tokens, it costs a small fraction of GPT-4o's per-token rates; the exact multiple depends on current GPT-4o pricing and your input/output mix. For high-volume applications where you don't need frontier-model reasoning, this pricing makes it viable for batch processing, internal tools, and scenarios where cost per request matters more than peak capability.
Can LFM2-24B-A2B handle 32K tokens reliably?
The 32K context window is standard for mid-tier models, but real-world performance depends on your use case. For structured documents and straightforward retrieval it should work fine. For complex reasoning across the full context or needle-in-haystack tasks, quality may degrade as prompts approach the limit, which is common for models in this class; there are no published long-context results showing where that happens. Test with your actual data before committing to context-heavy workflows.
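A simple way to probe long-context reliability is a needle-in-haystack test: hide one fact at a chosen depth in filler text and ask the model to retrieve it at several prompt lengths. The sketch below only builds the test prompt; sending it to the model is left to whatever client you use. The filler sentence, the needle, and the four-characters-per-token sizing are all assumptions.

```python
def build_needle_prompt(
    filler_sentence: str, needle: str, target_tokens: int, depth: float
) -> str:
    """Build filler text of roughly target_tokens with the needle at depth.

    depth is a fraction: 0.0 places the needle first, 1.0 places it last.
    Sizing assumes about 4 characters per token, which is only an estimate.
    """
    target_chars = target_tokens * 4
    repeats = max(1, target_chars // (len(filler_sentence) + 1))
    sentences = [filler_sentence] * repeats
    sentences.insert(int(depth * repeats), needle)
    return " ".join(sentences)


FILLER = "The quarterly report was filed on time."
NEEDLE = "The vault code is 7431."  # invented fact to retrieve

prompt = build_needle_prompt(FILLER, NEEDLE, target_tokens=20_000, depth=0.5)
question = "What is the vault code? Answer with the number only."
print(len(prompt) // 4)  # estimated token count of the haystack
```

Sweep `target_tokens` and `depth` over a small grid and record where retrieval starts failing on your own material.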
How does LFM2-24B-A2B compare to Llama 3.1 70B?
Llama 3.1 70B is likely to outperform it on reasoning and coding tasks: it has roughly 3x the total parameters and, as a dense model, uses all of them on every token, while LFM2-24B-A2B activates only about 2B. No public benchmarks confirm the size of the gap. LFM2-24B-A2B trades capability for speed and cost: inference should be faster, hosting is cheaper, and for simpler tasks the quality gap may narrow. Choose this if latency and price matter more than handling complex multi-step problems or advanced code generation.
Should I use LFM2-24B-A2B for production chatbots?
Yes, if your chatbot handles defined workflows rather than open-ended research and you have run your own evals first. With only about 2B parameters active per token, response times should be fast, and the pricing supports high message volumes. For customer service, FAQ systems, or guided conversations it's a practical choice. For complex troubleshooting or technical support where reasoning depth matters, you'll want a larger model like Claude or GPT-4.