TheDrummer: Cydonia 24B V4.1
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
Anyone in the Space can @-mention TheDrummer: Cydonia 24B V4.1 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Long-document summarization under budget constraints
- High-volume chatbot backends with extended context
- Prototyping multi-turn workflows before scaling up
- Cost-sensitive code review on large codebases
Strengths
The 131K context window handles full codebases, lengthy transcripts, or multi-document analysis without chunking. At $0.30 input and $0.50 output per million tokens, it undercuts most alternatives in the 20-30B parameter range by 40-60%. The V4.1 iteration suggests active development, and the model size hits a sweet spot for inference speed versus capability on consumer-grade hardware or modest cloud instances.
Trade-offs
No public benchmark data means you're flying blind relative to Qwen 2.5 32B Instruct (which posts strong MMLU and HumanEval scores) or Mistral Medium. Proprietary licensing limits transparency into training data and fine-tuning methods. Early-stage models from smaller vendors can exhibit inconsistent instruction-following or hallucination rates that only surface under production load. Plan to run your own evals before committing to high-stakes use cases.
Specifications
- Provider: thedrummer
- Category: llm
- Context length: 131,072 tokens
- Max output: 131,072 tokens
- Modalities: text
- License: proprietary
- Released: 2025-09-27
Pricing
- Input: $0.30/Mtok
- Output: $0.50/Mtok
- Model ID: thedrummer/cydonia-24b-v4.1
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
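As a rough illustration of how the calculator defaults translate into upstream token cost, here is a minimal sketch. It uses only the list prices above. The tokens-per-message figures, the 30-day month, and the reading of "80 msgs/day" as per seat are assumptions for illustration, not Switchy's actual metering formula, and the result is in USD rather than Switchy credits.

```python
from dataclasses import dataclass

# Upstream list prices from the pricing section, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 0.50


@dataclass
class UsageAssumptions:
    seats: int = 5                      # calculator default
    msgs_per_seat_per_day: int = 80     # assumed to be per seat, not per team
    days_per_month: int = 30            # assumption
    input_tokens_per_msg: int = 2_000   # assumption: prompt plus chat history
    output_tokens_per_msg: int = 500    # assumption


def monthly_upstream_cost(u: UsageAssumptions) -> float:
    """Return the estimated monthly upstream token cost in USD."""
    msgs = u.seats * u.msgs_per_seat_per_day * u.days_per_month
    input_cost = msgs * u.input_tokens_per_msg / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = msgs * u.output_tokens_per_msg / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    print(f"${monthly_upstream_cost(UsageAssumptions()):.2f} per month")
```

Changing the per-message token assumptions moves the result far more than changing seat count, so it is worth measuring real prompt sizes before relying on an estimate like this.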
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| thedrummer | 131k | $0.30/Mtok | $0.50/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Summarize Long Transcript
Below is a 30,000-word transcript of a product planning meeting. Extract the top 5 decisions made, the rationale for each, and all assigned action items with owners. Format as a bulleted list.
Multi-File Code Review
I'm pasting 15 Python files from a Flask API (total ~25K tokens). Review the authentication flow across routes, models, and middleware. Flag any security gaps or inconsistencies in how JWT tokens are validated.
Compare Contract Versions
Here are two versions of a SaaS contract (v2.1 and v3.0, each ~15K words). List every substantive change to liability clauses, termination rights, and data handling obligations. Ignore formatting tweaks.
Generate Test Cases from Spec
Below is a 20-page product requirements document for a new checkout flow. Generate 15 end-to-end test cases covering happy paths, edge cases, and error states. Include preconditions and expected outcomes for each.
Extract Insights from Research Papers
I'm providing three research papers on transformer attention mechanisms (total ~40K tokens). Summarize the core innovation in each, then identify overlapping findings and contradictions across all three.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs. It's supposed to merge two sorted lists but users report incorrect output when one list is empty.
The function has a classic off-by-one error in the final cleanup loops. When `i < len(list1)` is true but `j` has already exhausted `list2`, you're appending `list1[i]` correctly—but you increment `i` inside the loop without checking bounds on the next iteration. More critically, your base case returns `list2` when `list1` is empty, but that creates a reference issue if the caller mutates the result later. Return `list2.copy()` instead, or better yet, handle empty inputs before entering the merge logic. The algorithm itself is sound; these are defensive coding gaps that surface in edge cases.
This example shows Cydonia's strength in code review with contextual debugging—it identifies both the logical error and a subtle reference-handling issue. The 131K token context window supports pasting entire modules for review. However, without public benchmarks, we can't compare its bug-detection accuracy to models like Claude or GPT-4.
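The function under review is not shown on this page, so the following is only a hypothetical sketch of what a merge that follows the advice above might look like. The name `merge_sorted` and the integer element type are assumptions. It always builds a new list, which handles empty inputs and avoids handing the caller a reference to one of its own arguments.

```python
def merge_sorted(list1: list[int], list2: list[int]) -> list[int]:
    """Merge two ascending lists into a new ascending list."""
    merged: list[int] = []
    i = j = 0

    # Take the smaller head element until one list is exhausted.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1

    # At most one of these slices is non-empty; slicing copies, so the
    # result never aliases either input, even when the other one is empty.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged
```

Because the leftover elements are copied with slices rather than index loops, there is no cleanup loop in which a bounds mistake could hide.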
Explain the CAP theorem to a junior backend engineer who's choosing between PostgreSQL and Cassandra for a new service.
CAP theorem says you can't simultaneously guarantee Consistency, Availability, and Partition tolerance in a distributed system—you pick two. PostgreSQL is a CP system: it prioritizes consistency and can tolerate partitions, but sacrifices availability during network splits (writes may block until consensus is restored). Cassandra is AP: it stays available and handles partitions gracefully, but you get eventual consistency—reads might return stale data briefly. For your choice: if you need strict transactional guarantees (e.g., financial ledger, inventory counts), PostgreSQL's consistency is non-negotiable. If you need always-on writes across data centers and can tolerate brief staleness (e.g., user activity feeds, metrics), Cassandra's availability wins. The real question is whether your service can architect around eventual consistency.
Cydonia excels at technical explanations that balance theory with practical decision-making. The response stays grounded in the engineer's actual choice rather than abstract definitions. At $0.50/Mtok output, this kind of teaching-focused use case is cost-effective compared to frontier models, though we lack benchmark data on factual accuracy for systems concepts.
Generate a professional email declining a vendor proposal. We're going with a competitor due to better API documentation, but want to keep the relationship warm for future opportunities.
Subject: Re: Proposal for Analytics Integration Hi [Name], Thank you for the detailed proposal and the time your team invested in the demo last week. After evaluating several options, we've decided to move forward with another vendor whose API documentation aligns more closely with our current development workflow. That said, we were impressed by [specific feature from their demo], and I'd like to stay in touch as our needs evolve. Would you be open to a quarterly check-in? I'll also keep an eye on your product updates—particularly any improvements to developer resources. Thanks again for your partnership on this. Best, [Your name]
This demonstrates Cydonia's ability to handle professional writing with appropriate tone calibration—direct about the decision, specific about the reason, and genuinely warm without false promises. The 131K context window is overkill for emails but useful if you're drafting multiple vendor responses in one session. Pricing is competitive for business writing tasks, though models fine-tuned for enterprise communication may have stronger templates.
Use-case deep-dives
When 131k context makes sense for contract analysis teams
A 4-person legal ops team processing vendor agreements needs to compare clauses across 20-30 page contracts without chunking. Cydonia 24B V4.1 accepts up to 131k tokens at $0.30/$0.50 per Mtok, so even a contract that fills the entire window costs only about $0.04 in input tokens (131,072 × $0.30 per million), plus a small amount for the output. That's competitive if you're running 200+ contracts monthly and need the model to hold entire documents in memory for cross-reference work. The trade-off: no public benchmarks exist, so you're flying blind on accuracy versus Claude or GPT-4 on legal reasoning tasks. If your workflow tolerates a 2-week eval period and your volume justifies the window size, test it against a baseline on 50 real contracts. Otherwise, stick with a benchmarked model until Cydonia publishes MMLU or legal-specific scores.
Why Cydonia falls short for real-time chat routing at scale
A 12-person SaaS support team routing 800 inbound tickets daily needs sub-second classification and can't afford hallucinations on urgency tagging. Cydonia's pricing is attractive ($0.30 input makes it cheaper than most frontier models), but the absence of public benchmarks, and of any P50 latency or throughput figures in the provider table, means you have no signal on instruction-following accuracy or latency under load. For a use case where a single misrouted P0 ticket costs customer trust, you need published MMLU, HumanEval, or domain-specific scores before committing production traffic. The 131k window is overkill here; most tickets are under 2k tokens. If you're processing fewer than 100 tickets daily and can manually audit outputs for a month, Cydonia might work as a cost experiment. Above that threshold, choose a model with public accuracy data and SLA guarantees.
When Cydonia's context window beats chunking on forum threads
A 3-person community team moderating a 50k-member forum needs to flag toxic threads where context spans 40-60 replies. Cydonia's 131k window lets you pass an entire thread without splitting it into overlapping chunks, which reduces false positives from lost context. At $0.30 input, scanning 1,000 threads monthly costs roughly $3 in input tokens if the average thread runs 10k tokens (10M tokens at $0.30 per million). The risk: without benchmarks, you don't know how Cydonia performs on nuanced toxicity detection versus models like Llama 3 or Mistral that publish safety evals. Run a 2-week shadow deployment where Cydonia flags threads in parallel with your current model, then compare precision and recall on 200 real examples. If it matches or beats your baseline, the price and window make it a strong fit for this workload.
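The precision and recall comparison from the shadow deployment can be scored with a few lines of code. This is a minimal sketch, assuming each thread has a human-reviewed ground-truth label and a boolean flag from the model. The function name and the data layout are illustrative, not part of any Switchy tooling.

```python
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Return (precision, recall) for boolean toxicity flags.

    predicted[i] is the model's flag for thread i; actual[i] is the
    human-reviewed label for the same thread.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    tp = sum(p and a for p, a in zip(predicted, actual))        # correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))    # flagged but benign
    fn = sum(a and not p for p, a in zip(predicted, actual))    # toxic but missed

    # Fall back to 0.0 when a denominator is empty (no flags, or no toxic threads).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Run it once with the candidate model's flags and once with the current model's flags against the same labels, then compare the two pairs. With around 200 examples, small differences between the models may simply be noise.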
Frequently asked
Is TheDrummer Cydonia 24B V4.1 good for general text tasks?
With 24 billion parameters and a 131k token context window, Cydonia handles most text generation, summarization, and analysis tasks competently. The lack of public benchmarks makes it hard to compare directly, but the parameter count puts it in the mid-tier range—capable for everyday work but not competing with frontier models. Best suited for teams needing a cost-effective workhorse rather than bleeding-edge performance.
Is Cydonia 24B cheaper than GPT-4 or Claude?
Yes, significantly. At $0.30 input and $0.50 output per million tokens, Cydonia costs at least 10x less than GPT-4 Turbo or Claude Opus at their list prices, and the gap is wider on output tokens. For high-volume applications where you need decent quality without premium pricing, this model offers strong value. The trade-off is you're working with a smaller model that likely won't match frontier reasoning or nuance.
Can Cydonia 24B handle long documents with its 131k context?
The 131k token window is large enough for most business documents, research papers, or codebases. That's roughly 100,000 words of input. However, without published benchmarks on long-context retrieval accuracy, you'll want to test whether it maintains coherence across the full window for your specific use case. The window size is there; performance across it is unverified.
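A quick pre-flight check can tell you whether a document is likely to fit before you send it. The sketch below uses the same rule of thumb as the estimate above (about 0.75 words per token). That ratio is a heuristic, not the model's real tokenizer, and the reserved output budget and the idea that output shares the window with input are assumptions.

```python
import math

CONTEXT_TOKENS = 131_072   # context length from the specifications
WORDS_PER_TOKEN = 0.75     # rough heuristic; real tokenizers vary by language and content


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text: str, reserved_output_tokens: int = 4_096) -> bool:
    """Check whether the text plus a reserved output budget fits in the window."""
    return estimate_tokens(text) + reserved_output_tokens <= CONTEXT_TOKENS
```

Code and non-English text usually tokenize less efficiently than English prose, so treat a result that is close to the limit as a likely failure rather than a pass.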
How does V4.1 compare to earlier Cydonia versions?
Without public benchmarks for V4.1 or its predecessors, we can't quantify improvements. Version increments typically mean fine-tuning adjustments, bug fixes, or training data updates rather than architectural overhauls. If you're already using an earlier Cydonia version, test V4.1 on your actual prompts before migrating—the differences may be subtle.
Should I use Cydonia 24B for production chatbots?
It depends on your quality bar and budget. The pricing makes it attractive for high-volume chat applications where you need acceptable responses without premium costs. A 24B model typically sits between small models and 70B+ alternatives in reliability and reasoning depth, though without public benchmarks that is an expectation for Cydonia rather than a measured result. Run A/B tests against your current solution to see if the cost savings justify any quality drop.