
TheDrummer: Skyfall 36B V2

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.

Anyone in the Space can @-mention TheDrummer: Skyfall 36B V2 with the team's shared context - pooled credits, one chat, one memory.


Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Skyfall 36B V2 targets the mid-tier sweet spot: larger than 7B models but priced well below frontier options at $0.55/$0.80 per Mtok (input/output). With a 32K context window and 36B parameters, it handles multi-turn conversations and moderate document tasks without the cost overhead of 70B+ models. Although it was tuned mainly for creative writing and role-play, it is worth testing for teams that need consistent reasoning on structured tasks (code review, data extraction, technical Q&A) where GPT-4-class performance isn't required. The trade-off is the lack of public benchmarks: you're buying on spec until you test it yourself.

Best for

  • Cost-sensitive code review and debugging
  • Structured data extraction from documents
  • Multi-turn technical support conversations
  • Internal tooling with moderate reasoning needs

Strengths

The 36B parameter count sits in a practical zone: large enough for nuanced reasoning on technical tasks, small enough to keep inference costs manageable. The 32K context window fits small codebases or multi-page documents in a single pass. At $0.55/Mtok input, it can undercut some hosted 70B models while offering more headroom than 7B alternatives, though 70B pricing varies widely by provider. Because input tokens cost less than output tokens ($0.55 vs $0.80 per Mtok), the pricing favors read-heavy workflows such as data analysis, code audits, and document parsing, where input tokens dominate.

Trade-offs

No public benchmark data means you're flying blind on MMLU, HumanEval, or reasoning evals relative to peers like Llama 3.1 70B or Qwen 2.5 72B. The 32K context is a quarter of the 128K window that Llama 3.1 70B offers, limiting use on very long documents or large codebases. Output is priced about 45% above input ($0.80 vs $0.55 per Mtok), so generation-heavy tasks like drafting or creative writing cost more per token than read-heavy ones. Without visibility into the training data or fine-tuning approach, you can't predict performance on domain-specific tasks until you run your own evals.

Specifications

Provider: thedrummer
Category: LLM
Context length: 32,768 tokens
Max output: 32,768 tokens
Modalities: text
License: proprietary
Released: 2025-03-10

Pricing

Input: $0.55/Mtok
Output: $0.80/Mtok
Model ID: thedrummer/skyfall-36b-v2

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend: $11.00 (17.6M tokens/month at 5 seats · 80 msgs/day)

Switchy meters this estimate against your org's shared credit pool.
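The calculator's figures are consistent with a simple volume-times-price model. A minimal sketch, assuming 22 working days a month, 2,000 tokens per message, and a 70/30 input/output token split; these are inferred assumptions that happen to reproduce the displayed 17.6M tokens and $11.00, not Switchy's documented formula, and `monthly_estimate` is a hypothetical helper.

```python
def monthly_estimate(seats, msgs_per_day, tokens_per_msg=2000, workdays=22,
                     input_share=0.7, price_in=0.55, price_out=0.80):
    """Return (tokens per month, USD per month); prices are USD per million tokens.

    The defaults are assumptions chosen to match the calculator, not a documented formula.
    """
    tokens = seats * msgs_per_day * workdays * tokens_per_msg
    # Blend input and output prices by the assumed share of input tokens.
    blended = input_share * price_in + (1 - input_share) * price_out
    cost = tokens / 1e6 * blended
    return tokens, cost


tokens, cost = monthly_estimate(seats=5, msgs_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
```

Under these assumptions the estimate scales linearly with seats and messages per day, so doubling either doubles the spend.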

Providers

thedrummer: 33K context, $0.55/Mtok input, $0.80/Mtok output. P50 latency, throughput, and 30-day uptime are not yet reported.

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Code Review Assistant

Review this pull request for logic errors, style inconsistencies, and potential runtime issues. Highlight any edge cases the code doesn't handle. Provide specific line references and suggest fixes.

Extract Structured Data

Extract all key data points from this document into a JSON object. Include dates, amounts, parties involved, and any critical terms. Flag any ambiguous or missing information.
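Output from the extraction prompt should be parsed and checked before anything downstream relies on it. A minimal sketch, assuming a hypothetical schema with `dates`, `amounts`, `parties`, `terms`, and `flags` keys; the prompt does not fix a schema, so adapt the key names. `check_extraction` is an illustrative helper, and the sample reply is made up.

```python
import json

REQUIRED_KEYS = ("dates", "amounts", "parties", "terms", "flags")  # hypothetical schema


def check_extraction(raw):
    """Parse the model's reply and list problems instead of trusting it blindly."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return None, [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    # Present-but-empty fields are worth a human look, like the prompt's "flag missing info".
    problems += [f"empty field: {key}" for key in REQUIRED_KEYS
                 if key in data and data[key] in ("", [], {}, None)]
    return data, problems


reply = '{"dates": ["2025-03-10"], "amounts": [], "parties": ["Acme", "Globex"], "terms": ["net 30"]}'
data, problems = check_extraction(reply)
print(problems)
```

Returning a problem list rather than raising lets you route incomplete extractions to review instead of dropping them.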

Technical Q&A Bot

Answer this technical question based on the documentation provided. Be precise, cite specific sections, and include code examples where relevant. If the docs don't cover it, say so clearly.

Summarize Long Threads

Summarize this conversation thread. List the key decisions made, action items assigned, and any unresolved questions. Keep it under 150 words.

Debug Error Messages

Analyze this error log and stack trace. Identify the most likely root cause, explain why it's happening, and suggest 2-3 concrete steps to fix it.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential bugs. It's supposed to merge overlapping time intervals but users report missing data: `def merge_intervals(intervals): result = []; for start, end in sorted(intervals): if result and start <= result[-1][1]: result[-1][1] = max(result[-1][1], end); else: result.append([start, end]); return result`

Output

The bug is in the tuple unpacking and list mutation. You're sorting tuples but appending lists, then trying to mutate `result[-1][1]` which fails because tuples are immutable if the input was tuples. Even if input is lists, the logic has an off-by-one: `start <= result[-1][1]` should be `start <= result[-1][1] + 1` for truly adjacent intervals. Also, you're not handling empty input. Here's the fix: check if intervals is empty first, ensure consistent data structures, and use `start <= result[-1][1]` only for overlapping (not adjacent) intervals unless that's your intent.

Notes

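This example shows Skyfall's debugging style: it commits to several specific diagnoses in one pass. Those diagnoses need checking, though. Because the loop appends fresh `[start, end]` lists, `result[-1][1]` is always mutable even when the input holds tuples; an empty input already returns `[]`; and whether adjacent integer intervals such as [1, 2] and [3, 4] should merge is a requirements question, not an off-by-one bug. The 32K context window supports pasting entire modules for review, but without benchmark data we can't compare its accuracy against models with published code-evaluation scores, so verify its suggestions before applying them.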
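Laid out as runnable Python, the one-line function from the prompt can be tested directly against the output's claims. This is the prompt's own logic with comments added, not a fix; the extra test inputs are illustrative.

```python
def merge_intervals(intervals):
    """Merge overlapping intervals; shared endpoints such as (1, 3) and (3, 5) merge."""
    result = []
    for start, end in sorted(intervals):
        # Overlap test: the next interval starts at or before the last merged end.
        if result and start <= result[-1][1]:
            # result[-1] is a list created below, so this mutation is safe
            # even when the caller passed tuples.
            result[-1][1] = max(result[-1][1], end)
        else:
            # Copy into a fresh, mutable list.
            result.append([start, end])
    return result


print(merge_intervals([(1, 3), (2, 6), (8, 10)]))
print(merge_intervals([]))
print(merge_intervals([(1, 2), (3, 4)]))
```

Tuple input and empty input both behave correctly; merging adjacent integer intervals would need `start <= result[-1][1] + 1`, which is a deliberate design choice rather than a bug fix.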

Prompt

I need to explain our microservices authentication flow to a new backend engineer. We use JWT tokens, a separate auth service, and Redis for token blacklisting. Write a 3-paragraph explanation that covers the happy path, token refresh, and what happens when a user logs out.

Output

When a user logs in, the auth service validates credentials against our user database and generates a JWT containing the user ID, role, and expiration timestamp. This token is returned to the client and included in the Authorization header of subsequent requests. Each microservice validates the JWT signature using a shared public key: no database lookup required for the happy path, which keeps latency low.

Tokens expire after 15 minutes. Before expiration, the client can hit `/auth/refresh` with the current token to receive a new one with an extended lifetime. The auth service checks that the original token is valid and not blacklisted, then issues a fresh token. This rotation limits the window of vulnerability if a token is compromised.

On logout, the client sends the token to `/auth/logout`. The auth service adds that token's unique ID to a Redis set with a TTL matching the token's remaining lifetime. All services check this blacklist before accepting a token. Once the original expiration passes, Redis automatically removes the entry. This approach handles distributed logout without requiring a synchronous revocation broadcast.

Notes

Skyfall produces structured technical explanations with appropriate detail density for the audience. The three-paragraph constraint was respected, and the explanation balances theory with implementation specifics. At $0.80/Mtok output, this 230-token response costs ~$0.0002 — reasonable for documentation tasks, though pricier than sub-$0.50 alternatives for high-volume generation.
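The logout step in the example output can be sketched in Python. One caveat on the output's wording: Redis expires whole keys, not individual members of a set, so a per-token TTL is usually implemented with one key per token ID (the JWT `jti` claim). With redis-py that would be `r.set(f"blacklist:{jti}", 1, ex=ttl)` plus `r.exists(...)` checks. The sketch below uses a small in-memory class in place of Redis so it runs without a server; the class name, the `blacklist:` key prefix, and the injectable clock are illustrative assumptions, not part of the described system.

```python
import time


class TokenBlacklist:
    """In-memory stand-in for per-token Redis keys with a TTL (hypothetical helper).

    With redis-py the equivalent calls would be
    r.set(f"blacklist:{jti}", 1, ex=ttl) and r.exists(f"blacklist:{jti}").
    """

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so expiry can be simulated
        self._expires_at = {}        # key -> absolute expiry timestamp

    def revoke(self, jti, token_exp):
        """Blacklist a token ID until the token's own expiry time."""
        ttl = token_exp - self._clock()
        if ttl > 0:                  # an already-expired token needs no entry
            self._expires_at[f"blacklist:{jti}"] = token_exp

    def is_revoked(self, jti):
        key = f"blacklist:{jti}"
        exp = self._expires_at.get(key)
        if exp is None:
            return False
        if exp <= self._clock():     # emulate Redis dropping the expired key
            del self._expires_at[key]
            return False
        return True


now = [1000.0]
bl = TokenBlacklist(clock=lambda: now[0])
bl.revoke("abc123", token_exp=now[0] + 15 * 60)  # 15-minute token, as in the example
print(bl.is_revoked("abc123"))
now[0] += 15 * 60
print(bl.is_revoked("abc123"))
```

Tying the entry's lifetime to the token's own expiry keeps the blacklist bounded: nothing needs to stay revoked once the token would be rejected as expired anyway.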

Prompt

Brainstorm 5 creative names for a CLI tool that automatically generates API documentation from OpenAPI specs. The tool is fast, opinionated, and outputs static HTML. Avoid generic words like 'doc', 'api', 'spec'.

Output

1. **Parchment**: evokes the idea of turning raw specifications into readable scrolls; short, memorable, and suggests a finished artifact rather than a process.
2. **Loom**: conveys weaving threads (API endpoints) into a cohesive fabric (documentation); implies both craft and automation.
3. **Herald**: a messenger that announces your API's capabilities; suggests clarity and authority without being overly technical.
4. **Cartographer**: maps out API territory for developers; emphasizes navigation and exploration, fitting for a tool that creates reference material.
5. **Scribe**: classic, implies faithful transcription with a human touch; pairs well with the static-HTML angle and suggests the tool is opinionated about presentation.

Notes

This demonstrates Skyfall's creative ideation capability — each name includes a brief rationale connecting the metaphor to the tool's function. The model avoided the banned terms and provided distinct conceptual angles. For brainstorming tasks, the 32K context window allows including extensive project background or brand guidelines to refine suggestions, though the $0.55 input cost makes it less economical than smaller models for simple prompts.
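Whether generated names respect the banned-word constraint can be checked mechanically before shortlisting. A minimal sketch, assuming a case-insensitive substring match is the rule you want; `banned_hits` is an illustrative helper and "Docsmith" is a made-up rejected candidate.

```python
BANNED = ("doc", "api", "spec")


def banned_hits(name, banned=BANNED):
    """Return the banned substrings found in a candidate name (case-insensitive)."""
    lowered = name.lower()
    return [word for word in banned if word in lowered]


candidates = ["Parchment", "Loom", "Herald", "Cartographer", "Scribe", "Docsmith"]
for name in candidates:
    hits = banned_hits(name)
    print(f"{name}: {'OK' if not hits else 'uses ' + ', '.join(hits)}")
```

Substring matching is deliberately strict and will also flag innocent words such as "Rapid" (which contains "api"); switch to whole-word matching if that is too aggressive.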

Use-case deep-dives

Mid-length customer support triage

When Skyfall 36B handles support queues under $200/month

A 4-person SaaS team routing 800 support tickets monthly needs classification and draft responses without blowing its runway. Skyfall 36B at $0.55/$0.80 per Mtok sits in the sweet spot: cheap enough to run on every ticket (a 500-token exchange costs about $0.0003-$0.0004 depending on the input/output split, or roughly $0.22-$0.32 for 800 such tickets), and capable enough to handle multi-turn context up to 32K tokens when a thread gets messy. The model lacks public benchmarks, so you're flying blind on accuracy: plan to spot-check the first 100 outputs and set a confidence threshold before going fully automated. If your ticket volume crosses 2,000/month or you need 95%+ classification accuracy you can rely on, switch to a benchmarked alternative. Below that line, Skyfall's price makes it worth the validation overhead.
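A 100-output spot check gives only a rough accuracy estimate. A minimal sketch using the Wilson score interval, a standard binomial confidence interval chosen here as one reasonable option, shows how wide that uncertainty is; `wilson_interval` is an illustrative helper and the 95-of-100 figure is hypothetical, not a result from this model.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if total == 0:
        raise ValueError("need at least one spot-checked output")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical spot check: 95 of the first 100 classifications judged correct.
low, high = wilson_interval(95, 100)
print(f"observed 95%; plausible range {low:.1%} to {high:.1%}")
```

With 95 correct out of 100, the lower bound lands below 90%, so a 100-sample check alone cannot confirm a 95% accuracy target; it can only rule out much worse performance.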

Internal documentation Q&A prototype

Skyfall 36B as the low-risk RAG backend for team wikis

A 10-person engineering team wants to test retrieval-augmented generation over their Confluence docs before committing to a vendor. Skyfall 36B's 32K context window fits most chunked wiki pages, and at $0.55/Mtok input you can run 500 queries for under $15 in token costs (even a full 32K-token prompt costs about $0.018). The lack of public benchmarks means you can't predict hallucination rates, but for an internal prototype where wrong answers get caught by humans, the financial risk is negligible. Run it for two weeks, log every query where the answer felt off, and calculate your error rate. If it stays under 10% and your team likes the UX, keep it. If errors spike or you need customer-facing accuracy, migrate to a model with published MMLU and TruthfulQA scores.
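Chunking is what lets wiki pages fit the 32K window alongside the question and the answer. A minimal sketch of fixed-size chunking with overlap, assuming a rough four-characters-per-token estimate rather than Skyfall's actual tokenizer; `chunk_text` and its default sizes are illustrative.

```python
def chunk_text(text, max_tokens=1000, overlap_tokens=100, chars_per_token=4):
    """Split text into overlapping chunks sized by an approximate token budget.

    chars_per_token=4 is a rough English-text heuristic, not the model's tokenizer.
    """
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap must be smaller than the chunk size")
    size = max_tokens * chars_per_token
    step = (max_tokens - overlap_tokens) * chars_per_token
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


page = "Deploy steps. " * 1000   # stand-in for one wiki page (14,000 characters)
chunks = chunk_text(page)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks; production pipelines usually split on headings or paragraphs instead of raw character offsets.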

Batch content moderation overnight

When Skyfall handles 10k moderation jobs while your team sleeps

A 3-person community platform reviews 10,000 user posts nightly for policy violations. At $0.55/Mtok input, a 200-token post on its own costs about $0.00011 to process, or about $1.10/night for 10,000 posts. The 32K context lets you include your full moderation policy in every prompt without truncation, but that raises the per-post cost: at roughly 2,000 input tokens per prompt it reaches about $0.0011, or about $11/night before output tokens. The gamble: no benchmarks means you don't know its false-negative rate on edge cases like coded hate speech. Set it to flag borderline content for human review rather than auto-removing, and track precision and recall weekly. If more than 15% of flagged posts turn out to be false positives, or it misses obvious violations, the cost savings aren't worth the reputational risk; upgrade to a model with published safety evals. Below 5,000 posts/day, Skyfall's price justifies the monitoring overhead.
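Weekly tracking needs a human-reviewed sample of the model's decisions. A minimal sketch computing precision (share of flags that were real violations), recall (share of real violations that were flagged), and the share of flags overturned; `moderation_metrics`, the data layout, and the sample week are illustrative, with the 15% cutoff taken from the paragraph above.

```python
def moderation_metrics(reviewed):
    """Summarize a human-reviewed sample of moderation decisions.

    reviewed: list of (model_flagged, human_says_violation) boolean pairs.
    Ratios are None when their denominator is zero.
    """
    tp = sum(1 for flagged, actual in reviewed if flagged and actual)
    fp = sum(1 for flagged, actual in reviewed if flagged and not actual)
    fn = sum(1 for flagged, actual in reviewed if not flagged and actual)
    flagged_total = tp + fp
    return {
        "precision": tp / flagged_total if flagged_total else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        # Share of flags that a human reviewer overturned.
        "false_flag_share": fp / flagged_total if flagged_total else None,
    }


# Hypothetical week: 40 flags (33 confirmed, 7 overturned), 4 missed violations.
week = ([(True, True)] * 33 + [(True, False)] * 7
        + [(False, True)] * 4 + [(False, False)] * 156)
m = moderation_metrics(week)
print(f"precision={m['precision']:.3f} recall={m['recall']:.3f}")
if m["false_flag_share"] is not None and m["false_flag_share"] > 0.15:
    print("more than 15% of flags were false positives: consider a benchmarked model")
```

Recall matters as much as precision here: a model that flags very little can look precise while missing the violations you care about.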

Frequently asked

Is TheDrummer Skyfall 36B V2 good for general text tasks?

Yes, with 36B parameters and a 32K context window, Skyfall handles most text generation, summarization, and analysis tasks competently. The model sits in the mid-range size category—large enough for nuanced responses but small enough to run efficiently. Without public benchmarks, you're relying on community reports rather than verified performance data.

Is Skyfall 36B V2 cheaper than GPT-4o or Claude Sonnet?

Significantly cheaper. At $0.55/$0.80 per Mtok, Skyfall costs roughly 4.5-5.5x less on input and 12.5-19x less on output than GPT-4o ($2.50/$10 per Mtok) or Claude 3.5 Sonnet ($3/$15 per Mtok) at their list prices. You're trading raw capability for cost efficiency. If your workload doesn't need bleeding-edge reasoning or multimodal support, the price difference can justify the performance gap for high-volume applications.
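The overall multiple depends on your input/output mix. A minimal sketch that blends each model's input and output prices by input share; the GPT-4o and Claude 3.5 Sonnet figures are list prices that change over time, so treat them as placeholders, and `blended_price` is an illustrative helper.

```python
PRICES = {  # (input, output) in USD per million tokens; verify current list prices
    "skyfall-36b-v2": (0.55, 0.80),
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def blended_price(model, input_share):
    """USD per million tokens when `input_share` of all tokens are input tokens."""
    price_in, price_out = PRICES[model]
    return input_share * price_in + (1 - input_share) * price_out


for share in (0.9, 0.7, 0.5):
    base = blended_price("skyfall-36b-v2", share)
    ratios = {m: blended_price(m, share) / base for m in PRICES if m != "skyfall-36b-v2"}
    print(f"{share:.0%} input: " + ", ".join(f"{m} {r:.1f}x" for m, r in ratios.items()))
```

At a 70/30 input/output mix, Skyfall works out about 7.6x cheaper than GPT-4o and about 10.6x cheaper than Claude 3.5 Sonnet; the gap widens as output's share grows.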

Can Skyfall 36B V2 handle 32K token contexts reliably?

The advertised 32K window is standard for models this size, but real-world quality near maximum context varies, and no long-context evals are published for Skyfall. A common precaution with non-frontier models is to expect some loss of coherence in the final 20-25% of the window, so keeping the prompt plus expected output under roughly 24K tokens leaves a margin. Test at your real document lengths before relying on the full window.
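A pre-flight check can keep requests inside that margin. A minimal sketch, assuming a rough four-characters-per-token estimate (use the model's real tokenizer for exact counts) and treating 75% of the 32,768-token window as a soft limit; `estimate_tokens`, `fits_budget`, and the defaults are illustrative.

```python
CONTEXT_WINDOW = 32_768                  # tokens, from the model's spec sheet
SOFT_LIMIT = int(CONTEXT_WINDOW * 0.75)  # 24,576: leaves the final 25% unused


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate via ceiling division; not the model's tokenizer."""
    return -(-len(text) // chars_per_token)


def fits_budget(prompt, max_output_tokens, limit=SOFT_LIMIT):
    """True if the estimated prompt plus reserved output stays within the limit."""
    return estimate_tokens(prompt) + max_output_tokens <= limit


doc = "word " * 18_000                   # 90,000 characters, about 22,500 tokens
print(estimate_tokens(doc), fits_budget(doc, max_output_tokens=1_000))
print(fits_budget(doc, max_output_tokens=4_000))
```

Reserving output tokens up front matters because the context window covers the prompt and the generated reply together.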

How does Skyfall 36B V2 compare to Llama 3 70B?

Llama 3 70B has roughly double the parameters and is likely to outperform Skyfall on reasoning and instruction-following, though without Skyfall benchmarks that is an expectation rather than a measured result. Skyfall does offer a longer context window (32K vs Llama 3 70B's 8K; Llama 3.1 70B extends to 128K), and its smaller size generally means faster, cheaper inference on comparable hardware. You're choosing between proven Llama performance and potentially lower operational costs. Test both on your specific use case before committing.

Should I use Skyfall 36B V2 for production chatbots?

Only if cost is your primary constraint and you can tolerate occasional quality issues. The lack of public benchmarks means you're accepting unknown failure modes. Run extensive testing on your domain before deploying. For customer-facing applications where errors damage brand trust, spend more on a frontier model with documented performance.

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.