TheDrummer: Skyfall 36B V2
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
Anyone in the Space can @-mention TheDrummer: Skyfall 36B V2 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Cost-sensitive code review and debugging
- Structured data extraction from documents
- Multi-turn technical support conversations
- Internal tooling with moderate reasoning needs
Strengths
The 36B parameter count sits in a practical zone: large enough for nuanced reasoning on technical tasks, small enough to keep inference costs manageable. The 32K context window fits small codebases or multi-page documents in a single pass. At $0.55 input, it undercuts most 70B models by 40-60% while delivering more headroom than 7B alternatives. The pricing structure favors read-heavy workflows (data analysis, code audits, document parsing) where input tokens dominate.
Trade-offs
No public benchmark data means you're flying blind on MMLU, HumanEval, or reasoning evals relative to peers like Llama 3.1 70B or Qwen 2.5 72B. The 32K context is a quarter of the 128K those peers offer, limiting use on very long documents or deep codebases. Output pricing at $0.80 per Mtok climbs quickly on generation-heavy tasks like drafting or creative writing. Beyond the stated creative-writing and role-play focus, there is no visibility into training data or fine-tuning approach, so you can't predict performance on domain-specific tasks until you run your own evals.
Specifications
- Provider: thedrummer
- Category: llm
- Context length: 32,768 tokens
- Max output: 32,768 tokens
- Modalities: text
- License: proprietary
- Released: 2025-03-10
Pricing
- Input: $0.55/Mtok
- Output: $0.80/Mtok
- Model ID: thedrummer/skyfall-36b-v2
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
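The calculator's inputs can be turned into a rough upstream estimate by hand. The sketch below uses the listed per-token prices; the per-message token counts (1,000 input, 300 output) and the 30-day month are assumptions for illustration, and the result is upstream token cost, not Switchy credits.

```python
INPUT_PRICE_PER_MTOK = 0.55   # USD, from the pricing list
OUTPUT_PRICE_PER_MTOK = 0.80  # USD, from the pricing list


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Upstream cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def monthly_team_cost(seats: int, msgs_per_seat_per_day: int,
                      input_tokens: int, output_tokens: int,
                      days: int = 30) -> float:
    """Upstream cost in USD for a team over `days` days."""
    requests = seats * msgs_per_seat_per_day * days
    return requests * request_cost(input_tokens, output_tokens)


# 5 seats x 80 msgs/day; token counts per message are assumed, not measured.
print(f"${monthly_team_cost(5, 80, 1_000, 300):.2f}")  # $9.48
```

Swap in token counts measured from your own traffic; input-heavy prompts shift the total toward the $0.55 rate, generation-heavy ones toward $0.80.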
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| thedrummer | 32,768 | $0.55/Mtok | $0.80/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Code Review Assistant
Review this pull request for logic errors, style inconsistencies, and potential runtime issues. Highlight any edge cases the code doesn't handle. Provide specific line references and suggest fixes.
Extract Structured Data
Extract all key data points from this document into a JSON object. Include dates, amounts, parties involved, and any critical terms. Flag any ambiguous or missing information.
Technical Q&A Bot
Answer this technical question based on the documentation provided. Be precise, cite specific sections, and include code examples where relevant. If the docs don't cover it, say so clearly.
Summarize Long Threads
Summarize this conversation thread. List the key decisions made, action items assigned, and any unresolved questions. Keep it under 150 words.
Debug Error Messages
Analyze this error log and stack trace. Identify the most likely root cause, explain why it's happening, and suggest 2-3 concrete steps to fix it.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs. It's supposed to merge overlapping time intervals but users report missing data: `def merge_intervals(intervals): result = []; for start, end in sorted(intervals): if result and start <= result[-1][1]: result[-1][1] = max(result[-1][1], end); else: result.append([start, end]); return result`
As written, the merge logic is sound. `sorted(intervals)` orders by start, and `result` only ever holds lists the function appends itself, so updating `result[-1][1]` in place works for tuple or list input, and empty input returns `[]`. Two things could explain the reports of missing data. First, `start <= result[-1][1]` merges intervals that merely touch, so `[1, 3]` and `[3, 5]` collapse into `[1, 5]`; if adjacent intervals must stay separate, use `<`. Second, merging keeps only the endpoints, so any payload attached to the original intervals is dropped, and rows with extra fields fail the two-value unpacking outright. Check which of these matches the reports before changing the comparison.
This example shows Skyfall's debugging methodology: it confirms what the code already handles, then narrows the report to the boundary condition and the data-shape assumption. The 32K context window supports pasting entire modules for review. However, without benchmark data, we can't verify its accuracy against models with published code-evaluation scores.
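A runnable version of the function from the prompt makes its behavior easy to check against tuple input, empty input, and touching intervals. This is a sketch, not the reviewer's or the model's code; the `merge_touching` flag is a hypothetical addition that switches between the prompt's `<=` comparison and a strict `<`.

```python
def merge_intervals(intervals, merge_touching=True):
    """Merge overlapping (start, end) pairs into a list of [start, end] lists.

    merge_touching is a hypothetical flag added for illustration: when True,
    [1, 3] and [3, 5] merge (the `<=` behavior from the prompt); when False
    they stay separate.
    """
    result = []
    for start, end in sorted(intervals):
        overlaps = result and (
            start <= result[-1][1] if merge_touching else start < result[-1][1]
        )
        if overlaps:
            # result holds lists created below, so in-place update is safe
            result[-1][1] = max(result[-1][1], end)
        else:
            result.append([start, end])
    return result


print(merge_intervals([(1, 3), (2, 6), (8, 10)]))               # [[1, 6], [8, 10]]
print(merge_intervals([(1, 3), (3, 5)]))                        # [[1, 5]]
print(merge_intervals([(1, 3), (3, 5)], merge_touching=False))  # [[1, 3], [3, 5]]
print(merge_intervals([]))                                      # []
```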
I need to explain our microservices authentication flow to a new backend engineer. We use JWT tokens, a separate auth service, and Redis for token blacklisting. Write a 3-paragraph explanation that covers the happy path, token refresh, and what happens when a user logs out.
When a user logs in, the auth service validates credentials against our user database and generates a JWT containing the user ID, role, and expiration timestamp. This token is returned to the client and included in the Authorization header of subsequent requests. Each microservice validates the JWT signature using a shared public key, so the happy path needs no database lookup, which keeps latency low.
Tokens expire after 15 minutes. Before expiration, the client can hit `/auth/refresh` with the current token to receive a new one with an extended lifetime. The auth service checks that the original token is valid and not blacklisted, then issues a fresh token. This rotation limits the window of vulnerability if a token is compromised.
On logout, the client sends the token to `/auth/logout`. The auth service stores that token's unique ID in Redis as a key with a TTL matching the token's remaining lifetime. All services check this blacklist before accepting a token. Once the original expiration passes, Redis automatically removes the entry. This approach handles distributed logout without requiring a synchronous revocation broadcast.
Skyfall produces structured technical explanations with appropriate detail density for the audience. The three-paragraph constraint was respected, and the explanation balances theory with implementation specifics. At $0.80/Mtok output, this 230-token response costs ~$0.0002 — reasonable for documentation tasks, though pricier than sub-$0.50 alternatives for high-volume generation.
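The logout step in the response can be sketched without a Redis dependency. In the minimal example below, an in-memory dictionary stands in for the blacklist; `TokenBlacklist`, `revoke`, and `is_revoked` are hypothetical names, and the injected clock exists only so the expiry behavior can be shown deterministically. In Redis itself, expiry applies to whole keys, so one key per token ID is assumed here.

```python
import time


class TokenBlacklist:
    """In-memory stand-in for the Redis blacklist (illustrative only)."""

    def __init__(self, clock=time.time):
        self._expiry_by_jti = {}  # token ID -> token expiry timestamp (seconds)
        self._clock = clock

    def revoke(self, jti: str, expires_at: float) -> None:
        """Blacklist a token until its own expiry; expired tokens need no entry."""
        if expires_at > self._clock():
            self._expiry_by_jti[jti] = expires_at

    def is_revoked(self, jti: str) -> bool:
        expires_at = self._expiry_by_jti.get(jti)
        if expires_at is None:
            return False
        if expires_at <= self._clock():
            del self._expiry_by_jti[jti]  # mimic the key disappearing at TTL
            return False
        return True


now = [1_000.0]  # fake clock so the example is deterministic
blacklist = TokenBlacklist(clock=lambda: now[0])
blacklist.revoke("token-123", expires_at=now[0] + 900)  # 15-minute token
print(blacklist.is_revoked("token-123"))  # True
now[0] += 901
print(blacklist.is_revoked("token-123"))  # False: entry gone, token expired anyway
```

Once the entry is gone the token is no longer listed as revoked, which is safe only because the normal expiry check on the JWT rejects it from that point on.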
Brainstorm 5 creative names for a CLI tool that automatically generates API documentation from OpenAPI specs. The tool is fast, opinionated, and outputs static HTML. Avoid generic words like 'doc', 'api', 'spec'.
1. **Parchment**: evokes the idea of turning raw specifications into readable scrolls; short, memorable, and suggests a finished artifact rather than a process.
2. **Loom**: conveys weaving threads (API endpoints) into a cohesive fabric (documentation); implies both craft and automation.
3. **Herald**: a messenger that announces your API's capabilities; suggests clarity and authority without being overly technical.
4. **Cartographer**: maps out API territory for developers; emphasizes navigation and exploration, fitting for a tool that creates reference material.
5. **Scribe**: classic, implies faithful transcription with a human touch; pairs well with the static-HTML angle and suggests the tool is opinionated about presentation.
This demonstrates Skyfall's creative ideation capability — each name includes a brief rationale connecting the metaphor to the tool's function. The model avoided the banned terms and provided distinct conceptual angles. For brainstorming tasks, the 32K context window allows including extensive project background or brand guidelines to refine suggestions, though the $0.55 input cost makes it less economical than smaller models for simple prompts.
Use-case deep-dives
When Skyfall 36B handles support queues under $200/month
A 4-person SaaS team routing 800 support tickets monthly needs classification and draft responses without blowing their runway. Skyfall 36B at $0.55/$0.80 per Mtok sits in the sweet spot: cheap enough to run on every ticket (roughly $0.0003 to $0.0004 per 500-token exchange, so well under $1 a month in upstream token cost at this volume), capable enough to handle multi-turn context up to 32k tokens when a thread gets messy. The model lacks public benchmarks, so you're flying blind on accuracy: plan to spot-check the first 100 outputs and set a confidence threshold before going fully automated. If your ticket volume crosses 2,000/month or you need guaranteed 95%+ classification accuracy, switch to a benchmarked alternative. Below that line, Skyfall's price makes it worth the validation overhead.
Skyfall 36B as the low-risk RAG backend for team wikis
A 10-person engineering team wants to test retrieval-augmented generation over their Confluence docs before committing to a vendor lock-in. Skyfall 36B's 32k context window fits most chunked wiki pages, and at $0.55 input you can run 500 queries for under $15 in token costs. The lack of public benchmarks means you can't predict hallucination rates, but for an internal prototype where wrong answers get caught by humans, the financial risk is negligible. Run it for two weeks, log every query where the answer felt off, and calculate your error rate. If it stays under 10% and your team likes the UX, keep it. If errors spike or you need customer-facing accuracy, migrate to a model with published MMLU and TruthfulQA scores.
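Whether retrieved wiki chunks fit the window is easy to check before sending a query. The sketch below greedily keeps the best-ranked chunks that fit; `pack_chunks`, the 1,024-token answer reserve, and the assumption that token counts are already computed by your tokenizer are all illustrative choices, not part of any Switchy or model API.

```python
CONTEXT_WINDOW = 32_768  # tokens, from the specifications


def pack_chunks(chunks, question_tokens, reserved_for_answer=1_024):
    """Greedily keep the highest-ranked chunks that fit the token budget.

    chunks: list of (chunk_id, token_count), ordered best match first.
    Returns the selected chunk IDs and the tokens they use.
    """
    budget = CONTEXT_WINDOW - question_tokens - reserved_for_answer
    selected, used = [], 0
    for chunk_id, token_count in chunks:
        if used + token_count > budget:
            continue  # this chunk does not fit; a smaller one later still may
        selected.append(chunk_id)
        used += token_count
    return selected, used


ranked = [("page-a", 20_000), ("page-b", 15_000), ("page-c", 8_000)]
print(pack_chunks(ranked, question_tokens=200))  # (['page-a', 'page-c'], 28000)
```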
When Skyfall handles 10k moderation jobs while your team sleeps
A 3-person community platform reviews 10,000 user posts nightly for policy violations. At $0.55 input, a 200-token post costs ~$0.00011 on its own, or about $1.10/night. The 32k context also lets you include your full moderation policy in every prompt without truncation; a prompt of, for example, 2,000 tokens (policy plus post) costs ~$0.0011, or about $11/night. The gamble: no benchmarks means you don't know its false-negative rate on edge cases like coded hate speech. Set it to flag borderline content for human review rather than auto-removing, and track precision weekly. If you're seeing more than 15% false positives or missing obvious violations, the cost savings aren't worth the reputational risk, so upgrade to a model with published safety evals. Below 5,000 posts/day, Skyfall's price justifies the monitoring overhead.
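The weekly precision check can be a few lines over the human review outcomes. This sketch reads "15% false positives" as the share of flagged posts that reviewers overturned, which is an interpretation; the function names and the input shape are hypothetical.

```python
def weekly_precision(reviewed_flags):
    """reviewed_flags: list of booleans, True when a human confirmed the violation."""
    if not reviewed_flags:
        return None  # nothing was flagged and reviewed this week
    return sum(reviewed_flags) / len(reviewed_flags)


def needs_upgrade(reviewed_flags, max_false_positive_share=0.15):
    """True when overturned flags exceed the tolerated share of all flags."""
    precision = weekly_precision(reviewed_flags)
    if precision is None:
        return False
    return (1 - precision) > max_false_positive_share


week = [True] * 80 + [False] * 20  # 80 confirmed, 20 overturned
print(weekly_precision(week))  # 0.8
print(needs_upgrade(week))     # True
```

Precision only covers posts the model flagged. Estimating missed violations requires sampling unflagged posts for review as well.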
Frequently asked
Is TheDrummer Skyfall 36B V2 good for general text tasks?
Yes, with 36B parameters and a 32K context window, Skyfall handles most text generation, summarization, and analysis tasks competently. The model sits in the mid-range size category—large enough for nuanced responses but small enough to run efficiently. Without public benchmarks, you're relying on community reports rather than verified performance data.
Is Skyfall 36B V2 cheaper than GPT-4o or Claude Sonnet?
Significantly cheaper. At $0.55/$0.80 per Mtok, Skyfall costs roughly 5x less on input and 12-19x less on output than frontier models like GPT-4o or Claude 3.5 Sonnet at their published list prices. You're trading raw capability for cost efficiency. If your workload doesn't need bleeding-edge reasoning or multimodal support, the price difference justifies the performance gap for high-volume applications.
Can Skyfall 36B V2 handle 32K token contexts reliably?
The advertised 32K window is standard for models this size, but real-world performance at maximum context varies. Expect coherence to degrade somewhat in the final 20-25% of the window—typical for non-frontier models. For most document analysis or long-form generation under 24K tokens, it should perform adequately without major quality loss.
How does Skyfall 36B V2 compare to Llama 3 70B?
Llama 3 70B has double the parameters and generally outperforms on reasoning and instruction-following. Skyfall's advantage is cost and inference speed—smaller models run faster and cheaper. Without public benchmarks for Skyfall, you're choosing between proven Llama performance and potentially lower operational costs. Test both on your specific use case before committing.
Should I use Skyfall 36B V2 for production chatbots?
Only if cost is your primary constraint and you can tolerate occasional quality issues. The lack of public benchmarks means you're accepting unknown failure modes. Run extensive testing on your domain before deploying. For customer-facing applications where errors damage brand trust, spend more on a frontier model with documented performance.