LLManthracite-org

Magnum v4 72B

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus(https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct).

Anyone in the Space can @-mention Magnum v4 72B with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Magnum v4 72B targets creative writing and roleplay scenarios where personality consistency and narrative flow matter more than raw reasoning benchmarks. At $3/$5 per Mtok it sits in mid-tier pricing but lacks public benchmark data to validate performance against peers like Llama 3.1 70B or Qwen 2.5 72B. The 16K context window limits long-document work. Reach for this if you need character-driven dialogue or fiction generation and can validate output quality in your own tests, but expect to supplement with a reasoning-focused model for analytical tasks.

Best for

Character-consistent creative writing
Roleplay and dialogue generation
Fiction drafting with narrative voice
Interactive storytelling scenarios

Strengths

Magnum v4 72B is tuned specifically for creative and conversational tasks where maintaining character voice and narrative coherence across exchanges matters. The 72B parameter count gives it enough capacity to handle nuanced personality traits and stylistic consistency within its 16K window. Pricing at $3 input and $5 output per Mtok makes it cheaper than frontier models for high-volume creative workflows where you don't need multimodal or extended reasoning capabilities.

Trade-offs

No public benchmarks means you're flying blind on coding, math, or reasoning performance relative to open alternatives like Qwen 2.5 72B or Llama 3.1 70B. The 16K context window cuts off before you can process full manuscripts or long research documents. Proprietary licensing blocks self-hosting or fine-tuning. If your work involves structured analysis, code generation, or fact-heavy summarization, models with published MMLU or HumanEval scores will give you more predictable results.

Specifications

Provider: anthracite-org
Category: llm
Context length: 16,384 tokens
Max output: 2,048 tokens
Modalities: text
License: proprietary
Released: 2024-10-22

Pricing

Input: $3.00/Mtok
Output: $5.00/Mtok
Model ID: anthracite-org/magnum-v4-72b

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

$63.36

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
anthracite-org	16k	$3.00/Mtok	$5.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Character Dialogue Expansion

Write a 300-word dialogue between a skeptical detective and an overconfident hacker. The detective is methodical and tired; the hacker is brash but insecure underneath. They're negotiating terms for the hacker's cooperation on a case.

Open in a Space →

Scene Continuation

Continue this scene in the same noir style: 'Rain hammered the warehouse roof. Marquez checked her watch—3:47 AM. The informant was late, or dead, or playing her.' Add 200 words.

Open in a Space →

Character Profile Generation

Create a character profile for a 40-year-old botanist who left academia after a scandal. Include three defining traits, a secret motivation, and two speech patterns that reveal her background.

Open in a Space →

Narrative Voice Adaptation

Rewrite this in a cynical first-person voice: 'The committee approved the funding. Dr. Chen presented her findings with confidence. The board members nodded approvingly.' Make it 100 words, from Chen's perspective.

Open in a Space →

Interactive Story Branching

A thief stands outside a locked vault. Write two 150-word outcomes: one where she picks the lock and finds something unexpected, one where she walks away. Keep her cautious personality consistent in both.

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

I have a Python function that recursively calculates Fibonacci numbers. It's correct but painfully slow for n > 30. Show me how to refactor it with memoization and explain the performance difference.

Output

The recursive approach recalculates the same Fibonacci values repeatedly — fib(5) gets computed dozens of times when calculating fib(10). Here's a memoized version using functools.lru_cache: ```python from functools import lru_cache @lru_cache(maxsize=None) def fib(n): if n < 2: return n return fib(n-1) + fib(n-2) ``` This decorator caches results automatically. Time complexity drops from O(2^n) to O(n) because each value is computed once. For fib(35), you'll see execution time fall from ~5 seconds to under a millisecond. The trade-off is memory — you're storing all intermediate results.

Notes

Magnum v4 72B excels at code refactoring tasks with clear pedagogical explanations. The response structure — problem diagnosis, solution, complexity analysis — matches how senior engineers think. The 16K context window handles substantial codebases, though the lack of public benchmarks means performance claims here rely on community reports rather than verified metrics.

Prompt

Write a professional email declining a meeting request from a vendor. I'm genuinely interested in their product but our budget is frozen until Q3. Keep it warm — I want to reconnect in July.

Output

Subject: Re: Meeting Request — Let's Reconnect in Q3 Hi [Name], Thank you for reaching out about [product]. I've reviewed your materials and the capabilities align well with what we're building. Our challenge right now is timing — budget allocations are locked through June. Rather than waste your time with a premature conversation, I'd prefer to connect in early July when we're planning Q3 initiatives. Would you be open to touching base the week of July 7th? I'll have a clearer picture of our priorities and can give your solution the attention it deserves. Best, [Your name]

Notes

This example shows Magnum's strength in professional communication that balances multiple objectives — declining politely, maintaining relationship equity, setting clear next steps. The tone is direct without being curt. At $5/Mtok output pricing, this kind of single-draft business writing becomes cost-effective compared to models with higher inference costs, especially for teams generating substantial correspondence.

Prompt

Explain quantum entanglement to someone who understands basic probability but has no physics background. Use an analogy that doesn't mislead about the actual mechanism.

Output

Imagine two coins that are mysteriously linked. When you flip one and it lands heads, the other instantly becomes tails — no matter how far apart they are. But here's the crucial part: neither coin 'decided' its outcome beforehand. Before measurement, both exist in a superposition of heads and tails simultaneously. Quantum entanglement works similarly. Two particles share a quantum state where measuring one particle's property (like spin) instantly determines the other's complementary property. This isn't communication — no information travels between them. Instead, the measurement collapses their shared wavefunction. The correlation was always there, encoded in their entangled state, but neither particle had a definite value until observed. It's correlation without causation, which is why Einstein found it so troubling.

Notes

Magnum v4 handles technical explanation with appropriate nuance — the analogy is accessible but the follow-up clarifies where the analogy breaks down. This prevents the common pitfall of oversimplified quantum explanations that spread misconceptions. The 72B parameter count likely contributes to this balance between accessibility and accuracy, though without benchmark data, it's difficult to quantify how it compares to similarly-sized models on technical Q&A tasks.

Use-case deep-dives

Internal knowledge base chat

When 16K context covers your company wiki without RAG overhead

A 12-person SaaS startup keeps product docs, onboarding guides, and internal processes in Notion—roughly 40 pages that compress to 14K tokens. Magnum v4 72B fits the entire knowledge base in a single context window, so your team can ask "how do we handle enterprise trials?" and get answers grounded in actual policy without building a vector database. At $3/$5 per Mtok, 200 queries/day costs under $15/month if each query averages 15K input and 400 output tokens. The 16K limit is the hard boundary: once your docs exceed that, you're chunking and losing the simplicity advantage. If your knowledge base fits and query volume stays under 500/day, this beats the engineering cost of a RAG stack.

Slack thread summarization

Why 72B parameter count matters for nuanced team conversation summaries

A 20-person design agency runs client projects in Slack threads that balloon to 80-150 messages over a week. Magnum v4 72B can parse the back-and-forth, track who committed to what, and write a 3-paragraph summary that captures tone shifts and unresolved questions—nuance that smaller models flatten into bullet lists. The 16K window handles threads up to roughly 200 messages depending on verbosity. At $3 input, summarizing 50 threads/week (averaging 12K tokens each) costs $1.80/week. The model's size gives it the reasoning depth to distinguish a firm decision from a tentative suggestion, which is the actual value in summarization. If your threads regularly exceed 16K tokens or you need sub-second latency, look at a smaller model with streaming.

Multi-turn customer support drafting

When output pricing kills the case for long support responses

A 5-person B2B support team uses AI to draft replies to technical questions that arrive via email—typically 3-4 exchanges per ticket, each response 300-500 words. Magnum v4 72B writes coherent, context-aware replies that reference earlier messages in the thread, but the $5/Mtok output rate means a 400-word response (roughly 530 tokens) costs $0.00265 per draft. At 60 tickets/day with 3 drafts each, that's $14.30/day or $286/month just on output tokens. The model handles the task well, but the economics don't close unless your support tickets justify premium pricing or you're using drafts as starting points that humans heavily edit. If you're drafting at volume, a model with sub-$1 output pricing cuts your bill by 80% with minimal quality loss for this use case.

Frequently asked

Is Magnum v4 72B good for creative writing and roleplay?

Yes. Magnum v4 is specifically tuned for creative writing, storytelling, and character roleplay. The 72B parameter count gives it enough depth for nuanced dialogue and narrative coherence. At 16K context, you can maintain longer conversations or story threads without losing character consistency. It's a solid choice if you prioritize creative output over raw benchmark performance.

Is Magnum v4 72B cheaper than GPT-4o or Claude Sonnet?

Yes, significantly. At $3/$5 per Mtok, Magnum v4 costs roughly 60-70% less than GPT-4o and Claude Sonnet 3.5 for most workloads. The trade-off is no vision support, a smaller context window, and less polish on technical tasks. If your use case is creative text generation and you're cost-sensitive, the savings add up fast.

Can Magnum v4 72B handle technical coding tasks?

Not reliably. Magnum v4 is optimized for creative and conversational tasks, not code generation or debugging. It lacks the instruction-following precision and technical grounding of models like GPT-4o, Claude, or Llama 3.3 70B. For anything beyond basic scripting or pseudocode, use a model explicitly trained on code.

How does Magnum v4 72B compare to Llama 3.3 70B?

Different strengths. Llama 3.3 70B has a 128K context window, better instruction-following, and stronger reasoning. Magnum v4 72B trades those for more creative, less constrained output and lower pricing. If you need a chatbot that feels less robotic or generates fiction, Magnum wins. For structured tasks or long-document work, Llama 3.3 is the better pick.

Should I use Magnum v4 72B for customer support chatbots?

Probably not. The 16K context is tight for multi-turn support threads with knowledge base retrieval. Magnum's creative tuning also means it may generate plausible-sounding but inaccurate responses instead of admitting uncertainty. For support, you want a model with stronger factual grounding and longer context—GPT-4o mini or Claude Haiku are safer bets.