Sao10K: Llama 3.1 Euryale 70B v2.2
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).
Anyone in the Space can @-mention Sao10K: Llama 3.1 Euryale 70B v2.2 with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Long-form creative fiction generation
- Multi-turn roleplay with character consistency
- Narrative-driven chatbot experiences
- Story brainstorming and plot development
Strengths
This fine-tune prioritizes narrative coherence and character voice over raw benchmark scores. The 131K context window supports extended story arcs without losing thread, and the symmetric $0.85/Mtok pricing makes iterative creative workflows affordable. Community feedback highlights stronger emotional range and less repetitive phrasing than base Llama 3.1 70B in creative contexts. The model maintains distinct character personalities across long conversations without constant re-prompting.
Trade-offs
No public benchmarks means you're flying blind on factual accuracy, math reasoning, and code generation relative to peers. This model sacrifices general-purpose performance for creative specialization — expect weaker results on analytical tasks, structured data extraction, or technical Q&A compared to base Llama 3.1 70B or GPT-4 class models. The proprietary license limits commercial deployment options. If your use case needs verifiable accuracy or strong reasoning, look elsewhere.
Specifications
- Provider
- sao10k
- Category
- llm
- Context length
- 131,072 tokens
- Max output
- 16,384 tokens
- Modalities
- text
- License
- proprietary
- Released
- 2024-08-28
Pricing
- Input
- $0.85/Mtok
- Output
- $0.85/Mtok
- Model ID
sao10k/l3.1-euryale-70b
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| sao10k | 131k | $0.85/Mtok | $0.85/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Character Backstory Expansion
I have a character named Elena, a former architect now living off-grid. Expand her backstory: what drove her to leave the city, what skills she brought with her, and one formative childhood memory that explains her self-reliance.Open in a Space →
Dialogue Scene Drafting
Write a tense dinner conversation between three siblings arguing over selling their late mother's house. Each has different financial needs and emotional attachments. Keep their voices distinct and let subtext drive the conflict.Open in a Space →
Plot Branch Exploration
In my story, the protagonist just discovered her mentor has been lying about his past. Give me three different ways this revelation could unfold in the next chapter, each with different emotional tones and consequences.Open in a Space →
Setting Atmosphere Building
Describe an abandoned research station on Europa's ice surface. Focus on sensory details that convey isolation and unease. The protagonist is about to discover something that changes everything — set that tone without revealing the discovery.Open in a Space →
Character Voice Consistency Check
My character Marcus is a cynical ex-cop who uses dark humor as a defense mechanism. Write his internal monologue as he walks through a crime scene that reminds him why he quit the force. Keep his voice consistent with someone who's seen too much but still cares.Open in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
I'm building a fantasy RPG dialogue system. Write a conversation between a weary innkeeper and a suspicious traveler who's asking too many questions about the local lord's movements.
The innkeeper sets down a clay mug with deliberate slowness. "Lord Vasrick? Keeps to himself these days. Winter's hard on old bones." Her eyes don't leave the traveler's hands. "You're the third stranger this tenday asking after him. Funny, that." She wipes the counter in slow circles. "The roads aren't safe for curious folk. Bandits, they say. Though some bandits ask questions before they rob you." A pause. "Your coin's good here, but your questions aren't. Drink up and move along come morning."
Euryale excels at creative writing with strong character voice and subtext. The 131k context window supports complex narrative threads across long story sessions. This fine-tune prioritizes literary quality over strict instruction-following, so users seeking technical precision may find responses drift toward narrative embellishment.
Analyze this relationship dynamic: Two co-founders have been arguing about product direction for months. One wants to pivot to enterprise, the other believes their consumer traction is about to break through. How should their board navigate this?
This is a classic vision-versus-validation deadlock. The enterprise advocate likely sees predictable revenue and shorter sales cycles; the consumer defender is betting on network effects that haven't materialized yet. The board shouldn't pick a side—they should demand evidence. Set a 90-day experiment: allocate 20% of resources to one enterprise pilot with clear success metrics. If consumer metrics don't inflect upward in that window, the market has spoken. The real risk isn't the wrong direction—it's burning runway while two smart people argue past each other. Force the data to referee, then back the winner completely.
The model handles nuanced reasoning about interpersonal and strategic problems well, producing analysis that balances multiple perspectives. The response structure is clear without being formulaic. However, with no public benchmarks available, users should verify factual accuracy on specialized domains before relying on outputs for high-stakes decisions.
Rewrite this customer support email to be more empathetic but still set boundaries: 'Your refund request is outside our 30-day window. We can't process this. Policy is policy.'
I understand how frustrating this situation must be, and I genuinely wish I could resolve it the way you're hoping. Our 30-day refund window exists so we can manage inventory and process returns while products are still current, but I realize that doesn't make your situation any easier. While I'm not able to approve a refund at this point, I'd like to explore what other options might work for you—perhaps store credit toward a future purchase, or connecting you with our product team if there's a quality issue we should know about. Would either of those help?
Euryale's fine-tuning shows strength in emotionally intelligent rewriting tasks, maintaining warmth while preserving necessary boundaries. The output demonstrates natural phrasing without corporate jargon. The model's creative writing bias means it may over-elaborate on simpler requests—users wanting terse, technical responses should provide explicit length constraints in their prompts.
Use-case deep-dives
When 131K context lets your writing team iterate without re-prompting
A 4-person content studio building serialized fiction needs a model that holds 40+ pages of story bible, character sheets, and draft chapters in memory while generating new scenes. Llama 3.1 Euryale 70B v2.2 handles this at $0.85/Mtok — cheaper than GPT-4 class models — with a 131K context window that keeps the entire narrative arc loaded across sessions. The trade-off: no public benchmarks mean you're testing creative coherence yourself, and response quality on technical tasks is unproven. If your team writes 200K+ tokens/month and needs long-context creative consistency over coding or analysis, this model delivers the window at a price point that makes daily iteration affordable.
Budget-friendly long-context retrieval for small team wikis
A 10-person startup maintains 80 pages of onboarding docs, process guides, and product specs in Notion. They need a model that can ingest the full knowledge base and answer new-hire questions without chunking or vector search infrastructure. Euryale's 131K window fits the entire corpus in a single prompt at $0.85/Mtok — roughly half the cost of Claude 3.5 Sonnet for the same context load. The risk: without published benchmarks, you won't know how it performs on factual recall or instruction-following until you test it against your own docs. If you're under 500 queries/month and can afford a week of validation, this model offers long-context access at a price that doesn't require VC funding.
When 2-hour call transcripts need single-pass summaries on a tight budget
A 6-person agency records client strategy calls that run 90-120 minutes, generating 25K-35K token transcripts. They need accurate summaries with action items, but can't justify $4/Mtok models for 40 calls/month. Euryale processes the full transcript in one pass at $0.85/Mtok, avoiding the context-splitting errors that plague smaller models. The catch: no MMLU or instruction-following scores mean summarization quality is a black box until you run your own evals. If your team does under 50 calls/month and can spot-check the first 10 summaries for accuracy, this model cuts your transcript processing cost by 60-80% compared to frontier alternatives.
Frequently asked
Is Llama 3.1 Euryale 70B good for creative writing and roleplay?
Yes, this is exactly what it's built for. Euryale is a community fine-tune of Llama 3.1 70B specifically optimized for creative storytelling, character consistency, and roleplay scenarios. It trades some benchmark performance for better narrative coherence and less corporate guardrails. If you're generating fiction or running character-based interactions, this beats base Llama 3.1 70B.
Is $0.85 per Mtok worth it compared to base Llama 3.1 70B?
It's the same price as most hosted Llama 3.1 70B endpoints, but you're paying for the creative fine-tune. If you need the model for coding or factual Q&A, stick with base Llama 3.1 70B. If you're doing creative work where personality and narrative flow matter more than raw accuracy, the tuning justifies the cost. You won't find this cheaper elsewhere for the same quality.
Can it handle 131k token context windows reliably?
The model inherits Llama 3.1's 131k context window, but practical performance degrades past 80-90k tokens like most long-context models. For creative work, you'll rarely need the full window anyway. If you're doing multi-chapter novel generation or massive worldbuilding documents, test carefully past 100k tokens. For typical roleplay sessions under 50k tokens, it's solid.
How does Euryale v2.2 compare to the original Euryale release?
Version 2.2 refines instruction-following and reduces repetition issues from earlier versions. The base model upgrade to Llama 3.1 also brings better multilingual support and reasoning. If you used v1.x and hit loops or inconsistent character voices, v2.2 fixes most of that. The creative tuning is more stable without sacrificing the personality that made Euryale popular.
Should I use this for customer-facing chatbots?
No. Euryale's fine-tuning removes safety guardrails and optimizes for creative freedom, not business use cases. It can generate content that violates typical content policies. For customer support, documentation, or any regulated application, use base Llama 3.1 70B Instruct or a model with commercial safety tuning. Use Euryale only for internal creative tools or personal projects.