Sao10K: Llama 3.3 Euryale 70B

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor to [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).

Anyone in the Space can @-mention Sao10K: Llama 3.3 Euryale 70B with the team's shared context - pooled credits, one chat, one memory.

Verdict

Sao10K's Llama 3.3 Euryale 70B is a community fine-tune of Meta's Llama 3.3 70B, optimized for creative writing and roleplay scenarios. It trades benchmark-topping accuracy for more expressive, character-consistent outputs in narrative contexts. At $0.65/$0.75 per Mtok (input/output), it costs roughly one-tenth as much as frontier models while delivering a 131,072-token (128K) context window. Reach for this when you need long-form creative generation or interactive fiction, not when you need factual precision or code generation.

Best for

Long-form creative writing and storytelling
Character-driven roleplay and dialogue
Interactive fiction with extended context
Narrative consistency across multi-turn conversations
Budget-conscious creative content generation

Strengths

Euryale 70B excels at maintaining character voice and narrative coherence across extended conversations, a direct result of its fine-tuning dataset. The 131,072-token context window lets you feed entire story outlines or character sheets without truncation. Pricing sits at roughly one-tenth the cost of GPT-4 class models, making it viable for high-volume creative workflows. The Llama 3.3 base gives it strong instruction-following within its target domain.

Trade-offs

This model sacrifices factual accuracy and reasoning depth for creative expressiveness. Expect weaker performance on math, code, and structured data tasks compared to base Llama 3.3 70B or frontier models. No public benchmarks exist yet, so you're flying blind on standard evals. The proprietary license limits commercial use cases depending on your agreement with Sao10K. Latency will be higher than smaller models, though the 70B parameter count keeps it faster than 405B alternatives.

Specifications

Provider: sao10k
Category: llm
Context length: 131,072 tokens
Max output: 16,384 tokens
Modalities: text
License: proprietary
Released: 2024-12-18

Pricing

Input: $0.65/Mtok
Output: $0.75/Mtok
Model ID: sao10k/l3.3-euryale-70b

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30 %

Estimated monthly spend

$11.97

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
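
The estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes 22 working days per month. The page does not state that value, but it is the one that makes the listed inputs produce 17.6M tokens and $11.97. The function and parameter names are illustrative, not part of any Switchy API.

```python
def estimate_monthly_spend(
    seats: int,
    messages_per_seat_per_day: int,
    avg_turn_tokens: int,
    output_share: float,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    working_days: int = 22,  # assumption: inferred from the page's totals, not documented
) -> tuple[int, float]:
    """Return (tokens per month, estimated USD spend per month)."""
    total_tokens = seats * messages_per_seat_per_day * avg_turn_tokens * working_days
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    # Prices are quoted per million tokens, so divide by 1,000,000 at the end.
    cost = (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return total_tokens, cost


tokens, cost = estimate_monthly_spend(
    seats=5,
    messages_per_seat_per_day=80,
    avg_turn_tokens=2_000,
    output_share=0.30,
    input_price_per_mtok=0.65,
    output_price_per_mtok=0.75,
)
print(f"{tokens / 1e6:.1f}M tokens / month, ${cost:.2f}")
# prints "17.6M tokens / month, $11.97"
```

Changing `working_days` to 30 or raising `output_share` shows how sensitive the total is to usage patterns rather than to the small gap between input and output prices.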

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
sao10k	131k	$0.65/Mtok	$0.75/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Character Backstory Expansion

I have a character named Aria, a former military pilot turned smuggler in a cyberpunk setting. She's pragmatic, distrusts authority, and has a dry sense of humor. Write a 500-word backstory explaining how she left the military and what drives her now.

Multi-Character Dialogue Scene

Write a tense negotiation scene between three characters: a cautious merchant, an impatient bounty hunter, and a smooth-talking diplomat. Each has conflicting goals around a stolen artifact. Keep their voices distinct and show their motivations through subtext.

Story Continuation with Context

Here's the first chapter of my fantasy novel: [paste 3000 words]. Continue the story from where it left off, maintaining the same tone, pacing, and character voices. Focus on the brewing conflict between the protagonist and the council.

World-Building Detail Generation

Describe a bustling spaceport on a frontier colony world. Include sensory details, the mix of cultures present, local slang, and three specific locations within the port that could serve as story settings. Make it feel gritty and functional, not utopian.

Interactive Roleplay Session

You are Kael, a grizzled starship mechanic who's seen too much and trusts too little. I'm a new crew member asking for your help repairing a damaged engine. Stay in character, respond to my questions with Kael's personality, and reveal backstory naturally through dialogue.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

I have a Python function that's become a 200-line monster with nested conditionals. Walk me through how to refactor it into smaller, testable pieces without changing behavior.

Output

Let's break this down systematically. First, identify each distinct responsibility — input validation, business logic branches, data transformation, output formatting. Extract the validation block into a separate function with clear success/failure returns. For those nested conditionals, consider the strategy pattern: each branch becomes its own handler class or function. The parent function becomes a router that selects the right handler based on input state. This gives you isolated units you can test independently. Keep the original function as a thin orchestrator that calls these new pieces in sequence. Run your existing test suite after each extraction to confirm behavior hasn't changed.

Notes

This example shows the model's structured approach to complex refactoring tasks. The 131K context window can hold a large module or several related files in one prompt, though not necessarily an entire codebase. Code generation is outside this model's target domain, and without public benchmarks its code quality and edge-case handling remain unverified.
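
The router-plus-handlers structure described in the example output could look like the minimal sketch below. The order-processing domain, the function names, and the two order kinds are hypothetical, chosen only to make the pattern concrete.

```python
from typing import Callable


def validate(order: dict) -> None:
    """Validation extracted from the original function; raises on bad input."""
    if "kind" not in order or "amount" not in order:
        raise ValueError("order needs 'kind' and 'amount'")
    if order["amount"] < 0:
        raise ValueError("amount must be non-negative")


def handle_standard(order: dict) -> float:
    return order["amount"]


def handle_discounted(order: dict) -> float:
    return order["amount"] * 0.9


# Each former conditional branch becomes one entry in the dispatch table.
HANDLERS: dict[str, Callable[[dict], float]] = {
    "standard": handle_standard,
    "discounted": handle_discounted,
}


def process_order(order: dict) -> str:
    """Thin orchestrator: validate, route to a handler, format the result."""
    validate(order)
    try:
        handler = HANDLERS[order["kind"]]
    except KeyError:
        raise ValueError(f"unknown order kind: {order['kind']!r}") from None
    total = handler(order)
    return f"{total:.2f}"


print(process_order({"kind": "discounted", "amount": 100}))  # prints "90.00"
```

Each handler and the validator can now be unit-tested in isolation, and adding a new branch means adding one function and one table entry rather than another nested conditional.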

Prompt

Explain the difference between TCP and UDP to a junior developer who keeps choosing the wrong protocol for their services.

Output

Think of TCP as certified mail — you get confirmation of delivery, packets arrive in order, and if something goes missing, it gets resent automatically. This reliability costs overhead: handshakes, acknowledgments, retransmission delays. Use TCP when correctness matters more than speed — database connections, file transfers, HTTP requests. UDP is like shouting across a room — fast, low overhead, but no guarantees. Packets might arrive out of order, get lost, or duplicate. Perfect for real-time scenarios where old data becomes worthless: video calls, gaming, live metrics. A dropped frame in a video stream doesn't need retransmission; the next frame is already coming. Choose based on whether you can tolerate loss.

Notes

Demonstrates the model's ability to translate technical concepts into concrete analogies without oversimplifying. The explanation balances accuracy with accessibility. The 70B parameter count suggests nuanced responses, though teaching effectiveness varies by topic complexity.

Prompt

I'm debugging a race condition in a multi-threaded checkout system where inventory counts occasionally go negative. What's your diagnostic approach?

Output

Classic read-modify-write race. Two threads read inventory count 1, both see it's available, both decrement, you end up at -1. Start by adding logging around the inventory check and decrement — timestamp each operation with thread ID. Look for interleaved sequences where the gap between read and write exceeds a few milliseconds. Your fix depends on your stack: database-level, use SELECT FOR UPDATE to lock the row during the transaction. Application-level, wrap the check-and-decrement in a mutex or use atomic compare-and-swap operations. If you're on a distributed system, consider optimistic locking with version numbers — retry the transaction if the version changed between read and write. Test under load with a script that hammers the same SKU simultaneously.

Notes

Shows diagnostic reasoning for concurrency bugs with multiple solution paths. The response adapts to different architectural contexts. At $0.65/$0.75 per million tokens, this is mid-range pricing for a 70B model, making it viable for extended debugging sessions.
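
The application-level fix mentioned in the example output, wrapping the check-and-decrement in a mutex, can be sketched as follows. The `Inventory` class and the `hammer` load script are hypothetical; a real checkout system would more likely enforce this in the database, as the output also suggests.

```python
import threading


class Inventory:
    """In-memory stock counter with an atomic check-and-decrement."""

    def __init__(self, count: int) -> None:
        self._count = count
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        # Holding the lock across both the check and the write closes the
        # read-modify-write window in which two threads could both see count == 1.
        with self._lock:
            if self._count > 0:
                self._count -= 1
                return True
            return False

    @property
    def count(self) -> int:
        with self._lock:
            return self._count


def hammer(inventory: Inventory, attempts: int, results: list) -> None:
    """Simulate one client repeatedly trying to buy the same SKU."""
    successes = sum(inventory.try_reserve() for _ in range(attempts))
    results.append(successes)


stock = Inventory(count=100)
results: list = []
threads = [threading.Thread(target=hammer, args=(stock, 50, results)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stock.count, sum(results))  # prints "0 100"
```

With 400 attempts against 100 units, exactly 100 reservations succeed and the count never drops below zero. Removing the lock from `try_reserve` reintroduces the race the prompt describes.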

Use-case deep-dives

Creative writing collaboration

When a 70B fine-tune beats frontier models on narrative work

A 4-person content studio writing serialized fiction for Substack needs consistent character voices across 8,000-word chapters. Llama 3.3 Euryale 70B handles this at $0.65/$0.75 per Mtok, roughly one-tenth the cost of GPT-4 class models, while the 131k context window fits two full chapters plus style guides in a single call. The Euryale fine-tune prioritizes creative coherence over instruction-following precision, so it's stronger on dialogue flow and tonal consistency than base Llama 3.3, weaker on structured JSON output or multi-step reasoning. If your team writes more than 200k words/month and needs memorable prose over perfect task execution, this is the call. Below that volume, the setup cost of learning its quirks outweighs the savings.

Long-context research synthesis

Why 131k context makes this viable for multi-document analysis

A 3-person policy research team needs to compare 12 white papers (averaging 9k tokens each) and extract conflicting recommendations into a brief. Llama 3.3 Euryale 70B fits all 12 documents in one 108k-token call, eliminating the chunking and re-ranking overhead that smaller-context models require. Measure in tokens, not words: 12 papers of 9k words each would likely exceed the 131k window once tokenized. At $0.65 input per Mtok, processing the full corpus costs under $0.08, about 10x cheaper than Claude Opus on the same task. The trade-off: no public benchmarks mean you're flying blind on accuracy until you test it on your own corpus. If your documents are dense with citations and you can afford a 2-day eval sprint, this wins. If you need guaranteed factual precision on medical or legal sources, pay up for Gemini 1.5 Pro or Claude.
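
The fit and cost figures in this scenario can be checked with a short calculation. The sketch below takes the quoted 108k-token total at face value (12 documents of 9,000 tokens each) and assumes that reserved output tokens count against the same context window. The function name and return shape are illustrative.

```python
CONTEXT_LIMIT = 131_072        # tokens, from the Specifications section
MAX_OUTPUT = 16_384            # tokens, from the Specifications section
INPUT_PRICE_PER_MTOK = 0.65    # USD, from the Pricing section


def plan_call(doc_tokens: list[int], reserved_output: int = MAX_OUTPUT) -> dict:
    """Check whether the documents fit in one call and estimate input cost."""
    input_tokens = sum(doc_tokens)
    # Assumption: prompt and completion tokens share one context window.
    fits = input_tokens + reserved_output <= CONTEXT_LIMIT
    return {
        "input_tokens": input_tokens,
        "fits": fits,
        "headroom": CONTEXT_LIMIT - input_tokens - reserved_output,
        "input_cost_usd": input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000,
    }


plan = plan_call([9_000] * 12)
print(plan["input_tokens"], plan["fits"], plan["headroom"], f"${plan['input_cost_usd']:.4f}")
# prints "108000 True 6688 $0.0702"
```

The headroom of 6,688 tokens is all that remains for instructions if the full output budget is reserved, so the prompt wrapper has to stay short. Token counts should come from the model's tokenizer rather than from word counts, since English text typically produces more tokens than words.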

High-frequency chatbot prototyping

When to use a 70B fine-tune for internal tool experiments

A 10-person SaaS startup is prototyping 5 different customer support chatbot personalities to A/B test on 2,000 beta users. Llama 3.3 Euryale 70B costs $0.75/Mtok on output, so a run of 50k test conversations across 5 variants that generates about 800M output tokens costs $600, a fraction of what the same volume would cost at GPT-4 class output prices. The 70B parameter count gives you enough nuance to test tone and empathy variations without the latency or cost of 405B models. The risk: Euryale's creative fine-tuning may generate more off-script responses than a safety-tuned base model, so you'll need tighter prompt guardrails and a human review sample. If you're iterating fast on tone and can tolerate 5% weird responses in beta, this is the right price point. Once you pick a winner and go to production, switch to a safety-hardened model.

Frequently asked

Is Sao10K Llama 3.3 Euryale 70B good for creative writing?

Yes, this is a creative-focused fine-tune of Llama 3.3 70B. Sao10K models typically prioritize storytelling, roleplay, and character consistency over strict instruction-following. The 131K context window handles novel-length drafts. If you need factual accuracy or structured outputs, use base Llama 3.3 70B instead.

Is Sao10K Euryale 70B cheaper than Claude Sonnet?

Much cheaper. At $0.65/$0.75 per million tokens, you pay roughly one-tenth of Claude 3.5 Sonnet's rates. For creative work where you're iterating on long-form content, the cost difference compounds fast. The trade-off is less reasoning depth and no multimodal support.

Can it handle 131K tokens without quality degradation?

The model supports 131K context, but fine-tunes on Llama 3.3 often show attention decay past 64K tokens in practice, and no public long-context evals exist for this model. For creative writing, keep critical plot points and character details in the first 50K tokens. Test your specific use case, since results vary by prompt structure and content type.

How does Euryale 70B compare to base Llama 3.3 70B?

Euryale trades instruction-following precision for creative flexibility. Base Llama 3.3 70B scores higher on MMLU and coding benchmarks but feels more constrained in fiction. Euryale is tuned for less refusal behavior and more narrative coherence. Choose base for analysis, Euryale for storytelling.

Should I use this for production chatbots?

No. Sao10K fine-tunes are optimized for creative roleplay, not customer support or factual Q&A. The model may generate overly elaborate responses or ignore strict formatting requirements. For production chat, use instruction-tuned models like Llama 3.3 70B Instruct or GPT-4o mini.