Sao10K: Llama 3.1 70B Hanami x1

This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).

Anyone in the Space can @-mention Sao10K: Llama 3.1 70B Hanami x1 with the team's shared context - pooled credits, one chat, one memory.

Verdict

Llama 3.1 70B Hanami x1 is a community fine-tune optimized for creative writing and roleplay scenarios. It trades benchmark performance for stronger character consistency and narrative flow in fiction contexts. The symmetric $3/Mtok pricing sits mid-range for 70B models, making it viable for hobbyist writers and creative projects where vanilla Llama's instruction-following rigidity gets in the way. Reach for this when you need a model that understands story beats over strict task completion.

Best for

Creative fiction and storytelling
Character-driven roleplay scenarios
Narrative consistency across long exchanges
Dialogue writing with distinct voices
World-building and lore development

Strengths

This fine-tune prioritizes narrative coherence and character voice over general-purpose instruction following. The 16K context window handles multi-turn creative exchanges without losing thread. Symmetric input/output pricing removes the penalty for generating longer creative outputs. Community reports suggest stronger emotional range and less corporate-safe hedging than base Llama 3.1 70B, making it useful when you need a model that commits to a creative direction rather than qualifying every response.

Trade-offs

No public benchmarks means you're flying blind on factual accuracy, reasoning, and coding tasks. The creative fine-tuning likely degrades performance on structured outputs like JSON generation or technical documentation. The 16K context is one-eighth of what you get from base Llama 3.1 70B (128K), limiting its use for long-document tasks. Proprietary license restricts commercial deployment options. If you need reliable task completion or factual grounding, standard instruction-tuned models will outperform this consistently.

Specifications

Provider: sao10k
Category: llm
Context length: 16,000 tokens
Max output: —
Modalities: text
License: proprietary
Released: 2025-01-08

Pricing

Input: $3.00/Mtok
Output: $3.00/Mtok
Model ID: sao10k/l3.1-70b-hanami-x1

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30%

Estimated monthly spend

$52.80

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
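
The calculator figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not Switchy's actual formula: the function name and parameters are hypothetical, and the 22 working days per month is an assumption inferred from the numbers shown (5 seats × 80 messages × 2,000 tokens × 22 days = 17.6M tokens).

```python
def estimate_monthly_spend(seats, msgs_per_seat_per_day, tokens_per_turn,
                           output_share, input_price_per_mtok,
                           output_price_per_mtok, working_days=22):
    """Return (tokens_per_month, dollars_per_month) for a team.

    working_days=22 is an assumption; the page does not state it.
    """
    tokens = seats * msgs_per_seat_per_day * tokens_per_turn * working_days
    output_tokens = tokens * output_share
    input_tokens = tokens - output_tokens
    cost = (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
    return tokens, cost


# Defaults from the calculator: 5 seats, 80 msgs/day, 2 ktok turns, 30% output.
tokens, cost = estimate_monthly_spend(5, 80, 2_000, 0.30, 3.00, 3.00)
print(f"{tokens / 1e6:.1f}M tokens, ${cost:.2f}")  # 17.6M tokens, $52.80
```

Because input and output are both $3.00/Mtok, the output share has no effect on the total for this model; it only matters when comparing against models with asymmetric pricing.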

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
sao10k	16k	$3.00/Mtok	$3.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Character Backstory Development

Create a backstory for a character who left their family's merchant business to become a cartographer. Include three formative memories that shaped this decision and how they rationalize it to themselves versus what they tell others.

Scene Continuation

Continue this scene: Maya pushed open the workshop door, sawdust catching the afternoon light. Her mentor looked up from the half-finished violin, and she knew immediately he'd heard about the audition. Continue for 200 words, showing their conversation through action and subtext.

Dialogue Rewrite

Rewrite this exchange to show the power dynamic between a new apprentice and a resentful senior craftsperson: 'Can you show me how to do that?' 'I suppose.' 'Thanks.' Make each line reveal character through word choice and rhythm.

World-Building Details

Describe the daily routine of a lighthouse keeper in a world where the ocean glows at night and attracts dangerous flying creatures. Focus on three practical adaptations they've made to survive and one superstition they privately hold.

Emotional Beat Analysis

Analyze the emotional beats in this scene and suggest where to add physical details or internal thought to strengthen the reader's connection: [paste your scene]. Focus on moments where emotion shifts or intensifies.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

I'm writing a fantasy novel and need help developing a morally ambiguous antagonist who believes they're saving the world. Give me three character beats that make readers sympathize despite their methods.

Output

Here are three beats that humanize your antagonist:

1. **The Personal Loss**: Show them visiting a memorial of someone they couldn't save under the old system: a child, a mentor, someone whose preventable death catalyzed their extremism. Let readers see the wound that never healed.
2. **The Moment of Doubt**: Mid-campaign, have them encounter someone who reminds them of who they used to be. They hesitate, almost pull back, then double down because retreat would mean all prior sacrifices were meaningless. The tragedy is they're trapped by their own commitment.
3. **The Unreceived Letter**: They write regular letters to someone they've lost or estranged, explaining their choices, seeking validation they'll never get. These private moments show self-awareness and loneliness: they know the cost, they pay it anyway.

The key is showing them as someone who chose wrong for comprehensible reasons, not someone who enjoys cruelty.

Notes

This example highlights the model's creative writing assistance capabilities—it provides structured narrative advice with emotional depth. The 16K context window supports longer story development sessions. However, without benchmark data, we can't verify how its creative suggestions compare to other 70B-class models in consistency or originality across extended conversations.

Prompt

Explain the difference between React's useEffect and useLayoutEffect hooks, and give me a concrete scenario where using the wrong one causes a visible bug.

Output

**useEffect** runs *after* the browser paints. React updates the DOM, the browser renders it to screen, then your effect runs. It's asynchronous relative to rendering.

**useLayoutEffect** runs *before* the browser paints, synchronously after DOM mutations but before the user sees anything.

**The Bug Scenario**: You're building a tooltip that needs to position itself based on a target element's dimensions. With useEffect:

```jsx
const [position, setPosition] = useState({ top: 0, left: 0 });

useEffect(() => {
  const rect = targetRef.current.getBoundingClientRect();
  setPosition({ top: rect.bottom, left: rect.left });
}, []);
```

Users see the tooltip flash at (0,0) before jumping to the correct position, because the browser painted before your effect calculated the position. With useLayoutEffect, the calculation happens before paint, eliminating the flash. Use useLayoutEffect for any DOM measurements that affect immediate visual output.

Notes

Demonstrates technical explanation ability with practical code examples. The model structures the answer logically and provides a debugging scenario developers actually encounter. At $3/$3 per Mtok, it's mid-priced for a 70B model, making it viable for technical documentation tasks where the 16K window handles moderate codebases, though larger architectural discussions may hit limits.

Prompt

I have a dataset of customer support tickets. Write a Python function that clusters them by topic using sentence embeddings, then generates a one-sentence summary for each cluster. Assume I'm using sentence-transformers.

Output

Here's a clustering pipeline with summary generation:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
import numpy as np

def cluster_and_summarize(tickets, n_clusters=5):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(tickets)

    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    labels = kmeans.fit_predict(embeddings)

    summaries = []
    for i in range(n_clusters):
        cluster_tickets = [tickets[j] for j in range(len(tickets)) if labels[j] == i]
        # Find ticket closest to centroid as representative
        cluster_embeddings = embeddings[labels == i]
        centroid = kmeans.cluster_centers_[i]
        distances = np.linalg.norm(cluster_embeddings - centroid, axis=1)
        representative = cluster_tickets[np.argmin(distances)]
        summaries.append(f"Cluster {i}: {representative[:100]}...")

    return labels, summaries
```

This uses the centroid-nearest ticket as a summary proxy. For true generated summaries, you'd need to pass cluster tickets to an LLM separately.

Notes

Shows data science workflow understanding with practical library usage. The code is functional and includes the important caveat about summary generation limits. However, the model itself can't verify this code executes correctly, and the 16K context means you can't paste in large ticket datasets for inline analysis—you'd need to describe them abstractly or work in chunks.

Use-case deep-dives

Creative fiction drafting

When you need character-consistent dialogue at $3/Mtok flat rate

A 4-person game studio writing branching narrative for an indie RPG needs 80k words of NPC dialogue that stays in-character across 12 story arcs. Llama 3.1 70B Hanami x1 runs $3/Mtok in and out—no output premium—which matters when you're generating long drafts and iterating on tone. The 16k context window holds roughly 12k words, enough for a full character bible plus the current scene. No public benchmarks exist for this tuned variant, so you're trusting Sao10K's creative-writing reputation over measurable performance. If your studio ships dialogue that players actually read, test a 10k-word sample against base Llama 3.1 70B first. If Hanami's style fits and your token volume stays under 500M/month, the flat pricing beats models that charge 4-5x more on output.

Internal knowledge-base Q&A

Budget-friendly retrieval when context fits under 16k tokens

A 12-person SaaS support team wants to auto-answer Slack questions by pulling from a 200-page product wiki. Each query needs 8-10k tokens of context (the relevant wiki sections) plus the question itself. Llama 3.1 70B Hanami x1 handles this at $3/Mtok flat, so a team doing 2,000 queries/day (averaging 12k input, 400 output tokens) processes about 24.8M tokens/day and spends roughly $74/day, or about $2,200 over a 30-day month. The 16k window is the hard ceiling: if your wiki sections regularly exceed 10k tokens after chunking, you'll hit truncation errors. No benchmarks means you can't predict accuracy on factual recall, so run a 200-question eval against your actual wiki before committing. If answers stay accurate and queries stay under the context limit, this beats $8-12/Mtok alternatives by 60-75 percent.
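
One way to stay under the 16k ceiling is to budget tokens before sending the request. The sketch below is illustrative only: `approx_tokens` and `pack_chunks` are hypothetical helpers, and the token count is a rough words-to-tokens heuristic (about 4 tokens per 3 words, matching the 16k tokens ≈ 12k words ratio used on this page), not the model's real tokenizer. It assumes the retriever returns chunks already ranked by relevance.

```python
CONTEXT_LIMIT = 16_000  # tokens, from the Specifications section


def approx_tokens(text):
    """Rough estimate: ~4 tokens per 3 words (heuristic, not a real tokenizer)."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def pack_chunks(chunks, question, reserve_for_output=400, limit=CONTEXT_LIMIT):
    """Keep ranked chunks in order until the next one would overflow the budget."""
    budget = limit - reserve_for_output - approx_tokens(question)
    kept, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break  # stop here instead of letting the request be truncated
        kept.append(chunk)
        used += cost
    return kept, used


# Three fake wiki sections of 6,000, 4,500, and 3,000 words.
chunks = ["word " * 6000, "word " * 4500, "word " * 3000]
kept, used = pack_chunks(chunks, "How do I reset a password?")
print(len(kept), used)  # 2 14000
```

Reserving the 400 output tokens up front means the answer is never squeezed out by retrieved context. For production use, swap the heuristic for the tokenizer that matches the deployed model.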

Batch content moderation

Overnight moderation runs when speed isn't the constraint

A 3-person community platform reviews 15,000 user posts nightly for policy violations—toxicity, spam, off-topic content. Each post averages 300 tokens; the moderation prompt adds 200. Llama 3.1 70B Hanami x1 processes this batch at $3/Mtok ($22.50/night for 7.5M tokens), and overnight latency is fine because results go live at 6 AM. The model's 70B parameter count should handle nuanced policy edge-cases better than smaller alternatives, but without benchmarks you can't verify false-positive rates until you test. If your nightly volume stays under 20M tokens and you can afford a 2-week tuning period to dial in the moderation prompt, the flat pricing saves $60-80/night versus real-time moderation APIs. Above 50k posts/night, switch to a provider with published safety benchmarks.

Frequently asked

Is Sao10K Llama 3.1 70B Hanami good for creative writing?

Yes, this is a fine-tuned variant specifically optimized for creative and roleplay tasks. The Hanami series from Sao10K targets narrative generation and character consistency over raw benchmark performance. If you need technical accuracy or code generation, use base Llama 3.1 70B instead. For fiction and dialogue, this specialization makes it worth considering.

Is $3 per Mtok competitive pricing for a 70B model?

It's mid-range. Base Llama 3.1 70B runs $0.88-$1.20 per Mtok on most providers, so you're paying roughly 2.5-3.4× for the fine-tune. That premium makes sense if the creative specialization saves you editing time. For general-purpose work where base Llama performs fine, the markup isn't justified.

Can it handle 16K token context reliably?

The 16K window is far smaller than base Llama 3.1 70B's 128K, and it's on the smaller side for long-form fiction. At roughly 0.75 words per token, you'll hit the limit at around 12,000 words of combined prompt, history, and output. For novel chapters or multi-turn roleplay sessions, you'll need to manage context carefully or use a model with 32K+ windows like Claude or Gemini.
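
A common way to manage context in long roleplay sessions is a sliding window: keep the system prompt (for example a character bible) fixed and drop the oldest turns once the budget is exceeded. The following is a minimal sketch under stated assumptions: `trim_history` and `approx_tokens` are hypothetical names, the 1,000-token reply reserve is an arbitrary choice, and token counts use a words-based heuristic instead of the model's tokenizer.

```python
def approx_tokens(text):
    """Rough estimate: ~4 tokens per 3 words (heuristic, not a real tokenizer)."""
    words = len(text.split())
    return (words * 4 + 2) // 3


def trim_history(system_prompt, turns, limit=16_000, reserve_for_reply=1_000):
    """Return the most recent turns that fit alongside the system prompt."""
    budget = limit - reserve_for_reply - approx_tokens(system_prompt)
    kept, used = [], 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = approx_tokens(turn)
        if used + cost > budget:
            break  # everything older than this turn is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# A 3,000-word character bible plus eight 1,500-word turns.
bible = "bible " * 3000
turns = [f"t{i} " * 1500 for i in range(8)]
recent = trim_history(bible, turns)
print(len(recent))  # 5
```

Dropping whole turns keeps each remaining message intact. Summarizing the dropped turns into a short recap is a common refinement, at the cost of an extra model call.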

How does Hanami x1 compare to base Llama 3.1 70B?

No public benchmarks exist for this variant, so you're trading measurable performance data for anecdotal creative improvements. Base Llama 3.1 70B scores 86 on MMLU and handles general tasks predictably. Hanami sacrifices that documented capability for tuning toward narrative voice and character work. Test it yourself before committing to production.

Should I use this for chatbot applications?

Only if your chatbot needs strong character roleplay or creative storytelling. For customer support, documentation help, or transactional chat, base instruction-tuned models will give you better factual accuracy and lower cost. The creative fine-tune here optimizes for entertainment and narrative depth, not information retrieval or task completion.