Tencent: Hy3 preview
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels: disabled, low, and high.
Anyone in the Space can @-mention Tencent: Hy3 preview with the team's shared context — pooled credits, one chat, one memory.
Starter is free forever — 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Long document processing on tight budgets
- High-volume batch summarization tasks
- Context-heavy workflows needing 200K+ tokens
- Cost-sensitive Chinese language applications
Strengths
The 262K context window is double GPT-4 Turbo's 128K capacity while undercutting most competitors on input pricing. At $0.07/Mtok input, it costs a small fraction (roughly 2%) of Claude Sonnet 4's $3.00/Mtok rate for ingesting large documents. Tencent's infrastructure means low-latency access for Asia-Pacific deployments. The preview label suggests active development and potential feature velocity.
Trade-offs
No public benchmarks means you can't compare reasoning quality, instruction-following, or coding ability against established models. Preview status implies API stability and feature completeness aren't guaranteed. Output pricing at $0.26/Mtok is competitive but not exceptional. Teams outside China may see higher latency. Without MMLU, HumanEval, or MT-Bench scores, you're running blind on capability gaps relative to GPT-4o or Claude.
Specifications
- Provider: tencent
- Category: llm
- Context length: 262,144 tokens
- Max output: 262,144 tokens
- Modalities: text
- License: proprietary
- Released: 2026-04-22
Pricing
- Input: $0.07/Mtok
- Output: $0.26/Mtok
- Model ID: tencent/hy3-preview
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool — one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| tencent | 262k | $0.07/Mtok | $0.26/Mtok | — | — | — |
Performance
Benchmarks
No public benchmark results have been published for Hy3 preview yet, so this section is empty for now.
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Legal Contract Analysis
Review these three contracts totaling 180,000 tokens. Extract all liability clauses, payment terms, and termination conditions into a comparison table. Flag any conflicting terms across contracts. Open in a Space →
Multi-Document Summarization
Summarize these twelve research papers (combined 150K tokens) into a 500-word executive brief. Highlight consensus findings, contradictions, and gaps in the literature. Open in a Space →
Codebase Context Query
Given this 200K token codebase dump, trace how user authentication flows from login endpoint through middleware to database. List every file and function involved. Open in a Space →
Batch Translation Check
Compare these 50 English-to-Chinese translation pairs (80K tokens total). Flag mistranslations, tone inconsistencies, and cultural localization errors. Provide corrected versions. Open in a Space →
Meeting Transcript Synthesis
From this 40K token transcript of five back-to-back meetings, create a unified action item list with owners, deadlines, and dependencies. Group by project workstream. Open in a Space →
Example outputs
Illustrative — representative of the model's voice and quality, not literal recordings.
Review this Python function for potential race conditions. It's meant to update a shared cache in a multi-threaded web server: [paste 40-line cache manager class]
The cache manager has three race conditions. First, the `get_or_compute` method checks cache existence then writes without locking the full sequence—two threads can both miss, both compute, both write. Second, the `evict_old_entries` background thread reads `self.access_times` while request threads modify it, risking KeyError. Third, `cache_size` is incremented/decremented separately from dict operations, so it can desync. Recommend: wrap the check-compute-write sequence in a lock, use `threading.RLock` for access_times updates, or switch to a thread-safe structure like `cachetools.TTLCache` with built-in expiry.
This example demonstrates Hy3's code analysis depth—it catches concurrency bugs that require understanding execution interleaving, not just syntax. The 262k token context window handles large codebases, though without public benchmarks we can't compare its accuracy to models like Claude 3.5 Sonnet or GPT-4 on standard threading-bug datasets.
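For concreteness, here is a minimal sketch of the locking approach that response recommends. The class shape is hypothetical, reconstructed from the description above, since the original 40-line cache manager isn't shown:

```python
import threading
import time

class CacheManager:
    """Hypothetical reconstruction with the check-compute-write race closed."""

    def __init__(self, ttl_seconds=300):
        self._lock = threading.RLock()  # reentrant, so nested calls don't deadlock
        self._cache = {}                # key -> value
        self._access_times = {}         # key -> last-access timestamp
        self._ttl = ttl_seconds

    def get_or_compute(self, key, compute_fn):
        # Holding the lock across check, compute, and write prevents two
        # threads from both missing, both computing, and both writing.
        with self._lock:
            if key in self._cache:
                self._access_times[key] = time.time()
                return self._cache[key]
            value = compute_fn()
            self._cache[key] = value
            self._access_times[key] = time.time()
            return value

    def evict_old_entries(self):
        # The background thread takes the same lock, so it never iterates
        # access_times while a request thread is mutating it.
        with self._lock:
            cutoff = time.time() - self._ttl
            for key in [k for k, t in self._access_times.items() if t < cutoff]:
                self._cache.pop(key, None)
                self._access_times.pop(key, None)
```

Note the trade-off: holding the lock while `compute_fn` runs serializes cache misses. Per-key locks, or a ready-made structure like `cachetools.TTLCache`, relax that if computation is slow. A separate `cache_size` counter also becomes unnecessary, since `len(self._cache)` under the lock is always consistent.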
Translate this technical product spec into a customer-facing feature announcement for our SaaS dashboard. Keep it under 150 words: [paste internal spec with API endpoints, latency targets, and database schema changes]
**Faster insights, right when you need them.** Your analytics dashboard now loads 3× faster, even with year-over-year comparisons. We've rebuilt the query engine to pre-calculate common reports overnight, so morning check-ins are instant. New this week: export any chart as a live-updating Slack post—your team sees the latest numbers without opening a browser. We've also added keyboard shortcuts (press `?` to see them) for power users who live in the dashboard. All existing reports and saved views work exactly as before; speed improvements are automatic.
Showcases Hy3's ability to reframe technical details into user benefits without losing accuracy. The output stays concrete (3× faster, specific features) rather than vague marketing speak. At $0.26/Mtok output, this is cost-competitive for high-volume content generation, though the lack of benchmark data makes it hard to assess consistency across varied writing tasks.
I'm debugging a Kubernetes deployment where pods restart every 90 seconds. Logs show 'OOMKilled' but memory requests are set to 2Gi and limits to 4Gi. The app is a Node.js API. What's likely happening?
The 'OOMKilled' status with those settings suggests your Node.js process is hitting the 4Gi limit, not the 2Gi request. Node's V8 heap doesn't auto-tune to container limits by default—it uses a percentage of system memory, which in Kubernetes means the node's total RAM, not the pod's cgroup limit. Your app likely allocates 5-6Gi before V8 triggers garbage collection, breaching the pod limit. Fix: set `--max-old-space-size=3072` in your Node startup command to cap heap at 3Gi, leaving 1Gi for buffers and overhead. Also check for memory leaks—90-second cycles are suspiciously regular.
Highlights Hy3's troubleshooting reasoning across infrastructure and runtime internals. The 262k context window is overkill for this example but valuable when users paste full deployment YAMLs and multi-service logs. The response is actionable and specific, though without benchmarks we can't verify accuracy against known Kubernetes failure-mode datasets.
Use-case deep-dives
When 262k context handles full contract stacks without chunking
A 4-person legal ops team at a Series B SaaS company needs to compare vendor agreements against their standard terms—often 8-12 contracts at once, each running 15-30 pages. Tencent Hy3 preview fits this scenario because the 262k token window holds roughly 200k words, meaning you can load the entire contract stack plus your playbook in a single prompt without retrieval engineering. At $0.07 input per million tokens, a typical comparison run (180k tokens in, 2k out) costs under $0.02. The trade-off: no public benchmarks means you're flying blind on accuracy until you test it against your own contract corpus. If your team already has a QA process for model outputs and values cost over proven performance, run a two-week pilot. Otherwise, wait for benchmark data or start with a model that has published legal reasoning scores.
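As a sanity check on that arithmetic, a quick sketch using the listed per-token prices (the token counts are the scenario's assumptions, not measurements):

```python
INPUT_PRICE = 0.07 / 1_000_000   # $ per input token, from the listing
OUTPUT_PRICE = 0.26 / 1_000_000  # $ per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated upstream cost of one request, in dollars."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# The contract-comparison run described above: 180k tokens in, 2k out.
print(f"${run_cost(180_000, 2_000):.4f}")  # ≈ $0.0131, comfortably under $0.02
```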
Why ultra-low input pricing matters for high-volume overnight jobs
A 12-person community platform reviews 40,000 user posts per night—flagging policy violations before the morning shift starts. Each post averages 150 tokens; the moderation prompt adds 800 tokens of context. That's 38M input tokens nightly. At $0.07/Mtok input, the job costs $2.66 in input fees alone (output is minimal—just violation flags). Compare that to models charging $0.50-$1.00 input: you'd pay $19-$38 for the same workload. Tencent Hy3 preview wins on cost structure here, but the lack of public safety benchmarks is a real risk. If you're moderating high-stakes content (healthcare, finance, minors), the savings don't justify the uncertainty. For lower-risk communities where you can tolerate a 2-3% miss rate and have human review as a backstop, this model cuts your overnight batch bill by 85-90%. Run a shadow deployment for two weeks to measure precision and recall against your current system before switching production traffic.
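For the shadow deployment that suggests, a minimal sketch of the precision/recall comparison; the flagged-ID sets are hypothetical stand-ins for the model's flags and your reviewers' confirmed violations:

```python
def precision_recall(model_flags: set, truth_flags: set) -> tuple:
    """Compare the model's flagged post IDs against human-reviewed ground truth."""
    true_positives = len(model_flags & truth_flags)
    precision = true_positives / len(model_flags) if model_flags else 0.0
    recall = true_positives / len(truth_flags) if truth_flags else 0.0
    return precision, recall

# Hypothetical two-week shadow run: IDs the model flagged vs. the
# violations human reviewers actually confirmed.
model_flags = {101, 102, 205, 333, 404}
truth_flags = {101, 205, 333, 350}
p, r = precision_recall(model_flags, truth_flags)
print(f"precision={p:.0%} recall={r:.0%}")  # 60% precision, 75% recall here
```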
When context window beats retrieval complexity for small teams
A 6-person DevOps team maintains 180 Markdown runbooks (totaling 140k tokens) covering deployment procedures, incident playbooks, and architecture decisions. They want a Slack bot that answers questions without building a vector database. Tencent Hy3 preview's 262k context window holds the entire knowledge base plus a 5k token conversation history in every request. At $0.07 input, a typical query (145k tokens in, 400 out) costs $0.01. For a team fielding 200 queries/week, that's $2/week—cheap enough that you skip the engineering overhead of chunking, embedding, and retrieval tuning. The catch: without benchmarks, you don't know if it hallucinates on edge-case runbook details. If your runbooks are mission-critical (database failover, security protocols), test it against 50 historical questions with known-good answers before going live. If most queries are low-stakes lookups where a human double-checks the output anyway, the simplicity and cost make this a strong fit.
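A sketch of that no-retrieval pattern, assuming the runbooks live as Markdown files; `client.chat(...)` is a placeholder, not a real Switchy or Tencent SDK call:

```python
from pathlib import Path

CONTEXT_LIMIT = 262_144  # tokens, from the listing
CHARS_PER_TOKEN = 4      # rough heuristic; measure with a real tokenizer

def build_prompt(runbook_dir: str, question: str, history: str = "") -> str:
    """Concatenate every runbook plus chat history into one prompt."""
    docs = "\n\n".join(p.read_text() for p in sorted(Path(runbook_dir).glob("*.md")))
    prompt = f"{docs}\n\n{history}\n\nQuestion: {question}"
    est_tokens = len(prompt) // CHARS_PER_TOKEN
    if est_tokens > CONTEXT_LIMIT:
        raise ValueError(f"~{est_tokens} tokens exceeds the {CONTEXT_LIMIT} window")
    return prompt

# answer = client.chat(model="tencent/hy3-preview",
#                      prompt=build_prompt("runbooks/", user_question))
```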
Frequently asked
Is Tencent Hy3 preview good for general text tasks?
Hy3 preview appears capable of standard text generation, summarization, and Q&A, but without public benchmarks there is no way to verify how well it performs. The 262k context window supports long documents. If you need proven capability on coding or reasoning, wait for benchmark data or test alternatives like GPT-4o or Claude 3.5 Sonnet that publish MMLU and HumanEval scores.
Is Tencent Hy3 cheaper than GPT-4o or Claude Sonnet?
Yes, significantly. At $0.07 input and $0.26 output per million tokens, Hy3 costs roughly 97% less than GPT-4o ($2.50/$10.00) and about 98% less than Claude 3.5 Sonnet ($3.00/$15.00). For high-volume applications where cost matters more than cutting-edge reasoning, this pricing makes Hy3 worth testing despite the lack of public performance data.
Can Hy3 handle 262k tokens in practice?
The advertised 262k context window theoretically supports 200+ page documents or entire codebases. However, without published needle-in-haystack or long-context retrieval benchmarks, real-world accuracy at maximum context is unverified. Test your specific use case before relying on full-window performance, especially for tasks requiring precise recall from early tokens.
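Since no long-context benchmark has been published, a minimal needle-in-a-haystack check you can run yourself looks like the sketch below; `client.chat(...)` is a placeholder for whatever SDK you call the model through:

```python
import random

FILLER = "The quarterly report was filed on schedule. "  # roughly 10 tokens

def make_haystack(needle: str, target_tokens: int = 250_000) -> str:
    """Bury one distinctive fact at a random depth in repetitive filler."""
    haystack = FILLER * (target_tokens // 10)  # rough count; verify with a tokenizer
    pos = random.randint(0, len(haystack))
    return haystack[:pos] + needle + " " + haystack[pos:]

needle = "The vault code is 4417."
prompt = make_haystack(needle) + "\n\nWhat is the vault code? Reply with the number only."
# response = client.chat(model="tencent/hy3-preview", prompt=prompt)  # placeholder API
# print("pass" if "4417" in response else "fail")
```

Repeat the check at several depths and context sizes before trusting full-window recall for anything mission-critical.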
How does Hy3 compare to other Tencent models?
Tencent hasn't published comparative data between Hy3 and their other offerings. The "preview" label suggests this is an experimental release, likely positioned below their flagship models in capability but above older generations in efficiency. Without version history or benchmark progression, treat this as a cost-optimized option rather than their performance leader.
Should I use Hy3 for production chatbots?
Only if cost is your primary constraint and you can tolerate unknown reliability. The lack of public benchmarks means no verified data on instruction-following accuracy, safety guardrails, or response quality. For customer-facing applications, start with models that publish LMSYS Chatbot Arena scores or similar metrics. Use Hy3 for internal tools where you can monitor and iterate.