
Tencent: Hy3 preview

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes.

Anyone in the Space can @-mention Tencent: Hy3 preview with the team's shared context — pooled credits, one chat, one memory.

All models

Starter is free forever — 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Tencent's Hy3 preview offers a massive 262K token context window at aggressive pricing — $0.07 input makes it one of the cheapest ways to process long documents. Without public benchmarks, you're trading proven performance data for cost savings and context capacity. Best for teams willing to test a newer model on high-volume, long-context workloads where price per token matters more than established track records.

Best for

  • Long document processing on tight budgets
  • High-volume batch summarization tasks
  • Context-heavy workflows needing 200K+ tokens
  • Cost-sensitive Chinese language applications

Strengths

The 262K context window is double GPT-4 Turbo's 128K capacity while undercutting most competitors on input pricing. At $0.07/Mtok input, it costs a small fraction (under 3%) of Claude 3.5 Sonnet's $3.00/Mtok for ingesting large documents. Tencent's infrastructure means low-latency access for Asia-Pacific deployments. The preview label suggests active development and potential feature velocity.

Trade-offs

No public benchmarks means you can't compare reasoning quality, instruction-following, or coding ability against established models. Preview status implies API stability and feature completeness aren't guaranteed. Output pricing at $0.26/Mtok is competitive but not exceptional. Teams outside China may see higher latency. Without MMLU, HumanEval, or MT-Bench scores, you're running blind on capability gaps relative to GPT-4o or Claude.

Specifications

Provider
tencent
Category
llm
Context length
262,144 tokens
Max output
262,144 tokens
Modalities
text
License
proprietary
Released
2026-04-22

Pricing

Input
$0.07/Mtok
Output
$0.26/Mtok
Model ID
tencent/hy3-preview

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool — one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$2.19
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool — one plan, one balance for everyone.
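The calculator's numbers can be reproduced with straightforward arithmetic. Switchy doesn't publish the per-message token split it assumes, so the ~1,050 input / ~417 output tokens per message below are a back-of-envelope assumption chosen to land on the same monthly totals:

```python
# Rough reconstruction of the team cost calculator.
# Per-message token counts are assumptions, not published Switchy figures.
SEATS = 5
MSGS_PER_DAY = 80
DAYS = 30
INPUT_TOK_PER_MSG = 1_050        # assumption
OUTPUT_TOK_PER_MSG = 417         # assumption
INPUT_PRICE = 0.07 / 1_000_000   # $/token (Hy3 preview input)
OUTPUT_PRICE = 0.26 / 1_000_000  # $/token (Hy3 preview output)

msgs = SEATS * MSGS_PER_DAY * DAYS            # 12,000 msgs/month
input_tokens = msgs * INPUT_TOK_PER_MSG
output_tokens = msgs * OUTPUT_TOK_PER_MSG
total_tokens = input_tokens + output_tokens   # ~17.6M tokens/month
monthly_cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

print(f"{total_tokens / 1e6:.1f}M tokens, ${monthly_cost:.2f}/month")
# → 17.6M tokens, $2.18/month
```

Under these assumptions the estimate lands within a cent of the $2.19 shown above; the exact figure depends on the real input/output split per message.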

Providers

| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|----------|---------|-------|--------|-------------|------------|------------|
| tencent  | 262k    | $0.07/Mtok | $0.26/Mtok | | | |

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Multi-Document Summarization

Summarize these twelve research papers (combined 150K tokens) into a 500-word executive brief. Highlight consensus findings, contradictions, and gaps in the literature.
Open in a Space →

Codebase Context Query

Given this 200K token codebase dump, trace how user authentication flows from login endpoint through middleware to database. List every file and function involved.
Open in a Space →

Batch Translation Check

Compare these 50 English-to-Chinese translation pairs (80K tokens total). Flag mistranslations, tone inconsistencies, and cultural localization errors. Provide corrected versions.
Open in a Space →

Meeting Transcript Synthesis

From this 40K token transcript of five back-to-back meetings, create a unified action item list with owners, deadlines, and dependencies. Group by project workstream.
Open in a Space →

Example outputs

Illustrative — representative of the model's voice and quality, not literal recordings.

Prompt

Review this Python function for potential race conditions. It's meant to update a shared cache in a multi-threaded web server: [paste 40-line cache manager class]

Output

The cache manager has three race conditions. First, the `get_or_compute` method checks cache existence then writes without locking the full sequence—two threads can both miss, both compute, both write. Second, the `evict_old_entries` background thread reads `self.access_times` while request threads modify it, risking KeyError. Third, `cache_size` is incremented/decremented separately from dict operations, so it can desync. Recommend: wrap the check-compute-write sequence in a lock, use `threading.RLock` for access_times updates, or switch to a thread-safe structure like `cachetools.TTLCache` with built-in expiry.

Notes

This example demonstrates Hy3's code analysis depth—it catches concurrency bugs that require understanding execution interleaving, not just syntax. The 262k token context window handles large codebases, though without public benchmarks we can't compare its accuracy to models like Claude 3.5 Sonnet or GPT-4 on standard threading-bug datasets.
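The check-compute-write fix the example output recommends can be sketched in a few lines. This is a hypothetical `CacheManager` illustrating the pattern, not the class from the pasted prompt:

```python
import threading

class CacheManager:
    """Minimal thread-safe get-or-compute cache (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def get_or_compute(self, key, compute_fn):
        # Hold the lock across check + compute + write so two threads
        # cannot both miss, both recompute, and both write the same key.
        with self._lock:
            if key not in self._cache:
                self._cache[key] = compute_fn()
            return self._cache[key]
```

Holding the lock through `compute_fn` trades throughput for correctness; per-key locks, or `cachetools.TTLCache` guarded by a single lock, avoid serializing unrelated keys.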

Prompt

Translate this technical product spec into a customer-facing feature announcement for our SaaS dashboard. Keep it under 150 words: [paste internal spec with API endpoints, latency targets, and database schema changes]

Output

**Faster insights, right when you need them.** Your analytics dashboard now loads 3× faster, even with year-over-year comparisons. We've rebuilt the query engine to pre-calculate common reports overnight, so morning check-ins are instant. New this week: export any chart as a live-updating Slack post—your team sees the latest numbers without opening a browser. We've also added keyboard shortcuts (press `?` to see them) for power users who live in the dashboard. All existing reports and saved views work exactly as before; speed improvements are automatic.

Notes

Showcases Hy3's ability to reframe technical details into user benefits without losing accuracy. The output stays concrete (3× faster, specific features) rather than vague marketing speak. At $0.26/Mtok output, this is cost-competitive for high-volume content generation, though the lack of benchmark data makes it hard to assess consistency across varied writing tasks.

Prompt

I'm debugging a Kubernetes deployment where pods restart every 90 seconds. Logs show 'OOMKilled' but memory requests are set to 2Gi and limits to 4Gi. The app is a Node.js API. What's likely happening?

Output

The 'OOMKilled' status with those settings suggests your Node.js process is hitting the 4Gi limit, not the 2Gi request. Node's V8 heap doesn't auto-tune to container limits by default: it sizes itself from system memory, which in Kubernetes means the node's total RAM, not the pod's cgroup limit. Your app's heap can therefore grow right up to the 4Gi limit before V8 feels any pressure to collect, at which point the kernel kills the pod. Fix: set `--max-old-space-size=3072` in your Node startup command to cap heap at 3Gi, leaving 1Gi for buffers and overhead. Also check for memory leaks; 90-second cycles are suspiciously regular.

Notes

Highlights Hy3's troubleshooting reasoning across infrastructure and runtime internals. The 262k context window is overkill for this example but valuable when users paste full deployment YAMLs and multi-service logs. The response is actionable and specific, though without benchmarks we can't verify accuracy against known Kubernetes failure-mode datasets.
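The heap cap the example output recommends can be applied without rebuilding the image by passing the flag through `NODE_OPTIONS`. A minimal container-spec sketch, with illustrative names for the container and image:

```yaml
containers:
  - name: api
    image: my-node-api:latest   # illustrative
    env:
      # Cap V8's old-space heap at 3Gi so the process collects garbage
      # well before the pod's 4Gi cgroup limit triggers an OOMKill.
      - name: NODE_OPTIONS
        value: "--max-old-space-size=3072"
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "4Gi"
```

`--max-old-space-size` is on the allow-list of flags `NODE_OPTIONS` accepts, so no entrypoint change is needed.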

Use-case deep-dives

Multi-document contract comparison

When 262k context handles full contract stacks without chunking

A 4-person legal ops team at a Series B SaaS company needs to compare vendor agreements against their standard terms—often 8-12 contracts at once, each running 15-30 pages. Tencent Hy3 preview fits this scenario because the 262k token window holds roughly 200k words, meaning you can load the entire contract stack plus your playbook in a single prompt without retrieval engineering. At $0.07 input per million tokens, a typical comparison run (180k tokens in, 2k out) costs under $0.02. The trade-off: no public benchmarks means you're flying blind on accuracy until you test it against your own contract corpus. If your team already has a QA process for model outputs and values cost over proven performance, run a two-week pilot. Otherwise, wait for benchmark data or start with a model that has published legal reasoning scores.

Overnight batch content moderation

Why ultra-low input pricing matters for high-volume overnight jobs

A 12-person community platform reviews 40,000 user posts per night—flagging policy violations before the morning shift starts. Each post averages 150 tokens; the moderation prompt adds 800 tokens of context. That's 38M input tokens nightly. At $0.07/Mtok input, the job costs $2.66 in input fees alone (output is minimal—just violation flags). Compare that to models charging $0.50-$1.00 input: you'd pay $19-$38 for the same workload. Tencent Hy3 preview wins on cost structure here, but the lack of public safety benchmarks is a real risk. If you're moderating high-stakes content (healthcare, finance, minors), the savings don't justify the uncertainty. For lower-risk communities where you can tolerate a 2-3% miss rate and have human review as a backstop, this model cuts your overnight batch bill by 85-90%. Run a shadow deployment for two weeks to measure precision and recall against your current system before switching production traffic.
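The overnight-batch arithmetic above works out as follows; the post volume and per-post token counts are the scenario's own assumptions:

```python
POSTS_PER_NIGHT = 40_000
TOKENS_PER_POST = 150        # average post length (scenario assumption)
PROMPT_OVERHEAD = 800        # moderation instructions per request
INPUT_PRICE_PER_MTOK = 0.07  # Hy3 preview input price

input_tokens = POSTS_PER_NIGHT * (TOKENS_PER_POST + PROMPT_OVERHEAD)
nightly_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
print(f"{input_tokens / 1e6:.0f}M input tokens -> ${nightly_cost:.2f}/night")
# → 38M input tokens -> $2.66/night

# Same workload at typical $0.50-$1.00/Mtok input pricing:
for price in (0.50, 1.00):
    print(f"  at ${price:.2f}/Mtok: ${input_tokens / 1e6 * price:.2f}")
```

Output tokens (a few flag fields per post) add pennies at $0.26/Mtok, which is why the comparison above ignores them.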

Internal knowledge base Q&A

When context window beats retrieval complexity for small teams

A 6-person DevOps team maintains 180 Markdown runbooks (totaling 140k tokens) covering deployment procedures, incident playbooks, and architecture decisions. They want a Slack bot that answers questions without building a vector database. Tencent Hy3 preview's 262k context window holds the entire knowledge base plus a 5k token conversation history in every request. At $0.07 input, a typical query (145k tokens in, 400 out) costs $0.01. For a team fielding 200 queries/week, that's $2/week—cheap enough that you skip the engineering overhead of chunking, embedding, and retrieval tuning. The catch: without benchmarks, you don't know if it hallucinates on edge-case runbook details. If your runbooks are mission-critical (database failover, security protocols), test it against 50 historical questions with known-good answers before going live. If most queries are low-stakes lookups where a human double-checks the output anyway, the simplicity and cost make this a strong fit.
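Skipping retrieval entirely means the bot rebuilds one large prompt per query. A minimal sketch of that assembly step; the Markdown file layout and the 4-characters-per-token heuristic are assumptions, not a real tokenizer:

```python
from pathlib import Path

CONTEXT_LIMIT = 262_144   # Hy3 preview's advertised window
CHARS_PER_TOKEN = 4       # rough heuristic, not a real tokenizer

def build_prompt(runbook_dir: str, question: str, history: str = "") -> str:
    """Concatenate every runbook plus the question into one prompt."""
    docs = []
    for path in sorted(Path(runbook_dir).glob("*.md")):
        docs.append(f"## {path.name}\n{path.read_text()}")
    prompt = "\n\n".join(docs) + f"\n\n{history}\n\nQuestion: {question}"
    est_tokens = len(prompt) // CHARS_PER_TOKEN
    if est_tokens > CONTEXT_LIMIT:
        raise ValueError(f"~{est_tokens} tokens exceeds the {CONTEXT_LIMIT} window")
    return prompt
```

The whole "index" is a directory of Markdown files, so updating the knowledge base is a git commit rather than a re-embedding job.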

Frequently asked

Is Tencent Hy3 preview good for general text tasks?

Hy3 preview handles standard text generation, summarization, and Q&A adequately, but without public benchmarks it's impossible to verify performance claims. The 262k context window supports long documents. If you need proven capability on coding or reasoning, wait for benchmark data or test alternatives like GPT-4o or Claude 3.5 Sonnet that publish MMLU and HumanEval scores.

Is Tencent Hy3 cheaper than GPT-4o or Claude Sonnet?

Yes, significantly. At $0.07 input and $0.26 output per million tokens, Hy3 costs roughly 97% less than GPT-4o ($2.50/$10.00) and about 98% less than Claude 3.5 Sonnet ($3.00/$15.00). For high-volume applications where cost matters more than cutting-edge reasoning, this pricing makes Hy3 worth testing despite the lack of public performance data.

Can Hy3 handle 262k tokens in practice?

The advertised 262k context window theoretically supports 200+ page documents or entire codebases. However, without published needle-in-haystack or long-context retrieval benchmarks, real-world accuracy at maximum context is unverified. Test your specific use case before relying on full-window performance, especially for tasks requiring precise recall from early tokens.
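A needle-in-a-haystack spot check is cheap to run before trusting full-window recall. In this sketch, `ask_model` is a stand-in for whatever client wrapper you use; it is not a Switchy or Tencent API:

```python
def needle_test(ask_model, filler: str, depths=(0.1, 0.5, 0.9), copies=500):
    """Plant a fact at several depths in a long context and check recall.

    ask_model(prompt) -> str is your own wrapper around the model API
    (assumed, not a real client signature).
    """
    needle = "The maintenance password is PLUM-7291."
    results = {}
    for depth in depths:
        chunks = [filler] * copies
        # Insert the needle at the chosen fractional depth of the context.
        chunks.insert(int(depth * copies), needle)
        prompt = ("\n".join(chunks)
                  + "\n\nWhat is the maintenance password? Answer with the code only.")
        results[depth] = "PLUM-7291" in ask_model(prompt)
    return results
```

Scaling `copies` and the filler length lets you probe recall near the full 262k window; a miss at depth 0.1 but not 0.9 is the classic "lost in the middle-to-front" failure to watch for.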

How does Hy3 compare to other Tencent models?

Tencent hasn't published comparative data between Hy3 and their other offerings. The "preview" label suggests this is an experimental release, likely positioned below their flagship models in capability but above older generations in efficiency. Without version history or benchmark progression, treat this as a cost-optimized option rather than their performance leader.

Should I use Hy3 for production chatbots?

Only if cost is your primary constraint and you can tolerate unknown reliability. The lack of public benchmarks means no verified data on instruction-following accuracy, safety guardrails, or response quality. For customer-facing applications, start with models that publish LMSYS Chatbot Arena scores or similar metrics. Use Hy3 for internal tools where you can monitor and iterate.

Data last verified 22 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.