Arcee AI: Trinity Mini
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function...
Verdict
Best for
- High-volume long-document processing
- Cost-sensitive context-heavy tasks
- Internal prototyping with extended context
- Budget-constrained reasoning workflows
Strengths
Trinity Mini's standout feature is cost efficiency at scale: $0.04/Mtok input pricing combined with 131K context makes it viable for processing entire codebases, legal documents, or research papers without the token costs of GPT-4 or Claude. The output pricing at $0.15/Mtok remains competitive for reasoning tasks. Arcee positions this as a reasoning-focused model, suggesting it's optimized for multi-step logic rather than raw generation speed.
Trade-offs
The absence of public benchmarks is a red flag — you can't compare Trinity Mini's reasoning quality against Gemini Flash, Haiku, or GPT-4o Mini without running your own evals. Arcee is a smaller vendor with less ecosystem support than Anthropic or OpenAI, so expect fewer integrations and slower feature updates. The proprietary license limits transparency into training data and fine-tuning methods, which matters for compliance-heavy teams.
Specifications
- Provider: arcee-ai
- Category: llm
- Context length: 131,072 tokens
- Max output: 131,072 tokens
- Modalities: text
- License: proprietary
- Released: 2025-12-01
Pricing
- Input: $0.04/Mtok
- Output: $0.15/Mtok
- Model ID: arcee-ai/trinity-mini
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
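To turn the per-token prices above into a per-request figure, here is a minimal sketch. The prices come from the Pricing list; the function name and the sample workload are illustrative assumptions, and the result is the upstream cost only, not Switchy credits.

```python
# Prices from the Pricing section, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 0.04
OUTPUT_PRICE_PER_MTOK = 0.15


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the upstream cost in USD for one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a 100,000-token document with a 1,000-token answer.
    print(f"${request_cost(100_000, 1_000):.5f} per request")
```

Because prices are quoted per million tokens, even a near-full context window costs well under a cent of input at this rate.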
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| arcee-ai | 131k | $0.04/Mtok | $0.15/Mtok | — | — | — |
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Multi-Document Synthesis
I'm providing three research papers below. Identify where their conclusions align, where they conflict, and what gaps remain unaddressed. Organize your analysis by theme, not by paper.
Codebase Reasoning
Here's a Python codebase with 15 files. Trace how user authentication flows from the API endpoint through middleware to the database layer. Flag any security concerns you spot.
Legal Document Comparison
Compare these two vendor agreements. Highlight differences in liability caps, termination clauses, and data handling obligations. Note which contract favors the buyer.
Budget-Conscious Summarization
Summarize this 40-page meeting transcript into a one-page executive brief. Focus on decisions made, action items assigned, and unresolved blockers.
Chain-of-Thought Debugging
This function returns incorrect results for edge cases. Walk through the logic step-by-step, identify where the bug occurs, and propose a fix with explanation.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Review this Python function for potential bugs. It's supposed to merge two sorted lists but sometimes returns incorrect results:

    def merge_sorted(a, b):
        result = []
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                result.append(a[i])
                i += 1
            else:
                result.append(b[j])
                j += 1
        return result
The function has a critical bug: it doesn't handle remaining elements after one list is exhausted. When the while loop exits, any leftover items in either list are ignored. Fix by adding after the loop:

```python
result.extend(a[i:])
result.extend(b[j:])
```

This ensures all elements from both lists appear in the final merged result. Without this, merge_sorted([1,3,5], [2,4]) would return [1,2,3,4] but miss the 5.
Trinity Mini's 131K token context window allows it to handle substantial codebases in a single prompt, making it suitable for reviewing multi-file refactors. The output demonstrates clear diagnostic reasoning, though without benchmark data we can't compare its accuracy against specialized code models like Codestral.
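For reference, applying the suggested fix to the function from the prompt gives the version below. Treat it as a sketch rather than output from the model: it assumes both inputs are already sorted, and it advances the matching index in each branch of the loop.

```python
def merge_sorted(a: list, b: list) -> list:
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Walk both lists, always taking the smaller head element.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    result.extend(a[i:])
    result.extend(b[j:])
    return result
```

At most one of the two `extend` calls adds anything, since the loop only exits once one list is fully consumed.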
Summarize the key regulatory changes in this 8-page compliance memo, focusing on what our engineering team needs to action before Q2. [memo text follows...]
Three immediate actions required:

1. **Data retention (by March 15)**: Update user deletion workflows to purge associated metadata within 30 days, not 90. Affects user_data and analytics_events tables.
2. **Consent logging (by April 1)**: Implement audit trail for all consent withdrawals. Legal requires timestamp, user_id, and withdrawal method stored for 7 years.
3. **Third-party processors (by March 30)**: Review contracts with Mixpanel and Segment. New regs require explicit data processing agreements referencing Article 28 standards.

Non-engineering: Finance handles the reporting changes in Section 4.
At $0.04 per million input tokens, Trinity Mini is cost-effective for processing lengthy documents repeatedly. The 131K context accommodates most corporate documents in one pass. The structured output shows good information extraction, though the model's summarization style may be more verbose than frontier models.
I'm preparing a workshop on async programming in JavaScript. Explain the event loop to intermediate developers who understand callbacks but haven't used promises much. Use a concrete analogy, not abstract CS theory.
Think of the event loop as a restaurant kitchen with one chef (the main thread). The chef takes orders from a ticket rail (the call stack) and completes them one at a time. When an order requires waiting—like "start baking this cake"—the chef doesn't stand there watching the oven. Instead, they hand that task to a timer, put the ticket aside (callback queue), and move to the next order. When the timer dings, the ticket goes back on the rail. The chef picks it up only when they finish their current task. This is why `setTimeout(fn, 0)` doesn't run immediately—it waits for the chef to clear the current ticket. Promises are pre-paid tickets: the kitchen commits to completing them, and you get a receipt (the promise object) you can pass around before the food arrives.
The model produces accessible technical explanations suitable for documentation or teaching. The analogy-driven approach works well for intermediate audiences. However, without benchmark scores, we can't assess whether Trinity Mini matches the pedagogical quality of models explicitly trained on educational content.
Use-case deep-dives
When Trinity Mini handles docs for lean engineering teams
A 4-person startup shipping features weekly needs API docs, changelog summaries, and internal runbooks generated from code comments and commit history. Trinity Mini at $0.04/$0.15 per Mtok sits in the sweet spot: cheap enough to run on every merge, with a 131k context window that ingests an entire microservice codebase plus the last 200 commits in one pass. You're trading benchmark certainty (no public evals yet) for cost efficiency and context depth. If your docs need legal precision or you're generating 10M+ tokens monthly, step up to a benchmarked model. Otherwise, this handles the 80% case—turning engineer notes into readable prose—without burning runway on inference costs.
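Before wiring this into a merge hook, it helps to check that the codebase and commit log actually fit in one prompt. The sketch below is a rough pre-flight check. The 131,072-token window comes from the Specifications list; the four-characters-per-token ratio and the output reserve are assumptions, and a real tokenizer will give different counts.

```python
CONTEXT_WINDOW_TOKENS = 131_072  # from the Specifications section
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text and code


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer will give different counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str], reserved_for_output: int = 4_000) -> bool:
    """Check whether all documents plus an output reserve fit in one prompt."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

If the check fails, the usual fallback is to drop the oldest commits or split the codebase by service before sending.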
Trinity Mini's 131k window for clause-level vendor analysis
A 9-person procurement team reviews SaaS contracts from 30+ vendors quarterly, flagging liability caps, auto-renewal terms, and data residency clauses. Trinity Mini's 131k context window fits 4-6 full contracts simultaneously, letting you ask 'which vendors allow EU-only data storage' in a single prompt instead of chaining calls. At $0.04/Mtok input, loading 100k tokens of legalese costs about $0.004, cheap enough to rerun comparisons whenever terms change. The risk: without public benchmarks, you can't verify accuracy against GPT-4 or Claude on legal reasoning. Run a 20-contract pilot, spot-check outputs against known clauses, and if error rate stays under 5%, this becomes your quarterly contract sprint workhorse.
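The pilot described above can be scored with a few lines of code. This is a sketch under stated assumptions: the `SpotCheck` record, its field names, and the case-insensitive exact-match comparison are all hypothetical, and real clause comparison may need a human judgment call rather than string equality.

```python
from dataclasses import dataclass


@dataclass
class SpotCheck:
    contract_id: str
    clause: str
    expected: str   # value a human reviewer confirmed
    extracted: str  # value the model returned


def error_rate(checks: list[SpotCheck]) -> float:
    """Fraction of spot-checked clauses where the model disagreed with the reviewer."""
    if not checks:
        raise ValueError("no spot checks recorded")
    wrong = sum(
        1 for c in checks
        if c.extracted.strip().lower() != c.expected.strip().lower()
    )
    return wrong / len(checks)


def passes_pilot(checks: list[SpotCheck], threshold: float = 0.05) -> bool:
    """True only when the error rate stays strictly under the threshold."""
    return error_rate(checks) < threshold
```

With only 20 contracts, a single miss on a one-clause-per-contract check already lands at 5%, so spot-checking several clauses per contract gives a more useful signal.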
When Trinity Mini routes 200 daily tickets for support ops
A 12-person SaaS company fields 200 support tickets daily across billing, technical, and feature requests. Trinity Mini tags each ticket (category, urgency, account tier) and drafts a first-response template, cutting triage time from 90 seconds to 15. At $0.15/Mtok output, generating 500-token responses for 200 tickets is 100,000 output tokens, or about $0.015/day (roughly $0.45/month) before input tokens are counted, far below frontier-model output pricing. The 131k window handles full ticket history plus knowledge base context without truncation. The gamble: no MMLU or HumanEval scores means you're flying blind on edge-case reasoning. If 10% of tickets need escalation anyway, this works. If your product is dev tools where technical accuracy is make-or-break, pay up for a benchmarked model instead.
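Since there are no public scores to lean on, it is worth validating every set of tags before it reaches the queue. The sketch below assumes you ask the model to return the tags as a JSON object. The three categories come from the scenario above; the field names, urgency levels, and account tiers are hypothetical placeholders.

```python
import json

# Categories come from the scenario; urgency and tier values are assumptions.
ALLOWED = {
    "category": {"billing", "technical", "feature_request"},
    "urgency": {"low", "medium", "high"},
    "account_tier": {"free", "pro", "enterprise"},
}


def parse_ticket_tags(raw: str) -> dict:
    """Parse and validate the model's JSON tags.

    Raises ValueError so the caller can route the ticket to human triage.
    """
    try:
        tags = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(tags, dict):
        raise ValueError("expected a JSON object")
    for field, values in ALLOWED.items():
        value = tags.get(field)
        if not isinstance(value, str) or value not in values:
            raise ValueError(f"unexpected {field}: {value!r}")
    # Drop any extra keys the model may have added.
    return {field: tags[field] for field in ALLOWED}
```

Anything that fails validation counts toward the escalation share mentioned above, which makes that 10% figure something you can measure rather than guess.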
Frequently asked
Is Arcee AI Trinity Mini good for general text tasks?
Trinity Mini handles standard text generation, summarization, and Q&A adequately for its price point. With a 131k token context window, it can process long documents without chunking. However, no public benchmarks exist to verify its performance against GPT-4o-mini or Claude Haiku, so you're buying on trust in Arcee's internal testing.
Is Trinity Mini cheaper than GPT-4o-mini or Claude Haiku?
Yes. At $0.04/$0.15 per million tokens, Trinity Mini undercuts GPT-4o-mini ($0.15/$0.60) and Claude Haiku ($0.25/$1.25) by roughly 73-88% across input and output. If you're running high-volume batch jobs where quality differences are marginal, the cost savings compound fast. For mission-critical work, the lack of benchmarks is a risk.
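The comparison is easy to recompute when any of these prices change. A minimal sketch, using only the prices quoted in this answer (the function and variable names are illustrative):

```python
def percent_cheaper(price: float, baseline: float) -> float:
    """Return how much cheaper `price` is than `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (1 - price / baseline) * 100


# (input, output) prices in USD per million tokens, as quoted in the answer above.
TRINITY_MINI = (0.04, 0.15)
BASELINES = {
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Haiku": (0.25, 1.25),
}

for name, (base_in, base_out) in BASELINES.items():
    print(
        f"{name}: input {percent_cheaper(TRINITY_MINI[0], base_in):.0f}% cheaper, "
        f"output {percent_cheaper(TRINITY_MINI[1], base_out):.0f}% cheaper"
    )
```

Keeping input and output separate matters because the savings differ between the two, and your workload's input-to-output ratio decides which one dominates the bill.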
Can Trinity Mini handle 131k tokens in practice?
The 131k context window is comparable to GPT-4o-mini's 128k spec, so it should handle full-length books or large codebases in a single prompt. Real-world performance depends on how well Arcee trained the attention mechanism at that scale. Without published needle-in-haystack or long-context benchmarks, you'll need to test your specific use case before committing production traffic.
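A simple needle-in-haystack check is one way to run that test yourself. The sketch below only builds the prompts and scores the answers; sending them to the model is left out because it depends on your client. All names here are hypothetical, and substring matching is a deliberately crude scoring rule.

```python
def build_needle_prompt(filler: str, needle: str, depth: float, target_chars: int) -> str:
    """Build a long prompt with `needle` inserted at a relative `depth` (0.0 to 1.0)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    if not filler:
        raise ValueError("filler must not be empty")
    # Repeat the filler text until it reaches the target length, then trim.
    repeats = target_chars // len(filler) + 1
    haystack = (filler * repeats)[:target_chars]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]


def recalled(answer: str, expected: str) -> bool:
    """Score one trial: did the model's answer contain the planted fact?"""
    return expected.lower() in answer.lower()
```

Running the same needle at several depths (for example 0.1, 0.5, and 0.9) and several prompt lengths shows whether recall degrades toward the middle or the end of the window, which is the failure pattern worth knowing about before production.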
How does Trinity Mini compare to other Arcee models?
Arcee positions Trinity Mini as a cost-optimized option in their lineup, likely smaller and faster than their flagship models. Despite the "Mini" suffix, it is a 26B-parameter sparse mixture-of-experts model that activates about 3B parameters per token, so its per-token compute is closer to a small dense model's than the total parameter count suggests. If you need Arcee's fine-tuning approach but can't justify premium pricing, this is the entry point.
Should I use Trinity Mini for customer-facing chatbots?
Only if you can afford to A/B test it against established alternatives. The $0.15/Mtok output cost makes it viable for moderate-traffic chat, and the context window handles multi-turn conversations. But without MMLU, HumanEval, or MT-Bench scores, you're guessing at instruction-following quality. Start with a pilot on non-critical flows.
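For the pilot itself, a deterministic split keeps each user on the same model across sessions. This is a sketch of one common approach, hash-based bucketing; the variant labels and the 10% default share are assumptions, not Switchy or Arcee features.

```python
import hashlib


def assign_variant(user_id: str, pilot_share: float = 0.1) -> str:
    """Deterministically route a share of users to the pilot model."""
    if not 0.0 <= pilot_share <= 1.0:
        raise ValueError("pilot_share must be between 0.0 and 1.0")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Compare the first 8 bytes of the hash against an integer threshold.
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(pilot_share * 2**64)
    return "trinity-mini" if bucket < threshold else "baseline"
```

Because the assignment depends only on the user ID, you can raise `pilot_share` gradually and every user already in the pilot stays in it.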