
Relace: Relace Search

The relace-search model runs 4-12 `view_file` and `grep` tool calls in parallel to explore a codebase and return the files relevant to the user's request. In contrast to RAG, relace-search performs agentic...
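The parallel tool-call pattern described above can be sketched roughly as follows. This is a minimal illustration, not Relace's implementation: the in-memory `CODEBASE`, the `run_parallel` helper, and the signatures of `grep` and `view_file` are all assumptions made for the example; only the two tool names come from the description.

```python
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical in-memory "codebase"; a real agent would read files from disk.
CODEBASE = {
    "auth/login.py": "def login(user):\n    return check_token(user)\n",
    "auth/token.py": "def check_token(user):\n    return user.token is not None\n",
    "README.md": "Project overview\n",
}

def grep(pattern: str) -> list[str]:
    """Return the sorted paths whose contents match the regex pattern."""
    rx = re.compile(pattern)
    return sorted(path for path, text in CODEBASE.items() if rx.search(text))

def view_file(path: str) -> str:
    """Return the contents of one file."""
    return CODEBASE[path]

def run_parallel(calls):
    """Run a batch of (tool, argument) calls concurrently, keeping call order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(tool, arg) for tool, arg in calls]
        return [future.result() for future in futures]

# One exploration round: four tool calls issued together (assumed batch).
batch = [
    (grep, r"check_token"),
    (grep, r"def login"),
    (view_file, "auth/token.py"),
    (view_file, "README.md"),
]
results = run_parallel(batch)

# Files hit by either grep are treated as relevant to the request.
relevant_files = sorted(set(results[0]) | set(results[1]))
print(relevant_files)
```

In a real agent loop, the model would pick each batch itself based on earlier results; the sketch fixes one batch so it stays runnable.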

Anyone in the Space can @-mention Relace: Relace Search with the team's shared context - pooled credits, one chat, one memory.


Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Relace Search is a specialized model optimized for search and retrieval workflows, designed to synthesize information from large document sets within its 256K context window. At $1/$3 per Mtok (input/output), it sits in the budget tier, but there is no public benchmark data to validate performance claims. It is best for teams already committed to Relace's ecosystem who need a cost-effective option for search-heavy tasks. Evaluate it carefully against established alternatives such as GPT-4o-mini or Claude Haiku, which have published benchmark results at comparable or lower price points.

Best for

Document search and synthesis tasks
Budget-constrained RAG pipelines
Large context window retrieval workflows
Teams using Relace infrastructure

Strengths

The 256K context window provides substantial headroom for multi-document analysis without chunking strategies. Pricing undercuts many mid-tier models on input tokens, making it economical for high-volume retrieval tasks where you're feeding large contexts repeatedly. The search-specific optimization suggests tuning for information extraction and relevance ranking, which can outperform general-purpose models on retrieval-focused prompts if the training data aligns with your domain.

Trade-offs

Zero public benchmarks means you're flying blind on reasoning quality, factual accuracy, and instruction-following compared to models with MMLU, HumanEval, or GPQA scores. The search specialization likely comes at the cost of general reasoning and creative tasks: expect weaker performance on code generation, mathematical problem-solving, or open-ended writing compared to GPT-4o-mini. Output pricing at $3/Mtok is 3x the input rate, so verbose responses cost disproportionately more than long prompts.

Specifications

Provider: relace
Category: llm
Context length: 256,000 tokens
Max output: 128,000 tokens
Modalities: text
License: proprietary
Released: 2025-12-08

Pricing

Input: $1.00/Mtok
Output: $3.00/Mtok
Model ID: relace/relace-search

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats: 5 people
Messages / seat / day: 80
Avg turn size: 2 ktok
Output share: 30 %

Estimated monthly spend

$28.16

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
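The estimate above can be reproduced with a few lines of arithmetic. The sketch below is not Switchy's calculator; the function name `monthly_spend` is made up for illustration, and the 22 billable days per month is an assumption inferred from the 17.6M-token total (5 seats × 80 messages × 2,000 tokens × 22 days).

```python
INPUT_PRICE = 1.00   # USD per million input tokens (from the Pricing section)
OUTPUT_PRICE = 3.00  # USD per million output tokens (from the Pricing section)

def monthly_spend(seats, msgs_per_seat_per_day, turn_tokens, output_share, days=22):
    """Return (tokens_per_month, usd_per_month).

    days=22 is an assumption inferred from the page's 17.6M-token figure.
    """
    tokens = seats * msgs_per_seat_per_day * turn_tokens * days
    # Blended price per Mtok, weighted by the share of output tokens.
    blended = (1 - output_share) * INPUT_PRICE + output_share * OUTPUT_PRICE
    return tokens, tokens / 1_000_000 * blended

tokens, usd = monthly_spend(
    seats=5, msgs_per_seat_per_day=80, turn_tokens=2_000, output_share=0.30
)
print(f"{tokens / 1e6:.1f}M tokens / month, ${usd:.2f}")
```

With a 30 % output share the blended rate works out to $1.60/Mtok, which is why the total sits above the $1.00 input price.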

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
relace	256k	$1.00/Mtok	$3.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Multi-Document Synthesis

Review these five research papers and identify the three most significant points of consensus and the two major areas of disagreement. Cite specific papers when referencing claims.


Contract Clause Extraction

Find all indemnification clauses in these contracts and summarize the scope of liability protection in each. List them in order of most to least protective for the service provider.


Technical Documentation Search

I need to implement user authentication with OAuth2. Find the relevant API endpoints, required parameters, and example request formats from this documentation.


Customer Feedback Analysis

Analyze these 200 support tickets and identify the top five feature requests by frequency. For each, provide example quotes and estimate how many users mentioned it.


Competitive Intelligence Report

Compare pricing strategies, target markets, and key differentiators across these competitor reports. Highlight gaps in our positioning relative to the market leaders.


Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Search our internal knowledge base for all mentions of 'API rate limiting' across documentation, Slack threads, and support tickets from the last 6 months. Prioritize by relevance and recency.

Output

Found 47 relevant results across your workspace:

**Top Results:**
1. Engineering Wiki: "Rate Limiting Implementation Guide" (updated 3 weeks ago) - Covers current 1000 req/min default, burst allowances, and override procedures.
2. Slack #engineering thread (2 months ago) - Discussion of customer complaints about 429 errors; decision to increase limits for enterprise tier.
3. Support ticket #8834 (1 month ago) - Customer request for rate limit increase; resolved by upgrading plan.

Common themes: Most recent discussions focus on the Q3 limit increases. Documentation appears current. Two open support tickets mention confusion about per-endpoint vs. global limits.

Notes

This example highlights Relace's strength in unified search across heterogeneous data sources with temporal awareness. The 256K context window enables processing large result sets with surrounding context. However, without benchmark data, we can't verify retrieval accuracy against established search quality metrics.

Prompt

Find all decisions made about our database migration strategy. Include who decided, when, and what alternatives were considered. Cross-reference meeting notes, RFCs, and email threads.

Output

**Database Migration Timeline:**

**Initial Decision (Jan 2024):** Engineering lead Sarah Chen proposed PostgreSQL → CockroachDB in RFC-089. Alternatives considered: staying on Postgres with read replicas, moving to Aurora. Decision drivers: horizontal scalability, reduced operational overhead.

**Reversal (Mar 2024):** Team meeting notes show cost concerns emerged during POC phase. CockroachDB licensing projected at $180K/year vs. $45K for Aurora. Decision: pivot to Aurora, accepted by VP Engineering.

**Current Status:** Migration 60% complete per last sprint review. No further strategy changes documented.

Notes

Demonstrates Relace's ability to construct decision timelines by connecting fragments across document types. The model synthesizes who-what-when from unstructured sources. Trade-off: at $3/Mtok output, generating comprehensive summaries for broad queries could become expensive compared to traditional search engines that return links.

Prompt

What have we learned about customer churn in the past quarter? Pull insights from sales calls, customer success notes, exit surveys, and analytics reports. Group by common themes.

Output

**Q1 2024 Churn Insights (23 churned accounts analyzed):**

**Theme 1: Onboarding Friction (9 accounts)** - Exit surveys cite "too complex to set up" and "needed more hand-holding." CS notes show 6 of these accounts never completed integration despite 3+ support touchpoints.

**Theme 2: Missing Features (7 accounts)** - Sales call transcripts reveal requests for SSO (5 mentions) and advanced reporting (4 mentions). Two accounts explicitly moved to competitors offering these.

**Theme 3: Pricing Concerns (7 accounts)** - Analytics show these were small teams (<10 users) who hit usage limits. Exit surveys mention "outgrew free tier but paid tier too expensive."

Recommendation pattern: Multiple sources suggest improved onboarding would address largest churn segment.

Notes

Shows Relace's strength in qualitative analysis across mixed data types—transcripts, surveys, structured logs. The model identifies patterns humans might miss when data is siloed. The 256K window allows processing many customer records in one query. However, accuracy depends on the quality of source tagging and metadata, which isn't benchmarked here.

Use-case deep-dives

Multi-document research synthesis

When 256K context beats RAG for quarterly board decks

A 12-person consulting firm builds quarterly board decks by synthesizing 40-60 client documents: contracts, meeting notes, financial summaries. Relace Search's 256K context window lets you load all source material in one prompt and ask cross-document questions without chunking or retrieval logic. At $1/Mtok input and $3/Mtok output, a typical synthesis run (200K tokens in, 5K out) costs about $0.22. This beats building a RAG pipeline if you're under 100 decks/year and your documents change every quarter. Above that volume, or if you need sub-second responses, switch to a dedicated search stack. For ad-hoc research where setup time matters more than per-query cost, Relace Search is a reasonable fit.

Legal contract comparison

Why 256K context works for side-by-side contract review

A 4-person legal ops team reviews vendor contracts against a master template, flagging deviations in liability caps, termination clauses, and data-handling terms. Relace Search's 256K window fits 8-12 standard contracts plus the template in a single context, so the model sees all clauses at once and catches cross-reference conflicts that chunked retrieval misses. At $1 input / $3 output per Mtok, a 6-contract comparison (180K in, 8K out) runs $0.20. Without public benchmarks, you're betting on the context window doing the heavy lifting—if accuracy matters more than cost, test against GPT-4 Turbo or Claude 3.5 Sonnet first. For teams doing under 200 comparisons/month, the simplicity trade-off is worth it.

Customer support ticket triage

When long context helps—and when it doesn't—for ticket routing

A 20-person SaaS support team routes 300 tickets/day across billing, technical, and account issues. Relace Search's 256K context lets you load a week of ticket history plus internal runbooks in one prompt, so the model sees patterns across recent escalations. At $1 input / $3 output per Mtok, routing 300 tickets/day (avg 2K in, 200 tokens out per ticket) costs $0.78/day. That figure assumes a short 2K-token prompt per ticket; sending the full history with every request raises input cost proportionally (for example, 200K tokens of context per ticket would cost about $60/day in input alone). The problem: no public benchmarks means you can't verify classification accuracy before committing. If ticket volume crosses 500/day or you need sub-200ms routing, a fine-tuned smaller model (Llama 3.1 8B, Mistral 7B) will beat this on cost and speed. Below that threshold, Relace Search works if you value deployment speed over proven performance.
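The per-run figures quoted in these three deep-dives all follow from the same formula. The sketch below recomputes them from the listed prices; `request_cost` is a name chosen for illustration, and the token counts are the ones stated in the scenarios.

```python
def request_cost(input_tokens, output_tokens, input_price=1.00, output_price=3.00):
    """USD cost of one request; prices are per million tokens (from Pricing)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Scenario token counts taken from the deep-dives above.
board_deck_run = request_cost(200_000, 5_000)       # one synthesis run
contract_review = request_cost(180_000, 8_000)      # one 6-contract comparison
ticket_routing_day = 300 * request_cost(2_000, 200) # 300 tickets in one day

for label, usd in [
    ("board deck run", board_deck_run),
    ("contract comparison", contract_review),
    ("ticket routing per day", ticket_routing_day),
]:
    print(f"{label}: ${usd:.3f}")
```

The unrounded values are $0.215, $0.204, and $0.780, which the text rounds to $0.22, $0.20, and $0.78. At the 100 decks/year threshold mentioned for board decks, model spend would be about $21.50 per year.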

Frequently asked

Is Relace Search good for retrieval-augmented generation?

Yes, with a 256k token context window, Relace Search handles large document sets for RAG workflows. The name suggests search-optimized architecture, likely tuned for semantic retrieval and ranking tasks. At $1/Mtok input, it's economical for ingesting long contexts repeatedly. However, no public benchmarks exist yet to verify retrieval accuracy against alternatives like Claude or GPT-4.

Is Relace Search cheaper than GPT-4 for search tasks?

Yes. At $1 input / $3 output per Mtok, Relace undercuts GPT-4 Turbo ($10/$30) by 10x on input and output. For search workloads where you process large corpora but generate short responses, the input savings dominate. If you're running hundreds of queries daily against the same knowledge base, Relace's pricing makes it viable where GPT-4 would burn budget.
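To see where the savings come from, here is a small sketch comparing the two price points on one search-style query. The GPT-4 Turbo prices are the ones quoted in the answer above; the 200K-in / 5K-out query shape is a hypothetical example, not a measured workload.

```python
RELACE = (1.00, 3.00)        # USD per Mtok (input, output), from this page
GPT4_TURBO = (10.00, 30.00)  # USD per Mtok, as quoted in the answer above

def cost(prices, input_tokens, output_tokens):
    """USD cost of one request at the given (input, output) prices per Mtok."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical search-style query: large input, short output.
inp, out = 200_000, 5_000

relace_usd = cost(RELACE, inp, out)
gpt4_usd = cost(GPT4_TURBO, inp, out)
saving = gpt4_usd - relace_usd

# Portion of the saving that comes from input tokens alone.
input_saving = inp * (GPT4_TURBO[0] - RELACE[0]) / 1_000_000
input_share = input_saving / saving

print(f"Relace ${relace_usd:.3f} vs GPT-4 Turbo ${gpt4_usd:.3f}")
print(f"input tokens account for {input_share:.0%} of the saving")
```

Because both prices are lower by the same factor, the total is 10x cheaper whatever the input/output mix; the mix only decides how much of the dollar saving comes from the input side.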

Can Relace Search handle 200k token documents in one query?

Yes, the 256k context window accommodates 200k token documents with room for your query and system prompt. That's roughly 150,000 words or 500 pages of text. Practical performance depends on whether the model maintains coherence across the full window—something we can't verify without published benchmarks. Test with your actual document sizes before committing to production.
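A quick way to sanity-check a document against the window before sending it is sketched below. The conversion factors (0.75 words per token, 300 words per page) are rough assumptions chosen to match the figures in the answer, and the sketch assumes the prompt and reserved output share the same 256,000-token window, which is typical but not confirmed here for this model.

```python
CONTEXT_WINDOW = 256_000  # tokens, from Specifications
WORDS_PER_TOKEN = 0.75    # assumption: rough average for English text
WORDS_PER_PAGE = 300      # assumption: typical page of prose

def headroom(doc_tokens, prompt_tokens, reserved_output_tokens):
    """Tokens left in the window; a negative value means the request will not fit."""
    return CONTEXT_WINDOW - doc_tokens - prompt_tokens - reserved_output_tokens

doc_tokens = 200_000
# Hypothetical budget: 2K tokens of query and system prompt, 8K reserved for output.
remaining = headroom(doc_tokens, prompt_tokens=2_000, reserved_output_tokens=8_000)

words = doc_tokens * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"{remaining} tokens of headroom; about {words:,.0f} words or {pages:,.0f} pages")
```

Actual token counts depend on the model's tokenizer, so measure real documents rather than relying on the word-based estimate.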

How does Relace Search compare to semantic search APIs?

Relace appears positioned between traditional vector search (Pinecone, Weaviate) and full LLMs. Vector search is faster and cheaper for pure retrieval but can't reason about results. Full LLMs like GPT-4 reason well but cost more. Relace likely splits the difference—better reasoning than embeddings, lower cost than frontier models. Without benchmarks, you're betting on that middle ground making sense for your use case.

Should I use Relace Search for customer support knowledge bases?

Probably, if your KB exceeds 50k tokens and you need semantic understanding, not just keyword matching. The 256k window lets you load entire documentation sets in-context rather than chunking into vector DBs. At $1/Mtok input, repeated queries against the same KB stay affordable. The $3/Mtok output cost is reasonable for concise answers. Main risk: no public accuracy data means you'll need to pilot it yourself first.