OpenAI: GPT-4o-mini Search Preview
GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
Anyone in the Space can @-mention OpenAI: GPT-4o-mini Search Preview with the team's shared context - pooled credits, one chat, one memory.
Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.
Verdict
Best for
- Customer support with live product info
- Research summaries requiring current data
- Cost-sensitive fact-checking workflows
- Chatbots needing real-time context
- Quick lookups in conversational apps
Strengths
The search integration is native and fast — no need to bolt on Bing API or scrape results yourself. At fifteen cents per million input tokens, this undercuts every other search-enabled model by 3-10x while maintaining GPT-4o-mini's solid instruction-following and JSON mode. The 128K context window means you can pack substantial conversation history alongside search results. For teams building customer-facing tools that need to cite current pricing, documentation, or news, this is the obvious starting point.
Trade-offs
You're getting the mini model, which means noticeably weaker performance on math, complex reasoning, and nuanced writing compared to full GPT-4o or Claude Sonnet. Search results are filtered through OpenAI's interpretation layer — you don't get raw URLs or the ability to inspect sources directly, which limits transparency. The preview label means API behavior may shift without warning. If your task demands deep analysis or creative writing, the cost savings won't justify the capability drop.
Specifications
- Provider
- openai
- Category
- llm
- Context length
- 128,000 tokens
- Max output
- 16,384 tokens
- Modalities
- text
- License
- proprietary
- Released
- 2025-03-12
Pricing
- Input
- $0.15/Mtok
- Output
- $0.60/Mtok
- Model ID
openai/gpt-4o-mini-search-preview
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Team cost calculator
5 seats · 80 msgs/day
Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| openai | 128k | $0.15/Mtok | $0.60/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Product Comparison Research
Compare the latest iPhone 16 Pro and Samsung Galaxy S24 Ultra. Focus on camera specs, battery life, and current retail pricing. Summarize key differences in a table.Open in a Space →
News-Aware Briefing
What are the three most significant AI policy developments announced this week? For each, explain the impact on US-based startups in two sentences.Open in a Space →
Documentation Lookup
Show me the current syntax for async/await error handling in Python 3.12's asyncio library. Include a code example with proper exception catching.Open in a Space →
Fact-Check Customer Claims
A customer says their 2022 MacBook Air supports Wi-Fi 6E. Verify this claim and explain what Wi-Fi standard it actually supports.Open in a Space →
Event Schedule Lookup
List upcoming tech conferences in San Francisco for Q2 2025. Include dates, venue, and a one-sentence description of each event's focus.Open in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Search for recent academic papers on transformer attention mechanisms published in 2024, then summarize the three most-cited findings about computational efficiency improvements.
After searching recent literature, three key efficiency findings emerge: (1) Sparse attention patterns reduce FLOPs by 40-60% with minimal accuracy loss in models under 7B parameters, particularly effective for long-context tasks. (2) Flash Attention 3 implementations show 2.3x speedup over standard attention on A100 GPUs by optimizing memory hierarchy access patterns. (3) Mixture-of-Experts routing with learned sparsity achieves sub-linear scaling for context windows beyond 32K tokens. These approaches are increasingly combined in production systems, though sparse methods still struggle with tasks requiring dense cross-document reasoning.
This example highlights the model's search-augmented retrieval capability — it can pull current information beyond its training cutoff. The 128K context window supports synthesizing multiple papers in one pass. Trade-off: search quality depends on query formulation; vague prompts yield surface-level results rather than deep technical synthesis.
I'm debugging a Python Flask app where POST requests return 405 errors only in production, not locally. Both use gunicorn. Search for common causes and walk me through diagnosis steps.
Common causes for production-only 405 errors include: reverse proxy misconfiguration (nginx/Apache rewriting methods), CORS preflight handling differences, or route registration order issues in multi-worker setups. Start by checking: (1) Your nginx config — ensure `proxy_method` isn't set and `proxy_pass` includes trailing slashes correctly. (2) Gunicorn worker count — test with `--workers=1` to rule out race conditions in route registration. (3) Add logging to Flask's `before_request` hook to confirm the method reaching your app. (4) Compare `curl -X POST -v` outputs between environments to spot header differences. Most 405 cases trace to proxy layer method handling, not application code.
Demonstrates real-time search for troubleshooting scenarios where the model's training data may be outdated. The diagnosis steps are practical and ordered by likelihood. Trade-off: at $0.60/Mtok output, search-heavy debugging sessions cost more than base GPT-4o-mini; users should scope queries tightly.
What are the current FDA approval stages for GLP-1 receptor agonists targeting Alzheimer's disease? Include any phase 3 trial results published in the last six months.
As of recent searches, semaglutide (Novo Nordisk) and liraglutide are in phase 2/3 trials for Alzheimer's, focusing on neuroinflammation and insulin resistance pathways. The EVOKE trial (semaglutide) reported interim results in Q3 2024 showing modest cognitive score improvements in early-stage patients, though statistical significance remains borderline. Liraglutide's LIRA-MIND extension study found sustained benefits in glucose metabolism markers but no definitive cognitive endpoint wins. No GLP-1 agonist has reached FDA approval for Alzheimer's indication yet; current approvals remain limited to diabetes and obesity. Timeline estimates suggest earliest approval consideration in late 2025 if phase 3 endpoints are met.
Showcases the model's ability to retrieve time-sensitive regulatory and clinical data — critical for medical, legal, or investment research where staleness kills accuracy. The 128K window allows synthesizing multiple trial reports. Trade-off: search preview models can't verify source credibility in real-time; users should cross-check citations for high-stakes decisions.
Use-case deep-dives
When GPT-4o-mini wins for high-volume ticket routing under budget
A 12-person SaaS company handling 400+ support tickets daily needs to classify urgency and route to the right specialist without burning through their AI budget. GPT-4o-mini hits the sweet spot here: the 128k context window lets you dump entire ticket threads plus knowledge-base snippets into a single call, and at $0.15 per million input tokens you're spending roughly $6 per 100k tickets on classification alone. The search preview variant means you can reference live docs or recent similar tickets without pre-embedding everything. If your tickets average under 2k tokens and you need sub-second routing decisions, this model keeps cost per interaction under a cent while maintaining enough reasoning depth to catch edge cases that simpler classifiers miss. Switch to standard GPT-4o if accuracy on ambiguous cases justifies 3x the cost.
Why GPT-4o-mini works for overnight moderation jobs at scale
A 5-person community platform runs nightly moderation on 20k user-generated posts, flagging policy violations before the morning shift reviews borderline cases. GPT-4o-mini's pricing makes this viable: at $0.60/Mtok output you're generating short flag summaries for pennies per thousand posts, and the 128k window means you can include full comment threads plus the last 50 posts from repeat offenders in one prompt. The search preview feature lets you pull live policy updates or recent precedent cases mid-batch without redeploying. If you're processing under 50k posts per night and can tolerate a 2-3% false-negative rate on nuanced violations, this model keeps your moderation budget under $40/month while catching the obvious stuff automatically. Move to Claude 3.5 Sonnet if your community has complex cultural context that demands higher recall.
When to use GPT-4o-mini for cross-referencing internal reports
A 3-person strategy consultancy needs to synthesize findings from 8-12 client interview transcripts and market reports into a single brief, typically 40-60k tokens of source material per project. GPT-4o-mini's 128k context window handles this in one shot without chunking, and the search preview capability means you can reference live competitor data or recent news without manual copy-paste. At $0.15 input per Mtok you're spending about $0.60 per synthesis run on ingestion, with output summaries adding another $0.30 for a 500-word brief. The model handles straightforward pattern-matching across documents well enough for first-draft synthesis. If your reports require deep logical inference across conflicting data points or you're synthesizing technical research with domain jargon, step up to GPT-4o standard for the reasoning boost.
Frequently asked
Is GPT-4o-mini Search Preview good for general text tasks?
Yes, it handles most standard text work well — drafting, summarization, Q&A, light analysis. The 128k context window lets you feed in long documents without chunking. It's a solid default for teams that need reliable output without paying GPT-4 rates. Don't expect frontier reasoning or deep technical problem-solving, but for everyday writing and information tasks it's competent and fast.
Is GPT-4o-mini Search Preview cheaper than GPT-4o?
Much cheaper. At $0.15 input and $0.60 output per million tokens, you're paying roughly 10-15x less than full GPT-4o for most workloads. If your use case doesn't need the absolute best reasoning — customer support, content drafting, data extraction — this pricing makes it viable to run high-volume jobs that would bankrupt you on the flagship model.
Can it handle 128k tokens in practice or does quality degrade?
The 128k window is real, but like all models, attention quality drops toward the middle of very long contexts. For documents under 50k tokens it's reliable. Beyond that, put critical information near the start or end of your prompt. If you're regularly hitting 100k+ tokens, test retrieval accuracy on your specific data before committing to production workflows.
How does GPT-4o-mini Search Preview compare to Claude Haiku?
Both target the budget-friendly tier. GPT-4o-mini typically edges out Haiku on instruction-following and structured output, while Haiku is often preferred for nuanced writing and safety-conscious responses. Pricing is comparable. If you're already in the OpenAI ecosystem or need tight JSON schema adherence, stick with this. If you value tone and refusal behavior, test Haiku side-by-side.
Should I use this for customer-facing chatbots?
Yes, if cost matters and you can tolerate occasional mediocre responses. The latency is good, the context window handles multi-turn conversations, and the price lets you scale to thousands of users. Just build in human escalation paths — it will misunderstand edge cases more often than GPT-4o. Monitor failure modes in your domain before going fully automated.