OpenAI: GPT-3.5 Turbo (older v0613)
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
Anyone in the Space can @-mention OpenAI: GPT-3.5 Turbo (older v0613) with the team's shared context - pooled credits, one chat, one memory.
Verdict
Best for
- Legacy system compatibility requiring v0613
- High-volume simple classification under budget
- Short-form content generation at scale
Strengths
The pricing remains competitive for high-throughput workloads where cost per token matters more than capability. Response latency is fast, making it viable for real-time applications like chatbots handling straightforward queries. For tasks that fit comfortably under 4K tokens—basic sentiment analysis, short email drafts, simple Q&A—it delivers acceptable quality at a fraction of the cost of frontier models.
Trade-offs
The 4K context window is the critical limitation: you cannot fit most full documents, long conversations, or multi-file codebases. Instruction-following and reasoning lag behind GPT-4 and later GPT-3.5 snapshots. OpenAI no longer actively develops this version, so you won't see improvements or expanded capabilities. For any new project, GPT-3.5 Turbo (latest) offers a 16K window and better performance at a lower input price.
Specifications
- Provider: openai
- Category: llm
- Context length: 4,095 tokens
- Max output: 4,096 tokens
- Modalities: text
- License: proprietary
- Released: 2023-06-13
Pricing
- Input: $1.00/Mtok
- Output: $2.00/Mtok
- Model ID: openai/gpt-3.5-turbo-0613
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
| openai | 4k | $1.00/Mtok | $2.00/Mtok | — | — | — |
How Switchy teams use it
Starter prompts
Email Sentiment Triage
Classify this customer email into one of three categories: URGENT (angry or time-sensitive), ROUTINE (standard inquiry), or POSITIVE (praise or thanks). Respond with only the category name. Email: [paste email text here]
Product Description Generator
Write a 2-3 sentence product description for an e-commerce site based on these features. Keep it under 50 words and focus on customer benefits. Product: [name] Features: [list key features]
Social Media Caption Draft
Turn these notes into an engaging Instagram caption. Keep it under 150 characters, include 1-2 relevant emoji, and end with a call-to-action. Notes: [your content notes]
FAQ Response Template
Write a clear, friendly FAQ answer to this common customer question. Keep it under 100 words and include one actionable next step. Question: [customer question]
Meeting Title Summarizer
Read this meeting agenda and create a clear 4-6 word meeting title that captures the main topic. Agenda: [paste agenda text]
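The Email Sentiment Triage prompt asks for a bare category name, so the calling code usually needs to fill the template and then check that the reply really is one of the three labels. The sketch below shows one way to do that without calling any API. The helper names `build_triage_prompt` and `parse_triage_label` are hypothetical, and treating a reply that names zero or several categories as unusable is an assumption, not documented Switchy behavior.

```python
from typing import Optional

# Categories taken from the Email Sentiment Triage starter prompt.
CATEGORIES = ("URGENT", "ROUTINE", "POSITIVE")

PROMPT_TEMPLATE = (
    "Classify this customer email into one of three categories: "
    "URGENT (angry or time-sensitive), ROUTINE (standard inquiry), "
    "or POSITIVE (praise or thanks). Respond with only the category name. "
    "Email: {email}"
)


def build_triage_prompt(email_text: str) -> str:
    """Fill the starter prompt with the email body (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(email=email_text.strip())


def parse_triage_label(model_reply: str) -> Optional[str]:
    """Return the category if the reply names exactly one, else None."""
    upper = model_reply.upper()
    found = [c for c in CATEGORIES if c in upper]
    return found[0] if len(found) == 1 else None
```

Returning `None` for ambiguous replies lets the caller route those emails to a human instead of guessing a label.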
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Summarise this 800-word customer support transcript into three bullet points highlighting the customer's main issue, our resolution, and any follow-up needed.
This example would produce a concise three-bullet summary extracting the core complaint (e.g. billing discrepancy), the support agent's resolution steps (refund issued, account credited), and next actions (customer to verify receipt within 3-5 days). The model structures information clearly and maintains factual accuracy when condensing straightforward dialogues, though it may occasionally miss nuanced subtext or emotional undertones present in the original exchange.
GPT-3.5 Turbo v0613 excels at structured summarisation tasks where the input follows predictable patterns. The 4K token window handles most support transcripts comfortably. However, this older checkpoint predates instruction-tuning improvements in later releases, so complex multi-step reasoning or ambiguous requests may produce less reliable outputs than current models.
Write a professional email declining a meeting request for next Tuesday. Keep it polite, suggest alternative times later in the month, and keep it under 100 words.
This example would generate a courteous 80-word email opening with appreciation for the invitation, clearly stating unavailability on Tuesday, and proposing two specific alternative slots (e.g. 'Would the 24th at 2pm or the 27th at 10am work for you?'). The tone remains professional without being stiff, and the structure follows standard business email conventions—greeting, decline, alternatives, closing.
Straightforward templated writing is a strength of this checkpoint. The model reliably follows length constraints and tone guidance for routine correspondence. The v0613 release handles basic instruction-following well, though it lacks the nuanced style adaptation and context retention of GPT-4 or later 3.5 versions when requests involve multiple conflicting constraints.
Extract all product names, prices, and SKU codes from this e-commerce page HTML snippet and return them as a JSON array.
This example would parse the HTML, identify product data within common e-commerce markup patterns (divs with class names like 'product-card', 'price', 'sku'), and return a clean JSON array: [{"name": "Wireless Mouse", "price": "$24.99", "sku": "WM-2301"}, ...]. The extraction handles standard formatting reliably, though irregular HTML structures or obfuscated class names may cause the model to miss entries or hallucinate fields not present in the source.
This checkpoint performs adequately on structured data extraction from semi-predictable formats like e-commerce HTML. The 4K context window limits how much page content you can process in one request—roughly 2-3 product listings with full markup. For large-scale scraping or complex nested structures, newer models with larger windows and better parsing offer fewer errors.
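Because the model may return malformed JSON or invent entries, it can help to validate the reply before using it. The following is a minimal sketch, assuming the reply is expected to be a JSON array of objects with `name`, `price`, and `sku` keys as in the example above. The function names are hypothetical, and checking that each SKU literally appears in the source HTML is only one possible guard against hallucinated records.

```python
import json
from typing import Any

REQUIRED_KEYS = {"name", "price", "sku"}  # fields named in the example prompt


def parse_products(model_reply: str) -> list[dict[str, Any]]:
    """Parse the reply as JSON and keep only well-formed product records.

    Raises ValueError if the reply is not a JSON array at all.
    """
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of products")
    return [
        item for item in data
        if isinstance(item, dict) and REQUIRED_KEYS.issubset(item)
    ]


def drop_hallucinated(products: list[dict[str, Any]], source_html: str) -> list[dict[str, Any]]:
    """Keep records whose SKU string literally appears in the source HTML."""
    return [p for p in products if str(p["sku"]) in source_html]
```

Note that a reply using single-quoted keys is not valid JSON and would be rejected by `parse_products`.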
Use-case deep-dives
When GPT-3.5 Turbo v0613 makes sense for early-stage chat experiments
A 3-person startup building their first customer-facing chatbot should start here. At $1.50 blended per million tokens, this model costs roughly 20× less than GPT-4 variants while handling straightforward Q&A, appointment booking, and FAQ routing without issue. The 4095-token context is tight—you'll hit limits on conversations longer than 8-10 exchanges—but for prototyping flows and validating user intent patterns, that constraint forces good design. If your bot sees under 10,000 messages/month and doesn't need nuanced reasoning or multi-turn memory, this older checkpoint delivers functional responses at a price that won't drain seed funding. Once you prove the concept and traffic crosses 50,000 messages/month, migrate to a newer 16k-context model.
Why this model still works for simple support-ticket tagging at scale
A 12-person e-commerce support team routing 2,000 inbound emails daily can use GPT-3.5 Turbo v0613 to tag tickets into 6-8 categories (refund, shipping, product question) before human review. Each classification call uses roughly 150 tokens input and 10 tokens output. At $1.50 blended that's $0.00024 per email, or about $15/month for 60,000 tickets; pricing input and output separately at $1.00 and $2.00 per Mtok gives about $0.00017 per email, or roughly $10/month. The model handles single-label classification reliably when categories are distinct and the prompt is well-structured. You'll see accuracy drop on edge cases (complaints spanning multiple categories, sarcasm, non-English), but for 80% of volume this checkpoint is fast and cheap enough to justify the occasional misroute. If accuracy falls below 85% after spot-checking 200 tickets, switch to GPT-4o-mini at $0.15/$0.60 per Mtok for tighter reasoning.
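The per-email figure depends on whether input and output tokens are priced separately or at a blended rate. The sketch below works through both for the ticket-tagging numbers above, using the $1.00 and $2.00 per Mtok prices from the Pricing section. Treating "blended" as the simple average of the two prices and a month as 30 days are assumptions.

```python
INPUT_PRICE_PER_MTOK = 1.00   # USD, from the Pricing section
OUTPUT_PRICE_PER_MTOK = 2.00  # USD, from the Pricing section


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD with input and output priced separately."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def blended_cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD using the simple average of the two prices (assumption)."""
    blended = (INPUT_PRICE_PER_MTOK + OUTPUT_PRICE_PER_MTOK) / 2
    return (input_tokens + output_tokens) * blended / 1_000_000


tickets_per_month = 2_000 * 30  # 2,000 emails/day, assuming a 30-day month
separate = cost_per_call(150, 10) * tickets_per_month
blended = blended_cost_per_call(150, 10) * tickets_per_month
print(f"separate: ${separate:.2f}/month, blended: ${blended:.2f}/month")
# prints "separate: $10.20/month, blended: $14.40/month"
```

The blended rate overstates cost for input-heavy workloads like classification, because most of the tokens are billed at the cheaper input price.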
When to use this model for internal stand-up notes under 1,000 words
A 6-person product team recording daily 15-minute stand-ups (roughly 800-word transcripts) can feed those transcripts to GPT-3.5 Turbo v0613 for bullet-point summaries without hitting context limits. At 1,200 tokens input and 200 tokens output per summary, you're spending $0.0016 per meeting—negligible even at 250 meetings/year. The model extracts action items and blockers competently when the transcript is clean and the team uses consistent terminology. It struggles with crosstalk, multiple speakers on the same topic, or meetings that reference prior context outside the current transcript. If your stand-ups run longer than 20 minutes or reference previous decisions frequently, the 4k window becomes a bottleneck and you'll need GPT-4o-mini's 128k context to avoid truncation.
Frequently asked
Is GPT-3.5 Turbo v0613 still good for production chatbots in 2024?
No, not for new projects. This June 2023 snapshot is two generations behind GPT-4o and lacks function calling improvements that shipped in later 3.5 versions. The 4K context window breaks on any conversation longer than a few exchanges. Use the latest gpt-3.5-turbo endpoint instead, which routes to newer snapshots with 16K context and better instruction following.
Is GPT-3.5 Turbo v0613 cheaper than Claude Haiku or Gemini Flash?
No. Input costs $1.00/Mtok against $0.25-$0.80 for those competitors, and output at $2.00/Mtok adds to the bill on any task generating more than a sentence. Gemini 1.5 Flash at $0.075/$0.30 per Mtok is about 13x cheaper on input and about 7x cheaper on output, and it handles 1M token context. The only reason to use v0613 is if you need deterministic behavior from this exact checkpoint for regression testing.
Can GPT-3.5 Turbo v0613 handle multi-turn conversations with context?
Barely. The 4095 token limit means you get roughly 3000 tokens for conversation history after system prompt and response budget. That's 6-8 exchanges before you're truncating context. Later 3.5 snapshots support 16K tokens; GPT-4o supports 128K. If your use case involves any meaningful conversation history, this version will fail in production.
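The budgeting described above can be sketched as a sliding window that drops the oldest turns first. This is a rough illustration only: it estimates tokens at about four characters each, which is a heuristic for English text rather than the model's real tokenizer, and the function names and the 500-token reply reserve are hypothetical.

```python
CONTEXT_LIMIT = 4095   # tokens, from the Specifications section
CHARS_PER_TOKEN = 4    # rough heuristic for English text; an assumption


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; a real tokenizer would be more accurate."""
    return max(1, -(-len(text) // CHARS_PER_TOKEN))  # ceiling division


def trim_history(system_prompt: str, turns: list[str],
                 response_budget: int = 500) -> list[str]:
    """Keep the most recent turns that fit alongside the system prompt.

    `turns` is a list of message strings, oldest first. `response_budget`
    is the number of tokens reserved for the model's reply (hypothetical
    default).
    """
    budget = CONTEXT_LIMIT - response_budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    for turn in reversed(turns):      # walk from newest to oldest
        cost = estimate_tokens(turn)
        if cost > budget:
            break                     # this turn and everything older is dropped
        kept.append(turn)
        budget -= cost
    kept.reverse()                    # restore oldest-first order
    return kept
```

With a 100-token system prompt and a 500-token reply reserve, about 3,495 tokens remain for history under these assumptions, which is in line with the rough figure given above.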
How does v0613 compare to the current gpt-3.5-turbo endpoint?
The current endpoint uses gpt-3.5-turbo-0125 (January 2024), which has 4x the context window, better function calling, improved instruction following, and costs 50% less on input. v0613 exists for teams that pinned this snapshot for reproducibility and haven't migrated. Unless you're maintaining legacy behavior, there's no technical reason to use v0613 over the latest version.
Should I use this model for summarizing documents or extracting data?
Only if your documents are under 3000 tokens (roughly 2 pages). The 4K context window can't fit most real documents, and you'll spend more time chunking than the $1/Mtok input saves you. For document work, use GPT-4o-mini at $0.15/$0.60 per Mtok with 128K context, or Gemini Flash if cost matters more than quality.
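For documents that exceed the window, the chunking overhead mentioned above looks roughly like the sketch below, which greedily packs paragraphs into pieces under a token budget. The four-characters-per-token estimate is a heuristic rather than the model's tokenizer, the function name is hypothetical, and splitting on blank lines assumes the document uses them to separate paragraphs.

```python
def chunk_by_paragraph(document: str, max_tokens: int = 3000,
                       chars_per_token: int = 4) -> list[str]:
    """Greedily pack paragraphs into chunks of at most ~max_tokens each.

    A single paragraph longer than the budget becomes its own oversized
    chunk; splitting it further is left out of this sketch.
    """
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current = ""
    for para in document.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate       # paragraph fits, or it starts a new chunk
        else:
            chunks.append(current)    # close the full chunk
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then needs its own request, and the per-chunk results still have to be merged afterwards, which is the extra work the answer above refers to.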