LLMopenai

OpenAI: GPT-3.5 Turbo (older v0613)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Anyone in the Space can @-mention OpenAI: GPT-3.5 Turbo (older v0613) with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

GPT-3.5 Turbo v0613 is OpenAI's legacy fast model from mid-2023, now superseded by newer versions with larger context windows and better performance. At $1/$2 per Mtok it remains cheap, but the 4K context window severely limits document work and the model trails current alternatives on reasoning and instruction-following. Reach for this only if you're maintaining legacy integrations that pin to v0613 or need the absolute lowest OpenAI API cost for simple classification tasks.

Best for

Legacy system compatibility requiring v0613
High-volume simple classification under budget
Short-form content generation at scale

Strengths

The pricing remains competitive for high-throughput workloads where cost per token matters more than capability. Response latency is fast, making it viable for real-time applications like chatbots handling straightforward queries. For tasks that fit comfortably under 4K tokens—basic sentiment analysis, short email drafts, simple Q&A—it delivers acceptable quality at a fraction of the cost of frontier models.

Trade-offs

The 4K context window is the critical limitation: you cannot fit most full documents, long conversations, or multi-file codebases. Instruction-following and reasoning lag behind GPT-4 and current GPT-3.5 versions by a measurable margin. OpenAI has deprecated active development on this version, so you won't see improvements or expanded capabilities. For any new project, GPT-3.5 Turbo (latest) offers a 16K window and better performance at similar pricing.

Specifications

Provider: openai
Category: llm
Context length: 4,095 tokens
Max output: 4,096 tokens
Modalities: text
License: proprietary
Released: 2024-01-25

Pricing

Input: $1.00/Mtok
Output: $2.00/Mtok
Model ID: openai/gpt-3.5-turbo-0613

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Seats5 peopleMessages / seat / day80Avg turn size2 ktokOutput share30 %

Estimated monthly spend

$22.88

17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

Provider	Context	Input	Output	P50 latency	Throughput	30d uptime
openai	4k	$1.00/Mtok	$2.00/Mtok	—	—	—

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Email Sentiment Triage

Classify this customer email into one of three categories: URGENT (angry or time-sensitive), ROUTINE (standard inquiry), or POSITIVE (praise or thanks). Respond with only the category name.

Email: [paste email text here]

Open in a Space →

Product Description Generator

Write a 2-3 sentence product description for an e-commerce site based on these features. Keep it under 50 words and focus on customer benefits.

Product: [name]
Features: [list key features]

Open in a Space →

Social Media Caption Draft

Turn these notes into an engaging Instagram caption. Keep it under 150 characters, include 1-2 relevant emoji, and end with a call-to-action.

Notes: [your content notes]

Open in a Space →

FAQ Response Template

Write a clear, friendly FAQ answer to this common customer question. Keep it under 100 words and include one actionable next step.

Question: [customer question]

Open in a Space →

Meeting Title Summarizer

Read this meeting agenda and create a clear 4-6 word meeting title that captures the main topic.

Agenda: [paste agenda text]

Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Summarise this 800-word customer support transcript into three bullet points highlighting the customer's main issue, our resolution, and any follow-up needed.

Output

This example would produce a concise three-bullet summary extracting the core complaint (e.g. billing discrepancy), the support agent's resolution steps (refund issued, account credited), and next actions (customer to verify receipt within 3-5 days). The model structures information clearly and maintains factual accuracy when condensing straightforward dialogues, though it may occasionally miss nuanced subtext or emotional undertones present in the original exchange.

Notes

GPT-3.5 Turbo v0613 excels at structured summarisation tasks where the input follows predictable patterns. The 4K token window handles most support transcripts comfortably. However, this older checkpoint predates instruction-tuning improvements in later releases, so complex multi-step reasoning or ambiguous requests may produce less reliable outputs than current models.

Prompt

Write a professional email declining a meeting request for next Tuesday. Keep it polite, suggest alternative times later in the month, and keep it under 100 words.

Output

This example would generate a courteous 80-word email opening with appreciation for the invitation, clearly stating unavailability on Tuesday, and proposing two specific alternative slots (e.g. 'Would the 24th at 2pm or the 27th at 10am work for you?'). The tone remains professional without being stiff, and the structure follows standard business email conventions—greeting, decline, alternatives, closing.

Notes

Straightforward templated writing is a strength of this checkpoint. The model reliably follows length constraints and tone guidance for routine correspondence. The v0613 release handles basic instruction-following well, though it lacks the nuanced style adaptation and context retention of GPT-4 or later 3.5 versions when requests involve multiple conflicting constraints.

Prompt

Extract all product names, prices, and SKU codes from this e-commerce page HTML snippet and return them as a JSON array.

Output

This example would parse the HTML, identify product data within common e-commerce markup patterns (divs with class names like 'product-card', 'price', 'sku'), and return a clean JSON array: [{'name': 'Wireless Mouse', 'price': '$24.99', 'sku': 'WM-2301'}, ...]. The extraction handles standard formatting reliably, though irregular HTML structures or obfuscated class names may cause the model to miss entries or hallucinate fields not present in the source.

Notes

This checkpoint performs adequately on structured data extraction from semi-predictable formats like e-commerce HTML. The 4K context window limits how much page content you can process in one request—roughly 2-3 product listings with full markup. For large-scale scraping or complex nested structures, newer models with larger windows and better parsing offer fewer errors.

Use-case deep-dives

Budget chatbot prototyping

When GPT-3.5 Turbo v0613 makes sense for early-stage chat experiments

A 3-person startup building their first customer-facing chatbot should start here. At $1.50 blended per million tokens, this model costs roughly 20× less than GPT-4 variants while handling straightforward Q&A, appointment booking, and FAQ routing without issue. The 4095-token context is tight—you'll hit limits on conversations longer than 8-10 exchanges—but for prototyping flows and validating user intent patterns, that constraint forces good design. If your bot sees under 10,000 messages/month and doesn't need nuanced reasoning or multi-turn memory, this older checkpoint delivers functional responses at a price that won't drain seed funding. Once you prove the concept and traffic crosses 50,000 messages/month, migrate to a newer 16k-context model.

High-volume email classification

Why this model still works for simple support-ticket tagging at scale

A 12-person e-commerce support team routing 2,000 inbound emails daily can use GPT-3.5 Turbo v0613 to tag tickets into 6-8 categories (refund, shipping, product question) before human review. Each classification call uses roughly 150 tokens input and 10 tokens output—at $1.50 blended that's $0.00024 per email, or about $15/month for 60,000 tickets. The model handles single-label classification reliably when categories are distinct and the prompt is well-structured. You'll see accuracy drop on edge cases (complaints spanning multiple categories, sarcasm, non-English), but for 80% of volume this checkpoint is fast and cheap enough to justify the occasional misroute. If accuracy falls below 85% after spot-checking 200 tickets, upgrade to GPT-4o-mini for $0.15 blended and tighter reasoning.

Lightweight meeting summarization

When to use this model for internal stand-up notes under 1,000 words

A 6-person product team recording daily 15-minute stand-ups (roughly 800-word transcripts) can feed those transcripts to GPT-3.5 Turbo v0613 for bullet-point summaries without hitting context limits. At 1,200 tokens input and 200 tokens output per summary, you're spending $0.0016 per meeting—negligible even at 250 meetings/year. The model extracts action items and blockers competently when the transcript is clean and the team uses consistent terminology. It struggles with crosstalk, multiple speakers on the same topic, or meetings that reference prior context outside the current transcript. If your stand-ups run longer than 20 minutes or reference previous decisions frequently, the 4k window becomes a bottleneck and you'll need GPT-4o-mini's 128k context to avoid truncation.

Frequently asked

Is GPT-3.5 Turbo v0613 still good for production chatbots in 2024?

No, not for new projects. This June 2023 snapshot is two generations behind GPT-4o and lacks function calling improvements that shipped in later 3.5 versions. The 4K context window breaks on any conversation longer than a few exchanges. Use the latest gpt-3.5-turbo endpoint instead, which routes to newer snapshots with 16K context and better instruction following.

Is GPT-3.5 Turbo v0613 cheaper than Claude Haiku or Gemini Flash?

Yes on input ($1.00/Mtok vs $0.25-$0.80 for competitors), but output costs $2.00/Mtok which erases the advantage on any task generating more than a sentence. Gemini 1.5 Flash at $0.075/$0.30 per Mtok is 13x cheaper overall and handles 1M token context. The only reason to use v0613 is if you need deterministic behavior from this exact checkpoint for regression testing.

Can GPT-3.5 Turbo v0613 handle multi-turn conversations with context?

Barely. The 4095 token limit means you get roughly 3000 tokens for conversation history after system prompt and response budget. That's 6-8 exchanges before you're truncating context. Later 3.5 snapshots support 16K tokens; GPT-4o supports 128K. If your use case involves any meaningful conversation history, this version will fail in production.

How does v0613 compare to the current gpt-3.5-turbo endpoint?

The current endpoint uses gpt-3.5-turbo-0125 (January 2024), which has 4x the context window, better function calling, improved instruction following, and costs 50% less on input. v0613 exists for teams that pinned this snapshot for reproducibility and haven't migrated. Unless you're maintaining legacy behavior, there's no technical reason to use v0613 over the latest version.

Should I use this model for summarizing documents or extracting data?

Only if your documents are under 3000 tokens (roughly 2 pages). The 4K context window can't fit most real documents, and you'll spend more time chunking than the $1/Mtok input saves you. For document work, use GPT-4o-mini at $0.15/$0.60 per Mtok with 128K context, or Gemini Flash if cost matters more than quality.