Directory
Every AI model, in one place.
Pricing, benchmarks, provider latency, and how teams actually use each one.
4 matches
About sound models
Sound models generate audio from text descriptions or convert written content into spoken output. Teams use them for prototyping audio interfaces, creating voiceovers without recording sessions, or adding speech to applications where hiring voice talent doesn't scale. The key technical differentiator is output fidelity — whether the model produces clean prosody and natural intonation or falls into the uncanny valley of robotic cadence.
Choosing within this category comes down to trading speed and cost against quality. Fast TTS models return audio in under a second but sacrifice expressiveness; higher-fidelity options take longer to generate and cost more per request. If you're building real-time features like voice assistants, latency matters more than perfect pronunciation. If you're producing content for human listeners who'll notice awkward pacing, invest in the slower, higher-fidelity models.
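The rule of thumb above can be sketched as a small selection helper. This is only an illustration: `SoundModel`, `CATALOG`, and `pick_model` are hypothetical names, the `high_fidelity` flag is an assumption inferred from the "Fast" and "High Quality" labels in the listing, and the prices are the listed "$/M" figures (the listing does not say what unit is being counted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundModel:
    name: str
    price_per_million: float  # listed "$/M" figure; the counted unit is not stated in the listing
    high_fidelity: bool       # assumption: inferred from the "Fast" / "High Quality" labels


# Two entries copied from the directory listing (hypothetical in-memory catalog).
CATALOG = [
    SoundModel("TTS-1 (Fast)", 0.01, high_fidelity=False),
    SoundModel("TTS-1 HD (High Quality)", 0.03, high_fidelity=True),
]


def pick_model(real_time: bool, catalog=CATALOG) -> SoundModel:
    """Prefer the fast tier for real-time use and the high-fidelity tier otherwise."""
    wanted_fidelity = not real_time
    candidates = [m for m in catalog if m.high_fidelity == wanted_fidelity]
    # If no model matches the preferred tier, fall back to the whole catalog.
    pool = candidates or list(catalog)
    # Among the remaining candidates, take the lowest listed price.
    return min(pool, key=lambda m: m.price_per_million)
```

Calling `pick_model(real_time=True)` would select the fast tier for something like a voice assistant, while `pick_model(real_time=False)` would select the higher-fidelity tier for produced content. A real selection would also weigh measured latency, which this sketch does not model.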
- OpenAI: GPT Audio
  The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...
  Voice | 128k ctx | $2.50/M
- OpenAI: GPT Audio Mini
  A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...
  Voice | 128k ctx | $0.60/M
- OpenAI: TTS-1 (Fast)
  Fast text-to-speech with natural voice
  Voice | 4k ctx | $0.01/M
- OpenAI: TTS-1 HD (High Quality)
  High-quality text-to-speech output
  Voice | 4k ctx | $0.03/M
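
The "$/M" figures in the listing are prices per million units. As a rough sketch of how such a rate turns into a cost estimate, the hypothetical helper below scales a listed rate by a usage count. The listing does not state which unit each price counts (and the truncated descriptions suggest audio and text input may be priced differently), so treating usage as a single unit count is an assumption.

```python
def estimate_cost(price_per_million: float, units: int) -> float:
    """Scale a listed per-million rate to a given number of units.

    `units` is whatever the rate counts (an assumption here, since the
    listing does not name the unit).
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    return price_per_million * units / 1_000_000


# Example: compare the two GPT Audio listings for the same hypothetical usage.
usage = 4_000_000
gpt_audio_cost = estimate_cost(2.50, usage)
gpt_audio_mini_cost = estimate_cost(0.60, usage)
```

Running the same usage through each listed rate gives a quick side-by-side before checking a provider's full pricing page.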