Directory
Every AI model, in one place.
Pricing, benchmarks, provider latency, and how teams actually use each one.
2 matches
About sound models
Sound models generate audio from text descriptions or convert written content into spoken output. Teams use them for prototyping audio interfaces, creating voiceovers without recording sessions, or adding speech to applications where hiring voice talent doesn't scale. The key technical differentiator is output fidelity: whether the model produces clean prosody and natural intonation or falls into the uncanny valley of robotic cadence.
Choosing within this category comes down to speed versus quality. Fast TTS models return audio in under a second but sacrifice expressiveness; higher-fidelity options take longer to generate and cost more per request. If you're building real-time features like voice assistants, latency matters more than perfect pronunciation. If you're producing content for human listeners who will notice awkward pacing, invest in the slower, higher-fidelity models.
- Google: Lyria 3 Pro Preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...
Voice | 1049k ctx | $0.00/M
- Google: Lyria 3 Clip Preview
30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...
Voice | 1049k ctx | $0.00/M