Google: Lyria 3 Pro Preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...
Anyone in the Space can @-mention Google: Lyria 3 Pro Preview with the team's shared context - pooled credits, one chat, one memory.
Verdict
Best for
- Multimodal audio synthesis from text and images
- Long-form music generation with narrative context
- Exploratory sound design prototyping
- Low-cost audio experimentation during preview ($0.00/Mtok listed; $0.08 per full-length song per the model description)
Strengths
The 1M-token context window dwarfs typical audio models, enabling generation informed by entire scripts, image galleries, or extended audio references in a single pass. Multimodal input support lets you condition sound on visual content, which is useful for video soundtracks or image-to-audio workflows. The listed $0.00/Mtok token pricing during preview lowers cost barriers for experimentation, though the model description also quotes $0.08 per full-length song. Google's Lyria lineage suggests strong musical coherence, though public validation is pending.
Trade-offs
Preview status means no SLA, no guaranteed uptime, and output quality that may shift as Google iterates. With no public benchmarks, you have no objective way to compare performance against competitors like Stable Audio or MusicGen. The model may disappear or change pricing without notice. Proprietary licensing blocks self-hosting and fine-tuning. If you need stable, auditable audio generation today, established models with published metrics are safer bets.
Specifications
- Provider
- Category: sound
- Context length: 1,048,576 tokens
- Max output: 65,536 tokens
- Modalities: text, image, audio
- License: proprietary
- Released: 2026-03-30
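As a rough illustration of how the limits listed above might be checked before sending a request, here is a minimal Python sketch. `ModelLimits` and `check_request` are hypothetical names, token counts are assumed to be known in advance, and this page does not say whether output tokens count against the context window, so the two limits are checked separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    """Token limits copied from the specification list on this page."""
    context_length: int
    max_output: int


LYRIA_3_PRO_PREVIEW = ModelLimits(context_length=1_048_576, max_output=65_536)


def check_request(limits: ModelLimits, prompt_tokens: int, output_tokens: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means both limits hold.

    Assumption: the limits are independent. The page does not document
    whether requested output is deducted from the context window.
    """
    problems = []
    if prompt_tokens > limits.context_length:
        over = prompt_tokens - limits.context_length
        problems.append(f"prompt exceeds context length by {over} tokens")
    if output_tokens > limits.max_output:
        over = output_tokens - limits.max_output
        problems.append(f"requested output exceeds max output by {over} tokens")
    return problems


if __name__ == "__main__":
    print(check_request(LYRIA_3_PRO_PREVIEW, prompt_tokens=900_000, output_tokens=70_000))
```

How many tokens a given script, image, or audio reference consumes is not documented here, so the inputs to this check would have to come from whatever token counting the provider exposes.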
Pricing
- Input: $0.00/Mtok
- Output: $0.00/Mtok
- Model ID: google/lyria-3-pro-preview
Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.
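To make the two price signals on this page concrete, the sketch below computes upstream cost both ways: from the listed per-token prices and from the $0.08 per full-length song quoted in the model description. The function names and the example volumes are hypothetical, and how Switchy converts either figure into credits is not stated here, so treat this as arithmetic only.

```python
from decimal import Decimal

# Figures quoted on this page.
INPUT_PRICE_PER_MTOK = Decimal("0.00")
OUTPUT_PRICE_PER_MTOK = Decimal("0.00")
PRICE_PER_SONG = Decimal("0.08")

MILLION = Decimal(1_000_000)


def token_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Upstream cost in USD under the listed per-million-token prices."""
    return (Decimal(input_tokens) / MILLION * INPUT_PRICE_PER_MTOK
            + Decimal(output_tokens) / MILLION * OUTPUT_PRICE_PER_MTOK)


def song_cost(songs: int) -> Decimal:
    """Upstream cost in USD under the quoted per-song price."""
    return PRICE_PER_SONG * songs


if __name__ == "__main__":
    # Made-up monthly volume: 25 full-length songs, each using the full limits.
    print("per-token view:", token_cost(25 * 1_048_576, 25 * 65_536))
    print("per-song view: ", song_cost(25))
```

Using `Decimal` rather than floats keeps the cent-level figures exact, which matters when small per-item prices are multiplied across a team's monthly volume.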
Providers
| Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime |
|---|---|---|---|---|---|---|
|  | 1049k | $0.00/Mtok | $0.00/Mtok | — | — | — |
Performance
Benchmarks
Works well with
Top MCPs
Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.
How Switchy teams use it
Starter prompts
Cinematic Soundscape from Storyboard
Create a 90-second soundscape for a rainy cyberpunk alley scene. The attached image shows neon signs and wet pavement. Layer distant traffic, rain on metal, and a faint synth drone. Keep the mood tense but not aggressive.
Adaptive Music from Long Narrative
Compose a 5-minute orchestral piece that follows this three-act structure: [paste 2000-word story outline]. Start with sparse strings in Act 1, build to brass in Act 2, resolve with piano in Act 3. Match emotional beats to the narrative turns.
Image-to-Audio Texture Design
Analyze the attached macro photograph of tree bark and generate a 60-second audio texture that mirrors its rough, layered quality. Use granular synthesis timbres: crackling, woody, organic. No melody, just evolving texture.
Dialogue-Aware Background Score
Generate background music for this 3-page dialogue scene: [paste screenplay excerpt]. Duck under spoken lines, swell during pauses. Use acoustic guitar and light percussion. Keep it unobtrusive but emotionally supportive of the character tension.
Extended Audio Reference Remix
Here's a 10-minute field recording of a forest at dawn. Create a 4-minute ambient remix that isolates bird calls, adds subtle reverb, and layers a soft drone underneath. Preserve the original's spatial feel but make it more meditative.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
Generate a 30-second cinematic trailer score with rising tension, orchestral strings, and a dramatic brass climax. Should feel like a sci-fi thriller reveal.
The model produces a layered orchestral composition that builds from sparse, tremolo strings into a full ensemble crescendo. The brass section enters at the 18-second mark with a bold fanfare motif, supported by timpani rolls and cymbal swells. The harmonic progression moves through minor keys with chromatic tension, resolving into a major chord at the climax. The stereo field is wide, with clear separation between string sections and a centered brass presence. Dynamic range is cinematic, moving from pianissimo to fortissimo naturally.
This example highlights Lyria 3 Pro's ability to handle complex multi-instrument arrangements with realistic orchestral timbre. The 1M+ token context window allows detailed musical direction across multiple parameters. However, without public benchmarks, it's unclear how consistently the model interprets nuanced compositional terms like 'tremolo' or maintains harmonic coherence across longer pieces.
Create a 15-second podcast intro jingle: upbeat acoustic guitar, light percussion, whistling melody. Friendly and approachable tone, suitable for a tech news show.
The model generates a bright, fingerpicked acoustic guitar pattern in a major key with a tempo around 120 BPM. A shaker provides rhythmic texture on the off-beats, while a human-like whistled melody carries the hook over two repetitions. The production is clean with minimal reverb, keeping the intimate, conversational feel intact. The arrangement leaves space in the mix for voiceover to sit comfortably on top.
Demonstrates Lyria 3 Pro's range beyond orchestral work: it handles organic, acoustic textures and simpler arrangements effectively. The multimodal input support means you could reference an image of the podcast artwork for tonal guidance. The $0.00/Mtok token pricing during preview makes it accessible for rapid iteration, though production-ready consistency remains to be validated without benchmark data.
Generate ambient background music for a meditation app: slow evolving pads, subtle nature sounds (distant water, wind), no percussion. 45 seconds, seamless loop potential.
The model creates a sustained soundscape built on warm synthesizer pads that shift gradually through major seventh and suspended chords. Layered underneath are field-recording-style textures: gentle stream babble panned slightly left, soft wind movement across the stereo field. The piece avoids rhythmic elements entirely, maintaining a floating, timeless quality. The final four seconds mirror the opening phrase, creating natural loop points without abrupt transitions.
Shows Lyria 3 Pro's capacity for generative ambient work where textural evolution matters more than melodic complexity. The audio modality input could theoretically let you provide a reference track for tonal matching. The challenge with ambient generation is maintaining interest without repetition—this example assumes the model balances variation and consistency, but longer-form coherence is untested in public benchmarks.
Use-case deep-dives
When free audio generation makes sense for weekly podcast teams
A 4-person podcast studio shipping 2-3 episodes per week needs intro music, transition stings, and background beds without licensing headaches. Lyria 3 Pro Preview wins here because the listed $0.00/Mtok token pricing removes the per-episode cost anxiety that kills experimentation with commercial libraries. The 1M token context window means you can feed it full episode transcripts to generate music that matches tone shifts across a 45-minute conversation. The trade-off: no public benchmarks means you're flying blind on audio quality consistency compared to ElevenLabs or Suno. If your brand tolerates some variability and you're generating 20+ audio assets per month, the savings over commercial licensing add up; even at the $0.08 per full-length song quoted in the model description, 20 songs come to $1.60. Below 10 assets monthly, the setup overhead isn't worth it.
Where zero-cost audio generation supports high-volume course production
A 12-person ed-tech company producing 8-10 micro-courses monthly needs audio for slide decks, scenario simulations, and quiz feedback. Lyria 3 Pro Preview fits the music and sound-cue side of that work; it is described as a music generation model, so voiceover itself is outside what this listing documents. The $0.00/Mtok token pricing scales with production volume: each course averages 40-60 audio clips, which by this page's estimate would cost $200-400/month on metered alternatives. The text+image+audio modality support means they can generate contextual sound effects from slide screenshots without manual asset hunting. The risk: 'Preview' status suggests this pricing won't last, and no benchmarks means you can't predict whether audio quality will match your brand standards before you commit to a full course build. If you're prototyping a new course vertical, test it now while token pricing is at $0.00/Mtok.
When free audio iteration beats asset store licensing for indie studios
A 3-person indie game studio in pre-alpha needs 100+ placeholder sound effects for combat, UI, and ambient loops before they know which mechanics will survive playtesting. Lyria 3 Pro Preview works because the zero-cost model lets them generate and discard dozens of variations per mechanic without budget guilt. The 1M token context means they can feed it gameplay video frames and get audio that roughly matches visual timing. The catch: no benchmarks on latency or audio length limits, so you can't rely on it for real-time generation or know if it caps at 10-second clips. If you're more than 6 months from launch and burning through asset store credits on throwaway sounds, switch now. If you're in beta with locked-down audio needs, pay for Soundraw or Loudly where quality is documented.
Frequently asked
Is Lyria 3 Pro Preview good for music generation?
Yes, Lyria 3 Pro Preview is Google's latest sound model designed specifically for music and audio generation. It accepts text, image, and audio inputs with a 1M token context window, letting you generate longer compositions or iterate on existing audio. Since it's a preview release, expect some rough edges, but the multimodal input support makes it versatile for creative workflows.
Is Lyria 3 Pro Preview free to use?
Partly. The listing shows $0.00/Mtok for both input and output tokens, but the model description quotes $0.08 per full-length song, so generation is not necessarily free. Zero token pricing is typical for Google's preview releases as they gather usage data and feedback. Expect pricing to change once the model moves to general availability, possibly in line with other multimodal models in Google's lineup.
Can Lyria 3 Pro Preview handle long-form audio generation?
The 1M token context window suggests it can process lengthy audio inputs and potentially generate extended compositions. The listing caps output at 65,536 tokens, but how tokens map to audio duration isn't documented, and without public benchmarks we can't confirm output duration limits. Preview models often have undocumented constraints on generation length. Test your specific use case early, especially if you need multi-minute outputs.
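One way to run that early test is to measure the duration of whatever audio comes back and compare it with what you asked for. The sketch below assumes the output is available as WAV bytes, which this page does not confirm; `wav_duration_seconds`, `is_long_enough`, and `make_silent_wav` are hypothetical helpers, and the silent clip stands in for a real model response. The 48 kHz rate matches the figure in the model description.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Duration of a WAV payload, computed from its frame count and sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(data: bytes, target_seconds: float, tolerance: float = 0.5) -> bool:
    """True if the clip is no more than `tolerance` seconds short of the target."""
    return wav_duration_seconds(data) >= target_seconds - tolerance


def make_silent_wav(seconds: int, rate: int = 48_000) -> bytes:
    """Build a silent 16-bit stereo WAV; a stand-in for a real model output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (seconds * rate * 2 * 2))
    return buf.getvalue()


if __name__ == "__main__":
    clip = make_silent_wav(30)
    print(wav_duration_seconds(clip))               # length of the stand-in clip
    print(is_long_enough(clip, target_seconds=240))  # did we get a 4-minute piece?
```

If the real output arrives in a compressed format, the same check applies but the duration would need to come from a decoder rather than the standard-library `wave` module.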
How does Lyria 3 Pro Preview compare to previous Lyria versions?
Google hasn't published benchmarks comparing Lyria 3 Pro Preview to earlier versions, so we're working blind on quality improvements. The 'Pro' designation and preview status suggest this is their most capable sound model yet, but without comparative data on audio fidelity, coherence, or prompt adherence, you'll need to A/B test against Lyria 2 yourself.
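If you do run that comparison yourself, keeping the listener blind to which model produced which clip helps avoid bias. The following is a minimal sketch of a blind pairing and tally, not an established evaluation protocol; the labels `model_a` and `model_b`, the function names, and the example prompts are all hypothetical, and generating the clips themselves is left out.

```python
import random
from collections import Counter


def blind_pairs(prompts, seed=0):
    """For each prompt, randomise which model's clip is played first."""
    rng = random.Random(seed)  # seeded so the trial sheet can be reproduced
    trials = []
    for prompt in prompts:
        order = ["model_a", "model_b"]
        rng.shuffle(order)
        trials.append({"prompt": prompt, "first": order[0], "second": order[1]})
    return trials


def tally(trials, picks):
    """Count wins per model; picks[i] is 'first' or 'second' for trial i."""
    return Counter(trial[pick] for trial, pick in zip(trials, picks))


if __name__ == "__main__":
    trials = blind_pairs(["trailer score", "podcast jingle", "meditation pad"])
    # The listener only ever sees "first" and "second", never the model labels.
    listener_picks = ["first", "second", "first"]
    print(tally(trials, listener_picks))
```

A handful of prompts gives only an impression; a meaningful comparison would need enough trials and listeners to separate real preference from noise.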
Should I use Lyria 3 Pro Preview for production audio workflows?
Not yet. Preview models lack stability guarantees, documented rate limits, and SLAs. Use it for prototyping and creative exploration while it's free, but keep a production-ready alternative like ElevenLabs or Stability AI's models for client work. The multimodal input is compelling for experimentation, though.
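Keeping a production-ready alternative on hand can be as simple as trying backends in order and falling through on failure. The sketch below shows that pattern under stated assumptions: `generate_with_fallback` is a hypothetical helper, and `preview_stub` and `stable_stub` are placeholders for real client calls, which are not shown because this page documents no client API.

```python
from typing import Callable, Sequence

# A backend takes a prompt and returns audio bytes (hypothetical interface).
Generator = Callable[[str], bytes]


def generate_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Generator]]
) -> tuple[str, bytes]:
    """Try each backend in order; return (backend_name, audio) from the first success."""
    errors = []
    for name, generate in backends:
        try:
            audio = generate(prompt)
        except Exception as exc:  # preview services can fail in undocumented ways
            errors.append(f"{name}: {exc}")
            continue
        if audio:
            return name, audio
        errors.append(f"{name}: empty output")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def preview_stub(prompt: str) -> bytes:
    """Stand-in for a preview model call that is currently unavailable."""
    raise TimeoutError("preview endpoint unavailable")


def stable_stub(prompt: str) -> bytes:
    """Stand-in for a production-ready alternative."""
    return b"stub-audio"


if __name__ == "__main__":
    backends = [("preview", preview_stub), ("stable", stable_stub)]
    print(generate_with_fallback("15-second podcast jingle", backends))
```

Returning the backend name alongside the audio makes it easy to log how often the preview model was actually used, which is useful evidence when deciding whether it is ready for client work.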