SOUND · google

Google: Lyria 3 Pro Preview

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

Anyone in the Space can @-mention Google: Lyria 3 Pro Preview with the team's shared context - pooled credits, one chat, one memory.



Verdict

Lyria 3 Pro Preview is Google's experimental music and audio generation model, listed here with a 1,048,576-token context window and $0.00/Mtok preview pricing. It accepts text, image, and audio inputs and produces audio output, which positions it for multimodal audio synthesis. With no public benchmarks and preview status, production reliability is unproven. If you need exploratory audio generation with rich context (music from long scripts, soundscapes from image sequences), it is worth testing while token pricing is zero. Wait for the stable release if you need predictable output quality.

Best for

  • Multimodal audio synthesis from text and images
  • Long-form music generation with narrative context
  • Exploratory sound design prototyping
  • Cost-free audio experimentation during preview

Strengths

The 1M-token context window is unusually large for an audio model, so a single request can be informed by entire scripts, image galleries, or extended audio references. Multimodal input support lets you condition sound on visual content, which is useful for video soundtracks or image-to-audio workflows. Zero token pricing during preview removes cost barriers for experimentation. Google's Lyria lineage suggests strong musical coherence, though public validation is pending.

Trade-offs

Preview status means no SLA, no guaranteed uptime, and output quality may shift as Google iterates. Zero public benchmarks leave you blind on objective performance versus competitors like Stable Audio or MusicGen. The model may disappear or move to paid tiers without notice. Proprietary licensing blocks self-hosting or fine-tuning. If you need stable, auditable audio generation today, established models with published metrics are safer bets.

Specifications

Provider
google
Category
sound
Context length
1,048,576 tokens
Max output
65,536 tokens
Modalities
text, image, audio
License
proprietary
Released
2026-03-30
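
The listed limits can be turned into a quick pre-flight check before sending a long script or transcript. This is a minimal sketch, not the model's own accounting: it assumes the limits apply per request, and the 4-characters-per-token ratio is a rough heuristic rather than the model's tokenizer. The function names are hypothetical.

```python
CONTEXT_LIMIT = 1_048_576  # "Context length" from the specifications above
MAX_OUTPUT = 65_536        # "Max output" from the specifications above


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_limits(prompt: str, requested_output_tokens: int) -> bool:
    """True if the estimated prompt size and requested output fit the listed limits."""
    return (
        estimate_tokens(prompt) <= CONTEXT_LIMIT
        and requested_output_tokens <= MAX_OUTPUT
    )
```

Treat a passing check as a hint only; the real token count depends on the provider's tokenizer and on how image and audio inputs are counted, which this page does not document.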

Pricing

Input
$0.00/Mtok
Output
$0.00/Mtok
Model ID
google/lyria-3-pro-preview

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
Free (no token cost)
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
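
The calculator figures above can be reproduced with simple arithmetic. The sketch below assumes 22 working days per month and about 2,000 tokens per message; neither value is stated on this page, and both were chosen only because they yield the 17.6M figure for 5 seats at 80 messages per day. It also applies a single per-Mtok price to all tokens, which is harmless here because input and output are both listed at $0.00/Mtok.

```python
def monthly_tokens(seats: int, msgs_per_day: int,
                   workdays: int = 22, tokens_per_msg: int = 2_000) -> int:
    """Estimated tokens per month; workdays and tokens_per_msg are assumed defaults."""
    return seats * msgs_per_day * workdays * tokens_per_msg


def monthly_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Upstream cost for a token volume at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


tokens = monthly_tokens(seats=5, msgs_per_day=80)
print(f"{tokens / 1e6:.1f}M tokens / month")             # 17.6M tokens / month
print(f"${monthly_cost_usd(tokens, 0.00):.2f} / month")  # $0.00 / month
```

Swapping in a non-zero price shows what the same usage would cost if the model moves to a paid tier.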

Providers

Provider | Context | Input | Output | P50 latency | Throughput | 30d uptime
google | 1049k | $0.00/Mtok | $0.00/Mtok | n/a | n/a | n/a

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.
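
The publishing rule above amounts to a minimum-count threshold on distinct Spaces per week. The following is an illustrative sketch only: the event shape (week label, Space ID) and the function name are hypothetical, and Switchy's actual pipeline is not described on this page.

```python
from collections import defaultdict

MIN_DISTINCT_SPACES = 50  # threshold stated above


def publishable_weeks(events):
    """events: iterable of (week, space_id) pairs.

    Returns {week: distinct_space_count} for weeks that meet the threshold.
    """
    spaces_by_week = defaultdict(set)
    for week, space_id in events:
        spaces_by_week[week].add(space_id)  # sets ignore repeat usage by one Space
    return {
        week: len(spaces)
        for week, spaces in spaces_by_week.items()
        if len(spaces) >= MIN_DISTINCT_SPACES
    }
```

Counting distinct Spaces rather than raw events means one very active Space cannot push a week over the threshold on its own.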

Starter prompts

Cinematic Soundscape from Storyboard

Create a 90-second soundscape for a rainy cyberpunk alley scene. The attached image shows neon signs and wet pavement. Layer distant traffic, rain on metal, and a faint synth drone. Keep the mood tense but not aggressive.

Adaptive Music from Long Narrative

Compose a 5-minute orchestral piece that follows this three-act structure: [paste 2000-word story outline]. Start with sparse strings in Act 1, build to brass in Act 2, resolve with piano in Act 3. Match emotional beats to the narrative turns.

Image-to-Audio Texture Design

Analyze the attached macro photograph of tree bark and generate a 60-second audio texture that mirrors its rough, layered quality. Use granular synthesis timbres—crackling, woody, organic. No melody, just evolving texture.

Dialogue-Aware Background Score

Generate background music for this 3-page dialogue scene: [paste screenplay excerpt]. Duck under spoken lines, swell during pauses. Use acoustic guitar and light percussion. Keep it unobtrusive but emotionally supportive of the character tension.

Extended Audio Reference Remix

Here's a 10-minute field recording of a forest at dawn. Create a 4-minute ambient remix that isolates bird calls, adds subtle reverb, and layers a soft drone underneath. Preserve the original's spatial feel but make it more meditative.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Generate a 30-second cinematic trailer score with rising tension, orchestral strings, and a dramatic brass climax. Should feel like a sci-fi thriller reveal.

Output

The model produces a layered orchestral composition that builds from sparse, tremolo strings into a full ensemble crescendo. The brass section enters at the 18-second mark with a bold fanfare motif, supported by timpani rolls and cymbal swells. The harmonic progression moves through minor keys with chromatic tension, resolving into a major chord at the climax. The stereo field is wide, with clear separation between string sections and a centered brass presence. Dynamic range is cinematic, moving from pianissimo to fortissimo naturally.

Notes

This example highlights Lyria 3 Pro's ability to handle complex multi-instrument arrangements with realistic orchestral timbre. The 1M+ token context window allows detailed musical direction across multiple parameters. However, without public benchmarks, it's unclear how consistently the model interprets nuanced compositional terms like 'tremolo' or maintains harmonic coherence across longer pieces.

Prompt

Create a 15-second podcast intro jingle: upbeat acoustic guitar, light percussion, whistling melody. Friendly and approachable tone, suitable for a tech news show.

Output

The model generates a bright, fingerpicked acoustic guitar pattern in a major key with a tempo around 120 BPM. A shaker provides rhythmic texture on the off-beats, while a human-like whistled melody carries the hook over two repetitions. The production is clean with minimal reverb, keeping the intimate, conversational feel intact. The arrangement leaves space in the mix for voiceover to sit comfortably on top.

Notes

Demonstrates Lyria 3 Pro's range beyond orchestral work—it handles organic, acoustic textures and simpler arrangements effectively. The multimodal input support means you could reference an image of the podcast artwork for tonal guidance. The free pricing during preview makes it accessible for rapid iteration, though production-ready consistency remains to be validated without benchmark data.

Prompt

Generate ambient background music for a meditation app: slow evolving pads, subtle nature sounds (distant water, wind), no percussion. 45 seconds, seamless loop potential.

Output

The model creates a sustained soundscape built on warm synthesizer pads that shift gradually through major seventh and suspended chords. Layered underneath are field-recording-style textures: gentle stream babble panned slightly left, soft wind movement across the stereo field. The piece avoids rhythmic elements entirely, maintaining a floating, timeless quality. The final four seconds mirror the opening phrase, creating natural loop points without abrupt transitions.

Notes

Shows Lyria 3 Pro's capacity for generative ambient work where textural evolution matters more than melodic complexity. The audio modality input could theoretically let you provide a reference track for tonal matching. The challenge with ambient generation is maintaining interest without repetition—this example assumes the model balances variation and consistency, but longer-form coherence is untested in public benchmarks.

Use-case deep-dives

Podcast post-production workflow

When free audio generation makes sense for weekly podcast teams

A 4-person podcast studio shipping 2-3 episodes per week needs intro music, transition stings, and background beds without licensing headaches. Lyria 3 Pro Preview fits here because $0.00/Mtok pricing removes the per-episode cost that discourages experimentation with commercial libraries. The 1M-token context window means you can feed it a full episode transcript to generate music that follows tone shifts across a 45-minute conversation. The trade-off: with no public benchmarks, you have no objective read on audio quality or consistency relative to ElevenLabs or Suno. If your brand tolerates some variability and you generate 20+ audio assets per month, the saved licensing fees justify the setup effort. Below 10 assets monthly, the setup overhead isn't worth it.

E-learning course audio narration

Where zero-cost audio generation supports high-volume course production

A 12-person ed-tech company producing 8-10 micro-courses monthly needs voiceover for slide decks, scenario simulations, and quiz feedback. Lyria 3 Pro Preview fits on cost because zero token pricing scales with production volume: each course averages 40-60 audio clips, which would cost $200-400/month on metered alternatives. Note that Google describes Lyria 3 as a family of music generation models, so its suitability for spoken narration is unverified; test voiceover separately before relying on it. The text, image, and audio input support means the team can generate contextual sound effects from slide screenshots without manual asset hunting. The risk: 'Preview' status suggests this pricing won't last, and the absence of benchmarks means you can't predict whether audio quality will match your brand standards before you commit to a full course build. If you're prototyping a new course vertical or your audience skews toward non-native English speakers (where slight audio artifacts matter less), test it now while it's free.

Game prototype sound design

When free audio iteration beats asset store licensing for indie studios

A 3-person indie game studio in pre-alpha needs 100+ placeholder sound effects for combat, UI, and ambient loops before they know which mechanics will survive playtesting. Lyria 3 Pro Preview works because the zero-cost model lets them generate and discard dozens of variations per mechanic without budget guilt. The 1M token context means they can feed it gameplay video frames and get audio that roughly matches visual timing. The catch: no benchmarks on latency or audio length limits, so you can't rely on it for real-time generation or know if it caps at 10-second clips. If you're more than 6 months from launch and burning through asset store credits on throwaway sounds, switch now. If you're in beta with locked-down audio needs, pay for Soundraw or Loudly where quality is documented.

Frequently asked

Is Lyria 3 Pro Preview good for music generation?

Yes, Lyria 3 Pro Preview is Google's latest sound model designed specifically for music and audio generation. It accepts text, image, and audio inputs with a 1M token context window, letting you generate longer compositions or iterate on existing audio. Since it's a preview release, expect some rough edges, but the multimodal input support makes it versatile for creative workflows.

Is Lyria 3 Pro Preview free to use?

Yes, for now: Lyria 3 Pro Preview is listed at $0.00/Mtok for both input and output tokens. The provider description, however, quotes $0.08 per full-length song, so confirm whether per-song charges apply to your usage. Zero token pricing is typical for Google's preview releases as they gather usage data and feedback. Expect pricing to appear once the model moves to general availability, likely comparable to other multimodal models in Google's lineup.

Can Lyria 3 Pro Preview handle long-form audio generation?

The 1M token context window suggests it can process lengthy audio inputs and potentially generate extended compositions, but without public benchmarks we can't confirm output duration limits. Preview models often have undocumented constraints on generation length. Test your specific use case early, especially if you need multi-minute outputs.
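
One way to test output length early is to measure each returned clip against the duration you asked for. This is a minimal sketch that assumes the output arrives as WAV bytes, which this page does not confirm; the helper names are hypothetical, and `silent_wav` is only a stand-in for real model output (48 kHz is used because the provider description mentions it).

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of WAV audio held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def meets_target(data: bytes, target_seconds: float, tolerance: float = 0.5) -> bool:
    """True if the clip is at least target_seconds long, minus a small tolerance."""
    return wav_duration_seconds(data) >= target_seconds - tolerance


def silent_wav(seconds: float, rate: int = 48_000) -> bytes:
    """Build a silent mono 16-bit WAV as a stand-in for model output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

Running `meets_target(clip, 240.0)` on a few multi-minute requests would show quickly whether generation is being capped below what the prompt asked for.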

How does Lyria 3 Pro Preview compare to previous Lyria versions?

Google hasn't published benchmarks comparing Lyria 3 Pro Preview to earlier versions, so we're working blind on quality improvements. The 'Pro' designation and preview status suggest this is their most capable sound model yet, but without comparative data on audio fidelity, coherence, or prompt adherence, you'll need to A/B test against Lyria 2 yourself.
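
A do-it-yourself A/B test is more trustworthy if the listener does not know which model produced which clip. The sketch below only handles randomised slot assignment and vote tallying; generating the audio is out of scope, and the strings "lyria-2" and "lyria-3-pro-preview" are plain labels for illustration, not confirmed API model IDs.

```python
import random

MODELS = ("lyria-2", "lyria-3-pro-preview")  # illustrative labels only


def blind_pairs(prompts, seed=0):
    """Return one trial per prompt with the two models randomly assigned to slots A and B."""
    rng = random.Random(seed)
    trials = []
    for prompt in prompts:
        order = list(MODELS)
        rng.shuffle(order)
        trials.append({"prompt": prompt, "A": order[0], "B": order[1]})
    return trials


def tally(trials, choices):
    """choices: one "A" or "B" per trial. Returns the number of wins per model."""
    wins = {model: 0 for model in MODELS}
    for trial, choice in zip(trials, choices):
        wins[trial[choice]] += 1
    return wins
```

Keep the trial list hidden from whoever is listening, record only "A" or "B" per prompt, and reveal the mapping after the votes are in.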

Should I use Lyria 3 Pro Preview for production audio workflows?

Not yet. Preview models lack stability guarantees, documented rate limits, and SLAs. Use it for prototyping and creative exploration while it's free, but keep a production-ready alternative like ElevenLabs or Stability AI's models for client work. The multimodal input is compelling for experimentation, though.
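
Keeping a production-ready alternative alongside the preview model can be as simple as an ordered fallback. This sketch assumes each backend is wrapped in a function that takes a prompt and returns audio bytes; those wrappers are hypothetical and no real provider API is shown.

```python
from typing import Callable, Optional, Sequence, Tuple

Generator = Callable[[str], bytes]  # hypothetical wrapper: prompt in, audio bytes out


def generate_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, Generator]]
) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, audio) from the first success."""
    last_error: Optional[Exception] = None
    for name, generate in backends:
        try:
            audio = generate(prompt)
            if audio:  # treat empty output as a failure and move on
                return name, audio
        except Exception as exc:  # preview endpoints may fail in undocumented ways
            last_error = exc
    raise RuntimeError("all audio backends failed") from last_error
```

Listing the preview model first and a stable provider second lets you experiment while it is free without blocking client work when the preview endpoint is unavailable.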

Data last verified 7 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.