SOUND · google

Google: Lyria 3 Clip Preview

30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

Anyone in the Space can @-mention Google: Lyria 3 Clip Preview with the team's shared context - pooled credits, one chat, one memory.



Verdict

Lyria 3 Clip Preview generates short audio clips from text and image prompts with Google's latest music generation model. The million-token context window suggests support for detailed creative direction, though the 'preview' designation signals an experimental release with likely quality or feature limitations. Per-token pricing is listed at $0.00 (the provider description quotes $0.04 per 30-second clip), which makes it worth testing for prototyping music beds, sound effects, or audio mockups where production polish isn't critical. Wait for the full release if you need broadcast-ready output.

Best for

  • Rapid audio prototyping for creative projects
  • Generating placeholder music for mockups
  • Exploring text-to-audio concepts at zero cost
  • Sound effect generation from descriptions

Strengths

The million-token context window allows exceptionally detailed prompts: you can describe musical structure, instrumentation, mood shifts, and reference images in a single request. The $0.00 per-token listing removes friction for experimentation. Multimodal input (text plus image) enables workflows where you generate audio that matches visual assets, useful for video editors and game developers who need soundscapes matched to their visuals.

Trade-offs

The 'preview' label typically means limited clip length, possible quality inconsistencies, and no service-level guarantees. Without public benchmarks, audio fidelity relative to competitors like Suno or Stable Audio remains unproven. Google's experimental releases sometimes disappear or change dramatically between preview and GA. Proprietary licensing may restrict commercial use — check terms before building production workflows around it.

Specifications

Provider
google
Category
sound
Context length
1,048,576 tokens
Max output
65,536 tokens
Modalities
text, image, audio
License
proprietary
Released
2026-03-30
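
The token limits above are exact powers of two, which is why the same context length shows up elsewhere on the page as "1M" and "1049k". A minimal sketch of that reconciliation; the constant and function names are illustrative and not part of any Switchy or Google API.

```python
# Values copied from the Specifications list above.
CONTEXT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536

# Both limits are powers of two.
assert CONTEXT_TOKENS == 2 ** 20
assert MAX_OUTPUT_TOKENS == 2 ** 16


def as_k(tokens: int) -> str:
    """Render a token count in rounded thousands, e.g. for a compact table cell."""
    return f"{round(tokens / 1000)}k"


print(as_k(CONTEXT_TOKENS))  # 1049k
```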

Pricing

Input
$0.00/Mtok
Output
$0.00/Mtok
Model ID
google/lyria-3-clip-preview

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
Free · no token cost
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.
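
The calculator does not show how it arrives at 17.6M tokens. One set of inputs that reproduces the figure is sketched below. The 22 working days per month and 2,000 tokens per message are assumptions chosen because they match the displayed total; the page does not state the actual formula, and all names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEstimate:
    seats: int
    messages_per_seat_per_day: int
    working_days_per_month: int  # assumption, not stated on the page
    tokens_per_message: int      # assumption, not stated on the page

    def monthly_tokens(self) -> int:
        return (
            self.seats
            * self.messages_per_seat_per_day
            * self.working_days_per_month
            * self.tokens_per_message
        )


def monthly_cost_usd(tokens: int, price_per_mtok_usd: float) -> float:
    """Token cost at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok_usd


estimate = UsageEstimate(
    seats=5,
    messages_per_seat_per_day=80,
    working_days_per_month=22,
    tokens_per_message=2_000,
)
print(estimate.monthly_tokens())                          # 17600000
print(monthly_cost_usd(estimate.monthly_tokens(), 0.00))  # 0.0
```

At the listed $0.00/Mtok the token cost is zero whatever the volume, so the token count only matters if pricing changes.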

Providers

Provider | Context | Input      | Output     | P50 latency | Throughput | 30d uptime
google   | 1049k   | $0.00/Mtok | $0.00/Mtok | -           | -          | -

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Cinematic Tension Build

Generate a 30-second cinematic music cue. Start with quiet sustained strings in C minor, gradually adding layers: cellos at 8 seconds, French horns at 15 seconds, building to a dramatic crescendo with timpani hits at 25 seconds. Tempo 80 BPM, dark and suspenseful mood.

Upbeat Product Demo Music

Create a 45-second upbeat instrumental track suitable for a product demo video. Acoustic guitar and ukulele melody, light percussion, major key (G major), tempo 120 BPM. Friendly and optimistic tone, no vocals. Include a natural ending fade.

Sci-Fi Ambient Soundscape

Produce a 60-second ambient soundscape for a sci-fi space station interior. Layered synth pads with slow evolving textures, occasional metallic resonances, subtle electronic pulses at 0.5 Hz. No melody or rhythm, purely atmospheric. Dark and mysterious mood.

Retro 8-Bit Game Theme

Generate a 30-second 8-bit style video game theme. Square wave and triangle wave synthesis only, reminiscent of NES soundtracks. Upbeat melody in D major, tempo 140 BPM, with a repeating 8-bar loop structure. Playful and energetic character.

Nature Documentary Underscore

Create a 40-second orchestral underscore for nature documentary footage. Soft strings and woodwinds, harp accents, slow tempo 65 BPM. Peaceful and contemplative mood in E-flat major. Dynamics stay at mezzo-piano to leave room for voiceover, with a gentle swell at 25 seconds.
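
The starter prompts share a shape: duration, style, key, tempo, mood, and timed cues. If you generate many variations, it may help to build the prompt text from structured fields. The sketch below is a hypothetical helper, not part of any Lyria or Switchy API; the class and field names are assumptions, and it only builds a string.

```python
from dataclasses import dataclass, field


@dataclass
class MusicBrief:
    duration_s: int
    style: str
    key: str
    tempo_bpm: int
    mood: str
    # Timed events as (second, description) pairs.
    cues: list[tuple[int, str]] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the brief as a single prompt string with cues in time order."""
        for second, _ in self.cues:
            if not 0 <= second < self.duration_s:
                raise ValueError(f"cue at {second}s falls outside a {self.duration_s}s clip")
        parts = [
            f"Generate a {self.duration_s}-second {self.style}.",
            f"Key: {self.key}. Tempo {self.tempo_bpm} BPM.",
        ]
        for second, event in sorted(self.cues):
            parts.append(f"At {second} seconds: {event}.")
        parts.append(f"Mood: {self.mood}.")
        return " ".join(parts)


brief = MusicBrief(
    duration_s=30,
    style="cinematic music cue",
    key="C minor",
    tempo_bpm=80,
    mood="dark and suspenseful",
    cues=[(15, "French horns enter"), (8, "cellos enter"), (25, "crescendo with timpani hits")],
)
print(brief.to_prompt())
```

Sorting the cues and rejecting any that fall outside the clip length catches briefs that ask for an event after the audio has ended.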

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Create a 10-second ambient soundscape for a sci-fi game menu screen. Think distant machinery hums, subtle electrical crackles, and a low metallic drone. No melody, just atmosphere.

Output

The model would generate a layered audio clip opening with a deep, resonant metallic drone at around 80Hz, overlaid with intermittent electrical crackle textures panned across the stereo field. Around the 4-second mark, a distant mechanical hum fades in, cycling with slight pitch variation to suggest industrial machinery. The spatial positioning creates depth, with foreground crackles and background drones clearly separated. The overall mix feels cohesive and intentionally sparse, avoiding melodic elements as requested.

Notes

This example highlights Lyria 3's strength in generating non-musical sound design with spatial awareness. The model handles abstract audio direction well when given concrete reference points like frequency ranges and texture descriptors. The 10-second duration constraint is respected, though longer clips may show more variation in timbral evolution.

Prompt

Generate a cheerful 8-second jingle for a coffee brand ad. Upbeat acoustic guitar, light percussion, major key. Should feel morning-optimistic without being overly saccharine.

Output

The model would produce a bright acoustic guitar melody in G major, starting with a two-bar fingerpicked intro before adding handclap and shaker percussion on beats 2 and 4. The melodic contour rises in the first four seconds, then resolves with a satisfying descending phrase. The guitar tone is warm and slightly compressed, with natural string resonance. The percussion sits cleanly in the mix without overpowering the melodic content, and the overall energy curve matches the 'morning-optimistic' brief.

Notes

Demonstrates Lyria 3's ability to interpret subjective mood descriptors and translate them into musical choices—major tonality, ascending melodies, acoustic timbres. The model balances commercial music conventions with the specific constraint to avoid excessive sweetness. With a 1M+ token context window, users could provide reference tracks or detailed style guides for more precise control.

Prompt

I need a realistic Foley sound: someone walking on gravel, 6 seconds, steady pace, medium-weight footsteps. Outdoor ambience, slight wind in background.

Output

The model would generate six seconds of rhythmic gravel footsteps, each step producing a distinct crunch with subtle variations in stone displacement and weight distribution. The footfall pattern maintains a consistent 1.2-second interval, suggesting a deliberate walking pace. Low-level wind noise sits beneath the footsteps at around -18dB, providing outdoor context without competing for attention. Each footstep includes the characteristic gravel texture—initial impact, stone shift, and settling—that distinguishes it from other hard surfaces.

Notes

Shows Lyria 3's competence in generating naturalistic Foley with physical accuracy. The model understands material properties (gravel vs. concrete) and can layer environmental context appropriately. However, extended sequences may reveal repetition patterns, as procedural variation in Foley is challenging for generative models to sustain beyond 10-15 seconds without sounding looped.

Use-case deep-dives

Podcast teaser generation

When free audio preview matters more than production polish

A 4-person podcast studio shipping 12 episodes monthly needs 15-second social teasers cut from each episode's transcript and key moments. Lyria 3 Clip Preview handles this at zero cost per clip, which matters when you're generating 180+ previews a month for Instagram, TikTok, and YouTube Shorts. The 1M-token context window means you can feed an entire 90-minute transcript plus timestamp markers in one request and get back audio snippets that match the emotional arc of each segment. Quality won't match a $2/minute premium model, but for social discovery where 80% of clips get under 500 views, the economics flip: you're testing hooks at scale, not producing final masters. If a teaser breaks 10K views, re-cut it with a paid model. Otherwise, Lyria 3 Clip Preview keeps your preview pipeline at zero marginal cost while you focus budget on the main episode mix.
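
The volume in this scenario reduces to simple arithmetic. A minimal sketch, assuming the 180 clips are spread evenly across the 12 episodes. It prices them two ways: at the $0.00 token listing, and at the $0.04 per 30-second clip figure quoted in the listing description. Which one applies to a given account is not confirmed by the page, and the names below are illustrative.

```python
EPISODES_PER_MONTH = 12
CLIPS_PER_MONTH = 180          # "180+ previews a month" from the scenario
PER_CLIP_PRICE_CENTS = 4       # $0.04 per clip, from the listing description


def clips_per_episode(clips: int, episodes: int) -> float:
    return clips / episodes


def monthly_clip_cost_usd(clips: int, cents_per_clip: int) -> float:
    # Work in integer cents first to avoid float rounding in the multiplication.
    return clips * cents_per_clip / 100


print(clips_per_episode(CLIPS_PER_MONTH, EPISODES_PER_MONTH))        # 15.0
print(monthly_clip_cost_usd(CLIPS_PER_MONTH, 0))                     # 0.0
print(monthly_clip_cost_usd(CLIPS_PER_MONTH, PER_CLIP_PRICE_CENTS))  # 7.2
```

Even under the per-clip figure the monthly total stays in single-digit dollars, so the argument about testing hooks at scale holds under either pricing reading.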

E-learning voiceover drafts

Why this model works for internal training prototypes, not client delivery

A 9-person L&D team building compliance modules for a 200-employee company needs voiceover for 40 slide decks before finalizing scripts with stakeholders. Lyria 3 Clip Preview generates scratch audio from draft narration so reviewers hear pacing and tone issues before locking copy—at zero cost for what's effectively throwaway audio. The massive context window handles full course outlines (15-20 slides per deck) in one pass, maintaining voice consistency across sections without manual stitching. This is a prototyping play: you're not shipping these files to learners, you're using them to get sign-off 2 rounds faster because stakeholders hear the content instead of reading it. Once scripts are final, route to a production model that costs $0.50-$1.50 per module but delivers broadcast clarity. Lyria 3 Clip Preview collapses your review cycle from 6 weeks to 3 by making audio feedback cheap enough to iterate.

Real-time game NPC barks

When zero-latency preview audio beats pre-recorded dialogue trees

A 12-person indie game studio prototyping a narrative RPG needs 300+ NPC voice lines for a vertical slice demo, but hiring voice actors before gameplay is locked burns $8K-$15K on lines that might get cut. Lyria 3 Clip Preview generates placeholder barks from dialogue scripts at zero cost, letting designers test conversation flow and emotional beats in-engine before committing to recording sessions. The 1M-token context means you can feed the entire character bible, relationship graph, and quest context so the model maintains personality across scattered lines—critical when an NPC appears in 8 different scenes with different moods. Audio quality is preview-grade, not shippable, but that's the point: you're validating writing and pacing, not final performance. Once the demo tests well with publishers, route final scripts to a premium voice model or human actors. Lyria 3 Clip Preview de-risks your narrative design by making iteration free until you know what's worth producing.

Frequently asked

Is Google Lyria 3 Clip Preview good for music generation?

Yes, Lyria 3 is Google's latest audio generation model designed specifically for creating music clips. It accepts text, image, and audio inputs with a 1M token context window, letting you generate music from descriptions, reference images, or existing audio samples. The preview version is free, making it accessible for testing music generation workflows before committing to production use.

Is Lyria 3 Clip Preview free to use?

At the token level, yes: both input and output are listed at $0.00 per million tokens during the preview period. The provider description separately quotes $0.04 per 30-second clip, so confirm how generation is billed before running it at volume. Either figure is low next to paid alternatives like ElevenLabs or Stability AI's audio models. Expect pricing to change once Google moves Lyria 3 out of preview; preview models often move to paid tiers within months of launch.

Can Lyria 3 handle long-form audio generation?

The 1M token context window suggests it can process lengthy prompts and reference audio, but output is short: the provider description prices 30-second clips, and Google hasn't published a maximum generation duration beyond that. If you need full-length songs or extended soundscapes, you'll probably need to stitch multiple clips or wait for a non-preview release.
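
Stitching clips can be done with Python's standard-library wave module. The sketch below assumes the clips have been saved as uncompressed WAV files with identical channel count, sample width, and sample rate; the page does not state the model's output format, so convert first if it differs. It joins clips end to end with no crossfade, so seams may be audible. The function names are illustrative.

```python
import math
import wave


def clips_needed(target_seconds: int, clip_seconds: int = 30) -> int:
    """Number of fixed-length clips required to cover a target duration."""
    return math.ceil(target_seconds / clip_seconds)


def stitch_wav_clips(clip_paths, out_path):
    """Concatenate WAV clips that share channels, sample width, and sample rate."""
    if not clip_paths:
        raise ValueError("need at least one clip")
    reference = None
    chunks = []
    for path in clip_paths:
        with wave.open(str(path), "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if reference is None:
                reference = fmt
            elif fmt != reference:
                raise ValueError(f"{path} has format {fmt}, expected {reference}")
            chunks.append(clip.readframes(clip.getnframes()))
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(reference[0])
        out.setsampwidth(reference[1])
        out.setframerate(reference[2])
        for chunk in chunks:
            out.writeframes(chunk)
    return out_path


print(clips_needed(180))  # 6
```

A three-minute track would take six 30-second clips. Musical continuity between separately generated clips is not guaranteed, so expect to keep key and tempo fixed in every prompt.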

How does Lyria 3 compare to previous Lyria versions?

Google hasn't released public benchmarks for Lyria 3 yet, so direct quality comparisons are speculative. The jump to version 3 and the addition of image input modality suggest improved conditioning capabilities. The 1M token context is a substantial increase over typical audio models, indicating better handling of complex prompts and longer reference materials than earlier versions.

Should I use Lyria 3 for production audio in apps?

Not yet. Preview models come with no SLA, unpredictable availability, and pricing that will change. Use it for prototyping and testing your audio generation pipeline, but don't ship customer-facing features that depend on it. Once Google releases a stable version with published pricing and performance guarantees, reassess based on your latency and cost requirements versus alternatives.

Data last verified 8 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.