Google Lyria: The Essentials

Last updated: April 21, 2026

Covers Google Lyria 2 (model_lyria-2), Google Lyria 3 Clip (model_lyria-3-clip), and Google Lyria 3 Pro (model_lyria-3-pro)

The Google Lyria family generates original music from text descriptions. Lyria 2 is the first-generation model, producing 30-second instrumental clips. Lyria 3 brings significant advances: Clip delivers polished 30-second pieces with more expressive prompting, while Pro generates complete, structured songs of up to approximately 2 minutes. All three models embed Google's SynthID audio watermark in their output.

Which Model Should I Use?

Model	ID	Output length	Best for
Lyria 3 Pro	`model_lyria-3-pro`	Up to ~2 minutes, MP3 or WAV	Full songs with verse/chorus structure, film scores, complete background tracks
Lyria 3 Clip	`model_lyria-3-clip`	30 seconds, MP3	Ad stings, UI moments, game events, social content, rapid iteration
Lyria 2	`model_lyria-2`	30 seconds, 48 kHz stereo	Instrumental clips when negative prompt control is needed (only model with negativePrompt)

For most new projects, start with Lyria 3 Pro. Use Lyria 3 Clip when you only need a short clip and want fast iteration.
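The selection guidance above can be written as a small decision helper. This is a minimal sketch: the function name, its flags, and the order of the checks are illustrative assumptions, and only the model IDs come from the table.

```python
def choose_lyria_model(needs_negative_prompt: bool = False,
                       short_clip_only: bool = False) -> str:
    """Pick a Lyria model ID following the guidance in the table above.

    Hypothetical helper: the flags and their priority are assumptions
    made for illustration, not part of any official SDK.
    """
    if needs_negative_prompt:
        # Lyria 2 is the only model that exposes negativePrompt.
        return "model_lyria-2"
    if short_clip_only:
        # 30-second pieces with fast iteration.
        return "model_lyria-3-clip"
    # Default recommendation for most new projects.
    return "model_lyria-3-pro"


print(choose_lyria_model())
print(choose_lyria_model(short_clip_only=True))
```

Note that choosing Lyria 2 for its negative prompt also means accepting instrumental-only, 30-second output.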

Parameters

Lyria 3 Pro and Lyria 3 Clip

Both models share the same two inputs. They differ mainly in output length: Pro generates up to approximately 2 minutes, while Clip generates 30 seconds.

The Prompt describes the music you want. Include genre, instruments, tempo in BPM, key, mood, and production style. For Lyria 3 Pro, you can embed structure tags like [Verse], [Chorus], and [Outro] directly in the prompt to guide how the song is arranged. The prompt accepts up to 5,000 characters and is optional, though leaving it empty produces unpredictable results.

Reference Images let you pass up to 10 Scenario asset IDs as visual mood references. The model reads the tone, era, and atmosphere of the images and reflects them in the music. Images work alongside the text prompt, not instead of it.
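The limits above (5,000 prompt characters, 10 reference images) are easy to check before submitting a job. The sketch below assumes the inputs are sent as a dictionary with `prompt` and `images` keys; that shape and the helper name are assumptions for illustration, so check the API reference for the actual request format.

```python
from typing import Dict, List, Optional

MAX_PROMPT_CHARS = 5000      # Lyria 3 prompt limit stated above
MAX_REFERENCE_IMAGES = 10    # Lyria 3 reference image limit stated above


def build_lyria3_inputs(prompt: str = "",
                        images: Optional[List[str]] = None) -> Dict[str, object]:
    """Validate and assemble inputs for Lyria 3 Pro or Lyria 3 Clip.

    Hypothetical helper: the returned dict shape is an assumption.
    """
    images = list(images or [])
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images allowed")

    inputs: Dict[str, object] = {}
    if prompt:
        inputs["prompt"] = prompt
    if images:
        inputs["images"] = images  # Scenario asset IDs
    return inputs


print(build_lyria3_inputs("Orchestral, cinematic, slow build, 60 BPM", ["asset_xxx"]))
```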

Lyria 2

The Prompt is required for Lyria 2 and accepts up to 2,048 characters. Describe the genre, instruments, mood, tempo, and production style. Lyria 2 generates instrumental music only, so vocal requests in the prompt are ignored.

Negative Prompt is available only on Lyria 2 within the Lyria family. Use it to explicitly exclude elements from the output, such as "drums", "vocals", or "distorted guitar".

Seed is optional. Set it to any integer to lock the random seed for reproducible output.

Lyria 2 is instrumental only. The model does not generate vocals regardless of what the prompt requests. If you need a track with vocals or lyrics, use Lyria 3 Pro.
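The Lyria 2 inputs described above can be validated the same way. This is a sketch under assumptions: `negativePrompt` is the parameter name used in this article, while the `prompt` and `seed` keys, the comma-separated negative prompt string, and the helper itself are illustrative guesses at the request shape.

```python
from typing import Dict, Optional

LYRIA2_MAX_PROMPT_CHARS = 2048  # Lyria 2 prompt limit stated above


def build_lyria2_inputs(prompt: str,
                        negative_prompt: Optional[str] = None,
                        seed: Optional[int] = None) -> Dict[str, object]:
    """Validate and assemble inputs for Lyria 2.

    Hypothetical helper: key names other than negativePrompt are assumptions.
    """
    if not prompt.strip():
        raise ValueError("Lyria 2 requires a non-empty prompt")
    if len(prompt) > LYRIA2_MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {LYRIA2_MAX_PROMPT_CHARS} characters")
    # bool is a subclass of int in Python, so reject it explicitly.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        raise TypeError("seed must be an integer")

    inputs: Dict[str, object] = {"prompt": prompt}
    if negative_prompt:
        inputs["negativePrompt"] = negative_prompt
    if seed is not None:
        inputs["seed"] = seed  # same seed aims at reproducible output
    return inputs


print(build_lyria2_inputs("Solo piano, slow, melancholic",
                          negative_prompt="drums, distorted guitar",
                          seed=42))
```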

Structure Tags for Lyria 3 Pro

Lyria 3 Pro supports structural tags embedded directly in the prompt. These tags guide the model in arranging the song into distinct sections, producing a more composed and intentional output.

Tag	What it signals
`[Intro]`	Opening section, typically sets mood before the main theme
`[Verse]`	Main narrative section, moderate energy
`[Chorus]`	High-energy hook, the emotional peak of the song
`[Bridge]`	Contrasting section that provides variation before the final chorus
`[Outro]`	Closing section, typically resolves or fades the track

Example prompt with structure tags (Lyria 3 Pro):
"Indie folk, acoustic guitar and warm vocal harmonies, 90 BPM, key of G major, bittersweet mood.
[Intro] Gentle fingerpicking sets the scene.
[Verse] Soft lead vocal over sparse guitar.
[Chorus] Full harmony vocals with strummed chords and light percussion.
[Bridge] Stripped back to just guitar and a single voice.
[Outro] Fade on the chorus melody."

For Lyria 3 Clip, structure tags have less impact given the 30-second length, but a single tag (e.g. [Chorus] to request a high-energy segment) can still shape the result.
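When prompts are generated programmatically, a tagged prompt like the example above can be assembled from a style line plus a list of sections. The helper below is a sketch: its name is hypothetical, and it only accepts the five tags listed in this article, which is a conservative assumption rather than a statement about what the model supports.

```python
from typing import List, Tuple

# Tags documented in the table above.
STRUCTURE_TAGS = ("Intro", "Verse", "Chorus", "Bridge", "Outro")


def build_structured_prompt(style: str, sections: List[Tuple[str, str]]) -> str:
    """Join a style line and (tag, description) pairs into one prompt string."""
    lines = [style.strip()]
    for tag, description in sections:
        if tag not in STRUCTURE_TAGS:
            raise ValueError(f"unsupported structure tag: {tag}")
        lines.append(f"[{tag}] {description.strip()}")
    return "\n".join(lines)


prompt = build_structured_prompt(
    "Indie folk, acoustic guitar and warm vocal harmonies, 90 BPM, key of G major, bittersweet mood.",
    [
        ("Intro", "Gentle fingerpicking sets the scene."),
        ("Verse", "Soft lead vocal over sparse guitar."),
        ("Chorus", "Full harmony vocals with strummed chords and light percussion."),
        ("Outro", "Fade on the chorus melody."),
    ],
)
print(prompt)
```

Keeping sections as data makes it straightforward to reorder them or drop a bridge between iterations without retyping the whole prompt.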

Image-to-Music

Both Lyria 3 models accept up to 10 reference images alongside the text prompt. The model analyzes the visual content to infer the sonic mood, era, and atmosphere. A photograph of a rainy city at night might yield dark ambient textures. A vibrant festival scene might produce energetic percussion and brass. Images work best as mood references, not as literal instruction.

Images must be provided as Scenario asset IDs. You can generate images with any image model first, then pass the resulting asset IDs into a Lyria 3 job. Mixing text and image inputs is supported: the text prompt and the images are weighted together, so a strong text prompt with one reference image will lean heavily on the text.

Example: using a generated image as mood reference
1. Generate a concept art image: dark forest, moonlight, mist
2. Pass the asset ID to Lyria 3 Pro with a minimal text prompt:
   prompt: "Orchestral, cinematic, slow build, 60 BPM"
   images: ["asset_xxx"]

Prompt Format

All three Lyria models respond to the same core prompt elements. A well-formed Lyria prompt includes most or all of the following components:

Component	Examples
Genre / style	lo-fi hip hop, baroque, Afrobeats, heavy metal, bossa nova
Specific instruments	nylon string guitar, upright bass, Rhodes piano, shakuhachi
Tempo	80 BPM, slow, driving 140 BPM
Key	D minor, F major, E Dorian
Mood	melancholic, triumphant, tense, dreamy
Production style	analog warmth, studio polished, raw live recording feel
Structure tags (Pro only)	[Intro], [Verse], [Chorus], [Bridge], [Outro]
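The components in the table can be combined into a single prompt string in a fixed order. This is one possible sketch: the function name, argument names, and the comma-separated phrasing are illustrative choices, not a required prompt syntax.

```python
from typing import Optional, Sequence


def compose_prompt(genre: str,
                   instruments: Sequence[str] = (),
                   tempo_bpm: Optional[int] = None,
                   key: Optional[str] = None,
                   mood: Optional[str] = None,
                   production: Optional[str] = None) -> str:
    """Combine the prompt components from the table into one string."""
    parts = [genre]
    parts.extend(instruments)
    if tempo_bpm is not None:
        parts.append(f"{tempo_bpm} BPM")
    if key:
        parts.append(f"key of {key}")
    if mood:
        parts.append(f"{mood} mood")
    if production:
        parts.append(production)
    return ", ".join(parts)


print(compose_prompt(
    "bossa nova",
    instruments=["nylon string guitar", "upright bass"],
    tempo_bpm=80,
    key="D minor",
    mood="dreamy",
    production="analog warmth",
))
# prints: bossa nova, nylon string guitar, upright bass, 80 BPM, key of D minor, dreamy mood, analog warmth
```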

Use Cases

Game background music: Use Lyria 3 Pro for ambient loops and level themes. Structure tags let you compose tracks that build from a quiet intro to an active mid-section, suitable for exploration-to-combat transitions when edited in post.
Short-form video and social content: Lyria 3 Clip delivers a polished 30-second piece quickly, ideal for Reels, TikTok, and YouTube Shorts where a brief, mood-matched soundtrack is needed.
Film and video scoring: Lyria 3 Pro can generate a complete cue for a short scene. Combine image references from the scene with a structured text prompt to align the music to the visual mood.
Advertising and brand content: Generate tracks that match a specific brand tone. Lyria 3 Clip is well-suited for short ad stings and product video soundtracks.
UI and game audio events: Short clips (Lyria 3 Clip or Lyria 2) work well for menu music, achievement sounds, and transition cues where duration is fixed and mood is specific.
Creative prototyping: Use Lyria 3 Clip to quickly test genre and mood combinations before committing to a longer Lyria 3 Pro generation.

Tips for Better Results

Name instruments specifically, not just genres. "Nylon string guitar, upright bass, and brushed snare" produces more accurate results than "jazz." Genre labels set the overall envelope; instrument names define the texture.
Include tempo in BPM for rhythm-driven music. "120 BPM" is more reliable than "fast." The model responds better to concrete tempo values when the rhythmic character of the output matters.
Use structure tags with Pro for arranged songs. Without tags, Lyria 3 Pro generates a coherent but less intentionally structured piece. Adding [Verse], [Chorus], and [Outro] produces tracks that feel composed rather than continuous.
Avoid contradictory style instructions. Combining incompatible elements (e.g. "aggressive thrash metal, calm and relaxing") produces inconsistent results. Choose one dominant mood and build around it.
Use Lyria 2's negativePrompt to remove specific elements. If a prompt consistently adds unwanted content (e.g. drums appear when you want pure piano), set negativePrompt to exclude them explicitly. This is the only Lyria model with this control.
Keep image references thematically consistent. When passing multiple images to Lyria 3, images with similar mood and color palette produce more coherent audio than a mixed set of references.
Simplify before iterating. If a complex prompt produces unexpected results, strip it down to the core genre and mood, confirm the base output is correct, then add detail progressively.
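The "simplify before iterating" tip can be turned into a sequence of prompts that starts from the core genre and mood and adds one detail per step. The helper name is hypothetical, and joining details with commas is an illustrative choice.

```python
from typing import List, Sequence


def progressive_prompts(core: str, details: Sequence[str]) -> List[str]:
    """Return prompts that start from `core` and add one detail at a time."""
    prompts = [core]
    for detail in details:
        prompts.append(f"{prompts[-1]}, {detail}")
    return prompts


for step in progressive_prompts(
    "Jazz, relaxed mood",
    ["nylon string guitar", "upright bass", "brushed snare", "120 BPM"],
):
    print(step)
```

Generating each step in turn shows which added detail caused an unexpected change in the output.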

Known Limitations

Lyria 2 generates instrumental music only. Vocals are not supported regardless of prompt content. Use Lyria 3 Pro if vocal output is required.
Lyria 3 Pro output length is approximate. The model targets up to ~2 minutes but the actual duration varies by prompt and structure. Exact timing cannot be specified.
No negativePrompt on Lyria 3 models. Unlike Lyria 2, neither Lyria 3 Pro nor Lyria 3 Clip supports a negativePrompt parameter. To steer away from unwanted elements, state the exclusion in the main prompt itself (e.g. "no drums" or "fully instrumental").
No seed control on Lyria 3. Lyria 3 Pro and Clip do not expose a seed parameter, so results cannot be made strictly reproducible. Each run with the same prompt may produce a different output.
SynthID watermark is always present. All Lyria outputs embed Google's inaudible SynthID audio watermark. This cannot be disabled.
Genre coverage is uneven. Broadly popular genres (pop, jazz, classical, electronic) produce more reliable results than highly regional or niche styles. Complex fusion prompts may reduce coherence.
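Because negativePrompt exists only on Lyria 2, code that targets several Lyria models needs to route exclusions differently per model. The sketch below assumes a dictionary of inputs with `prompt` and `negativePrompt` keys. The helper name, the "no X" phrasing, and the comma-separated negative prompt are illustrative assumptions.

```python
from typing import Dict, Sequence


def apply_exclusions(model_id: str, prompt: str,
                     exclusions: Sequence[str]) -> Dict[str, str]:
    """Express unwanted elements in the way each Lyria model supports."""
    if not exclusions:
        return {"prompt": prompt}
    if model_id == "model_lyria-2":
        # Lyria 2: use the dedicated negativePrompt parameter.
        return {"prompt": prompt, "negativePrompt": ", ".join(exclusions)}
    # Lyria 3 Pro / Clip: fold the exclusions into the main prompt text.
    avoid = ", ".join(f"no {item}" for item in exclusions)
    return {"prompt": f"{prompt}, {avoid}"}


print(apply_exclusions("model_lyria-2", "Solo piano, 70 BPM", ["drums"]))
print(apply_exclusions("model_lyria-3-pro", "Solo piano, 70 BPM", ["drums"]))
```

In-prompt exclusions on Lyria 3 are a steering hint rather than a hard constraint, so results should still be checked by ear.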