ElevenLabs Music: The Essentials
Last updated: April 20, 2026
Covers ElevenLabs Music (model_elevenlabs-music) and ElevenLabs Music Advanced (model_elevenlabs-music-advanced)

The ElevenLabs Music family generates original music from text descriptions. The standard model takes a single prompt and delivers a complete track in seconds. The Advanced model lets you define the song's internal structure section by section, giving you precise control over how a composition builds and changes over time. Both models output studio-ready audio in MP3 or Opus format.
Which Model Should I Use?
| Model | ID | Input | Best for |
| --- | --- | --- | --- |
| ElevenLabs Music | model_elevenlabs-music | Simple: text prompt, duration, vocal/instrumental toggle | Quick tracks, background music, game audio, rapid iteration |
| ElevenLabs Music Advanced | model_elevenlabs-music-advanced | Structured: global styles, per-section styles and narrative lines, up to 20 sections | Full songs with distinct acts, cinematic scores, tracks where the mood must shift at a specific moment |
Use ElevenLabs Music when you need a track fast and the overall feel is what matters. Switch to ElevenLabs Music Advanced when the song needs to tell a story or when you require deliberate transitions between energy levels, moods, or musical phases.
Parameters
ElevenLabs Music
The only required input is the Prompt, a text description of the music you want, up to 2048 characters. Describe the genre, instruments, mood, tempo, and any other qualities that matter. The more specific the prompt, the closer the result will be to your intent.
Duration sets how long the track should be, from 3 seconds to 3 minutes. The default is 30 seconds. Longer tracks give the model room to develop themes and create a proper beginning, middle, and end. Cost scales with duration.
Enable Instrumental Mode to remove all vocals from the output. This is the recommended approach for background music, game audio, and any context where lyrics would be disruptive. Even when your prompt does not mention vocals, the model may add them if the genre typically includes singing, so this toggle is the only reliable way to guarantee an instrumental result.
Output Format controls the codec, sample rate, and bitrate of the file. The default is MP3 at 44.1 kHz and 128 kbps, which is a good balance of quality and size. See the format table below for all available options.
Seed is optional. Set it to any integer to lock the random seed. The same prompt and seed will produce the same output every time, which is useful when iterating on a result you want to refine.
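As an illustration, the parameters above can be assembled into a request body before submission. This is a sketch, not the actual Scenario client: the camelCase field names (musicLengthMs, forceInstrumental, outputFormat, seed) follow the parameter names used later in this guide, and the validation limits (2048-character prompt, 3-second to 3-minute duration) come from this section.

```python
def build_music_payload(prompt, duration_s=30, instrumental=False,
                        output_format="mp3_44100_128", seed=None):
    """Assemble and validate a standard-model request body (illustrative)."""
    if not prompt or len(prompt) > 2048:
        raise ValueError("prompt must be 1-2048 characters")
    if not 3 <= duration_s <= 180:
        raise ValueError("duration must be between 3 seconds and 3 minutes")
    payload = {
        "prompt": prompt,
        "musicLengthMs": duration_s * 1000,   # duration is passed in milliseconds
        "forceInstrumental": instrumental,    # only reliable way to avoid vocals
        "outputFormat": output_format,
    }
    if seed is not None:
        payload["seed"] = int(seed)           # same prompt + seed => same output
    return payload

payload = build_music_payload(
    "Retro 80s synthwave with pulsing arpeggios",
    duration_s=45, instrumental=True,
    output_format="mp3_44100_192", seed=1234)
```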
ElevenLabs Music Advanced
The Advanced model requires two inputs: Global Styles and Sections.
Global Styles is a list of style tags that apply to the entire song, with a maximum of 10 tags. The default is "pop". Use tags to define the overall genre, tempo, instrumentation, and mood of the track. Pair it with Excluded Styles to steer the model away from unwanted elements that might otherwise bleed into the output.
Sections is an ordered list of song phases, with a maximum of 20. Each section has:
Section Name (required): a label such as "intro", "verse", "chorus", "bridge", or "outro".
Duration (required): the target length for this section in milliseconds. The model uses this as a guide and may vary the actual length slightly at transitions.
Section Styles (optional): style tags for this section only, such as "quiet", "building tension", or "full band". These refine or shift the global styles for this segment without affecting the rest of the song.
Excluded Section Styles (optional): style tags to suppress in this section only.
Lyrics (optional): the lyrics for this section. Each entry in the list is one line of singable text. Leave this empty for instrumental sections like intros and outros.
Output Format and Seed work the same as in ElevenLabs Music.
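The section rules above can be captured in a small pre-flight check. A hedged sketch: the field names (sectionName, durationMs, positiveLocalStyles, lines) mirror the example later in this guide, and the limits (10 global style tags, 20 sections, name and duration required per section) come from this section.

```python
def validate_advanced_inputs(global_styles, sections):
    """Check Advanced-model limits described above (illustrative helper)."""
    if not 1 <= len(global_styles) <= 10:
        raise ValueError("globalStyles: between 1 and 10 tags")
    if not 1 <= len(sections) <= 20:
        raise ValueError("sections: between 1 and 20 entries")
    for s in sections:
        if "sectionName" not in s or "durationMs" not in s:
            raise ValueError("each section needs sectionName and durationMs")
    return True

sections = [
    # Instrumental intro: no "lines" entry at all
    {"sectionName": "intro", "durationMs": 10000},
    # Sung verse: one string per singable line
    {"sectionName": "verse", "durationMs": 20000,
     "positiveLocalStyles": ["quiet"], "lines": ["First line of lyrics"]},
]
ok = validate_advanced_inputs(["pop"], sections)
```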
Output Format Options
The default is MP3 at 44.1 kHz and 128 kbps, a solid balance between file size and quality. For final deliverables, use MP3 at 192 kbps or Opus at 48 kHz / 192 kbps for the highest available quality, best for archiving and professional use. For the smallest possible file, the 22 kHz / 32 kbps MP3 option works but at a noticeable quality cost. Additional MP3 and Opus variants at lower bitrates are available for bandwidth-constrained use cases.
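The format identifiers used in this guide (for example mp3_44100_192 and opus_48000_192) appear to encode codec, sample rate, and bitrate in that order. Assuming the naming is consistent, a small helper can decode them for display or logging:

```python
def parse_output_format(name):
    """Split a format identifier such as 'mp3_44100_192' into its parts.

    Assumes the codec_samplerate_bitrate naming seen in this guide.
    """
    codec, rate, bitrate = name.split("_")
    return {
        "codec": codec,
        "sample_rate_hz": int(rate),
        "bitrate_kbps": int(bitrate),
    }

info = parse_output_format("opus_48000_192")  # highest-quality option
```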
How ElevenLabs Music Works
ElevenLabs Music takes your text description and generates a complete audio track from scratch. The model interprets genre, instrumentation, mood, tempo, and any other qualities you describe, combining them into a coherent musical output. The more specific your prompt, the more accurately the result matches your intent.
Duration is controlled with musicLengthMs. Shorter tracks (under 30 seconds) tend to be complete musical ideas. Longer tracks (60 to 180 seconds) give the model room to introduce variation, develop themes, and provide a proper beginning, middle, and end. CU cost scales with duration at roughly 2.5 CU per second of output.
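At the quoted rate of roughly 2.5 CU per second, cost can be estimated directly from musicLengthMs. The rate is approximate, so treat the result as a budgeting estimate rather than an exact charge:

```python
CU_PER_SECOND = 2.5  # approximate rate quoted above

def estimate_cu(music_length_ms):
    """Estimate CU cost from the requested duration in milliseconds."""
    return music_length_ms / 1000 * CU_PER_SECOND

cost_default = estimate_cu(30000)    # default 30-second track
cost_maximum = estimate_cu(180000)   # maximum 3-minute track
```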
The forceInstrumental flag removes all vocals from the output. It works reliably and is the recommended approach for any background music, game audio, or ambient use case where lyrics would be intrusive.
Prompt example (instrumental):

```
prompt: "Retro 80s synthwave with pulsing arpeggios, gated reverb drums, neon-soaked atmosphere"
musicLengthMs: 45000
forceInstrumental: true
outputFormat: "mp3_44100_192"
```
How ElevenLabs Music Advanced Works
ElevenLabs Music Advanced structures a song as an ordered list of named sections. Each section has its own duration, local style tags, and narrative description. The model reads all sections together and composes a track where each phase transitions naturally into the next, while staying consistent with the global style tags applied to the whole song.
Global styles set the sonic identity for the entire track. Local styles refine or shift the character within a single section. This layered approach lets you build a song that starts quietly and ends with a full-band climax, or a cinematic score that moves through tension, action, and resolution without sounding like three separate tracks edited together.
The lines field inside each section is where you provide the actual lyrics for that part of the song. Each string in the array is one line of singable text. If you leave lines empty for a section, the model will generate vocals without fixed lyrics or produce an instrumental passage depending on the style tags.
Sections example (film score):

```
positiveGlobalStyles: ["cinematic", "orchestral", "epic"]
negativeGlobalStyles: ["electronic"]
sections: [
  {
    sectionName: "tension buildup",
    durationMs: 15000,
    positiveLocalStyles: ["suspenseful", "low strings", "mounting dread"],
    lines: ["Slow string ostinato builds with rising dissonance"]
  },
  {
    sectionName: "battle",
    durationMs: 25000,
    positiveLocalStyles: ["explosive", "full orchestra", "relentless"],
    lines: ["Full orchestra erupts into fierce battle music with pounding drums"]
  },
  {
    sectionName: "resolution",
    durationMs: 10000,
    positiveLocalStyles: ["victorious", "warm", "resolving"],
    lines: ["Triumphant resolution with warm brass and final chord"]
  }
]
```

Use Cases
Game audio: Generate background music for menus, levels, and cutscenes. Use forceInstrumental: true and match the genre to the game's visual tone. Use the Advanced model to create tracks that shift from ambient exploration to intense action within a single asset.
Video and film scoring: Score trailers, short films, and video ads. The Advanced model excels here, letting you sync musical sections to visual beats (quiet opening, rising action, climax, resolution) without needing a digital audio workstation.
Marketing and social content: Create branded background tracks that match campaign visuals. Quick iteration with the standard model allows testing different genres and moods before committing to a final direction.
Podcast and video production: Generate intros, outros, and transition stings. Short tracks (3 to 10 seconds) work well as bumpers. Longer tracks (60 to 90 seconds) serve as episode background music.
Prototyping and creative exploration: Use the standard model to explore a large space of genres and styles quickly. Once a direction is confirmed, switch to the Advanced model to build a polished, structured version.
Education and e-learning: Generate calm, non-distracting instrumental backgrounds for explainer videos, course content, and presentations. A 90 to 120-second loop at low bitrate keeps file sizes manageable.
Tips for Better Results
Be specific about instruments. Prompts that name specific instruments ("nylon string guitar", "upright bass", "brushed snare") produce more accurate results than genre labels alone. Genre tags set the overall feel, but instrument names define the texture.
Always set forceInstrumental for background tracks. Even when your prompt does not mention vocals, the model may add them if the described genre typically includes singing. Setting forceInstrumental: true is the only reliable way to guarantee an instrumental output.
Use seed for reproducibility. If you generate a result you want to iterate on, note the seed value and reuse it with modified parameters. Changing only the duration or format with the same seed gives you a consistent base to work from.
For the Advanced model, keep local styles complementary to global ones. Local style tags work best when they refine, rather than contradict, the global styles. If your global style is "orchestral", a local tag of "electronic drop" will produce conflicting output. Use local tags to shift intensity or instrumentation within the same sonic universe.
Use the lines field for actual lyrics. In the Advanced model, the lines array inside each section is the lyrics field. Each string is one line of singable text the model will use as vocal content. Leave it empty on instrumental sections such as intros and outros.
Scale duration to the number of sections. A two-section song with each section at 60 seconds gives the model enough time to develop each phase. Very short sections (under 5 seconds) may not render distinctly. Aim for at least 8 to 10 seconds per section for audible transitions.
Use opus_48000_192 for final deliverables. The default MP3 format is fine for drafts and iteration. For final assets going into productions, switch to opus_48000_192 for the highest available quality. The file size increase is modest compared to the quality gain.
Known Limitations
Maximum duration is 3 minutes (180 seconds). The Scenario implementation caps musicLengthMs at 180,000 ms. The ElevenLabs provider API supports up to 600 seconds, but this extended range is not available through Scenario.
Section durations are approximate. The durationMs value in each section of the Advanced model is a guide, not a precise boundary. The model may extend or shorten a section slightly to produce a natural transition. Hard time-sync against external video requires post-production trimming.
No PCM output format. The ElevenLabs API supports PCM audio output (raw waveform), but this format is not exposed in the Scenario implementation. Use opus_48000_192 as the highest-quality alternative.
Music Finetunes not supported. The ElevenLabs API includes the ability to fine-tune music generation on custom audio. This feature is not available through Scenario.
No store-for-inpainting option. The provider API supports a flag to store audio for future inpainting (editing a specific segment of a generated track). This parameter is not exposed in Scenario.
Vocal artifacts despite forceInstrumental. In rare cases where the global or local style strongly implies a vocal genre, faint vocal artifacts may appear even with forceInstrumental: true. Re-generating with a more strongly instrumental description (e.g. adding "no vocals" or "fully instrumental" to the prompt or style tags) reduces the occurrence.
CU cost scales with total audio duration. For the Advanced model, CU cost is determined by the sum of all section durations. A five-section track totaling 90 seconds costs the same as a single 90-second track in the standard model. Budget accordingly when planning large batches.
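Since Advanced-model cost is the sum of all section durations, a batch budget can be estimated with the same approximate 2.5 CU per second rate quoted for the standard model. A sketch, using the durations from the film-score example earlier in this guide:

```python
CU_PER_SECOND = 2.5  # approximate rate quoted in the standard-model section

def advanced_cu(sections):
    """Estimate Advanced-model CU cost: sum of all section durations."""
    total_ms = sum(s["durationMs"] for s in sections)
    return total_ms / 1000 * CU_PER_SECOND

# Film-score example: 15 s + 25 s + 10 s = 50 s total
film_score = [{"durationMs": 15000}, {"durationMs": 25000}, {"durationMs": 10000}]
cost = advanced_cu(film_score)  # same cost as a single 50-second standard track
```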