Composition plans

Precise control over music generation with structured JSON

Composition plans provide fine-grained control over music generation. A music_v2 plan is an ordered list of chunks, where each chunk defines a section of the song with its own styles, lyrics, and duration. Use text prompts for quick prototyping and composition plans when you need specific chunk structure, precise lyrics timing, or complex arrangements.

Composition plans and text prompts are mutually exclusive. Use one or the other, not both.

{
  "chunks": [
    {
      "text": "[Verse 1]\nWoke up today with a feeling inside\nSomething is changing I cannot hide\nThe sun on my face and the wind at my back\nI'm finally ready to get on track",
      "duration_ms": 16000,
      "positive_styles": [
        "upbeat pop",
        "female vocalist with clear tone",
        "acoustic guitar and light synths",
        "gentle and conversational vocals",
        "light drums in background",
        "polished production",
        "120 BPM",
        "C major"
      ],
      "negative_styles": ["dark", "aggressive", "slow tempo", "a cappella"],
      "context_adherence": "high"
    },
    {
      "text": "[Verse 2]\nUsed to be scared of the world outside\nBuilding up walls where I used to hide\nBut now I see clearly what I need to do\nTake that first step into something new",
      "duration_ms": 16000,
      "positive_styles": [
        "confident vocals",
        "fuller guitar strumming",
        "steady drum beat",
        "bass joins in"
      ],
      "negative_styles": ["a cappella", "sparse", "quiet"],
      "context_adherence": "high"
    },
    {
      "text": "[Pre-Chorus]\nNo more waiting for tomorrow\nThis is my time now",
      "duration_ms": 8000,
      "positive_styles": [
        "building intensity",
        "rising synth melody",
        "driving drums",
        "full band playing"
      ],
      "negative_styles": ["a cappella", "dropping out"],
      "context_adherence": "high"
    },
    {
      "text": "[Chorus]\nI'm breaking through\nNothing's gonna stop me now\nI'm breaking through\nFinally found out how",
      "duration_ms": 16000,
      "positive_styles": [
        "powerful and anthemic vocals",
        "full band at maximum energy",
        "punchy drums and bass",
        "layered synths and guitar"
      ],
      "negative_styles": ["a cappella", "minimal", "stripped back"],
      "context_adherence": "high"
    },
    {
      "text": "[Outro]",
      "duration_ms": 8000,
      "positive_styles": [
        "instrumental fade out",
        "guitar melody repeating",
        "drums softening",
        "gentle ending"
      ],
      "negative_styles": ["vocals", "abrupt ending", "building"],
      "context_adherence": "high"
    }
  ]
}

Chunk-based composition plans require the music_v2 model. Pass model_id="music_v2" when composing.
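
The two rules above (one input mode per request, and music_v2 for chunk-based plans) can be enforced before any request is sent. The following is a minimal sketch, not part of the SDK: build_compose_kwargs is a hypothetical helper, and the prompt keyword for text-prompt generation is an assumption that this page does not confirm.

```python
from typing import Any, Optional


def build_compose_kwargs(
    prompt: Optional[str] = None,
    composition_plan: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return keyword arguments for a compose call, enforcing one input mode.

    Hypothetical helper: the `prompt` keyword name is an assumption.
    """
    # Exactly one of the two inputs must be provided, never both or neither.
    if (prompt is None) == (composition_plan is None):
        raise ValueError("Pass exactly one of prompt or composition_plan.")
    if composition_plan is not None:
        # Chunk-based plans require the music_v2 model.
        return {"composition_plan": composition_plan, "model_id": "music_v2"}
    return {"prompt": prompt}
```

The result could then be unpacked into the call, for example elevenlabs.music.compose(**build_compose_kwargs(composition_plan=plan)).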

Quickstart

Follow the music quickstart to set up your API key and install the SDK, then use composition plans for more control.

1. Generate music with a composition plan

import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from a local .env file
load_dotenv()
elevenlabs = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))

composition_plan = {
    "chunks": [
        {
            "text": "[Verse]\nWalking down an empty street\nWondering who I'll meet",
            "duration_ms": 15000,
            "positive_styles": ["pop", "upbeat", "female vocals", "soft vocals", "acoustic guitar"],
            "negative_styles": ["dark", "slow"],
            "context_adherence": "high"
        },
        {
            "text": "[Chorus]\nThis is my moment\nI won't let it go",
            "duration_ms": 15000,
            "positive_styles": ["powerful vocals", "full band"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(
    composition_plan=composition_plan,
    model_id="music_v2",
    # with_timestamps=True,  # Optional: return word-level timestamps
)

# Write the streamed audio to disk piece by piece
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

2. Generate a plan from a prompt

Generate a composition plan from a text description, then modify it before generating:

# Uses the elevenlabs client created in the previous step
plan = elevenlabs.music.composition_plan.create(
    prompt="An upbeat pop song about summer adventures",
    music_length_ms=60000,
    model_id="music_v2"
)

# Modify the generated plan
plan["chunks"][0]["text"] = "[Verse 1]\nCustom lyrics here"

audio = elevenlabs.music.compose(composition_plan=plan, model_id="music_v2")

Structure reference

Chunks

A composition plan is an ordered list of up to 30 chunks. Each chunk generates one section of the song from its text and styles. The first chunk is the most important: its styles set the overall tone and genre for the whole song.

| Field | Type | Description |
| --- | --- | --- |
| text | string | Section name in square brackets ([Verse 1]), lyrics lines, and inline directions in braces ({scratching}). |
| duration_ms | number | Length in milliseconds (3,000 - 120,000). |
| positive_styles | array | Styles and directions to include (max 50). |
| negative_styles | array | Styles and directions to avoid (max 50). Defaults to empty. |
| context_adherence | string | low, medium, or high (default). How closely the chunk follows its surrounding chunks. |

A song can have up to 30 chunks. Total duration must be between 3 seconds and 10 minutes, with each chunk between 3 and 120 seconds.
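
These limits are easy to check locally before sending a plan. The sketch below is a hypothetical validate_plan helper, not an SDK function; it encodes only the limits stated in this section and treats a missing context_adherence as the documented default of high.

```python
from typing import Any

MAX_CHUNKS = 30
MIN_CHUNK_MS, MAX_CHUNK_MS = 3_000, 120_000
MIN_TOTAL_MS, MAX_TOTAL_MS = 3_000, 600_000  # 3 seconds to 10 minutes
MAX_STYLES = 50
ADHERENCE_LEVELS = {"low", "medium", "high"}


def validate_plan(plan: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    chunks = plan.get("chunks", [])
    problems: list[str] = []
    if len(chunks) > MAX_CHUNKS:
        problems.append(f"expected at most {MAX_CHUNKS} chunks, got {len(chunks)}")
    for i, chunk in enumerate(chunks):
        duration = chunk.get("duration_ms", 0)
        if not MIN_CHUNK_MS <= duration <= MAX_CHUNK_MS:
            problems.append(f"chunk {i}: duration_ms {duration} is out of range")
        for field in ("positive_styles", "negative_styles"):
            if len(chunk.get(field, [])) > MAX_STYLES:
                problems.append(f"chunk {i}: {field} has more than {MAX_STYLES} entries")
        if chunk.get("context_adherence", "high") not in ADHERENCE_LEVELS:
            problems.append(f"chunk {i}: context_adherence must be low, medium, or high")
    # The total is checked separately from the per-chunk range.
    total = sum(chunk.get("duration_ms", 0) for chunk in chunks)
    if not MIN_TOTAL_MS <= total <= MAX_TOTAL_MS:
        problems.append(f"total duration {total} ms is out of range")
    return problems
```

A local check like this only catches structural problems; the API remains the authority on whether a plan is accepted.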

Aim for at least six or seven styles in the early chunks, until the direction is established. Generic styles such as “great production quality” are good defaults to append to the list.
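
The style guidance above can be turned into a small pre-flight check. This is a sketch under stated assumptions: pad_styles and is_under_described are hypothetical helpers, and the threshold of six comes from the guideline here, not from an API limit.

```python
from typing import Sequence

MIN_EARLY_STYLES = 6  # guideline from this page, not an enforced limit
DEFAULT_STYLES = ("great production quality",)


def pad_styles(styles: list[str], defaults: Sequence[str] = DEFAULT_STYLES) -> list[str]:
    """Return styles plus any generic defaults not already present (case-insensitive)."""
    seen = {style.lower() for style in styles}
    return styles + [d for d in defaults if d.lower() not in seen]


def is_under_described(styles: list[str], minimum: int = MIN_EARLY_STYLES) -> bool:
    """True when a chunk lists fewer styles than the suggested minimum."""
    return len(styles) < minimum
```

pad_styles returns a new list and leaves its input untouched, so it can be applied to a plan's early chunks without side effects.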

Chunks can also reference audio from a stored song to keep existing sections unchanged or condition new audio on them. See music inpainting for editing and combining existing songs.

Writing lyrics

The text field combines the section name, lyrics, and inline directions:

  • Section name in square brackets: [Verse 1], [Chorus], [Bridge]
  • Lyrics as plain text, with each line separated by a line break (\n)
  • Phonetic sounds in parentheses: (hmmm hmmm), (ooh), (yeah)
  • Inline directions in curly braces: {guitar solo}, {scratching}, {instrumental break}

Use curly braces for short, inline cues. For broader characteristics that apply to the whole chunk — genre, instrumentation, or overall vocal style — use positive_styles instead.

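For example, the following text misuses parentheses for an overall vocal style and for an inline direction:

{
  "text": "[Verse]\n(soft female vocals) I've been waiting\n(instrumental break)\nfor you"
}

Corrected, the overall vocal style moves to positive_styles, while the short inline cue stays in text using curly braces instead of parentheses:

{
  "text": "[Verse]\nI've been waiting\n{instrumental break}\nfor you",
  "positive_styles": ["soft female vocals"]
}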
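
When lyrics come from another system, the text field can be assembled in code following the conventions listed in this section. The helpers below are a hypothetical sketch (build_chunk_text and direction are not SDK functions): the section name goes in square brackets, lines are joined with line breaks, and short cues are wrapped in curly braces.

```python
def direction(cue: str) -> str:
    """Wrap a short inline cue in curly braces, e.g. {guitar solo}."""
    return "{" + cue + "}"


def build_chunk_text(section: str, lines: list[str]) -> str:
    """Join a bracketed section name and its lines with line breaks."""
    return "\n".join([f"[{section}]", *lines])


text = build_chunk_text(
    "Verse",
    ["I've been waiting", direction("instrumental break"), "for you"],
)
```

An instrumental section is simply a section name with no lines, so build_chunk_text("Outro", []) yields "[Outro]".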

Style tips

Be specific with style descriptors:

1{
2 "positive_styles": [
3 "warm acoustic guitar with light fingerpicking",
4 "soft female vocals with intimate delivery",
5 "gentle percussion with brushed snare",
6 "80 BPM"
7 ]
8}

Use negative styles liberally to prevent unwanted sounds. Styles must be in English (lyrics can be any language).

If you include copyrighted content in styles, the API returns a bad_composition_plan error with a suggested alternative. See handling copyrighted material.

Examples

Cinematic instrumental

{
  "chunks": [
    {
      "text": "[Tension Build]",
      "duration_ms": 15000,
      "positive_styles": [
        "cinematic",
        "orchestral",
        "epic",
        "low strings tremolo",
        "building intensity",
        "80 BPM",
        "D minor"
      ],
      "negative_styles": ["vocals", "lyrics", "pop", "electronic", "bright"],
      "context_adherence": "high"
    },
    {
      "text": "[Climax]",
      "duration_ms": 15000,
      "positive_styles": ["full orchestra", "brass fanfare", "triumphant"],
      "negative_styles": ["quiet", "vocals"],
      "context_adherence": "high"
    },
    {
      "text": "[Resolution]",
      "duration_ms": 10000,
      "positive_styles": ["gentle strings", "piano melody", "fading out"],
      "negative_styles": ["intense", "vocals"],
      "context_adherence": "high"
    }
  ]
}

Advertisement with voiceover

{
  "chunks": [
    {
      "text": "[Intro]",
      "duration_ms": 5000,
      "positive_styles": [
        "upbeat",
        "modern pop",
        "energetic",
        "120 BPM",
        "instrumental",
        "catchy hook"
      ],
      "negative_styles": ["sad", "slow", "dark", "vocals"],
      "context_adherence": "high"
    },
    {
      "text": "[Voiceover]\nIntroducing the future of productivity\nWork smarter, not harder",
      "duration_ms": 10000,
      "positive_styles": ["spoken voiceover", "confident male voice", "background music"],
      "negative_styles": ["singing"],
      "context_adherence": "high"
    },
    {
      "text": "[Outro]",
      "duration_ms": 5000,
      "positive_styles": ["musical sting", "memorable"],
      "negative_styles": ["vocals"],
      "context_adherence": "high"
    }
  ]
}
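
Because chunks are ordered and each carries its own duration, the planned start and end of every section can be derived by summing durations. The sketch below uses a hypothetical section_offsets helper and assumes sections play back to back at exactly their requested lengths; the timing of the generated audio may differ.

```python
from itertools import accumulate
from typing import Any


def section_offsets(plan: dict[str, Any]) -> list[tuple[str, int, int]]:
    """Return (section label, start_ms, end_ms) for each chunk, in plan order."""
    chunks = plan["chunks"]
    # Running total of durations gives each chunk's planned end time.
    ends = list(accumulate(chunk["duration_ms"] for chunk in chunks))
    starts = [0] + ends[:-1]
    # The section label is the first line of the text field, e.g. "[Climax]".
    labels = [chunk["text"].split("\n", 1)[0] for chunk in chunks]
    return list(zip(labels, starts, ends))


demo_plan = {
    "chunks": [
        {"text": "[Tension Build]", "duration_ms": 15000},
        {"text": "[Climax]", "duration_ms": 15000},
        {"text": "[Resolution]", "duration_ms": 10000},
    ]
}
offsets = section_offsets(demo_plan)
```

For demo_plan, the last entry ends at 40,000 ms, which is the planned total length of the piece.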

Next steps