Music inpainting | ElevenLabs Documentation

Music inpainting with the music_v2 model lets you modify specific parts of a song while keeping the rest intact. Store a generated song, then reference its parts in a composition plan to keep them unchanged, regenerate them, or condition new audio on the original.

How it works

A music_v2 composition plan is an ordered list of chunks. Each chunk is one of two types:

Generation chunk — generates new audio from text and styles. Use it to regenerate a section or add new material.
Audio reference chunk — inserts a slice of a stored song unchanged. Use it to keep a section of an existing song exactly as it is.
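
The two chunk types can be told apart by the fields they carry. The following is a minimal sketch, assuming the dictionary shapes used in the examples on this page (`song_id` plus `range` for an audio reference chunk, `text` plus `duration_ms` for a generation chunk). `chunk_kind` is a hypothetical helper for illustration, not part of the SDK.

```python
def chunk_kind(chunk: dict) -> str:
    """Classify a plan chunk by its top-level fields (hypothetical helper)."""
    if "song_id" in chunk and "range" in chunk:
        return "audio_reference"
    if "text" in chunk and "duration_ms" in chunk:
        return "generation"
    raise ValueError(f"unrecognized chunk shape: {sorted(chunk)}")


plan = {
    "chunks": [
        # Audio reference chunk: a slice of a stored song, inserted unchanged
        {"song_id": "your_song_id", "range": {"start_ms": 0, "end_ms": 30000}},
        # Generation chunk: new audio from text and styles
        {"text": "[Chorus]", "duration_ms": 30000, "positive_styles": ["anthemic"]},
    ]
}

kinds = [chunk_kind(c) for c in plan["chunks"]]
print(kinds)
```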

Quickstart

Store a song for inpainting

You can store a song for inpainting in two ways: generate a new song with store_for_inpainting, or upload an existing audio file.

Generate and store

import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))

# Generate a song and store it for later inpainting
response = elevenlabs.music.compose_detailed(
    prompt="An upbeat pop song with verse and chorus",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

# Save the audio
with open("original.mp3", "wb") as f:
    f.write(response.audio)

Keep and regenerate chunks

Build a plan that mixes audio reference chunks (kept) with generation chunks (regenerated), then pass it to compose with model_id="music_v2":

# Keep the first 30 seconds, regenerate the rest with a new style
composition_plan = {
    "chunks": [
        # Keep the original first 30 seconds unchanged
        {
            "song_id": song_id,
            "range": {"start_ms": 0, "end_ms": 30000}
        },
        # Regenerate the chorus with a new style
        {
            "text": "[Chorus]\nWe're rising up tonight\nNothing can stop us now",
            "duration_ms": 30000,
            "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(
    composition_plan=composition_plan,
    model_id="music_v2",
)

with open("edited.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
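
The keep-and-regenerate pattern generalizes to any span of the stored song. Here is a rough sketch of a plan builder, assuming the caller already knows the stored song's length in milliseconds. `regenerate_span` and its parameters are hypothetical names, not SDK functions.

```python
def regenerate_span(song_id, song_length_ms, start_ms, end_ms, generation_chunk):
    """Keep everything outside [start_ms, end_ms) and regenerate the span.

    Hypothetical helper: it only builds the plan dictionary and makes no API call.
    """
    if not 0 <= start_ms < end_ms <= song_length_ms:
        raise ValueError("span must lie inside the stored song")
    chunks = []
    if start_ms > 0:
        # Audio reference chunk for the part before the span
        chunks.append({"song_id": song_id, "range": {"start_ms": 0, "end_ms": start_ms}})
    chunks.append(generation_chunk)
    if end_ms < song_length_ms:
        # Audio reference chunk for the part after the span
        chunks.append(
            {"song_id": song_id, "range": {"start_ms": end_ms, "end_ms": song_length_ms}}
        )
    return {"chunks": chunks}


new_chorus = {
    "text": "[Chorus]\nWe're rising up tonight\nNothing can stop us now",
    "duration_ms": 30000,
    "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
    "negative_styles": ["sparse", "minimal"],
    "context_adherence": "high",
}

# Same shape as the plan above: keep 0-30 s, regenerate 30-60 s
plan = regenerate_span("your_song_id", 60000, 30000, 60000, new_chorus)
print(len(plan["chunks"]))
```

The resulting dictionary can be passed as `composition_plan` in the same way as the hand-written plan.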

Conditioning

A generation chunk can be conditioned on a slice of stored audio with conditioning_ref. The model regenerates the chunk while staying close to the reference’s musical characteristics. Control how tightly the chunk follows the reference with condition_strength (low, medium, high, or xhigh).

{
  "text": "[Chorus]\nThis is my moment\nI won't let it go",
  "duration_ms": 15000,
  "positive_styles": ["powerful vocals", "full band", "anthemic"],
  "negative_styles": [],
  "context_adherence": "high",
  "conditioning_ref": {
    "song_id": "vVtPM1Sas70E2LIhQFch",
    "range": { "start_ms": 30000, "end_ms": 45000 }
  },
  "condition_strength": "high"
}

The first chunk influences the generation of all subsequent chunks. To condition the entire song on a reference, apply conditioning_ref starting from the first chunk.

A conditioning reference can be at most 30 seconds (30,000ms) long.
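
Checking the 30-second limit locally, before sending a plan, can catch mistakes early. The sketch below assumes the limit applies to `end_ms - start_ms`. `make_conditioning_ref` is a hypothetical helper, and the API may enforce additional rules not shown here.

```python
MAX_CONDITIONING_REF_MS = 30_000  # limit stated above


def make_conditioning_ref(song_id: str, start_ms: int, end_ms: int) -> dict:
    """Build a conditioning_ref dict, rejecting slices longer than 30 seconds."""
    length_ms = end_ms - start_ms
    if length_ms <= 0:
        raise ValueError("end_ms must be greater than start_ms")
    if length_ms > MAX_CONDITIONING_REF_MS:
        raise ValueError(
            f"conditioning reference is {length_ms} ms; "
            f"maximum is {MAX_CONDITIONING_REF_MS} ms"
        )
    return {"song_id": song_id, "range": {"start_ms": start_ms, "end_ms": end_ms}}


ref = make_conditioning_ref("your_song_id", 30000, 45000)
print(ref)
```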

Context adherence

Each generation chunk has a context_adherence level that controls how closely it follows its neighboring chunks:

high (default) — stays consistent with surrounding chunks. Use it for smooth transitions between kept and regenerated audio.
medium — balances consistency with creative freedom.
low — lets the chunk deviate from its context and be more creative.

Examples

Edit a single section

Generate a movie-trailer-style track, then regenerate only the outro with different lyrics.

Generate the original

composition_plan = {
    "chunks": [
        {
            "text": "[Intro]\nIn a world beyond code\nWhere sound becomes life",
            "duration_ms": 15000,
            "positive_styles": ["cinematic", "epic", "orchestral", "low strings", "suspenseful"],
            "negative_styles": ["acoustic", "pop", "minimalistic"],
            "context_adherence": "high"
        },
        {
            "text": "[Build]\nTechnology awakens the future\nShaping every word into power",
            "duration_ms": 20000,
            "positive_styles": ["rising brass", "full orchestra", "epic"],
            "negative_styles": ["acoustic", "pop"],
            "context_adherence": "high"
        },
        {
            "text": "[Bridge]\n(ah ah ah ah)",
            "duration_ms": 15000,
            "positive_styles": ["ethereal choir", "crescendo"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        {
            "text": "[Outro]\nThe voice of tomorrow, unleashed\nElevenLabs",
            "duration_ms": 10000,
            "positive_styles": ["deep narration", "epic finale"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

response = elevenlabs.music.compose_detailed(
    composition_plan=composition_plan,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

Edit just the outro

Keep the first three sections (seconds 0–50) with a single audio reference chunk, and regenerate the outro:

edited_plan = {
    "chunks": [
        # Keep the intro, build, and bridge unchanged
        {
            "song_id": song_id,
            "range": {"start_ms": 0, "end_ms": 50000}
        },
        # Regenerate the outro with new lyrics
        {
            "text": "[Outro]\nThe future has arrived\nElevenLabs",
            "duration_ms": 10000,
            "positive_styles": ["deep narration", "epic finale"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=edited_plan, model_id="music_v2")
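
The 0 to 50,000 ms range used for the kept sections is the running total of the first three section durations (15,000 + 20,000 + 15,000). A small sketch of that arithmetic follows, assuming the stored song's sections fall exactly on the requested durations. `section_ranges` is a hypothetical helper.

```python
from itertools import accumulate

# Requested durations from the original plan, in order
section_durations_ms = {"Intro": 15000, "Build": 20000, "Bridge": 15000, "Outro": 10000}


def section_ranges(durations: dict) -> dict:
    """Map each section name to its start/end offsets via a cumulative sum."""
    ends = list(accumulate(durations.values()))
    starts = [0] + ends[:-1]
    return {
        name: {"start_ms": start, "end_ms": end}
        for name, start, end in zip(durations, starts, ends)
    }


ranges = section_ranges(section_durations_ms)
print(ranges["Bridge"])  # {'start_ms': 35000, 'end_ms': 50000}
print(ranges["Outro"])   # {'start_ms': 50000, 'end_ms': 60000}
```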

Extend a song

Add a new intro and outro to an existing song.

Generate the original

response = elevenlabs.music.compose_detailed(
    prompt="Berlin night club techno",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

Extend with new intro and outro

Wrap a kept slice of the original between two new generation chunks:

extend_plan = {
    "chunks": [
        # New intro
        {
            "text": "[Intro]",
            "duration_ms": 30000,
            "positive_styles": ["techno", "building tension", "filtered synths"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        # Keep the core of the original (seconds 10-50)
        {
            "song_id": song_id,
            "range": {"start_ms": 10000, "end_ms": 50000}
        },
        # New outro
        {
            "text": "[Outro]",
            "duration_ms": 30000,
            "positive_styles": ["techno", "fading out", "sparse"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=extend_plan, model_id="music_v2")

Create a seamless loop

Generate a short musical phrase, then build a loop by placing a generated “glue” chunk between two copies of the same slice.

Generate a short clip

composition_plan = {
    "chunks": [
        {
            "text": "[Solo Acoustic]",
            "duration_ms": 10000,
            "positive_styles": ["acoustic guitar", "fingerpicking", "warm tone", "soft dynamics"],
            "negative_styles": ["electric", "drums", "electronic"],
            "context_adherence": "high"
        }
    ]
}

response = elevenlabs.music.compose_detailed(
    composition_plan=composition_plan,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

Create a loop with a glue chunk

loop_plan = {
    "chunks": [
        # Loop start - keep a slice of the original
        {
            "song_id": song_id,
            "range": {"start_ms": 3000, "end_ms": 8000}
        },
        # Glue - generate a smooth transition between the two slices
        {
            "text": "[Glue]",
            "duration_ms": 3000,
            "positive_styles": ["acoustic guitar", "fingerpicking", "smooth transition"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        # Loop end - keep the same slice again
        {
            "song_id": song_id,
            "range": {"start_ms": 3000, "end_ms": 8000}
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=loop_plan, model_id="music_v2")
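
To find where each chunk lands in the output, sum the chunk lengths in order. The sketch below assumes the rendered audio matches the requested lengths exactly. `chunk_offsets` is a hypothetical helper. Treating the first slice plus the glue as the repeatable loop unit is one plausible reading of the plan above, not something the API states.

```python
def chunk_length_ms(chunk: dict) -> int:
    """Length of a chunk: the range length for audio references, else duration_ms."""
    if "song_id" in chunk and "range" in chunk:
        return chunk["range"]["end_ms"] - chunk["range"]["start_ms"]
    return chunk["duration_ms"]


def chunk_offsets(plan: dict) -> list:
    """Return (start_ms, end_ms) of each chunk in the composed output."""
    offsets = []
    cursor = 0
    for chunk in plan["chunks"]:
        length = chunk_length_ms(chunk)
        offsets.append((cursor, cursor + length))
        cursor += length
    return offsets


loop_plan = {
    "chunks": [
        {"song_id": "your_song_id", "range": {"start_ms": 3000, "end_ms": 8000}},
        {"text": "[Glue]", "duration_ms": 3000, "positive_styles": ["smooth transition"]},
        {"song_id": "your_song_id", "range": {"start_ms": 3000, "end_ms": 8000}},
    ]
}

offsets = chunk_offsets(loop_plan)
print(offsets)  # [(0, 5000), (5000, 8000), (8000, 13000)]

# Assumption: slice + glue repeats cleanly because the glue leads back into the slice
loop_unit = (offsets[0][0], offsets[1][1])
print(loop_unit)  # (0, 8000)
```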

Generate a similar song

Condition a brand-new song on a short slice of an existing one to carry over its musical characteristics. Because the first chunk influences every chunk that follows, applying conditioning_ref to the first chunk shapes the entire generation, even though only that chunk references the stored audio.

Generate the original

response = elevenlabs.music.compose_detailed(
    prompt="An upbeat pop song with bright synths and driving drums",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

Generate a new song conditioned on the original

similar_plan = {
    "chunks": [
        # The first chunk conditions the whole song on the reference
        {
            "text": "[Verse]\nSalt on my skin from a borrowed sea\nCounting the heartbeats it takes to break free",
            "duration_ms": 30000,
            "positive_styles": ["pop", "energetic", "bright synths", "driving drums"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high",
            "conditioning_ref": {
                "song_id": song_id,
                "range": {"start_ms": 0, "end_ms": 10000}
            },
            "condition_strength": "high"
        },
        {
            "text": "[Chorus]\nWe're rising up tonight\nNothing can stop us now",
            "duration_ms": 30000,
            "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=similar_plan, model_id="music_v2")

Chunk reference

A music_v2 plan contains up to 30 chunks. Each chunk is either a generation chunk or an audio reference chunk.

Generation chunk

Field	Type	Description
`text`	string	Section name in square brackets (`[Verse 1]`), lyrics lines, and inline directions in braces (`{scratching}`).
`duration_ms`	number	Length in milliseconds (3,000 - 120,000).
`positive_styles`	array	Styles and directions to include (max 50).
`negative_styles`	array	Styles and directions to avoid (max 50). Defaults to empty.
`context_adherence`	string	`low`, `medium`, or `high` (default). How closely the chunk follows its surrounding chunks.
`conditioning_ref`	object | null	Optional `{ song_id, range }` slice of stored audio to condition on. Defaults to `null`.
`condition_strength`	string | null	`low`, `medium` (default), `high`, or `xhigh`. How strongly the chunk follows the conditioning reference.

The styles for the first chunk matter most, because they set the overall tone and genre. Aim for at least 6 to 7 styles in each early chunk until the direction is established.

Audio reference chunk

Field	Type	Description
`song_id`	string	ID of the stored song to source audio from.
`range`	object	`{ start_ms, end_ms }` slice of the stored song to insert unchanged.

Constraints

Constraint	Value
Maximum chunks per plan	30
Minimum chunk duration	3 seconds (3,000ms)
Maximum chunk duration	2 minutes (120,000ms)
Maximum conditioning reference	30 seconds (30,000ms)
Minimum time range	50ms
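
These limits can be checked client-side before a plan is submitted. The sketch below is a hedged illustration, not the API's own validation. It assumes the duration limits apply to generation chunks, and that the 50 ms minimum applies to every `range` (both audio reference chunks and conditioning references). `validate_plan` is a hypothetical helper.

```python
MAX_CHUNKS = 30
MIN_DURATION_MS = 3_000
MAX_DURATION_MS = 120_000
MAX_CONDITIONING_MS = 30_000
MIN_RANGE_MS = 50


def _range_length(rng: dict) -> int:
    return rng["end_ms"] - rng["start_ms"]


def validate_plan(plan: dict) -> list:
    """Return a list of human-readable constraint violations (empty if none)."""
    errors = []
    chunks = plan.get("chunks", [])
    if len(chunks) > MAX_CHUNKS:
        errors.append(f"plan has {len(chunks)} chunks; maximum is {MAX_CHUNKS}")
    for i, chunk in enumerate(chunks):
        if "song_id" in chunk:
            # Audio reference chunk: only the range length is checked (assumption)
            if _range_length(chunk["range"]) < MIN_RANGE_MS:
                errors.append(f"chunk {i}: range shorter than {MIN_RANGE_MS} ms")
            continue
        # Generation chunk
        if not MIN_DURATION_MS <= chunk["duration_ms"] <= MAX_DURATION_MS:
            errors.append(f"chunk {i}: duration_ms outside 3,000-120,000")
        ref = chunk.get("conditioning_ref")
        if ref is not None:
            length = _range_length(ref["range"])
            if length < MIN_RANGE_MS:
                errors.append(f"chunk {i}: conditioning range shorter than {MIN_RANGE_MS} ms")
            if length > MAX_CONDITIONING_MS:
                errors.append(f"chunk {i}: conditioning range longer than 30,000 ms")
    return errors


good_plan = {
    "chunks": [
        {"song_id": "your_song_id", "range": {"start_ms": 0, "end_ms": 30000}},
        {"text": "[Chorus]", "duration_ms": 30000, "positive_styles": ["anthemic"]},
    ]
}
bad_plan = {
    "chunks": [
        {
            "text": "[Intro]",
            "duration_ms": 1000,  # below the 3,000 ms minimum
            "positive_styles": ["techno"],
            "conditioning_ref": {
                "song_id": "your_song_id",
                "range": {"start_ms": 0, "end_ms": 40000},  # above the 30,000 ms maximum
            },
        }
    ]
}

print(validate_plan(good_plan))
print(len(validate_plan(bad_plan)))
```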

Next steps

Composition plans

Learn composition plan fundamentals

API reference

Complete API documentation