Music inpainting

Edit and combine sections of existing songs

Music inpainting with the music_v2 model lets you modify specific parts of a song while keeping the rest intact. Store a generated song, then reference its parts in a composition plan to keep them unchanged, regenerate them, or condition new audio on the original.

How it works

A music_v2 composition plan is an ordered list of chunks. Each chunk is one of two types:

  • Generation chunk — generates new audio from text and styles. Use it to regenerate a section or add new material.
  • Audio reference chunk — inserts a slice of a stored song unchanged. Use it to keep a section of an existing song exactly as it is.
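
The two chunk types above can be sketched as plain dictionaries. This is a minimal illustration, not part of the SDK: the helper names generation_chunk, audio_reference_chunk, and chunk_kind are hypothetical, and the field names are taken from the chunk reference later on this page.

```python
from __future__ import annotations

from typing import Any


def generation_chunk(text: str, duration_ms: int, positive_styles: list[str],
                     negative_styles: list[str] | None = None,
                     context_adherence: str = "high") -> dict[str, Any]:
    """Hypothetical helper: a chunk that generates new audio from text and styles."""
    return {
        "text": text,
        "duration_ms": duration_ms,
        "positive_styles": positive_styles,
        "negative_styles": negative_styles or [],
        "context_adherence": context_adherence,
    }


def audio_reference_chunk(song_id: str, start_ms: int, end_ms: int) -> dict[str, Any]:
    """Hypothetical helper: a chunk that inserts a stored slice unchanged."""
    return {"song_id": song_id, "range": {"start_ms": start_ms, "end_ms": end_ms}}


def chunk_kind(chunk: dict[str, Any]) -> str:
    """Classify a chunk by its top-level keys (a heuristic for this sketch)."""
    if "song_id" in chunk and "range" in chunk:
        return "audio_reference"
    return "generation"


# "stored-song-id" is a placeholder, not a real song ID.
plan = {
    "chunks": [
        audio_reference_chunk("stored-song-id", 0, 30000),
        generation_chunk("[Chorus]", 30000, ["anthemic", "layered vocals"]),
    ]
}
print([chunk_kind(c) for c in plan["chunks"]])
```

The order of the list is the order of the finished song, so keeping or regenerating a section is a matter of which chunk type sits at that position.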

Quickstart

1. Store a song for inpainting

You can store a song for inpainting in two ways: generate a new song with store_for_inpainting, or upload an existing audio file.

import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))

# Generate a song and store it for later inpainting
response = elevenlabs.music.compose_detailed(
    prompt="An upbeat pop song with verse and chorus",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

# Save the audio
with open("original.mp3", "wb") as f:
    f.write(response.audio)

2. Keep and regenerate chunks

Build a plan that mixes audio reference chunks (kept) with generation chunks (regenerated), then pass it to compose with model_id="music_v2":

# Keep the first 30 seconds, regenerate the rest with a new style
composition_plan = {
    "chunks": [
        # Keep the original first 30 seconds unchanged
        {
            "song_id": song_id,
            "range": {"start_ms": 0, "end_ms": 30000}
        },
        # Regenerate the chorus with a new style
        {
            "text": "[Chorus]\nWe're rising up tonight\nNothing can stop us now",
            "duration_ms": 30000,
            "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(
    composition_plan=composition_plan,
    model_id="music_v2",
)

with open("edited.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

Conditioning

A generation chunk can be conditioned on a slice of stored audio with conditioning_ref. The model regenerates the chunk while staying close to the reference’s musical characteristics. Control how tightly the chunk follows the reference with condition_strength (low, medium, high, or xhigh).

{
  "text": "[Chorus]\nThis is my moment\nI won't let it go",
  "duration_ms": 15000,
  "positive_styles": ["powerful vocals", "full band", "anthemic"],
  "negative_styles": [],
  "context_adherence": "high",
  "conditioning_ref": {
    "song_id": "vVtPM1Sas70E2LIhQFch",
    "range": { "start_ms": 30000, "end_ms": 45000 }
  },
  "condition_strength": "high"
}

The first chunk influences the generation of all subsequent chunks. To condition the entire song on a reference, apply conditioning_ref starting from the first chunk.

A conditioning reference can be at most 30 seconds (30,000ms) long.
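
Because the 30-second limit is easy to exceed when slicing a longer song, it can help to check it before sending a plan. The sketch below is one way to do that locally; with_conditioning is a hypothetical helper (not an SDK function), and the default strength of medium follows the chunk reference on this page.

```python
import copy

MAX_CONDITIONING_MS = 30_000  # limit stated above
STRENGTHS = ("low", "medium", "high", "xhigh")


def with_conditioning(chunk, song_id, start_ms, end_ms, strength="medium"):
    """Return a copy of a generation chunk with conditioning attached.

    Hypothetical helper: raises ValueError instead of letting an
    over-long reference reach the API.
    """
    span_ms = end_ms - start_ms
    if span_ms <= 0:
        raise ValueError("end_ms must be greater than start_ms")
    if span_ms > MAX_CONDITIONING_MS:
        raise ValueError(
            f"conditioning reference is {span_ms} ms; maximum is {MAX_CONDITIONING_MS} ms"
        )
    if strength not in STRENGTHS:
        raise ValueError(f"condition_strength must be one of {STRENGTHS}")

    conditioned = copy.deepcopy(chunk)  # leave the caller's chunk untouched
    conditioned["conditioning_ref"] = {
        "song_id": song_id,
        "range": {"start_ms": start_ms, "end_ms": end_ms},
    }
    conditioned["condition_strength"] = strength
    return conditioned


# "stored-song-id" is a placeholder, not a real song ID.
chorus = {"text": "[Chorus]", "duration_ms": 15000, "positive_styles": ["anthemic"]}
conditioned = with_conditioning(chorus, "stored-song-id", 30000, 45000, "high")
print(conditioned["conditioning_ref"]["range"])
```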

Context adherence

Each generation chunk has a context_adherence level that controls how closely it follows its neighboring chunks:

  • high (default) — stays consistent with surrounding chunks. Use it for smooth transitions between kept and regenerated audio.
  • medium — balances consistency with creative freedom.
  • low — lets the chunk deviate from its context and be more creative.

Examples

Edit a single section

Generate a movie-trailer-style track, then regenerate just the outro with different lyrics.

1. Generate the original

composition_plan = {
    "chunks": [
        {
            "text": "[Intro]\nIn a world beyond code\nWhere sound becomes life",
            "duration_ms": 15000,
            "positive_styles": ["cinematic", "epic", "orchestral", "low strings", "suspenseful"],
            "negative_styles": ["acoustic", "pop", "minimalistic"],
            "context_adherence": "high"
        },
        {
            "text": "[Build]\nTechnology awakens the future\nShaping every word into power",
            "duration_ms": 20000,
            "positive_styles": ["rising brass", "full orchestra", "epic"],
            "negative_styles": ["acoustic", "pop"],
            "context_adherence": "high"
        },
        {
            "text": "[Bridge]\n(ah ah ah ah)",
            "duration_ms": 15000,
            "positive_styles": ["ethereal choir", "crescendo"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        {
            "text": "[Outro]\nThe voice of tomorrow, unleashed\nElevenLabs",
            "duration_ms": 10000,
            "positive_styles": ["deep narration", "epic finale"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

response = elevenlabs.music.compose_detailed(
    composition_plan=composition_plan,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

2. Edit just the outro

Keep the first three sections (seconds 0–50) with a single audio reference chunk, and regenerate the outro:

edited_plan = {
    "chunks": [
        # Keep the intro, build, and bridge unchanged
        {
            "song_id": song_id,
            "range": {"start_ms": 0, "end_ms": 50000}
        },
        # Regenerate the outro with new lyrics
        {
            "text": "[Outro]\nThe future has arrived\nElevenLabs",
            "duration_ms": 10000,
            "positive_styles": ["deep narration", "epic finale"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=edited_plan, model_id="music_v2")
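
When a plan mixes kept slices and regenerated sections, it is useful to know where each chunk lands in the result. The sketch below adds up chunk lengths for a plan shaped like the one above. It assumes the output length is simply the sum of the chunk lengths, which this page does not state explicitly; chunk_length_ms and chunk_offsets are hypothetical helper names.

```python
def chunk_length_ms(chunk):
    """Length of one chunk: range width for a kept slice, duration_ms otherwise."""
    if "song_id" in chunk:
        return chunk["range"]["end_ms"] - chunk["range"]["start_ms"]
    return chunk["duration_ms"]


def chunk_offsets(plan):
    """Return (start_ms, end_ms) of every chunk on the output timeline.

    Assumption: chunks are laid end to end with no overlap or padding.
    """
    offsets = []
    cursor = 0
    for chunk in plan["chunks"]:
        length = chunk_length_ms(chunk)
        offsets.append((cursor, cursor + length))
        cursor += length
    return offsets


# Same shape as the edited plan above; "stored-song-id" is a placeholder.
example_plan = {
    "chunks": [
        {"song_id": "stored-song-id", "range": {"start_ms": 0, "end_ms": 50000}},
        {"text": "[Outro]", "duration_ms": 10000, "positive_styles": ["epic finale"]},
    ]
}

offsets = chunk_offsets(example_plan)
total_ms = offsets[-1][1]
print(offsets)   # [(0, 50000), (50000, 60000)]
print(total_ms)  # 60000
```

Under that assumption, the edited trailer keeps the original 60-second length: 50 seconds of kept audio followed by a 10-second regenerated outro.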

Extend a song

Add a new intro and outro to an existing song.

1. Generate the original

response = elevenlabs.music.compose_detailed(
    prompt="Berlin night club techno",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

2. Extend with new intro and outro

Wrap a kept slice of the original between two new generation chunks:

extend_plan = {
    "chunks": [
        # New intro
        {
            "text": "[Intro]",
            "duration_ms": 30000,
            "positive_styles": ["techno", "building tension", "filtered synths"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        # Keep the core of the original (seconds 10-50)
        {
            "song_id": song_id,
            "range": {"start_ms": 10000, "end_ms": 50000}
        },
        # New outro
        {
            "text": "[Outro]",
            "duration_ms": 30000,
            "positive_styles": ["techno", "fading out", "sparse"],
            "negative_styles": [],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=extend_plan, model_id="music_v2")

Create a seamless loop

Generate a short musical phrase, then build a loop by placing the same slice twice with a generated “glue” chunk between the two copies.

1. Generate a short clip

composition_plan = {
    "chunks": [
        {
            "text": "[Solo Acoustic]",
            "duration_ms": 10000,
            "positive_styles": ["acoustic guitar", "fingerpicking", "warm tone", "soft dynamics"],
            "negative_styles": ["electric", "drums", "electronic"],
            "context_adherence": "high"
        }
    ]
}

response = elevenlabs.music.compose_detailed(
    composition_plan=composition_plan,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

2. Create a loop with a glue chunk

loop_plan = {
    "chunks": [
        # Loop start - keep a slice of the original
        {
            "song_id": song_id,
            "range": {"start_ms": 3000, "end_ms": 8000}
        },
        # Glue - generate a smooth transition between the two slices
        {
            "text": "[Glue]",
            "duration_ms": 3000,
            "positive_styles": ["acoustic guitar", "fingerpicking", "smooth transition"],
            "negative_styles": [],
            "context_adherence": "high"
        },
        # Loop end - keep the same slice again
        {
            "song_id": song_id,
            "range": {"start_ms": 3000, "end_ms": 8000}
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=loop_plan, model_id="music_v2")
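
The slice, glue, slice pattern above can be generated for more than two repetitions. The following is a sketch only: build_loop_plan is a hypothetical helper, and whether longer chains loop as cleanly as the two-copy plan is not something this page confirms. It enforces the 30-chunk plan limit from the chunk reference.

```python
import copy

MAX_CHUNKS = 30  # plan limit from the chunk reference


def build_loop_plan(song_id, start_ms, end_ms, repeats=2, glue_ms=3000, glue_styles=None):
    """Build a plan of `repeats` copies of one slice joined by glue chunks.

    Hypothetical helper: produces 2 * repeats - 1 chunks in total.
    """
    if repeats < 2:
        raise ValueError("a loop needs at least two copies of the slice")

    kept = {"song_id": song_id, "range": {"start_ms": start_ms, "end_ms": end_ms}}
    glue = {
        "text": "[Glue]",
        "duration_ms": glue_ms,
        "positive_styles": list(glue_styles or ["smooth transition"]),
        "negative_styles": [],
        "context_adherence": "high",
    }

    chunks = [copy.deepcopy(kept)]
    for _ in range(repeats - 1):
        chunks.append(copy.deepcopy(glue))  # transition into the next copy
        chunks.append(copy.deepcopy(kept))  # the same slice again

    if len(chunks) > MAX_CHUNKS:
        raise ValueError(f"{len(chunks)} chunks exceeds the limit of {MAX_CHUNKS}")
    return {"chunks": chunks}


# "stored-song-id" is a placeholder, not a real song ID.
plan = build_loop_plan("stored-song-id", 3000, 8000, repeats=2,
                       glue_styles=["acoustic guitar", "fingerpicking", "smooth transition"])
print(len(plan["chunks"]))  # 3
```

With repeats=2 this reproduces the three-chunk plan shown above; the limit of 30 chunks caps the pattern at 15 copies of the slice.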

Generate a similar song

Condition a brand-new song on a short slice of an existing one to carry over its musical characteristics. Because the first chunk influences every chunk that follows, applying conditioning_ref to the first chunk shapes the entire generation, even though only that chunk references the stored audio.

1. Generate the original

response = elevenlabs.music.compose_detailed(
    prompt="An upbeat pop song with bright synths and driving drums",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True
)
song_id = response.song_id

2. Generate a new song conditioned on the original

similar_plan = {
    "chunks": [
        # The first chunk conditions the whole song on the reference
        {
            "text": "[Verse]\nSalt on my skin from a borrowed sea\nCounting the heartbeats it takes to break free",
            "duration_ms": 30000,
            "positive_styles": ["pop", "energetic", "bright synths", "driving drums"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high",
            "conditioning_ref": {
                "song_id": song_id,
                "range": {"start_ms": 0, "end_ms": 10000}
            },
            "condition_strength": "high"
        },
        {
            "text": "[Chorus]\nWe're rising up tonight\nNothing can stop us now",
            "duration_ms": 30000,
            "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
            "negative_styles": ["sparse", "minimal"],
            "context_adherence": "high"
        }
    ]
}

audio = elevenlabs.music.compose(composition_plan=similar_plan, model_id="music_v2")

Chunk reference

A music_v2 plan contains up to 30 chunks. Each chunk is either a generation chunk or an audio reference chunk.

Generation chunk

| Field | Type | Description |
| --- | --- | --- |
| text | string | Section name in square brackets ([Verse 1]), lyrics lines, and inline directions in braces ({scratching}). |
| duration_ms | number | Length in milliseconds (3,000 - 120,000). |
| positive_styles | array | Styles and directions to include (max 50). |
| negative_styles | array | Styles and directions to avoid (max 50). Defaults to empty. |
| context_adherence | string | low, medium, or high (default). How closely the chunk follows its surrounding chunks. |
| conditioning_ref | object or null | Optional { song_id, range } slice of stored audio to condition on. Defaults to null. |
| condition_strength | string or null | low, medium (default), high, or xhigh. How strongly the chunk follows the conditioning reference. |

The styles for the first chunk are the most important as they set the overall tone and genre. Aim for at least 6-7 styles in early chunks until the direction is established.

Audio reference chunk

| Field | Type | Description |
| --- | --- | --- |
| song_id | string | ID of the stored song to source audio from. |
| range | object | { start_ms, end_ms } slice of the stored song to insert unchanged. |

Constraints

| Constraint | Value |
| --- | --- |
| Maximum chunks per plan | 30 |
| Minimum chunk duration | 3 seconds (3,000ms) |
| Maximum chunk duration | 2 minutes (120,000ms) |
| Maximum conditioning reference | 30 seconds (30,000ms) |
| Minimum time range | 50ms |
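
These limits can be checked locally before a plan is submitted. The sketch below is one possible pre-flight check, not an SDK feature: validate_plan is a hypothetical name, and two readings are assumptions made here. The duration bounds are applied to the duration_ms of generation chunks only, and the 50 ms minimum is applied to every range object.

```python
MAX_CHUNKS = 30
MIN_DURATION_MS = 3_000
MAX_DURATION_MS = 120_000
MAX_CONDITIONING_MS = 30_000
MIN_RANGE_MS = 50


def _range_length(range_obj):
    return range_obj["end_ms"] - range_obj["start_ms"]


def validate_plan(plan):
    """Return a list of human-readable violations; an empty list means none found."""
    errors = []
    chunks = plan.get("chunks", [])

    if len(chunks) > MAX_CHUNKS:
        errors.append(f"plan has {len(chunks)} chunks; maximum is {MAX_CHUNKS}")

    for index, chunk in enumerate(chunks):
        if "song_id" in chunk:  # audio reference chunk
            if _range_length(chunk["range"]) < MIN_RANGE_MS:
                errors.append(f"chunk {index}: range is shorter than {MIN_RANGE_MS} ms")
            continue

        # Generation chunk: check duration, then the optional conditioning reference.
        duration = chunk.get("duration_ms")
        if duration is None or not (MIN_DURATION_MS <= duration <= MAX_DURATION_MS):
            errors.append(
                f"chunk {index}: duration_ms must be between "
                f"{MIN_DURATION_MS} and {MAX_DURATION_MS}"
            )

        ref = chunk.get("conditioning_ref")
        if ref is not None:
            length = _range_length(ref["range"])
            if length < MIN_RANGE_MS:
                errors.append(f"chunk {index}: conditioning range is shorter than {MIN_RANGE_MS} ms")
            if length > MAX_CONDITIONING_MS:
                errors.append(f"chunk {index}: conditioning range exceeds {MAX_CONDITIONING_MS} ms")

    return errors


# "stored-song-id" is a placeholder, not a real song ID.
plan = {
    "chunks": [
        {"song_id": "stored-song-id", "range": {"start_ms": 0, "end_ms": 50000}},
        {"text": "[Outro]", "duration_ms": 10000, "positive_styles": ["epic finale"]},
    ]
}
print(validate_plan(plan))  # []
```

Returning a list instead of raising on the first problem lets a caller report every violation in a plan at once.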
