Music inpainting
Music inpainting with the music_v2 model lets you modify specific parts of a song while keeping the rest intact. Store a generated song, then reference its parts in a composition plan to keep them unchanged, regenerate them, or condition new audio on the original.
How it works
A music_v2 composition plan is an ordered list of chunks. Each chunk is one of two types:
- Generation chunk: generates new audio from text and styles. Use it to regenerate a section or add new material.
- Audio reference chunk: inserts a slice of a stored song unchanged. Use it to keep a section of an existing song exactly as it is.
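To make the two chunk types concrete, here is a minimal sketch that models chunks as plain Python dicts. Only `text` and `styles` come from this page. The `type`, `song_id`, `start_ms`, and `end_ms` keys, the helper functions, and the song ID are hypothetical illustrations, not the documented request schema.

```python
from typing import Any

# Hypothetical shape: only "text" and "styles" are documented field names.
Chunk = dict[str, Any]


def generation_chunk(text: str, styles: list[str]) -> Chunk:
    """Build a chunk that generates new audio from text and styles."""
    return {"type": "generation", "text": text, "styles": list(styles)}


def audio_reference_chunk(song_id: str, start_ms: int, end_ms: int) -> Chunk:
    """Build a chunk that inserts a slice of a stored song unchanged."""
    if not 0 <= start_ms < end_ms:
        raise ValueError("slice must satisfy 0 <= start_ms < end_ms")
    return {"type": "audio_reference", "song_id": song_id,
            "start_ms": start_ms, "end_ms": end_ms}


# A plan is an ordered list: keep the first 20 s, then generate new material.
plan: list[Chunk] = [
    audio_reference_chunk("song_123", 0, 20_000),
    generation_chunk("A new bridge with rising energy",
                     ["cinematic", "orchestral"]),
]
```

Order matters: the chunks are rendered in list order, so the kept slice plays first and the generated bridge follows it.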
Quickstart
Conditioning
A generation chunk can be conditioned on a slice of stored audio with conditioning_ref. The model regenerates the chunk while staying close to the reference’s musical characteristics. Control how tightly the chunk follows the reference with condition_strength (low, medium, high, or xhigh).
The first chunk influences the generation of all subsequent chunks. To condition the entire song on a reference, apply conditioning_ref starting from the first chunk.
A conditioning reference can be at most 30 seconds (30,000ms) long.
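The rules above (four strength levels, a 30,000 ms cap on the reference) can be checked client-side before a plan is sent. This is a sketch under assumptions: `conditioning_ref` and `condition_strength` are the documented names, but the inner shape of the reference (`song_id`, `start_ms`, `end_ms`), the `condition_chunk` helper, and the song ID are hypothetical.

```python
from typing import Any

# Strength values and the 30,000 ms cap come from the docs; the
# conditioning_ref contents are a hypothetical illustration.
CONDITION_STRENGTHS = ("low", "medium", "high", "xhigh")
MAX_CONDITIONING_MS = 30_000


def condition_chunk(chunk: dict[str, Any], song_id: str, start_ms: int,
                    end_ms: int, strength: str) -> dict[str, Any]:
    """Return a copy of a generation chunk conditioned on stored audio."""
    if strength not in CONDITION_STRENGTHS:
        raise ValueError(f"condition_strength must be one of {CONDITION_STRENGTHS}")
    if not 0 <= start_ms < end_ms:
        raise ValueError("slice must satisfy 0 <= start_ms < end_ms")
    if end_ms - start_ms > MAX_CONDITIONING_MS:
        raise ValueError("a conditioning reference can be at most 30,000 ms long")
    return {
        **chunk,
        "conditioning_ref": {"song_id": song_id,
                             "start_ms": start_ms, "end_ms": end_ms},
        "condition_strength": strength,
    }


# Condition the first chunk on a 30 s slice so it shapes the whole song.
first = {"type": "generation", "text": "Opening verse", "styles": ["synthwave"]}
conditioned = condition_chunk(first, "song_123", 10_000, 40_000, "high")
```

The helper returns a new dict rather than mutating its input, so the unconditioned chunk can be reused elsewhere in the plan.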
Context adherence
Each generation chunk has a context_adherence level that controls how closely it follows its neighboring chunks:
- high (default): stays consistent with surrounding chunks. Use it for smooth transitions between kept and regenerated audio.
- medium: balances consistency with creative freedom.
- low: lets the chunk deviate from its context and be more creative.
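A small sketch of how the default resolves: a generation chunk that omits `context_adherence` behaves as `high`. The level names and the default come from this page; the dict shape and the `effective_context_adherence` helper are hypothetical.

```python
from typing import Any

# Levels and the "high" default come from the docs; the chunk shape is
# a hypothetical illustration.
CONTEXT_ADHERENCE_LEVELS = ("high", "medium", "low")
DEFAULT_CONTEXT_ADHERENCE = "high"


def effective_context_adherence(chunk: dict[str, Any]) -> str:
    """Return the context_adherence level a generation chunk will use."""
    level = chunk.get("context_adherence", DEFAULT_CONTEXT_ADHERENCE)
    if level not in CONTEXT_ADHERENCE_LEVELS:
        raise ValueError(f"context_adherence must be one of {CONTEXT_ADHERENCE_LEVELS}")
    return level


# Leave the default on a chunk that must blend into kept audio...
bridge = {"type": "generation", "text": "Transition into the chorus",
          "styles": ["pop"]}
# ...and loosen it for a section that should break away from its neighbors.
breakdown = {"type": "generation", "text": "Experimental breakdown",
             "styles": ["glitch"], "context_adherence": "low"}
```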
Examples
Edit a single section
Generate a movie trailer, then regenerate just the outro with different lyrics.
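A plan for this edit might look like the following sketch. It assumes the trailer has already been generated and stored under the hypothetical ID `trailer_123`, and that its outro starts at 45,000 ms. The `type`, `song_id`, `start_ms`, and `end_ms` keys are illustrative, not the documented schema.

```python
from typing import Any

# Hypothetical values: the stored trailer ID and where its outro begins.
SONG_ID = "trailer_123"
OUTRO_START_MS = 45_000

plan: list[dict[str, Any]] = [
    # Keep everything before the outro exactly as it is.
    {"type": "audio_reference", "song_id": SONG_ID,
     "start_ms": 0, "end_ms": OUTRO_START_MS},
    # Regenerate the outro with new lyrics; "high" keeps the seam smooth.
    {"type": "generation",
     "text": "This summer, the legend returns",
     "styles": ["epic", "orchestral", "cinematic", "choir",
                "dramatic percussion", "brass"],
     "context_adherence": "high"},
]
```

Because only the outro is a generation chunk, everything before `OUTRO_START_MS` is reproduced unchanged.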
Extend a song
Add a new intro and outro to an existing song.
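To extend a song, wrap the whole stored song in one audio reference chunk and put generation chunks on either side. This sketch assumes a stored song with the hypothetical ID `song_123` that is 90,000 ms long; the dict keys other than `text` and `styles` are illustrative.

```python
from typing import Any

# Hypothetical values: stored song ID and its length.
SONG_ID = "song_123"
SONG_LENGTH_MS = 90_000
STYLES = ["indie rock", "warm guitars", "driving drums", "male vocals",
          "anthemic", "reverb-heavy"]

plan: list[dict[str, Any]] = [
    # New intro, generated ahead of the original material.
    {"type": "generation", "text": "Instrumental intro that builds slowly",
     "styles": STYLES},
    # The whole original song, inserted unchanged.
    {"type": "audio_reference", "song_id": SONG_ID,
     "start_ms": 0, "end_ms": SONG_LENGTH_MS},
    # New outro that winds the song down.
    {"type": "generation", "text": "Fading instrumental outro",
     "styles": STYLES},
]
```

The new intro is the first chunk, so it gets a full style list: it sets the tone for everything after it.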
Create a seamless loop
Generate a musical phrase, then create a loop by placing a “glue” chunk between two copies of the same slice.
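The loop plan is symmetric: the same slice appears twice, with a generation chunk between the copies. This sketch assumes a stored phrase with the hypothetical ID `phrase_123` and loops its first 8,000 ms; the dict keys other than `text`, `styles`, and `context_adherence` are illustrative.

```python
from typing import Any

# Hypothetical values: stored phrase ID and the slice to loop.
PHRASE_ID = "phrase_123"
LOOP_SLICE = {"type": "audio_reference", "song_id": PHRASE_ID,
              "start_ms": 0, "end_ms": 8_000}

plan: list[dict[str, Any]] = [
    dict(LOOP_SLICE),
    # The glue chunk sits between two copies of the same slice, so it has
    # to lead from the slice's ending back into its beginning.
    {"type": "generation", "text": "Short instrumental transition",
     "styles": ["lo-fi", "mellow keys", "vinyl crackle", "soft drums",
                "jazzy chords", "warm bass"],
     "context_adherence": "high"},
    dict(LOOP_SLICE),
]
```

`dict(LOOP_SLICE)` gives each position its own copy, so editing one reference later does not silently change the other.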
Generate a similar song
Condition a brand-new song on a short slice of an existing one to carry over its musical characteristics. Because the first chunk influences every chunk that follows, applying conditioning_ref to the first chunk shapes the entire generation, even though only that chunk references the stored audio.
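In plan form, only the first chunk carries `conditioning_ref`. This sketch assumes a stored song with the hypothetical ID `song_123` and a 20,000 ms slice starting at 30,000 ms; the reference's inner keys and the `type` key are illustrative, while `conditioning_ref`, `condition_strength`, `text`, and `styles` are documented names.

```python
from typing import Any

# The 30,000 ms cap comes from the docs; the reference shape is hypothetical.
MAX_CONDITIONING_MS = 30_000
ref = {"song_id": "song_123", "start_ms": 30_000, "end_ms": 50_000}
if ref["end_ms"] - ref["start_ms"] > MAX_CONDITIONING_MS:
    raise ValueError("a conditioning reference can be at most 30,000 ms long")

plan: list[dict[str, Any]] = [
    # Only this chunk references stored audio; because the first chunk
    # influences every later chunk, the whole song inherits its character.
    {"type": "generation", "text": "Verse about leaving the city",
     "styles": ["dream pop", "shimmering guitars", "female vocals",
                "slow tempo", "lush reverb", "nostalgic"],
     "conditioning_ref": ref, "condition_strength": "medium"},
    {"type": "generation", "text": "Chorus about finding home",
     "styles": ["dream pop", "anthemic"]},
]
```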
Chunk reference
A music_v2 plan contains up to 30 chunks. Each chunk is either a generation chunk or an audio reference chunk.
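A client-side check for these limits might look like the sketch below. The 30-chunk cap and the two chunk kinds come from this page; the `type` key, its values, and the `validate_plan` helper are hypothetical.

```python
from typing import Any

# Limits from the docs; the "type" key and its values are hypothetical.
MAX_CHUNKS = 30
CHUNK_TYPES = {"generation", "audio_reference"}


def validate_plan(plan: list[dict[str, Any]]) -> None:
    """Raise ValueError if a plan breaks the documented chunk limits."""
    if len(plan) > MAX_CHUNKS:
        raise ValueError(
            f"a music_v2 plan contains at most {MAX_CHUNKS} chunks, got {len(plan)}")
    for i, chunk in enumerate(plan):
        if chunk.get("type") not in CHUNK_TYPES:
            raise ValueError(
                f"chunk {i} must be a generation or audio reference chunk")


validate_plan([{"type": "generation"}] * 30)  # exactly at the limit: passes
```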
Generation chunk
The styles for the first chunk matter most, because they set the overall tone and genre. Aim for at least 6 to 7 styles in early chunks until the direction is established.
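The style guideline can be turned into a quick lint. In this sketch the 6-style threshold comes from the guidance above, but treating the first two chunks as "early" is an assumption (tune it with `early_count`), and the `sparse_early_chunks` helper and dict shape are hypothetical.

```python
from typing import Any

# Threshold from the docs; what counts as "early" is an assumption.
MIN_EARLY_STYLES = 6


def sparse_early_chunks(plan: list[dict[str, Any]],
                        early_count: int = 2) -> list[int]:
    """Return indices of early generation chunks with fewer than 6 styles."""
    return [
        i for i, chunk in enumerate(plan[:early_count])
        if chunk.get("type") == "generation"
        and len(chunk.get("styles", [])) < MIN_EARLY_STYLES
    ]


plan = [
    {"type": "generation", "text": "Intro",
     "styles": ["ambient", "piano", "strings", "slow tempo",
                "cinematic", "melancholic"]},
    {"type": "generation", "text": "Verse", "styles": ["ambient", "piano"]},
]
print(sparse_early_chunks(plan))  # [1]
```

Audio reference chunks are skipped, since they carry no styles of their own.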