Composition plans
Composition plans provide fine-grained control over music generation. A music_v2 plan is an ordered list of chunks, where each chunk defines a section of the song with its own styles, lyrics, and duration. Use text prompts for quick prototyping and composition plans when you need specific chunk structure, precise lyrics timing, or complex arrangements.
Composition plans and text prompts are mutually exclusive. Use one or the other, not both.
Chunk-based composition plans require the music_v2 model. Pass model_id="music_v2" when composing.
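The two rules above (a prompt or a plan but never both, and music_v2 for chunk-based plans) can be checked locally before a request is sent. The following is a minimal sketch that builds a request body as a plain Python dict. The helper build_compose_request and the field names prompt and composition_plan are assumptions for illustration, not the SDK's confirmed API.

```python
from typing import Any, Optional


def build_compose_request(
    prompt: Optional[str] = None,
    composition_plan: Optional[dict[str, Any]] = None,
    model_id: str = "music_v2",
) -> dict[str, Any]:
    """Build a hypothetical request body carrying either a prompt or a plan."""
    # Exactly one of prompt / composition_plan must be given.
    if (prompt is None) == (composition_plan is None):
        raise ValueError("Pass either prompt or composition_plan, not both and not neither.")
    # Chunk-based plans only work with the music_v2 model.
    if composition_plan is not None and model_id != "music_v2":
        raise ValueError('Chunk-based composition plans require model_id="music_v2".')
    body: dict[str, Any] = {"model_id": model_id}
    if prompt is not None:
        body["prompt"] = prompt
    else:
        body["composition_plan"] = composition_plan
    return body
```

Failing early like this surfaces the mistake in your own code rather than as an API error.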
Quickstart
Follow the music quickstart to set up your API key and install the SDK, then use composition plans for more control.
Structure reference
Chunks
A composition plan is an ordered list of up to 30 chunks. Each chunk generates one section of the song from its text and styles. Total duration must be between 3 seconds and 10 minutes, with each chunk between 3 and 120 seconds.
The styles of the first chunk are the most important because they set the overall tone and genre for the whole song. Aim for at least 6-7 styles in early chunks until the direction is established. Generic styles like “great production quality” are good defaults to append to the list.
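The limits above can be checked locally before calling the API. This sketch assumes each chunk is a dict with a hypothetical duration_ms field (milliseconds) and a positive_styles list; the real field names may differ. The first-chunk style count mirrors the 6-7 style recommendation and is reported as advice, not as a hard limit.

```python
from typing import Any

MAX_CHUNKS = 30
MIN_CHUNK_MS, MAX_CHUNK_MS = 3_000, 120_000
MIN_TOTAL_MS, MAX_TOTAL_MS = 3_000, 10 * 60 * 1_000
RECOMMENDED_FIRST_STYLES = 6


def validate_chunks(chunks: list[dict[str, Any]]) -> list[str]:
    """Return every problem found, so all of them can be reported at once."""
    problems: list[str] = []
    if not chunks:
        problems.append("plan has no chunks")
    if len(chunks) > MAX_CHUNKS:
        problems.append(f"plan has {len(chunks)} chunks; the limit is {MAX_CHUNKS}")
    total_ms = 0
    for i, chunk in enumerate(chunks):
        ms = chunk["duration_ms"]  # hypothetical field name
        total_ms += ms
        if not MIN_CHUNK_MS <= ms <= MAX_CHUNK_MS:
            problems.append(f"chunk {i} lasts {ms} ms; allowed range is 3-120 s")
    if chunks and not MIN_TOTAL_MS <= total_ms <= MAX_TOTAL_MS:
        problems.append(f"total duration {total_ms} ms is outside 3 s to 10 min")
    # Advice only: the first chunk sets the tone, so give it enough styles.
    if chunks and len(chunks[0].get("positive_styles", [])) < RECOMMENDED_FIRST_STYLES:
        problems.append("first chunk has fewer than 6 positive styles (recommendation)")
    return problems
```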
Chunks can also reference audio from a stored song to keep existing sections unchanged or condition new audio on them. See music inpainting for editing and combining existing songs.
Writing lyrics
The text field combines the section name, lyrics, phonetic sounds, and inline directions:
- Section name in square brackets: [Verse 1], [Chorus], [Bridge]
- Lyrics as plain text, with each line separated by a line break (\n)
- Phonetic sounds in parentheses: (hmmm hmmm), (ooh), (yeah)
- Inline directions in curly braces: {guitar solo}, {scratching}, {instrumental break}
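The bracket conventions above are easy to get wrong by hand, so small helpers can assemble the text field for you. This is a minimal sketch: the helper names are hypothetical, and the lyrics are made up for illustration.

```python
def section(name: str) -> str:
    """Section name in square brackets, e.g. [Chorus]."""
    return f"[{name}]"


def phonetic(sound: str) -> str:
    """Phonetic sound in parentheses, e.g. (ooh)."""
    return f"({sound})"


def direction(cue: str) -> str:
    """Inline direction in curly braces, e.g. {guitar solo}."""
    return f"{{{cue}}}"


def chunk_text(*lines: str) -> str:
    """Join the parts with line breaks, one lyric line per line."""
    return "\n".join(lines)


text = chunk_text(
    section("Chorus"),
    "We light the city up tonight",
    "Nothing's gonna slow us down " + phonetic("ooh"),
    direction("guitar solo"),
)
```

Here text evaluates to "[Chorus]\nWe light the city up tonight\nNothing's gonna slow us down (ooh)\n{guitar solo}".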
Use curly braces for short, inline cues. For broader characteristics that apply to the whole chunk — genre, instrumentation, or overall vocal style — use positive_styles instead.
For example, if a chunk's text describes the overall vocal style, move that description to positive_styles and keep only short inline cues in text, written in curly braces rather than parentheses (parentheses are reserved for phonetic sounds).
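As an illustration of that fix, here is a made-up chunk before and after the change, written as plain Python dicts with only the text and positive_styles fields. The parenthesized helper is a hypothetical review aid: it lists everything in parentheses so you can confirm each item is a phonetic sound rather than a style or direction.

```python
import re

# Before: a whole-chunk vocal style sits in text, and an inline cue
# uses parentheses, which are meant for phonetic sounds.
before = {
    "text": "[Verse 1]\n(soft breathy female vocals)\nWalking home through the rain\n(piano fill)",
    "positive_styles": ["indie pop"],
}

# After: the vocal style moves to positive_styles; the short cue stays
# in text but switches to curly braces.
after = {
    "text": "[Verse 1]\nWalking home through the rain\n{piano fill}",
    "positive_styles": ["indie pop", "soft breathy female vocals"],
}


def parenthesized(text: str) -> list[str]:
    """Return the contents of every (...) group for manual review."""
    return re.findall(r"\(([^)]*)\)", text)
```

Calling parenthesized(before["text"]) flags both items for review, while the corrected text has none left.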
Style tips
Be specific with style descriptors: name the exact genre, instrumentation, and vocal style you want rather than broad labels.
Use negative styles liberally to prevent unwanted sounds. Styles must be in English (lyrics can be any language).
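One way to apply these tips consistently is to build each chunk's style lists with a helper that appends generic defaults and rejects contradictions. This is a sketch under assumptions: build_styles is hypothetical, negative_styles is an assumed field name, and the conflict check is a local sanity check rather than documented API behavior. It does not verify that styles are written in English.

```python
from typing import Optional

GENERIC_DEFAULTS = ["great production quality"]


def build_styles(
    specific: list[str], negative: Optional[list[str]] = None
) -> dict[str, list[str]]:
    """Combine specific styles with generic defaults and check for clashes."""
    negative = negative or []
    # Append generic defaults after the specific styles, skipping duplicates.
    positive = specific + [s for s in GENERIC_DEFAULTS if s not in specific]
    # A style that is both requested and forbidden is contradictory.
    clashes = {s.lower() for s in positive} & {s.lower() for s in negative}
    if clashes:
        raise ValueError(f"styles in both lists: {sorted(clashes)}")
    return {"positive_styles": positive, "negative_styles": negative}
```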
If you include copyrighted content in styles, the API returns a bad_composition_plan error with a suggested alternative. See handling copyrighted material.