Dubbing Studio | ElevenLabs Documentation

Create a Dubbing Studio project

Check the ‘Create Dubbing Studio’ box when creating a dub.

Click on Create Dub. Once the Dubbing Studio project is created, you will be able to open it.

Core Concepts

Speaker Cards

Speaker cards show the original transcription and translation (if you add one) of dialogue from the source video. You can click ‘Transcribe Audio’ to retranscribe the original speech, or click the arrow to re-translate an existing transcription.

Edit Transcripts and Translations

Both transcriptions and translations can be edited freely: just click inside a speaker card and start typing to edit the text.

Speaker Identification

You can see the name of each speaker in the top left of the speaker card. To change the name of a speaker or reassign a clip to a different speaker, you’ll need to use the Timeline.

Timeline

The timeline contains many important elements of Dubbing Studio, each covered in more detail in the sections below.

Basic navigation

There are three main ways to navigate the timeline:

Click and drag the cursor
Scroll horizontally
Enter a specific timecode on the right side of the timeline

Adjust clips and regenerate audio

Drag the handles on the left or right side of a clip to adjust its length.
Click the refresh icon to regenerate the audio for that clip.

Dynamic vs. Fixed Generations

NOTE: By default, all regenerations in Dubbing Studio are Fixed Generations, which means that the system will keep the duration of the clip fixed regardless of how much text it contains. This can lead to speech speeding up or slowing down significantly if you adjust the length of a clip without changing the text, or if you add/remove a large number of words to a clip.

Consider a clip with the phrase ‘I’m doing well.’ If that clip were set to last 10 seconds and the audio were generated using Fixed Generations, the speech would sound slow and drawn out.
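To put a number on this, the sketch below compares the speaking rate a Fixed Generation is forced into with an assumed natural rate. The baseline of 2.5 words per second (about 150 words per minute) is an illustrative assumption, not a value Dubbing Studio uses, and the function name is hypothetical.

```python
# Hypothetical helper: how much faster (>1) or slower (<1) than an assumed
# natural pace a Fixed Generation would have to speak to fill its clip.
NATURAL_WORDS_PER_SECOND = 2.5  # assumption: roughly 150 words per minute


def fixed_generation_pace(text: str, clip_seconds: float) -> float:
    """Return the ratio of the required speaking rate to the assumed natural rate."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    words = len(text.split())
    required_rate = words / clip_seconds  # words per second the clip forces
    return required_rate / NATURAL_WORDS_PER_SECOND


pace = fixed_generation_pace("I'm doing well.", 10.0)
print(f"{pace:.2f}x natural pace")  # 0.12x natural pace: 3 words stretched over 10 s
```

A ratio far below 1 corresponds to the slow, drawn-out speech described above; a ratio far above 1 corresponds to rushed speech after cramming many words into a short clip.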

Alternatively, you can use Dynamic Generations by right-clicking a clip and selecting Dynamic Generation from the menu. Dubbing Studio will then try to fit the clip's length to its text so that the audio sounds more natural.

Be careful, though: using Dynamic Generations can affect sync and timing in your videos. For example, if you select Dynamic Generation for a clip with many words and there is not enough room before the next clip for it to expand, the audio may not generate properly.
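The failure case above is essentially a fit check: the estimated natural length of the text must not run past the start of the next clip. A minimal sketch, reusing the same assumed 2.5 words-per-second rate; the function and its parameters are hypothetical and do not reflect how Dubbing Studio actually estimates duration.

```python
from typing import Optional

NATURAL_WORDS_PER_SECOND = 2.5  # assumption: roughly 150 words per minute


def dynamic_generation_fits(text: str, clip_start: float,
                            next_clip_start: Optional[float]) -> bool:
    """Estimate whether a clip can grow to its natural length without
    running into the next clip on the same track (all times in seconds)."""
    estimated_seconds = len(text.split()) / NATURAL_WORDS_PER_SECOND
    if next_clip_start is None:  # last clip on the track: nothing to collide with
        return True
    return clip_start + estimated_seconds <= next_clip_start


long_line = " ".join(["word"] * 20)  # 20 words, about 8 s at the assumed rate
print(dynamic_generation_fits(long_line, 12.0, 16.0))          # False
print(dynamic_generation_fits("I'm doing well.", 12.0, 16.0))  # True
```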

Stale Audio

Stale audio refers to audio that needs to be regenerated for one of many reasons (clip length changes, settings changes, transcription/translation changes, etc.). You can regenerate stale clips individually or click ‘Generate Stale Audio’ to bulk generate all stale audio clips.
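One way to picture staleness: remember a fingerprint of every input used to generate a clip, and flag the clip whenever its current inputs no longer match. The sketch below is a hypothetical model of that idea (the `Clip` fields and helper names are illustrative), not Dubbing Studio's internals.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    start: float
    end: float
    text: str
    settings: dict = field(default_factory=dict)
    generated_fingerprint: Optional[str] = None  # set when audio is generated

    def fingerprint(self) -> str:
        # Hash every input that affects the generated audio.
        payload = json.dumps(
            {"start": self.start, "end": self.end, "text": self.text,
             "settings": self.settings},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_stale(self) -> bool:
        return self.generated_fingerprint != self.fingerprint()

    def generate(self) -> None:
        # Placeholder for the actual audio generation step.
        self.generated_fingerprint = self.fingerprint()


def generate_stale(clips: List[Clip]) -> int:
    """Regenerate only the stale clips, like a bulk 'Generate Stale Audio' pass."""
    stale = [c for c in clips if c.is_stale()]
    for clip in stale:
        clip.generate()
    return len(stale)


clips = [Clip(0.0, 2.0, "Hola"), Clip(2.0, 4.0, "Adiós")]
print(generate_stale(clips))  # 2: nothing has been generated yet
clips[1].text = "Hasta luego"  # editing the translation makes this clip stale
print(generate_stale(clips))  # 1
```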

Clip History

You can right-click a clip and select ‘Clip History’ to view previous generations and select the one that sounds best.

Split and Merge clips

To split a clip, move the cursor to a specific timecode and click ‘Split’.
To merge two clips, drag the ends of the clips together and click ‘Merge.’

As you split and merge clips, the speaker cards above the timeline will update to reflect these changes.
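In data terms, a split turns one clip into two clips that meet at the cursor, and a merge joins two adjacent clips back into one. A minimal sketch, assuming clips are simple (start, end, text) records on a single track; the `TimelineClip` type and both functions are hypothetical, and because the source does not say how text is divided after a split, the caller supplies each half here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimelineClip:
    start: float  # seconds
    end: float    # seconds
    text: str


def split_clip(clip: TimelineClip, at: float,
               left_text: str, right_text: str) -> Tuple[TimelineClip, TimelineClip]:
    """Split a clip at timecode `at`; the cursor must fall strictly inside it."""
    if not clip.start < at < clip.end:
        raise ValueError("split point must be inside the clip")
    return (TimelineClip(clip.start, at, left_text),
            TimelineClip(at, clip.end, right_text))


def merge_clips(first: TimelineClip, second: TimelineClip) -> TimelineClip:
    """Merge two clips whose ends touch (first.end == second.start)."""
    if first.end != second.start:
        raise ValueError("clips must be adjacent to merge")
    return TimelineClip(first.start, second.end, f"{first.text} {second.text}")


whole = TimelineClip(0.0, 4.0, "Hello there. How are you?")
a, b = split_clip(whole, 1.5, "Hello there.", "How are you?")
print(a, b)
print(merge_clips(a, b) == whole)  # True
```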

Reassign clips to different speakers

To reassign a clip to a different speaker, click the clip and drag it to another track.

Add additional audio tracks

Use the action buttons at the bottom of the timeline to add new audio tracks.

Voice Settings

Voice Selection

To select the voice that will be used to generate audio on a specific speaker track, click the settings cog icon on the left side of the timeline near the speaker name.

There are 3 main types of voices to choose from in Dubbing Studio:

Clip clone - this creates a unique voice clone for each clip based on the source audio for that clip
Track clone - this creates a single voice clone for the whole track based on all source audio for a given speaker
Other voices - you can also choose from thousands of voices available in our Voice Library, each with detailed metadata and tags to help you choose the right one

You can also create, save, and reuse a voice from a specific clip by right-clicking the clip and selecting ‘Create Voice from Selection’.

Setting Track vs. Clip Level Settings

You can set voice settings at two levels:

Track Level - changes will apply across all clips in the track, which can help with stability and consistency.
Clip Level - changes will only apply to a specific clip. To set clip-level settings, use the panel on the right side of the timeline. Disable the ‘inherit track settings’ toggle and configure your desired settings.
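The two levels can be read as a simple lookup: a clip uses its track's settings unless its ‘inherit track settings’ toggle is off, in which case its own values apply. The sketch below models that rule. The setting names (`stability`, `similarity_boost`) are illustrative and may not match the labels in the panel, and letting clip values replace track values key by key (rather than replacing the whole set) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrackSettings:
    values: Dict[str, float]


@dataclass
class ClipSettings:
    inherit_track_settings: bool = True
    overrides: Dict[str, float] = field(default_factory=dict)


def effective_settings(track: TrackSettings, clip: ClipSettings) -> Dict[str, float]:
    """Resolve the settings a clip would be generated with (hypothetical model)."""
    if clip.inherit_track_settings:
        return dict(track.values)  # track-level settings apply unchanged
    # Assumption: clip-level values replace track values key by key.
    return {**track.values, **clip.overrides}


track = TrackSettings({"stability": 0.5, "similarity_boost": 0.75})
print(effective_settings(track, ClipSettings()))
# {'stability': 0.5, 'similarity_boost': 0.75}
print(effective_settings(track, ClipSettings(False, {"stability": 0.8})))
# {'stability': 0.8, 'similarity_boost': 0.75}
```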

Exports

Click ‘Export’ in the bottom right of Dubbing Studio to open the export menu.

Dubbing Studio currently supports the following export formats:

AAC (audio)
MP3 (audio)
WAV (audio)
.zip of audio tracks
.zip of audio clips
AAF (timeline data)
SRT (subtitles/captions)
CSV (speaker, start_time, end_time, transcription, translation)

Make sure you select the correct language when exporting.

Additional Features

Voiceover Tracks: Voiceover tracks create new speakers. Click anywhere on the timeline to add a clip. After creating a clip, start writing your desired text in the speaker card above. You’ll first need to translate that text; then you can press “Generate”. You can also record your own voice with the voice changer tool (the microphone icon on the right side of the screen) and convert it into the selected voice.
SFX Tracks: Add an SFX track, then click anywhere on that track to create an SFX clip. As with our standalone SFX feature, simply start writing your prompt in the speaker card above and click “Generate” to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project; if you do, make sure to press the “stale” button afterwards.
Upload Audio: This option lets you upload a non-voiced track, such as SFX, music, or background audio. Keep in mind that any voices present in this track won’t be detected, so they cannot be translated or corrected.

Manual Dub

In cases where you already have an accurate dubbing script prepared and want to ensure your Dubbing Studio project sticks to your exact clips and speaker assignment, you can use the Manual Dub option during creation.

To create a Manual Dub, you’ll need:

Video file
Background audio file
Foreground audio file
CSV where each row contains a speaker, start_time, end_time, transcription, and translation field

The CSV file must strictly follow the predefined format in order to be processed correctly. The following timecode formats are supported, and sample files are shown below:

seconds
hours:minutes:seconds:frame
hours:minutes:seconds,milliseconds
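Before uploading, it can help to normalise every timecode to seconds so you can confirm that each row's start precedes its end. The sketch below handles the three formats listed above. Assumptions: frame-based timecodes need a frame rate (25 fps is an arbitrary default here; use your video's rate), and the millisecond separator may be a comma or a period, since the second example file below uses a period. The function name is hypothetical.

```python
import re

# hours:minutes:seconds:frame, e.g. "00:01:02:12"
_FRAMES = re.compile(r"^(\d+):(\d{1,2}):(\d{1,2}):(\d{1,2})$")
# hours:minutes:seconds,milliseconds (or with a period), e.g. "00:01:02,500"
_MILLIS = re.compile(r"^(\d+):(\d{1,2}):(\d{1,2})[,.](\d{1,3})$")


def timecode_to_seconds(value: str, fps: float = 25.0) -> float:
    """Convert a Manual Dub timecode string to seconds (hypothetical helper)."""
    value = value.strip()
    m = _FRAMES.match(value)
    if m:
        h, mi, s, f = (int(g) for g in m.groups())
        if f >= fps:
            raise ValueError(f"frame {f} out of range for {fps} fps")
        return h * 3600 + mi * 60 + s + f / fps
    m = _MILLIS.match(value)
    if m:
        h, mi, s = (int(g) for g in m.groups()[:3])
        ms = int(m.group(4).ljust(3, "0"))  # ",5" means 500 ms
        return h * 3600 + mi * 60 + s + ms / 1000
    return float(value)  # plain seconds, e.g. "1.15000"; raises ValueError otherwise


print(timecode_to_seconds("1.15000"))               # 1.15
print(timecode_to_seconds("0:00:06.000"))           # 6.0
print(timecode_to_seconds("00:01:02,500"))          # 62.5
print(timecode_to_seconds("00:00:01:12", fps=24))   # 1.5
```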

Example CSV files

speaker,start_time,end_time,transcription,translation
Adam,"0.10000","1.15000","Hello, how are you?","Hola, ¿cómo estás?"
Adam,"1.50000","3.50000","I'm fine, thank you.","Estoy bien, gracias."

speaker	start_time	end_time	transcription	translation
Joe	0:00:00.000	0:00:02.000	Hey!	Hallo!
Maria	0:00:02.000	0:00:06.000	Oh, hi, Joe. It has been a while.	Oh, hallo, Joe. Es ist schon eine Weile her.
Joe	0:00:06.000	0:00:11.000	Yeah, I know. Been busy.	Ja, ich weiß. War beschäftigt.
Maria	0:00:11.000	0:00:17.000	Yeah? What have you been up to?	Ja? Was hast du gemacht?
Joe	0:00:17.000	0:00:23.000	Traveling mostly.	Hauptsächlich gereist.
Maria	0:00:23.000	0:00:30.000	Oh, anywhere I would know?	Oh, irgendwo, das ich kenne?
Joe	0:00:30.000	0:00:36.000	Spain.	Spanien.
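If your script lives in a spreadsheet or another tool, writing the file with a real CSV library avoids the most common formatting mistake: dialogue that contains commas (like "Hello, how are you?") must be quoted, as in the first example. A minimal sketch using Python's standard `csv` module; the `rows` data is a placeholder, times are written in seconds with five decimals to mirror the first example, and the default minimal quoting adds quotes only where a field needs them (the first example quotes every field except the speaker, which a CSV parser reads the same way).

```python
import csv
import io

HEADER = ["speaker", "start_time", "end_time", "transcription", "translation"]

# Placeholder script rows: (speaker, start, end, transcription, translation).
rows = [
    ("Adam", 0.1, 1.15, "Hello, how are you?", "Hola, ¿cómo estás?"),
    ("Adam", 1.5, 3.5, "I'm fine, thank you.", "Estoy bien, gracias."),
]


def manual_dub_csv(rows) -> str:
    """Render script rows as Manual Dub CSV text, with times in seconds."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # default QUOTE_MINIMAL
    writer.writerow(HEADER)
    for speaker, start, end, source, target in rows:
        if not start < end:
            raise ValueError(f"{speaker}: start_time must be before end_time")
        writer.writerow([speaker, f"{start:.5f}", f"{end:.5f}", source, target])
    return buffer.getvalue()


print(manual_dub_csv(rows))
```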
