Studio
The ultimate end-to-end workflow for creating amazing dubs.
Overview
If you selected the option to create a Dubbing Studio project, once your dub finishes generating, you will see “Edit” when you click the three dots next to the dub in the list of dubs on your Dubbing page. Click “Edit” to open your Dubbing Studio project.
At first, when you open the studio, it might seem overwhelming, as there’s a lot of information to take in. However, if you have used an audio or video editor before, you will most likely feel right at home with the layout of the studio.
- First, it is important to note that the initial version of the dub is an automated dub, and cannot be personalised. We compensate for this by providing credits equal to the cost of creating the project that can be used within the project. These will allow you to edit your content at no extra cost, giving you the opportunity to fully customize and regenerate your dub at least once.
- In the middle, you will see the speaker cards, which show the transcribed audio as well as the translated transcription. If you only see one set of cards, don’t fret - to see both you have to select the language you want to work on. This defaults to the original.
- On the right-hand side, you will see the video clip that you uploaded to be dubbed. You can move this clip around and place it wherever you want. You can also resize it by dragging the corners of the clip.
- Below all of this, you have the timeline, which shows the different voices the AI extracted on individual tracks as well as clips indicating when a specific voice is speaking, along with the corresponding clips for the original audio.
- The timeline is also divided into a few different parts. On the left side, you can see the names of each speaker track. You can rename them here to keep your project organized. You can click the cogwheel and change settings across the whole track. However, keep in mind that if you do this, you will have to regenerate the already generated audio clips.
- In the middle section, you have the actual timeline, which includes all of the speech clips mentioned earlier.
- On the right-hand side, you have the settings for individual clips. When you have a clip selected, this is where you change the settings for that specific clip. You can change the volume, voice settings, or even the voice itself for the selected clip.
- Below this, you have the current dubs available for this project. You will see the original, which is just the original audio, and then all subsequent dubs that you have created for this project in all of the different languages. Clicking the plus button adds another dub to the project.
Speaker Cards
Right in the middle of the studio view, you will see the speaker cards. These cards represent the text that is being spoken by a specific voice. You can both change the transcribed text – the text that the AI has automatically transcribed from the audio – and the translated text – the text the AI has automatically translated for you from the transcription.
When you first open the studio, you will most likely only see the transcribed text from the original audio and not the translated text. However, at the bottom, below the timeline, you will see a toggle where you can switch between all the languages the project is dubbed in. When you create it initially, you will only have the original language plus the language that you selected when creating the dub. If you click the other language, you should see that the speaker cards get split into two versions of the same text: one is the original text, and the second is the dubbed text.
This toggle also determines which language you hear the dub in. If you have the original selected, you will only hear the original language, but if you select one of the other languages that the video is dubbed into, you will hear that language. I would recommend selecting the language that you have dubbed your project into so you can follow along with the guide more easily.
Timeline
Below the speaker cards, you will find the timeline. This is where you will refine and change the actual audio generated from the text in the speaker cards. It is segmented into different parts. On the left side, you have the tracks for each voice in the audio. In the middle, you have the clips that represent when a voice is speaking. On the right-hand side, you have the settings for the currently selected clip. We will go through all of these parts.
Tracks
When you create your dub, you either specify the number of speakers manually (this is the recommended method) or let the AI automatically detect the number of speakers. Each speaker will be assigned a track, and each speaker will have clips on that track which represent when they’re speaking and when they are not. These clips then represent the speaker cards – more on that later in the clips section.
On the left-hand side of each track, you have a few options. You can click the name, which usually just says “speaker” when you first create the dub, and then change it to the character name to keep it more organized.
On each track with a dubbed voice (not the tracks with the original voices), you will see a little cogwheel. If you click this, it will bring up some very important settings for each track. Here, you can change things such as stability, similarity, and style for the whole track, as well as change how the clone is created. For example, you can have a clone created for each clip on the track individually (Clip Clone), have a single clone created from all clips for this speaker (Track Clone), or select a voice that you already have saved in My Voices. There’s a third way to create a clone, which I will go through in the clips section.
Lastly, on each track, you have three dots that you can click to access the ability to remove the track from the project if you feel like it was created incorrectly by the AI. Perhaps it picked up some background noise and thought it was a speaker, but it was not, which means you can discard this track.
Clips
Subsequently, each of these tracks will contain clips that represent the dialogue, audio, and speaker cards. These will be automatically created when you first create your dub.
If you click on a clip, the speaker cards will also jump to the appropriate location so you can easily find and edit transcriptions, translations, and performances. You will see two clips on top of each other in the same color; the top clip represents the original audio, and the bottom represents the dubbed audio. You can move these clips independently to adjust the audio within them. When you click a clip, it will be highlighted both in the timeline and in the speaker cards. This makes it very easy to edit specific clips without having to sync both views, as they do that automatically.
On each clip that represents a dubbed section, you will find two circling arrows which you can click to regenerate that specific clip. You will need to do this each time you change, for example, the settings, the voice, or the translation for that clip. If a clip needs to be regenerated, it will say “stale” next to these arrows. Regenerating clips will cost credits.
If you have two clips that are very close together, you can click the gray icon between the two clips to combine them into a single, longer clip. Additionally, where the playhead is, you can click this gray icon to separate a clip into two individual clips.
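To make the combine and separate operations concrete, here is a minimal sketch of how they could be modeled. The `Clip` class, its fields, and the way text is carried over are hypothetical illustrations, not the studio's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical stand-in for a timeline clip; times are in seconds."""
    start: float
    end: float
    text: str


def merge_clips(first: Clip, second: Clip) -> Clip:
    """Combine two neighbouring clips on one track into a single longer clip."""
    if second.start < first.end:
        raise ValueError("clips overlap or are out of order")
    # Assumption: the merged clip simply concatenates both texts.
    return Clip(first.start, second.end, f"{first.text} {second.text}".strip())


def split_clip(clip: Clip, playhead: float) -> tuple[Clip, Clip]:
    """Separate a clip into two clips at the playhead position."""
    if not clip.start < playhead < clip.end:
        raise ValueError("playhead must fall strictly inside the clip")
    # Assumption: the text stays with the first half, because this sketch
    # cannot know which words fall before or after the playhead.
    return Clip(clip.start, playhead, clip.text), Clip(playhead, clip.end, "")


merged = merge_clips(Clip(0.0, 2.0, "Hey!"), Clip(2.1, 6.0, "Oh, hi, Joe."))
left, right = split_clip(merged, 3.0)
```

In this sketch the merged clip spans from the start of the first clip to the end of the second, so the small gap between them becomes part of the new clip.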
If you drag either edge of a clip, you will extend or truncate it. You might notice that when you extend or truncate, the voice will either speed up or slow down, and the pitch will either go up or down as well. This is just an approximation, but you will have to regenerate the clip for the AI to be able to generate speech that will fit within the clip length and sound natural.
On the right-hand side, you will also see a few options. In contrast to the left-hand side options, which affect the whole track, these are the individual clip options. Here, you can set and change settings that will only affect the currently selected clip instead of the whole track. For example, you can set different values for stability, similarity, style, as well as adjust the volume. You can even specify a particular clone to be used for that particular clip only.
Lastly, you can right-click a clip to access a few more options. You can transcribe the audio again if you feel like the transcription was incorrect or if you’ve made changes to the clip. You can also delete the clip if you feel it shouldn’t be there. The most interesting option here is probably that you can create a clone from a specific clip. One helpful tip is to find a clip that you like, where you feel the voice is good, right-click to create a clone from that clip, and then assign that clone to the whole track to achieve a consistent voice throughout. This is just one tip and may not work for all circumstances, but it can work very well in some cases.
Adding Voiceover and SFX Tracks
Below the track list, you will see the following options:
- Dubbed Speaker Tracks: If you encounter multiple speakers mixed within a single track, you can create a new dubbed speaker track. This allows you to isolate and transfer clips containing additional voices to the new track.
- Voiceover Tracks: Voiceover tracks create new Speakers. You can click and add clips on the timeline wherever you like. After creating a clip, start writing your desired text on the speaker cards above. You’ll first need to translate that text, then you can press “Generate”. You can also use our voice changer tool by clicking on the microphone icon on the right side of the screen to use your own voice and then change it into the selected voice.
- SFX Tracks: Add an SFX track, then click anywhere on that track to create an SFX clip. Similar to our independent SFX feature, simply start writing your prompt in the Speaker card above and click “Generate” to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project - make sure to press the “stale” button if you do so.
- Upload Audio: This option allows you to upload a non-voiced track such as SFX, music, or a background track. Please keep in mind that if voices are present in this track, they won’t be detected, so it will not be possible to translate or correct them.
“Dynamic” vs. “Fixed” Generation
In Dubbing Studio, all regenerations made to the text are “Fixed” generations by default. This means that no matter how much text is in a Speaker card, that respective clip will not change its length. This is helpful to keep the timing of the video with the speech. However, this can be problematic if there are too many or too few words within the speaker card, as this can result in sped up or slowed down speech.
This is where “Dynamic” generation can help. You can access this by right-clicking on a clip and selecting “Generate Audio (Dynamic Duration)”. You’ll notice now that the length of the clip will more appropriately match the text spoken for that section. For example, the phrase “I’m doing well!” should only occupy a small clip - if the clip was very long, the speech would be slurred and drawn out. This is where Dynamic generation can be helpful.
Just note, though, that this could affect the syncing and timing of your video. Additionally, if you choose “Dynamic Duration” for a clip that has many words, the clip will need to lengthen - if there is a clip directly in front of it, it will not have enough room to generate properly, so make sure you leave some space between your clips!
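The trade-off between the two modes can be illustrated with some rough arithmetic. The sketch below is only a mental model: the speaking rate of 2.5 words per second and all function names are assumptions made for illustration, not values or logic taken from Dubbing Studio.

```python
WORDS_PER_SECOND = 2.5  # assumed average speaking rate, not a product value


def natural_duration(text: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Estimate how long the text would take to speak at a natural pace."""
    return len(text.split()) / words_per_second


def fixed_speed_factor(text: str, clip_start: float, clip_end: float) -> float:
    """Fixed generation keeps the clip length, so the speech must adapt.

    A result above 1 means the speech has to be sped up to fit;
    a result below 1 means it has to be slowed down.
    """
    return natural_duration(text) / (clip_end - clip_start)


def dynamic_fits(text: str, clip_start: float, next_clip_start: float) -> bool:
    """Dynamic generation resizes the clip, so it needs room before the next clip."""
    return clip_start + natural_duration(text) <= next_clip_start


# "I'm doing well!" in a 6-second clip would have to be stretched a lot.
factor = fixed_speed_factor("I'm doing well!", 0.0, 6.0)
# With dynamic duration, the same phrase needs a little over a second of space.
has_room = dynamic_fits("I'm doing well!", 0.0, 2.0)
```

Under these assumptions, a factor far from 1 suggests the clip is a candidate for Dynamic generation, while `dynamic_fits` returning `False` suggests moving the following clip first.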
Manual Import
When creating your dub, you have a special option during the creation process that is only available to the Dubbing Studio: manual dubbing. This option allows you to create a manual dub where you upload all of the files individually. You can upload the video file, the background audio, and the audio of only the speakers. Additionally, you should include a CSV file indicating the names of the speakers, the start and end time of when they are speaking, the original text, and the translated text. It’s similar to a subtitle file but with a lot more information. This file needs to adhere to a very strict format to work correctly.
Timecodes supported in the CSV file include:
- seconds (example file)
- hours:minutes:seconds:frame (example file)
- hours:minutes:seconds,milliseconds (example file)
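If you prepare these files with a script, it can help to normalize every timecode to seconds before comparing start and end times. The following is a minimal sketch, not the importer's actual parsing logic. The function name and the default frame rate of 25 are assumptions, and accepting a period as well as a comma before the milliseconds is an assumption based on the sample rows.

```python
def timecode_to_seconds(value: str, fps: float = 25.0) -> float:
    """Convert one of the listed timecode styles to seconds.

    fps is only used for the hours:minutes:seconds:frame style and its
    default here is an assumption; pass your video's real frame rate.
    """
    parts = value.strip().split(":")
    if len(parts) == 1:
        # Plain seconds, e.g. "2.5"
        return float(parts[0])
    if len(parts) == 4:
        # hours:minutes:seconds:frame, e.g. "00:00:02:12"
        hours, minutes, seconds, frames = (int(p) for p in parts)
        return hours * 3600 + minutes * 60 + seconds + frames / fps
    if len(parts) == 3:
        # hours:minutes:seconds,milliseconds, e.g. "0:00:02,000"
        hours, minutes = int(parts[0]), int(parts[1])
        seconds = float(parts[2].replace(",", "."))
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"unrecognized timecode: {value!r}")


start = timecode_to_seconds("0:00:02,000")
end = timecode_to_seconds("00:00:06:00", fps=24)
```

Converting both ends of a row this way makes it easy to check that each end time falls after its start time before you upload the file.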
speaker | start_time | end_time | transcription | translation |
---|---|---|---|---|
Joe | 0:00:00.000 | 0:00:02.000 | Hey! | Hallo! |
Maria | 0:00:02.000 | 0:00:06.000 | Oh, hi, Joe. It has been a while. | Oh, hallo, Joe. Es ist schon eine Weile her. |
Joe | 0:00:06.000 | 0:00:11.000 | Yeah, I know. Been busy. | Ja, ich weiß. War beschäftigt. |
Maria | 0:00:11.000 | 0:00:17.000 | Yeah? What have you been up to? | Ja? Was hast du gemacht? |
Joe | 0:00:17.000 | 0:00:23.000 | Traveling mostly. | Hauptsächlich gereist. |
Maria | 0:00:23.000 | 0:00:30.000 | Oh, anywhere I would know? | Oh, irgendwo, das ich kenne? |
Joe | 0:00:30.000 | 0:00:36.000 | Spain. | Spanien. |
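Because the format is strict, generating the file with a CSV library is usually safer than typing it by hand, since the library quotes any text that contains commas. The sketch below uses Python's standard `csv` module and assumes the column names shown in the sample above are the ones the importer expects. The helper names are hypothetical, and the timecode style mirrors the sample rows, so adjust it if your importer needs one of the other listed styles.

```python
import csv
import io

# Column names copied from the sample table; assumed to match the importer.
FIELDNAMES = ["speaker", "start_time", "end_time", "transcription", "translation"]


def format_timecode(seconds: float) -> str:
    """Render seconds as H:MM:SS.mmm, mirroring the sample rows."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_csv(rows) -> str:
    """Build CSV text from (speaker, start, end, transcription, translation) tuples."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for speaker, start, end, transcription, translation in rows:
        if end <= start:
            raise ValueError(f"end time must be after start time for {speaker!r}")
        writer.writerow({
            "speaker": speaker,
            "start_time": format_timecode(start),
            "end_time": format_timecode(end),
            "transcription": transcription,
            "translation": translation,
        })
    return buffer.getvalue()


csv_text = build_csv([
    ("Joe", 0.0, 2.0, "Hey!", "Hallo!"),
    ("Maria", 2.0, 6.0, "Oh, hi, Joe. It has been a while.",
     "Oh, hallo, Joe. Es ist schon eine Weile her."),
])
```

Maria's lines contain commas, so the writer wraps them in double quotes automatically, which keeps each row at exactly five columns.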