Studio
Overview
Studio is an end-to-end workflow for creating long-form content. With this tool you can upload an entire book, document or webpage and generate a voiceover narration for it. The result can then be downloaded as a single MP3 file or as individual MP3 files for each chapter.
Guide
You can use our Audio Native feature to easily embed any narration project on your website.
Settings
Voices
We offer many types of voices, including the curated Default Voices library and completely synthetic voices created using our Voice Design tool. You can also build your own collection of cloned voices using our two cloning technologies: Instant Voice Cloning and Professional Voice Cloning. Browse our voice library to find the perfect voice for your production.
Not all voices are equal; a lot depends on the source audio used to create the voice. Some voices perform better than others, some are more stable, some are more easily cloned by the AI, and some work better with one model or language than another. All of these factors are important to consider when selecting your voice.
Voice settings
Our users have found different workflows that work for them. The most common starting point is stability around 50 and similarity near 75, with only minor adjustments after that. Of course, this all depends on the original voice and the style of performance you’re aiming for.
It’s important to note that the AI is non-deterministic; setting the sliders to specific values won’t guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation.
Stability
The stability slider determines how consistent the voice sounds and how much randomness there is between generations. Lowering this slider introduces a broader emotional range for the voice. As mentioned before, this is also heavily influenced by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.
For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Since it is more consistent and stable, you usually don’t need to generate as many samples to achieve the desired result. Experiment to find what works best for you!
Similarity
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artifacts or background noise when trying to mimic the voice if those were present in the original recording.
Style exaggeration
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It consumes additional computational resources and may increase latency if set to anything other than 0. It’s important to note that this setting has been shown to make the model slightly less stable, as it strives to emphasize and imitate the style of the original voice.
In general, we recommend keeping this setting at 0 at all times.
Speaker boost
This setting boosts the similarity to the original speaker. However, using this setting requires a slightly higher computational load, which in turn increases latency. The differences introduced by this setting are generally rather subtle.
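If you want to experiment with these settings outside of Studio, the same four controls are exposed through the ElevenLabs text-to-speech API on a 0–1 scale rather than Studio’s 0–100 sliders. The sketch below is a minimal illustration, not a definitive implementation: the endpoint shape and field names (stability, similarity_boost, style, use_speaker_boost) follow the public REST API as we understand it, and the API key, voice ID, and model ID are placeholders you would replace with your own.

```python
# Minimal sketch: the Studio sliders mapped onto the text-to-speech API's
# voice_settings object. Field names and endpoint are assumptions based on
# the public REST API; verify them against the current API reference.
import requests

API_KEY = "YOUR_XI_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"    # placeholder

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "It was the best of times, it was the worst of times.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,           # Studio slider at 50
            "similarity_boost": 0.75,   # Studio slider at 75
            "style": 0.0,               # style exaggeration off, as recommended
            "use_speaker_boost": True,  # subtle similarity boost, slightly higher latency
        },
    },
)
response.raise_for_status()

# Save the returned audio so you can compare different slider combinations.
with open("narration.mp3", "wb") as f:
    f.write(response.content)
```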
Pronunciation dictionaries
Sometimes you may want to specify the pronunciation of certain words, such as character or brand names, or specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that includes rules about how specified words should be pronounced, either using a phonetic alphabet (phoneme tags) or word substitutions (alias tags).
Whenever one of these words is encountered in a project, the AI will pronounce the word using the specified replacement. When a word is looked up, the dictionary is scanned from start to end and only the first matching replacement is applied.
You can add a pronunciation dictionary to your project from the General tab in Project settings.
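As a concrete illustration, a lexicon in the W3C Pronunciation Lexicon Specification (.pls) format can combine both rule types: phoneme entries for phonetic pronunciations and alias entries for word substitutions. The sketch below is a hypothetical example; the element names (lexeme, grapheme, phoneme, alias) come from the PLS specification, while the specific words and the IPA transcription are placeholders. Because only the first matching entry is applied, place more specific rules before general ones.

```python
# Sketch of a pronunciation dictionary file in the W3C PLS (.pls) format.
# The entries below are illustrative placeholders, not recommended values.
PLS_LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- Phoneme rule: spell out exactly how a character name is pronounced. -->
  <lexeme>
    <grapheme>Anaïs</grapheme>
    <phoneme>ɑːnɑːˈiːs</phoneme>
  </lexeme>
  <!-- Alias rule: expand an acronym into the words that should be read aloud. -->
  <lexeme>
    <grapheme>ISBN</grapheme>
    <alias>International Standard Book Number</alias>
  </lexeme>
</lexicon>
"""

# Write the lexicon to disk, then upload the file from the General tab
# in Project settings as described above.
with open("my_pronunciations.pls", "w", encoding="utf-8") as f:
    f.write(PLS_LEXICON)
```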
Exporting
Within the “Export” tab under General settings you can add additional metadata such as Title, Author, ISBN and a Description to your project. This information will automatically be added to the downloaded audio files.
FAQ
Free regenerations
In Studio, provided you don’t change the text, you can regenerate a selected paragraph or section of text twice for free.
If free regenerations are available for the selected paragraph or text, you will see “Regenerate”. If you hover over the “Regenerate” button, the number of free regenerations remaining will be displayed.
Once your free regenerations have been used, the button will display “Generate”, and you will be charged for subsequent generations.
Auto-regeneration for bulk conversions
When converting a full chapter or project, auto-regeneration automatically checks the output for volume issues, voice similarity, and mispronunciations. If ElevenLabs detects any issues, the tool will automatically regenerate the audio up to twice, at no extra cost.
This feature may increase the processing time but helps ensure higher quality output for your bulk conversions.