# Introduction
> Explore our Guides and API Reference to get the most out of ElevenLabs.
## Welcome
In this documentation we will help you get started with [ElevenLabs](https://elevenlabs.io). Before we get started, we would like to mention that we also offer a [Help Center](https://help.elevenlabs.io/hc/en-us) which is more of an FAQ. Here, you can find answers to individual questions and interact with our chatbot. Additionally, you can submit tickets directly to our support team if you have any inquiries.
### Create
We will cover everything, beginning with Text to Speech and Speech to Speech, where you will generate your first audio using our Default Voices.
We also provide several features that extend beyond the realm of speech, including our Sound Effects Generator and our upcoming [Music Generator](https://www.youtube.com/watch?v=d8k4Pit4_ZU) (release date and name to be determined).
Our Text-to-Speech technology, also known as Speech Synthesis, is the core of ElevenLabs. It serves as the foundation for many of the features we offer and powers many services worldwide. This technology transforms text into incredibly realistic speech.
To ensure you get the most out of this feature, it is important to use an appropriate voice for what you are trying to achieve and to familiarize yourself with the different models we offer, as both of these factors have a tremendous effect on the delivery and quality of the output.
Our Speech-to-Speech technology, also known as Voice Changer, converts a source voice (audio input) into a different voice while retaining the source voice's accent, cadence, and overall delivery, but with the timbre and vocal quality of the selected voice.
This feature is great for standalone usage, as it provides your voice acting with a wider range of tonalities. Additionally, when used in conjunction with Text-to-Speech, it allows for easy correction of pronunciations and adds specific performances or characteristics, emulating subtle vocal nuances for a more human touch. You can make the AI whisper, sigh, laugh, or cry by simply acting it out and then using the voice changer.
Our Sound Effects generator allows you to create a wide range of audio effects by inputting descriptive prompts. This feature is great across a variety of uses, such as film sound design, video game audio, music production, and much more. Users can generate sounds by typing a description into a text box, and the AI will produce multiple variations based on the given prompt.
The tool offers settings to control the duration of the sound and how closely the output adheres to the prompt.
### Voices
Once you have created your first audio output, we will proceed to cloning or designing your first voice on the My Voices page. After setting up your voices, you will be able to use your own voices to generate audio.
Voice Design allows for the creation of unique voices from text prompts, filling gaps when specific voices aren't available in the Voice Library or where you might even prefer a synthetically generated voice. You can create both realistic voices, which focus on attributes like age, accent, and emotion, and character voices, which give you more creative freedom through more imaginative, descriptive language.
Instant Voice Cloning enables you to create voice clones quickly from short audio samples, without the need for training. It creates the voice clone almost instantaneously, capturing the voice's tone and inflections. While effective for many voices, it may struggle with unique accents or voices not encountered during training. High-quality, consistent audio samples are crucial for optimal results.
Professional Voice Cloning creates hyper-realistic voice models by training on larger datasets of a speaker's voice. Available from the Creator tier and above, it offers significantly higher accuracy and can capture intricate details, such as accents and tonal nuances. This method requires more audio data (between 1 and 3 hours) and a few hours to train the model, but it results in a voice clone that closely resembles the original.
The Voice Library is a marketplace where users can share and discover a wide variety of voices, including professional voice clones from real people who have cloned their voices and decided to share them with the rest of the community.
It offers filters and search options to help users find specific voice styles, languages, and accents. Users can add voices to "My Voices" for use across ElevenLabs features, and shared voices can earn either credits or monetary rewards based on usage.
### Workflows
Moving on, we will also go through our workflow-specific section, where we offer a variety of tools and workflows for different needs.
First, we will start with Projects, which is our end-to-end solution for creating voiceovers for long-form content, such as articles or audiobooks, with just a few clicks.
We will cover Dubbing, our solution for making content more accessible in all languages while preserving the original voice and striving to maintain the same performance across languages. This service comes in two flavors: automatic and studio.
Our automatic solution allows users to create dubs in any language supported by the AI with just a few clicks. Meanwhile, the Dubbing Studio provides an end-to-end workflow with great controllability for producing perfect dubs.
Finally, we will cover our Conversational AI platform, which provides an easy setup process for quickly and easily deploying conversational AI, as well as API endpoints and SDKs, to allow for seamless integration into your existing applications or flows.
Projects is a comprehensive tool for creating long-form audio content, such as audiobooks. It allows users to upload documents or web pages and generate voiceover narrations. It is easy to manage and keep track of all of your projects, select voices, and adjust settings for quality and download options. Projects is available on paid tiers, offering an efficient workflow for producing lengthy audio content.
Dubbing allows you to create high-quality dubs in various languages while preserving the original performance. You can upload or import video/audio files and select the original and target languages. The tool supports multiple formats and offers options for modifying specific parts of the dub. It's available on all plans, with advanced features in the Dubbing Studio.
Automatic Dubbing quickly translates and replaces the original audio of a video with a dub in a new language, maintaining the original speaker's voice characteristics. It provides a final output without the option for edits, making it ideal for fast and straightforward dubbing needs. You can upload files or import them via URL to start the process.
Dubbing Studio offers an advanced dubbing experience, allowing you to edit and customize dubs extensively. It supports manual transcription adjustments, voice selection, and precise timing edits. You can manage speaker tracks, regenerate audio, and export the final output in various formats. This feature provides full control over the dubbing process, ideal for detailed and tailored projects.
Voiceover Studio allows you to create interactive audio content with flexibility. It combines an audio timeline with text-to-speech, speech-to-speech, and sound effects, enabling the creation of dialogues between multiple speakers. You can upload videos or start projects from scratch, adding voiceover and sound effects tracks. Available on the Creator plan and above, it offers tools for crafting detailed and dynamic voiceover projects.
Audio Native is an embedded audio player that automatically voices web page content using ElevenLabs' text-to-speech service. It allows embedding pre-generated audio from projects into web pages with a simple HTML snippet. Available on the Creator plan and above, it includes metrics for tracking audience engagement through a listener dashboard.
Conversational AI is a platform for deploying interactive voice agents that can interact with you or your users in natural conversations. It integrates Speech to Text, Language Models, and Text to Speech, along with features like interruption handling and turn-taking logic. Users can customize agents with different voices and system prompts, making it suitable for applications like customer service, virtual assistants, and interactive characters.
## Signing up
You can sign up using the traditional method of email plus password or using Google OAuth.
If you choose to sign up with your email, you will be asked to verify your email address before you can start using the service. Once you have verified your email, you will be taken to the Speech Synthesis page, where you can immediately start using the service. Simply type anything into the box and press "generate" to convert the text into voiceover narration. Please note that each time you press "generate" anywhere on the website, it will deduct credits from your quota.
If you sign up using Google OAuth, your account will be intrinsically linked to your Google account, meaning you will not be able to change your email address, as it will always be linked to your Google email.
## Subscriptions
Once you sign up, you will be automatically assigned to the free tier. To view your subscription, click on "My Account" in the bottom left corner and select ["Subscription"](https://elevenlabs.io/app/subscription). You can read more about the different plans [here](https://elevenlabs.io/pricing). If you scroll down, you will find a comparison table that can be quite helpful in highlighting the differences between the various plans.
We offer six public plans: Free, Starter, Creator, Pro, Scale, and Business. In addition, we also offer a seventh option - Enterprise - tailored to the unique needs and usage of our clients.
You can see details of all our plans on the subscription page. This includes information about the total monthly credit quota, the number of custom voices you can have saved simultaneously, and the quality of audio produced.
Cloning is only available on the Starter tier and above. However, the free plan offers three custom voices that you can create using our Voice Design tool, or you can add voices from the Voice Library if they are not limited to paid tiers only.
You can upgrade your subscription at any time, and any unused quota from your previous plan will roll over to the new one. As long as you don’t cancel or downgrade, unused credits at the end of the month will carry over to the next month, up to a maximum of two months’ worth of credits. For more information, please visit our Help Center articles:
* ["How does credit rollover work?"](https://help.elevenlabs.io/hc/en-us/articles/27561768104081-How-does-credit-rollover-work)
* ["What happens to my subscription and quota at the end of the month?"](https://help.elevenlabs.io/hc/en-us/articles/13514114771857-What-happens-to-my-subscription-and-quota-at-the-end-of-the-month)
From the subscription page, you can also downgrade your subscription at any point in time if you would like. When downgrading, it won't take effect until the current cycle ends, ensuring that you won't lose any of the monthly quota before your month is up.
When generating content on our paid plans, you get commercial rights to use that content. If you are on the free plan, you can use the content non-commercially with attribution. Read more about the license in our [Terms of Service](https://elevenlabs.io/terms) and in our Help Center [here](https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-).
For more information on payment methods, please refer to the [Help Center](https://help.elevenlabs.io/).
# Overview
> A guide on how to generate voiceovers using your voice on ElevenLabs.
Our Text to Speech technology is the backbone of ElevenLabs. Many of the features we offer are built around this technology, and it powers numerous services around the web wherever the highest-quality AI-generated speech is needed.
The speech model takes text and converts it into extremely realistic speech. On the surface, it’s a fairly simple concept, but the execution is anything but. There are a few things to keep in mind to achieve the best possible results, and we will try to cover most of them.
We are constantly working on improving our service and technology, adding new features and settings. Therefore, it can be helpful to check back periodically to ensure you have the latest information and are following the most recent guidelines.
There are two main factors that we emphasize as being of utmost importance to ensure the best possible experience when using our Text to Speech.
Familiarizing yourself with these different settings and options is very important for getting the best possible result. For Text to Speech, there are three main selections you need to make.
We offer many types of voices, including Default Voices that have been specifically curated to be of the highest quality; completely synthetic voices created using our Voice Design tool; and the option to create your own collection of cloned voices using our two technologies: Instant Voice Clones and Professional Voice Clones. You can also browse through our Voice Library to find the perfect voice for your production.
Not all voices are equal, and a lot depends on the source audio used to create that voice. Some voices will perform better than others, while some will be more stable than others. Additionally, certain voices will be more easily cloned by the AI than others, and some voices may work better with one model and one language compared to another. All of these factors are important to consider when selecting your voice.
As of December 2024, ElevenLabs offers two families of models: standard (high-quality) models and Flash models, which are optimized for low latency. Each family includes both English-only and multilingual models, tailored for specific use cases with strengths in either speed, accuracy, or language diversity.
* **Standard models** (Multilingual v2, Multilingual v1, English v1) are optimized for quality and accuracy, ideal for content creation. These models offer the best quality and stability but have higher latency.
* **Flash models** (Flash v2, Flash v2.5) are designed for low-latency applications like real-time conversational AI. They deliver great performance with faster processing speeds, though with a slight trade-off in accuracy and stability.
If you want to find more detailed specifications about which languages each model offers, you can find all that information in our help article [here](https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them-).
For advice on how to deal with issues that might arise, please see our [guide to troubleshooting](/docs/product/troubleshooting/overview).
Our users have found different workflows that work for them. The one you'll see most often is setting stability around 50 and similarity near 75, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.
It's important to note that the AI is non-deterministic; setting the sliders to specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation. Setting stability low means a wider range of randomization, often resulting in a more emotive performance, but this is also highly dependent on the voice itself.
For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Since it's more consistent and stable, you usually don't need to do as many generations to get what you are looking for. Experiment to find what works best for you!
## Good to know
The first factor, and one of the most important, is that good, high-quality, and consistent input will result in good, high-quality, and consistent output.
If you provide the AI with audio that is less than ideal—for example, audio with a lot of noise, reverb on clear speech, multiple speakers, or inconsistency in volume or performance and delivery—the AI will become more unstable, and the output will be more unpredictable.
If you plan on cloning your own voice, we strongly recommend that you go through our guidelines in the documentation for creating proper voice clones, as this will provide you with the best possible foundation to start from. Even if you intend to use only Instant Voice Clones, it is advisable to read the Professional Voice Cloning section as well. This section contains valuable information about creating voice clones, even though the requirements for these two technologies are slightly different.
The second factor to consider is that the voice you select will have a tremendous effect on the output. Not only, as mentioned in the first factor, is the quality and consistency of the samples used to create that specific clone extremely important, but also the language and tonality of the voice.
If you want a voice that sounds happy and cheerful, you should use a voice that has been cloned using happy and cheerful samples. Conversely, if you desire a voice that sounds introspective and brooding, you should select a voice with those characteristics.
However, it is also crucial to use a voice that has been trained in the correct language. For example, all of the professional voice clones we offer as default voices are English voices and have been trained on English samples. Therefore, if you have them speak other languages, their performance in those languages can be unpredictable. It is essential to use a voice that has been cloned from samples where the voice was speaking the language you want the AI to then speak.
This may seem slightly trivial, but it can make a big difference. The AI tries to understand how to read something based on the context of the text itself, which means not only the words used but also how they are put together, how punctuation is applied, the grammar, and the general formatting of the text.
This can have a small but impactful influence on the AI's delivery. If you were to misspell a word, the AI won't correct it and will try to read it as written.
The AI is nondeterministic, meaning that even with the same initial conditions (voice, settings, model), it will give you slightly different output each time, similar to how a voice actor delivers a slightly different performance with each take.
This variability can be due to various factors, such as the options mentioned earlier: voice, settings, model. Generally, the breadth of that variability can be controlled by the stability slider. A lower stability setting means a wider range of variability between generations, but it also introduces intra-generational variability, where the AI can be a bit more performative.
A wider variability can often be desirable, as setting the stability too high can make certain voices sound monotone because the AI is not given the leeway to generate more variable content. However, setting the stability too low can also introduce other issues where the generations become unstable, especially with certain voices that might have been created from less-than-ideal audio.
The default setting of 50 is generally a great starting point for most applications.
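To see how these pieces - voice, model, and settings - come together in practice, here is a minimal sketch of a text-to-speech request via the API. The endpoint path, header, and field names follow the public API reference but should be treated as assumptions to verify there; the voice ID used is George from the default voices list later in this documentation.

```python
import requests

API_KEY = "YOUR_XI_API_KEY"          # assumption: replace with your own API key
VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"    # George, from the default voices table

# Minimal text-to-speech request: text in, audio bytes out.
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "Hello! This is a quick test of Text to Speech.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
response.raise_for_status()

# The response body is the generated audio (MP3 by default).
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

As on the website, each request deducts credits from your quota, so test with short snippets while experimenting.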
# Voice Selection
We offer many types of voices, including Default Voices that have been specifically curated to be of the highest quality; completely synthetic voices created using our Voice Design tool; and the option to create your own collection of cloned voices using our two technologies: Instant Voice Clones and Professional Voice Clones. You can browse through our voice library to find the perfect voice for your production.
Not all voices are equal, and a lot depends on the source audio used to create that voice. Some voices will perform better than others, while some will be more stable than others. Additionally, certain voices will be more easily cloned by the AI than others, and some voices may work better with one model and one language compared to another. All of these factors are important to consider when selecting your voice.
## Default Voices
Default voices are a curated set of voices optimized for core use cases and available to all ElevenLabs users by default. They are designed to ensure long-term availability, consistent quality, and priority support for new model developments.
These voices are crafted through multi-year partnerships with voice actors, making them reliable for extended use.
## Clone Voices
Cloned voices are created using either Instant Voice Cloning or Professional Voice Cloning.
* **Instant Voice Cloning** allows you to clone a voice using short audio samples, providing quick results but with less fidelity. This method is suitable for creating a basic clone without extensive training.
* **Professional Voice Cloning** involves training a model on larger datasets of a specific speaker's voice, resulting in a more accurate and realistic clone. This method is available for users on the Creator plan or higher and requires more time and resources.
Cloned voices are private and not shared publicly unless specifically whitelisted or shared through the Voice Library.
## Synthetic Voices
Synthetic voices are generated by AI using the Voice Design tool. They are created from text prompts and offer flexibility in attributes like gender, age, and accent. These voices are not based on real human voices and can be used to fill gaps when specific voices aren't available in the Voice Library. Synthetic voices cannot be shared with others and are available to all users for creating unique voice outputs.
## Voice Library
The Voice Library is a marketplace where the community can share their Professional Voice Clones for others to use. It offers a wide variety of voices contributed by users, allowing you to explore and utilize different voice options. You can search for voices using filters like language, accent, and more to find the ideal voice for your needs.
# Voice Settings
Our users have found different workflows that work for them. The one you'll see most often is setting stability around 50 and similarity near 75, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.
It's important to note that the AI is non-deterministic; setting the sliders to specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation. Setting stability low means a wider range of randomization, often resulting in a more emotive performance, but this is also highly dependent on the voice itself.
For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Since it's more consistent and stable, you usually don't need to do as many generations to get what you are looking for. Experiment to find what works best for you!
## Stability
The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. As mentioned before, this is also influenced heavily by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.
## Similarity
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artifacts or background noise when trying to mimic the voice if those were present in the original recording.
## Style Exaggeration
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume additional computational resources and might increase latency if set to anything other than 0. It's important to note that using this setting has been shown to make the model slightly less stable, as it strives to emphasize and imitate the style of the original voice.
In general, we recommend keeping this setting at 0 at all times.
## Speaker Boost
This is another setting that was introduced in the new models. The setting itself is quite self-explanatory – it boosts the similarity to the original speaker. However, using this setting requires a slightly higher computational load, which in turn increases latency. The differences introduced by this setting are generally rather subtle.
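If you set these values through the API rather than the sliders, they map onto a `voice_settings` object with values between 0 and 1. The field names below are taken from the public API reference and should be checked there; this is just a sketch of how the four settings line up.

```python
# Sketch: the four Voice Settings as an API-style voice_settings object.
# Slider percentages map to values between 0 and 1.
voice_settings = {
    "stability": 0.5,           # 50 on the slider - the usual starting point
    "similarity_boost": 0.75,   # 75 on the slider
    "style": 0.0,               # style exaggeration; 0 is recommended for stability and latency
    "use_speaker_boost": True,  # subtle similarity boost at the cost of slightly higher latency
}
```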
# Model Selection
As of December 2024, ElevenLabs offers two families of models: standard (high-quality) models and Flash models, which are optimized for low latency. Each family includes both English-only and multilingual models, tailored for specific use cases with strengths in either speed, accuracy, or language diversity.
* **Standard models** (Multilingual v2, Multilingual v1, English v1) are optimized for quality and accuracy, ideal for content creation. These models offer the best quality and stability but have higher latency.
* **Flash models** (v2.5 Flash, v2 Flash) are designed for low-latency applications like real-time conversational AI. They deliver great performance with faster processing speeds, though with a slight trade-off in accuracy and stability.
If you want to find more detailed specifications about which languages each model offers, you can find all that information in our help article [here](https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them-).
For advice on how to deal with issues that might arise, please see our guide to [troubleshooting](/docs/product/troubleshooting/overview).
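If you are deciding between models programmatically, the models endpoint returns each model's ID along with its supported languages. This is a sketch; the exact endpoint and response fields are assumptions based on the API reference and may differ.

```python
import requests

API_KEY = "YOUR_XI_API_KEY"  # assumption: replace with your own API key

# Sketch: list available models and their language counts to help choose a model_id.
resp = requests.get(
    "https://api.elevenlabs.io/v1/models",
    headers={"xi-api-key": API_KEY},
)
resp.raise_for_status()

for model in resp.json():
    languages = [lang.get("name") for lang in model.get("languages", [])]
    print(f"{model.get('model_id')}: {len(languages)} languages")
```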
## **Standard Models**
**Eleven Multilingual v2**
Our most advanced speech synthesis model, Multilingual v2, offers high stability, diverse language support, and exceptional accuracy in 29 languages. While slower than the Flash models, it delivers more lifelike speech, making it ideal for content creation such as voiceovers, audiobooks, and post-production.
* English (UK)
* English (USA)
* English (Australia)
* English (Canada)
* Japanese
* Chinese
* German
* Hindi
* French (France)
* French (Canada)
* Korean
* Portuguese (Brazil)
* Portuguese (Portugal)
* Italian
* Spanish (Spain)
* Spanish (Mexico)
* Indonesian
* Dutch
* Turkish
* Filipino
* Polish
* Swedish
* Bulgarian
* Romanian
* Arabic (Saudi Arabia)
* Arabic (UAE)
* Czech
* Greek
* Finnish
* Croatian
* Malay
* Slovak
* Danish
* Tamil
* Ukrainian
* Russian
**Important notes**: The accuracy of this model depends heavily on the quality of the input samples. Lower-quality samples can introduce errors, which the AI might attempt to replicate. For the best results, use high-quality, consistent voice samples, especially when trying to preserve accents or tonal details across languages.
* Best quality
* Unparalleled accuracy
* More stable
* Higher latency
**Eleven English v1**
Our very first model, English v1, laid the groundwork for future advancements. While still functional, it is now outclassed by Multilingual v2 (for content creation) and Flash v2 (for low-latency applications). We recommend using our newer models for better quality and speed.
**Eleven Multilingual v1**
Multilingual v1 was our first attempt at generating speech in multiple languages, but it is now considered experimental and has been surpassed by Multilingual v2 and Flash v2.5. We recommend avoiding this model for production use due to its limitations and lower accuracy.
* English (USA)
* English (UK)
* English (Australia)
* English (Canada)
* German
* Polish
* Spanish (Spain)
* Spanish (Mexico)
* Italian
* French (France)
* French (Canada)
* Portuguese (Portugal)
* Portuguese (Brazil)
* Hindi
## **Flash Models**
**Eleven v2.5 Flash**
v2.5 Flash generates speech in 32 languages with low latency, optimized for real-time conversational AI use cases. This model is much faster than Multilingual v2 and now supports new languages such as Vietnamese, Hungarian, and Norwegian. It is best for developers requiring rapid, natural speech across multiple languages, but it lacks the stylistic range of Multilingual v2.
Model latency is as low as 75ms (excl. network), making it ideal for real-time interactions.
* Great quality
* High accuracy with Professional Voice Clones
* Slightly less stable
* Optimized for low latency
- English (USA)
- English (UK)
- English (Australia)
- English (Canada)
- Japanese
- Chinese
- German
- Hindi
- French (France)
- French (Canada)
- Korean
- Portuguese (Brazil)
- Portuguese (Portugal)
- Italian
- Spanish (Spain)
- Spanish (Mexico)
- Indonesian
- Dutch
- Turkish
- Filipino
- Polish
- Swedish
- Bulgarian
- Romanian
- Arabic (Saudi Arabia)
- Arabic (UAE)
- Czech
- Greek
- Finnish
- Croatian
- Malay
- Slovak
- Danish
- Tamil
- Ukrainian
- Russian
- Hungarian
- Norwegian
- Vietnamese
**Eleven Flash v2**
A low-latency, English-only model optimized for conversational applications. Flash v2 is similar in performance to Flash v2.5 but focused exclusively on English, making it ideal for English-only use cases where speed is critical.
* Great quality
* High accuracy with Professional Voice Clones
* Slightly less stable
* Optimized for low latency
- English (USA)
- English (UK)
- English (Australia)
- English (Canada)
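Because the Flash models are built for low latency, they are typically paired with the streaming variant of the text-to-speech endpoint so playback can begin while audio is still being generated. The sketch below assumes the `/stream` endpoint and the `eleven_flash_v2_5` model ID; verify both against the API reference.

```python
import requests

API_KEY = "YOUR_XI_API_KEY"          # assumption: replace with your own API key
VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"    # George, from the default voices table

# Sketch: stream audio chunks from a Flash model for low-latency use cases.
with requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
    headers={"xi-api-key": API_KEY},
    json={"text": "Hello! How can I help you today?", "model_id": "eleven_flash_v2_5"},
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("reply.mp3", "wb") as f:
        for chunk in resp.iter_content(chunk_size=4096):
            f.write(chunk)
```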
# Prompting
> Effective techniques to guide ElevenLabs AI in adding pauses, conveying emotions, and pacing the speech.
## Pause
There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent way is programmatically, using the syntax `<break time="1.5s" />`. This will create an exact and natural pause in the speech. It is not just added silence between words; the AI has an actual understanding of this syntax and will add a natural pause.
An example could look like this:
```
"Give me one second to think about it." "Yes, that would work."
```
Break time should be described in seconds; the AI can handle pauses of up to 3 seconds in length. Break tags can be used in Speech Synthesis and via the API, but they are not yet available in Projects.
However, since this is more than just inserted silence, how the AI handles these pauses can vary. As usual, the voice used plays a pivotal role in the output. Some voices, for example, voices trained on data with "uh"s and "ah"s in them, have been shown to sometimes insert those vocal mannerisms during the pauses like a real speaker might. This is more prone to happen if you add a break tag at the very start or very end of your text.
Please avoid using an excessive number of break tags, as that has been shown to potentially cause some instability in the AI. The speech of the AI might start speeding up and become very fast, or it might introduce more noise in the audio and a few other strange artifacts. We are working on resolving this.
### Alternatives
These options are inconsistent and might not always work. We recommend using the syntax above for consistency.
One trick that seems to provide the most consistent output - aside from the syntax above - is a simple dash `-` or the em-dash `—`. You can even add multiple dashes such as `-- --` for a longer pause.
```
"It - is - getting late."
```
Ellipsis `...` can sometimes also work to add a pause between words but usually also adds some "hesitation" or "nervousness" to the voice that might not always fit.
```
"I... yeah, I guess so..."
```
## Pronunciation
This feature is currently only supported by the "Eleven English V1" and "Eleven Turbo V2" models.
In certain instances, you may want the model to pronounce a word, name, or phrase in a specific way. Pronunciation can be specified using standardized pronunciation alphabets. Currently, we support the International Phonetic Alphabet (IPA) and the CMU Arpabet. Pronunciations are specified by wrapping words in the Speech Synthesis Markup Language (SSML) phoneme tag.
To use this feature, wrap the desired word or phrase in a `<phoneme alphabet="ipa" ph="your-IPA-pronunciation-here">word</phoneme>` tag for IPA, or a `<phoneme alphabet="cmu-arpabet" ph="your-CMU-pronunciation-here">word</phoneme>` tag for CMU Arpabet. Replace `"your-IPA-pronunciation-here"` or `"your-CMU-pronunciation-here"` with the desired IPA or CMU Arpabet pronunciation.
An example for IPA:
```
<phoneme alphabet="ipa" ph="ˈæktʃuəli">actually</phoneme>
```
An example for CMU Arpabet:
```
<phoneme alphabet="cmu-arpabet" ph="AE K CH UW AH L IY">actually</phoneme>
```
It is important to note that this only works per word. Meaning that if you, for example, have a name with a first and last name that you want to be pronounced a certain way, you will have to create the pronunciation for each word individually.
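For example, a hypothetical two-word name needs one phoneme tag per word; the CMU Arpabet transcriptions below are illustrative only.

```
<phoneme alphabet="cmu-arpabet" ph="AE1 N AH0">Anna</phoneme> <phoneme alphabet="cmu-arpabet" ph="K L AA1 R K">Clark</phoneme>
```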
English is a lexical stress language, which means that within multi-syllable words, some syllables are emphasized more than others. The relative salience of each syllable is crucial for proper pronunciation and meaning distinctions. So, it is very important to remember to include the lexical stress when writing both IPA and ARPAbet as otherwise, the outcome might not be optimal.
Take the word "talon", for example.
Incorrect:
```
<phoneme alphabet="cmu-arpabet" ph="T AE L AH N">talon</phoneme>
```
Correct:
```
<phoneme alphabet="cmu-arpabet" ph="T AE1 L AH0 N">talon</phoneme>
```
The first example might switch between putting the primary emphasis on AE and AH, while the second example will always be pronounced reliably with the emphasis on AE and no stress on AH.
If you write it as:
```
<phoneme alphabet="cmu-arpabet" ph="T AE0 L AH1 N">talon</phoneme>
```
It will always put emphasis on AH instead of AE.
With the current implementation, we recommend using the CMU ARPAbet as it seems to be a bit more consistent and predictable with the current iteration of AI models. Some people get excellent results with IPA, but we have noticed that ARPAbet seems to work better with the current AI and be more consistent for a lot of users. However, we are working on improving this.
## Emotion
At the moment, there is no clear way to infuse emotion into the generated speech based on prompts or anything else. However, we are working hard on this technology, and we are all building something that has not been done before.
For the time being, the suggestions below work, and they work especially well with the English V1 model; unfortunately, they work less well with other models. The AI will read the whole text that you provide, just like an audiobook, but it will try to infuse some emotion based on the text that you give it.
If you want the AI to express a specific emotion, the best approach is to write in a style similar to that of a book. To find good prompts to use, you can flip through some books and identify words and phrases that convey the desired emotion.
For instance, you can use dialogue tags to express emotions, such as `he said, confused`, or `he shouted angrily`. These types of prompts will help the AI understand the desired emotional tone and try to generate a voiceover that accurately reflects it. With this approach, you can create highly customized voiceovers that are perfect for a variety of applications.
```
"Are you sure about that?" he said, confused.
"Don’t test me!" he shouted angrily.
```
Since the AI will read exactly what you give it, you will also have to remove the dialogue tag from the audio afterwards, for example in an audio editor. The AI can also sometimes infer the intended emotion from the text’s context, even without the use of tags.
```
"That is funny!"
"You think so?"
```
This is not always perfect, since you are relying on the AI's discretion to understand, purely from the context of the text, whether something is sarcastic, funny, or plainly spoken.
## Pacing
Based on varying user feedback and test results, it's been theorized that using a singular long sample for voice cloning has brought more success for some, compared to using multiple smaller samples. The current theory is that the AI stitches these samples together without any separation, causing pacing issues and faster speech. This is likely why some people have reported fast-talking clones.
To control the pacing of the speaker, you can use the same approach as in emotion, where you write in a style similar to that of a book. While it's not a perfect solution, it can help improve the pacing and ensure that the AI generates a voiceover at the right speed. With this technique, you can create high-quality voiceovers that are both customized and easy to listen to.
```
"I wish you were right, I truly do, but you're not," he said slowly.
```
# Overview
> A guide on using our voice changer tool for the most natural-sounding speech-to-speech conversion
Voice changer (previously Speech-to-Speech) allows you to convert one voice (source voice) into another (cloned voice) while preserving the tone and delivery of the original voice.
The possibilities are endless! Voice changer can be used to complement Text-to-Speech (TTS) by fixing pronunciation errors or infusing that special performance you've been wanting to exude. Voice changer is especially useful for emulating those subtle, idiosyncratic characteristics of the voice that give a more emotive and human feel. Some key features include:
* **Greater accuracy with whispering**
* **The ability to create audible sighs, laughs, or cries**
* **Greatly improved detection of tone and emotion**
* **Accurately follows the input speaking cadence**
* **Language/accent retention**
**Source audio (Brian):**
**Output audio (Lily):**
## Record or Upload
Audio can be uploaded either directly as an audio file or spoken live through a microphone. The audio file must be less than 50 MB in size, and neither the audio file nor your live recording can exceed 5 minutes in length. This is consistent across all subscription tiers and is meant to ensure a stable output. If you have material longer than 5 minutes, we recommend breaking it up into smaller sections and generating them separately. Additionally, if your file size is too large, you may need to compress/convert it to an MP3.
To upload, either click the "Upload Audio" button in the audio box, or drag and drop your audio file directly onto it.
To record, first press the "Record Audio" button in the audio box, and then once you are ready to begin recording, press the Microphone button to start. After you're finished recording, press the "Stop" button.
You will then see the audio file of this recording, which you can play back - this is helpful for determining whether you are happy with your performance/recording, or whether there is background noise that may inhibit the AI's ability to produce a clean output. The character cost will be displayed in the bottom-left corner, and you will not be charged this quota for recording anything - only when you press "Generate". **The cost for a voice changer generation is solely duration-based, at 1000 characters per minute.**
If you need to re-do the recording, simply press the trash icon to remove it and start over. When you're happy with your recording, you can select any voice or model you prefer, and you do not need to re-record the input audio.
## Models
Voice changer is now available for all 29 languages currently supported by the Multilingual v2 model. The English v2 model is also available specifically for English speech, but the Multilingual v2 model generally performs better, even for English audio.
The settings for each model are consistent with the settings for the corresponding text-to-speech models. If the input audio is very expressive and energetic with lots of dynamic range, it's best to keep Style all the way down to 0% and Stability all the way up to 100% - we don't want to inhibit the performance with the AI's interpretation, so this will give the most consistent and stable results.
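If you prefer to run voice changer programmatically, the request is a multipart upload of your source audio to the speech-to-speech endpoint. The endpoint path, form field names, and the `eleven_multilingual_sts_v2` model ID below are assumptions based on the API reference; double-check them there.

```python
import json
import requests

API_KEY = "YOUR_XI_API_KEY"          # assumption: replace with your own API key
VOICE_ID = "pFZP5JQG7iQjIQuC4Bku"    # Lily, from the default voices table

# Sketch: convert a recorded performance into the selected voice.
with open("my_recording.mp3", "rb") as audio_file:
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        files={"audio": audio_file},
        data={
            "model_id": "eleven_multilingual_sts_v2",
            # For very expressive input: Style at 0 and Stability high, per the guidance above.
            "voice_settings": json.dumps(
                {"stability": 1.0, "similarity_boost": 0.75, "style": 0.0}
            ),
        },
    )
resp.raise_for_status()

with open("converted.mp3", "wb") as f:
    f.write(resp.content)
```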
## Other Tips and Tricks
Voice changer is exceptional at **preserving accents** and **natural speech cadences** across the different output voices you choose. For example, if you upload an audio sample of a voice speaking native Portuguese, your output voices will adopt that same language and accent. Again, the input sample is the most important factor, as this is the data that voice changer works with. If a British voice is chosen (let's take our default voice "George" as an example), but your recorded voice has an American accent, the final output will be George's voice with an American accent.
When recording your voice, ensure that the input gain of your microphone is suitable. A quiet recording may make it more difficult for the AI to pick up what is being said, while a louder recording could produce audio clipping which is also undesirable. Additionally, try your best to **prevent background noise** from being present in the recording, as the AI will pick up everything, and it may try to "voice" any miscellaneous noises that it hears.
**Optional:** If you're recording in a noisy environment, you may want to use our [Voice Isolator tool](https://elevenlabs.io/app/voice-isolator) on the recording. You can then add the edited audio file to voice changer as an upload.
Be expressive! Whether you're shouting, crying, laughing, or anything in between, voice changer will copy that performance to a tee. We're constantly striving to increase the realism of AI through many different features, and voice changer is our most useful tool in this regard. You can get really creative here!
# Overview
> Get the most out of our Sound Effects Generator tool and learn how to create everything from blockbuster sound design for films to everyday sounds for your video game.
It is said that audio is more important than visuals. Most people can accept bad visuals, but most can't stand bad audio. Audio also evokes emotions and sets moods for your audience; it can be subtle, or it can be bombastic. Depending on the type of sounds and music that you use in your production, it can completely change the emotional context and meaning behind what you are trying to convey.
However, sometimes it's quite difficult to find that perfect sound. But it has now become much easier with ElevenLabs, as our sound effects generator allows you to generate any sound imaginable by inputting a prompt, streamlining the process tremendously. Of course, this is not only a great tool for independent filmmakers or indie game developers. It is also a fantastic resource for big productions, sound designers, and producers because you can generate such a vast array of sounds.
We will go through some of them here in this documentation. Keep in mind that this is just scratching the surface. While the feature might seem simple at first glance, the understanding that the AI has of natural language, combined with the type of sound effects it can generate, opens up infinite possibilities.
The general layout for sound effects is fairly straightforward. You have a window where you will input a prompt, some settings, and a generate button. When you first open the web page, you will have a few suggestions below the text box to showcase what some of the prompts might look like that you can easily try out.
Each time you press generate, the AI will create multiple variations of the prompt you've given. The cost for using the sound effects generator is based on the length of the generated audio. If you let the AI decide the audio length itself, the cost is 200 characters per generation. If you set the duration yourself, the cost is 40 characters per second.
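In other words, letting the AI pick the length always costs a flat 200 characters, while setting the duration yourself costs 40 characters per second of requested audio. A quick sketch of that arithmetic:

```python
from typing import Optional

def sound_effect_cost(duration_seconds: Optional[float]) -> int:
    """Character cost per sound effect generation, as described above:
    a flat 200 if the AI chooses the length, otherwise 40 per second."""
    if duration_seconds is None:          # automatic duration
        return 200
    return int(40 * duration_seconds)     # user-set duration

print(sound_effect_cost(None))  # 200
print(sound_effect_cost(5))     # 200
print(sound_effect_cost(11))    # 440
```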
## Prompting
A prompt is a piece of text or instruction that communicates to the AI model what kind of response or output is expected. The prompt serves as a starting point or context for the AI to understand the user's intent and generate relevant and coherent output accordingly.
In this section, we will go through how to construct a good prompt as well as what a prompt entails. We will then categorize these prompts into simple prompts and complex prompts. In general, simple prompts instruct the AI to generate one sound, while complex prompts guide the AI to generate a series of sounds.
The AI understands both natural language, which will be explored further in complex prompts, and a lot of music terminology. Sound Effects currently works best when prompts are written in English.
### Simple Prompts
Simple prompts are just that: they are straightforward, one-sided prompts where we try to get the AI to generate a single sound effect. This could be, for example, "person walking on grass" or "glass breaking." These types of prompts will generate a single type of sound effect with a few variations within the same generation or in subsequent generations. All in all, they are fairly simple.
However, there are ways to improve these prompts by adding a little more detail. Even if they are simple prompts, they can yield better output by enhancing the prompt itself. For example, something that sometimes works is adding details like "high-quality, professionally recorded footsteps on grass, sound effects foley." It may require some experimentation to find a good balance between being descriptive and keeping it brief enough for the AI to understand the prompt.
> Opening a creaking door
> Chopping wood
These types of prompts generate a single type of sound, but they might produce multiple variations of that sound within the same audio file. The AI is quite prone to doing that even without additional prompting, especially for short sounds like chopping wood, and also since that is a continuous action.
### Complex Prompts
When referring to complex prompts, we don't mean the length or the adjectives or adverbs used in the prompts. Although those can increase the complexity of the prompt, when we say complex prompts, we mean prompts where you have multiple sound effects or a sequence of sound effects happening in a specific order and the AI being able to replicate this.
> A man walks through a hallway and then falls down some stairs
Let's take the prompt above as an example. The AI needs to understand both what a man walking through the hallway sounds like and what a man falling down some stairs sounds like. It needs to understand the sequence in which these two actions are supposed to occur based on how you wrote it and then combine these sounds to make both of them sound coherent and correct. This is what we mean by a complex prompt because it involves both an understanding of sound and an understanding of the natural language explaining what you want.
The AI can handle this type of prompt; ideally, the result for the example above will contain both sounds in the correct order.
However, in general, this is much more complicated for the AI to do because it is more complex. For the best results, we recommend generating individual sound effects and then combining them in an audio editor of your choice, much like you would with a real production where you have individual sound effects that are then combined.
> A woman is singing in a church. Then someone coughs.
## Settings
Once you've set your prompt and know what you want to generate, you can adjust the settings. Set how long you want the generated audio to be and how influential the prompt should be to the output.
There are just two settings:
**Duration:** Determine how long your generations should be. Depending on what you set this as, you can get quite different results. For example, if I write "kick drum" and set the length to 11 seconds, I might get a full drum loop with a kick drum in it, but that might not be what I want. On the other hand, if I set the length to 1 second, I might just get a one-shot with a single instance of a kick drum.
**Prompt Influence:** Slide the scale to make your generation perfectly adhere to your prompt or allow for a little creativity. This setting ranges from giving the AI more creativity in how it interprets the prompt to telling the AI to be more strict in following the exact prompt that you've given.
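Both settings map directly onto the sound generation endpoint if you work via the API. The `/v1/sound-generation` path and the `duration_seconds` and `prompt_influence` field names below are assumptions based on the API reference; verify them there.

```python
import requests

API_KEY = "YOUR_XI_API_KEY"  # assumption: replace with your own API key

# Sketch: generate a sound effect with an explicit duration and prompt influence.
resp = requests.post(
    "https://api.elevenlabs.io/v1/sound-generation",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "high-quality, wav, sound designed whoosh and braaam impact",
        "duration_seconds": 11,   # omit to let the AI decide the length
        "prompt_influence": 0.7,  # higher values stick more closely to the prompt
    },
)
resp.raise_for_status()

with open("whoosh_braam.mp3", "wb") as f:
    f.write(resp.content)
```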
## Sound Effects
Now that we are dealing with prompts, it is important to learn some terminology when it comes to audio to get the most out of the feature. You will have to prompt the AI with words and sentences in a way that it understands, and in this case, it understands both natural language and audio terminology.
There are many words that people working with audio know very well, and these are used in their daily vocabulary. However, for ordinary people, those words are completely foreign and might not mean anything. I will provide a short and very non-comprehensive list of some of the words you might want to test and that might be helpful to know.
**Foley:** The process of recreating and recording everyday sound effects like footsteps, movement, and object sounds in sync with the visuals of a film, TV show, or video game to enhance the audio quality and realism.
**Whoosh:** An effect that underscores movement, like a fist flying or a camera move. It's versatile and can range from fast, ghostly, slow-spinning, rhythmic, noisy, to tense.
**Impact:** The sound of an object making contact with another object or structure, like a book falling, a car crashing, or a mug shattering.
**Braam:** A big, brassy, cinematic hit that conveys something epic and grand is about to happen, commonly used in movie trailers.
**Glitch:** The sound of a malfunction, jittering, scratching, skipping, or moving erratically, used for transitions, logo reveals, or sci-fi soundscapes.
**Drone:** A continuous, textured sound that adds atmosphere and suspense, often used to underscore exploration or horror scenes.
Onomatopoeias like "oink," "meow," "roar," and "chirp" are also important sound effects that imitate natural sounds.
### Examples
> high-quality, wav, sound designed whoosh and braaam impact
In a case like this, it can sometimes be better to set the length instead of letting the AI decide. I know that I want a drawn-out "braaam" for this sound, so it will not be a very short sound. I will show you the results I get using both automatic and manual settings.
Duration set to 11 seconds:
Duration set to automatic:
> high-quality, wav, sound designed whoosh
> high-quality, wav, sound designed whoosh, aggressive
> high-quality, wav, sound designed whoosh, aggressive, futuristic, electronic
## Beyond Sound Effects
Even if the name of the feature is "Sound Effects," don't let that fool you. This is the perfect tool for sound designers, Foley artists, game developers, as well as producers and composers.
If you're a hip-hop producer looking for samples, whether new or more old school, and are tired of digging in crates or reusing the same overused samples that everyone else uses, this is the perfect tool for you. If you are an EDM producer looking for one-shots or other samples, it's perfect for you as well.
You can generate everything from individual one-shots to drum loops, instrumental loops, and unique new samples from big band sections and brass stabs—pretty much anything you can imagine.
I will go through a little bit of how to prompt this, but it involves a lot of trial and error to get what you want.
**Stem:** An individual track from a multitrack recording, such as isolated vocals, drums, or guitar.
**BPM:** Beats per minute, indicating the tempo of a piece of music.
**Key:** The scale in which a piece of music is set, such as C major or A minor.
**Loop:** A repeating section of sound material, commonly used in electronic music.
**Sample:** A portion of sound, typically a recording, used in musical compositions.
**One-shot:** A single, non-repeating sound or sample, often used in percussion.
These terms are, of course, just scratching the surface, as there are concepts such as synth pads, basslines, chord progressions, arpeggios, and many other musical terms that can be good to learn. However, the above can be a good starting point for generating musical material.
### Examples
You can create individual one-shot drum sounds.
> 90s hip-hop beat, drum loop sample
> Old-school funky bassline sample, stem, 88 BPM in F# minor
> Old-school funky brass section from an old vinyl sample, stem, 88 BPM in F# minor
> Old-school funky brass stabs from an old vinyl sample, stem, 88 BPM in F# minor
I do not remember the exact prompts for these, but they sounded good, so I wanted to include them. They showcase some other genres.
Then, you can, of course, take different types of samples and combine them to create full music. Anyone who's ever worked as a producer, especially those familiar with old-school hip-hop where sampling old tracks and editing is the essence of the genre, will find this a treasure trove of new samples to use.
A professional producer could create something amazing with these types of samples, and we are very excited to see what might be developed. Here is a quick demo to show what just a few generations and a couple of minutes of work can achieve.
This is the final product.
# Overview
> Discover all the voices ElevenLabs has to offer
## Voice Categories
There are 3 primary categories of voices on the ElevenLabs platform:
* **Default Voices**: available to all users on the Speech page, in the "Default" tab of My Voices, and via the API. They offer long-term availability, consistent quality for most use cases, and priority support for new models, and they are optimized for English.
* **Generated voices**: made using our Voice Design tool, which offers custom voice creation with gender, age, and accent options, including different English accents to choose from. It may require multiple attempts to find the desired voice.
* **Cloned voices**: made using our Instant Voice Cloning (IVC) or Professional Voice Cloning (PVC) products, including new ones you create and those added from the Voice Library.
# Default Voices
> A curated set of voices for our core use cases.
Default voices are a curated set of voices optimized for our core use cases and made available to all ElevenLabs users by default. They come with a few core guarantees:
* **Long-term availability**: we secure our default voices through multi-year partnerships with voice actors
* **Consistent quality**: our team carefully crafts and QCs our default voices to ensure they perform well across a range of use cases
* **Priority model support**: our default voices are the first to receive fine tunings for new models as they are released
**Note:** Default voices were previously referred to as "premade" voices. The latter term is still used when accessing default voices via the API, e.g. when filtering by `category == "premade"`. Please see the [voices API documentation](https://elevenlabs.io/docs/api-reference/get-voices) for more details.
## Using Default Voices
Unlike voices you add from the Voice Library or new Instant Voice Clones (IVCs) and Professional Voice Clones (PVCs) you create, default voices do not need to be added to your account - they are available to everyone by default.
There are three ways to find and use default voices:
* **My Voices**: Default voices can be found in My Voices, under either the "All" or the "Default" tab. Default voices do not take up any of your custom voice slots and cannot be removed from My Voices. You can use default voices directly from My Voices by clicking "Use", which opens Speech Synthesis with the voice already selected.
* **API**: Calls to the /voices endpoint will fetch all default voices in addition to voices added to My Voices. See the [voices API documentation](https://elevenlabs.io/docs/api-reference/get-voices) for more details.
* **Voice dropdown menu**: Default voices can be selected from the voice dropdown menu. In Speech Synthesis, this can be accessed by clicking the voice name in the bottom left-hand corner of the text-to-speech or voice changer screen.
You'll find our default voices under the "Default" heading, or alternatively, you can search for the name of the voice you want to use. To hear a sample of the voice, click the circular icon next to the voice name.
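Programmatically, default voices are returned by the voices endpoint with the "premade" category mentioned above, so you can separate them from your custom voices. A minimal sketch based on the voices API documentation linked earlier:

```python
import requests

API_KEY = "YOUR_XI_API_KEY"  # assumption: replace with your own API key

# Sketch: fetch all voices and keep only the default ("premade") ones.
resp = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    headers={"xi-api-key": API_KEY},
)
resp.raise_for_status()

for voice in resp.json()["voices"]:
    if voice.get("category") == "premade":
        print(voice["name"], voice["voice_id"])
```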
{" "}
## Current Default Voices
Below is a list of our current default voices, including metadata and sample audio. Please note that all of our current default voices have fine tunings for our **Flash/Turbo v2, Flash/Turbo v2.5, and Multilingual v2 models**, which means they are optimized for use with these models.
| name | voice\_id | gender | age | accent | description | use\_case | preview\_url |
| --------- | -------------------- | ---------- | ----------- | ------------- | ------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | female | middle-aged | British | confident | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/Xb7hH8MSUJpSbSDYk0k2/d10f7534-11f6-41fe-a012-2de1e482d336.mp3) |
| Aria | 9BWtsMINqrJLrRacOk9x | female | middle-aged | American | expressive | social media | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/9BWtsMINqrJLrRacOk9x/405766b8-1f4e-4d3c-aba1-6f25333823ec.mp3) |
| Bill | pqHfZKP75CvOlQylNhV4 | male | old | American | trustworthy | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/pqHfZKP75CvOlQylNhV4/d782b3ff-84ba-4029-848c-acf01285524d.mp3) |
| Brian | nPczCjzI2devNBz1zQrb | male | middle-aged | American | deep | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/nPczCjzI2devNBz1zQrb/2dd3e72c-4fd3-42f1-93ea-abc5d4e5aa1d.mp3) |
| Callum | N2lVS1w4EtoT3dr4eOWO | male | middle-aged | Transatlantic | intense | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/N2lVS1w4EtoT3dr4eOWO/ac833bd8-ffda-4938-9ebc-b0f99ca25481.mp3) |
| Charlie | IKne3meq5aSn9XLyUdCD | male | middle aged | Australian | natural | conversational | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/IKne3meq5aSn9XLyUdCD/102de6f2-22ed-43e0-a1f1-111fa75c5481.mp3) |
| Charlotte | XB0fDUnXU5powFXDhCwa | female | young | Swedish | seductive | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/XB0fDUnXU5powFXDhCwa/942356dc-f10d-4d89-bda5-4f8505ee038b.mp3) |
| Chris | iP95p4xoKVk53GoZ742B | male | middle-aged | American | casual | conversational | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/iP95p4xoKVk53GoZ742B/3f4bde72-cc48-40dd-829f-57fbf906f4d7.mp3) |
| Daniel | onwK4e9ZLuTAKqWW03F9 | male | middle-aged | British | authoritative | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/onwK4e9ZLuTAKqWW03F9/7eee0236-1a72-4b86-b303-5dcadc007ba9.mp3) |
| Eric | cjVigY5qzO86Huf0OWal | male | middle-aged | American | friendly | conversational | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/cjVigY5qzO86Huf0OWal/d098fda0-6456-4030-b3d8-63aa048c9070.mp3) |
| George | JBFqnCBsd6RMkjVDRZzb | male | middle aged | British | warm | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/JBFqnCBsd6RMkjVDRZzb/e6206d1a-0721-4787-aafb-06a6e705cac5.mp3) |
| Jessica | cgSgspJ2msm6clMCkdW9 | female | young | American | expressive | conversational | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/cgSgspJ2msm6clMCkdW9/56a97bf8-b69b-448f-846c-c3a11683d45a.mp3) |
| Laura | FGY2WhTYpPnrIDTdsKH5 | female | young | American | upbeat | social media | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/FGY2WhTYpPnrIDTdsKH5/67341759-ad08-41a5-be6e-de12fe448618.mp3) |
| Liam | TX3LPaxmHKxFdv7VOQHJ | male | young | American | articulate | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/TX3LPaxmHKxFdv7VOQHJ/63148076-6363-42db-aea8-31424308b92c.mp3) |
| Lily | pFZP5JQG7iQjIQuC4Bku | female | middle-aged | British | warm | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/pFZP5JQG7iQjIQuC4Bku/89b68b35-b3dd-4348-a84a-a3c13a3c2b30.mp3) |
| Matilda | XrExE9yKIg1WjnnlVkGX | female | middle-aged | American | friendly | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/XrExE9yKIg1WjnnlVkGX/b930e18d-6b4d-466e-bab2-0ae97c6d8535.mp3) |
| River | SAz9YHcvj6GT2YYXdXww | non-binary | middle-aged | American | confident | social media | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/SAz9YHcvj6GT2YYXdXww/e6c95f0b-2227-491a-b3d7-2249240decb7.mp3) |
| Roger | CwhRBWXzGAHq8TQ4Fs17 | male | middle-aged | American | confident | social media | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/CwhRBWXzGAHq8TQ4Fs17/58ee3ff5-f6f2-4628-93b8-e38eb31806b0.mp3) |
| Sarah | EXAVITQu4vr4xnSDxMaL | female | young | American | soft | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/EXAVITQu4vr4xnSDxMaL/01a3e33c-6e99-4ee7-8543-ff2216a32186.mp3) |
| Will | bIHbv24MWmeRgasZH58o | male | young | American | friendly | social media | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/bIHbv24MWmeRgasZH58o/8caf8f3d-ad29-4980-af41-53f20c72d7a4.mp3) |
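Any `voice_id` from the table above can be passed straight to the Text to Speech API. Below is a minimal sketch, assuming the Python `requests` library, an API key in an `ELEVENLABS_API_KEY` environment variable, and the Multilingual v2 model ID; the voice ID shown is Aria's from the table:

```python
# Minimal sketch: generate speech with a default voice by voice_id.
# Assumes `requests` and an API key in ELEVENLABS_API_KEY.
import os
import requests

VOICE_ID = "9BWtsMINqrJLrRacOk9x"  # Aria, from the table above

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Hello from a default voice!",
        "model_id": "eleven_multilingual_v2",  # assumed ID for Multilingual v2
    },
)
response.raise_for_status()

# The endpoint returns audio bytes (MP3 by default).
with open("output.mp3", "wb") as f:
    f.write(response.content)
```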
## How do Default Voices sound in my language?
* Our default voices can be used to generate audio in any of the [32 languages we support](https://elevenlabs.io/docs/api-reference/text-to-speech#supported-languages) by using them with one of our multilingual models (e.g. Multilingual v2, v2.5 Flash or Turbo v2.5).
* Some default voices may have unpredictable accents in other languages.
* We are working to provide a granular overview of how each default voice sounds in each of the languages we support and will update this page when this is ready.
## Legacy Voices
Below is a list of our legacy voices, which can be accessed in two ways:
* **UI**: Search for the name of the legacy voice you're looking for in any voice dropdown, or go to My Voices -> Default, and look for voices with "Legacy" in the name.
* **API**: To see legacy voices when calling the /voices endpoint, you need to set the `show_legacy` query parameter to `True`. Please see the [voices API documentation](https://elevenlabs.io/docs/api-reference/get-voices) for more details.
**Note:** Legacy voices will remain available for the foreseeable future, but they are less consistent than default voices and will not receive priority support for future model releases. For more information on Legacy voices, please see [What are Legacy voices?](https://help.elevenlabs.io/hc/en-us/articles/26928417254801-What-are-Legacy-voices)
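As a rough sketch of the API route (assuming `requests` and an API key in an `ELEVENLABS_API_KEY` environment variable), the `show_legacy` parameter is passed as a query string value:

```python
# Minimal sketch: include legacy voices in the /voices response.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    params={"show_legacy": "true"},  # legacy voices are hidden unless this is set
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
)
response.raise_for_status()

for voice in response.json()["voices"]:
    print(voice["name"], voice["voice_id"])
```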
| name | voice\_id | gender | age | accent | description | use\_case | preview\_url |
| -------- | -------------------- | ------ | ----------- | ---------------- | -------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Adam | pNInz6obpgDQGcFmaJgB | male | middle aged | american | deep | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/pNInz6obpgDQGcFmaJgB/d6905d7a-dd26-4187-bfff-1bd3a5ea7cac.mp3) |
| Antoni | ErXwobaYiN019PkySvjV | male | young | american | well-rounded | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/ErXwobaYiN019PkySvjV/2d5ab2a3-4578-470f-b797-6331e46a7d55.mp3) |
| Arnold | VR6AewLTigWG4xSOukaG | male | middle aged | american | crisp | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/VR6AewLTigWG4xSOukaG/49a22885-80d5-48e8-87a3-076fc9193d9a.mp3) |
| Clyde | 2EiwWnXFnvU5JabPnv8n | male | middle-aged | American | war veteran | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/2EiwWnXFnvU5JabPnv8n/65d80f52-703f-4cae-a91d-75d4e200ed02.mp3) |
| Dave | CYw3kZ02Hs0563khs1Fj | male | young | British | conversational | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/CYw3kZ02Hs0563khs1Fj/872cb056-45d3-419e-b5c6-de2b387a93a0.mp3) |
| Dorothy | ThT5KcBeYPX3keUQqHPh | female | young | British | pleasant | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/ThT5KcBeYPX3keUQqHPh/981f0855-6598-48d2-9f8f-b6d92fbbe3fc.mp3) |
| Drew | 29vD33N1CtxCmqQRPOHJ | male | middle-aged | American | well-rounded | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/29vD33N1CtxCmqQRPOHJ/b99fc51d-12d3-4312-b480-a8a45a7d51ef.mp3) |
| Emily | LcfcDJNUP1GQjkzn1xUU | female | young | American | calm | meditation | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/LcfcDJNUP1GQjkzn1xUU/e4b994b7-9713-4238-84f3-add8fccaaccd.mp3) |
| Ethan | g5CIjZEefAph4nQFvHAz | male | young | American | soft | ASMR | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/g5CIjZEefAph4nQFvHAz/26acfa99-fdec-43b8-b2ee-e49e75a3ac16.mp3) |
| Fin | D38z5RcWu1voky8WS1ja | male | old | Irish | sailor | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/D38z5RcWu1voky8WS1ja/a470ba64-1e72-46d9-ba9d-030c4155e2d2.mp3) |
| Freya | jsCqWAovK2LkecY7zXl4 | female | young | American | expressive | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/jsCqWAovK2LkecY7zXl4/8e1f5240-556e-4fd5-892c-25df9ea3b593.mp3) |
| George | Yko7PKHZNXotIFUBG7I9 | male | middle aged | british | | audiobook | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/Yko7PKHZNXotIFUBG7I9/02c66c93-a237-436f-8a7d-43e8c49bc6a3.mp3) |
| Gigi     | jBpfuIE2acCO8z3wKNLl | female | young       | American         | childish       | animation  | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/jBpfuIE2acCO8z3wKNLl/3a7e4339-78fa-404e-8d10-c3ef5587935b.mp3) |
| Giovanni | zcAOhNBS3c14rBihAFp1 | male | young | Italian | foreigner | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/zcAOhNBS3c14rBihAFp1/e7410f8f-4913-4cb8-8907-784abee5aff8.mp3) |
| Glinda | z9fAnlkpzviPz146aGWa | female | middle-aged | American | witch | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/z9fAnlkpzviPz146aGWa/cbc60443-7b61-4ebb-b8e1-5c03237ea01d.mp3) |
| Grace | oWAxZDx7w5VEj9dCyTzz | female | young | American (South) | pleasant | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/oWAxZDx7w5VEj9dCyTzz/84a36d1c-e182-41a8-8c55-dbdd15cd6e72.mp3) |
| Harry | SOYHLrjzK2X1ezoPC6cr | male | young | American | anxious | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/SOYHLrjzK2X1ezoPC6cr/86d178f6-f4b6-4e0e-85be-3de19f490794.mp3) |
| James | ZQe5CZNOzWyzPSCn5a3c | male | old | Australian | calm | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/ZQe5CZNOzWyzPSCn5a3c/35734112-7b72-48df-bc2f-64d5ab2f791b.mp3) |
| Jeremy | bVMeCyTHy58xNoL34h3p | male | young | Irish | excited | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/bVMeCyTHy58xNoL34h3p/66c47d58-26fd-4b30-8a06-07952116a72c.mp3) |
| Jessie | t0jbNlBVZ17f02VDIeMI | male | old | American | raspy | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/t0jbNlBVZ17f02VDIeMI/e26939e3-61a4-4872-a41d-33922cfbdcdc.mp3) |
| Joseph | Zlb1dXrM653N07WRdFW3 | male | middle-aged | British | articulate | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/Zlb1dXrM653N07WRdFW3/daa22039-8b09-4c65-b59f-c79c48646a72.mp3) |
| Josh | TxGEqnHWrfWFTfGW9XjX | male | young | american | deep | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/TxGEqnHWrfWFTfGW9XjX/47de9a7e-773a-42a8-b410-4aa90c581216.mp3) |
| Michael | flq6f7yk4E4fJM5XTYuZ | male | old | American | calm | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/flq6f7yk4E4fJM5XTYuZ/c6431a82-f7d2-4905-b8a4-a631960633d6.mp3) |
| Mimi | zrHiDhphv9ZnVXBqCLjz | female | young | Swedish | childish | animation | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/zrHiDhphv9ZnVXBqCLjz/decbf20b-0f57-4fac-985b-a4f0290ebfc4.mp3) |
| Nicole | piTKgcLEGmPE4e6mEKli | female | young | American | soft | ASMR | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/piTKgcLEGmPE4e6mEKli/c269a54a-e2bc-44d0-bb46-4ed2666d6340.mp3) |
| Patrick | ODq5zmih8GrVes37Dizd | male | middle-aged | American | shouty | characters | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/ODq5zmih8GrVes37Dizd/0ebec87a-2569-4976-9ea5-0170854411a9.mp3) |
| Paul | 5Q0t7uMcjvnagumLfvZi | male | middle-aged | American | authoritative | news | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/5Q0t7uMcjvnagumLfvZi/a4aaa30e-54c4-44a4-8e46-b9b00505d963.mp3) |
| Rachel | 21m00Tcm4TlvDq8ikWAM | female | young | american | calm | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/21m00Tcm4TlvDq8ikWAM/b4928a68-c03b-411f-8533-3d5c299fd451.mp3) |
| Sam | yoZ06aMxZJJ28mfd3POQ | male | young | american | raspy | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/yoZ06aMxZJJ28mfd3POQ/b017ad02-8d18-4456-ad92-55c85ecf6363.mp3) |
| Serena | pMsXgVXv3BLzUgSXRplE | female | middle-aged | American | pleasant | narration | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/pMsXgVXv3BLzUgSXRplE/d61f18ed-e5b0-4d0b-a33c-5c6e7e33b053.mp3) |
| Thomas | GBv7mTt0atIp3Br8iCZE | male | young | American | calm | meditation | [Sample](https://storage.googleapis.com/eleven-public-prod/premade/voices/GBv7mTt0atIp3Br8iCZE/98542988-5267-4148-9a9e-baa8c4f14644.mp3) |
# Voice Design
> Generate a unique voice from a text prompt.
Voice Design helps creators fill the gaps when the exact voice they are looking for isn’t available in the Voice Library. Now if you can’t find a suitable voice for your project, you can create one. Note that Voice Design is highly experimental and Professional Voice Clones are still the highest quality voices on our platform. If there is a PVC available in our library that fits your needs, we recommend using it.
You can find Voice Design by heading to Voices -> My Voices -> Add a new voice -> Voice Design.
When you hit generate, we'll generate three voice options for you. The only charge for using Voice Design is the number of credits needed to generate your preview text, and you are only charged once even though we generate three samples for you. You can see the number of characters that will be deducted in the "Text to preview" text box.
After generating, you'll have the option to select and save one of the generations, which will take up one of your voice slots.
## Voice Design Prompt Guide
### Voice Design Types
| Type | Description | Example Prompts |
| :--------------------- | :------------------------------------------------------------------------------------------------------------------------------ | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Realistic Voice Design | Create an original, realistic voice by specifying age, accent/nationality, gender, tone, pitch, intonation, speed, and emotion. | - "A young Indian female with a soft, high voice. Conversational, slow and calm." - "An old British male with a raspy, deep voice. Professional, relaxed and assertive." - "A middle-aged Australian female with a warm, low voice. Corporate, fast and happy." |
| Character Voice Design | Generate unique voices for creative characters using simpler prompts. | - "A massive evil ogre, troll" - "A sassy little squeaky mouse" - "An angry old pirate, shouting". Some other characters we've had success with include Goblin, Vampire, Elf, Troll, Werewolf, Ghost, Alien, Giant, Witch, Wizard, Zombie, Demon, Devil, Pirate, Genie, Ogre, Orc, Knight, Samurai, Banshee, Yeti, Druid, Robot, Monkey, Monster, Dracula |
### Voice Attributes
| Attribute | Importance | Options |
| :----------------- | :-------------- | :------------------------------------------------------------------ |
| Age | High Importance | Young, Teenage, Adult, Middle-Aged, Old, etc... |
| Accent/Nationality | High Importance | British, Indian, Polish, American, etc... |
| Gender | High Importance | Male, Female, Gender Neutral |
| Tone | Not Needed | Gruff, Soft, Warm, Raspy, etc... |
| Pitch | Not Needed | Deep, Low, High, Squeaky, etc... |
| Intonation | Not Needed | Conversational, Professional, Corporate, Urban, Posh, etc... |
| Speed | Not Needed | Fast, Quick, Slow, Relaxed, etc... |
| Emotion/Delivery | Not Needed | Angry, Calm, Scared, Happy, Assertive, Whispering, Shouting, etc... |
# Overview
> Learn more about My Voices
My Voices is your personal voice HQ. Here you can:
* Create new Instant Voice Clones (IVCs), Professional Voice Clones (PVCs), and generate new voices using [Voice Design.](https://elevenlabs.io/docs/voices/voice-lab/voice-design)
* View all of your custom voices, including those added from the Voice Library
## Search and filter
My Voices includes a search box, so you can easily find voices by searching for the name, words from the description, or tags.
You can sort voices either alphabetically or by most recently used, and you can filter by voice type. By default, all voice types are selected, but you can also filter for Professional Voice Clones, Instant Voice Clones or Generated Voices.
## Voice Categories
My Voices includes several tabs which allow you to filter your voices by type.
* **All**: All the voices currently saved in My Voices, including [Default](https://elevenlabs.io/docs/voices/default-voices) voices. Default voices cannot be deleted from My Voices, and do not take up any of your custom voice slots.
* **Personal**: Voices that you have created - Professional Voice Clones, Instant Voice Clones and voices generated using Voice Design.
* **Community**: Voices you have saved from the Voice Library.
* **Default**: All Default voices.
## Tags and Labels
You can give voices in My Voices (including those added from the Voice Library) custom names, descriptions, and tags. This allows you to organize My Voices as you wish and store custom attributes. However, please note that these changes will not reflect on shared versions of your voices. To edit shared names, descriptions, and labels, please set these values when sharing your voice in the Voice Library.
## Deleting voices
You can only delete voices that you have created or saved from the Voice Library. Default and Legacy voices cannot be deleted, but they don't take up any of your custom voice slots. Voices that you have created using Instant and Professional Voice Cloning and Voice Design, as well as voices you've saved from the Voice Library, use your custom voice slots.
You can free up voice slots by deleting these voices. To delete a voice, first click "View" to open the detailed view for the voice. Then click "Delete" in the bottom left corner. You can use the "Personal" and "Community" tabs to easily identify voices that use your custom voice slots, and can be deleted.
## Sharing voices
Only Professional Voice Clones can be shared with other users. Instant Voice Clones and voices created using Voice Design cannot be shared.
Professional Voice Clones can be shared privately, via a sharing link, or publicly via the Voice Library. For full details on how to share your Professional Voice Clone, please see [Sharing Voices.](https://elevenlabs.io/docs/voices/voice-library/sharing)
# Instant Voice Cloning
> Guide to getting the most out of your cloned voices.
Instant Voice Cloning (IVC) allows you to create voice clones from short samples near instantaneously. Creating an instant voice clone does not train or create a custom AI model. Instead, it relies on prior knowledge from training data to make an educated guess rather than training on the exact voice. This works extremely well for a lot of voices.
However, the biggest limitation of IVC arises when you are trying to clone a very unique voice with a very unique accent that the AI might not have encountered during training. In such cases, creating a custom model with explicit training using Professional Voice Cloning (PVC) might be the best option.
## Voice Creation
When cloning a voice, it's important to consider what the AI has been trained on: which languages and what type of dataset. In this case, you can find the languages for each model [here](https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them-), and the dataset is quite varied, especially for the `multilingual v2`. You can read more about each individual model [here](/docs/product/speech-synthesis/models) and their strengths.
As mentioned earlier, if the voice you try to clone falls outside of these parameters or outside of what the AI has heard during training, it might have a hard time replicating the voice perfectly using instant voice cloning.
How the audio was recorded is more important than the total length (total runtime) of the samples. The number of samples you use doesn't matter; it is the total combined length (total runtime) that is the important part.
Approximately 1-2 minutes of clear audio without any reverb, artifacts, or background noise of any kind appears to be the sweet spot. When we speak of "audio or recording quality," we do not mean the codec, such as MP3 or WAV; we mean how the audio was captured. However, regarding audio codecs, using MP3 at 128 kbps and above seems to work just fine, and higher bitrates don't seem to markedly improve the quality of the clone.
The AI will attempt to mimic everything it hears in the audio: the speed at which the person talks, the inflections, the accent and tonality, the breathing pattern and strength, and everything else, including background noise, mouth clicks, and other artifacts, which can confuse it.
Another important thing to keep in mind is that the AI will try to replicate the performance of the voice you provide. If you talk in a slow, monotone voice without much emotion, that is what the AI will mimic. On the other hand, if you talk quickly with much emotion, that is what the AI will try to replicate.
It is crucial that the voice remains consistent throughout all the samples, not only in tone but also in performance. If there is too much variance, it might confuse the AI, leading to more varied output between generations.
* The most important aspect to get a proper clone is the voice itself, the language and accent, and the quality of the recording.
* Audio length is less important than quality but still plays an important role up to a certain point. At a minimum, input audio should be 1 minute long. Avoid adding beyond 3 minutes; this will yield little improvement and can, in some cases, even be detrimental to the clone, making it more unstable.
* Keep the audio consistent. Ensure that the voice maintains a consistent tone throughout, with a consistent performance. Also, make sure that the audio quality of the voice remains consistent across all the samples. Even if you only use a single sample, ensure that it remains consistent throughout the full sample. Feeding the AI audio that is very dynamic, meaning wide fluctuations in pitch and volume, will yield less predictable results.
* Find a good balance for the volume so the audio is neither too quiet nor too loud. The ideal would be between -23 dB and -18 dB RMS with a true peak of -3 dB.
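As a rough way to sanity-check the volume guideline above, here is a sketch using the pydub library (an assumption, not a requirement; "sample.wav" is a placeholder path). Note that `dBFS` is average loudness relative to full scale, and `max_dBFS` is a sample peak rather than a true-peak measurement:

```python
# Rough sketch: check a sample's average level and peak against the guideline above.
# Assumes pydub (and ffmpeg for non-WAV input); "sample.wav" is a placeholder path.
from pydub import AudioSegment

audio = AudioSegment.from_file("sample.wav")

average_db = audio.dBFS   # average loudness in dBFS (roughly RMS)
peak_db = audio.max_dBFS  # sample peak in dBFS (approximates true peak)

print(f"average: {average_db:.1f} dBFS, peak: {peak_db:.1f} dBFS")
if not (-23 <= average_db <= -18) or peak_db > -3:
    print("Consider adjusting gain: aim for -23 to -18 dB RMS with peaks below -3 dB.")
```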
If you are unsure about what is permissible from a legal standpoint, please consult the [Terms of Service](https://elevenlabs.io/terms-of-use) and our [AI Safety information](https://elevenlabs.io/safety) for more information.
# Professional Voice Cloning
> Guide to getting the highest quality voice clone available.
The video is currently slightly outdated as we've released new features since
it was made, and the training time is significantly quicker. However, a lot of
the information in it is still relevant.
Professional Voice Cloning (PVC), unlike Instant Voice Cloning (IVC) which lets you clone voices with very short samples nearly instantaneously, allows you to train a hyper-realistic model of a voice. This is achieved by training a dedicated model on a large set of voice data to produce a model that’s indistinguishable from the original voice.
Since the custom models require fine-tuning and training, it will take a bit longer to train these Professional Voice Clones compared to the Instant Voice Clones. Giving an estimate is challenging as it depends on the number of people in the queue before you and a few other factors.
Here are the current estimates for Professional Voice Cloning:
* **English:** \~3 hours
* **Multilingual:** \~6 hours
## Voice Creation
There are a few things to be mindful of before you start uploading your samples, and some steps that you need to take to ensure the best possible results.
Firstly, Professional Voice Cloning is highly accurate in cloning the samples used for its training. It will create a near-perfect clone of what it hears, including all the intricacies and characteristics of that voice, but also including any artifacts and unwanted audio present in the samples. This means that if you upload low-quality samples with background noise, room reverb/echo, or any other type of unwanted sounds like music or multiple people speaking, the AI will try to replicate all of these elements in the clone as well.
Secondly, make sure there's only a single speaking voice throughout the audio, as more than one speaker or excessive noise or anything of the above can confuse the AI. This confusion can result in the AI being unable to discern which voice to clone or misinterpreting what the voice actually sounds like because it is being masked by other sounds, leading to a less-than-optimal clone.
Thirdly, make sure you have enough material to clone the voice properly. The bare minimum we recommend is 30 minutes of audio, but for the optimal result and the most accurate clone, we recommend closer to 3 hours of audio. You might be able to get away with less, but at that point, we can’t vouch for the quality of the resulting clone.
Fourthly, the speaking style in the samples you provide will be replicated in the output, so depending on what delivery you are looking for, the training data should correspond to that style (e.g. if you are looking to voice an audiobook with a clone of your voice, the audio you submit for training should be a recording of you reading a book in the tone of voice you want to use). It is better to include just one style in the uploaded samples for consistency's sake.
Lastly, it's best to use samples where you are speaking the language that the PVC will mainly be used for. Of course, the AI can speak any language that we currently support. However, it is worth noting that if the voice itself is not native to the language you want the AI to speak - meaning you cloned a voice speaking a different language - it might have an accent from the original language and might mispronounce words and inflections. For instance, if you clone a voice speaking English and then want it to speak Spanish, it will very likely have an English accent when speaking Spanish. We only support cloning samples recorded in one of our supported languages, and the application will reject your sample if it is recorded in an unsupported language.
For now, we only allow you to clone your own voice. You will be asked to go through a verification process before submitting your fine-tuning request.
* **Professional Recording Equipment:** Use high-quality recording equipment for optimal results as the AI will clone everything about the audio. High-quality input = high-quality output. Any microphone will work, but an XLR mic going into a dedicated audio interface would be our recommendation. A few general recommendations at the lower end would be something like an Audio-Technica AT2020 or a Rode NT1 going into a Focusrite interface or similar.
* **Use a Pop-Filter:** Use a Pop-Filter when recording. This will minimize plosives when recording.
* **Microphone Distance:** Position yourself at the right distance from the microphone - approximately two fists away from the mic is recommended, but it also depends on what type of recording you want.
* **Noise-Free Recording:** Ensure that the audio input doesn't have any interference, like background music or noise. The AI cloning works best with clean, uncluttered audio.
* **Room Acoustics:** Preferably, record in an acoustically-treated room. This reduces unwanted echoes and background noises, leading to clearer audio input for the AI. You can make something temporary using a thick duvet or quilt to dampen the recording space.
* **Audio Pre-processing:** Consider editing your audio beforehand if you're aiming for a specific sound you want the AI to output. For instance, if you want a polished podcast-like output, pre-process your audio to match that quality, and consider removing long pauses or frequent "uhm"s and "ahm"s between words, as the AI will mimic those as well.
* **Volume Control:** Maintain a consistent volume that's loud enough to be clear but not so loud that it causes distortion. The goal is to achieve a balanced and steady audio level. The ideal would be between -23dB and -18dB RMS with a true peak of -3dB.
* **Sufficient Audio Length:** Provide at least 30 minutes of high-quality audio that follows the above guidelines for best results - preferably closer to 3 hours of audio. The more quality data you can feed into the AI, the better the voice clone will be. The number of samples is irrelevant; the total runtime is what matters. However, if you plan to upload multiple hours of audio, it is better to split it into multiple \~30-minute samples. This makes it easier to upload.
* **Uploading:** After pressing upload, you will not be able to make any changes to the clone, and it will be locked in. Ensure that you have uploaded the correct samples that you want to use.
* **Verify Your Voice:** Once everything is recorded and uploaded, you will be asked to verify your voice. To ensure a smooth experience, please try to verify your voice using the same or similar equipment used to record the samples and in a tone and delivery that is similar to what was present in the samples. If you do not have access to the same equipment, try verifying the best you can. If it fails, you will have to reach out to support.
Keep in mind that all of this depends on the output you want. The AI will try to clone everything in the audio, but for the AI to work optimally and predictably, we suggest following the guidelines mentioned above.
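If you have one long recording, a quick way to split it into roughly 30-minute samples before uploading is sketched below. This assumes the pydub library and a placeholder input file; the chunk length is only the guideline mentioned above, not a hard requirement:

```python
# Minimal sketch: split a long recording into ~30-minute samples for upload.
# Assumes pydub (and ffmpeg for non-WAV input); "full_recording.wav" is a placeholder.
from pydub import AudioSegment

CHUNK_MS = 30 * 60 * 1000  # ~30 minutes per sample, per the guideline above

audio = AudioSegment.from_file("full_recording.wav")
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    chunk.export(f"sample_{i + 1:02d}.wav", format="wav")
```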
Once you've uploaded your samples, there are four stages of the cloning process that you might see on your voice card.
* **Verify:** This means that you have uploaded the voice samples, but you have not yet finished the verification step. You will need to finish this step before training can start.
* **Processing:** This means that the voice has been verified and is preprocessing, ready to be trained. When you've reached this step, the rest is automatic, and you will not need to do anything.
* **Fine-tuning:** This is when the voice is actually training. Along with this label, you will also see a loading bar to show you the progress.
* **Fine-tuned:** This means the voice has finished training and is ready to be used!
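If you prefer to check on training from a script rather than the voice card, a rough sketch is below. It assumes `requests`, an API key in an `ELEVENLABS_API_KEY` environment variable, and that the voice object returned by the API includes a `fine_tuning` field; the exact structure of that field is an assumption, so inspect the actual response for your voice:

```python
# Rough sketch: inspect a voice's fine-tuning metadata via the API.
# The structure of "fine_tuning" is an assumption; print it and inspect.
import os
import requests

VOICE_ID = "your-voice-id-here"  # placeholder

voice = requests.get(
    f"https://api.elevenlabs.io/v1/voices/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
).json()

print(voice.get("name"))
print(voice.get("fine_tuning"))  # training/fine-tuning status details, if present
```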
## Scripts
What you read is not very important; how you read it is very important, however. The AI will try to mimic everything it hears in a voice: the tonal quality, the accent, the inflection, and many other intricate details. It will replicate how you pronounce certain words, vowels, and consonants, but not the actual words themselves. So, it is better to choose a text or script that conveys the emotion you want to capture, and read in a tone of voice you want to use.
* [Audiobook](/docs/product/voices/voice-lab/scripts/the-great-gatsby)
* [News Article](/docs/product/voices/voice-lab/scripts/news-article)
* [Social Media](/docs/product/voices/voice-lab/scripts/social-media)
* [Meditation](/docs/product/voices/voice-lab/scripts/meditation)
* [Elearning](/docs/product/voices/voice-lab/scripts/elearning)
# Overview
> Discover AI voices from the ElevenLabs community
The [Voice Library](https://elevenlabs.io/voice-library) (VL) is a marketplace where our community can share voices and earn rewards when they're used. At the moment, only Professional Voice Clones (PVCs) can be shared in the library. Instant Voice Clones (IVCs) cannot be shared for safety reasons.
## Using voices from the Voice Library
You can play a sample for any voice in the Voice Library by clicking it.
To use a voice from the Voice Library, you first need to add it to My Voices. To do this, click "Add". This will save it to My Voices using the default name for the voice. Alternatively, you can use it directly from the Voice Library by clicking "Use", which will open Speech Synthesis with the voice selected.
Once the voice has been added to My Voices, it will appear in the voice selection menu across all features.
## Details view
You can find out more information about a voice by clicking "View". This opens up a pane on the right which contains more information. Here you can see all the tags associated with the voice, including:
* the language it was trained on
* the age and gender of the voice
* the category, for example, "Conversational"
* how long the notice period is, if the voice has one
* if the voice has been labelled as High Quality
* what type of voice it is, for example, Professional Voice Clone
You can also see how many users have saved the voice to My Voices, and how many characters of audio have been generated with the voice.
Finally, you can also see suggestions of similar voices, and can play samples and add these to My Voices if you want.
### Category
Some labels tell you about the type of voice:

* **Voice Design**: Generated voices made using **[Voice Design](/docs/product/voices/voice-lab/voice-design)**. Voice Design voices are no longer shareable in the Voice Library; however, legacy shared voices will remain accessible.
* **Professional**: Voices made using **[Professional Voice Cloning](/docs/product/voices/voice-lab/professional-voice-cloning)**.
* **HQ**: The HQ label stands for High Quality, and indicates that the Professional Voice Clone has been trained on audio which follows our **[Professional Recording Guidelines](/docs/product/voices/voice-lab/professional-voice-cloning)** and has passed a quality control check on input texts of various lengths.
### Sharing Options
Other labels tell you about options the voice owner set when sharing the voice. Please see the **[Sharing](/docs/product/voices/voice-library/sharing)** page for more details.

* **Notice Period**: A label with a clock icon indicates that the voice has a Notice Period in place. The Notice Period lets you know how long you'll continue to have access to the voice if the voice owner decides to remove it from the Voice Library.
* **Credit multiplier**: Some voices have a credit multiplier in place. This is shown by a label displaying, for example, x2 multiplier or x3 multiplier. This means that the voice owner has set a custom rate for use of their voice. Please pay close attention, as credit multipliers mean your account will be deducted more than 1x the number of credits when you generate using a voice that has a credit multiplier.
* **Live Moderation**: Some voices have "Live Moderation" enabled. This is indicated with a label with a shield icon. When you generate using a voice with Live Moderation enabled, we use tools to check whether the text being generated belongs to a number of prohibited categories. This may introduce extra latency when using the voice, and voices with Live Moderation enabled cannot be used in Projects.
## Filters, Sorting, and Search
To help you find the perfect voice for you, the Voice Library is searchable and filterable.
### Search box
You can use the search box to search by name, keyword and voice ID. You can also search by dragging and dropping an audio file, or uploading a file by clicking the upload icon. This will return the voice used, if it can be found, along with similar voices.
### Sort by
You have a number of options for sorting voices in the Voice Library:
* Trending: voices are ranked by our trending algorithm
* Latest: newest voices are shown first
* Most users
* Most characters generated
### Language filter
The language filter allows you to return only voices that have been trained on audio in a specific language. While all voices are compatible with our multilingual models and can therefore be used with all 32 languages we support, voices labelled with a specific language should perform well for content in that language.
### Accent filter
If you select a specific language, the Accent filter will also become available. This allows you to look for voices with specific accents.
### More filters
Click the "More filters" button to access additional filters.
#### Category
* Voice Design
* Professional
* High-Quality
#### Gender
* Male
* Female
* Neutral
#### Age
* Young
* Middle Aged
* Old
#### Use case
You can click the use case of your choice to show only voices that have been labelled with this use case.
# Sharing Voices
> Learn how to share voices in the Voice Library.
## How to share a voice model in the Voice Library:
**1. Share Button:** To get started with sharing a voice model, find the voice model you want to share in My Voices and click the share icon:
**2. Sharing Toggle:** Next, activate sharing by enabling the "Sharing" toggle. Note that this doesn’t make your voice model automatically discoverable in the Voice Library.
**3. Sharing Link/Email Whitelist:** Once the "Sharing" toggle is enabled, you have a few ways to share your Voice Model:
* **Sharing Link:** share this link with your audience, your friends, or anyone else that you want to be able to make a copy of your voice model in My Voices.
* **Email Whitelist:** you can whitelist specific email addresses to restrict who can make copies of your voice model in My Voices using your Sharing Link. If you leave the whitelist blank, all emails will be enabled by default.
* **Discovery in Voice Library:** this makes your voice model discoverable in the Voice Library and takes you to the sharing dialog detailed in Step 4 below.
**4. Library Sharing Options:** if you enable "Discovery in Voice Library", you’ll be brought to a dialog screen where you can configure a few options for sharing your voice model in the Voice Library:
Please see the [Voice Library Addendum](https://elevenlabs.io/vla) to our Terms of Service for full descriptions of these options.
**5. Naming Guidelines:** Please ensure the name you give your voice model adheres to the guidelines shown in the sharing dialog:
* The naming pattern should be a one-word name followed by a 2-4 word description, separated by a hyphen (-).
* Your name should NOT include the following:
* Names of public individuals or entities (company names, band names, influencers or famous people, etc).
* Social handles (Twitter, Instagram, you name it, etc).
* ALL CAPS WORDS.
* Emojis and any other non-letter characters.
* Explicit or harmful words.
* The word “voice”.
* Some examples of names following our guidelines:
* Anna - calm and kind
* Robert - friendly grandpa
* Steve - wise teacher
* Harmony - soothing serenader
* Jasper - jovial storyteller
* Maya - confident narrator
**6. Scroll and accept terms:** Before sharing your voice model in the Voice Library, you’ll be asked to scroll and accept the [Voice Library Addendum](https://elevenlabs.io/terms#VLA) to our [Terms of Service](https://elevenlabs.io/terms) and provide additional consents and confirmations. Please do this carefully and ensure you fully understand our service before sharing. If you have any questions at this stage, you can reach out to us at [legal@elevenlabs.io](mailto:legal@elevenlabs.io).
Before you share your voice to the Voice Library, we have a few guidelines that need to be followed. These guidelines are in place to ensure better discoverability and to maintain a clean and organized appearance for everyone using the platform. Please take the time to read through the guidelines below. They will help you understand how you should name, categorize, and tag your voice to enhance the overall experience for users.
### Review
Once you’ve created, named, and shared your voice, it will be set for pending review. This means that someone from the ElevenLabs team will go through your voice to ensure that it adheres to the guidelines outlined above. If there are significant issues, your request to share the voice model will be declined. If only small changes are required, the team might make these adjustments for you and approve the voice model for sharing.
As part of the review process, our team may add labels to your voice model to make it discoverable using the filters for the Voice Library:
* Gender
* Accent
* Language (the language of the source audio used to create your PVC)
* Age
* Use case
* Descriptive
Consistently uploading voices that do not adhere to the guidelines or are highly explicit in nature might result in being barred from uploading and sharing voices altogether. Therefore, please adhere to the guidelines.
Currently, we do not have an estimate of how long the review process will take, as it is highly dependent on the length of the queue.
# Step-by-step Guide
> Step-by-step guide to creating the highest quality voice clone available
## What is a Professional Voice Clone (PVC)?
A Professional Voice Clone (PVC) is a special feature that is available to our Creator+ plans. A PVC is an ultra-realistic, custom AI model of your voice. This is done by training our specialized model with longer voice data (at least 30 mins and up to 3 hours for optimum results) to make it sound just like the original voice.
Essentially, a PVC is a more advanced version of our Instant Voice Cloning feature. For now, we only allow you to clone your own voice. You will be asked to go through a verification process with our voice Captcha before submitting your fine-tuning request.
Custom AI models require fine-tuning and training, so PVCs will take longer (about 4 to 8 hours) compared to Instant Voice Clones.
## How to create a PVC?
### A Step-by-step guide to create a high-quality PVC:
**1. Go to your VoiceLab by first clicking on the “Voices” tab:**
![](https://lh7-us.googleusercontent.com/2XaOuuOmIs-CJ90NnY3yt-C5fbk0AUtMgxVFyB-7AYHum9r0Ooxr8R__Fvo-uE_fmTCQ26w0wRy2JvHYkqWelBh8dwANrnPdIcmt2n7_sArxsoheSNxyEURaHzoMtBuYW55GADkfYSdKvYYUIWKlSIw)
**2. Click on “Add Generative or Cloned Voice” and choose “Professional Voice Cloning”:**
![](https://lh7-us.googleusercontent.com/vRCfOP45g8elNMzGsXLguRfhYsmXhIGSZaMDv3Wl0lT8JQ5NTLdm9i8TnfoYt5N0TpotzlCV63o2lJK655CVqgXDlKcsfiUPWvaOXc37yPI8Vrwxh5Ul4vqoJSZGyXNpTGX8Di1NXAImOiIjVUVLslY)
**3. Confirm that you have read our Guidelines and Rules and click “Start”:**
![](https://lh7-us.googleusercontent.com/PJfCTiW0Ka1qWnWVZdDGPIMQrOjsIN1TPhdAwUE089viKEwHj2mnXCdpJHcBsaQZTBetSXSUrhrHKXwPqDG4rV7yT9iPiAZPTWogEj3sjYhw7gxb9B2bp-U_x_RQWO__ay5FM423OqLjGFEKlegJS3Q)
**4. Name your PVC, Choose the Language, and Upload your recordings:**
![](https://lh7-us.googleusercontent.com/kGimSZIHq6T2l40V-qpQuZS-bCIJ9PoIkPtsYjt83oUVMQug3H_TmRZ4eJd_B7nXMQTtU2JCbKdKAE_EK5HBKbdU3yHzCPxCPU_Fu3L4i3Ye_gLrSrvEOiAkt3gPdA7WoEfz_519h4nXo8D7g3R02aY)
**5. Add Labels and a Description for your Professional Clone Voice (this can be changed later) and click “Create Professional Voice”:**
![](https://lh7-us.googleusercontent.com/l-ZCbkR-Hdu0DV0h-4yy-ZlRsm-uK7fwEYDHmhGXhSIu30ke4JGAr2Cr_Ozp4UcRZaGlAGMSNSmIMwogH_fOzn3OgIcYcBaEnUnwJ9z5ggEpybq09e44J5gk45hqHO6hMvq6PhYrzJCh573-4rnLymE)
**6. Once the PVC model has accepted all of your Audio Samples, you will then be asked to verify your voice:**
![](https://lh7-us.googleusercontent.com/8hHkdxNPtcpV2Zuvd0u8tEQzrg1EnLBeWAPucODJA2RSRJUcJ8nR5R_vPW3KFFYQ1pmx8xxPQatcQbZ-lFKwxbIT-Wc0URjmZ2bBU1CvPnJzZw75SBIyq0EYxiur3yXshQw2eXix-Vlmn8tM4_B9FB8)
Before moving on, ensure that your browser has access to your Microphone and that you are not muted. The AI will compare your voice to the samples you just submitted, so it’s important to use the same equipment that you used to record your uploaded audio samples - try your best to match the tone and delivery!
![](https://lh7-us.googleusercontent.com/F8Ix9L_a1GWr4Kj7bqTzVfzabwgCh4LoRs3W2uvK8Tv52FqKQJjJkyYrDFHR2jgvHxvdPPOCLYuPkXrMX20dFWSd2qHWdm9x_XlyzzP5_-lZJzcY5-QSy-7Ep6pNBhYvPStJaEb1cvG291cO3X3_Brw)
Once you’re ready, click “Start Recording” and you will see a generated captcha to read such as the following:
![](https://lh7-us.googleusercontent.com/Zb-vtAh4LYDRfcbT_AI4m5FuOZXFv8Endh1Cz1yHNOBGTQytmosx7xjb2oqlx4lj9ULaZZXAy2MJ2cz7142rImBWB21cptWXGT13qCQhreSciFz0sbwZ6nzKKRf1MuWfu4WrKrgQ7KEVlO_cgkQRdns)
If the AI detects a difference in the audio quality of your recordings, you may see an error message letting you know:
![](https://lh7-us.googleusercontent.com/AupFUKrsuEBcLpDzmaEA_tzfKjvcAqhdcvcGJ_Se8J-nxApuQ2iRFBm4f34sb__LBNdME13leBJuuY699zOstNokMGRkJryQiLP-8kWYxFJdbCp8TklFL__4fIbqWkNQoLmMcYvqJRya91qSK97rJkI)
Ensure that your audio quality matches your uploaded recordings, speak clearly and concisely, and try again:
![](https://lh7-us.googleusercontent.com/pvNzcYfMXCL_26eCDMIe_JrULNEXSTghdhm03v62hdf1OVObYbRzOAbCa260BNpMyYExe_Hg4cS4AJ-Ej2Sc0YyP-3AsRmTrQuRdEL5d8yfwu_F-hZRDh2ODn-x-jJeUAe6-dv5H3BrrB79sTP1torY)
**7. Once successful, your voice will be marked as "successfully verified" and your PVC will then be queued for Fine-Tuning:**
![](https://lh7-us.googleusercontent.com/_vt2hVW8SubcO0PHkTYEfsMk1yJhiSOjD6vcs0HpnXw7f2M0IIowS3sgKAp6Ba6ZhSPcauMOarHdYYvk6THMKe7HD7dY-sGG0LIscyumuoLz4XKRKkrefUHHdpqk9QJa5uypr_g0nHFgKnWUW6QlRSg)
**8. After fine-tuning, you will find your PVC in your VoiceLab. Click "Use" to open it in the Speech Synthesis page, where you can generate the audio you need.**
**9. If you would like to share your PVC in our Voice Library and start earning passively with [Payouts](https://elevenlabs.io/payouts), follow [these steps](https://elevenlabs.io/docs/voices/voice-library/sharing).**
## Recording audio for your PVC
### Recording Key Considerations
Before you upload your audio samples for Professional Voice Cloning (PVC), there are key considerations to keep in mind to achieve the best results.
1. **Recording Quality**
Firstly, Professional Voice Cloning is highly accurate in cloning the samples used for its training. It will create a near-perfect clone of what it hears, including all the nuances and characteristics of that voice, but also including any artifacts and unwanted audio present in the samples. This means that if you upload low-quality samples with background noise, room reverb/echo, or any other type of unwanted sounds, the AI will try to replicate all of these elements in the clone as well, leaving your model with audible background noise, sibilance, or reverb. Please follow these guidelines for best results.
2. **Clear audio with a single speaker and no background music or sound effects**
Ensure there’s only a single speaking voice throughout the audio, as more than one speaker or excessive noise or anything of the above can confuse the AI. This confusion can result in the AI being unable to discern which voice to clone or misinterpreting what the voice actually sounds like because it is being masked by other sounds, leading to a less-than-optimal clone.
3. **Use at least 30 mins to 3 hours of audio**
The bare minimum we recommend is 30 minutes of audio, but for optimal results and the most accurate clone, we recommend closer to 3 hours of audio. The more quality data you can feed into the AI, the better the voice clone will be. This can be either one long file, or several different files. If you choose to upload multiple audio files, make sure they have the same audio quality and are recorded in the same space. However, if you plan to upload multiple hours of audio, it is better to split it into multiple \~30-minute samples. This makes it easier to upload.
4. **Use a consistent delivery style**
The speaking style in your samples will be replicated in the output. For consistent results, use one style per upload. For instance, if you're creating a voice model intended for audiobooks, submit recordings of yourself reading books in a consistent style and avoid switching between different character voices, as this will create errors in your voice model. This does not mean monotone or emotionless; feel free to vary your tone and emotion according to the context of the text.
5. **Use audio samples in the same language as your PVC model**
For best results, use samples in the language you primarily intend the PVC for. While the AI can speak any supported language, cloning a voice from a different language may result in accents or mispronunciations. For example, if you clone an English voice for Spanish, it may retain an English accent. We only support cloning samples recorded in one of our supported languages, and the application will reject your sample if it is recorded in an unsupported language.
6. **Clone your own voice only**
For now, we only allow you to clone your own voice. You will be asked to go through a verification process with our voice Captcha before submitting your fine-tuning request.
### Recording Quality Guidelines
Whether you're new to voice recording or a seasoned professional, here are some quality guidelines to consider. **Please note that if you're sharing your PVC in our Voice Library and it follows these guidelines and showcases consistent output**, your PVC **may** earn a High-Quality Badge in our Voice Library, enhancing your ranking and potential earnings!
**General recording guidelines:**
* **Use professional recording equipment:** The AI will clone everything in your audio. High-quality input = high-quality output. Opt for a professional XLR mic going into a dedicated audio interface.
* **Use a pop-filter**: This will minimize plosives when recording.
* **Microphone distance**: Position yourself at the right distance from the microphone - approximately two fists away from the mic is recommended, but it also depends on what type of recording you want.
* **Noise-free recording:** Ensure that the audio input doesn’t have any interference, like background music or noise. The AI cloning works best with clean, uncluttered audio.
* **Room acoustics:** Always record in an acoustically-treated room. This reduces unwanted echoes and background noises, leading to clearer audio input for the AI.
* **Audio pre-processing** (optional): You might find that adding light compression or other tools can improve your audio files before creating your PVC. Please note that excessive processing can have diminishing returns, so it’s best to be conservative with these effects.
* **Volume control:** Maintain a consistent volume that’s loud enough to be clear but not so loud that it causes distortion. The goal is to achieve a balanced and steady audio level. The ideal would be between -23dB and -18dB RMS with a true peak of -3dB.
* **Audio file format:** Mono .wav, minimum 44.1 kHz sample rate, minimum 16-bit depth
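If your source recording doesn't already match that format, a quick conversion sketch is below; it assumes the pydub library (with ffmpeg available) and placeholder file names:

```python
# Minimal sketch: convert a recording to mono, 44.1 kHz, 16-bit WAV.
# Assumes pydub (with ffmpeg available); file names are placeholders.
from pydub import AudioSegment

audio = AudioSegment.from_file("raw_recording.mp3")
audio = audio.set_channels(1)        # mono
audio = audio.set_frame_rate(44100)  # 44.1 kHz sample rate
audio = audio.set_sample_width(2)    # 16-bit samples
audio.export("pvc_sample.wav", format="wav")
```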
### Please avoid these technical recording issues:
* Room echo or "boxiness."
* Background noise, including hiss, white noise, electrical hum, or external disturbances.
* Apparent editing issues (i.e. clicks, pops, audible cuts).
* Distortion, clipping, heavy compression, or excessive processing (i.e. noise gate, noise reduction plugin, normalization, EQ).
* Sibilance, loud breath noises, plosives, and mouth clicks.
* Repeats, mistakes, and long periods of silence (5 seconds or more).
* Voice level/input gain imbalance anywhere in the recording.
### Performance guidelines:
* Emphasis, intonations and emotions should align appropriately with the context of the text to create a realistic PVC.
* In some cases (e.g. audiobooks), emotional range and variance is helpful in delivering an engaging performance and creating a great AI voice. Our models can capture this emotional range, but the voice itself should remain consistent.
* ***Please vary your tone*** and pace naturally when reading. ✅
* ***Please avoid changing voices*** for different characters in a single recording or else this will create errors in your voice model. ❌
* Ensure correct and articulate pronunciation.
* Avoid sounding nasal, muffled, or wet (excess saliva).
## Beginner's Guide to Audio Recording
New to audio recording? Follow our guideline below!
**1) Recording Location**
When recording audio, choose a suitable location and set up to minimize room echo/reverb.
So, we want to "deaden" the room as much as possible. This is precisely what an acoustically treated vocal booth is made for, and if you do not have a vocal booth readily available, you can experiment with some ideas for a DIY vocal booth, "blanket fort", or closet.
Here are a few YouTube examples of DIY acoustics ideas:
* [I made a vocal booth for \$0.00!](https://www.youtube.com/watch?v=j4wJMDUuHSM)
* [How to Record GOOD Vocals in a BAD Room](https://www.youtube.com/watch?v=TsxdHtu-OpU)
* [The 5 BEST Vocal Home Recording TIPS!](https://www.youtube.com/watch?v=K96mw2QBz34)
**2) Equipment: Microphone, pop-filter, and audio interface**
A good microphone is crucial. Microphones range from $100 to $10,000, but a professional XLR microphone costing $150-$300 is sufficient for most voiceover work.
For an affordable yet high-quality setup for voiceover work, consider a **Focusrite** interface paired with an **Audio-Technica AT2020** or **Rode NT1 microphone**. This setup, costing between $300 to $500, offers high-quality recording suitable for professional use, with minimal self-noise for clean results.
Also, please ensure that you have a proper **pop-filter** in front of the microphone when recording to avoid plosives as well as breaths and air hitting the diaphragm/microphone directly, as it will sound poor and will also cause issues with the cloning process.
**3) Digital Audio Workstation (DAW)**
There are many different recording solutions out there that all accomplish the same thing: recording audio. However, they are not all created equal. As long as they can record WAV files at 44.1 kHz or 48 kHz with a bit depth of at least 24 bits, they should be fine. You don't need any fancy post-processing, plugins, denoisers, or anything else, because we want to keep the recording simple.
If you want a recommendation, we would suggest something like **REAPER**, which is a fantastic DAW with a tremendous amount of flexibility. It is the industry standard for a lot of audio work. For a personal license or a discounted license, it is only \$60. Another good free option is **Audacity**.
Maintain optimal recording levels (not too loud or too quiet) to avoid digital distortion and excessive noise. Aim for peaks of -6 dB to -3 dB and an average loudness of -18 dB for voiceover work, ensuring clarity while minimizing the noise floor. Monitor closely and adjust levels as needed for the best results based on the project and recording environment.
**4) Positioning**
One helpful guideline to follow is to maintain a distance of about two fists away from the microphone, which is approximately 20cm (7-8 in), with a pop filter placed between you and the microphone. Some people prefer to position the pop filter all the way back so that they can press it up right against it. This helps them maintain a consistent distance from the microphone more easily.
Another common technique to avoid directly breathing into the microphone or causing plosive sounds is to speak at an angle. Speaking at an angle ensures that exhaled air is less likely to hit the microphone directly and, instead, passes by it.
**5) Performance**
The performance you give is one of the most crucial aspects of this entire recording session. The AI will try to clone everything about your voice to the best of its ability, which is very high. This means that it will attempt to replicate your cadence, tonality, performance style, the length of your pauses, whether you stutter, take deep breaths, sound breathy, or use a lot of "uhms" and "ahs" – it can even replicate those. Therefore, what we want in the audio file is precisely the performance and voice that we want to clone, nothing less and nothing more. That is also why it's quite important to find a script that you can read that fits the tonality we are aiming for.
When recording for AI, it is very important to be consistent. If you are recording a voice, either keep it very animated throughout or keep it very subdued throughout; you can't mix and match, or the AI can become unstable because it doesn't know which part of the voice to clone. The same applies to accents: keep the same accent throughout the recording. Consistency is key to a proper clone!
## Scripts
Here’s a variety of English scripts to help you create PVCs optimized for some of the most popular use cases.
Please remember that what you read is not very important; how you read it is very important, however. The AI will try to mimic everything it hears in a voice: the tonal quality, the accent, the inflection, and many other intricate details. It will replicate how you pronounce certain words, vowels, and consonants, but not the actual words themselves. So, it is better to choose a text or script that conveys the emotion you want to capture, read it in the tone of voice you want to use, and pick one optimized for the use case it's intended to serve.
* [Audiobook](/docs/product/voices/voice-lab/scripts/the-great-gatsby)
* [News Article](/docs/product/voices/voice-lab/scripts/news-article)
* [Social Media](/docs/product/voices/voice-lab/scripts/social-media)
* [Meditation](/docs/product/voices/voice-lab/scripts/meditation)
* [Elearning](/docs/product/voices/voice-lab/scripts/elearning)
# Payouts
> Earn rewards for sharing voices in the Voice Library
The [Payouts](https://elevenlabs.io/payouts) system allows you to earn rewards for sharing voices in the Voice Library (VL). ElevenLabs uses Stripe Connect to process payments.
## Account setup
To set up your Payouts account:
* Go to "Payouts" in the sidebar and click "Create Payout Account"
* Follow the prompts from Stripe Connect to finish setting up your account
## Tracking usage and earnings
* You can track the usage of your voices by going to My Voices, clicking "View" to open the detailed view for your voice, then clicking the sharing icon at the bottom. Once you have the Sharing Options open, click "View Metrics".
* The rewards you earn are based on the options you selected when sharing your voice in the Voice Library.
* You can also see your all-time earnings and past payouts by going back to your Payouts page.
## Reader App Rewards
* If your voice is marked as **[High-Quality](/docs/product/voices/voice-library/overview#category)** and you have activated the "Available in ElevenReader" toggle, your voice will be made available in ElevenReader. Rewards for ElevenReader are reported separately – to view your Reader App rewards, check the "ElevenReader" box on your "View Metrics" screen.
## Things to know
* Rewards accumulate frequently throughout the day, but payouts typically happen once a week as long as you have more than \$10 in accrued payouts. You can see your past payouts by going to your [Payouts](https://elevenlabs.io/app/payouts) page in the sidebar.
## Supported Countries
* Currently, Stripe Connect is not supported in all countries. We are constantly working to expand our reach for Payouts and plan to add availability in more countries when possible.
  Argentina, Australia, Austria, Belgium, Bulgaria, Canada, Chile, Colombia, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hong Kong SAR China, Hungary, Iceland, India, Indonesia, Ireland, Israel, Italy, Japan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malaysia, Malta, Mexico, Monaco, Netherlands, New Zealand, Nigeria, Norway, Peru, Philippines, Poland, Portugal, Qatar, Romania, Saudi Arabia, Singapore, Slovakia, Slovenia, South Africa, South Korea, Spain, Sweden, Switzerland, Thailand, Taiwan, Turkey, United Arab Emirates, United Kingdom, United States, Uruguay, Vietnam
# Pronunciation
> Effective techniques to guide ElevenLabs AI to achieve the correct pronunciation.
## Phoneme Tags
This feature is currently only supported by the "Turbo v2" and "Eleven English v1" models
In certain instances, you may want the model to pronounce a word, name, or phrase in a specific way. Pronunciation can be specified using standardised pronunciation alphabets. Currently we support the International Phonetic Alphabet (IPA) and the CMU Arpabet. Pronunciations are specified by wrapping words using the Speech Synthesis Markup Language (SSML) phoneme tag.
To use this feature as part of your text prompt, you need to wrap the desired word or phrase in the phoneme tag. In each case, replace `"your-IPA-Pronunciation-here"` or `"your-CMU-pronunciation-here"` with your desired IPA or CMU Arpabet pronunciation:
`<phoneme alphabet="ipa" ph="your-IPA-Pronunciation-here">word</phoneme>`
`<phoneme alphabet="cmu-arpabet" ph="your-CMU-pronunciation-here">word</phoneme>`
An example for IPA:
```
<phoneme alphabet="ipa" ph="ˈæktʃuəli">actually</phoneme>
```
An example for CMU Arpabet:
```
<phoneme alphabet="cmu-arpabet" ph="AE K CH UW AH L IY">actually</phoneme>
```
It is important to note that this only works per word. Meaning that if you, for example, have a name with a first and last name that you want to be pronounced a certain way, you will have to create the pronunciation for each word individually.
English is a lexical stress language, which means that within multi-syllable words, some syllables are emphasized more than others. The relative salience of each syllable is crucial for proper pronunciation and meaning distinctions. So, it is very important to remember to include the lexical stress when writing both IPA and ARPAbet as otherwise, the outcome might not be optimal.
Take the word "talon", for example.
Incorrect:
```
<phoneme alphabet="cmu-arpabet" ph="T AE L AH N">talon</phoneme>
```
Correct:
```
<phoneme alphabet="cmu-arpabet" ph="T AE1 L AH0 N">talon</phoneme>
```
The first example might switch between putting the primary emphasis on AE and AH, while the second example will always be pronounced reliably with the emphasis on AE and no stress on AH.
If you write it as:
```
<phoneme alphabet="cmu-arpabet" ph="T AE0 L AH1 N">talon</phoneme>
```
It will always put emphasis on AH instead of AE.
With the current implementation, we recommend using the CMU ARPAbet as it seems to be a bit more consistent and predictable with the current iteration of AI models. Some people get excellent results with IPA, but we have noticed that ARPAbet seems to work better with the current AI and be more consistent for a lot of users. However, we are working on improving this.
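If you're generating speech through the API rather than the website, phoneme tags are included directly in the text of the request. Below is a minimal sketch of such a request in TypeScript; the voice ID and API key are placeholders, and you should check the API reference for the full set of parameters.

```ts
// Minimal sketch: sending a phoneme-tagged prompt to the text-to-speech endpoint.
// The voice ID and API key below are placeholders.
const voiceId = 'YOUR_VOICE_ID';
const apiKey = process.env.ELEVENLABS_API_KEY ?? '';

const text =
  'I <phoneme alphabet="cmu-arpabet" ph="AE K CH UW AH L IY">actually</phoneme> prefer the second take.';

const response = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
  method: 'POST',
  headers: {
    'xi-api-key': apiKey,
    'Content-Type': 'application/json',
  },
  // Phoneme tags are only supported by the Turbo v2 and English v1 models.
  body: JSON.stringify({ text, model_id: 'eleven_turbo_v2' }),
});

// The response body contains the generated audio.
const audio = await response.arrayBuffer();
```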
### Alternatives
Because phoneme tags are only supported by the Turbo v2 and English v1 models, if you're using the Multilingual v2, Turbo v2.5 or Flash models, you might need to try alternative methods to get the desired pronunciation for a word. You can find an alternative spelling and write a word more phonetically. You can also employ various tricks such as capital letters, dashes, apostrophes, or even single quotation marks around a single letter or letters.
As an example, a word like "trapezii" could be spelt "trapezIi" to put more emphasis on the "ii" of the word.
## Pronunciation Dictionaries
Some of our tools, such as Projects and Dubbing Studio, allow you to create and upload a pronunciation dictionary. These allow you to specify the pronunciation of certain words, such as character or brand names, or to specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that specifies pairs of words and how they should be pronounced, either using a phonetic alphabet (phoneme tags) or word substitutions (alias tags).
Whenever one of these words is encountered in a project, the AI model will pronounce the word using the specified replacement. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the first replacement is used.
To provide a pronunciation dictionary file, open the settings for a project and upload a file in either TXT or the [.PLS format](https://www.w3.org/TR/pronunciation-lexicon/). When a dictionary is added to a project it will automatically recalculate which pieces of the project will need to be re-converted using the new dictionary file and mark these as unconverted.
Currently we only support pronunciation dictionaries that specify replacements using phonemes or aliases.
Both phonemes and aliases are sets of rules that specify a word or phrase they are looking for, referred to as a grapheme, and what it will be replaced with. Please note that searches are case sensitive.
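To make this lookup behaviour concrete, here is a small illustrative sketch (not ElevenLabs code) of applying an ordered, case-sensitive dictionary where the first matching rule wins:

```ts
// Illustrative only: rules are checked in order and the first match is used.
// Searches are case sensitive, so "un" would not match the "UN" rule.
type Rule = { grapheme: string; replacement: string };

const dictionary: Rule[] = [
  { grapheme: 'UN', replacement: 'United Nations' },
  { grapheme: 'UN', replacement: 'unified network' }, // never used: the first "UN" rule wins
];

function applyDictionary(text: string, rules: Rule[]): string {
  return text.replace(/\b[\w'-]+\b/g, (word) => {
    const rule = rules.find((r) => r.grapheme === word); // first match only
    return rule ? rule.replacement : word;
  });
}

console.log(applyDictionary('The UN met today.', dictionary));
// -> "The United Nations met today."
```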
### Alias Tags
The alias tag is used to specify pronunciation using other words or phrases. For example, you could use an alias tag to specify that "UN" should be read as "United Nations" whenever it is encountered in a project.
If you're generating using Multilingual v2 or Flash/Turbo v2.5, which don't support phoneme tags, you can use alias tags to specify how you want a word to be pronounced using other words or by spelling the word out more phonetically. Alias tags can be used with all our models, so they can be useful for specifying pronunciation when included in a pronunciation dictionary for Projects, Dubbing Studio or Speech Synthesis via the API.
For example, if your text includes a name that has an unusual pronunciation that the AI might struggle with, you could use an alias tag to specify how you would like it to be pronounced:
```
<lexeme>
  <grapheme>Claughton</grapheme>
  <alias>Cloffton</alias>
</lexeme>
```
### Pronunciation Dictionary Example
Here is an example pronunciation dictionary that specifies the pronunciation of "Apple" using the IPA string "ˈæpl̩" and replaces "UN" with the alias "United Nations":
```
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```
# Pauses
> How to add pauses to your generated speech.
There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent way is programmatically, using the syntax `<break time="1.5s" />`. This will create an exact and natural pause in the speech. It is not just added silence between words; the AI has an actual understanding of this syntax and will add a natural pause.
An example could look like this:
```
"Give me one second to think about it." "Yes, that would work."
```
Break time should be described in seconds, and the AI can handle pauses of up to 3 seconds in length.
However, since this is more than just inserted silence, how the AI handles these pauses can vary. As usual, the voice used plays a pivotal role in the output. Some voices, for example, voices trained on data with "uh"s and "ah"s in them, have been shown to sometimes insert those vocal mannerisms during the pauses like a real speaker might. This is more prone to happen if you add a break tag at the very start or very end of your text.
Please avoid using an excessive number of break tags as that has shown to potentially cause some instability in the AI. The speech of the AI might start speeding up and become very fast, or it might introduce more noise in the audio and a few other strange artifacts. We are working on resolving this.
### Alternatives
These options are inconsistent and might not always work. We recommend using the syntax above for consistency.
One trick that seems to provide the most consistent output, aside from the syntax above, is a simple dash `-` or the em-dash `—`. You can even add multiple dashes, such as `-- --`, for a longer pause.
```
"It - is - getting late."
```
Ellipsis `...` can sometimes also work to add a pause between words but usually also adds some "hesitation" or "nervousness" to the voice that might not always fit.
```
"I... yeah, I guess so..."
```
# Pacing and Emotion
> Effective techniques to guide ElevenLabs AI in pacing the speech and conveying emotions.
## Pacing
Based on varying user feedback and test results, it's been theorized that using a singular long sample for voice cloning has brought more success for some, compared to using multiple smaller samples. The current theory is that the AI stitches these samples together without any separation, causing pacing issues and faster speech. This is likely why some people have reported fast-talking clones.
To control the pacing of the speaker, you can write in a style similar to that of a book. While it's not a perfect solution, it can help improve the pacing and ensure that the AI generates a voiceover at the right speed. With this technique, you can create high-quality voiceovers that are both customized and easy to listen to.
```
"I wish you were right, I truly do, but you're not," he said slowly.
```
## Emotion
If you want the AI to express a specific emotion, the best approach is to write in a style similar to that of a book. To find good prompts to use, you can flip through some books and identify words and phrases that convey the desired emotion.
For instance, you can use dialogue tags to express emotions, such as `he said, confused`, or `he shouted angrily`. These types of prompts will help the AI understand the desired emotional tone and try to generate a voiceover that accurately reflects it. With this approach, you can create highly customized voiceovers that are perfect for a variety of applications.
```
"Are you sure about that?" he said, confused.
"Don’t test me!" he shouted angrily.
```
You will also have to remove the prompt (for example, the dialogue tag) afterwards, as the AI will read exactly what you give it. The AI can also sometimes infer the intended emotion from the text’s context, even without the use of tags.
```
"That is funny!"
"You think so?"
```
This is not always perfect since you are relying on the AI to understand if something is sarcastic, funny etc from the context of the text.
# Overview
> An in-depth overview of using Projects
## Creating a Project
Projects is an end-to-end workflow for creating long-form content. It allows you to upload a full book or document. You can even import a whole webpage via a URL. The AI can then generate a voiceover narration for the entire book, document, or webpage. You can then download either individual MP3 files for each chapter or as a single MP3 file for the whole audiobook.
We will provide a brief walkthrough of this feature, but we recommend that you test it yourself by navigating to the Projects tab in the menu.
Once you enter the new tab, you will encounter a screen where you can create new projects or open existing ones. The number of projects you can have at any given time is determined by your subscription. The higher your subscription is, the more projects you can have concurrently.
Click "Add a new project” and you will be presented with a popup. Here, you can choose to create a new empty project, import an already existing EPUB, PDF, TXT or HTML file, which will then automatically be converted into a project, or import text directly from a website using the URL to have the page be converted into a project. You can then use our [Audio Native](/docs/product/audio-native/overview) feature to easily and effortlessly embed any narration project onto your website.
For now, let's create a new empty project. You can name your project and choose the default voice. Additionally, you will need to select the model that will be used and decide the quality settings. The voice and its settings can be changed after the project is created.
Model and quality settings will remain locked after the project has been created and cannot be changed without creating a completely new project from scratch.
The quality setting determines the quality of the rendered output of your projects. This setting decides the bitrate for the MP3/Lossless WAV and quality optimization. For most people, standard or high settings will be sufficient. However, for those who require the highest possible quality we offer Ultra and Ultra Lossless (an uncompressed WAV file) which might be preferable in certain cases. These different quality settings have different costs associated with them, as they require different computational resources. Ultra Lossless is quite computationally intensive, making it the most expensive option. You are more than welcome to experiment with these different quality settings to find the one that best suits your project.
Once you have set all the settings, press "Create Project", and you will be redirected to the editor.
## Settings and Buttons
Once inside the project, you will be presented with a blank page. However, if you chose to create a project by either importing a file or using a URL, you will be presented with that text, as the system will automatically fill out the pages for you. If the EPUB is well-structured and correctly formatted, it will also automatically split each chapter into its own chapter in Projects, making it very easy to navigate.
If you've ever used an online text editor, you will find yourself very at home with both the look and the structure of Projects, but we do have a few nifty features that will help you with especially long-form content.
At the top, you have a few buttons. You can hover over some of these buttons to get more information.
Most of these are probably pretty self-explanatory, but let's go through all of them.
**1. Play** will play the currently selected paragraph, generating it first if it hasn't been generated yet. You can open the options to change the button's behaviour.
**Play until end** means that when you play a paragraph, it will continue and play the next one once the first finishes. This lets you listen to your audiobook without having to pre-render all of the paragraphs first.
**2. Regenerate** will regenerate the currently selected paragraph and give you a new performance from your voice. This will replace the previous generation, but you can restore it in **6. Generation history**. You can also select part of the paragraph to regenerate only one sentence.
**3. Voice** will change the voice of the selected paragraph(s). You can select multiple paragraphs to change the voice for a bigger selection, or select a fragment of a paragraph to have multiple speakers within one paragraph.
**4. Voice settings** will change the voice settings for the currently used voice or the current paragraph.
**5. Paragraph type** allows you to set headings for easier reading. Headings have a longer audio break after them.
**6. Generation history** allows you to restore previous generations for each paragraph. It's helpful if you edited an already generated paragraph by mistake, or you liked the previous generation better.
**7. Locking** allows you to disable paragraph editing after you are happy with the performance.
**8. Project settings** allow you to change general settings, export settings and share settings.
**9. Convert** allows you to convert a whole Project or Chapter at once. After you have converted the Project, you can download an .mp3 or .wav audio file, as well as a .zip file with every chapter.
**10. Credits balance** allows you to control your spending.
**11. Project title section** allows you to change the Project's title and toggle the visibility of the Chapter list.
**12. Add a new chapter** creates a new chapter.
**13. Locked paragraph** is indicated by a Lock icon. You can't edit it unless you unlock it first.
**14. Current paragraph** is indicated by the highlight. Different colours indicate different speakers.
**15. Chapters list** allows you to navigate between chapters; clicking the currently selected chapter name allows you to rename it.
**16. Unconverted paragraphs** have a grey line on the left. Converted paragraphs have a darker line next to them.
## Using Projects
Now you can start writing. Please ensure that you use proper grammar and paragraph structures, as well as using line breaks where appropriate, as the AI will use these when generating. This goes for both Projects and Speech Synthesis, but it is even more important in Projects for optimal results.
When you have finished writing your text and are happy with it, you can generate a voiceover for it. Click the paragraph for which you want to generate audio; the current selection will be highlighted in colour. Then, to generate audio for that section, just click the play button at the top. This will initiate the generation of audio for the specific section you have highlighted. Once the audio has finished generating, it will play. This process is similar to how audio generation works on the Speech Synthesis page.
Paragraphs that have already been generated are indicated by the black bar on the left-hand side of each paragraph. If you press the play button on the top bar and a paragraph has already been generated, it will just play that paragraph. However, if you press the regenerate button, with two circling arrows, it will regenerate the paragraph. Once a paragraph has been generated at least once, you can select one or more words to regenerate only the selected text, rather than the whole paragraph. For the best results, we recommend regenerating a complete phrase or sentence at a time.
If you press the play button and the paragraph is fully generated, you can also download the paragraph by clicking the download button in the lower right corner of the player. This is exactly how it works in Speech Synthesis. However, this button will only appear when something has finished generating. So, if you have "play until end" activated, it will not appear, because the AI will keep generating the next section after the current one, meaning this only works for downloading individual paragraphs.
If you want to convert the entire chapter in one go, you can click the convert button in the upper right corner. This will open a page where you can choose to convert either your entire project or individual chapters. You can also download the entire project or individual chapters. Even after converting the whole chapter, you can still go back and regenerate sections of the book that you are not happy with before downloading the entire thing. However, if you make any changes, you will need to press convert once again for the changes to be reflected in the downloaded chapter.
After the conversion of either a whole project or individual chapters has finished, you will be able to see these conversions by clicking “Versions” next to either the project or the individual chapters. You can then download the different versions.
Once your Project is converted, you have several download options available.
## Pronunciation Dictionaries
Sometimes you may want to specify the pronunciation of certain words, such as character or brand names, or specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that includes rules about how specified words should be pronounced, either using a phonetic alphabet (phoneme tags) or word substitutions (alias tags).
Whenever one of these words is encountered in a project, the AI will pronounce the word using the specified replacement. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the first replacement is used.
You can add a pronunciation dictionary to your project from the General tab in Project settings.
# Overview
> Dubbing made easy: reach a broader audience with ElevenLabs.
## What is dubbing?
> **Dubbing** (ˈdʌbɪŋ)
>
> Noun
>
> provide (a film) with a soundtrack in a different language from the original
ElevenLabs was founded on the idea of creating amazing dubbing: a tool that would allow you to create a perfect dub in any language you desire, using the original voices of the actors and preserving the original performance, making all content more accessible.
## Getting Started
To get started, head over to the dubbing tab, where you will be presented with a view of all your previously dubbed projects. When you open it up for the first time, it will be empty. If you've ever used the [Projects](/docs/product/projects/overview) feature before, this will feel very familiar to you.
To get started, click "Create New Dub", and you will be presented with a window containing a few different choices.
First, you will be asked to name the dub. You can also leave the name field empty if you wish to use the file name as the title of the dub.
Select the original language and the language it will be dubbed into.
Then you will be asked to select the video or audio you want to dub. You can create a video dub on any subscription, but you need to be on the Creator plan or above to dub an audio file. You can either upload a video or audio file, or import a video directly from YouTube, TikTok, X (Twitter), Vimeo, or another URL. The clip you upload is limited to 500MB and 45 minutes, and you need to stay below both limits.
This limit can be extended by using our API, which allows for a maximum duration of 2.5 hours and a file size limit of 1 GB. You can learn more about our API and how to use it for dubbing in our guide, [How to dub video and audio with ElevenLabs](/docs/developer-guides/how-to-dub-a-video)
For a reduced cost, you can opt to add a watermark to your video. This option is only available for videos, which means that you will always have to pay the full cost when using audio files. It's not possible to remove a watermark after the dub has been created.
Cost of dubbing:
* Automatic dub with watermark - 2,000 credits per minute
* Automatic dub without watermark - 3,000 credits per minute
* Dubbing Studio with watermark - 5,000 credits per minute
* Dubbing Studio without watermark - 10,000 credits per minute.
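As a quick worked example based on the rates above, a 10-minute clip dubbed automatically with a watermark would cost 10 × 2,000 = 20,000 credits, while the same clip dubbed via Dubbing Studio without a watermark would cost 10 × 10,000 = 100,000 credits.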
One of the available options is to create a Dubbing Studio project. By checking this option, you'll unlock access to the Dubbing Studio editor. This interface will help you to enhance your audio content in various ways. You can easily regenerate specific clips to refine their quality. You can also change the voices within the editor. Furthermore, you have the ability to modify both the original transcripts and their translations. Please see our [guide to Dubbing Studio](https://elevenlabs.io/docs/dubbing/studio) for more information.
We recommend that you manually choose the number of speakers. You can also allow the AI to automatically detect how many individual speakers are in the clip, but it might not guess the correct number accurately and this will take more time than manually setting the number. If you don’t want to dub the whole video and only a portion of it, you can change the range that you want dubbed here.
There are a few things going on in the background when you create a dub. The AI can handle fully mixed and mastered audio with multiple speakers. It will separate the speakers from the background at an extremely high quality, keeping the soundtrack and the sound effects and Foley intact. It will also separate individual speakers even if they are overlapping. At the moment, the AI can handle up to three simultaneous speakers at a time.
After finalizing all your settings, review the total cost for the dubbing project, which is displayed underneath the "Create dub" button. Click "Create dub" to initiate the generation process.
The dub will then appear in the list of dubs on your Dubbing page, and you can track its progress by checking the status. Once it's ready, you can click the ellipsis icon to see the available options.
* View - this will bring up a preview of the dub, as well as show details about the dub such as the source, the source and target languages, and how many credits were used.
* Download - this will download your dub.
* Remove - this will delete the dub. Please note that it's not possible to restore deleted dubs.
# Studio
> The ultimate end-to-end workflow for creating amazing dubs.
## Overview
If you selected the option to create a Dubbing Studio project, once your dub finishes generating, you will see "Edit" when you click the three dots next to the dub in the list of dubs on your Dubbing page. Click "Edit" to open your Dubbing Studio project.
At first, when you open the studio, it might seem overwhelming, as there's a lot of information to take in. However, if you have used an audio or video editor before, you will most likely feel right at home with the layout of the studio.
* First, it is important to note that the initial version of the dub is an automated dub, and cannot be personalised. We compensate for this by providing credits equal to the cost of creating the project that can be used within the project. These will allow you to edit your content at no extra cost, giving you the opportunity to fully customize and regenerate your dub at least once.
* In the middle, you will see the speaker cards, which show the transcribed audio as well as the translated transcription. If you only see one set of cards, don't fret - to see both you have to select the language you want to work on. This defaults to the original.
* On the right-hand side, you will see the video clip that you uploaded to be dubbed. You can move this clip around and place it wherever you want. You can also resize it by dragging the corners of the clip.
* Below all of this, you have the timeline, which shows the different voices the AI extracted on individual tracks as well as clips indicating when a specific voice is speaking, along with the corresponding clips for the original audio.
* The timeline is also divided into a few different parts. On the left side, you can see the names of each speaker track. You can rename them here to keep your project organized. You can click the cogwheel and change settings across the whole track. However, keep in mind that if you do this, you will have to regenerate the already generated audio clips.
* In the middle section, you have the actual timeline, which includes all of the speech clips mentioned earlier.
* On the right-hand side, you have the settings for individual clips. When you have a clip selected, this is where you change the settings for that specific clip. You can change the volume, voice settings, or even the voice itself for the selected clip.
* Below this, you have the current dubs available for this project. You will see the original, which is just the original audio, and then all subsequent dubs that you have created for this project in all of the different languages. Clicking the plus button will add another dub to the project.
## Speaker Cards
Right in the middle of the studio view, you will see the speaker cards. These cards represent the text that is being spoken by a specific voice. You can both change the transcribed text – the text that the AI has automatically transcribed from the audio – and the translated text – the text the AI has automatically translated for you from the transcription.
When you first open the studio, you will most likely only see the transcribed text from the original audio and not the translated text. However, at the bottom, below the timeline, you will see a toggle where you can switch between all the languages the project is dubbed in. When you create it initially, you will only have the original language plus the language that you selected when creating the dub. If you click the other language, you should see that the speaker cards get split into two versions of the same text: one is the original text, and the second is the dubbed text.
This toggle also determines which language you hear the dub in. If you have the original selected, you will only hear the original language, but if you select one of the other languages that the video is dubbed into, you will hear those languages. We recommend selecting the language you have dubbed your project into, so you can follow along with this guide a little more easily.
## Timeline
Below the speaker cards, you will find the timeline. This is where you will refine and change the actual audio generated from the text in the speaker cards. It is segmented into different parts. On the left side, you have the tracks for each voice in the audio. In the middle, you have the clips that represent when a voice is speaking. On the right-hand side, you have the settings for the currently selected clip. We will go through all of these parts.
### Tracks
When you create your dub, you either specify the number of speakers manually (this is the recommended method) or let the AI automatically detect the number of speakers. Each speaker will be assigned a track, and each speaker will have clips on that track which represent when they're speaking and when they are not. These clips then represent the speaker cards – more on that later in the clips section.
On the left-hand side of each track, you have a few options. You can click the name, which usually just says "speaker" when you first create the dub, and then change it to the character name to keep it more organized.
On each track with a dubbed voice (not the tracks with the original voices), you will see a little cogwheel. If you click this, it will bring up some very important settings for each track. Here, you can change things such as stability, similarity, and style for the whole track, as well as change how the clone is created. For example, you can have a clone created for each clip on the track individually (Clip Clone), have a single clone created from all clips for this speaker (Track Clone), or select a voice that you already have saved in My Voices. There's a third way to create a clone, which we will go through in the Clips section.
Lastly, on each track, you have three dots that you can click to access the ability to remove the track from the project if you feel like it was created incorrectly by the AI. Perhaps it picked up some background noise and thought it was a speaker, but it was not, which means you can discard this track.
### Clips
Subsequently, each of these tracks will contain clips that represent the dialogue, audio, and speaker cards. These will be automatically created when you first create your dub.
If you click on a clip, the speaker cards will also jump to the appropriate location so you can easily find and edit transcriptions, translations, and performances. You will see two clips on top of each other in the same color; the top clip represents the original audio, and the bottom represents the dubbed audio. You can move these clips independently to adjust the audio within them. When you click a clip, it will be highlighted both in the timeline and in the speaker cards. This makes it very easy to edit specific clips without having to sync both views, as they do that automatically.
On each clip that represents a dubbed section, you will find two circling arrows which you can click to regenerate that specific clip. This will need to be done each time you have, for example, changed the settings, the voice, or the translation. You will need to regenerate the clip where this change occurred. If a clip needs to be regenerated, it will say “stale” next to these arrows. Regenerating clips will cost credits.
If you have two clips that are very close together, you can click the gray icon between the two clips to combine them into a single, longer clip. Additionally, where the playhead is, you can click this gray icon to separate a clip into two individual clips.
If you drag either edge of a clip, you will extend or truncate it. You might notice that when you extend or truncate, the voice will either speed up or slow down, and the pitch will either go up or down as well. This is just an approximation, but you will have to regenerate the clip for the AI to be able to generate speech that will fit within the clip length and sound natural.
On the right-hand side, you will also see a few options. In contrast to the left-hand side options, which affect the whole track, these are the individual clip options. Here, you can set and change settings that will only affect the currently selected clip instead of the whole track. For example, you can set different values for stability, similarity, style, as well as adjust the volume. You can even specify a particular clone to be used for that particular clip only.
Lastly, you can right-click a clip to access a few more options. You can transcribe the audio again if you feel like the transcription was incorrect or if you've made changes to the clip. You can also delete the clip if you feel it shouldn't be there. The most interesting option here is probably that you can create a clone from a specific clip. One helpful tip is to find a clip that you like, where you feel the voice is good, right-click to create a clone from that clip, and then assign that clone to the whole track to achieve a consistent voice throughout. This is just one tip and may not work for all circumstances, but it can work very well in some cases.
### Adding Voiceover and SFX Tracks
Below the track list, you will see the following options:
* **Dubbed Speaker Tracks:** If you encounter multiple speakers mixed within a single track, you can create a new dubbed speaker track. This allows you to isolate and transfer clips containing additional voices to the new track.
* **Voiceover Tracks:** Voiceover tracks create new Speakers. You can click and add clips on the timeline wherever you like. After creating a clip, start writing your desired text on the speaker cards above. You'll first need to translate that text, then you can press "Generate". You can also use our voice changer tool by clicking on the microphone icon on the right side of the screen to use your own voice and then change it into the selected voice.
* **SFX Tracks:** Add a SFX track, then click anywhere on that track to create a SFX clip. Similar to our independent SFX feature, simply start writing your prompt in the Speaker card above and click “Generate” to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project - make sure to press the “stale” button if you do so.
* **Upload Audio:** This option allows you to upload a non-voiced track, such as SFX, music, or a background track. Please keep in mind that if voices are present in this track, they won't be detected, so it will not be possible to translate or correct them.
### "Dynamic" vs. "Fixed" Generation
In Dubbing Studio, all regenerations made to the text are "Fixed" generations by default. This means that no matter how much text is in a Speaker card, that respective clip will not change its length. This is helpful to keep the timing of the video with the speech. However, this can be problematic if there are too many or too few words within the speaker card, as this can result in sped up or slowed down speech.
This is where "Dynamic" generation can help. You can access this by right clicking on a clip and selecting "Generate Audio (Dynamic Duration). You'll notice now that the length of the clip will more appropriately match the text spoken for that section. For example, the phrase **"I'm doing well!"** should only occupy a small clip - if the clip was very long, the speech would be slurred and drawn out. This is where Dynamic generation can be helpful.
Just note, though, that this could affect the syncing and timing of your video. Additionally, if you choose "Dynamic Duration" for a clip that has many words, the clip will need to lengthen - if there is a clip directly in front of it, it will not have enough room to generate properly, so make sure you leave some space between your clips!
## Manual Import
When creating your dub, you have a special option during the creation process that is only available to the dubbing studio; manual dubbing. This option allows you to create a manual dub where you upload all of the files individually. You can upload the video file, the background audio, and the audio of only the speakers. Additionally, you should include a CSV file indicating the names of the speakers, the start and end time of when they are speaking, the original text, and the translated text. It's similar to a subtitle file but with a lot more information. This file needs to adhere to a very strict format to work correctly.
> Timecodes supported in CSV file include:
>
> seconds ([example file](https://raw.githubusercontent.com/elevenlabs/elevenlabs-docs/main/resources/dubbingTestFile%20\(seconds\).csv))
>
> hours:minutes:seconds:frame ([example file](https://raw.githubusercontent.com/elevenlabs/elevenlabs-docs/main/resources/dubbingTestFile%20\(frames\).csv))
>
> hours:minutes:seconds,milliseconds ([example file](https://raw.githubusercontent.com/elevenlabs/elevenlabs-docs/main/resources/dubbingTestFile%20\(milliseconds\).csv))
| speaker | start\_time | end\_time | transcription | translation |
| ------- | ----------- | ----------- | --------------------------------- | -------------------------------------------- |
| Joe | 0:00:00.000 | 0:00:02.000 | Hey! | Hallo! |
| Maria | 0:00:02.000 | 0:00:06.000 | Oh, hi, Joe. It has been a while. | Oh, hallo, Joe. Es ist schon eine Weile her. |
| Joe | 0:00:06.000 | 0:00:11.000 | Yeah, I know. Been busy. | Ja, ich weiß. War beschäftigt. |
| Maria | 0:00:11.000 | 0:00:17.000 | Yeah? What have you been up to? | Ja? Was hast du gemacht? |
| Joe | 0:00:17.000 | 0:00:23.000 | Traveling mostly. | Hauptsächlich gereist. |
| Maria | 0:00:23.000 | 0:00:30.000 | Oh, anywhere I would know? | Oh, irgendwo, das ich kenne? |
| Joe | 0:00:30.000 | 0:00:36.000 | Spain. | Spanien. |
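For illustration, the dialogue from the table above might look like the following as a raw CSV file. This is only a sketch; refer to the example files linked above for the exact formatting requirements.

```
speaker,start_time,end_time,transcription,translation
Joe,0:00:00.000,0:00:02.000,Hey!,Hallo!
Maria,0:00:02.000,0:00:06.000,"Oh, hi, Joe. It has been a while.","Oh, hallo, Joe. Es ist schon eine Weile her."
Joe,0:00:06.000,0:00:11.000,"Yeah, I know. Been busy.","Ja, ich weiß. War beschäftigt."
```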
# Overview
> Elevate your projects with this guide into the Voiceover Studio.
Similar to the Dubbing Studio, the new Voiceover Studio gives users an opportunity to create their own interactive content, but with a little more freedom. Voiceover Studio combines the audio timeline with our Sound Effects feature, giving you the ability to write a dialogue between any number of speakers, choose those speakers, and intertwine your own creative sound effects anywhere you like.
## Creating a Voiceover
To begin, click "Create a new voiceover". Here you have the option to upload a video, audio or create your Voiceover from scratch. After that, it's as simple as pressing "Create voiceover" - you can name your Voiceover before or after it's created. Once in the Studio, you will notice it looks very similar to a Dubbing Studio project - and it is - with some notable additions. Let's briefly revisit the layout -
### Timeline
On the bottom half of your screen, you will see the audio timeline. This is a linear representation of your Voiceover project. Each row represents a track, and on the far left section you have the track information for voiceover or SFX tracks. In the middle, you can create the clips that represent when a voice is speaking or a SFX is playing. On the right-hand side, you have the settings for the currently selected clip.
### Speaker Cards
In Dubbing Studio, the AI creates the Speaker Cards automatically - in Voiceover Studio, you get to create these on your own! Because of this, your Voiceover Project screen will begin blank after creation, and you will need to first add Tracks and Clips.
### Adding Tracks
There are three types of tracks you can add in the studio: Voiceover tracks, SFX tracks and uploaded audio.
* **Voiceover Tracks:** Voiceover tracks create new Speakers. You can click and add clips on the timeline wherever you like. After creating a clip, start writing your desired text on the speaker cards above and click "Generate". Similar to Dubbing Studio, you will also see a little cogwheel on each Speaker track - simply click on it to adjust the voice settings or replace any speaker with a voice directly from your VoiceLab - including your own Professional Voice Clone if you have created one.
* **SFX Tracks:** Add a SFX track, then click anywhere on that track to create a SFX clip. Similar to our independent SFX feature, simply start writing your prompt in the Speaker card above and click "Generate" to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project - make sure to press the "stale" button if you do so.
* **Uploaded Audio:** Add an audio track including background music or sound effects. It's best to avoid uploading audio with speakers, as any speakers in this track will not be detected, so you won't be able to translate or correct them.
### Track Features
Once you've created a new Voiceover Track, you will see on the left-hand side of each track that you have a few options. You can also click directly on "New Voiceover Speaker" to rename it to keep yourself more organized.
Click the cog to open the Track Voice Settings. This is where you can change the voice and model used for this Voiceover Track, and adjust the voice settings. If you make changes here before generating audio for the track, the audio will generate with the settings you choose. If you change settings after audio has already been generated for the track, this audio will be labelled "Stale", and you will need to regenerate it, either by clicking the regenerate icon to generate a specific clip, or "Generate Stale Audio" to regenerate all the stale audio in your Voiceover project.
By clicking the small Headphones icon on either a Speaker or SFX track, you can "solo" that track which will mute all other tracks on playback. If you want to delete a track, simply click the three small dots next to the Headphones icon on the track.
### Key Differences from Dubbing Studio
If you chose not to upload a video when you created your Voiceover project, then the entire timeline is yours to work with and there are no time constraints. This differs from Dubbing Studio as it gives you a lot more freedom to create what you want and adjust the timing more easily.
When you Add a Voiceover Track, you will instantly be able to create clips on your timeline. Once you create a Voiceover clip, begin by writing in the Speaker Card above. After generating that audio, you will notice your clip on the timeline will automatically adjust its length based on the text prompt - this is called "Dynamic Generation". This option is also available in Dubbing Studio by right-clicking specific clips, but because syncing is more important with dubbed videos, the default generation type there is "Fixed Generation," meaning the clips' lengths are not affected.
### Credit Costs
Voiceover Studio does not deduct credits to create your initial project. Credits are deducted every time material is generated. Similar to Speech Synthesis, credit costs for Voiceover Clips are based on the length of the text prompt. SFX clips will deduct 80 credits per generation.
If you choose to Dub (translate) your Voiceover Project into different languages, this will also cost additional credits depending on how much material needs to be generated. The cost is 1 credit per character for the translation, plus the cost of generating the new audio for the additional languages.
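As a quick worked example, translating a Voiceover Project containing 500 characters of text into one additional language would cost 500 credits for the translation, plus the usual text-to-speech cost of generating the new audio in that language.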
## Translating and Exporting
Similar to Dubbing Studio, after you've finished creating your Tracks and Clips and you've arranged them on the Timeline, you can click the "plus" icon on the bottom of the page to Dub your Voiceover into different languages. Click to add the desired language(s), and then make sure to generate by pressing "Generate Stale Audio" on the bottom right.
To export your Voiceover Project, simply click "Export" in the bottom right and choose your desired file type. Once the file has been generated, it will be available for download.
## Uploading Scripts
With Voiceover Studio, you have the option to upload a script for your project as a CSV file. You can either include speaker name and line, or speaker name, line, start time and end time.
Sample format, speaker and line
```
speaker,line
Joe,"Hey!"
Maria,"Oh, hi Joe! It's been a while."
```
Sample format, speaker, line, start time and end time.
```
speaker,line,start_time,end_time
Joe,"Hey!",0.1,1.5
Maria,"Oh, hi Joe! It's been a while.",1.6,2.0
```
Once your script has been imported, make sure to assign voices to each speaker before you generate the audio. To do this, click the cog icon in the information for each track, on the left.
If you don't specify start and end times for your clips, Voiceover Studio will estimate how long each clip will be, and distribute them along your timeline.
### Dynamic Duration
By default, Voiceover Studio uses Dynamic Duration, which means that the length of the clip will vary depending on the text input and the voice used. This ensures that the audio sounds as natural as possible, but it means that the length of the clip might change after the audio has been generated. You can easily reposition your clips along the timeline once they have been generated to get a natural sounding flow. If you click "Generate Stale Audio", or use the generate button on the clip, the audio will be generated using Dynamic Duration.
This also applies if you do specify the start and end time for your clips. The clips will generate based on the start time you specify, but if you use the default Dynamic Duration, the end time is likely to change once you generate the audio.
### Fixed Duration
If you need the clip to remain the length specified, you can choose to generate with Fixed Duration instead. To do this, you need to right click on the clip and select "Generate Audio Fixed Duration". This will adjust the length of the generated audio to fit the specified length of the clip. This could lead to the audio sounding unnaturally quick or slow, depending on the length of your clip.
If you want to generate multiple clips at once, you can use shift + click to select multiple clips for a speaker at once, then right click on one of them to select "Generate Audio Fixed Duration" for all selected clips.
### We can't wait to see where your creativity takes you!
# Overview
## Embed ElevenLabs on any web page
Audio Native is an embedded audio player that automatically voices the content of a web page using ElevenLabs' text-to-speech service. It can also be used to embed pre-generated content from a Project into a web page. All it takes to deploy it on your site is a brief snippet of HTML. In addition, Audio Native comes with built-in metrics, so you can track audience engagement through a listener dashboard.
## Set up
Before creating and deploying Audio Native players, you’ll need to go through a few steps to configure Audio Native on your account.
It’s best not to skip any of these steps, in order to understand how Audio Native works and to prevent potential misuse of the service. Don’t worry about getting everything right the first time you configure Audio Native; you can always change the settings later!
1. Navigate to Audio Native
* Go to Audio Native > Settings
* Or by directly navigating to [https://elevenlabs.io/app/audio-native/settings](https://elevenlabs.io/app/audio-native/settings)
2. Configure whitelisted sites - this is the list of website domains that will be permitted to play your content. Your Audio Native players will only work on sites whose URLs begin with the domains you specify in this list.
* Click on Add URL
* If "elevenlabs.io/" is whitelisted, then the Audio Native player will work on any site on the elevenlabs website.
* However, if we want to restrict it to just the blog pages, we can specify that in the whitelist by listing "elevenlabs.io/blog" instead. In this case, the Audio Native player will only work on the blog pages, and not on other elevenlabs.io sites.
* If you try to embed audio native on pages that don't follow that path, you will see the player briefly appear and then disappear, or you may see this message:
You can resolve this by checking that the URL for the page is added to your URL list correctly.
3. Configure your Audio Native player’s appearance and default settings.
* Click "Player" or directly navigate to [https://elevenlabs.io/app/audionative/settings/player](https://elevenlabs.io/app/audionative/settings/player)
* Select a default voice. This is the voice that will be used if you are using Audio Native to automatically convert content from the page it’s embedded in. If you use Audio Native to play content already generated in a Project, it will keep the voice used in the Project.
* You'll get an alert if your selected model is not optimized for the voice you've chosen.
* Customize your player's background and text color. This is how it will be displayed on your website.
* Set a fallback Title and Author to display on your player.
* Optionally add a [pronunciation dictionary](https://elevenlabs.io/docs/projects/overview#pronunciation-dictionaries) to specify the pronunciation of words unique to your brand.
* By default, our embedded player will create a voice over of all of the text content on your page. You can customize the content we target with CSS selectors.
4. Now that you've finished customizing your player, head back to the General tab and grab your embed code.
* You'll use this code snippet to embed Audio Native into the html of any (whitelisted) site you’d like to have voiced by ElevenLabs.
## Deploying Audio Native
To see an example implementation check out our [dubbing studio blog post](https://elevenlabs.io/blog/introducing-dubbing-studio/).
The embedded player automatically collects listening metrics, retention and more. Plus, it can be readily extended to any article through simple copy-pasting.
There are three ways to deploy your Audio Native Player:
### Method 1: Embed and automatically voice the site
Take the code that you generated during Audio Native set up and embed it into your website. The next time the site is opened, Audio Native will:
1. Create a new Project (make sure you have Project slots available otherwise it will throw an error)
2. Grab the webpage’s contents and put it into the newly created Project
3. Convert the Project into audio and deliver it to the Audio Native player
Once this process is complete, you can edit the Audio by editing the resulting Project. To update the audio after saving your edits to the project, select versions and publish the new version.
Here are specific implementation guides for some of the top CMS platforms:
* Audio Native with [Webflow](https://elevenlabs.io/docs/audio-native/webflow)
* Audio Native with [Framer](https://elevenlabs.io/docs/audio-native/framer)
* Audio Native with [Squarespace](https://elevenlabs.io/docs/audio-native/squarespace)
* Audio Native with [Wordpress](https://elevenlabs.io/docs/audio-native/wordpress)
* Audio Native with [Ghost](https://elevenlabs.io/docs/audio-native/ghost)
* Audio Native with [React Native](https://elevenlabs.io/docs/audio-native/audio-native-react)
***
### Method 2: Embed audio from an existing project
If you already have a converted Project and would like to embed that audio using Audio Native, simply open the Project settings and go to the Publish tab. Click Audio Native, then use the toggle to enable Audio Native. Then use the generated embed code to add the Audio Native player to any whitelisted site.
***
### Method 3: Embed audio from content using our API
You can use our API to programmatically create an Audio Native player for your existing content.
Using [this API method](https://elevenlabs.io/docs/api-reference/creates-audionative-enabled-project) you can submit your content as either a .html or .txt file, and we will return an embeddable HTML code snippet for an Audio Native player which can be inserted into your website or blog. Our Audio Native HTML embed code follows a standardised format, with a unique identifier for the uploaded content.
In the background, when this API is called, it will automatically convert your content into an ElevenLabs Project, optionally convert the Project into audio straight away, and then enable sharing of this Project's audio using the returned Audio Native player.
Future edits to the content can be done by calling this method again with the new content, or from within the Projects UI.
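As a rough sketch of what such a call could look like in TypeScript (the multipart field names and response shape shown here are assumptions for illustration; verify them against the API reference linked above), you might upload an HTML file and receive an embeddable snippet back:

```ts
// Rough sketch only: the field names ("name", "file", "auto_convert") and the
// response field ("html_snippet") are assumptions; check the API reference.
import { readFile } from 'node:fs/promises';

const apiKey = process.env.ELEVENLABS_API_KEY ?? '';

const form = new FormData();
form.append('name', 'My blog post');
form.append('auto_convert', 'true'); // convert the created Project into audio right away
form.append('file', new Blob([await readFile('post.html')], { type: 'text/html' }), 'post.html');

const res = await fetch('https://api.elevenlabs.io/v1/audio-native', {
  method: 'POST',
  headers: { 'xi-api-key': apiKey },
  body: form,
});

const { html_snippet } = await res.json();
console.log(html_snippet); // paste this snippet into your page to embed the player
```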
## Updating Audio Native player for existing projects
When you make changes to the default player, this does not apply to existing Audio Native projects. As the settings for the player are saved for each project, these need to be updated individually for existing projects.
Here is how to do this:
1. Open the project from the list on [Audio Native](https://elevenlabs.io/app/audio-native) by clicking "Edit audio".
2. Click "Convert", then select "Share". Click "Audio Player Settings" to open the settings for the player. Make your adjustments and click "Save". These changes should reflect in the player. You may need to refresh the page.
# How to set up an ElevenLabs audio player for your articles in React (Next.js, Vite)
Here's a guide on how you can use [Audio Native](/docs/product/audio-native/overview) in your React projects. I'll be using Next.js, but this process will work for any React project.
Audio Native is an embedded audio player designed to vocalize the content of web pages through ElevenLabs' Text to Speech technology, as shown below.
First, you'll need to create and customize your player, whitelist your url, and copy your embed code. If you need help completing those steps, refer to our [Audio Native overview](https://elevenlabs.io/docs/audio-native/overview).
Once you've gone through the setup, you should see a page like this:
This is the code snippet that is used to embed Audio Native on a normal website such as Wordpress, Ghost, or Webflow. However, you can't use this snippet directly in React.
## Creating the Audio Native React component
Here's a handy component that you can reuse across your project:
```tsx
// ElevenLabsAudioNative.tsx
'use client';

import { useEffect } from 'react';

export type ElevenLabsProps = {
  publicUserId: string;
  textColorRgba?: string;
  backgroundColorRgba?: string;
  size?: 'small' | 'large';
  children?: React.ReactNode;
};

export const ElevenLabsAudioNative = ({
  publicUserId,
  size,
  textColorRgba,
  backgroundColorRgba,
  children,
}: ElevenLabsProps) => {
  useEffect(() => {
    // Load the Audio Native helper script, which finds the widget div below
    // and replaces it with the embedded player.
    const script = document.createElement('script');
    script.src = 'https://elevenlabs.io/player/audioNativeHelper.js';
    script.async = true;
    document.body.appendChild(script);

    return () => {
      // Remove the script on unmount so it isn't loaded twice on remount.
      document.body.removeChild(script);
    };
  }, []);

  return (
    // These data attributes mirror the standard Audio Native embed snippet.
    <div
      id="elevenlabs-audionative-widget"
      data-height={size === 'small' ? '90' : '120'}
      data-width="100%"
      data-frameborder="no"
      data-scrolling="no"
      data-publicuserid={publicUserId}
      data-playerurl="https://elevenlabs.io/player/index.html"
      data-small={size === 'small' ? 'True' : 'False'}
      data-textcolor={textColorRgba ?? 'rgba(0, 0, 0, 1.0)'}
      data-backgroundcolor={backgroundColorRgba ?? 'rgba(255, 255, 255, 1.0)'}
    >
      {children ? children : 'Elevenlabs AudioNative Player'}
    </div>
  );
};

export default ElevenLabsAudioNative;
```
Here's a link to the component on GitHub - [ElevenLabsAudioNative.tsx](https://github.com/elevenlabs/elevenlabs-examples/blob/main/examples/audio-native/react/ElevenLabsAudioNative.tsx)
```tsx
'use client';
import { useEffect } from 'react';
```
We add the `use client` directive at the top of the file. This is mainly for Next.js, as we're using `useEffect`, which can only be used in client-side components.
```tsx
export type ElevenLabsProps = {
publicUserId: string;
textColorRgba?: string;
backgroundColorRgba?: string;
size?: "small" | "large";
children?: React.ReactNode;
};
```
This type defines the props so that we can specify the public user ID (described later), customize colors and size, and set fallback content to show before the player has loaded. You can ignore this if you're not using TypeScript (though TypeScript is great!).
```tsx
useEffect(() => {
const script = document.createElement("script");
script.src = "https://elevenlabs.io/player/audioNativeHelper.js";
script.async = true;
document.body.appendChild(script);
return () => {
document.body.removeChild(script);
};
}, []);
```
To load the Audio Native player, we use the `useEffect` hook to dynamically append a script tag to the body, with its source set to the URL of the Audio Native helper script.
When the component is unmounted, we remove the script tag from the body. This ensures it doesn't get loaded twice if we remount the component.
```tsx
    <div id="elevenlabs-audionative-widget" data-publicuserid={publicUserId}>
      {children ? children : 'Elevenlabs AudioNative Player'}
    </div>
```
This is the main div element (shown here without the full set of data attributes) where the Audio Native player will be rendered. The children of the component can be used to show content before the player has loaded (e.g. "Loading audio player…").
React components are rendered and managed entirely in JavaScript, and their rendering lifecycle is controlled by React's virtual DOM. When you try to include a script tag directly within a React component's JSX, it doesn't behave as it would when included directly in an HTML file. React's virtual DOM does not execute script tags inserted into the DOM as part of component rendering. This is a security feature to prevent unintended or malicious code execution.
This is why, if we were to just paste the Audio Native code snippet into our React application, it would not work.
## Get the public user ID from the Audio Native snippet
Before you can use this component, you'll need to retrieve your public user ID from the code snippet. Go back to [https://elevenlabs.io/audio-native](https://elevenlabs.io/audio-native), and in the code snippet, copy the property called `publicuserid`.
This public user ID is used to identify your Audio Native project.
## Use the Audio Native component
Now that you have the public user ID, you can use the component on your page. Simply import it, then pass it the public user ID from the previous step.
```tsx
import { ElevenLabsAudioNative } from "./path/to/ElevenLabsAudioNative";

export default function Page() {
  return (
    <div>
      <h1>Your Blog Post Title</h1>

      {/* Replace with the public user ID from your embed snippet */}
      <ElevenLabsAudioNative publicUserId="<your-public-user-id>" />

      <p>Your blog post...</p>
    </div>
  );
}
```
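If you'd like to customize the player's appearance, the optional props defined on the component (`size`, `textColorRgba`, `backgroundColorRgba`) can be passed alongside the public user ID. For example (the color values below are placeholders, so adjust them to match your site's theme):

```tsx
import { ElevenLabsAudioNative } from "./path/to/ElevenLabsAudioNative";

export default function Page() {
  return (
    <div>
      <h1>Your Blog Post Title</h1>

      {/* Placeholder styling — small player, white text on a black background */}
      <ElevenLabsAudioNative
        publicUserId="<your-public-user-id>"
        size="small"
        textColorRgba="rgba(255, 255, 255, 1.0)"
        backgroundColorRgba="rgba(0, 0, 0, 1.0)"
      />

      <p>Your blog post...</p>
    </div>
  );
}
```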
### Preview
Start your development server, if you haven't already, and view the page. You should see something similar to the following, stating that the URL is not allowed. (If you don't see anything, please see the Troubleshooting section below to perform a hard refresh)
### Troubleshooting
If you don't see the Audio Native player, try a hard refresh. The development server sometimes doesn't properly reload the script.
In Chrome, the shortcut is (⌘ or Ctrl) + Shift + R.
### Why am I seeing “URL not allowed”?
Here's what's happening behind the scenes. Remember the script we loaded in the `useEffect` hook? That script tries to scrape your page to collect the text and convert it to audio. However, it can't load your page because it's on `localhost`. Audio Native can only process pages that are publicly accessible on the internet.
## Local testing with ngrok
This is where a service such as ngrok can help. ngrok maps your site running on localhost to a public URL on the internet. It has a free tier, so visit [https://ngrok.com](https://ngrok.com), create an account, and install it.
Here's their getting started guide - [https://ngrok.com/docs/getting-started](https://ngrok.com/docs/getting-started)
Once you have it installed, you can use a command similar to the one below to point your local React project to a public URL with ngrok. I'm running Next.js locally on port `3000`, so here's the command I run. Your details may vary.
```
ngrok http http://localhost:3000
```
Running this command will give you a URL that you can use in the next step.
### Update the allowed URLs to include the ngrok URL
1. Go to the Audio Native section: [https://elevenlabs.io/audio-native](https://elevenlabs.io/audio-native)
2. Select the “My Websites” tab.
3. Enter the ngrok URL (from the previous step) in the “Allowed URLs” section.
This ensures that your player can only show on websites that you specify. This is very important, as someone else may otherwise be able to use your public user ID on their website.
Now visit your ngrok URL; you should see Audio Native processing your content. In the background, we are creating a project in your ElevenLabs account just for your page. This project contains the text from your page and converts it to audio.
View the newly created project here:
[https://elevenlabs.io/app/projects](https://elevenlabs.io/app/projects)
## Deploy to production
Once you've deployed your React app and you're ready to push to production, make sure to also add your website's URL to the allowed URLs.
ngrok was only needed for local development; it's not needed for public-facing URLs, as ElevenLabs will fetch the content directly from your website.
## Updating audio content
When you update the content on a page, you may notice that the audio in the Audio Native player doesn't update automatically.
To update the audio, you'll need to open the corresponding project in ElevenLabs and update its content manually: [https://elevenlabs.io/app/projects](https://elevenlabs.io/app/projects)
## Conclusion
Now that you have Audio Native working in your React project, go ahead and add the component to more pages on your website to start converting your content into high-quality audio for your visitors.
# How to set up an ElevenLabs audio player for your articles in Ghost
Before adding Audio Native to Ghost, you'll need to create & customize your player, whitelist your blog's domain, and copy your embed code. If you need help completing those steps, refer to our [Audio Native overview](https://elevenlabs.io/docs/audio-native/overview).
Now that you've created & customized your Audio Native player, navigate to your Ghost blog, sign in, and open the blog post you wish to narrate in the editor.
Next, add a line near the top of your blog post (above or below the main image is usually best). Click the “+” symbol on the left side and select “HTML” from the menu.
Paste your Audio Native Embed code into the HTML box, as shown below, and press enter or click away.
Click the “Update” button in the top right corner of the editor, which should now be highlighted in green text.
Now, navigate to the live version of the blog post you just updated. You should see a message to let you know that the Audio Native project is being created. This means the text in your blog post is being converted to an audio article.
After a few minutes, the embedded audio player will appear and you can click play to hear the AI-generated audio blog.
Follow these steps for any Ghost blog posts that you wish to turn into audio articles.
# How to set up an ElevenLabs audio player for your articles in Squarespace
Before adding Audio Native to Squarespace, you'll need to create & customize your player, whitelist your blog's domain, and copy your embed code. If you need help completing those steps, refer to our [Audio Native overview](https://elevenlabs.io/docs/audio-native/overview).
Now that you've created & customized your Audio Native player, navigate to your Squarespace blog, sign in, and open the blog post you wish to narrate in the editor.
Next, add a line near the top of your blog post (below the header is usually best). Click the “+” symbol and select "Code" from the menu.
Paste your Audio Native Embed code into the HTML box, as shown below, and press enter or click away.
Click the Save button at the top of the editor, which should now be highlighted.
Now, navigate to the live version of the blog post you just updated. You should see a message to let you know that the Audio Native project is being created. This means the text in your blog post is being converted to an audio article.
After a few minutes, the embedded audio player will appear and you can click play to hear the AI-generated audio blog.
Follow these steps for any Squarespace blog posts that you wish to turn into audio articles.
# How to set up an ElevenLabs audio player for your articles in Framer
Before adding Audio Native to Framer, you'll need to create & customize your player, whitelist your blog's domain, and copy your embed code. If you need help completing those steps, refer to our [Audio Native overview](https://elevenlabs.io/docs/audio-native/overview).
Now that you've created & customized your Audio Native player, navigate to Framer and go to Site Settings.