Overview
ElevenLabs offers high-quality pre-made voices, a Voice Design feature for creating unique voices, and two types of voice cloning: Instant Voice Cloning and Professional Voice Cloning.
The Voice Design feature lets users select gender, age, and accent to generate unique voices. It may take a few tries to find the perfect fit. Keep in mind that each time you press generate
When cloning a voice, keep in mind how the AI is trained: to get a good clone, focus on audio quality over length. Clear, well-paced speech from a single speaker, at a consistent volume and free of background noise, chatter, reverb, or other effects, usually yields better results. Be aware that voices with uncommon accents or highly dynamic speech may be harder to clone well.
Any voice can speak any language, but the AI will try to mimic the accent of the original voice. For example, if you clone a voice with an American accent and have it speak Spanish, the generated audio may have an American accent.
Pre-made Voices
- High-quality, free-to-use voices
- Suitable for most use-cases
- Trained on English voices. Can be used with other languages, but may carry an English accent or mispronounce words
- Can be shared in the Voice Library, where you can earn back characters from your used quota when other users use your shared voice
Generated Voices (Voice Design)
- Custom voice creation with gender, age, and accent options
- Offers a selection of different English accents to choose from
- Quality comparable to pre-made and cloned voices
- May require multiple attempts to find the desired voice
Instant Cloned Voices
- Creates a clone of a voice almost instantly
- Audio quality of the samples is crucial for proper cloning
- Consistency of the recordings is more important than the total runtime
- A good total runtime is about 1-3 minutes of audio
- Too much audio can make the voice much less consistent
- Results can be less predictable for speech with a wide dynamic range or a broad emotional range
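Because the guideline above is stated as a total runtime across all samples, it can help to add up your recordings before uploading them. The following is a minimal sketch, not part of any ElevenLabs tooling: it assumes your samples are uncompressed (PCM) WAV files readable by Python's standard `wave` module, and the function names `wav_duration_seconds`, `classify_total_runtime`, and `check_samples` are hypothetical. It only checks length; it says nothing about audio quality or consistency, which matter more.

```python
import wave

# Thresholds taken from the guideline above (about 1-3 minutes in total).
MIN_SECONDS = 60.0
MAX_SECONDS = 180.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed (PCM) WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_total_runtime(total_seconds: float) -> str:
    """Compare a combined sample runtime with the 1-3 minute guideline."""
    if total_seconds < MIN_SECONDS:
        return "shorter than recommended"
    if total_seconds > MAX_SECONDS:
        return "longer than recommended"
    return "within recommended range"


def check_samples(paths: list[str]) -> tuple[float, str]:
    """Sum the durations of all sample files and classify the total."""
    total = sum(wav_duration_seconds(p) for p in paths)
    return total, classify_total_runtime(total)
```

For example, `check_samples(["sample1.wav", "sample2.wav"])` (hypothetical file names) returns the combined runtime in seconds together with a short verdict.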
Professionally Cloned Voices
- Creates a near-perfect clone of a voice
- Audio quality of samples is important for proper cloning
- The minimum recommended length of audio is 30 minutes; ideally, provide closer to 3 hours of high-quality, consistent audio
- Results can be less predictable for speech with a wide dynamic range or a broad emotional range
- Can be shared in the Voice Library, where you can earn back characters from your used quota when other users use your shared voice
- Needs to be trained and fine-tuned, which takes an estimated ~4 weeks
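To see how far a Professional Voice Cloning dataset is from the figures above (a 30-minute minimum, with closer to 3 hours recommended), you can total your clip durations and compute the shortfall. This is a minimal sketch under stated assumptions: it takes clip lengths in seconds that you have already measured, and the names `DatasetReport` and `pvc_dataset_report` are hypothetical, not an ElevenLabs API. As with instant cloning, it measures only quantity, not the quality or consistency of the audio.

```python
from dataclasses import dataclass

# Figures from the list above: at least 30 minutes, ideally close to 3 hours.
MINIMUM_MINUTES = 30.0
RECOMMENDED_MINUTES = 180.0


@dataclass
class DatasetReport:
    total_minutes: float
    minutes_to_minimum: float      # 0.0 once the 30-minute minimum is met
    minutes_to_recommended: float  # 0.0 once roughly 3 hours is reached


def pvc_dataset_report(clip_seconds: list[float]) -> DatasetReport:
    """Summarise how a set of clip durations (in seconds) compares with the guideline."""
    total_minutes = sum(clip_seconds) / 60
    return DatasetReport(
        total_minutes=total_minutes,
        minutes_to_minimum=max(0.0, MINIMUM_MINUTES - total_minutes),
        minutes_to_recommended=max(0.0, RECOMMENDED_MINUTES - total_minutes),
    )
```

For instance, clips of 1200, 900, and 600 seconds add up to 45 minutes: the minimum is met, and about 135 more minutes would bring the dataset close to the recommended 3 hours.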