ElevenLabs offers high-quality pre-made voices, a Voice Design feature that lets you create unique voices, and two types of voice cloning: Instant Voice Cloning and Professional Voice Cloning.

The Voice Design feature lets you select a gender, age, and accent to generate a unique voice. It may take a few tries to find the perfect fit. Keep in mind that each time you press generate

When cloning a voice, it’s crucial to consider what the AI is being trained on: to get a good clone, focus on audio quality over length. Clear, well-paced speech from a single voice, recorded at a consistent volume and without background noise, chatter, reverb, or other effects, usually yields better results. Be mindful of potential limitations when cloning voices with uncommon accents or highly dynamic speech.
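One way to sanity-check the "consistent volume" advice before uploading samples is to measure how much the loudness varies across a recording. The sketch below is only an illustration under stated assumptions and is not part of ElevenLabs' tooling: it assumes mono 16-bit PCM WAV input, and the helper names (`rms_dbfs`, `window_levels`, `volume_spread_db`, `read_mono_16bit`), the one-second window, and the -60 dB silence floor are hypothetical choices made for this example.

```python
import math
import sys
import wave
from array import array


def rms_dbfs(samples, full_scale=32768.0):
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / full_scale)


def window_levels(samples, frame_rate, window_seconds=1.0):
    """Split samples into fixed-length windows and return each window's RMS level."""
    size = max(1, int(frame_rate * window_seconds))
    return [rms_dbfs(samples[i:i + size]) for i in range(0, len(samples), size)]


def volume_spread_db(levels, silence_floor_db=-60.0):
    """Difference in dB between the loudest and quietest non-silent windows."""
    # The silence floor is an arbitrary assumption so that pauses are ignored.
    voiced = [lvl for lvl in levels if lvl > silence_floor_db]
    if len(voiced) < 2:
        return 0.0
    return max(voiced) - min(voiced)


def read_mono_16bit(path):
    """Read a mono 16-bit PCM WAV file; other formats are out of scope here."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        samples = array("h", wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return samples, wav.getframerate()


# Example (the path is hypothetical):
# samples, rate = read_mono_16bit("sample.wav")
# print(volume_spread_db(window_levels(samples, rate)))
```

A smaller spread suggests steadier volume. What counts as acceptable is a judgment call and is not specified here.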

Although any voice can speak any language, it is important to note that the AI will try to mimic the accent of the original voice. So, if you have cloned a voice with an American accent and have it speak Spanish, the generated audio might have an American accent.

Pre-made Voices

High-quality, free-to-use voices

Suitable for most use cases

Trained on English voices; can be used with other languages, but may have an English accent or incorrect pronunciation

Can be shared in the Voice Library, where you can earn back characters from your used quota when other users use your shared voice

Generated Voices (Voice Design)

Custom voice creation with gender, age, and accent options

Includes a selection of English accents to choose from

Quality comparable to pre-made and cloned voices

May require multiple attempts to find the desired voice

Instant Cloned Voices

Creates a clone of a voice almost instantly

Audio quality of the samples is crucial for proper cloning

Consistency of the recordings is more important than the total runtime

A good total runtime for the audio is about 1-3 minutes

Too much audio can make the voice much less consistent

Results can be less predictable with wide dynamic range and broad emotional speech
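The 1-3 minute guideline above can be checked before uploading by summing the duration of the sample files. This is a minimal sketch, assuming the samples are WAV files readable by Python's standard `wave` module; the function names and the status labels are hypothetical and chosen only for this example.

```python
import wave

# Range taken from the guidance above: roughly 1-3 minutes of audio in total.
MIN_SECONDS = 60
MAX_SECONDS = 180


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def assess_total_runtime(durations):
    """Classify the combined runtime of all samples against the 1-3 minute range."""
    total = sum(durations)
    if total < MIN_SECONDS:
        return total, "too short"
    if total > MAX_SECONDS:
        return total, "longer than needed"
    return total, "within range"


# Example (the paths are hypothetical):
# paths = ["take1.wav", "take2.wav"]
# print(assess_total_runtime(wav_duration_seconds(p) for p in paths))
```

The check only covers runtime. Audio quality and consistency, which the list above ranks as more important, still need to be judged by ear.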

Professionally Cloned Voices

Creates a near-perfect clone of a voice

Audio quality of samples is important for proper cloning

Recommended minimum length of audio is 30 minutes; the ideal is closer to 3 hours of high-quality, consistent audio

Results can be less predictable with wide dynamic range and broad emotional speech

Can be shared in the Voice Library, where you can earn back characters from your used quota when other users use your shared voice

Needs to be trained and fine-tuned, which takes an estimated ~4 weeks
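When collecting audio for a professional clone, it can help to track progress against the two lengths mentioned above: 30 minutes and roughly 3 hours. The sketch below is a hypothetical helper written for this example, not an ElevenLabs API. It takes the total minutes of audio gathered so far, and its name and message wording are assumptions.

```python
MIN_MINUTES = 30            # recommended minimum stated in the list above
RECOMMENDED_MINUTES = 180   # "closer to 3 hours"


def pvc_audio_status(total_minutes):
    """Summarise collected audio against the 30-minute and 3-hour marks."""
    if total_minutes < MIN_MINUTES:
        missing = MIN_MINUTES - total_minutes
        return f"below minimum: {missing:g} more minutes needed"
    if total_minutes < RECOMMENDED_MINUTES:
        remaining = RECOMMENDED_MINUTES - total_minutes
        return f"meets minimum: {remaining:g} minutes short of the recommended length"
    return "meets recommended length"


# Example:
# print(pvc_audio_status(45))
```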