Voice design

Overview
Voice Design helps creators fill the gaps when the exact voice they are looking for isn’t available in the Voice Library. If you can’t find a suitable voice for your project, you can create one. Note that Voice Design is highly experimental and Professional Voice Clones (PVC) are currently the highest quality voices on our platform. If there is a PVC available in our library that fits your needs, we recommend using it instead.
You can find Voice Design by heading to Voices -> My Voices -> Add a new voice -> Voice Design in the ElevenLabs app or via the API.
When you hit generate, we’ll generate three voice options for you. The only charge for using Voice Design is the credits needed to generate your preview text, and you are charged for it only once even though we generate three samples. You can see the number of characters that will be deducted in the “Text to preview” text box.
After generating, you’ll have the option to select and save one of the generations, which will take up one of your voice slots.
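The billing and saving rules above can be modelled in a few lines. This is a minimal offline sketch, not a client for the ElevenLabs API. The helper names (`preview_cost`, `generate_previews`, `save_preview`) are hypothetical, and the sketch assumes one credit per character of preview text, as the character count in the “Text to preview” box suggests.

```python
from dataclasses import dataclass
from typing import List

SAMPLES_PER_GENERATION = 3  # Voice Design returns three options per generation


@dataclass
class Generation:
    previews: List[str]  # hypothetical identifiers for the three samples
    credits_charged: int


def preview_cost(preview_text: str) -> int:
    # Assumption: one credit per character of preview text.
    return len(preview_text)


def generate_previews(preview_text: str) -> Generation:
    # The preview text is billed once, not once per sample.
    previews = [f"preview-{i + 1}" for i in range(SAMPLES_PER_GENERATION)]
    return Generation(previews=previews, credits_charged=preview_cost(preview_text))


def save_preview(generation: Generation, choice: int, free_slots: int) -> int:
    """Save one preview as a voice; returns the number of voice slots left."""
    if not 0 <= choice < len(generation.previews):
        raise IndexError("choice must refer to one of the generated previews")
    if free_slots < 1:
        raise ValueError("no free voice slot to save into")
    return free_slots - 1


gen = generate_previews("Hello there, this is a preview.")
print(gen.credits_charged)  # 31: charged once, not 3 x 31
```

Saving is modelled separately from generating because, as described above, only the preview you keep takes up a voice slot.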
See the API reference for Voice Design
A Next.js example app for Voice Design
Prompting guide
The prompt is the foundation of your voice. It tells the model what kind of voice you’re trying to create: everything from the accent and character type to the gender and vibe of the voice. A well-crafted prompt can be the difference between a generic voice and one that truly fits your vision. In general, more descriptive and granular prompts tend to yield more accurate and nuanced results. The more detail you provide (including age, gender, tone, accent, pacing, emotion, style, and more), the better the model can interpret your intent and deliver a voice that feels intentional and tailored.
However, sometimes short and simple prompts can also work, especially when you’re aiming for a more neutral or broadly usable voice. For example, “A calm male narrator” might give you exactly what you need without going into detail — particularly if you’re not trying to create a very specific character or style. The right level of detail depends on your use case. Are you building a fantasy character? A virtual assistant? A tired New Yorker in her 60s with a dry sense of humor? The more clearly you define it in your prompt, the closer the output will be to what you’re imagining.
Audio Quality
Audio quality refers to the clarity, cleanliness, and overall fidelity of the generated voice. By default, ElevenLabs aims to produce clean and natural-sounding audio — but if your project requires a specific level of quality, it’s best to explicitly include it in your prompt.
For high-quality results, you can help the model by adding a phrase such as “perfect audio quality” or “studio-quality recording” to your voice description. This helps ensure the voice is rendered with maximum clarity, minimal distortion, and a polished finish.
However, if the voice is very specific or niche, including these phrases can sometimes reduce how accurately the model follows the rest of the prompt.
There may also be creative cases where lower audio quality is intentional, such as when simulating a phone call, old radio broadcast, or found footage. In those situations, either leave out quality descriptors entirely or explicitly include phrases like:
- “Low-fidelity audio”
- “Poor audio quality”
- “Sounds like a voicemail”
- “Muffled and distant, like on an old tape recorder”
The placement of this phrase in your prompt is flexible: we’ve found it works well at either the beginning or the end.
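If you build prompts programmatically, the quality phrase can be attached at either end. The sketch below uses a hypothetical helper, `with_audio_quality`; passing `None` leaves quality unspecified, which covers the case of simply omitting quality descriptors.

```python
from typing import Optional


def with_audio_quality(description: str,
                       phrase: Optional[str] = "perfect audio quality",
                       at_start: bool = False) -> str:
    """Attach an audio-quality phrase to a voice description (hypothetical helper)."""
    base = description.strip().rstrip(".") + "."
    if not phrase:
        return base  # leave audio quality unspecified
    sentence = phrase[0].upper() + phrase[1:].rstrip(".") + "."
    # Placement is flexible: put the phrase before or after the description.
    return f"{sentence} {base}" if at_start else f"{base} {sentence}"


print(with_audio_quality("A calm male narrator"))
print(with_audio_quality("A gruff detective", "muffled and distant, like on an old tape recorder"))
```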
Age, Tone/Timbre and Gender
These three characteristics are the foundation of voice design, shaping the overall identity and emotional resonance of the voice. The more detail you provide, the easier it is for the AI to produce a voice that fits your creative vision — whether you’re building a believable character, crafting a compelling narrator, or designing a virtual assistant.
Age
Describing the perceived age of the voice helps define its maturity, vocal texture, and energy. Use specific terms to guide the AI toward the right vocal quality.
Useful descriptors:
- “Adolescent male” / “adolescent female”
- “Young adult” / “in their 20s” / “early 30s”
- “Middle-aged man” / “woman in her 40s”
- “Elderly man” / “older woman” / “man in his 80s”
Tone/Timbre
Refers to the physical quality of the voice, shaped by pitch, resonance, and vocal texture. It’s distinct from emotional delivery or attitude.
Common tone/timbre descriptors:
- “Deep” / “low-pitched”
- “Smooth” / “rich”
- “Gravelly” / “raspy”
- “Nasally” / “shrill”
- “Airy” / “breathy”
- “Booming” / “resonant”
- “Light” / “thin”
- “Warm” / “mellow”
- “Tinny” / “metallic”
Gender
Gender typically influences pitch, vocal weight, and tonal presence, but you can push beyond simple categories by describing the sound instead of the identity.
Examples:
- “A lower-pitched, husky female voice”
- “A masculine male voice, deep and resonant”
- “A neutral gender — soft and mid-pitched”
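Age, gender, and tone/timbre descriptors combine naturally into one description. The following is a minimal sketch with hypothetical helper names; the a/an choice is a simple vowel heuristic, not a full English rule.

```python
from typing import Sequence


def join_descriptors(words: Sequence[str]) -> str:
    """['deep', 'resonant'] -> 'deep and resonant'."""
    if len(words) <= 1:
        return "".join(words)
    return ", ".join(words[:-1]) + " and " + words[-1]


def describe_voice(age: str, gender: str, timbres: Sequence[str] = ()) -> str:
    """Combine age, gender and tone/timbre into a single voice description."""
    article = "An" if age[:1].lower() in "aeiou" else "A"  # heuristic only
    description = f"{article} {age} {gender} voice"
    if timbres:
        description += ", " + join_descriptors(timbres)
    return description + "."


print(describe_voice("middle-aged", "male", ["deep", "resonant"]))
print(describe_voice("elderly", "female", ["warm", "smooth", "low-pitched"]))
```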
Accent
Accent plays a critical role in defining a voice’s regional, cultural, and emotional identity. If your project depends on an authentic sound — whether it’s grounded in realism or stylized for character — being clear and deliberate about the desired accent is essential.
Phrase choice matters: certain terms tend to produce more consistent results. For example, “thick” often yields better results than “strong” when describing how prominent an accent should be. Expect some trial and error; we encourage you to experiment with the wording and to be as creative and descriptive as possible.
Examples of clear accent prompts:
- “A middle-aged man with a thick French accent”
- “A young woman with a slight Southern drawl”
- “An old man with a heavy Eastern European accent”
- “A cheerful woman speaking with a crisp British accent”
- “A younger male with a soft Irish lilt”
- “An authoritative voice with a neutral American accent”
- “A man with a regional Australian accent, laid-back and nasal”
Avoid overly vague descriptors like “foreign” or “exotic” — they’re imprecise and can produce inconsistent results.
Combine accent with other traits like tone, age, or pacing for better control. E.g., “A sarcastic old woman with a thick New York accent, speaking slowly.”
For fantasy or fictional voices, you can suggest real-world accents as inspiration:
- “An elf with a proper thick British accent. He is regal and lyrical.”
- “A goblin with a raspy Eastern European accent.”
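The wording advice in this section can be turned into a quick prompt check. This is a hypothetical heuristic, not an official validator: it flags the vague terms named above (“foreign”, “exotic”) and suggests “thick” where “strong” is used to describe an accent.

```python
import re
from typing import List

VAGUE_ACCENT_TERMS = ("foreign", "exotic")  # from the guidance above


def review_accent_prompt(prompt: str) -> List[str]:
    """Return wording suggestions for a voice prompt (heuristic only)."""
    notes = []
    for term in VAGUE_ACCENT_TERMS:
        if re.search(rf"\b{term}\b", prompt, re.IGNORECASE):
            notes.append(f"'{term}' is vague; name a specific accent instead")
    # "strong ... accent" within one sentence: suggest "thick" instead.
    if re.search(r"\bstrong\b[^.]*\baccent\b", prompt, re.IGNORECASE):
        notes.append("consider 'thick' instead of 'strong' for accent strength")
    return notes


print(review_accent_prompt("A woman with a strong French accent"))
```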
Pacing
Pacing refers to the speed and rhythm at which a voice speaks. It’s a key component in shaping the personality, emotional tone, and clarity of the voice. Being explicit about pacing is essential, especially when designing voices for specific use cases like storytelling, advertising, character dialogue, or instructional content.
Use clear language to describe how fast or slow the voice should speak. You can also describe how the pacing feels — whether it’s steady, erratic, deliberate, or breezy. If the pacing shifts, be sure to indicate where and why.
Examples of pacing descriptors:
- “Speaking quickly” / “at a fast pace”
- “At a normal pace” / “speaking normally”
- “Speaking slowly” / “with a slow rhythm”
- “Deliberate and measured pacing”
- “Drawn out, as if savoring each word”
- “With a hurried cadence, like they’re in a rush”
- “Relaxed and conversational pacing”
- “Rhythmic and musical in pace”
- “Erratic pacing, with abrupt pauses and bursts”
- “Even pacing, with consistent timing between words”
- “Staccato delivery”
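To keep pacing wording consistent across many prompts, you can map a few pace levels to descriptors taken from the list above. The mapping and the `add_pacing` helper are illustrative assumptions; choose whichever phrasings work best for your voices.

```python
PACING_DESCRIPTORS = {
    "fast": "speaking quickly",
    "normal": "at a normal pace",
    "slow": "speaking slowly",
    "deliberate": "deliberate and measured pacing",
    "erratic": "erratic pacing, with abrupt pauses and bursts",
}


def add_pacing(description: str, pace: str) -> str:
    """Append a pacing descriptor to a voice description (hypothetical helper)."""
    try:
        phrase = PACING_DESCRIPTORS[pace]
    except KeyError:
        raise ValueError(f"unknown pace {pace!r}; expected one of {sorted(PACING_DESCRIPTORS)}") from None
    return f"{description.strip().rstrip('.')}, {phrase}."


print(add_pacing("A sarcastic old woman with a thick New York accent", "slow"))
```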
Text to preview
Once you’ve written a strong voice prompt, the text you use to preview that voice plays a crucial role in shaping how it actually sounds. The preview text acts like a performance script — it sets the tone, pacing, and emotional delivery that the voice will attempt to match.
To get the best results, your preview text should complement the voice description, not contradict it. For example, if your prompt describes a “calm and reflective younger female voice with a slight Japanese accent,” using a sentence like “Hey! I can’t stand what you’ve done with the darn place!!!” will clash with that intent. The model will try to reconcile that mismatch, often leading to unnatural or inconsistent results.
Instead, use sample text that reflects the voice’s intended personality and emotional tone. For the example above, something like “It’s been quiet lately… I’ve had time to think, and maybe that’s what I needed most.” supports the prompt and helps generate a more natural, coherent voice.
Additionally, we’ve found that longer preview texts tend to produce more stable and expressive results. Short phrases can sometimes sound abrupt or inconsistent, especially when testing subtle qualities like tone or pacing. Giving the model more context — a full sentence or even a short paragraph — allows it to deliver a smoother and more accurate representation of the voice.
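Both pieces of advice (match the preview text to the prompt, and prefer longer text) can be approximated by a rough pre-flight check. Everything here is a hypothetical heuristic: the 12-word minimum, the list of “calm” words, and the rule that two or more exclamation marks signal excitement are illustrative choices, not documented thresholds.

```python
import re
from typing import List

CALM_WORDS = ("calm", "reflective", "soft", "relaxed", "gentle")  # illustrative


def preview_text_warnings(voice_description: str, preview_text: str,
                          min_words: int = 12) -> List[str]:
    """Flag preview text that is short or clashes with a calm description."""
    warnings = []
    if len(preview_text.split()) < min_words:
        warnings.append("preview text is short; try a full sentence or a short paragraph")
    calm = any(re.search(rf"\b{w}\b", voice_description, re.IGNORECASE) for w in CALM_WORDS)
    if calm and preview_text.count("!") >= 2:
        warnings.append("excited punctuation may clash with a calm voice description")
    return warnings


desc = "A calm and reflective younger female voice with a slight Japanese accent"
print(preview_text_warnings(desc, "Hey! I can't stand what you've done with the darn place!!!"))
```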
Parameters
Loudness
Controls the volume of the preview generation and, ultimately, of the voice once it is saved.
Guidance Scale
Dictates how closely the prompt is adhered to. Higher values stick to the prompt more strictly but can result in poorer audio quality if the prompt is very niche, while lower values allow the model to be more creative at the cost of prompt accuracy. Use a lower value if performance and audio quality matter more than nailing the prompt; higher values are recommended when accent or tone accuracy is of paramount importance.
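The Guidance Scale trade-off can be expressed as a direction to move the value. This sketch deliberately does not assume the parameter’s real range or step size: the caller supplies the bounds (check the API reference), and `Priority` and `nudge_guidance` are hypothetical names.

```python
from enum import Enum


class Priority(Enum):
    PROMPT_ACCURACY = "prompt accuracy"  # accent or tone must be exact
    AUDIO_QUALITY = "audio quality"      # performance and clean audio matter most


def nudge_guidance(current: float, priority: Priority, step: float,
                   minimum: float, maximum: float) -> float:
    """Move Guidance Scale one step toward the stated priority, within bounds."""
    delta = step if priority is Priority.PROMPT_ACCURACY else -step
    return min(maximum, max(minimum, current + delta))


print(nudge_guidance(5, Priority.PROMPT_ACCURACY, 1, 0, 10))
```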
Attributes and Examples
Experiment with the way in which these descriptors are written. For example, “Perfect audio quality” can also be written as “the audio quality is perfect”. These can sometimes produce different results!
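To compare phrasings systematically, you can expand a template with every combination of alternative wordings and generate a preview for each. This is a minimal sketch using `itertools.product`; `prompt_variants` is a hypothetical helper.

```python
from itertools import product
from typing import List


def prompt_variants(template: str, **alternatives: List[str]) -> List[str]:
    """Fill `template` with every combination of the alternative phrasings."""
    keys = list(alternatives)
    combos = product(*(alternatives[key] for key in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]


for prompt in prompt_variants(
    "{voice}. {quality}.",
    voice=["A calm male narrator"],
    quality=["Perfect audio quality", "The audio quality is perfect"],
):
    print(prompt)
```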