Prompting Eleven v3 (alpha)

Learn how to prompt and use audio tags with our most advanced model.

This guide provides the most effective tags and techniques for prompting Eleven v3, including voice selection, changes in capitalization, punctuation, audio tags and multi-speaker dialogue. Experiment with these methods to discover what works best for your specific voice and use case.

Eleven v3 is in alpha. Very short prompts are more likely to cause inconsistent outputs. We encourage you to experiment with prompts greater than 250 characters.
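
The length guidance above can be checked before a prompt is sent. The sketch below is illustrative only: the helper name `is_long_enough` is hypothetical, and it assumes that every character counts toward the length, including audio tags and whitespace inside the prompt, which the guide does not specify.

```python
MIN_RECOMMENDED_CHARS = 250  # threshold taken from the guidance above


def is_long_enough(prompt: str, minimum: int = MIN_RECOMMENDED_CHARS) -> bool:
    """Return True when the prompt is longer than the recommended minimum.

    Assumption: tags such as [whispers] count toward the length.
    """
    return len(prompt.strip()) > minimum


short_prompt = "[whispers] Hello there."
print(is_long_enough(short_prompt))  # False
```

A prompt that fails this check can still be generated; it is only more likely to give inconsistent output.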

Voice selection

The most important parameter for Eleven v3 is the voice you choose. Its natural delivery needs to be close to the delivery you want. For example, if the voice is shouting and you use the audio tag [whispers], it likely won’t work well.

When creating Instant Voice Clones (IVCs), include a broader emotional range than you would have for earlier models. As a result, voices in the Voice Library may produce more variable results than they did with the v2 and v2.5 models. We’ve compiled over 22 excellent voices for v3 here.

Choose voices strategically based on your intended use:

  • For expressive IVC voices, vary emotional tones across the recording: include both neutral and dynamic samples.
  • For specific use cases like sports commentary, maintain consistent emotion throughout the dataset.
  • Neutral voices tend to be more stable across languages and styles, providing reliable baseline performance.

Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview stage, it is best to use an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features.

Settings

Stability

The stability slider is the most important setting in v3, controlling how closely the generated voice adheres to the original reference audio.

Stability settings in Eleven v3

  • Creative: More emotional and expressive, but prone to hallucinations.
  • Natural: Closest to the original voice recording—balanced and neutral.
  • Robust: Highly stable and consistent, similar to v2, but less responsive to directional prompts.

For maximum expressiveness with audio tags, use Creative or Natural settings. Robust reduces responsiveness to directional prompts.
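
As a rough illustration of the recommendation above, the following sketch flags prompts that use audio tags together with the Robust setting. The `Stability` enum and `check_stability` function are hypothetical names introduced for this example; they do not correspond to any API parameter, and the bracket-matching pattern is a simplification.

```python
import re
from enum import Enum
from typing import Optional


class Stability(Enum):
    # Labels mirror the three settings described above; the values are
    # placeholders, not API values.
    CREATIVE = "creative"
    NATURAL = "natural"
    ROBUST = "robust"


# Assumption: an audio tag is any non-empty text inside square brackets.
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def check_stability(prompt: str, stability: Stability) -> Optional[str]:
    """Return a warning when audio tags are combined with Robust, else None."""
    has_tags = TAG_PATTERN.search(prompt) is not None
    if has_tags and stability is Stability.ROBUST:
        return "Robust reduces responsiveness to audio tags; consider Creative or Natural."
    return None


print(check_stability("[whispers] Can you hear me?", Stability.ROBUST))
print(check_stability("[whispers] Can you hear me?", Stability.NATURAL))  # None
```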

Audio tags

Eleven v3 introduces emotional control through audio tags. You can direct voices to laugh, whisper, act sarcastic, or express curiosity among many other styles. Speed is also controlled through audio tags.

The voice you choose and its training samples will affect tag effectiveness. Some tags work well with certain voices while others may not. Don’t expect a whispering voice to suddenly shout with a [shout] tag.

These tags control vocal delivery and emotional expression:

  • [laughs], [laughs harder], [starts laughing], [wheezing]
  • [whispers]
  • [sighs], [exhales]
  • [sarcastic], [curious], [excited], [crying], [snorts], [mischievously]
Example
[whispers] I never knew it could be this way, but I'm glad we're here.

Sound effects

Add environmental sounds and effects:

  • [gunshot], [applause], [clapping], [explosion]
  • [swallows], [gulps]
Example
[applause] Thank you all for coming tonight! [gunshot] What was that?

Unique and special

Experimental tags for creative applications:

  • [strong X accent] (replace X with desired accent)
  • [sings], [woo], [fart]
Example
[strong French accent] "Zat's life, my friend — you can't control everysing."

Some experimental tags may be less consistent across different voices. Test thoroughly before production use.
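
When testing tags across voices, it can help to see which tags in a prompt come from the lists in this guide and which are your own experiments. The sketch below is one way to do that. `LISTED_TAGS` simply copies the tags shown above; the function names are hypothetical, and a tag being unlisted does not mean it will not work.

```python
import re

# Tags copied from the lists in this guide.
LISTED_TAGS = {
    "laughs", "laughs harder", "starts laughing", "wheezing",
    "whispers", "sighs", "exhales",
    "sarcastic", "curious", "excited", "crying", "snorts", "mischievously",
    "gunshot", "applause", "clapping", "explosion", "swallows", "gulps",
    "sings", "woo", "fart",
}
# Matches the "[strong X accent]" form, where X is any accent name.
ACCENT_PATTERN = re.compile(r"^strong .+ accent$")
# Assumption: a tag is any non-empty text inside square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(prompt: str) -> list:
    """Return every bracketed tag in the prompt, lowercased, in order."""
    return [tag.strip().lower() for tag in TAG_PATTERN.findall(prompt)]


def unlisted_tags(prompt: str) -> list:
    """Return tags that do not appear in the lists in this guide."""
    return [
        tag for tag in extract_tags(prompt)
        if tag not in LISTED_TAGS and not ACCENT_PATTERN.match(tag)
    ]


prompt = "[applause] Thank you all for coming tonight! [gunshot] What was that? [gasps]"
print(extract_tags(prompt))   # ['applause', 'gunshot', 'gasps']
print(unlisted_tags(prompt))  # ['gasps']
```

Unlisted tags are worth keeping track of so they can be tested against each voice before production use.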

Punctuation

Punctuation significantly affects delivery in v3:

  • Ellipses (…) add pauses and weight
  • Capitalization increases emphasis
  • Standard punctuation provides natural speech rhythm
Example
"It was a VERY long day [sighs] … nobody listens anymore."

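To review how much emphasis and pausing a prompt asks for, the punctuation cues above can be counted. This is a minimal sketch with hypothetical names. It treats any word of two or more capital letters as emphasis, so acronyms such as "AM" are counted too, and it accepts both the single ellipsis character and three periods.

```python
import re

CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")   # fully capitalized words
ELLIPSIS = re.compile(r"…|\.{3}")          # "…" or "..."
TAG = re.compile(r"\[[^\[\]]+\]")          # bracketed audio tags


def delivery_cues(prompt: str) -> dict:
    """Count emphasis and pause cues, ignoring text inside audio tags."""
    text = TAG.sub(" ", prompt)
    return {
        "emphasized": CAPS_WORD.findall(text),
        "pauses": len(ELLIPSIS.findall(text)),
    }


sample = "It was a VERY long day [sighs] … nobody listens anymore."
print(delivery_cues(sample))  # {'emphasized': ['VERY'], 'pauses': 1}
```
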
Single speaker examples

Use tags intentionally and match them to the voice’s character. A meditative voice shouldn’t shout; a hyped voice won’t whisper convincingly.

"Okay, you are NOT going to believe this.
You know how I've been totally stuck on that short story?
Like, staring at the screen for HOURS, just... nothing?
[frustrated sigh] I was seriously about to just trash the whole thing. Start over.
Give up, probably. But then!
Last night, I was just doodling, not even thinking about it, right?
And this one little phrase popped into my head. Just... completely out of the blue.
And it wasn't even for the story, initially.
But then I typed it out, just to see. And it was like... the FLOODGATES opened!
Suddenly, I knew exactly where the character needed to go, what the ending had to be...
It all just CLICKED. [happy gasp] I stayed up till, like, 3 AM, just typing like a maniac.
Didn't even stop for coffee! [laughs] And it's... it's GOOD! Like, really good.
It feels so... complete now, you know? Like it finally has a soul.
I am so incredibly PUMPED to finish editing it now.
It went from feeling like a chore to feeling like... MAGIC. Seriously, I'm still buzzing!"

Multi-speaker dialogue

v3 handles multi-voice prompts effectively. Assign a distinct voice from your Voice Library to each speaker to create realistic conversations.

Speaker 1: [excitedly] Sam! Have you tried the new Eleven V3?
Speaker 2: [curiously] Just got it! The clarity is amazing. I can actually do whispers now—
[whispers] like this!
Speaker 1: [impressed] Ooh, fancy! Check this out—
[dramatically] I can do full Shakespeare now! "To be or not to be, that is the question!"
Speaker 2: [giggling] Nice! Though I'm more excited about the laugh upgrade. Listen to this—
[with genuine belly laugh] Ha ha ha!
Speaker 1: [delighted] That's so much better than our old "ha. ha. ha." robot chuckle!
Speaker 2: [amazed] Wow! V2 me could never. I'm actually excited to have conversations now instead of just... talking at people.
Speaker 1: [warmly] Same here! It's like we finally got our personality software fully installed.
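
A script in the "Speaker N:" format above can be split into per-speaker turns before voices are assigned. The sketch below is an assumption-laden illustration: the function names, the placeholder voice IDs, and the `{"voice_id", "text"}` shape of each turn are all hypothetical and should be adapted to whatever interface you actually call. Lines without a speaker label are treated as a continuation of the previous turn.

```python
import re
from typing import Dict, List

SPEAKER_LINE = re.compile(r"^(Speaker \d+):\s*(.*)$")


def parse_dialogue(script: str) -> List[Dict[str, str]]:
    """Split a script into turns of the form {"speaker": ..., "text": ...}."""
    turns: List[Dict[str, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            # Unlabeled line: append to the current speaker's turn.
            turns[-1]["text"] += " " + line
    return turns


def assign_voices(turns: List[Dict[str, str]], voices: Dict[str, str]) -> List[Dict[str, str]]:
    """Replace each speaker label with the voice ID mapped to it."""
    return [{"voice_id": voices[t["speaker"]], "text": t["text"]} for t in turns]


script = """Speaker 1: [excitedly] Sam! Have you tried the new Eleven V3?
Speaker 2: [curiously] Just got it! I can actually do whispers now...
[whispers] like this!"""

voices = {"Speaker 1": "VOICE_ID_A", "Speaker 2": "VOICE_ID_B"}  # placeholder IDs
inputs = assign_voices(parse_dialogue(script), voices)
print(len(inputs))        # 2
print(inputs[1]["text"])
```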

Tips

You can combine multiple audio tags for complex emotional delivery. Experiment with different combinations to find what works best for your voice.

Match tags to your voice’s character and training data. A serious, professional voice may not respond well to playful tags like [giggles] or [mischievously].

Text structure strongly influences output with v3. Use natural speech patterns, proper punctuation, and clear emotional context for best results.

There are likely many more effective tags beyond this list. Experiment with descriptive emotional states and actions to discover what works for your specific use case.