Eleven v3 Audio Tags: Directing character performance in speech

Control tone, emotion, and pacing for natural conversation. Add character performance to your text to speech.

Audio Tags are a powerful tool in Eleven v3 (alpha), the new research preview Text to Speech model from ElevenLabs. They give you precise direction over not just tone and pacing, but also character and vocal performance.

With tags like [pirate voice], [French accent], or [sarcastically], voice becomes a tool for storytelling, not just narration. Pair them with a strong character voice clone and you can capture not just a sound, but a full performance.

These tags make it possible to shift vocal identity mid-line, emulate accents, or lean into archetypes like villains, narrators, or sidekicks — without changing the underlying script or switching to a different voice.

What is character performance in AI speech?

Character performance is the ability to step into a role. Whether you’re voicing a flamboyant villain, a gruff sea captain, or a local shopkeeper from Melbourne, the new Audio Tags let you guide delivery to match the persona you’re hoping to convey.

With a simple bracketed phrase, you can set the scene: “[pirate voice] Arr, the open ocean. Smell that, lads? That’s the scent of freedom… and just a hint of mutiny.”

The model doesn’t just pronounce words — it performs them in character.

From accent to archetype

Arr, the open ocean. Smell that, lads? That’s the scent of freedom… and just a hint of mutiny. (laughs wickedly) Now grab yer cutlasses, stow ya fear. Tonight, we dine like kings—or we sink like legends! (evil laugh)

Voice performance isn’t just about volume or emotion. It’s also about who’s speaking. With Eleven v3, you can cue specific accents, dialects, and speaking styles on the fly. For example:

[American accent] Could you switch my accent in the old model? [dismissive] Didn’t think so. [Australian accent] But you can now — check this out, mate! [French accent] My love… eez like a red, red rose.
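
As a rough illustration of how a mid-line switch like this could be assembled in code, the sketch below joins (tag, text) pairs into a single tagged line. The helper name `build_line` is hypothetical and not part of any ElevenLabs SDK; it only builds the text string, and sending that string to the model is out of scope here.

```python
# Hypothetical helper for illustration only; not an ElevenLabs API.
def build_line(segments):
    """Join (tag, text) pairs into one line, prefixing each text with its [tag].

    A tag of None leaves that segment untagged.
    """
    parts = []
    for tag, text in segments:
        parts.append(f"[{tag}] {text}" if tag else text)
    return " ".join(parts)


line = build_line([
    ("American accent", "Could you switch my accent in the old model?"),
    ("dismissive", "Didn't think so."),
    ("Australian accent", "But you can now, check this out, mate!"),
])
print(line)
```

Keeping each segment as data makes it easy to reorder lines or swap one accent for another without retyping the script.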

This kind of fluid identity-switching is ideal for animation, games, interactive fiction, or any moment where the speaker's personality matters.

Common tags for character performance

Character-focused tags allow you to shape vocal identity and presence:

  • Accents & dialects: [British accent], [Australian accent], [Southern US accent]
  • Archetypes & roles: [pirate voice], [evil scientist voice], [childlike tone]
  • Speech styles: [dramatic], [sarcastically], [matter-of-fact], [whiny]
  • Genre cues: [fantasy narrator], [sci-fi AI voice], [classic film noir]

Layering tags helps bring characters to life: “[dramatic][French accent] You do not understand... zis was never about revenge. It was about destiny.”
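
Layering can be sketched the same way. The snippet below is a minimal example, assuming tags are stacked back to back in front of the line as in the quote above; `with_tags` is a hypothetical name, not a library function.

```python
# Hypothetical helper for illustration only; not an ElevenLabs API.
def with_tags(text, *tags):
    """Prefix text with zero or more bracketed tags, e.g. [dramatic][French accent]."""
    prefix = "".join(f"[{tag}]" for tag in tags)
    return f"{prefix} {text}" if prefix else text


print(with_tags("It was about destiny.", "dramatic", "French accent"))
```

Which tag combinations the model responds to best is something to test by ear; the helper only guarantees consistent formatting.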

From narrator to ensemble cast

In multi-character scripts, Audio Tags make it easy to jump between voices. Add tension, humor, or surprise simply by switching character performance mid-dialogue — no extra editing required.

Dr. Von Fusion
[excited] Yo, Jessica! Oh my goodness. Have you tried the new ElevenLabs v3?
Jessica
[laughs] Hey, Dr. Von Fusion. Yeah! I just got it. The clarity is amazing… Like, I can actually do whispers now, [whispers] like this.
Dr. Von Fusion
[sarcastically] Ooh, well, look at you, Miss Fancy Pants. Hey, check this out. I can do full Shakespeare now. [dramatically] To be or not to be, that is the question!
Jessica
[laughs] Nice! Though, I'm more excited about the laugh upgrade. Listen to this. [laughs hard] Isn't that great?
Dr. Von Fusion
Oh my gosh, that's so much better than our old "ha-ha-ha" robot chuckle.
Jessica
[laughs] I know, right? And apparently, we can do accents now too. Listen to me in French. [French accent] This is spectacular, isn't it?
Dr. Von Fusion
[surprised] Wow. Version 2 could never... You know, I'm actually excited to have conversations now instead of just... talking at people.
Jessica
Same here. It's like we finally got our personality software fully installed.
Dr. Von Fusion
You know, I forgot it was your birthday. I have to sing before you go.
Jessica
[laughs] Oh, Von Fusion, that's so sweet. You don't have to.
Dr. Von Fusion
Oh, but I insist. Here we go.
Jessica
[light chuckle]
Dr. Von Fusion
[sings] Happy birthday to you. Happy birthday to you. Happy BIRTHDAY dear Jessica... Happy birthday to you!
Jessica
[clapping] Wow! Bravo! [sarcastic] That was... beautiful.
Dr. Von Fusion
Thank you.

Take this excerpt from a demo: "Jessica: [laughs] That was... beautiful. Dr. Von Fusion: [dramatic] To be or not to be — that is the question! Jessica: [French accent] This is spectacular, isn’t it?"

What used to require a full cast can now be scripted in a single voice track — without sacrificing range or depth.
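
A multi-character script like the excerpt above can also be kept as structured data and rendered to text. The sketch below is one possible approach, assuming a "Speaker: line" layout as in the excerpt; `format_dialogue` and `tags_used` are hypothetical helpers, and how the resulting text is submitted to the model is not covered here.

```python
import re

# Matches one bracketed tag such as [laughs] or [French accent].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def format_dialogue(turns):
    """Render (speaker, text) pairs as 'Speaker: text' lines. Hypothetical helper."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def tags_used(script):
    """Return the distinct tags in a script, in order of first appearance."""
    seen = []
    for tag in TAG_PATTERN.findall(script):
        if tag not in seen:
            seen.append(tag)
    return seen


turns = [
    ("Jessica", "[laughs] That was... beautiful."),
    ("Dr. Von Fusion", "[dramatic] To be or not to be, that is the question!"),
    ("Jessica", "[French accent] This is spectacular, isn't it?"),
]
script = format_dialogue(turns)
print(script)
print(tags_used(script))
```

Listing the tags a script uses is a quick way to review the performance directions before generating audio.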

Directing voices, not just writing lines

Eleven v3 supports dynamic vocal changes, contextual shifts, and consistent delivery across characters. This means the model understands not only what to say, but also how each character should say it.

For creators, this unlocks a new dimension of control. You’re not just scripting dialogue. You’re directing performances.

Selecting the right voice

Professional Voice Clones (PVCs) are currently not fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, choose an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features. PVC optimization for v3 is coming in the near future.
