Eleven v3 Audio Tags: Directing character performance in speech

Written by: Ryan Morrison
Published: Jun 10, 2025
Last updated: Jul 28, 2026


Audio Tags are a powerful tool in Eleven v3 (alpha), the new research preview Text to Speech model from ElevenLabs. They give you precise direction over not just tone and pacing, but also character and vocal performance.

With tags like [pirate voice], [French accent], or [sarcastically], voice becomes a tool for storytelling, not just narration. Couple them with a strong character voice clone and you can capture not just a sound, but a full performance.

These tags make it possible to shift vocal identity mid-line, emulate accents, or lean into archetypes like villains, narrators, or sidekicks — without changing the underlying script or switching to a different voice.

What is character performance in AI speech?

Character performance is the ability to step into a role. Whether you’re voicing a flamboyant villain, a gruff sea captain, or a local shopkeeper from Melbourne, the new Audio Tags let you guide delivery to match the persona you’re hoping to convey.

With a simple bracketed phrase, you can set the scene: “[pirate voice] Arr, the open ocean. Smell that, lads? That’s the scent of freedom… and just a hint of mutiny.”

The model doesn’t just pronounce words — it performs them in character.

From accent to archetype

Voice performance isn’t just about volume or emotion. It’s also about who’s speaking. With Eleven v3, you can cue specific accents, dialects, and speaking styles on the fly. For example:

[American accent] Could you switch my accent in the old model? [dismissive] Didn’t think so. [Australian accent] But you can now — check this out, mate! [French accent] My love… eez like a red, red rose.
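A line like the one above is plain text with bracketed cues placed before each phrase, so it can be assembled programmatically. The following is a minimal Python sketch of that idea, not an ElevenLabs API: the helper name `build_tagged_line` and its input shape (a list of tag and text pairs) are assumptions made for illustration, and sending the resulting string to the model is left out.

```python
def build_tagged_line(segments):
    """Join (tag, text) pairs into one prompt string.

    Hypothetical helper: each tag is wrapped in square brackets and
    placed directly before the text it is meant to direct.
    """
    parts = []
    for tag, text in segments:
        tag = tag.strip()
        # Reject empty tags and tags that already contain brackets.
        if not tag or "[" in tag or "]" in tag:
            raise ValueError(f"invalid tag: {tag!r}")
        parts.append(f"[{tag}] {text.strip()}")
    return " ".join(parts)


line = build_tagged_line([
    ("American accent", "Could you switch my accent in the old model?"),
    ("dismissive", "Didn't think so."),
    ("Australian accent", "But you can now, check this out, mate!"),
])
print(line)
```

Keeping the tags as data rather than typing them inline makes it easier to swap an accent or style for a whole batch of lines without touching the script text.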

This kind of fluid identity-switching is ideal for animation, games, interactive fiction, or any moment where the speaker's personality matters.

Common tags for character performance

Character-focused tags allow you to shape vocal identity and presence:

Accents & dialects: [British accent], [Australian accent], [Southern US accent]
Archetypes & roles: [pirate voice], [evil scientist voice], [childlike tone]
Speech styles: [dramatic], [sarcastically], [matter-of-fact], [whiny]
Genre cues: [fantasy narrator], [sci-fi AI voice], [classic film noir]

Layering tags helps bring characters to life: “[dramatic][French accent] You do not understand... zis was never about revenge. It was about destiny.”
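Layering, as in the example above, amounts to stacking several bracketed tags in front of the same line. Here is a small Python sketch under that assumption; `layer_tags` is a hypothetical helper name, and the output format (tags back to back, then a space, then the text) simply mirrors the example in this article.

```python
def layer_tags(text, *tags):
    """Prefix text with one or more bracketed tags, in the order given.

    Hypothetical helper for illustration; returns the text unchanged
    when no tags are passed.
    """
    prefix = "".join(f"[{tag.strip()}]" for tag in tags)
    return f"{prefix} {text}" if prefix else text


layered = layer_tags(
    "You do not understand... zis was never about revenge. It was about destiny.",
    "dramatic",
    "French accent",
)
print(layered)
```

Whether tag order affects the performance is not stated in this article, so treat the ordering here as a choice to experiment with rather than a rule.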

From narrator to ensemble cast

In multi-character scripts, Audio Tags make it easy to jump between voices. Add tension, humor, or surprise simply by switching character performance mid-dialogue — no extra editing required.

Take this excerpt from a demo:

Jessica: [laughs] That was... beautiful.
Dr. Von Fusion: [dramatic] To be or not to be, that is the question!
Jessica: [French accent] This is spectacular, isn’t it?
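When a script has several speakers, it can help to check which tags each turn carries before generating audio. The sketch below assumes a one-turn-per-line "Speaker: line" layout, which is a convention chosen here for illustration rather than a format the model requires; `parse_script` and `TAG_PATTERN` are hypothetical names.

```python
import re

# Matches a bracketed tag such as [laughs] and captures the text inside.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def parse_script(script):
    """Split a 'Speaker: line' script into turns and list each turn's tags."""
    turns = []
    for raw in script.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # Split on the first colon only, so colons later in the line survive.
        speaker, sep, line = raw.partition(":")
        if not sep:
            raise ValueError(f"expected 'Speaker: line', got {raw!r}")
        line = line.strip()
        turns.append({
            "speaker": speaker.strip(),
            "tags": TAG_PATTERN.findall(line),
            "text": line,  # tags stay in the text that would be sent on
        })
    return turns


script = """
Jessica: [laughs] That was... beautiful.
Dr. Von Fusion: [dramatic] To be or not to be, that is the question!
Jessica: [French accent] This is spectacular, isn't it?
"""

turns = parse_script(script)
for turn in turns:
    print(turn["speaker"], turn["tags"])
```

A listing like this makes it easy to spot a turn with no direction at all, or a character whose accent tag changes unexpectedly between lines.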

What used to require a full cast can now be scripted in a single voice track — without sacrificing range or depth.

Directing voices, not just writing lines

Eleven v3 supports dynamic vocal changes, contextual shifts, and consistent delivery across characters. This means the model understands not only what to say, but also how each character should say it.

For creators, this unlocks a new dimension of control. You’re not just scripting dialogue. You’re directing performances.

Selecting the right voice

Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, use an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features. PVC optimization for v3 is coming in the near future.
