
Eleven v3 Audio Tags: Enabling narrative intelligence in speech
Guide emotional rhythm and structural flow with tags like [pause], [awe], or [dramatic tone] for compelling storytelling.
Introducing Eleven v3 (alpha)
Control tone, emotion, and pacing for natural conversation. Add character performance to your text to speech.
Audio Tags are a powerful tool in Eleven v3 (alpha), the new research preview Text to Speech model from ElevenLabs. They let you direct not just tone and pacing, but also character and vocal performance.
With tags like [pirate voice], [French accent], or [sarcastically], voice becomes a tool for storytelling, not just narration. Pair them with a strong character voice clone and you can capture not just a sound, but a full performance.
These tags make it possible to shift vocal identity mid-line, emulate accents, or lean into archetypes like villains, narrators, or sidekicks — without changing the underlying script or switching to a different voice.
Character performance is the ability to step into a role. Whether you’re voicing a flamboyant villain, a gruff sea captain, or a local shopkeeper from Melbourne, the new Audio Tags let you guide delivery to match the persona you’re hoping to convey.
With a simple bracketed phrase, you can set the scene: “[pirate voice] Arr, the open ocean. Smell that, lads? That’s the scent of freedom… and just a hint of mutiny.”
The model doesn’t just pronounce words — it performs them in character.
Voice performance isn’t just about volume or emotion. It’s also about who’s speaking. With Eleven v3, you can cue specific accents, dialects, and speaking styles on the fly. For example:
[American accent] Could you switch my accent in the old model? [dismissive] Didn’t think so. [Australian accent] But you can now — check this out, mate! [French accent] My love… eez like a red, red rose.
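To check which delivery cue applies where in a line like the one above, it can help to split the text into tagged segments before sending it off. The following is a minimal sketch in plain Python, not part of any ElevenLabs SDK: the function name `split_segments` is hypothetical, and it assumes that a tag governs the text that follows it up to the next tag.

```python
import re

# Matches either a bracketed tag like "[French accent]" or a run of plain text.
TAG_OR_TEXT = re.compile(r"\[([^\[\]]+)\]|([^\[\]]+)")


def split_segments(line):
    """Split a tagged line into (tags, text) pairs.

    Hypothetical helper: consecutive tags such as "[dramatic][French accent]"
    are grouped together and attached to the text that follows them.
    """
    segments = []
    pending = []  # tags seen since the last piece of text
    for match in TAG_OR_TEXT.finditer(line):
        tag, text = match.group(1), match.group(2)
        if tag is not None:
            pending.append(tag.strip())
        elif text.strip():
            segments.append((pending, text.strip()))
            pending = []
    if pending:
        # Trailing tags with no text after them.
        segments.append((pending, ""))
    return segments


if __name__ == "__main__":
    demo = (
        "[American accent] Could you switch my accent in the old model? "
        "[dismissive] Didn't think so. "
        "[Australian accent] But you can now."
    )
    for tags, text in split_segments(demo):
        print(tags, "->", text)
```

A breakdown like this is only a review aid for script writers; the model itself receives the original tagged string unchanged.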
This kind of fluid identity-switching is ideal for animation, games, interactive fiction, or any moment where the speaker's personality matters.
Character-focused tags allow you to shape vocal identity and presence.
Layering tags helps bring characters to life: “[dramatic][French accent] You do not understand... zis was never about revenge. It was about destiny.”
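If you generate scripts programmatically, layering can be handled by a small string helper. This is a sketch under stated assumptions: `with_tags` is a hypothetical name, not an ElevenLabs function, and it simply reproduces the bracket layout used in the example above (tags back to back, then a space, then the line).

```python
def with_tags(text, *tags):
    """Prefix `text` with one bracketed Audio Tag per entry in `tags`.

    Hypothetical helper: tags may be passed with or without their brackets,
    and empty entries are skipped.
    """
    cleaned = [t.strip().strip("[]").strip() for t in tags]
    prefix = "".join(f"[{t}]" for t in cleaned if t)
    return f"{prefix} {text}" if prefix else text


if __name__ == "__main__":
    line = with_tags(
        "You do not understand... zis was never about revenge.",
        "dramatic",
        "French accent",
    )
    print(line)
    # [dramatic][French accent] You do not understand... zis was never about revenge.
```

The returned string is what you would pass as the text input to the model; the order of the tags in the call is preserved in the output.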
In multi-character scripts, Audio Tags make it easy to jump between voices. Add tension, humor, or surprise simply by switching character performance mid-dialogue — no extra editing required.
Take this excerpt from a demo: "Jessica: [laughs] That was... beautiful. Dr. Von Fusion: [dramatic] To be or not to be — that is the question! Jessica: [French accent] This is spectacular, isn’t it?"
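A multi-character script like this excerpt can be assembled from structured data rather than typed by hand. The sketch below is illustrative only: the `Line` class and `render_script` function are hypothetical names, and the "Speaker: [tag] text" layout simply mirrors the demo excerpt; whether the model needs speaker labels in exactly this form is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Line:
    """One line of dialogue (hypothetical structure for illustration)."""
    speaker: str
    text: str
    tags: Tuple[str, ...] = ()


def render_script(lines: Iterable[Line]) -> str:
    """Render dialogue as 'Speaker: [tag] text', one line per entry."""
    rendered = []
    for line in lines:
        prefix = "".join(f"[{tag}]" for tag in line.tags)
        body = f"{prefix} {line.text}" if prefix else line.text
        rendered.append(f"{line.speaker}: {body}")
    return "\n".join(rendered)


if __name__ == "__main__":
    script = [
        Line("Jessica", "That was... beautiful.", ("laughs",)),
        Line("Dr. Von Fusion", "To be or not to be, that is the question!", ("dramatic",)),
        Line("Jessica", "This is spectacular, isn't it?", ("French accent",)),
    ]
    print(render_script(script))
```

Keeping tags as data makes it easy to change a character's delivery across a whole scene without touching the dialogue text.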
What used to require a full cast can now be scripted in a single voice track — without sacrificing range or depth.
Eleven v3 supports dynamic vocal changes, contextual shifts, and consistent delivery across characters. This means the model not only understands what to say — but how each character should say it.
For creators, this unlocks a new dimension of control. You’re not just scripting dialogue. You’re directing performances.
Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, it is best to use an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features. PVC optimization for v3 is coming in the near future.