
Eleven v3 Audio Tags: Bringing multi-character dialogue to life
Create dynamic multi-character dialogue with Eleven v3 Audio Tags. Script overlapping voices, interruptions, and emotional shifts for natural, human-like AI conversations.
Fine-grained control over timing, rhythm, and emphasis with Eleven v3 Audio Tags. Transform flat delivery into dynamic, performative content.
Great speech isn’t just about what’s said — it’s how it’s said. With Eleven v3 Audio Tags, you gain fine-grained control over timing, rhythm, and emphasis, allowing you to shape the pacing of a line with precision.
Using tags like [pause], [rushed], [stammers], or [drawn out], you can adjust how each sentence lands — not just emotionally, but rhythmically. That control turns flat delivery into performance.
Delivery control is the ability to direct the flow of speech — how quickly it moves, where it pauses, when it emphasizes. It’s what makes a line feel dramatic, casual, tense, or comedic.
With Eleven v3, delivery isn't limited to the default pace. You can slow down for suspense, speed up for urgency, or add rhythm for humor, all directly from the script.
Example: "Okay, so like I finally beat level 42 of that game I said I’d quit like... a month ago. [laughs] And then the final boss... was just... [giggle] a bunny rabbit. [big laugh] I couldn’t do it. It was too cute."
Tags here shape the tempo and timing — and that’s what makes the line land.
Tags give you access to the subtle cues humans use to pace speech naturally, such as [pause], [rushed], [stammers], and [drawn out].
Example: "[drawn out] Sooooo... you're saying... [suspicious tone] you didn't eat the last slice?"
These tags give you complete control over how a voice feels in motion.
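Because tags are plain bracketed text inside the script, a line can be inspected before it is sent for synthesis. The following is a minimal sketch in Python, assuming that a tag is any text enclosed in square brackets; `find_tags` and `strip_tags` are hypothetical helpers written for illustration and are not part of any ElevenLabs SDK.

```python
import re

# Assumption: an Audio Tag is any run of text enclosed in square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_tags(script: str) -> list[str]:
    """Return the tag names in the order they appear in the script."""
    return TAG_PATTERN.findall(script)


def strip_tags(script: str) -> str:
    """Return only the spoken words, with tags removed and spacing tidied."""
    without_tags = TAG_PATTERN.sub("", script)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


line = "[drawn out] Sooooo... you're saying... [suspicious tone] you didn't eat the last slice?"
print(find_tags(line))   # the delivery cues, in order
print(strip_tags(line))  # the words that would actually be spoken
```

Separating the cues from the spoken words this way makes it easier to review how heavily a script is tagged, or to compare a tagged take against the plain text.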
Changing how a line is delivered changes how it's interpreted.
Same words. Different meaning. With delivery control, tone emerges not from word choice, but from timing and intent.
You can layer delivery-focused tags with emotional or character cues to shape entire scenes.
Example: [hesitant][nervous] I... I’m not sure this is going to work. [gulps] But let’s try anyway.
Or: [whispering][pause] Did you hear that? [rushed] Hide! Now!
It’s this mix of rhythm and reaction that makes performances feel believable.
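Layered tags can also be assembled programmatically when a scene is built from many short lines. This is a small sketch under the same assumption that tags are simply bracketed text placed before the words they affect; the `tagged` helper is hypothetical and only builds the script string, it does not call any speech API.

```python
def tagged(text: str, *tags: str) -> str:
    """Prefix text with each tag in square brackets, in the order given."""
    prefix = "".join(f"[{tag}]" for tag in tags)
    return f"{prefix} {text}" if prefix else text


# Rebuild the whispered example by layering delivery cues per segment.
segments = [
    tagged("Did you hear that?", "whispering", "pause"),
    tagged("Hide! Now!", "rushed"),
]
script = " ".join(segments)
print(script)
```

Keeping the words and the cues as separate arguments makes it straightforward to try the same sentence with different deliveries.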
Eleven v3 turns the script into a score — and delivery control is how you conduct it. Whether you’re scripting tutorials, monologues, or punchlines, Audio Tags let you manage delivery with frame-by-frame precision.
For creators, this means complete command over how a line unfolds. You’re not just writing what happens. You’re setting its tempo.
Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, use an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features. PVC optimization for v3 is coming in the near future.