
Eleven v3 Audio Tags: Precision delivery control for AI speech
Fine-grained control over timing, rhythm, and emphasis with Eleven v3 Audio Tags. Transform flat delivery into dynamic, performative content.
Seamlessly switch accents mid-sentence with Eleven v3 Audio Tags. Emulate American, British, French, and more for dynamic, culturally rich AI speech.
With Eleven v3 Audio Tags, switching accents is as simple as writing a bracketed cue. You can move between American, British, French, Australian — or any supported accent — mid-sentence, mid-script, or mid-character.
This opens new possibilities for creators who want dynamic, global, or expressive voice performances — without needing separate voice models or manual retakes.
Accent emulation is the ability to shift a voice’s pronunciation and rhythm to match different regions or dialects. It’s not a translation — the words stay the same — but the way they’re spoken changes.
With tags like [French accent], [Australian accent], or [Southern US accent], you can direct the model to speak in-region — and switch seamlessly when needed.
Example: [American accent] Could you switch my accent in the old model? [dismissive] Didn’t think so. [cheeky][Australian accent] But you can now — check this out, mate! [French accent] My love… eez like a red, red rose.
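A script like the one above is plain text with bracketed cues, so it can be assembled programmatically. The sketch below is a minimal illustration, not an official API: `build_script` and the `Segment` type are hypothetical helper names, and the resulting string is assumed to be passed as the text input of a v3 speech request.

```python
from typing import Iterable, Sequence, Tuple

# A segment is (cues, text): zero or more Audio Tag cues, then the words to speak.
# Both the type alias and the helper below are hypothetical, for illustration only.
Segment = Tuple[Sequence[str], str]


def build_script(segments: Iterable[Segment]) -> str:
    """Join segments into one script, wrapping each cue in square brackets."""
    parts = []
    for cues, text in segments:
        prefix = "".join(f"[{cue}]" for cue in cues)
        # strip() drops the leading space when a segment has no cues
        parts.append(f"{prefix} {text}".strip())
    return " ".join(parts)


script = build_script([
    (["American accent"], "Could you switch my accent in the old model?"),
    (["dismissive"], "Didn't think so."),
    (["cheeky", "Australian accent"], "But you can now, check this out, mate!"),
])
print(script)
```

Keeping cues separate from the spoken words makes it easy to swap an accent for a whole segment without retyping the line.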
This isn’t imitation; it’s native delivery in context. The source voice you use will affect the quality of the accent that an Audio Tag produces.
Accent emulation gives you creative and cultural range. You can localize content, define character identity, or make dialogue feel geographically grounded — all from a single model.
For example:
Accent cues let you script these experiences directly, without any model switching.
These tags help define regional identity and tone:
These tags can be used with emotional or delivery cues to create layered performance:
[British accent][exasperated] You’re telling me *this* is the solution? Brilliant.
[Southern US accent][calmly] Don’t worry now. We’ve got time.
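When scripts stack accent and delivery cues, it can help to check which cues apply to which words before generating audio. The following is a rough sketch that assumes cues are always written as `[text]` with no nested brackets; `split_cues` is a hypothetical helper, not part of any ElevenLabs library.

```python
import re

# Assumption: a cue is any bracketed run of text without nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_cues(script: str):
    """Return (cues, text) pairs; consecutive cues are grouped with the text that follows them."""
    segments = []
    cues = []
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            # Text found before this cue closes the previous group.
            segments.append((cues, text))
            cues = []
        cues.append(match.group(1))
        pos = match.end()
    tail = script[pos:].strip()
    if tail or cues:
        segments.append((cues, tail))
    return segments


layered = (
    "[British accent][exasperated] You're telling me this is the solution? Brilliant. "
    "[Southern US accent][calmly] Don't worry now. We've got time."
)
for cues, text in split_cues(layered):
    print(cues, "->", text)
```

A listing like this makes it quick to spot a line that is missing its accent cue or carries one that was not intended.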
Accent switching is especially powerful when combined with Character Performance and Multi-Character Dialogue.
For example:
Each speaker feels distinct — even though the same voice model delivers every line.
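One way to keep speakers distinct in a single-voice script is to assign each character an accent cue once and apply it to every line. This is only a sketch: the character names, the sample lines, and the `render_dialogue` helper are all hypothetical, and joining lines with newlines is an assumption about layout rather than a documented requirement.

```python
# Hypothetical cast: each character maps to the accent cue used for all of their lines.
CAST = {
    "Narrator": "British accent",
    "Guide": "Australian accent",
}


def render_dialogue(lines, cast):
    """Prefix each (speaker, text) line with that speaker's accent cue."""
    rendered = []
    for speaker, text in lines:
        cue = cast[speaker]  # raises KeyError for a speaker missing from the cast
        rendered.append(f"[{cue}] {text}")
    return "\n".join(rendered)


dialogue = render_dialogue(
    [
        ("Narrator", "The trail begins at dawn."),
        ("Guide", "Stick close, and watch your step."),
        ("Narrator", "They set off without another word."),
    ],
    CAST,
)
print(dialogue)
```

Changing a character's accent then means editing one entry in the mapping rather than every line of dialogue.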
With Eleven v3, accent becomes a design element. It’s part of your character’s personality, your story’s setting, or your product’s tone.
And with Audio Tags, you can shift that identity on command — reliably and with expressive control.
Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, if you need v3 features, it is best to use an Instant Voice Clone (IVC) or a designed voice for your project. PVC optimization for v3 is coming in the near future.