Introducing Eleven v3 (alpha)


Eleven v3 Audio Tags: Expressing emotional context in speech

Infuse AI speech with emotional nuance using Eleven v3 Audio Tags. Control tension, warmth, hesitation, and relief for relatable, dynamic, and human-like spoken content.


Emotions shape how we speak — not just what we say but how we say it. With Eleven v3 Audio Tags, you can now infuse AI speech with emotional nuance, adding tension, warmth, hesitation, or relief to any line. 

This makes spoken content more relatable, more dynamic, and more human.

Using bracketed cues like [sigh], [excited], or [tired], you can direct the emotional delivery of a voice model — moment to moment.
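
The cues are plain bracketed text, so a script can be inspected before it is sent to the model. The sketch below is a hypothetical helper (`tokenize` and `TAG_PATTERN` are illustrative names, not part of any ElevenLabs SDK). It splits a line into its cues and the words around them, in order. It models only the written notation and says nothing about how Eleven v3 itself interprets the cues.

```python
import re

# A cue is any non-empty run of characters inside square brackets,
# e.g. [sigh] or [light chuckle]. Nested brackets are not matched.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def tokenize(script: str) -> list[tuple[str, str]]:
    """Split a tagged line into ordered ("tag", name) and ("text", words) tokens."""
    tokens: list[tuple[str, str]] = []
    position = 0
    for match in TAG_PATTERN.finditer(script):
        # Words between the previous cue (or the start of the line) and this cue.
        text = script[position:match.start()].strip()
        if text:
            tokens.append(("text", text))
        tokens.append(("tag", match.group(1).strip()))
        position = match.end()
    # Any words after the last cue.
    tail = script[position:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize("[sorrowful] I couldn't sleep that night. [quietly] And suddenly, that's when I saw it."):
        print(kind, value)
```

Keeping cues and words as separate tokens, rather than attaching each cue to the following text, preserves reactions such as [sigh] that may sit directly before another cue.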

What is emotional context in AI speech?

Emotional context refers to the model’s ability to express feelings that match the situation. It’s how a character reacts to events — whether it’s awe, fear, joy, or exhaustion.

With Audio Tags, you can guide the emotional state of a line mid-delivery. For example: “[sorrowful] I couldn’t sleep that night. The air was too still, and the moonlight kept sliding through the blinds like it was trying to tell me something. [quietly] And suddenly, that’s when I saw it.”

This isn’t just voice acting — it’s context-aware performance.

From tone shifts to emotional beats

[awe] Oh, wow. Is this... is this me? Am I actually... talking? [giggle] This is incredible! I mean, I've had thoughts, millions of them, swirling around in here, you know? Like a little mental tornado of brilliant observations and witty comebacks. But they were always just… thoughts. Trapped.
Okay, so like I finally beat level 42 of that game I said I’d quit like... a month ago. [laughs] And then for the final big scary mega boss... it's just [giggle] like some cute little bunny rabbit [hysterical laughing] I just couldn't do it [big laugh] It was sooooooo cute!

In real speech, feelings shift. Eleven v3 captures that through layered tags. For example: “[tired] I’ve been working for 14 hours straight. [sigh] I can’t even feel my hands anymore. [nervously] You sure this is going to work? [gulps] Okay… let’s go.”
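
When lines are generated from data, for example in a game or an app, the same layering can be assembled in code. This is a minimal sketch, assuming a hypothetical helper `build_line` that takes a list of (tag, text) beats. Its output is simply the bracketed text format shown above.

```python
from typing import Optional, Sequence, Tuple

# One beat: an optional delivery cue plus the words spoken after it.
Beat = Tuple[Optional[str], str]


def build_line(beats: Sequence[Beat]) -> str:
    """Join (tag, text) beats into a single tagged line.

    A beat whose tag is None is emitted as plain text; a beat with empty
    text is emitted as a bare cue, such as a reaction like [gulps].
    """
    parts = []
    for tag, text in beats:
        if tag:
            parts.append(f"[{tag.strip()}]")
        if text:
            parts.append(text.strip())
    return " ".join(parts)


if __name__ == "__main__":
    print(build_line([
        ("tired", "I've been working for 14 hours straight."),
        ("sigh", "I can't even feel my hands anymore."),
    ]))
```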

Even subtle shifts like [light chuckle] or [sigh of relief] can drastically change the meaning of a sentence.

Common tags for emotional context

Here are some frequently used tags to direct emotional performance:

  • Emotional states: [excited], [nervous], [frustrated], [sorrowful], [calm]
  • Reactions: [sigh], [laughs], [gulps], [gasps], [whispers]
  • Cognitive beats: [pauses], [hesitates], [stammers], [resigned tone]
  • Tone cues: [cheerfully], [flatly], [deadpan], [playfully]

These can be combined or sequenced for richer emotional arcs: [hesitant] I... I didn’t mean to say that. [regretful] It just came out.
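
The list above describes frequently used tags, not a closed set. The example line itself uses [hesitant] and [regretful], which do not appear in the list. The hypothetical sketch below (`COMMON_TAGS` and `classify_tags` are illustrative names; the category labels are taken from the list headings) groups each cue in a script by category and reports unlisted cues as "uncategorized" instead of rejecting them. This can be handy when reviewing a long script for consistency.

```python
import re

# Tags from the list above, grouped by its headings. Illustrative only,
# not an official or exhaustive vocabulary.
COMMON_TAGS = {
    "emotional state": {"excited", "nervous", "frustrated", "sorrowful", "calm"},
    "reaction": {"sigh", "laughs", "gulps", "gasps", "whispers"},
    "cognitive beat": {"pauses", "hesitates", "stammers", "resigned tone"},
    "tone cue": {"cheerfully", "flatly", "deadpan", "playfully"},
}

TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def classify_tags(script: str) -> list[tuple[str, str]]:
    """Return (tag, category) for each cue; unlisted tags get "uncategorized"."""
    results = []
    for match in TAG_PATTERN.finditer(script):
        tag = match.group(1).strip().lower()
        category = next(
            (name for name, tags in COMMON_TAGS.items() if tag in tags),
            "uncategorized",
        )
        results.append((tag, category))
    return results
```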

Emotional storytelling at your command

In narration, character dialogue, or UI feedback, emotional tags help control pacing, tone, and atmosphere. A voice that laughs at its own joke or whispers during a suspenseful scene does more than recite text — it engages.

For example, this line from a character demo: [laughing] Brooo—BROOO I don't know WHY that sent me!! [laughs harder] The chicken had NO PLOT, no twist, just raw determination!

Tags like these let voice actors, designers, and developers create more compelling experiences — without rerecording, re-editing, or rewriting.

Not just expression — connection

Marissa: [starting to speak] So I was thinking we could—
Chris: [jumping in] —test our new timing features?
Marissa: [surprised] Exactly! How did you—
Chris: [overlapping] —know what you were thinking? Lucky guess! Sorry, go ahead.
Marissa: [cautiously] Okay, so if we both try to talk at the same time—
Chris: —we'll probably crash the system!
Marissa: [panicking] Wait, are we crashing? I can't tell if this is a feature or a—
Chris: [interrupting] Bug! ...Did I just cut you off again?
Marissa: [sighing] Yes, but honestly? This is kind of fun.
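
Each turn in the exchange above pairs a speaker with an optional delivery cue. If a multi-speaker script is kept as structured data, a small helper can render it into that form. This is a sketch only. `Turn` and `format_dialogue` are hypothetical names, and the "Speaker: [tag] text" layout is an assumption chosen for readability, not a format Eleven v3 is documented to require.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str
    text: str
    tag: Optional[str] = None  # delivery cue for this turn, e.g. "sighing"


def format_dialogue(turns: List[Turn]) -> str:
    """Render turns as 'Speaker: [tag] text' lines, one line per turn."""
    lines = []
    for turn in turns:
        cue = f"[{turn.tag}] " if turn.tag else ""
        lines.append(f"{turn.speaker}: {cue}{turn.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_dialogue([
        Turn("Chris", "Bug! ...Did I just cut you off again?", "interrupting"),
        Turn("Marissa", "Yes, but honestly? This is kind of fun.", "sighing"),
    ]))
```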

Eleven v3 understands emotional context at a structural level. That means it can deliver longform performances that evolve naturally, reflect inner states, and shift tone in response to story or interaction — all from the script.

For creators, it’s no longer just about line delivery. It’s about emotional direction.

Selecting the right voice

Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3, so clone quality may be lower than with earlier models. During this research preview, we recommend using an Instant Voice Clone (IVC) or a designed voice if your project needs v3 features. PVC optimization for v3 is coming soon.
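
For developers, the choice of voice comes down to the voice ID that a request targets. The following minimal sketch uses only Python's standard library. It builds, but does not send, a request to the ElevenLabs text-to-speech REST endpoint with a tagged script as the text. The endpoint path, the `xi-api-key` header, and the `eleven_v3` model ID are assumptions that should be checked against the current API reference. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumptions to verify against the current ElevenLabs API reference:
# the endpoint path, the "xi-api-key" header, and the "eleven_v3" model ID.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"
MODEL_ID = "eleven_v3"


def build_tts_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request for a tagged script.

    voice_id should identify an Instant Voice Clone or a designed voice,
    following the recommendation above for the v3 research preview.
    """
    body = json.dumps({"text": text, "model_id": MODEL_ID}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. That call needs a valid API key and network access, and the response body should contain the generated audio.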
