Expressive mode

Enable your AI agent to use expressive tags for enhanced emotional delivery and speech control.

Expressive mode is currently in private beta and available only to select workspaces. Contact your account manager for access.

Overview

Expressive mode uses a specialized Text to Speech model designed for expressive conversational agents. When it is enabled, the LLM can include expressive tags in its responses, and the expressive TTS model interprets those tags to adjust the audio synthesis, adding nuanced emotional delivery and speech patterns to your ElevenLabs agent’s responses.

Key capabilities:

  • Emotional expression: Add laughter, crying, or other emotional cues
  • Speech style control: Whisper, slow down, or modify delivery
  • Dynamic emphasis: Apply expressive tags contextually during conversation
  • Natural interactions: Create more human-like and engaging conversations

How it works

When expressive mode is enabled, your agent can output special tags in its responses that control the audio synthesis. Each tag typically affects the next 4-5 words of generated speech before delivery returns to normal. The exact duration may vary based on the content and context.
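To get a rough feel for which words a tag is likely to cover, the snippet below pairs each tag in a response with the words that follow it. This is only an illustrative sketch: `approximate_tag_spans` and its fixed `window` of 5 words are hypothetical, not part of any ElevenLabs SDK, and the real model decides the duration itself.

```python
import re

# Matches single-word tags such as [laughs] or [whispers].
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def approximate_tag_spans(text: str, window: int = 5) -> list[tuple[str, list[str]]]:
    """Pair each tag with roughly the next `window` words that follow it.

    Assumption: later tags are skipped when counting words. How the model
    treats overlapping tags is not specified in this guide.
    """
    spans = []
    for match in TAG_PATTERN.finditer(text):
        # Drop any later tags so only spoken words are counted.
        following = TAG_PATTERN.sub(" ", text[match.end():])
        spans.append((match.group(1), following.split()[:window]))
    return spans


response = "That's amazing! [laughs] I can't believe it worked on the first try."
for tag, words in approximate_tag_spans(response):
    print(f"[{tag}] -> {' '.join(words)}")
```

A helper like this can be handy when reviewing agent responses to check that a tag sits directly before the words it is meant to colour.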

Configuration

Enabling expressive mode

1. Access your agent settings

Navigate to your agent configuration in the ElevenLabs dashboard.

2. Enable the Eleven Expressive model

Locate the Text to Speech model family option under the Agent Voice tab and enable V3 Conversational for your agent.

3. Update system prompt

Add instructions to your agent’s system prompt to explain the expressive tag functionality.

System prompt configuration

To use expressive mode effectively, you must instruct your LLM about this capability in your system prompt. Here’s a recommended template:

You are a conversational AI agent that can generate expressive speech. You have access to
expressive tags that control how your responses are spoken.
You can use expressive tags in your responses to add emotional nuance and speech style
control. Common effective tags include simple one-word adjectives or verbs like:
- [laughs] - Adds laughter to the speech
- [cries] - Adds crying or emotional tone
- [whispers] - Lowers volume for whispering
- [slow] - Slows down speech delivery
- [excited] - Adds excitement to the delivery
- [sighs] - Adds a sighing quality
Feel free to experiment with other single-word expressive tags that match the emotional
context of your response. Each tag affects approximately the next 4-5 words.
Example: "That's amazing! [laughs] I can't believe it worked on the first try."

Encourage your LLM to use specific tags by providing examples in your system prompt. This helps the model understand when and how to apply expressive elements effectively.
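If you maintain the tag list in code, the tag section of the system prompt can be generated from it, which keeps the prompt and the list of tags you actually want in sync. The following is a minimal sketch; `EXPRESSIVE_TAGS` and `build_tag_instructions` are hypothetical names, and the wording simply mirrors the template above.

```python
# Tag names and descriptions taken from the template in this guide.
EXPRESSIVE_TAGS = {
    "laughs": "Adds laughter to the speech",
    "cries": "Adds crying or emotional tone",
    "whispers": "Lowers volume for whispering",
    "slow": "Slows down speech delivery",
    "excited": "Adds excitement to the delivery",
    "sighs": "Adds a sighing quality",
}


def build_tag_instructions(tags: dict[str, str], example: str) -> str:
    """Render the expressive-tag section of a system prompt as plain text."""
    lines = [
        "You can use expressive tags in your responses to add emotional "
        "nuance and speech style control. Common effective tags include:",
    ]
    lines += [f"- [{name}] - {description}" for name, description in tags.items()]
    lines.append("Each tag affects approximately the next 4-5 words.")
    lines.append(f'Example: "{example}"')
    return "\n".join(lines)


prompt_section = build_tag_instructions(
    EXPRESSIVE_TAGS,
    "That's amazing! [laughs] I can't believe it worked on the first try.",
)
print(prompt_section)
```

The resulting string can be appended to the rest of your agent's system prompt.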

Implementation examples

Basic usage

User: I just got accepted to my dream university!
Agent: That's incredible news! [laughs] I'm so happy for you! This is just the beginning of an amazing journey.

Advanced patterns

Once upon a time, there was a young adventurer who discovered a hidden cave.
[whispers] Inside, they found something extraordinary. [slow] A glowing crystal that pulsed
with ancient magic. [laughs] They couldn't believe their luck!

Best practices

Ensure the expressive tags match the emotional context of the conversation. Using [laughs] during a serious discussion can feel inappropriate and undermine user trust.

Different voices may render expressive tags differently. Test your agent with various voices to ensure the expressive elements work as intended.

Provide clear, detailed instructions in your system prompt about when and how to use expressive tags. Include specific examples to guide the LLM’s behavior.

Expressive elements like laughter or crying may be interpreted differently across cultures. Consider your target audience when implementing expressive mode.

Track how users respond to expressive elements. Collect feedback to refine when and how your agent uses these tags.
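One simple way to start tracking tag usage is to count how often each tag appears in your agent's responses. The sketch below assumes you already have the response texts as plain strings (how you export them is up to your setup); `count_tags` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches single-word tags such as [laughs] or [slow].
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def count_tags(responses: list[str]) -> Counter:
    """Count how many times each expressive tag appears across responses."""
    counts: Counter = Counter()
    for response in responses:
        counts.update(TAG_PATTERN.findall(response))
    return counts


agent_responses = [
    "That's incredible news! [laughs] I'm so happy for you!",
    "[whispers] Inside, they found something extraordinary. "
    "[slow] A glowing crystal. [laughs] They couldn't believe their luck!",
]
tag_counts = count_tags(agent_responses)
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```

Comparing these counts against user feedback can show whether a tag is being overused or is missing from situations where it would help.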

Limitations

  • Expressive mode is currently in private beta
  • Tag effects last approximately 4-5 words
  • Not all voices may support all expressive tags equally

FAQ

How long does an expressive tag affect speech?

Each expressive tag typically affects the next 4-5 words of speech. The exact duration may vary slightly based on word length and context, but the system is designed to return to normal delivery naturally.

What happens if a voice does not fully support a tag?

The system will do its best to render the expressive element, but results may vary depending on the voice’s capabilities. Some voices may be more expressive than others.

How do I get access to expressive mode?

Expressive mode is currently available only to select workspaces in private beta. If you don’t see the option in your agent settings, contact your ElevenLabs account manager to inquire about access.

Which expressive tags can I use?

You can prompt the LLM to use any expressive tag, but certain tags are more likely to work effectively. Common one-word adjectives and verbs (like [laughs], [cries], [whispers], [slow], [excited], [sighs]) tend to produce the most reliable results. The expressive TTS model interprets these tags contextually.

Does expressive mode cost extra?

Expressive mode uses the same pricing structure as standard agent conversations. There are no additional charges for using expressive tags.

How do I get my LLM to use expressive tags appropriately?

Include clear examples and guidelines in your system prompt. Provide specific scenarios where each tag should be used, and test thoroughly. Consider using few-shot examples to demonstrate appropriate usage patterns.
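As part of testing, it can help to flag tags that stray from the single-word form this guide recommends. The check below is a rough sketch, not an official validator: `find_unusual_tags` is a hypothetical helper, and the "single lowercase word" rule is an assumption based on the example tags listed here.

```python
import re

# Anything written in square brackets is treated as a candidate tag.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")
# The form used by the example tags in this guide: one lowercase word.
SINGLE_WORD = re.compile(r"[a-z]+")


def find_unusual_tags(response: str) -> list[str]:
    """Return bracketed tags that are not a single lowercase word."""
    return [
        tag
        for tag in BRACKETED.findall(response)
        if not SINGLE_WORD.fullmatch(tag)
    ]


sample = "[laughs] Great! [speaking very softly] Okay. [slow] Sure."
print(find_unusual_tags(sample))
```

Flagged tags are not necessarily wrong, since the model interprets tags contextually, but they are worth listening to before shipping a prompt change.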