Expressive mode
Expressive mode is currently in private beta and available only to select workspaces. Contact your account manager for access.
Overview
Expressive mode is a Text to Speech model built for expressive conversational agents. When it is enabled, the LLM behind your ElevenLabs agent can include expressive tags in its responses. The expressive TTS model interprets these tags and adjusts the audio synthesis, adding nuanced emotional delivery and speech patterns to what the agent says.
Key capabilities:
- Emotional expression: Add laughter, crying, or other emotional cues
- Speech style control: Whisper, slow down, or modify delivery
- Dynamic emphasis: Apply expressive tags contextually during conversation
- Natural interactions: Create more human-like and engaging conversations
How it works
When expressive mode is enabled, your agent can output special tags in its responses that control the audio synthesis. Each tag typically affects the next 4-5 words of generated speech, after which delivery returns to normal. The exact duration may vary based on the content and context.
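To get a feel for tag placement, the following Python sketch estimates which words each tag would color in a piece of agent output. It is only an approximation for reasoning about prompts, not how the TTS model works internally: the function name `approximate_tag_spans`, the fixed 5-word window, and the choice to cut a span short at the next tag are all assumptions made for illustration.

```python
import re

# Matches a bracketed tag such as [laughs] or [whispers].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

# Assumption: use the upper end of the documented 4-5 word range.
APPROX_SPAN_WORDS = 5


def approximate_tag_spans(text, span_words=APPROX_SPAN_WORDS):
    """Return (tag, words) pairs estimating which words each tag affects."""
    spans = []
    for match in TAG_PATTERN.finditer(text):
        following = text[match.end():]
        # Assumption: a new tag ends the previous tag's span early.
        next_tag = TAG_PATTERN.search(following)
        if next_tag:
            following = following[:next_tag.start()]
        spans.append((match.group(1), following.split()[:span_words]))
    return spans


for tag, words in approximate_tag_spans(
    "[whispers] I have a secret to tell you about the surprise."
):
    print(tag, "->", " ".join(words))
```

In this estimate the whisper covers only "I have a secret to", so a tag should sit directly before the words it is meant to color rather than at the start of a long sentence.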
Configuration
Enabling expressive mode
System prompt configuration
To use expressive mode effectively, you must instruct your LLM about this capability in your system prompt. Here’s a recommended template:
Encourage your LLM to use specific tags by providing examples in your system prompt. This helps the model understand when and how to apply expressive elements effectively.
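One way to keep these instructions consistent across agents is to assemble the expressive section of the system prompt programmatically. The sketch below is not an official ElevenLabs template: the function `build_system_prompt`, the wording of the instructions, and the sample tags are hypothetical and should be adapted to your agent.

```python
def build_system_prompt(base_prompt, tags, examples):
    """Append expressive-tag instructions and examples to a base system prompt."""
    tag_list = ", ".join(f"[{tag}]" for tag in tags)
    lines = [
        base_prompt.strip(),
        "",
        "You can add expressive tags in square brackets to control how your reply is spoken.",
        f"Available tags: {tag_list}.",
        "Each tag affects roughly the next 4-5 words, so place it directly before the words it should color.",
        "Use tags sparingly and only when they match the emotional context.",
        "",
        "Examples:",
    ]
    # Each example is a full sample reply showing a tag in context.
    lines.extend(f"- {example}" for example in examples)
    return "\n".join(lines)


prompt = build_system_prompt(
    "You are a friendly support agent.",
    ["laughs", "whispers", "sighs"],
    [
        "[laughs] That happens to everyone, no worries.",
        "[sighs] I'm sorry this has been so frustrating.",
    ],
)
print(prompt)
```

Keeping the tag list and the examples as separate arguments makes it easy to test different tag sets against different voices without rewriting the surrounding instructions.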
Implementation examples
Basic usage
Advanced patterns
Storytelling
Customer support
Best practices
Context-appropriate usage
Ensure the expressive tags match the emotional context of the conversation. Using [laughs] during a serious discussion can feel inappropriate and undermine user trust.
Test thoroughly
Different voices may render expressive tags differently. Test your agent with various voices to ensure the expressive elements work as intended.
System prompt clarity
Provide clear, detailed instructions in your system prompt about when and how to use expressive tags. Include specific examples to guide the LLM’s behavior.
Consider cultural context
Expressive elements like laughter or crying may be interpreted differently across cultures. Consider your target audience when implementing expressive mode.
Monitor performance
Track how users respond to expressive elements. Collect feedback to refine when and how your agent uses these tags.
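As a starting point for monitoring, you could count how often each tag appears in your agent's replies and compare that against user feedback. The sketch below assumes you already have the agent's text responses available as a list of strings. How you export transcripts is outside its scope, and the function name `tag_usage` is hypothetical.

```python
import re
from collections import Counter

# Matches a bracketed tag such as [laughs] or [sighs].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def tag_usage(agent_replies):
    """Count how often each expressive tag appears across agent replies."""
    counts = Counter()
    for reply in agent_replies:
        # Normalize case so [Laughs] and [laughs] are counted together.
        counts.update(tag.lower() for tag in TAG_PATTERN.findall(reply))
    return counts


replies = [
    "[laughs] That is a great question.",
    "Let me check. [sighs] It looks like the order was delayed.",
    "[Laughs] You caught me!",
]
for tag, count in tag_usage(replies).most_common():
    print(f"{tag}: {count}")
```

A tag that dominates the counts may signal that the system prompt over-encourages it, and a tag that never appears may need a clearer example.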
Limitations
- Expressive mode is currently in private beta
- Tag effects last approximately 4-5 words
- Voices may not support all expressive tags equally well
FAQ
How long does each tag effect last?
Each expressive tag typically affects the next 4-5 words of speech. The exact duration may vary slightly based on word length and context, but the system is designed to return to normal delivery naturally.
What happens if I use tags with voices that don't support them?
The system will do its best to render the expressive element, but results may vary depending on the voice’s capabilities. Some voices may be more expressive than others.
How do I know if my workspace has access?
Expressive mode is currently available only to select workspaces in private beta. If you don’t see the option in your agent settings, contact your ElevenLabs account manager to inquire about access.
What expressive tags can I use?
You can prompt the LLM to use any expressive tag, but certain tags are more likely to work effectively. Common one-word adjectives and verbs (like [laughs], [cries], [whispers], [slow], [excited], [sighs]) tend to produce the most reliable results. The expressive TTS model interprets these tags contextually.
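If you want to spot tags that fall outside the one-word pattern during testing, a simple check over the LLM output can help. The following sketch is a heuristic only: `find_unreliable_tags` is a hypothetical helper, and "single lowercase word" is an assumed stand-in for the one-word adjectives and verbs mentioned above. It does not query the TTS model and cannot tell you how a tag will actually sound.

```python
import re

# Captures the content of any bracketed span, including empty brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]*)\]")

# Assumption: treat a single lowercase word as the "reliable" tag shape.
ONE_WORD = re.compile(r"[a-z]+")


def find_unreliable_tags(text):
    """Return bracketed tags that are not a single lowercase word."""
    return [
        tag for tag in TAG_PATTERN.findall(text)
        if not ONE_WORD.fullmatch(tag)
    ]


sample = "[laughs] Hi there! [speaking very slowly] Let me explain. [sighs]"
print(find_unreliable_tags(sample))
```

Note that any bracketed text is treated as a tag, so replies that legitimately contain brackets, such as citation markers, would also be flagged.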
Does using expressive mode affect pricing?
Expressive mode uses the same pricing structure as standard agent conversations. There are no additional charges for using expressive tags.
How can I get my LLM to use expressive tags effectively?
Include clear examples and guidelines in your system prompt. Provide specific scenarios where each tag should be used, and test thoroughly. Consider using few-shot examples to demonstrate appropriate usage patterns.
