
Introducing Experiments in ElevenAgents
The most data-driven way to improve real-world agent performance.
Today, we’re introducing Experiments in ElevenAgents - a controlled way to run A/B tests on production traffic and measure what works before rolling changes out broadly.
As conversational agents take on high-impact workflows across support, sales, and operations, small configuration changes can materially affect business outcomes. A different prompt structure, a refined workflow branch, a new voice, or a tighter guardrail can change CSAT, containment, conversion, latency, and cost.
Experiments gives teams a structured way to test those changes using live traffic and measurable outcomes - without sacrificing safety or control.
Without structured experimentation, optimization relies on intuition. A prompt tweak "feels" better. A workflow adjustment "should" improve containment. A new escalation path "seems" more efficient.
Experiments replaces guesswork with evidence. Teams can introduce controlled variants, expose them to a defined percentage of real customer interactions, and measure impact across business and operational metrics.
This brings modern A/B testing practices to conversational agents - using production data instead of subjective judgment.
Experiments is built directly into ElevenLabs Agents and follows a simple, auditable workflow.
First, create a variant: start from an existing agent version and modify prompts, workflows, tools, voice, knowledge bases, or guardrails. Each change is tied to a specific, versioned configuration with clear diffs and attribution.
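Conceptually, a variant is a small, attributed delta against a pinned base version. A minimal sketch in Python, where every field name and identifier is assumed for illustration rather than taken from the actual schema:

```python
# Hypothetical sketch of a variant as a delta against a pinned base version.
# All field names and identifiers here are illustrative assumptions.
variant = {
    "base_version": "agent_support_v12",   # the exact version being tested against
    "changes": {                           # the diff that gets versioned and attributed
        "system_prompt": "You are a concise support agent. Confirm the order ID first.",
        "voice_id": "voice_calm_01",
    },
    "created_by": "jane@example.com",      # attribution for the audit trail
}
```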
Next, define what percentage of live conversations should be routed to the new variant. Traffic splitting is controlled and auditable, so teams can test safely without disrupting the majority of users.
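The routing mechanism itself isn't spelled out here, but a deterministic hash of a stable identifier is a common way to implement this kind of percentage split. A generic sketch, not the platform's internals:

```python
import hashlib

def assign_variant(conversation_id: str, variant_pct: float) -> str:
    """Deterministically bucket a conversation into control or variant.

    Hashing a stable ID keeps assignment consistent, so the same caller
    sees the same configuration for the life of the experiment.
    """
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "variant" if bucket < variant_pct else "control"

# Route 10% of live conversations to the new variant.
print(assign_variant("conv_8f3a21", 0.10))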
Then, compare performance across variants using real production conversations. Teams can measure outcomes such as CSAT, containment, conversion, latency, and cost. Because testing runs on live traffic, results reflect actual user behavior, not synthetic benchmarks.
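Judging whether a difference on live traffic is signal rather than noise is standard A/B methodology. As an illustrative example, independent of the product itself, a two-proportion z-test comparing containment rates between arms:

```python
from math import erf, sqrt

def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in two rates, e.g. the
    containment rate of the control arm vs. the variant arm."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal-CDF tail x2

# Example: control contains 540/1000 conversations, variant 590/1000.
print(two_proportion_z_test(540, 1000, 590, 1000))  # ~0.024, likely a real lift
```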
Finally, once a variant demonstrates measurable improvement, teams can migrate more traffic to the higher-performing version. Full version history is preserved, enabling fast rollbacks if needed.
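In practice, that migration often takes the form of a staged ramp. A minimal sketch, assuming an illustrative schedule rather than any documented default:

```python
# Illustrative staged-rollout schedule; the percentages are assumptions,
# not platform defaults.
RAMP = [0.10, 0.25, 0.50, 1.00]

def next_split(current: float) -> float:
    """Advance the variant's traffic share to the next rollout stage.

    Because full version history is preserved, rolling back is the inverse:
    route 100% of traffic to the previously pinned version.
    """
    for stage in RAMP:
        if stage > current:
            return stage
    return current  # already fully rolled out
```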
Experiments supports continuous optimization across customer-facing and operational workflows.
Each experiment is tied to a specific agent version, ensuring every performance shift is attributable to a defined configuration change.
Experiments is built on top of ElevenLabs Agents’ versioning and audit trail.
Every experiment includes the pinned base version, the variant's versioned configuration with a clear diff, attribution for each change, and an auditable record of the traffic split.
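As a rough illustration, such a record might look like the following (the schema below is assumed, not a documented format):

```python
# Hypothetical experiment audit record; every field name is an assumption.
audit_record = {
    "experiment_id": "exp_prompt_containment",
    "base_version": "agent_support_v12",
    "variant_version": "agent_support_v13",
    "traffic_split": {"control": 0.90, "variant": 0.10},
    "changed_by": "jane@example.com",
    "started_at": "2025-01-15T12:00:00Z",
}
```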
This allows teams to move quickly while maintaining compliance, traceability, and governance.
Rather than choosing between speed and control, teams get both.
Conversational agents should not be static. They should improve continuously as teams learn from production data.
With this workflow, teams can iterate systematically, quantify impact, and deploy higher-performing conversational agents with confidence, grounded in real production data.
Learn more: https://elevenlabs.io/docs/eleven-agents/operate/experiments
