Introducing Experiments in ElevenAgents

Last updated Mar 6, 2026 • 3 minutes reading time

Kacper Walentynowicz, Full Stack Engineer

Lauren Rothwell, Growth

The most data-driven way to improve real-world agent performance.


Today, we’re introducing Experiments in ElevenAgents - a controlled way to run A/B tests on production traffic and measure what works before rolling changes out broadly.

As conversational agents take on high-impact workflows across support, sales, and operations, small configuration changes can materially affect business outcomes. A different prompt structure, a refined workflow branch, a new voice, or a tighter guardrail can change CSAT, containment, conversion, latency, and cost.

Experiments gives teams a structured way to test those changes using live traffic and measurable outcomes - without sacrificing safety or control.

From configuration changes to measurable lift

Without structured experimentation, optimization relies on intuition. A prompt tweak "feels" better. A workflow adjustment "should" improve containment. A new escalation path "seems" more efficient.

Experiments replaces guesswork with evidence. Teams can introduce controlled variants, expose them to a defined percentage of real customer interactions, and measure impact across business and operational metrics.

This brings modern A/B testing practices to conversational agents - using production data instead of subjective judgment.

How Experiments works

Experiments is built directly into ElevenLabs Agents and follows a simple, auditable workflow.

1. Create a new variant

Start from an existing agent version and create a variant.

Modify prompts, workflows, tools, voice, knowledge bases, or guardrails. Each change is tied to a specific, versioned configuration with clear diffs and attribution.
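To illustrate what a "clear diff" between a base version and a variant might contain, here is a minimal Python sketch. It is not the ElevenLabs implementation or API: the configuration fields (voice, prompt, max_turns) and the diff_config helper are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Tuple


def diff_config(base: Dict[str, Any], variant: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (base_value, variant_value)} for every field that differs.

    Fields that are absent on one side are reported as None on that side.
    """
    changes = {}
    for key in sorted(set(base) | set(variant)):
        old, new = base.get(key), variant.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical agent configurations; the field names are assumptions.
base_version = {
    "voice": "voice_a",
    "prompt": "You are a support agent.",
    "max_turns": 20,
}
variant_version = {**base_version, "prompt": "You are a concise support agent."}

for field, (old, new) in diff_config(base_version, variant_version).items():
    print(f"{field}: {old!r} -> {new!r}")
```

Keeping a variant to one or two changed fields makes it easier to attribute any metric shift to a specific change.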

2. Route a controlled slice of traffic

Define what percentage of live conversations should be routed to the new variant.

Traffic splitting is controlled and auditable, ensuring teams can test safely without disrupting the majority of users.
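One common way to implement a controlled, repeatable traffic split is deterministic hash-based bucketing. The Python sketch below shows that general technique under stated assumptions. It does not describe how ElevenAgents routes traffic internally, and the assign_variant function and the example IDs are hypothetical.

```python
import hashlib


def assign_variant(conversation_id: str, experiment_id: str, variant_percent: float) -> str:
    """Return "variant" for roughly variant_percent% of conversations, else "control"."""
    if not 0 <= variant_percent <= 100:
        raise ValueError("variant_percent must be between 0 and 100")
    # Hash the experiment and conversation IDs together so the same
    # conversation always lands in the same bucket for a given experiment.
    key = f"{experiment_id}:{conversation_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes to a bucket in [0, 10000), i.e. 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "variant" if bucket < variant_percent * 100 else "control"


# Hypothetical IDs: route about 10% of conversations to the new variant.
print(assign_variant("conv-123", "exp-escalation-flow", 10))
```

Because the assignment depends only on the IDs and the percentage, it can be recomputed later, which is one way to keep a split auditable.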

3. Measure impact across key metrics

Compare performance across variants using real production conversations.

Teams can measure outcomes such as:

CSAT
Containment rate
Conversion
Average handling time
Median agent response latency
Cost per agent resolution

Because testing runs on live traffic, results reflect actual user behavior, not synthetic benchmarks.
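When comparing a rate metric such as containment or conversion, it helps to check whether the observed difference could plausibly be noise. The Python sketch below applies a standard two-proportion z-test to illustrative counts. The numbers are made up, the VariantStats type is hypothetical, and the source does not state which statistical method Experiments uses.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VariantStats:
    conversations: int  # total conversations routed to this variant
    successes: int      # e.g. contained or converted conversations

    @property
    def rate(self) -> float:
        return self.successes / self.conversations


def two_proportion_z_test(control: VariantStats, variant: VariantStats) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates."""
    total = control.conversations + variant.conversations
    pooled = (control.successes + variant.successes) / total
    se = math.sqrt(
        pooled * (1 - pooled) * (1 / control.conversations + 1 / variant.conversations)
    )
    if se == 0:
        return 0.0, 1.0  # identical all-or-nothing rates: no detectable difference
    z = (variant.rate - control.rate) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: 90% of traffic on control, 10% on the variant.
control = VariantStats(conversations=9000, successes=6300)
variant = VariantStats(conversations=1000, successes=740)
z, p = two_proportion_z_test(control, variant)
print(f"lift={variant.rate - control.rate:+.3f} z={z:.2f} p={p:.4f}")
```

A small traffic slice needs more time to reach a reliable conclusion, so the chosen percentage and the test duration trade off against each other.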

4. Promote the winner

Once a variant demonstrates measurable improvement, teams can migrate more traffic to the higher-performing version.

Full version history is preserved, enabling fast rollbacks if needed.
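Migrating traffic gradually, with rollback as the fallback, can be expressed as a small decision rule. The Python sketch below is one possible policy, not a description of product behavior: the ramp steps, the next_traffic_percent function, and the guardrail flag are all assumptions made for illustration.

```python
# Hypothetical ramp schedule, in percent of live traffic.
RAMP_STEPS = [5, 10, 25, 50, 100]


def next_traffic_percent(current: float, variant_is_better: bool, guardrail_breached: bool) -> float:
    """Decide the variant's next traffic share under a simple staged-rollout policy."""
    if guardrail_breached:
        return 0  # roll back: send all traffic to the previous version
    if not variant_is_better:
        return current  # hold and keep collecting data
    for step in RAMP_STEPS:
        if step > current:
            return step  # advance to the next ramp step
    return 100  # already fully promoted


print(next_traffic_percent(10, variant_is_better=True, guardrail_breached=False))  # prints 25
```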

Use cases across teams

Experiments supports continuous optimization across customer-facing and operational workflows.

CX teams can test whether a revised escalation flow improves CSAT without increasing handling time.
Revenue teams can test whether a more direct tone or different qualification logic increases conversion.
Operations teams can measure whether tool logic changes reduce average handling time or infrastructure cost.

Each experiment is tied to a specific agent version, ensuring every performance shift is attributable to a defined configuration change.

Enterprise-ready by design

Experiments is built on top of ElevenLabs Agents’ versioning and audit trail.

Every experiment includes:

Controlled, auditable traffic routing.
Clear attribution to specific agent versions.
Structured rollbacks.
Full conversation history tied to version state.

This allows teams to move quickly while maintaining compliance, traceability, and governance.

Rather than choosing between speed and control, teams get both.

Continuous optimization for conversational agents

Conversational agents should not be static. They should improve continuously as teams learn from production data.

With this workflow, teams can iterate systematically, quantify impact using real production data, and deploy higher-performing conversational agents with confidence.

Learn more: https://elevenlabs.io/docs/eleven-agents/operate/experiments
