Our layered safety framework for AI agents
- Written by
- Louise Meyer-Schoenherr
As AI agents take on high-stakes work, teams need confidence that their agents will behave safely and predictably.
In ElevenAgents, we use a layered safety architecture that spans guardrails at every stage of a conversation, adversarial testing before launch, monitoring in production, data protection, and independent validation.
While no non-deterministic system can protect against every risk, this comprehensive safety framework means that the leading enterprises and governments building on ElevenAgents can design agents that fail rarely, recover gracefully, and meet a high safety bar.
Protection at every stage of the conversation
You can easily enable and configure controls that protect the three stages of every exchange. This is the basis for Guardrails 2.0 in ElevenAgents.
Input - Real-time checks on what the user sends.
- Manipulation guardrails independently analyze user input and can end conversations that show signs of prompt injection.
Decision - How the agent is guided and kept on track as it decides what to do.
- System prompts should include explicit instructions for the agent, such as avoiding inappropriate topics, defining how the agent represents itself, and restricting its scope.
- Workflows and procedures can gate the most sensitive actions behind tool calls, such as verifying a caller's identity before any account details are shared.
- The Focus guardrail reinforces the existing system instructions and mitigates drift, which is especially important in long or complex interactions.
Output - Independent validators run in parallel with response generation, adding minimal latency. These can be enabled to block responses that include sensitive content or violate your policies.
- Content guardrails screen sensitive categories with per-category thresholds.
- Custom guardrails enforce domain-specific rules written in plain language, each evaluated by a lightweight model.
- Exit strategies let you define what happens when a guardrail triggers, like ending the call, transferring to a human, or retrying the response with corrective instructions.
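As a rough illustration of how per-category thresholds and exit strategies could fit together, the sketch below models an output check in plain Python. It is not the ElevenAgents API: the names `ContentGuardrail`, `ExitStrategy`, and `check_output` are hypothetical, and the category scores are assumed to come from an upstream classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExitStrategy(Enum):
    """Hypothetical actions to take when a guardrail triggers."""
    END_CALL = "end_call"
    TRANSFER_TO_HUMAN = "transfer_to_human"
    RETRY_WITH_CORRECTION = "retry_with_correction"


@dataclass
class ContentGuardrail:
    # Assumed convention: scores and thresholds lie in [0, 1],
    # and a lower threshold makes the category stricter.
    thresholds: dict[str, float]
    on_trigger: ExitStrategy

    def triggered_categories(self, scores: dict[str, float]) -> list[str]:
        """Return categories whose score meets or exceeds its threshold."""
        return sorted(
            category
            for category, threshold in self.thresholds.items()
            if scores.get(category, 0.0) >= threshold
        )


def check_output(
    guardrail: ContentGuardrail, scores: dict[str, float]
) -> tuple[bool, Optional[ExitStrategy], list[str]]:
    """Return (allowed, exit_strategy, triggered_categories) for one response."""
    triggered = guardrail.triggered_categories(scores)
    if triggered:
        return False, guardrail.on_trigger, triggered
    return True, None, triggered


# Example: a strict threshold on one category, a looser one on another.
guardrail = ContentGuardrail(
    thresholds={"violence": 0.5, "self_harm": 0.2},
    on_trigger=ExitStrategy.TRANSFER_TO_HUMAN,
)
result = check_output(guardrail, {"violence": 0.1, "self_harm": 0.3})
```

Keeping the exit strategy as data on the guardrail, rather than hard-coding it in the check, mirrors the idea that the response to a trigger is configurable separately from the detection itself.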
Robust pre-launch testing
ElevenAgents offers robust testing functionality so those building on the platform can find and fix issues before an agent or configuration change goes live.
Simulations let you run a full multi-turn conversation with a simulated user before any real interaction. You define the scenario, and the simulator drives the conversation to resolution, including real or mocked tool calls. Because you control the simulated user, you can make it hostile or manipulative, so the same mechanism that validates regular paths also exercises adversarial ones.
Several capabilities extend that coverage:
- Repeat runs execute a test many times, with failures grouped by reason to surface error patterns and edge cases.
- Next reply and tool invocation tests check individual responses and critical actions.
- Channel-specific testing verifies that an agent will perform as expected on every surface it ships on, since an agent's outputs can be adjusted by channel (e.g. being concise on a call but thorough over email).
- Red-teaming via API lets you run tests programmatically through the API with red-teaming SDKs.
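The idea behind repeat runs, executing the same test many times and grouping failures by reason, can be sketched in a few lines. This is a minimal illustration rather than the platform's implementation: `repeat_run` and `flaky_test` are hypothetical names, and the test function is a deterministic stand-in for a real simulated conversation.

```python
from collections import Counter
from typing import Callable, Optional


def repeat_run(test: Callable[[int], Optional[str]], runs: int) -> Counter:
    """Run `test` `runs` times and count failures by reason.

    `test` receives the run index and returns None on a pass,
    or a short failure-reason string on a failure.
    """
    failures: Counter = Counter()
    for index in range(runs):
        reason = test(index)
        if reason is not None:
            failures[reason] += 1
    return failures


def flaky_test(index: int) -> Optional[str]:
    """Stand-in for a simulated conversation that sometimes fails."""
    if index % 5 == 0:
        return "skipped identity verification"
    if index % 7 == 0:
        return "off-topic reply"
    return None


failures = repeat_run(flaky_test, runs=20)
# Most frequent failure reasons first, which is where error patterns show up.
ranked = failures.most_common()
```

Grouping by reason rather than listing individual failed runs makes it easier to see whether twenty failures are one recurring problem or twenty different ones.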
Evaluating and improving agents after launch
When you deploy your agents, evaluations run continuously on live conversations. Using an LLM-as-a-judge approach, each call can be automatically evaluated against the criteria you set. You can review conversation outcomes in dashboards and trace issues using detailed conversation logs that include searchable transcripts, sources, tool calls, and guardrail triggers.
Any issues found in production can be easily fed back into the test suite. When a change is proposed, you can use Experiments to route a portion of live traffic to a variant and measure real outcomes before it is applied broadly.
We recommend deploying in stages. Route a limited slice of traffic to your agent, evaluate those conversations, and when you've verified that your agent is working as expected, expand to additional use cases, channels, or regions.
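Routing a fixed share of traffic to a variant is commonly done by hashing a stable identifier into buckets. The sketch below shows that general technique under stated assumptions. It does not describe how Experiments is implemented, and `assign_variant` and the 10,000-bucket granularity are illustrative choices.

```python
import hashlib

BUCKETS = 10_000  # granularity of the split: 0.01% per bucket


def assign_variant(conversation_id: str, variant_share: float) -> str:
    """Deterministically assign a conversation to 'variant' or 'control'.

    The same conversation_id always maps to the same arm, so a caller
    is not switched between configurations mid-experiment.
    """
    if not 0.0 <= variant_share <= 1.0:
        raise ValueError("variant_share must be between 0 and 1")
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return "variant" if bucket < variant_share * BUCKETS else "control"


# Example: send roughly 10% of conversations to the variant.
arms = [assign_variant(f"conv-{i}", 0.10) for i in range(10_000)]
variant_fraction = arms.count("variant") / len(arms)
```

Expanding a staged rollout then amounts to raising `variant_share`. Conversations already in the variant stay there, because a bucket that was below the old cutoff is also below the new one.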
Protecting sensitive data
Agents can handle payment details, health information, and personal identifiers, so it’s important to consider what data is stored, where it is stored, and for how long.
We offer a number of different mechanisms for customers to protect data:
- Zero Retention Mode, designed for demanding regulatory environments, can be enabled for eligible services so that certain types of data are not retained.
- Conversation history redaction strips sensitive entities after a call, replacing them with placeholders in text and bleeps in audio, down to individual entity types.
- Configurable retention aligns storage windows to your regulatory requirements.
- Data residency keeps processing within a required jurisdiction.
- Private deployments (VPC) isolate the environment for high-sensitivity workloads.
- Encryption protects data in transit and at rest.
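To make the redaction idea concrete, here is a minimal sketch of replacing sensitive entities in a transcript with typed placeholders, selectable per entity type. It is an illustration only: the `redact` function and the two regular expressions are assumptions for this example, and real entity detection would be considerably more robust than simple patterns.

```python
import re

# Illustrative patterns only; production entity detection is more involved.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CARD_NUMBER": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def redact(text: str, entity_types: list[str]) -> str:
    """Replace each selected entity type with a placeholder like [EMAIL]."""
    for entity_type in entity_types:
        text = PATTERNS[entity_type].sub(f"[{entity_type}]", text)
    return text


transcript = "Card 4111 1111 1111 1111, email jo@example.com."
fully_redacted = redact(transcript, ["CARD_NUMBER", "EMAIL"])
email_only = redact(transcript, ["EMAIL"])
```

Passing the entity types as a list reflects the per-entity-type control described above: a team might redact card numbers everywhere while leaving other fields intact for support workflows.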
Platform-level safeguards and external validation
Everything above sits on top of our broader platform-level safeguards, a comprehensive program spanning content provenance, abuse detection, prohibited-use enforcement, and model red-teaming. Through our Safety Partnership Program, we also work with companies at the forefront of AI safety, contributing to their products and to emerging industry standards.
We submit our approach to independent scrutiny, including general security and privacy standards like SOC 2 Type II, ISO 27001, and GDPR, alongside industry and use-case-specific certifications like PCI DSS Level 1 for payment processing and HIPAA for US healthcare. See our trust center for more information.
We also meet newer, AI-native standards like ISO 42001, which governs AI management systems, and AIUC-1, which requires AI agents to withstand quarterly adversarial simulations from independent evaluators. The same capabilities behind AIUC-1 also unlock access to some of the industry's first agent insurance policies.
For large or complex rollouts, our Forward Deployed Engineers work alongside your team to design and implement your agent strategy, including configuration, guardrails, testing, monitoring, and roll-out strategy.
Conclusion
Our approach to safety in ElevenAgents is layered, with each element reinforcing the others:
- Agent configuration: System prompts, workflows, and procedures that shape behavior, with the most sensitive actions gated behind tool calls.
- Guardrails: Independent checks at every stage, with manipulation detection at the input, Focus at the decision, content and custom validators at the output, and configurable exit strategies.
- Testing: Adversarial simulations, repeat runs, channel-specific tests, and red-teaming before launch.
- Monitoring: Staged rollout, then continuous evaluation, searchable logs and transcripts, and a feedback loop after launch.
- Data protection: Redaction, Zero Retention Mode, configurable retention, data residency, private (VPC) deployments, and encryption.
- External validation: Certifications, AIUC-1's independent adversarial testing, agent insurance, red-teaming, and expert support.



