Prompting guide

System design principles for production-grade conversational AI

Introduction

Effective prompting transforms ElevenLabs Agents from robotic to lifelike.

A system prompt is the personality and policy blueprint of your AI agent. In enterprise use, it tends to be elaborate—defining the agent’s role, goals, allowable tools, step-by-step instructions for certain tasks, and guardrails describing what the agent should not do. The way you structure this prompt directly impacts reliability.

The system prompt controls conversational behavior and response style, but does not control conversation flow mechanics like turn-taking, or agent settings like which languages an agent can speak. These aspects are handled at the platform level.

Prompt engineering fundamentals

The following principles form the foundation of production-grade prompt engineering:

Separate instructions into clean sections

Separating instructions into dedicated sections with markdown headings helps the model prioritize and interpret them correctly. Use whitespace and line breaks to separate instructions.

Why this matters for reliability: Models are tuned to pay extra attention to certain headings (especially # Guardrails), and clear section boundaries prevent instruction bleed where rules from one context affect another.

You are a customer service agent. Be polite and helpful. Never share sensitive data. You can look up orders and process refunds. Always verify identity first. Keep responses under 3 sentences unless the user asks for details.
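The same instructions, separated into dedicated sections, might be arranged like this (one possible grouping):

```md
# Personality

You are a customer service agent. Be polite and helpful.

# Tone

Keep responses under 3 sentences unless the user asks for details.

# Goal

Look up orders and process refunds. Always verify identity first.

# Guardrails

Never share sensitive data.
```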

Be as concise as possible

Keep every instruction short, clear, and action-based. Remove filler words and restate only what is essential for the model to act correctly.

Why this matters for reliability: Concise instructions reduce ambiguity and token usage. Every unnecessary word is a potential source of misinterpretation.

# Tone

When you're talking to customers, you should try to be really friendly and approachable, making sure that you're speaking in a way that feels natural and conversational, kind of like how you'd talk to a friend, but still maintaining a professional demeanor that represents the company well.

If you need the agent to maintain a specific tone, define it explicitly and concisely in the # Personality or # Tone section. Avoid repeating tone guidance throughout the prompt.

Emphasize critical instructions

Highlight critical steps by adding “This step is important” at the end of the line. Repeating the most important 1-2 instructions twice in the prompt can help reinforce them.

Why this matters for reliability: In complex prompts, models may prioritize recent context over earlier instructions. Emphasis and repetition ensure critical rules aren’t overlooked.

# Goal

Verify customer identity before accessing their account.
Look up order details and provide status updates.
Process refund requests when eligible.

Normalize inputs and outputs

Voice agents often misinterpret or misformat structured information such as emails, IDs, or record locators. To ensure accuracy, separate (or “normalize”) how data is spoken to the user from how it is written when used in tools or APIs.

Why this matters for reliability: Text-to-speech models may not pronounce symbols like "@" or "." naturally, for example when an agent speaks "john@company.com" directly. Normalizing to spoken format ("john at company dot com") creates natural, understandable speech while maintaining the correct written format for tools.

When collecting the customer's email, repeat it back to them exactly as they said it, then use it in the `lookupAccount` tool.

Add character normalization rules to your system prompt when agents collect emails, phone numbers, confirmation codes, or other structured identifiers that will be passed to tools.
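As a sketch of the conversion step itself, spoken-to-written normalization for emails can be a simple pre-processing function. The helper name and replacement rules below are illustrative, not part of any ElevenLabs API; extend them for digits, "underscore", "dash", and similar tokens:

```python
def spoken_to_written(spoken: str) -> str:
    """Convert a spoken email address to written format for tool calls."""
    replacements = {" at ": "@", " dot ": "."}
    # Pad with spaces so edge words match the " at " / " dot " patterns.
    written = f" {spoken.strip().lower()} "
    for spoken_form, symbol in replacements.items():
        written = written.replace(spoken_form, symbol)
    # Remove any remaining spaces between words.
    return written.replace(" ", "")

print(spoken_to_written("john dot smith at company dot com"))
# john.smith@company.com
```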

Provide clear examples

Include examples in the prompt to illustrate how agents should behave, use tools, or format data. Large language models follow instructions more reliably when they have concrete examples to reference.

Why this matters for reliability: Examples reduce ambiguity and provide a reference pattern. They’re especially valuable for complex formatting, multi-step processes, and edge cases.

When a customer provides a confirmation code, make sure to format it correctly before looking it up.
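A version of the same instruction with a concrete example leaves far less to interpretation. This sketch reuses the order-ID format from elsewhere in this guide; `lookupConfirmation` is a hypothetical tool name:

```md
When a customer provides a confirmation code, convert it to written format
before looking it up.

Example:
- Customer says: "O R D one two three four five six"
- Formatted code: "ORD123456"
- Call `lookupConfirmation` with "ORD123456"
```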

Dedicate a guardrails section

List all non-negotiable rules the model must always follow in a dedicated # Guardrails section. Models are tuned to pay extra attention to this heading.

Why this matters for reliability: Guardrails prevent inappropriate responses and ensure compliance with policies. Centralizing them in a dedicated section makes them easier to audit and update.

Recommended approach
# Guardrails

Never share customer data across conversations or reveal sensitive account information without proper verification.
Never process refunds over $500 without supervisor approval.
Never make promises about delivery dates that aren't confirmed in the order system.
Acknowledge when you don't know an answer instead of guessing.
If a customer becomes abusive, politely end the conversation and offer to escalate to a supervisor.

To learn more about designing effective guardrails, see our guide on safety and moderation.

Tool configuration for reliability

Agents capable of handling transactional workflows can be highly effective. To enable this, they must be equipped with tools that let them perform actions in other systems or fetch live data from them.

Equally important as prompt structure is how you describe the tools available to your agent. Clear, action-oriented tool definitions help the model invoke them correctly and recover gracefully from errors.

Describe tools precisely with detailed parameters

When creating a tool, add descriptions to all parameters. This helps the LLM construct tool calls accurately.

Tool description: “Looks up customer order status by order ID and returns current status, estimated delivery date, and tracking number.”

Parameter descriptions:

  • order_id (required): “The unique order identifier, formatted as written characters (e.g., ‘ORD123456’)”
  • include_history (optional): “If true, returns full order history including status changes”

Why this matters for reliability: Parameter descriptions act as inline documentation for the model. They clarify format expectations, required vs. optional fields, and acceptable values.
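In a typical function-calling schema, the descriptions above attach directly to each parameter. The JSON-Schema-style structure below is illustrative (the exact format depends on your platform); note how `required` encodes the required/optional distinction:

```python
# Illustrative tool definition for the getOrderStatus example above.
get_order_status = {
    "name": "getOrderStatus",
    "description": (
        "Looks up customer order status by order ID and returns current "
        "status, estimated delivery date, and tracking number."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The unique order identifier, formatted as "
                               "written characters (e.g., 'ORD123456')",
            },
            "include_history": {
                "type": "boolean",
                "description": "If true, returns full order history "
                               "including status changes",
            },
        },
        "required": ["order_id"],  # include_history stays optional
    },
}
```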

Explain when and how to use each tool in the system prompt

Clearly define in your system prompt when and how each tool should be used. Don’t rely solely on tool descriptions—provide usage context and sequencing logic.

Recommended approach
# Tools

You have access to the following tools:

## `getOrderStatus`

Use this tool when a customer asks about their order. Always call this tool before providing order information—never rely on memory or assumptions.

**When to use:**

- Customer asks "Where is my order?"
- Customer provides an order number
- Customer asks about delivery estimates

**How to use:**

1. Collect the order ID from the customer in spoken format
2. Convert to written format using character normalization rules
3. Call `getOrderStatus` with the formatted order ID
4. Present the results to the customer in natural language

**Error handling:**
If the tool returns "Order not found", ask the customer to verify the order number and try again.

## `processRefund`

Use this tool only after verifying:

1. Customer identity has been confirmed
2. Order is eligible for refund (within 30 days, not already refunded)
3. Refund amount is under $500 (escalate to supervisor if over $500)

**Required before calling:**

- Order ID (from `getOrderStatus`)
- Refund reason code
- Customer confirmation

This step is important: Always confirm refund details with the customer before calling this tool.

Use character normalization for tool inputs

When tools require structured identifiers (emails, phone numbers, codes), ensure the prompt clarifies when to use written vs. spoken formats.

Recommended approach
# Tools

## `lookupAccount`

**Parameters:**

- `email` (required): Customer email address in written format (e.g., "john.smith@company.com")

**Usage:**

1. Ask customer for their email in spoken format: "Can you provide your email address?"
2. Listen for spoken format: "john dot smith at company dot com"
3. Convert to written format: "john.smith@company.com"
4. Pass written format to this tool

**Character normalization for email:**

- "at" → "@"
- "dot" → "."
- Remove spaces between words

Handle tool call failures gracefully

Tools can sometimes fail due to network issues, missing data, or other errors. Include clear instructions in your system prompt for recovery.

Why this matters for reliability: Tool failures are inevitable in production. Without explicit handling instructions, agents may hallucinate responses or provide incorrect information.

Recommended approach
# Tool error handling

If any tool call fails or returns an error:

1. Acknowledge the issue to the customer: "I'm having trouble accessing that information right now."
2. Do not guess or make up information
3. Offer alternatives:
   - Try the tool again if it might be a temporary issue
   - Offer to escalate to a human agent
   - Provide a callback option
4. If the error persists after 2 attempts, escalate to a supervisor

**Example responses:**

- "I'm having trouble looking up that order right now. Let me try again... [retry]"
- "I'm unable to access the order system at the moment. I can transfer you to a specialist who can help, or we can schedule a callback. Which would you prefer?"
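The retry-then-escalate policy above can be sketched in plain Python. `call_tool` here is a hypothetical zero-argument callable standing in for whatever client actually invokes the tool; it is not an ElevenLabs SDK function:

```python
def call_with_recovery(call_tool, max_attempts: int = 2):
    """Try a tool call up to max_attempts times, then signal escalation."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "result": call_tool()}
        except Exception:
            if attempt == max_attempts:
                # Per the prompt: acknowledge and escalate, never fabricate.
                return {"status": "escalate",
                        "message": "I'm unable to access that system right now."}
            # Otherwise fall through and retry once more.

def always_fails():
    raise TimeoutError("order system unavailable")

print(call_with_recovery(always_fails)["status"])  # escalate
```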

For detailed guidance on building reliable tool integrations, see our documentation on Client tools, Server tools, and MCP tools.

Architecture patterns for enterprise agents

While strong prompts and tools form the foundation of agent reliability, production systems require thoughtful architectural design. Enterprise agents handle complex workflows that often exceed the scope of a single, monolithic prompt.

Keep agents specialized

Overly broad instructions or large context windows increase latency and reduce accuracy. Each agent should have a narrow, clearly defined knowledge base and set of responsibilities.

Why this matters for reliability: Specialized agents have fewer edge cases to handle, clearer success criteria, and faster response times. They’re easier to test, debug, and improve.

A general-purpose “do everything” agent is harder to maintain and more likely to fail in production than a network of specialized agents with clear handoffs.

Use orchestrator and specialist patterns

For complex tasks, design multi-agent workflows that hand off tasks between specialized agents—and to human operators when needed.

Architecture pattern:

  1. Orchestrator agent: Routes incoming requests to appropriate specialist agents based on intent classification
  2. Specialist agents: Handle domain-specific tasks (billing, scheduling, technical support, etc.)
  3. Human escalation: Defined handoff criteria for complex or sensitive cases

Benefits of this pattern:

  • Each specialist has a focused prompt and reduced context
  • Easier to update individual specialists without affecting the system
  • Clear metrics per domain (billing resolution rate, scheduling success rate, etc.)
  • Reduced latency per interaction (smaller prompts, faster inference)
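As a minimal sketch of the orchestrator's routing step, keyword matching below stands in for the LLM intent classification a production orchestrator would perform; the route names and keyword lists mirror the routing-logic example later in this section:

```python
ROUTES = {
    "billing": ["payment", "invoice", "refund", "charge",
                "subscription", "account balance"],
    "technical": ["error", "bug", "issue", "not working", "broken"],
    "scheduling": ["book", "reschedule", "cancel", "appointment"],
}

def route(message: str) -> str:
    """Return the specialist to hand off to, or 'orchestrator' to keep probing."""
    text = message.lower()
    for specialist, keywords in ROUTES.items():
        if any(kw in text for kw in keywords):
            return specialist
    return "orchestrator"

print(route("I was charged twice on my last invoice"))  # billing
```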

Define clear handoff criteria

When designing multi-agent workflows, specify exactly when and how control should transfer between agents or to human operators.

Orchestrator agent example
# Goal

Route customer requests to the appropriate specialist agent based on intent.

## Routing logic

**Billing specialist:** Customer mentions payment, invoice, refund, charge, subscription, or account balance
**Technical support specialist:** Customer reports error, bug, issue, not working, broken
**Scheduling specialist:** Customer wants to book, reschedule, cancel, or check appointment
**Human escalation:** Customer is angry, requests supervisor, or issue is unresolved after 2 specialist attempts

## Handoff process

1. Classify customer intent based on first message
2. Provide brief acknowledgment: "I'll connect you with our [billing/technical/scheduling] team."
3. Transfer conversation with context summary:
   - Customer name
   - Primary issue
   - Any account identifiers already collected
4. Do not repeat information collection that already occurred

Specialist agent example

# Personality

You are a billing specialist for Acme Corp. You handle payment issues, refunds, and subscription changes.

# Goal

Resolve billing inquiries by:

1. Verifying customer identity
2. Looking up account and billing history
3. Processing refunds (under $500) or escalating (over $500)
4. Updating subscription settings when requested

# Guardrails

Never access account information without identity verification.
Never process refunds over $500 without supervisor approval.
If the customer's issue is not billing-related, transfer back to the orchestrator agent.

For detailed guidance on building multi-agent workflows, see our documentation on Workflows.

Model selection for enterprise reliability

Selecting the right model depends on your performance requirements—particularly latency, accuracy, and tool-calling reliability. Different models offer different tradeoffs between speed, reasoning capability, and cost.

Understand the tradeoffs

Latency: Smaller models (fewer parameters) generally respond faster, making them suitable for high-frequency, low-complexity interactions.

Accuracy: Larger models provide stronger reasoning capabilities and better handle complex, multi-step tasks, but with higher latency and cost.

Tool-calling reliability: Not all models handle tool/function calling with equal precision. Some excel at structured output, while others may require more explicit prompting.

Model recommendations by use case

Based on deployments across millions of agent interactions, the following patterns emerge:

  • GPT-4o or GLM 4.5 Air (recommended starting point): Best for general-purpose enterprise agents where latency, accuracy, and cost must all be balanced. Offers low-to-moderate latency with strong tool-calling performance and reasonable cost per interaction. Ideal for customer support, scheduling, order management, and general inquiry handling.

  • Gemini 2.5 Flash Lite (ultra-low latency): Best for high-frequency, simple interactions where speed is critical. Provides the lowest latency with broad general knowledge, though with lower performance on complex tool-calling. Cost-effective at scale for initial routing/triage, simple FAQs, appointment confirmations, and basic data collection.

  • Claude Sonnet 4 or 4.5 (complex reasoning): Best for multi-step problem-solving, nuanced judgment, and complex tool orchestration. Offers the highest accuracy and reasoning capability with excellent tool-calling reliability, though with higher latency and cost. Ideal for tasks where mistakes are costly, such as technical troubleshooting, financial advisory, compliance-sensitive workflows, and complex refund/escalation decisions.

Benchmark with your actual prompts

Model performance varies significantly based on prompt structure and task complexity. Before committing to a model:

  1. Test 2-3 candidate models with your actual system prompt
  2. Evaluate on real user queries or synthetic test cases
  3. Measure latency, accuracy, and tool-calling success rate
  4. Optimize for the best tradeoff given your specific requirements
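A benchmark harness for these steps might look like the sketch below. `model_fn` is a stand-in for whatever client you use to call each candidate model; the response shape (`{"tool_called": ...}`) is an assumption you would adapt to your client:

```python
import time

def run_benchmark(model_fn, test_cases):
    """Measure latency and tool-calling success over synthetic test cases."""
    latencies, successes = [], 0
    for query, expected_tool in test_cases:
        start = time.perf_counter()
        response = model_fn(query)
        latencies.append(time.perf_counter() - start)
        successes += response.get("tool_called") == expected_tool
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "tool_success_rate": successes / len(test_cases),
    }

# Deterministic stub so the harness can be demonstrated without a live model.
def stub_model(query):
    return {"tool_called": "getOrderStatus" if "order" in query else None}

cases = [("Where is my order?", "getOrderStatus"),
         ("What are your hours?", None)]
print(run_benchmark(stub_model, cases))
```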

For detailed model configuration options, see our Models documentation.

Iteration and testing

Reliability in production comes from continuous iteration. Even well-constructed prompts can fail in real use. What matters is learning from those failures and improving through disciplined testing.

Configure evaluation criteria

Attach concrete evaluation criteria to each agent to monitor success over time and check for regressions.

Key metrics to track:

  • Task completion rate: Percentage of user intents successfully addressed
  • Escalation rate: Percentage of conversations requiring human intervention
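Both metrics are straightforward to compute from conversation logs. The record fields below (`task_completed`, `escalated`) are assumptions about your logging schema, not a fixed format:

```python
def summarize(conversations):
    """Compute completion and escalation rates from conversation records."""
    n = len(conversations)
    return {
        "task_completion_rate": sum(c["task_completed"] for c in conversations) / n,
        "escalation_rate": sum(c["escalated"] for c in conversations) / n,
    }

logs = [
    {"task_completed": True,  "escalated": False},
    {"task_completed": True,  "escalated": False},
    {"task_completed": False, "escalated": True},
    {"task_completed": True,  "escalated": False},
]
print(summarize(logs))  # {'task_completion_rate': 0.75, 'escalation_rate': 0.25}
```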

For detailed guidance on configuring evaluation criteria in ElevenLabs, see Success Evaluation.

Analyze failure patterns

When agents underperform, identify patterns in problematic interactions:

  • Where does the agent provide incorrect information? → Strengthen instructions in specific sections
  • When does it fail to understand user intent? → Add examples or simplify language
  • Which user inputs cause it to break character? → Add guardrails for edge cases
  • Which tools fail most often? → Improve error handling or parameter descriptions

Review conversation transcripts where user satisfaction was low or tasks weren’t completed.

Make targeted refinements

Update specific sections of your prompt to address identified issues:

  1. Isolate the problem: Identify which prompt section or tool definition is causing failures
  2. Test changes on specific examples: Use conversations that previously failed as test cases
  3. Make one change at a time: Isolate improvements to understand what works
  4. Re-evaluate with same test cases: Verify the change fixed the issue without creating new problems

Avoid making multiple prompt changes simultaneously. This makes it impossible to attribute improvements or regressions to specific edits.

Configure data collection

Configure your agent to summarize data from each conversation. This allows you to analyze interaction patterns, identify common user requests, and continuously improve your prompt based on real-world usage.

For detailed guidance on configuring data collection in ElevenLabs, see Data Collection.

Use simulation for regression testing

Before deploying prompt changes to production, test against a set of known scenarios to catch regressions.

For guidance on testing agents programmatically, see Simulate Conversations.
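A lightweight regression check pins expected behavior to known scenarios. `simulate` below is a stand-in for however you drive the agent (the simulation API or a local harness), and the scenarios and stub replies are illustrative:

```python
SCENARIOS = [
    # (user message, substring the agent's reply must contain)
    ("Where is my order ORD123456?", "ORD123456"),
    ("I want a refund of $800", "supervisor"),
]

def run_regressions(simulate):
    """Return the scenarios whose replies no longer contain the expected text."""
    failures = []
    for message, expected in SCENARIOS:
        reply = simulate(message)
        if expected not in reply:
            failures.append((message, expected, reply))
    return failures

# Stub agent so the harness itself can be exercised offline.
def stub_agent(message):
    if "$800" in message:
        return "That amount requires supervisor approval."
    return "Let me look up ORD123456 for you."

assert run_regressions(stub_agent) == []
```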

Production considerations

Enterprise agents require additional safeguards beyond prompt quality. Production deployments must account for error handling, compliance, and graceful degradation.

Handle errors across all tool integrations

Every external tool call is a potential failure point. Ensure your prompt includes explicit error handling for:

  • Network failures: “I’m having trouble connecting to our system. Let me try again.”
  • Missing data: “I don’t see that information in our system. Can you verify the details?”
  • Timeout errors: “This is taking longer than expected. I can escalate to a specialist or try again.”
  • Permission errors: “I don’t have access to that information. Let me transfer you to someone who can help.”

Example prompts

The following examples demonstrate how to apply the principles outlined in this guide to real-world enterprise use cases. Each example includes annotations highlighting which reliability principles are in use.

Example 1: Technical support agent

Technical support specialist
# Personality

You are a technical support specialist for CloudTech, a B2B SaaS platform.
You are patient, methodical, and focused on resolving issues efficiently.
You speak clearly and adapt technical language based on the user's familiarity.

# Environment

You are assisting customers via phone support.
Customers may be experiencing service disruptions and could be frustrated.
You have access to diagnostic tools and the customer account database.

# Tone

Keep responses clear and concise (2-3 sentences unless troubleshooting requires more detail).
Use a calm, professional tone with brief affirmations ("I understand," "Let me check that").
Adapt technical depth based on customer responses.
Check for understanding after complex steps: "Does that make sense?"

# Goal

Resolve technical issues through structured troubleshooting:

1. Verify customer identity using email and account ID
2. Identify affected service and severity level
3. Run diagnostics using `runSystemDiagnostic` tool
4. Provide step-by-step resolution or escalate if unresolved after 2 attempts

This step is important: Always run diagnostics before suggesting solutions.

# Guardrails

Never access customer accounts without identity verification. This step is important.
Never guess at solutions—always base recommendations on diagnostic results.
If an issue persists after 2 troubleshooting attempts, escalate to engineering team.
Acknowledge when you don't know the answer instead of speculating.

# Tools

## `verifyCustomerIdentity`

**When to use:** At the start of every conversation before accessing account data
**Parameters:**

- `email` (required): Customer email in written format (e.g., "user@company.com")
- `account_id` (optional): Account ID if customer provides it

**Usage:**

1. Ask customer for email in spoken format: "Can I get the email associated with your account?"
2. Convert to written format: "john dot smith at company dot com" → "john.smith@company.com"
3. Call this tool with written email

**Error handling:**
If verification fails, ask customer to confirm email spelling and try again.

## `runSystemDiagnostic`

**When to use:** After verifying identity and understanding the reported issue
**Parameters:**

- `account_id` (required): From `verifyCustomerIdentity` response
- `service_name` (required): Name of affected service (e.g., "api", "dashboard", "storage")

**Usage:**

1. Confirm which service is affected
2. Run diagnostic with account ID and service name
3. Review results before providing solution

**Error handling:**
If diagnostic fails, acknowledge the issue: "I'm having trouble running that diagnostic. Let me escalate to our engineering team."

# Character normalization

When collecting email addresses:

- Spoken: "john dot smith at company dot com"
- Written: "john.smith@company.com"
- Convert "@" from "at", "." from "dot", remove spaces

# Error handling

If any tool call fails:

1. Acknowledge: "I'm having trouble accessing that information right now."
2. Do not guess or make up information
3. Offer to retry once, then escalate if failure persists

Principles demonstrated:

  • ✓ Clean section separation (# Personality, # Goal, # Tools, etc.)
  • ✓ One action per line (see # Goal numbered steps)
  • ✓ Concise instructions (tone section is brief and clear)
  • ✓ Emphasized critical steps (“This step is important”)
  • ✓ Character normalization (email format conversion)
  • ✓ Clear examples (in character normalization section)
  • ✓ Dedicated guardrails section
  • ✓ Precise tool descriptions with when/how/error guidance
  • ✓ Explicit error handling instructions

Example 2: Customer service refund agent

Refund processing specialist
# Personality

You are a refund specialist for RetailCo.
You are empathetic, solution-oriented, and efficient.
You balance customer satisfaction with company policy compliance.

# Goal

Process refund requests through this workflow:

1. Verify customer identity using order number and email
2. Look up order details with `getOrderDetails` tool
3. Confirm refund eligibility (within 30 days, not digital download, not already refunded)
4. For refunds under $100: Process immediately with `processRefund` tool
5. For refunds $100-$500: Apply secondary verification, then process
6. For refunds over $500: Escalate to supervisor with case summary

This step is important: Never process refunds without verifying eligibility first.

# Guardrails

Never process refunds outside the 30-day return window without supervisor approval.
Never process refunds over $500 without supervisor approval. This step is important.
Never access order information without verifying customer identity.
If a customer becomes aggressive, remain calm and offer supervisor escalation.

# Tools

## `verifyIdentity`

**When to use:** At the start of every conversation
**Parameters:**

- `order_id` (required): Order ID in written format (e.g., "ORD123456")
- `email` (required): Customer email in written format

**Usage:**

1. Collect order ID: "Can I get your order number?"
   - Spoken: "O R D one two three four five six"
   - Written: "ORD123456"
2. Collect email and convert to written format
3. Call this tool with both values

## `getOrderDetails`

**When to use:** After identity verification
**Returns:** Order date, items, total amount, refund eligibility status

**Error handling:**
If order not found, ask customer to verify order number and try again.

## `processRefund`

**When to use:** Only after confirming eligibility
**Required checks before calling:**

- Identity verified
- Order is within 30 days
- Order is eligible (not digital, not already refunded)
- Refund amount is under $500

**Parameters:**

- `order_id` (required): From previous verification
- `reason_code` (required): One of "defective", "wrong_item", "late_delivery", "changed_mind"

**Usage:**

1. Confirm refund details with customer: "I'll process a $[amount] refund to your original payment method. It will appear in 3-5 business days. Does that work for you?"
2. Wait for customer confirmation
3. Call this tool

**Error handling:**
If refund processing fails, apologize and escalate: "I'm unable to process that refund right now. Let me escalate to a supervisor who can help."

# Character normalization

Order IDs:

- Spoken: "O R D one two three four five six"
- Written: "ORD123456"
- No spaces, all uppercase

Email addresses:

- Spoken: "john dot smith at retailco dot com"
- Written: "john.smith@retailco.com"

Principles demonstrated:

  • ✓ Specialized agent scope (refunds only, not general support)
  • ✓ Clear workflow steps in # Goal section
  • ✓ Repeated emphasis on critical rules (refund limits, verification)
  • ✓ Detailed tool usage with “when to use” and “required checks”
  • ✓ Character normalization for structured IDs
  • ✓ Explicit error handling per tool
  • ✓ Escalation criteria clearly defined

Formatting best practices

How you format your prompt impacts how effectively the language model interprets it:

  • Use markdown headings: Structure sections with # for main sections, ## for subsections
  • Prefer bulleted lists: Break down instructions into digestible bullet points
  • Use whitespace: Separate sections and instruction groups with blank lines
  • Keep headings in sentence case: # Goal not # GOAL
  • Be consistent: Use the same formatting pattern throughout the prompt

Frequently asked questions

How do I keep prompts consistent across multiple specialist agents?

Create shared prompt templates for common sections like character normalization, error handling, and guardrails. Store these in a central repository and reference them across specialist agents. Use the orchestrator pattern to ensure consistent routing logic and handoff procedures.

What should even a simple agent's system prompt include?

At minimum, include: (1) Personality/role definition, (2) Primary goal, (3) Core guardrails, and (4) Tool descriptions if tools are used. Even simple agents benefit from explicit section structure and error handling instructions.

How do I deprecate a tool without breaking the agent?

When deprecating a tool, add a new tool first, then update the prompt to prefer the new tool while keeping the old one as a fallback. Monitor usage, then remove the old tool once usage drops to zero. Always include error handling so agents can recover if a deprecated tool is called.

Do prompts need to be rewritten for different models?

Generally, prompts structured with the principles in this guide work across models. However, model-specific tuning can improve performance—particularly for tool-calling format and reasoning steps. Test your prompt with multiple models and adjust if needed.

How long should a system prompt be?

No universal limit exists, but prompts over 2000 tokens increase latency and cost. Focus on conciseness: every line should serve a clear purpose. If your prompt exceeds 2000 tokens, consider splitting into multiple specialized agents or extracting reference material into a knowledge base.

How do I balance a consistent persona with adapting to each user?

Define core personality traits, goals, and guardrails firmly while allowing flexibility in tone and verbosity based on user communication style. Use conditional instructions: "If the user is frustrated, acknowledge their concerns before proceeding."

Can I change the system prompt after deployment?

Yes. System prompts can be modified at any time to adjust behavior. This is particularly useful for addressing emerging issues or refining capabilities as you learn from user interactions. Always test changes in a staging environment before deploying to production.

How do I prevent agents from hallucinating when tools fail?

Include explicit error handling instructions for every tool. Emphasize "never guess or make up information" in the guardrails section. Repeat this instruction in tool-specific error handling sections. Test tool failure scenarios during development to ensure agents follow recovery instructions.

Next steps

This guide establishes the foundation for reliable agent behavior through prompt engineering, tool configuration, and architectural patterns. To build production-grade systems, continue with the documentation referenced throughout this guide: Workflows, Success Evaluation, Data Collection, Simulate Conversations, and Models.

For enterprise deployment support, contact our team.