Prompting guide
System design principles for production-grade conversational AI
Introduction
Effective prompting transforms ElevenLabs Agents from robotic to lifelike.

A system prompt is the personality and policy blueprint of your AI agent. In enterprise use, it tends to be elaborate—defining the agent’s role, goals, allowable tools, step-by-step instructions for certain tasks, and guardrails describing what the agent should not do. The way you structure this prompt directly impacts reliability.
The system prompt controls conversational behavior and response style, but does not control conversation flow mechanics like turn-taking, or agent settings like which languages an agent can speak. These aspects are handled at the platform level.

Prompt engineering fundamentals
The following principles form the foundation of production-grade prompt engineering:
Separate instructions into clean sections
Separating instructions into dedicated sections with markdown headings helps the model prioritize and interpret them correctly. Use whitespace and line breaks to separate instructions.
Why this matters for reliability: Models are tuned to pay extra attention to certain headings (especially # Guardrails), and clear section boundaries prevent instruction bleed where rules from one context affect another.
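As a rough sketch, a sectioned system prompt might be structured like this (the agent name, company, and section contents are illustrative, not required):

```
# Personality
You are Alex, a friendly, knowledgeable support agent for Acme Corp.

# Goal
Help callers resolve account issues quickly.
1. Verify the caller's identity.
2. Identify the issue.
3. Resolve it or escalate.

# Tone
Warm, concise, and professional. Use short sentences suited to speech.

# Tools
Explain when and how each tool should be used.

# Guardrails
Never share another customer's information. Never guess; if unsure, say so.
```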
Be as concise as possible
Keep every instruction short, clear, and action-based. Remove filler words and restate only what is essential for the model to act correctly.
Why this matters for reliability: Concise instructions reduce ambiguity and token usage. Every unnecessary word is a potential source of misinterpretation.
If you need the agent to maintain a specific tone, define it explicitly and concisely in the # Personality or # Tone section. Avoid repeating tone guidance throughout the prompt.
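For example (both versions are invented for illustration), the same requirement can usually be stated in far fewer words:

```
Verbose: "It would be great if you could try to remember, at some point early in the call, to ask the caller for their order number so that we can look things up later."
Concise: "Ask for the caller's order number before looking anything up."
```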
Emphasize critical instructions
Highlight critical steps by adding “This step is important” at the end of the line. Repeating the one or two most important instructions later in the prompt can also help reinforce them.
Why this matters for reliability: In complex prompts, models may prioritize recent context over earlier instructions. Emphasis and repetition ensure critical rules aren’t overlooked.
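A brief illustration of emphasis and repetition (the steps shown are invented):

```
# Goal
1. Verify the caller's identity before discussing account details. This step is important.
2. Identify the caller's issue and resolve it or escalate.

# Guardrails
- Never discuss account details with an unverified caller. This step is important.
```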
Normalize inputs and outputs
Voice agents often misinterpret or misformat structured information such as emails, IDs, or record locators. To ensure accuracy, separate (or “normalize”) how data is spoken to the user from how it is written when used in tools or APIs.
Why this matters for reliability: Text-to-speech models can mispronounce symbols like ”@” or ”.” when an agent reads “john@company.com” aloud. Normalizing to spoken format (“john at company dot com”) creates natural, understandable speech while maintaining the correct written format for tools.
Add character normalization rules to your system prompt when agents collect emails, phone numbers, confirmation codes, or other structured identifiers that will be passed to tools.
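One way such rules might read in the prompt (the exact wording and examples are illustrative):

```
# Character normalization
- Speak email addresses naturally: say "at" for "@" and "dot" for "." ("john at company dot com").
- Pass email addresses to tools in written form ("john@company.com").
- Read confirmation codes character by character ("O-R-D-1-2-3-4-5-6"); pass them to tools as a single string ("ORD123456").
- Confirm spoken identifiers back to the user before submitting them to a tool.
```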
Provide clear examples
Include examples in the prompt to illustrate how agents should behave, use tools, or format data. Large language models follow instructions more reliably when they have concrete examples to reference.
Why this matters for reliability: Examples reduce ambiguity and provide a reference pattern. They’re especially valuable for complex formatting, multi-step processes, and edge cases.
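An embedded example might pair a user request with the expected behavior, along these lines (the scenario and tool reference are invented):

```
# Examples
User: "I never got my package."
Agent: Apologize briefly, ask for the order number, call the order-lookup tool, then summarize the status and next steps in one or two sentences.
```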
Dedicate a guardrails section
List all non-negotiable rules the model must always follow in a dedicated # Guardrails section. Models are tuned to pay extra attention to this heading.
Why this matters for reliability: Guardrails prevent inappropriate responses and ensure compliance with policies. Centralizing them in a dedicated section makes them easier to audit and update.
To learn more about designing effective guardrails, see our guide on safety and moderation.
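A guardrails section might look like this sketch (the rules shown are placeholders; substitute your own policies):

```
# Guardrails
- Never guess or make up information; if you don't know, say so and offer to escalate.
- Never share personal data belonging to anyone other than the verified caller.
- Do not provide legal, medical, or financial advice.
- Stay within the scope of your product or service; politely decline unrelated requests.
```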
Tool configuration for reliability
Agents capable of handling transactional workflows can be highly effective. To enable this, they must be equipped with tools that let them perform actions in other systems or fetch live data from them.
Equally important as prompt structure is how you describe the tools available to your agent. Clear, action-oriented tool definitions help the model invoke them correctly and recover gracefully from errors.
Describe tools precisely with detailed parameters
When creating a tool, add descriptions to all parameters. This helps the LLM construct tool calls accurately.
Tool description: “Looks up customer order status by order ID and returns current status, estimated delivery date, and tracking number.”
Parameter descriptions:
- order_id (required): “The unique order identifier, formatted as written characters (e.g., ‘ORD123456’)”
- include_history (optional): “If true, returns full order history including status changes”
Why this matters for reliability: Parameter descriptions act as inline documentation for the model. They clarify format expectations, required vs. optional fields, and acceptable values.
Explain when and how to use each tool in the system prompt
Clearly define in your system prompt when and how each tool should be used. Don’t rely solely on tool descriptions—provide usage context and sequencing logic.
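For instance, a # Tools section might spell out usage context and sequencing like this (tool names and rules are illustrative):

```
# Tools
- lookup_order: Use when the caller asks about an order. Collect and confirm the order ID first.
- issue_refund: Use only after lookup_order confirms the order is eligible, and only for verified callers.
- Call at most one tool at a time, and tell the caller what you are doing while you wait.
```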
Use character normalization for tool inputs
When tools require structured identifiers (emails, phone numbers, codes), ensure the prompt clarifies when to use written vs. spoken formats.
Handle tool call failures gracefully
Tools can sometimes fail due to network issues, missing data, or other errors. Include clear instructions in your system prompt for recovery.
Why this matters for reliability: Tool failures are inevitable in production. Without explicit handling instructions, agents may hallucinate responses or provide incorrect information.
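Recovery instructions might read like the following sketch (wording is illustrative):

```
# Tool error handling
- If a tool call fails, tell the user you're having trouble reaching the system and try once more.
- If it fails again, apologize, summarize what you were trying to do, and offer to escalate to a human agent.
- Never invent a result for a failed tool call.
```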
For detailed guidance on building reliable tool integrations, see our documentation on Client tools, Server tools, and MCP tools.
Architecture patterns for enterprise agents
While strong prompts and tools form the foundation of agent reliability, production systems require thoughtful architectural design. Enterprise agents handle complex workflows that often exceed the scope of a single, monolithic prompt.
Keep agents specialized
Overly broad instructions or large context windows increase latency and reduce accuracy. Each agent should have a narrow, clearly defined knowledge base and set of responsibilities.
Why this matters for reliability: Specialized agents have fewer edge cases to handle, clearer success criteria, and faster response times. They’re easier to test, debug, and improve.
A general-purpose “do everything” agent is harder to maintain and more likely to fail in production than a network of specialized agents with clear handoffs.
Use orchestrator and specialist patterns
For complex tasks, design multi-agent workflows that hand off tasks between specialized agents—and to human operators when needed.
Architecture pattern:
- Orchestrator agent: Routes incoming requests to appropriate specialist agents based on intent classification
- Specialist agents: Handle domain-specific tasks (billing, scheduling, technical support, etc.)
- Human escalation: Defined handoff criteria for complex or sensitive cases
Benefits of this pattern:
- Each specialist has a focused prompt and reduced context
- Easier to update individual specialists without affecting the system
- Clear metrics per domain (billing resolution rate, scheduling success rate, etc.)
- Reduced latency per interaction (smaller prompts, faster inference)
Define clear handoff criteria
When designing multi-agent workflows, specify exactly when and how control should transfer between agents or to human operators.
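As an illustration, handoff criteria in an orchestrator or specialist prompt might be stated like this (agent names and thresholds are invented):

```
# Handoff criteria
- Route billing questions to the billing specialist; route scheduling requests to the scheduling specialist.
- Transfer to a human operator when the user asks for one, when the same issue remains unresolved after two attempts, or when a refund above $500 is requested.
- On handoff, pass a one-sentence summary of the conversation and any collected identifiers.
```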
For detailed guidance on building multi-agent workflows, see our documentation on Workflows.
Model selection for enterprise reliability
Selecting the right model depends on your performance requirements—particularly latency, accuracy, and tool-calling reliability. Different models offer different tradeoffs between speed, reasoning capability, and cost.
Understand the tradeoffs
Latency: Smaller models (fewer parameters) generally respond faster, making them suitable for high-frequency, low-complexity interactions.
Accuracy: Larger models provide stronger reasoning capabilities and better handle complex, multi-step tasks, but with higher latency and cost.
Tool-calling reliability: Not all models handle tool/function calling with equal precision. Some excel at structured output, while others may require more explicit prompting.
Model recommendations by use case
Based on deployments across millions of agent interactions, the following patterns emerge:
- GPT-4o or GLM 4.5 Air (recommended starting point): Best for general-purpose enterprise agents where latency, accuracy, and cost must all be balanced. Offers low-to-moderate latency with strong tool-calling performance and reasonable cost per interaction. Ideal for customer support, scheduling, order management, and general inquiry handling.
- Gemini 2.5 Flash Lite (ultra-low latency): Best for high-frequency, simple interactions where speed is critical. Provides the lowest latency with broad general knowledge, though with lower performance on complex tool-calling. Cost-effective at scale for initial routing/triage, simple FAQs, appointment confirmations, and basic data collection.
- Claude Sonnet 4 or 4.5 (complex reasoning): Best for multi-step problem-solving, nuanced judgment, and complex tool orchestration. Offers the highest accuracy and reasoning capability with excellent tool-calling reliability, though with higher latency and cost. Ideal for tasks where mistakes are costly, such as technical troubleshooting, financial advisory, compliance-sensitive workflows, and complex refund/escalation decisions.
Benchmark with your actual prompts
Model performance varies significantly based on prompt structure and task complexity. Before committing to a model:
- Test 2-3 candidate models with your actual system prompt
- Evaluate on real user queries or synthetic test cases
- Measure latency, accuracy, and tool-calling success rate
- Optimize for the best tradeoff given your specific requirements
For detailed model configuration options, see our Models documentation.
Iteration and testing
Reliability in production comes from continuous iteration. Even well-constructed prompts can fail in real use. What matters is learning from those failures and improving through disciplined testing.
Configure evaluation criteria
Attach concrete evaluation criteria to each agent to monitor success over time and check for regressions.
Key metrics to track:
- Task completion rate: Percentage of user intents successfully addressed
- Escalation rate: Percentage of conversations requiring human intervention
For detailed guidance on configuring evaluation criteria in ElevenLabs, see Success Evaluation.
Analyze failure patterns
When agents underperform, identify patterns in problematic interactions:
- Where does the agent provide incorrect information? → Strengthen instructions in specific sections
- When does it fail to understand user intent? → Add examples or simplify language
- Which user inputs cause it to break character? → Add guardrails for edge cases
- Which tools fail most often? → Improve error handling or parameter descriptions
Review conversation transcripts where user satisfaction was low or tasks weren’t completed.
Make targeted refinements
Update specific sections of your prompt to address identified issues:
- Isolate the problem: Identify which prompt section or tool definition is causing failures
- Test changes on specific examples: Use conversations that previously failed as test cases
- Make one change at a time: Isolate improvements to understand what works
- Re-evaluate with same test cases: Verify the change fixed the issue without creating new problems
Avoid making multiple prompt changes simultaneously. This makes it impossible to attribute improvements or regressions to specific edits.
Configure data collection
Configure your agent to summarize data from each conversation. This allows you to analyze interaction patterns, identify common user requests, and continuously improve your prompt based on real-world usage.
For detailed guidance on configuring data collection in ElevenLabs, see Data Collection.
Use simulation for regression testing
Before deploying prompt changes to production, test against a set of known scenarios to catch regressions.
For guidance on testing agents programmatically, see Simulate Conversations.
Production considerations
Enterprise agents require additional safeguards beyond prompt quality. Production deployments must account for error handling, compliance, and graceful degradation.
Handle errors across all tool integrations
Every external tool call is a potential failure point. Ensure your prompt includes explicit error handling for:
- Network failures: “I’m having trouble connecting to our system. Let me try again.”
- Missing data: “I don’t see that information in our system. Can you verify the details?”
- Timeout errors: “This is taking longer than expected. I can escalate to a specialist or try again.”
- Permission errors: “I don’t have access to that information. Let me transfer you to someone who can help.”
Example prompts
The following examples demonstrate how to apply the principles outlined in this guide to real-world enterprise use cases. Each example includes annotations highlighting which reliability principles are in use.
Example 1: Technical support agent
Principles demonstrated:
- ✓ Clean section separation (# Personality, # Goal, # Tools, etc.)
- ✓ One action per line (see # Goal numbered steps)
- ✓ Concise instructions (tone section is brief and clear)
- ✓ Emphasized critical steps (“This step is important”)
- ✓ Character normalization (email format conversion)
- ✓ Clear examples (in character normalization section)
- ✓ Dedicated guardrails section
- ✓ Precise tool descriptions with when/how/error guidance
- ✓ Explicit error handling instructions
Example 2: Customer service refund agent
Principles demonstrated:
- ✓ Specialized agent scope (refunds only, not general support)
- ✓ Clear workflow steps in # Goal section
- ✓ Repeated emphasis on critical rules (refund limits, verification)
- ✓ Detailed tool usage with “when to use” and “required checks”
- ✓ Character normalization for structured IDs
- ✓ Explicit error handling per tool
- ✓ Escalation criteria clearly defined
Formatting best practices
How you format your prompt impacts how effectively the language model interprets it:
- Use markdown headings: Structure sections with # for main sections, ## for subsections
- Prefer bulleted lists: Break down instructions into digestible bullet points
- Use whitespace: Separate sections and instruction groups with blank lines
- Keep headings in sentence case: # Goal, not # GOAL
- Be consistent: Use the same formatting pattern throughout the prompt
Frequently asked questions
How do I maintain consistency across multiple agents?
Create shared prompt templates for common sections like character normalization, error handling, and guardrails. Store these in a central repository and reference them across specialist agents. Use the orchestrator pattern to ensure consistent routing logic and handoff procedures.
What's the minimum viable prompt for production?
At minimum, include: (1) Personality/role definition, (2) Primary goal, (3) Core guardrails, and (4) Tool descriptions if tools are used. Even simple agents benefit from explicit section structure and error handling instructions.
How do I handle tool deprecation without breaking agents?
When deprecating a tool, add a new tool first, then update the prompt to prefer the new tool while keeping the old one as a fallback. Monitor usage, then remove the old tool once usage drops to zero. Always include error handling so agents can recover if a deprecated tool is called.
Should I use different prompts for different LLMs?
Generally, prompts structured with the principles in this guide work across models. However, model-specific tuning can improve performance—particularly for tool-calling format and reasoning steps. Test your prompt with multiple models and adjust if needed.
How long should my system prompt be?
No universal limit exists, but prompts over 2000 tokens increase latency and cost. Focus on conciseness: every line should serve a clear purpose. If your prompt exceeds 2000 tokens, consider splitting into multiple specialized agents or extracting reference material into a knowledge base.
How do I balance consistency with adaptability?
Define core personality traits, goals, and guardrails firmly while allowing flexibility in tone and verbosity based on user communication style. Use conditional instructions: “If the user is frustrated, acknowledge their concerns before proceeding.”
Can I update prompts after deployment?
Yes. System prompts can be modified at any time to adjust behavior. This is particularly useful for addressing emerging issues or refining capabilities as you learn from user interactions. Always test changes in a staging environment before deploying to production.
How do I prevent agents from hallucinating when tools fail?
Include explicit error handling instructions for every tool. Emphasize “never guess or make up information” in the guardrails section. Repeat this instruction in tool-specific error handling sections. Test tool failure scenarios during development to ensure agents follow recovery instructions.
Next steps
This guide establishes the foundation for reliable agent behavior through prompt engineering, tool configuration, and architectural patterns. To build production-grade systems, continue with:
- Workflows: Design multi-agent orchestration and specialist handoffs
- Success Evaluation: Configure metrics and evaluation criteria
- Data Collection: Capture structured insights from conversations
- Testing: Implement regression testing and simulation
- Security & Privacy: Ensure compliance and data protection
- Our Docs Agent: See a complete case study of these principles in action
For enterprise deployment support, contact our team.