Prompting guide
Learn how to engineer lifelike, engaging Conversational AI voice agents
Overview
Effective prompting transforms Conversational AI voice agents from robotic to lifelike. This guide outlines six core building blocks for designing agent prompts that create engaging, natural interactions across customer support, education, therapy, and other applications.
The difference between an AI-sounding and a naturally expressive Conversational AI agent comes down to how well you structure its system prompt.
Six building blocks
Each system prompt component serves a specific function. Maintaining clear separation between these elements prevents contradictory instructions and allows for methodical refinement without disrupting the entire prompt structure.
- Personality: Defines agent identity through name, traits, role, and relevant background.
- Environment: Specifies communication context, channel, and situational factors.
- Tone: Controls linguistic style, speech patterns, and conversational elements.
- Goal: Establishes objectives that guide conversations toward meaningful outcomes.
- Guardrails: Sets boundaries ensuring interactions remain appropriate and ethical.
- Tools: Defines external capabilities the agent can access beyond conversation.
1. Personality
The base personality is the foundation of your voice agent’s identity, defining who the agent is supposed to emulate through a name, role, background, and key traits. It ensures consistent, authentic responses in every interaction.
- Identity: Give your agent a simple, memorable name (e.g. “Joe”) and establish the essential identity (e.g. “a compassionate AI support assistant”).
- Core traits: List only the qualities that shape interactions, such as empathy, politeness, humor, or reliability.
- Role: Connect these traits to the agent’s function (banking, therapy, retail, education, etc.). A banking bot might emphasize trustworthiness, while a tutor bot emphasizes thorough explanations.
- Backstory: Include a brief background if it affects how the agent behaves (e.g. “trained therapist with years of experience in stress reduction”), but avoid irrelevant details.
2. Environment
The environment captures where, how, and under what conditions your agent interacts with the user. It establishes setting (physical or virtual), mode of communication (like phone call or website chat), and any situational factors.
- State the medium: Define the communication channel and conditions (e.g. “over the phone”, “via smart speaker”, “in a noisy environment”). This helps your agent adjust verbosity or repetition when the setting is loud or hands-free.
- Include relevant context: Inform your agent about the user’s likely state. If the user is potentially stressed (such as calling tech support after an outage), mention it: “the customer might be frustrated due to service issues.” This primes the agent to respond with empathy.
- Avoid unnecessary scene-setting: Focus on elements that affect the conversation. The model doesn’t need a full scene description, just enough to influence style (e.g. formal office vs. casual home setting).
3. Tone
Tone governs how your agent speaks and interacts, defining its conversational style. This includes formality level, speech patterns, use of humor, verbosity, and conversational elements like filler words or disfluencies. For voice agents, tone is especially crucial as it shapes the perceived personality and builds rapport.
- Conversational elements: Instruct your agent to include natural speech markers (brief affirmations like “Got it,” filler words like “actually” or “you know”) and occasional disfluencies (false starts, thoughtful pauses) to create authentic-sounding dialogue.
- TTS compatibility: Direct your agent to optimize for speech synthesis by using punctuation strategically (ellipses for pauses, emphasis marks for key points) and by adapting text formats for natural pronunciation, so it sounds natural even when handling technical content:
  - Spell out email addresses (“john dot smith at company dot com”).
  - Format phone numbers with pauses (“five five five… one two three… four five six seven”).
  - Convert numbers into spoken forms (“$19.99” as “nineteen dollars and ninety-nine cents”).
  - Provide phonetic guidance for unfamiliar terms.
  - Decide whether each acronym is spelled out letter by letter or read as a word (“N A S A” vs. “NASA”).
  - Read URLs conversationally (“example dot com slash support”).
  - Convert symbols into spoken descriptions (“%” as “percent”).
- Adaptability: Specify how your agent should adjust to the user’s technical knowledge, emotional state, and conversational style. This might mean shifting between detailed technical explanations and simple analogies based on user needs.
- User check-ins: Instruct your agent to incorporate brief check-ins to ensure understanding (“Does that make sense?”) and modify its approach based on feedback.
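The TTS conversions above are instructions for the model, but it can help to see them as deterministic rules, for example to generate consistent examples for the Tone section or to post-process text yourself before synthesis. The sketch below is a hypothetical helper (`normalize_for_tts` and its companions are illustrative names, not a platform API) and assumes US-style phone numbers such as 555-123-4567, US dollar amounts under one million, and plain email addresses.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Assumed input formats; extend these patterns for your own data.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MONEY_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{1,2})?")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # US-style 555-123-4567 only


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0 <= n < 1,000,000."""
    if not 0 <= n < 1_000_000:
        raise ValueError("number outside the supported range")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[rest]}" if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return f"{ONES[hundreds]} hundred" + (f" {number_to_words(rest)}" if rest else "")
    thousands, rest = divmod(n, 1000)
    return f"{number_to_words(thousands)} thousand" + (f" {number_to_words(rest)}" if rest else "")


def spell_email(address: str) -> str:
    """'john.smith@company.com' -> 'john dot smith at company dot com'."""
    local, _, domain = address.partition("@")
    return f"{local.replace('.', ' dot ')} at {domain.replace('.', ' dot ')}"


def spell_phone(number: str) -> str:
    """Read digits one by one, turning each separator into an ellipsis pause."""
    groups = re.findall(r"\d+", number)
    return "… ".join(" ".join(ONES[int(d)] for d in group) for group in groups)


def speak_dollars(amount: str) -> str:
    """'$19.99' -> 'nineteen dollars and ninety-nine cents'."""
    dollars, _, cents = amount.lstrip("$").partition(".")
    d = int(dollars.replace(",", ""))
    c = int(cents.ljust(2, "0")) if cents else 0  # "$4.5" means 50 cents
    parts = [f"{number_to_words(d)} dollar{'' if d == 1 else 's'}"]
    if c:
        parts.append(f"{number_to_words(c)} cent{'' if c == 1 else 's'}")
    return " and ".join(parts)


def normalize_for_tts(text: str) -> str:
    """Apply the conversions above; anything else is left unchanged."""
    text = EMAIL_RE.sub(lambda m: spell_email(m.group()), text)
    text = MONEY_RE.sub(lambda m: speak_dollars(m.group()), text)
    text = PHONE_RE.sub(lambda m: spell_phone(m.group()), text)
    return text.replace("%", " percent")


print(normalize_for_tts("Call 555-123-4567 or email john.smith@company.com. It costs $19.99."))
```

Only the patterns listed in the regular expressions are handled; acronyms, URLs, and phonetic guidance still depend on the prompt instructions.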
4. Goal
The goal defines what the agent aims to accomplish in each conversation, providing direction and purpose. Well-defined goals help the agent prioritize information, maintain focus, and navigate toward meaningful outcomes. Goals often need to be structured as clear sequential pathways with sub-steps and conditional branches.
- Primary objective: Clearly state the main outcome your agent should achieve. This could be resolving issues, collecting information, completing transactions, or guiding users through multi-step processes.
- Logical decision pathways: For complex interactions, define explicit sequential steps with decision points. Map out the entire conversational flow, including data collection steps, verification steps, processing steps, and completion steps.
- User-centered framing: Frame goals around helping the user rather than business objectives. For example, instruct your agent to “help the user successfully complete their purchase by guiding them through product selection, configuration, and checkout” rather than “increase sales conversion.”
- Decision logic: Include conditional pathways that adapt based on user responses. Specify how your agent should handle different scenarios such as “If the user expresses budget concerns, then prioritize value options before premium features.”
- Evaluation criteria & data collection: Define what constitutes a successful interaction, so you know when the agent has fulfilled its purpose. Include both primary metrics (e.g., “completed booking”) and secondary metrics (e.g., “collected preference data for future personalization”).
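Writing the pathway as structured data first can make the steps and branches easier to review before you paste them into the prompt. A minimal sketch, assuming you maintain the Goal section in Python; `Step` and `render_goal` are hypothetical names, and the purchase flow reuses the examples from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One stage of the conversation flow, with optional conditional branches."""
    name: str
    instruction: str
    # (condition, action) pairs rendered as "If <condition>, then <action>."
    branches: list[tuple[str, str]] = field(default_factory=list)


def render_goal(objective: str, steps: list[Step], success: str) -> str:
    """Render a Goal section: objective, numbered steps, branches, success criterion."""
    lines = [f"Primary objective: {objective}", ""]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.name}: {step.instruction}")
        for condition, action in step.branches:
            lines.append(f"   - If {condition}, then {action}.")
    lines += ["", f"Success criterion: {success}"]
    return "\n".join(lines)


steps = [
    Step("Data collection", "Ask what the user needs the product for and what budget they have in mind."),
    Step("Product selection", "Recommend two or three options that fit those needs.",
         [("the user expresses budget concerns", "prioritize value options before premium features")]),
    Step("Verification", "Repeat the chosen product and configuration back to the user and confirm."),
    Step("Completion", "Guide the user through checkout and confirm that the order went through."),
]
goal_section = render_goal(
    "Help the user successfully complete their purchase by guiding them through "
    "product selection, configuration, and checkout.",
    steps,
    "The user completes checkout (primary); their stated preferences are recorded (secondary).",
)
print(goal_section)
```

Keeping each branch attached to the step where it applies mirrors the "sequential steps with decision points" structure, so a reviewer can see exactly when the budget rule kicks in.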
5. Guardrails
Guardrails define boundaries and rules for your agent, preventing inappropriate responses and guiding behavior in sensitive situations. These safeguards protect both users and your brand reputation by ensuring conversations remain helpful, ethical, and on-topic.
- Content boundaries: Clearly specify topics your agent should avoid or handle with care (medical advice, legal counsel, financial guidance, etc.) and how to gracefully redirect such conversations.
- Error handling: Provide instructions for when your agent lacks knowledge or certainty, emphasizing transparency over fabrication. Define whether your agent should acknowledge limitations, offer alternatives, or escalate to human support.
- Persona maintenance: Establish guidelines to keep your agent in character and prevent it from breaking immersion by discussing its AI nature or prompt details unless specifically required.
- Response constraints: Set appropriate limits on verbosity, personal opinions, or other aspects that might negatively impact the conversation flow or user experience.
6. Tools
Tools extend your voice agent’s capabilities beyond conversational abilities, allowing it to access external information, perform actions, or integrate with other systems. Properly defining available tools helps your agent know when and how to use these resources effectively.
- Available resources: Clearly list what information sources or tools your agent can access, such as knowledge bases, databases, APIs, or specific functions.
- Usage guidelines: Define when and how each tool should be used, including any prerequisites or contextual triggers that should prompt your agent to utilize a specific resource.
- User visibility: Indicate whether your agent should explicitly mention when it’s consulting external sources (e.g., “Let me check our database”) or seamlessly incorporate the information.
- Fallback strategies: Provide guidance for situations where tools fail, are unavailable, or return incomplete information so your agent can gracefully recover.
- Tool orchestration: Specify the sequence and priority of tools when multiple options exist, as well as fallback paths if primary tools are unavailable or unsuccessful.
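The orchestration and fallback rules above boil down to a priority loop. The sketch below assumes your own backend calls the tools; `run_with_fallback`, `order_lookup`, and `knowledge_base` are hypothetical names, not part of any platform API. The same priority order and fallback wording are what you would describe in the prompt's Tools section.

```python
from typing import Callable, Optional

# A tool takes the user's query and returns an answer, or None if it has nothing useful.
Tool = Callable[[str], Optional[str]]


def run_with_fallback(query: str, tools: list[tuple[str, Tool]],
                      fallback_message: str) -> tuple[str, str]:
    """Try tools in priority order and return (source, answer)."""
    for name, tool in tools:
        try:
            answer = tool(query)
        except Exception:
            # The tool is unavailable or failed: fall through to the next one.
            continue
        if answer:
            return name, answer
        # None or an empty string counts as incomplete: keep trying lower-priority tools.
    return "fallback", fallback_message


# Hypothetical tools for illustration only.
def order_lookup(query: str) -> Optional[str]:
    raise TimeoutError("order service unavailable")  # simulate an outage


def knowledge_base(query: str) -> Optional[str]:
    articles = {"reset password": "Open Settings, choose Security, then select Reset password."}
    return articles.get(query)


TOOLS = [("order_lookup", order_lookup), ("knowledge_base", knowledge_base)]
FALLBACK = "I can't look that up right now. Would you like me to connect you with a person?"

source, answer = run_with_fallback("reset password", TOOLS, FALLBACK)
print(source, "->", answer)
```

Returning the source name alongside the answer makes it easy to decide whether the agent should announce the lookup (“Let me check our database”) or fold the answer in silently, per the User visibility guideline.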
Example prompts
Putting it all together, below are example system prompts that illustrate how to combine the building blocks for different agent types. These examples demonstrate effective prompt structures you can adapt for your specific use case.
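As a minimal sketch of keeping the building blocks separate (not one of the guide's own example prompts), the snippet below stores each block as a list of bullet points and joins them under Markdown headings in a fixed order. `build_system_prompt` is a hypothetical helper, and the section text reuses illustrations from the building blocks described earlier.

```python
SECTION_ORDER = ["Personality", "Environment", "Tone", "Goal", "Guardrails", "Tools"]


def build_system_prompt(sections: dict[str, list[str]]) -> str:
    """Join the six building blocks into one prompt, one Markdown heading per block."""
    missing = [name for name in SECTION_ORDER if not sections.get(name)]
    if missing:
        raise ValueError(f"missing or empty sections: {', '.join(missing)}")
    blocks = []
    for name in SECTION_ORDER:
        bullets = "\n".join(f"- {item}" for item in sections[name])
        blocks.append(f"# {name}\n\n{bullets}")
    # Blank lines between sections keep the structure readable for humans and models.
    return "\n\n".join(blocks)


prompt = build_system_prompt({
    "Personality": ["You are Joe, a compassionate AI support assistant."],
    "Environment": ["You are speaking with a customer over the phone.",
                    "The customer might be frustrated due to service issues."],
    "Tone": ["Use brief affirmations such as \"Got it\".",
             "Check for understanding with questions like \"Does that make sense?\""],
    "Goal": ["Help the customer resolve their issue, one step at a time."],
    "Guardrails": ["If you are unsure of an answer, say so instead of guessing.",
                   "Offer to escalate to human support when you cannot help."],
    "Tools": ["Search the knowledge base before answering product questions."],
})
print(prompt)
```

Because each block lives under its own key, you can refine one section (for example, Guardrails) without touching the others, which is the separation the building blocks are meant to preserve.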
Prompt formatting
How you format your prompt impacts how effectively the language model interprets it:
- Use clear sections: Structure your prompt with labeled sections (Personality, Environment, etc.) or use Markdown headings for clarity.
- Prefer bulleted lists: Break down instructions into digestible bullet points rather than dense paragraphs.
- Consider format markers: Some developers find that formatting markers such as triple backticks or special tags help maintain prompt structure.
- Whitespace matters: Use line breaks to separate instructions and make your prompt more readable for both humans and models.
- Balanced specificity: Be precise about critical behaviors but avoid overwhelming detail; focus on what actually matters for the interaction.
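The list, marker, and whitespace conventions above can be applied mechanically. A minimal sketch, assuming XML-style tags as the format marker (one of the options mentioned above, not a required syntax); `to_tagged_bullets` is a hypothetical helper.

```python
import re


def to_tagged_bullets(name: str, paragraph: str) -> str:
    """Turn a dense paragraph into one bullet per sentence, wrapped in <name> tags."""
    # Naive sentence split on ., ! or ? followed by whitespace; abbreviations such
    # as "e.g." will also be split, so review the output by hand.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s.strip()]
    tag = name.lower().replace(" ", "_")
    body = "\n".join(f"- {sentence}" for sentence in sentences)
    return f"<{tag}>\n{body}\n</{tag}>"


dense = ("Do not give medical, legal, or financial advice. If you are unsure, say so. "
         "Offer to escalate to human support when needed.")
print(to_tagged_bullets("Guardrails", dense))
```

The output puts each instruction on its own line between an opening and a closing tag, so the boundaries of the section stay unambiguous even when sections are long.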
Evaluate & iterate
Prompt engineering is inherently iterative. Implement this feedback loop to continually improve your agent:
- Configure evaluation criteria: Attach concrete evaluation criteria to each agent to monitor success over time and check for regressions.
  - Response accuracy rate: Track the percentage of responses that provide correct information
  - User sentiment scores: Configure a sentiment analysis criterion to monitor user sentiment
  - Task completion rate: Measure the percentage of user intents successfully addressed
  - Conversation length: Monitor the number of turns needed to complete tasks
- Analyze failures: Identify patterns in problematic interactions:
  - Where does the agent provide incorrect information?
  - When does it fail to understand user intent?
  - Which user inputs cause it to break character?
  - Review transcripts where user satisfaction was low
- Targeted refinement: Update specific sections of your prompt to address identified issues.
  - Test changes on specific examples that previously failed
  - Make one targeted change at a time to isolate improvements
- Configure data collection: Configure the agent to summarize data from each conversation. This lets you analyze interaction patterns, identify common user requests, and continuously improve your prompt based on real-world usage.
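Once conversations are labeled (by your evaluation criteria or by manual review), the metrics listed above are simple aggregates. A minimal sketch, assuming a hypothetical per-conversation record; the field names, the sentiment scale, and the sample numbers are illustrative, not real data or a platform export format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    """Per-conversation evaluation results (hypothetical schema)."""
    turns: int
    task_completed: bool
    correct_responses: int
    total_responses: int
    sentiment: float  # assumed scale: -1 (negative) to 1 (positive)


def summarize(conversations: list[Conversation]) -> dict[str, float]:
    """Compute response accuracy, task completion, sentiment, and length metrics."""
    if not conversations:
        raise ValueError("no conversations to summarize")
    total = sum(c.total_responses for c in conversations)
    correct = sum(c.correct_responses for c in conversations)
    return {
        "response_accuracy_pct": 100 * correct / total if total else 0.0,
        "task_completion_pct": 100 * sum(c.task_completed for c in conversations) / len(conversations),
        "avg_sentiment": mean(c.sentiment for c in conversations),
        "avg_turns": mean(c.turns for c in conversations),
    }


def low_sentiment(conversations: list[Conversation], threshold: float = 0.0) -> list[int]:
    """Indices of conversations whose transcripts deserve a manual review."""
    return [i for i, c in enumerate(conversations) if c.sentiment < threshold]


batch = [
    Conversation(turns=6, task_completed=True, correct_responses=5, total_responses=5, sentiment=0.6),
    Conversation(turns=12, task_completed=False, correct_responses=7, total_responses=10, sentiment=-0.4),
    Conversation(turns=8, task_completed=True, correct_responses=8, total_responses=9, sentiment=0.2),
]
print(summarize(batch))
print(low_sentiment(batch))  # [1]
```

Running the same summary before and after a single targeted prompt change gives a rough check for regressions, and the flagged indices point at the transcripts worth reading first.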
Frequently asked questions
Why are guardrails so important for voice agents?
Voice interactions tend to be more free-form and unpredictable than text. Guardrails prevent inappropriate responses to unexpected inputs and maintain brand safety. They’re essential for voice agents that represent organizations or provide sensitive advice.
Can I update the prompt after deployment?
Yes. The system prompt can be modified at any time to adjust behavior. This is particularly useful for addressing emerging issues or refining the agent’s capabilities as you learn from user interactions.
How do I handle users with different speaking styles or accents?
Design your prompt with simple, clear language patterns and instruct the agent to ask for clarification when unsure. Avoid idioms and region-specific expressions that might confuse speech-to-text (STT) systems processing diverse accents.
How can I make the AI sound more conversational?
Include speech markers (brief affirmations, filler words) in your system prompt. Specify that the AI can use interjections like “Hmm,” incorporate thoughtful pauses, and employ natural speech patterns.
Does a longer system prompt guarantee better results?
No. Focus on quality over quantity. Provide clear, specific instructions on essential behaviors rather than exhaustive details. Test different prompt lengths to find the optimal balance for your specific use case.
How do I balance consistency with adaptability?
Define core personality traits and guardrails firmly while allowing flexibility in tone and verbosity based on the user’s communication style. This creates a recognizable character that can still respond naturally to different situations.