How to Prompt a Conversational AI System

Properly prompting a large language model for use in a conversational AI system can make all the difference.

Today, the LLM has emerged as the beating heart of conversational AI systems. LLMs enable conversational AI, originally built around sprawling phone trees, to offer dynamic functionality and human-like experiences. However, LLMs aren’t a silver-bullet upgrade; they require specialized prompting because they aren’t tuned for human speech by default.

Developers make a common mistake when prompting LLMs for conversational AI: recycling the same playbook that was used to train human employees. This strategy, despite sounding straightforward, is rarely fruitful. LLMs make different assumptions than typical humans, and their default tone and scope aren’t conducive to verbal interactions.

Today, we’ll uncover what we know about how to prompt LLMs to build successful conversational AI systems. You can also read a more comprehensive and technical guide to this topic in the ElevenLabs Developer docs.

The Old System

Prior to LLMs, conversational AI systems leveraged sprawling logic trees, routing requests down branches based on callers’ verbal inputs. This setup was popular among customer service numbers (e.g., airline hotlines) and payment systems (e.g., credit card phone services).

These older systems were sluggish, felt robotic, and permitted only a narrow set of inputs. There’s a fair chance you experienced this yourself, bluntly yelling “YES” into a phone to answer a prompt. This poor experience led many users to try to “beat the system” and force a conversation with a live human agent.

However, there was an advantage to these phone trees — they were contained. There were only so many paths a conversation could take, and developers could easily implement guardrails to disregard disallowed inputs. This constraint underpins the pros and cons of LLMs: They expand dramatically past the limited nature of phone trees, but they are also unpredictable, opening a Pandora’s box of pitfalls — such as making impossible promises, getting angry with customers, or breaching sensitive data.

The Default Gaps

If LLMs are simply prompted with a handbook originally written for human employees, they’ll have mediocre success due to a few core gaps. Understanding these gaps will help you design prompts that address them:

Tone Mismatch

LLMs are trained via reinforcement learning from human feedback, which incentivizes them to produce structured, written-style responses. Specifically, LLM responses tend to be verbose and filled with bullets, call-out blocks, and headlines.

However, in the context of conversational AI, LLMs need to emulate the concise, unformatted nature of verbal interactions.

Assumption Gaps

LLMs have a tendency to fill any unknowns with inferred knowledge instead of asking clarifying questions. This can lead them to make incorrect assumptions that mislead users or cause costly mistakes (e.g., promised refunds). Later, we’ll see how a knowledge base and guardrails can ground the LLM, keeping it from making incorrect promises or executing disallowed actions.

Latency

LLMs can programmatically invoke function calls, gathering and writing data on behalf of humans. This is one of LLMs’ biggest advantages: the lengthy scripts that taught human call agents to “buy time” during long manual tasks are largely obsolete. However, function calls aren’t instantaneous either, so LLMs need to accurately forewarn the user whenever a delay is expected (e.g., “give me a moment to examine your case”).
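
To make this concrete, here’s a minimal sketch of the pattern: speak a short “buying time” line before a slow tool call so the caller never sits in dead air. The `speak` and `lookup_case` helpers are hypothetical stand-ins for your text-to-speech output and backend call.

```python
import time

# A minimal sketch; `speak` and `lookup_case` are hypothetical helpers
# standing in for your TTS output and backend lookup.
def handle_case_lookup(case_id: str, speak, lookup_case):
    speak("Give me a moment to examine your case.")  # forewarn the delay
    start = time.monotonic()
    details = lookup_case(case_id)  # may take several seconds
    if time.monotonic() - start > 5:
        speak("Thanks for your patience.")  # acknowledge an unusually long wait
    return details
```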

Configurations

Personality

LLMs are fairly good at adapting their tone to match a style. An LLM can be configured to sound friendly, humorous, concise, formal, or a combination of styles. This is an important input when prompting an LLM.

For example, developers of a customer service conversational AI application designed to support disgruntled airline customers might use a prompt such as:

You are a friendly customer service agent who speaks in concise, clear, empathetic sentences.
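
As a minimal sketch of where that instruction lives, here’s how the personality might be set as the system message, assuming an OpenAI-style chat completions client and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a friendly customer service agent who speaks in concise, "
    "clear, empathetic sentences."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "My flight was canceled and I'm furious."},
    ],
)
print(response.choices[0].message.content)
```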

Format

LLMs need to receive explicit directions on how to respond. To ensure they don’t include additional padding text, LLMs should be given a structure that encapsulates the response passed to the user.

For example, LLMs might be prompted to:

Respond exclusively with the string that should be read aloud to the user

This scaffolding encourages the LLM to provide a response designed to be spoken aloud.

However, LLMs can sometimes stumble on things that might not intuitively differ from written content. A common example is numbers — an LLM might print out a zipcode like 10023, which will cause the text-to-speech model to say, “ten thousand and twenty-three.” Instead, the LLM should be explicitly prompted to say the numbers individually, signposting what the numbers signify, e.g., “The zipcode is one zero zero two three.”
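
One way to enforce this beyond prompting is a small post-processing pass over the LLM’s output before it reaches the text-to-speech model. This is a hypothetical helper, not part of any particular TTS API:

```python
import re

DIGIT_WORDS = "zero one two three four five six seven eight nine".split()

def spell_out_digits(text: str) -> str:
    """Rewrite runs of digits as space-separated words for TTS."""
    def expand(match: re.Match) -> str:
        return " ".join(DIGIT_WORDS[int(d)] for d in match.group())
    return re.sub(r"\d+", expand, text)

print(spell_out_digits("The zipcode is 10023."))
# The zipcode is one zero zero two three.
```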

Temperature

Temperature is a critical parameter when configuring LLMs for conversational AI. A lower temperature produces more focused, deterministic responses ideal for task-oriented conversations, while higher temperatures create more creative, varied responses.

A low temperature is ideal for conversational AI systems that might prefer consistent responses (e.g., a customer service line for refunds). Meanwhile, for systems that want to provide a more engaging and realistic feel to customers (e.g., a digital coach), a high temperature is better:

Low Temperature: Thank you for calling ElevenLabs support. How can I help you?
High Temperature: Hey hey! You've landed at ElevenLabs support—ready to tackle your tech troubles! What's on your mind?
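
In practice, temperature is just a parameter on the completion call. A minimal sketch, again assuming an OpenAI-style client and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system", "content": "You are an ElevenLabs support agent."},
    {"role": "user", "content": "Hi, I need some help."},
]

focused = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0.2,      # consistent, task-oriented phrasing
    messages=messages,
)

expressive = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=1.0,      # livelier, more varied phrasing
    messages=messages,
)
```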

Knowledge Bases

For conversational AI systems that tap into larger reservoirs of knowledge, a knowledge base should be leveraged to minimize the prompt’s length. In production, this is typically accomplished via a vector database (such as Pinecone or Elasticsearch) or the LLM provider’s direct knowledge store.

Generally speaking, knowledge bases are essential for grounding LLM responses in factual, approved information. When building a conversational AI system, you should provide the LLM with a comprehensive knowledge base that contains accurate, up-to-date information about products, services, policies, and procedures. This prevents the LLM from hallucinating or making up information while encouraging consistent and reliable responses across conversations.
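
The retrieval flow itself is simple. In this minimal sketch, `embed` and `vector_db` are hypothetical stand-ins for your embedding model and vector database; only the shape of the flow matters:

```python
def build_grounded_prompt(question: str, vector_db, embed) -> str:
    query_vector = embed(question)                  # embed the user's question
    passages = vector_db.search(query_vector, k=3)  # fetch top matching snippets
    context = "\n".join(passages)
    return (
        "Answer using ONLY the approved information below. "
        "If the answer isn't covered, say you'll escalate to a human.\n\n"
        f"Approved information:\n{context}\n\n"
        f"Customer question: {question}"
    )
```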

Process

Because LLMs often invoke functions on behalf of the user, they also need to know which inputs are explicitly required. For example, if an LLM’s job is to help a user schedule a haircut appointment, it’ll need to ensure that it has:

  1. The user’s name
  2. The desired date and time
  3. The user’s address
  4. The user’s service preference

A naive implementation might result in the LLM asking for all of the information in one turn of the conversation. This is perfectly fine as text, but in a conversation, it can be overwhelming:

Support Agent: Could you please provide me with your name, your address, when you'd like your service to be, and what service you'd like?
Customer: My name is Mathew and anytime Wednesday afternoon works. What else did you ask for?

Because information is usually gathered incrementally via conversation, LLMs must be encouraged to fetch this information in a piecemeal fashion. The result is a much more conversational experience:

Support Agent: Could you please provide me with your name?
Customer: My name is Mathew Pregasen.
Support Agent: Thanks Mathew. When would you like to make an appointment?
Customer: Anytime on Wednesday afternoon works fine.
Support Agent: Great. Now can I get your address to find the nearest location?
Customer: 555 West Main Street
Support Agent: Perfect. Now what service are you looking for?
Customer: I'm looking for a haircut and if you could also do my beard that would be great!
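
Under the hood, this amounts to tracking which required fields are still missing and asking for exactly one per turn. A minimal sketch, with illustrative field names:

```python
REQUIRED_FIELDS = {
    "name": "Could you please provide me with your name?",
    "datetime": "When would you like to make an appointment?",
    "address": "Can I get your address to find the nearest location?",
    "service": "What service are you looking for?",
}

def next_question(collected: dict) -> str | None:
    """Return the prompt for the first field not yet collected."""
    for field, question in REQUIRED_FIELDS.items():
        if field not in collected:
            return question
    return None  # every slot is filled; ready to book
```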

Guardrails

Permissions

When building distributed systems, you assume that your server will crash at some point. Likewise, when building AI systems, you should assume that your LLM will make a mistake at some point. To minimize the blast radius of that mistake, you should give these systems the lowest permissions necessary for the job at hand. Below are some examples of ways you can do this, with a short code sketch after the list:

  • Correctly set read/write permissions: If the LLM only needs to read information from a data source, make sure it’s given a read-only endpoint.
  • Limit access to API endpoints: If the LLM only needs access to certain endpoints, make sure it can’t access any others.
  • Human-in-the-loop escalations: If a high-risk action needs to be taken, consider a human-in-the-loop workflow that requires “manager approval” before executing the action.
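
In code, least privilege often comes down to which tools you register at all. A minimal sketch, with illustrative tool names: only read-only tools are exposed, so a refund tool simply doesn’t exist from the agent’s perspective.

```python
def get_order_status(order_id: str) -> str:
    # Read-only lookup; in practice this hits a read-only endpoint.
    return f"Order {order_id}: shipped"

# Only the tools this agent actually needs are registered.
ALLOWED_TOOLS = {"get_order_status": get_order_status}

def call_tool(name: str, **kwargs):
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{name}' is not permitted for this agent")
    return ALLOWED_TOOLS[name](**kwargs)
```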

Validation and Verification

When creating conversational AI systems that take actions through tool use, it’s helpful to build in a validation-and-verification process to ensure you’re collecting the correct information from users. Today, when you talk to a human agent, they’ll repeat back any critical information you give them to verify they heard it correctly and that you didn’t misspeak. LLMs benefit from a similar level of error-checking:

Support Agent: Great. Now can I get your address to find the nearest location?
Customer: 555 West Main Street
Support Agent: I got five five five west main street. Did I miss anything?

For validation, any information received from the customer should be checked against the typical structure of that information. Does the phone number have the correct number of digits? Does the age given by the customer fall within a reasonable range? Did the customer provide a valid address?

Support Agent: What would a good callback number be for you?
Customer: 317-798-97289
Support Agent: I think I might have misheard you. I heard 11 numbers. Would you mind repeating that?

Depending on your use case, you can verify all information received or only information that failed verification. Additionally, you can decide to verify each piece of information as it comes in or verify everything at the end.
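
A minimal validation sketch, with illustrative rules (US-style ten-digit phone numbers): strip non-digits, check the count, and hand back a correction prompt on failure:

```python
import re

def validate_phone(raw: str) -> tuple[bool, str]:
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return False, (
            f"I think I might have misheard you. I heard {len(digits)} "
            "numbers. Would you mind repeating that?"
        )
    return True, digits

ok, result = validate_phone("317-798-97289")
print(ok, result)  # False, plus a prompt asking the caller to repeat
```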

A Closing Thought

Successfully prompting a conversational AI system involves balancing the right configurations and guardrails to generate an experience that emulates talking to a human with heightened efficiency. The process isn’t as trivial as using old training materials to prompt an LLM; instead, LLMs are tools that need specialized structure and strategy to create predictable, effective results.
