Add voice to chat agent

Add voice modality to your own hosted text agent server or LLM.
Voice Engine will soon be even easier to integrate via the server-side SDKs.

Overview

Voice Engine lets you easily add full voice capabilities to any chat agent, even those built outside the ElevenAgents platform. The voice engine contains everything needed to add voice conversations to your agent, including turn-taking, interruption detection, and optimizations for the user's language.

To use your custom agent, it must expose a text-based API that matches one of the following OpenAI-compatible request/response structures: the Chat Completions API or the Responses API.

The Responses API is OpenAI's newer API format and supports additional features. Both formats are fully supported for custom LLM integration.

Youโ€™ll learn how to:

  • Store your OpenAI API key in ElevenLabs
  • Host a server that replicates OpenAIโ€™s Chat Completions or Responses endpoint
  • Direct ElevenLabs to your custom endpoint
  • Pass extra parameters to your LLM as needed

Configure Voice Engine

How-to guide · Assumes you have completed the Agents quickstart.

The first step is to create an empty agent to hold the Voice Engine configuration; follow the quickstart guide linked above to do so.

Next, set up a compatible server endpoint using OpenAI's style. You can implement either the Chat Completions API (/v1/chat/completions) or the Responses API (/v1/responses).

Both endpoints must return responses in SSE (Server-Sent Events) format with Content-Type: text/event-stream.

The Chat Completions API uses the /v1/chat/completions endpoint.

Each chunk must be formatted as data: {json}\n\n and the stream must end with data: [DONE]\n\n.
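Concretely, a chunk and the terminating sentinel can be serialized as shown below. This is a minimal sketch; the chunk payload is a hypothetical delta in the Chat Completions streaming shape, and `format_sse_chunk` is an illustrative helper, not part of any SDK.

```python
import json

def format_sse_chunk(payload: dict) -> str:
    # Each SSE event is "data: " followed by the JSON payload and a blank line.
    return f"data: {json.dumps(payload)}\n\n"

# A hypothetical delta chunk in Chat Completions streaming shape.
chunk = {
    "object": "chat.completion.chunk",
    "choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": None}],
}
event = format_sse_chunk(chunk)

# The stream must terminate with the [DONE] sentinel.
done = "data: [DONE]\n\n"
```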

Hereโ€™s an example server implementation:

```python
import json
import logging
import os
from typing import List, Optional

import fastapi
import uvicorn
from dotenv import load_dotenv
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.model_dump(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    chat_completion_stream = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_stream:
                # Convert the ChatCompletionChunk to a dictionary before JSON serialization
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```

Run the code to start your Voice Engine server.

Setting up a public URL for your server

To make your server accessible, create a public URL using a tunneling tool like ngrok:

$ ngrok http --url=<your-url>.ngrok.app 8013

Configuring ElevenLabs Voice Engine

Next, update your agent settings in the ElevenLabs dashboard to point to your custom LLM server.

Point your server URL to the ngrok endpoint and set "Limit token usage" to 5000.

You can now start interacting with your agent, backed by your own Voice Engine server.
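To sanity-check the stream your endpoint emits, you can parse the SSE events yourself and reassemble the assistant text. The sketch below runs against a canned stream; `collect_text` is an illustrative helper, not part of any SDK.

```python
import json

def collect_text(sse_stream: str) -> str:
    """Reassemble assistant text from Chat Completions SSE events."""
    parts = []
    for line in sse_stream.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

# A canned two-chunk stream for illustration.
stream = (
    'data: {"choices": [{"delta": {"content": "Hello, "}, "index": 0}]}\n\n'
    'data: {"choices": [{"delta": {"content": "world!"}, "index": 0}]}\n\n'
    "data: [DONE]\n\n"
)
```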

Optimizing for slow processing LLMs

If your custom LLM has slow processing times (for example, due to agentic reasoning or pre-processing requirements), you can improve the conversational flow by implementing buffer words in your streaming responses. This technique helps maintain natural speech prosody while your LLM generates the complete response.

Buffer words

When your LLM needs more time to process the full response, return an initial response ending with "... " (an ellipsis followed by a space). This lets the Text to Speech system maintain natural flow while keeping the conversation feeling dynamic, and creates a natural pause that flows into the subsequent content the LLM can reason longer about. The trailing space is crucial: without it, the subsequent content is appended directly to the "...", which can cause audio distortions.

Implementation

Hereโ€™s how to modify your custom LLM server to implement buffer words:

```python
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.model_dump(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    async def event_stream():
        try:
            # Send an initial buffer chunk while the real response is generated
            initial_chunk = {
                "id": "chatcmpl-buffer",
                "object": "chat.completion.chunk",
                "created": 1234567890,
                "model": request.model,
                "choices": [{
                    "delta": {"content": "Let me think about that... "},
                    "index": 0,
                    "finish_reason": None
                }]
            }
            yield f"data: {json.dumps(initial_chunk)}\n\n"

            # Process the actual LLM response
            chat_completion_stream = await oai_client.chat.completions.create(**oai_request)

            async for chunk in chat_completion_stream:
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"

        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

System tools integration

Your custom LLM can trigger system tools to control conversation flow and state. These tools are automatically included in the tools parameter of your chat completion requests when configured in your agent.

How system tools work

  1. LLM Decision: Your custom LLM decides when to call these tools based on conversation context
  2. Tool Response: The LLM responds with function calls in standard OpenAI format
  3. Backend Processing: ElevenLabs processes the tool calls and updates conversation state

For more information on system tools, please refer to the system tools guide.
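Because the tools arrive in the standard OpenAI shape, your server can forward them to your model unchanged. As a small illustration, the hypothetical helper below inspects an incoming request body and reports which function tools are present:

```python
def system_tool_names(request_body: dict) -> set:
    """Return the names of function tools included in an incoming request."""
    return {
        tool["function"]["name"]
        for tool in request_body.get("tools", [])
        if tool.get("type") == "function"
    }

# A minimal request body carrying one system tool, for illustration.
body = {
    "tools": [
        {"type": "function", "function": {"name": "end_call", "parameters": {}}}
    ]
}
```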

Available system tools

Purpose: Automatically terminate conversations when appropriate conditions are met.

Trigger conditions: The LLM should call this tool when:

  • The main task has been completed and user is satisfied
  • The conversation reached natural conclusion with mutual agreement
  • The user explicitly indicates they want to end the conversation

Parameters:

  • reason (string, required): The reason for ending the call
  • message (string, optional): A farewell message to send to the user before ending the call

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "end_call",
    "arguments": "{\"reason\": \"Task completed successfully\", \"message\": \"Thank you for using our service. Have a great day!\"}"
  }
}
```

Implementation: Configure as a system tool in your agent settings. The LLM will receive detailed instructions about when to call this function.

Learn more about the end call tool.

Purpose: Automatically switch to the userโ€™s detected language during conversations.

Trigger conditions: The LLM should call this tool when:

  • User speaks in a different language than the current conversation language
  • User explicitly requests to switch languages
  • Multi-language support is needed for the conversation

Parameters:

  • reason (string, required): The reason for the language switch
  • language (string, required): The language code to switch to (must be in supported languages list)

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "language_detection",
    "arguments": "{\"reason\": \"User requested Spanish\", \"language\": \"es\"}"
  }
}
```

Implementation: Configure supported languages in agent settings and add the language detection system tool. The agent will automatically switch voice and responses to match detected languages.

Learn more about the language detection tool.

Purpose: Allow the agent to pause and wait for user input without speaking.

Trigger conditions: The LLM should call this tool when:

  • User indicates they need a moment ("Give me a second", "Let me think")
  • User requests pause in conversation flow
  • Agent detects user needs time to process information

Parameters:

  • reason (string, optional): Free-form reason explaining why the pause is needed

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "skip_turn",
    "arguments": "{\"reason\": \"User requested time to think\"}"
  }
}
```

Implementation: No additional configuration needed. The tool simply signals the agent to remain silent until the user speaks again.

Learn more about the skip turn tool.

Example Request with System Tools

When system tools are configured, your custom LLM will receive requests that include the tools in the standard OpenAI format:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. You have access to system tools for managing conversations."
    },
    {
      "role": "user",
      "content": "I think we're done here, thanks for your help!"
    }
  ],
  "model": "your-custom-model",
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "end_call",
        "description": "Call this function to end the current conversation when the main task has been completed...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "The reason for the tool call."
            },
            "message": {
              "type": "string",
              "description": "A farewell message to send to the user right before ending the call."
            }
          },
          "required": ["reason"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "language_detection",
        "description": "Change the conversation language when the user expresses a language preference explicitly...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "The reason for the tool call."
            },
            "language": {
              "type": "string",
              "description": "The language to switch to. Must be one of the language codes in the tool description."
            }
          },
          "required": ["reason", "language"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "skip_turn",
        "description": "Skip a turn when the user explicitly indicates they need a moment to think...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "Optional free-form reason explaining why the pause is needed."
            }
          },
          "required": []
        }
      }
    }
  ]
}
```

Your custom LLM must support function calling to use system tools. Ensure your model can generate proper function call responses in OpenAI format.
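When your model decides to invoke a system tool, it should stream the call as a `tool_calls` delta in the standard Chat Completions chunk shape. The sketch below builds one such chunk; the `id` values are placeholders and `tool_call_chunk` is an illustrative helper, not part of any SDK.

```python
import json

def tool_call_chunk(name: str, arguments: dict) -> dict:
    """Build a single streaming chunk carrying a complete tool call."""
    return {
        "id": "chatcmpl-placeholder",
        "object": "chat.completion.chunk",
        "choices": [{
            "index": 0,
            "delta": {
                "tool_calls": [{
                    "index": 0,
                    "id": "call_placeholder",
                    "type": "function",
                    # Arguments are serialized as a JSON string, per the OpenAI format.
                    "function": {"name": name, "arguments": json.dumps(arguments)},
                }]
            },
            "finish_reason": None,
        }],
    }

chunk = tool_call_chunk("end_call", {"reason": "Task completed successfully"})
```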

Additional Features

You may pass additional parameters to your custom LLM implementation.

Step 1: Define the Extra Parameters

Create an object containing your custom parameters:

```python
from elevenlabs.conversational_ai.conversation import Conversation, ConversationConfig

extra_body_for_convai = {
    "UUID": "123e4567-e89b-12d3-a456-426614174000",
    "parameter-1": "value-1",
    "parameter-2": "value-2",
}

config = ConversationConfig(
    extra_body=extra_body_for_convai,
)
```
Step 2: Update the LLM Implementation

Modify your custom LLM code to handle the additional parameters:

```python
import json
import logging
import os
from typing import List, Optional

import fastapi
import uvicorn
from dotenv import load_dotenv
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None
    elevenlabs_extra_body: Optional[dict] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.model_dump(exclude_none=True)
    # Log the incoming request, including any elevenlabs_extra_body
    print(oai_request)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    # Strip the extra body before forwarding the request to OpenAI
    if "elevenlabs_extra_body" in oai_request:
        oai_request.pop("elevenlabs_extra_body")

    chat_completion_stream = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_stream:
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```

Example request

With this setup, your LLM will receive requests in the following format:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "\n <Redacted>"
    },
    {
      "role": "assistant",
      "content": "Hey I'm currently unavailable."
    },
    {
      "role": "user",
      "content": "Hey, who are you?"
    }
  ],
  "model": "gpt-4o",
  "temperature": 0.5,
  "max_tokens": 5000,
  "stream": true,
  "elevenlabs_extra_body": {
    "UUID": "123e4567-e89b-12d3-a456-426614174000",
    "parameter-1": "value-1",
    "parameter-2": "value-2"
  }
}
```