Integrate your own model

Connect an agent to your own LLM or host your own server.

Custom LLM allows you to connect your conversations to your own LLM via an external endpoint. ElevenLabs also supports natively integrated LLMs.

Custom LLMs let you bring your own OpenAI API key or run an entirely custom LLM server.

Overview

By default, we use our own internal credentials for popular LLM providers such as OpenAI. To use a custom LLM server, it must align with OpenAI's create chat completion request/response structure.
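For example, when `stream` is true, your endpoint should reply with server-sent events carrying chat completion chunks, exactly as OpenAI would. A rough sketch of the expected wire format (values are illustrative):

```text
POST /v1/chat/completions
{"model": "...", "messages": [...], "stream": true}

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": null}]}

data: [DONE]
```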

The following guides cover both use cases:

  1. Bring your own OpenAI key: Use your own OpenAI API key with our platform.
  2. Custom LLM server: Host and connect your own LLM server implementation.

You’ll learn how to:

  • Store your OpenAI API key in ElevenLabs
  • Host a server that replicates OpenAI’s create chat completion endpoint
  • Direct ElevenLabs to your custom endpoint
  • Pass extra parameters to your LLM as needed

Using your own OpenAI key

To integrate a custom OpenAI key, create a secret containing your OPENAI_API_KEY:

  1. Navigate to the “Secrets” page and select “Add Secret”.
  2. Choose “Custom LLM” from the dropdown menu.
  3. Enter the URL, your model, and the secret you created.
  4. Set “Custom LLM extra body” to true.

Custom LLM Server

To bring your own LLM server, set up an endpoint that is compatible with OpenAI’s create chat completion request and response format.

Here’s an example server implementation using FastAPI and OpenAI’s Python SDK:

```python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    # ElevenLabs sends user_id; OpenAI expects the field to be named user
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_coroutine:
                # Convert the ChatCompletionChunk to a dictionary before JSON serialization
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```

Run this code, or your own server implementation.
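Before exposing the server publicly, one way to smoke-test it locally is to point the official OpenAI SDK at it; a quick sketch (the api_key value is arbitrary, since this server reads its own key from the environment):

```python
from openai import OpenAI

# Point the SDK at the local server started above.
client = OpenAI(base_url="http://localhost:8013/v1", api_key="unused")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```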

Setting Up a Public URL for Your Server

To make your server accessible, create a public URL using a tunneling tool like ngrok:

```bash
ngrok http --url=<your-url>.ngrok.app 8013
```

Configuring the ElevenLabs custom LLM

Now let’s make the changes in ElevenLabs.

Set your agent’s server URL to the ngrok endpoint, set “Limit token usage” to 5000, and set “Custom LLM extra body” to true.

You can now start interacting with Conversational AI using your own LLM server.

Optimizing for slow processing LLMs

If your custom LLM has slow processing times (perhaps due to agentic reasoning or pre-processing requirements), you can improve the conversational flow by implementing buffer words in your streaming responses. This technique helps maintain natural speech prosody while your LLM generates the complete response.

Buffer words

When your LLM needs more time to process the full response, return an initial response ending with "... " (an ellipsis followed by a space). This allows the Text to Speech system to maintain natural flow while keeping the conversation feeling dynamic, and it creates natural pauses that lead smoothly into the content the LLM is still reasoning about. The trailing space is crucial: it ensures that subsequent content is not appended directly to the "...", which can lead to audio distortions.

Implementation

Here’s how to modify your custom LLM server to implement buffer words:

```python
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    async def event_stream():
        try:
            # Send initial buffer chunk while processing
            initial_chunk = {
                "id": "chatcmpl-buffer",
                "object": "chat.completion.chunk",
                "created": 1234567890,
                "model": request.model,
                "choices": [{
                    "delta": {"content": "Let me think about that... "},
                    "index": 0,
                    "finish_reason": None
                }]
            }
            yield f"data: {json.dumps(initial_chunk)}\n\n"

            # Process the actual LLM response
            chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)

            async for chunk in chat_completion_coroutine:
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"

        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

System tools integration

Your custom LLM can trigger system tools to control conversation flow and state. These tools are automatically included in the tools parameter of your chat completion requests when configured in your agent.

How system tools work

  1. LLM Decision: Your custom LLM decides when to call these tools based on conversation context
  2. Tool Response: The LLM responds with function calls in standard OpenAI format
  3. Backend Processing: ElevenLabs processes the tool calls and updates conversation state

For more information on system tools, please see our guide.
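Note that the example servers above declare only a fixed set of request fields, so Pydantic would silently drop a tools array. To forward system tools to your model, declare them on the request model; a minimal sketch (field shapes follow the OpenAI schema):

```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel

class ChatCompletionRequest(BaseModel):
    messages: List[Dict[str, Any]]  # dicts, so tool/assistant messages also validate
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None
    # System tools configured on the agent arrive here in OpenAI format;
    # pass them through unchanged and let the model decide when to call them.
    tools: Optional[List[Dict[str, Any]]] = None
```

Any tool calls the model streams back can be forwarded as-is; ElevenLabs executes them and updates the conversation state.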

Available system tools

End call

Purpose: Automatically terminate conversations when appropriate conditions are met.

Trigger conditions: The LLM should call this tool when:

  • The main task has been completed and the user is satisfied
  • The conversation has reached a natural conclusion with mutual agreement
  • The user explicitly indicates they want to end the conversation

Parameters:

  • reason (string, required): The reason for ending the call
  • message (string, optional): A farewell message to send to the user before ending the call

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "end_call",
    "arguments": "{\"reason\": \"Task completed successfully\", \"message\": \"Thank you for using our service. Have a great day!\"}"
  }
}
```

Implementation: Configure as a system tool in your agent settings. The LLM will receive detailed instructions about when to call this function.

Learn more: End call tool

Language detection

Purpose: Automatically switch to the user’s detected language during conversations.

Trigger conditions: The LLM should call this tool when:

  • User speaks in a different language than the current conversation language
  • User explicitly requests to switch languages
  • Multi-language support is needed for the conversation

Parameters:

  • reason (string, required): The reason for the language switch
  • language (string, required): The language code to switch to (must be in supported languages list)

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "language_detection",
    "arguments": "{\"reason\": \"User requested Spanish\", \"language\": \"es\"}"
  }
}
```

Implementation: Configure supported languages in agent settings and add the language detection system tool. The agent will automatically switch voice and responses to match detected languages.

Learn more: Language detection tool

Agent transfer

Purpose: Transfer conversations between specialized AI agents based on user needs.

Trigger conditions: The LLM should call this tool when:

  • User request requires specialized knowledge or different agent capabilities
  • Current agent cannot adequately handle the query
  • Conversation flow indicates the need for a different agent type

Parameters:

  • reason (string, optional): The reason for the agent transfer
  • agent_number (integer, required): Zero-indexed number of the agent to transfer to (based on configured transfer rules)

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "transfer_to_agent",
    "arguments": "{\"reason\": \"User needs billing support\", \"agent_number\": 0}"
  }
}
```

Implementation: Define transfer rules mapping conditions to specific agent IDs. Configure which agents the current agent can transfer to. Agents are referenced by zero-indexed numbers in the transfer configuration.

Learn more: Agent transfer tool

Transfer to human

Purpose: Seamlessly hand off conversations to human operators when AI assistance is insufficient.

Trigger conditions: The LLM should call this tool when:

  • Complex issues arise that require human judgment
  • The user explicitly requests human assistance
  • The AI reaches the limits of its capability for the specific request
  • Escalation protocols are triggered

Parameters:

  • reason (string, optional): The reason for the transfer
  • transfer_number (string, required): The phone number to transfer to (must match configured numbers)
  • client_message (string, required): Message read to the client while waiting for transfer
  • agent_message (string, required): Message for the human operator receiving the call

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "transfer_to_number",
    "arguments": "{\"reason\": \"Complex billing issue\", \"transfer_number\": \"+15551234567\", \"client_message\": \"I'm transferring you to a billing specialist who can help with your account.\", \"agent_message\": \"Customer has a complex billing dispute about order #12345 from last month.\"}"
  }
}
```

Implementation: Configure transfer phone numbers and conditions. Define messages for both customer and receiving human operator. Works with both Twilio and SIP trunking.

Learn more: Transfer to human tool

Skip turn

Purpose: Allow the agent to pause and wait for user input without speaking.

Trigger conditions: The LLM should call this tool when:

  • User indicates they need a moment (“Give me a second”, “Let me think”)
  • User requests pause in conversation flow
  • Agent detects user needs time to process information

Parameters:

  • reason (string, optional): Free-form reason explaining why the pause is needed

Function call format:

```json
{
  "type": "function",
  "function": {
    "name": "skip_turn",
    "arguments": "{\"reason\": \"User requested time to think\"}"
  }
}
```

Implementation: No additional configuration needed. The tool simply signals the agent to remain silent until the user speaks again.

Learn more: Skip turn tool

Example Request with System Tools

When system tools are configured, your custom LLM will receive requests that include the tools in the standard OpenAI format:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. You have access to system tools for managing conversations."
    },
    {
      "role": "user",
      "content": "I think we're done here, thanks for your help!"
    }
  ],
  "model": "your-custom-model",
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "end_call",
        "description": "Call this function to end the current conversation when the main task has been completed...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "The reason for the tool call."
            },
            "message": {
              "type": "string",
              "description": "A farewell message to send to the user right before ending the call."
            }
          },
          "required": ["reason"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "language_detection",
        "description": "Change the conversation language when the user expresses a language preference explicitly...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "The reason for the tool call."
            },
            "language": {
              "type": "string",
              "description": "The language to switch to. Must be one of the language codes in the tool description."
            }
          },
          "required": ["reason", "language"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "skip_turn",
        "description": "Skip a turn when the user explicitly indicates they need a moment to think...",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "Optional free-form reason explaining why the pause is needed."
            }
          },
          "required": []
        }
      }
    }
  ]
}
```

Your custom LLM must support function calling to use system tools. Ensure your model can generate proper function call responses in OpenAI format.
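If your server generates tool calls itself, rather than proxying a model with native function calling, it can stream them in the standard chunk format. A sketch with a hypothetical tool_call_chunk helper:

```python
import json
import time
import uuid

def tool_call_chunk(model: str, name: str, arguments: dict) -> str:
    """Build one SSE line carrying a complete tool call in OpenAI chunk format."""
    chunk = {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {
                "tool_calls": [{
                    "index": 0,
                    "id": f"call_{uuid.uuid4().hex[:8]}",
                    "type": "function",
                    "function": {"name": name, "arguments": json.dumps(arguments)},
                }]
            },
            "finish_reason": None,
        }],
    }
    return f"data: {json.dumps(chunk)}\n\n"

# Inside event_stream(), you might emit the call and then close the stream:
#   yield tool_call_chunk(request.model, "end_call",
#                         {"reason": "Task completed", "message": "Goodbye!"})
#   # ...followed by a chunk with "finish_reason": "tool_calls" and "data: [DONE]".
```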

Additional Features

You may pass additional parameters to your custom LLM implementation.

1. Define the Extra Parameters

Create an object containing your custom parameters:

```python
from elevenlabs.conversational_ai.conversation import Conversation, ConversationConfig

extra_body_for_convai = {
    "UUID": "123e4567-e89b-12d3-a456-426614174000",
    "parameter-1": "value-1",
    "parameter-2": "value-2",
}

config = ConversationConfig(
    extra_body=extra_body_for_convai,
)
```
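To start a session that sends this extra body, pass the config when constructing a Conversation. A sketch based on the Python SDK quickstart pattern (the agent ID and API key environment variables are placeholders; check the SDK for exact parameters):

```python
import os
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

conversation = Conversation(
    elevenlabs,
    agent_id=os.getenv("AGENT_ID"),           # your agent's ID
    requires_auth=True,                       # True for private agents
    audio_interface=DefaultAudioInterface(),  # microphone/speaker I/O
    config=config,                            # carries extra_body to your LLM
)
conversation.start_session()
```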
2. Update the LLM Implementation

Modify your custom LLM code to handle the additional parameters:

```python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from fastapi import Request
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None
    elevenlabs_extra_body: Optional[dict] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    print(oai_request)  # Inspect the incoming request, including extra parameters
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    # Strip the ElevenLabs-specific field before forwarding to OpenAI
    if "elevenlabs_extra_body" in oai_request:
        oai_request.pop("elevenlabs_extra_body")

    chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_coroutine:
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```

Example Request

With this setup, your LLM will receive requests in the following format:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "\n <Redacted>"
    },
    {
      "role": "assistant",
      "content": "Hey I'm currently unavailable."
    },
    {
      "role": "user",
      "content": "Hey, who are you?"
    }
  ],
  "model": "gpt-4o",
  "temperature": 0.5,
  "max_tokens": 5000,
  "stream": true,
  "elevenlabs_extra_body": {
    "UUID": "123e4567-e89b-12d3-a456-426614174000",
    "parameter-1": "value-1",
    "parameter-2": "value-2"
  }
}
```
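On the server side, you might read these values before stripping them from the request; a minimal sketch (conversation_uuid is a hypothetical variable name) that would slot into the handler above:

```python
# Inside create_chat_completion, before forwarding the request to OpenAI:
extra = oai_request.pop("elevenlabs_extra_body", None) or {}
conversation_uuid = extra.get("UUID")  # e.g., correlate logs or load user state
if conversation_uuid:
    logging.info("Handling request for conversation %s", conversation_uuid)
```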