Using Your Own OpenAI Key for LLM
To integrate a custom OpenAI key, create a secret containing your OPENAI_API_KEY:
Navigate to the “Secrets” page and select “Add Secret”
Choose “Custom LLM” from the dropdown menu.
Enter the URL, your model, and the secret you created.
Custom LLM Server
To bring a custom LLM server, set up a compatible server endpoint using OpenAI’s style, specifically targeting create_chat_completion.
Here’s an example server implementation using FastAPI and OpenAI’s Python SDK:
python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
raise ValueError("OPENAI_API_KEY not found in environment variables")
app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)
class Message(BaseModel):
role: str
content: str
class ChatCompletionRequest(BaseModel):
messages: List[Message]
model: str
temperature: Optional[float] = 0.7
max_tokens: Optional[int] = None
stream: Optional[bool] = False
user_id: Optional[str] = None
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
oai_request = request.dict(exclude_none=True)
if "user_id" in oai_request:
oai_request["user"] = oai_request.pop("user_id")
chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)
async def event_stream():
try:
async for chunk in chat_completion_coroutine:
yield f"data: {json.dumps(chunk)}\n\n"
yield "data: [DONE]\n\n"
except Exception as e:
logging.error("An error occurred: %s", str(e))
yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"
return StreamingResponse(event_stream(), media_type="text/event-stream")
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8013)
Run this code or your own server code.
Setting Up a Public URL for Your Server
To make your server accessible, create a public URL using a tunneling tool like ngrok:
ngrok http --url=<Your url>.ngrok.app 8013
Configuring Elevenlabs CustomLLM
Now let’s make the changes in Elevenlabs
Direct your server URL to ngrok endpoint, also setup “Limit token usage” to 5000.
You can start interacting with Conversational AI with your own LLM server