Which LLMs are supported?

Any LLM that produces text. The SDK has built-in stream extraction for OpenAI (Responses API and Chat Completions API), Anthropic Messages API, and Google Gemini API. For other providers, pass a plain string or an async iterable of string chunks.
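As a minimal sketch of the "async iterable of string chunks" option, the snippet below wraps a provider that has no built-in stream extraction. `callCustomProvider`, `toChunkStream`, and `collect` are hypothetical names used only for illustration, and the SDK call that would consume the stream is left out because it is not shown here.

```typescript
// Hypothetical stand-in for a provider without built-in stream extraction.
// It returns the whole completion as a single string.
async function callCustomProvider(prompt: string): Promise<string> {
  return `You said: ${prompt}`;
}

// Exposes the provider's reply as an async iterable of string chunks.
// Splitting on word boundaries is an arbitrary choice for this sketch.
async function* toChunkStream(prompt: string): AsyncGenerator<string> {
  const reply = await callCustomProvider(prompt);
  // Each chunk keeps its surrounding whitespace, so concatenating the
  // chunks reproduces the reply.
  for (const chunk of reply.match(/\s*\S+\s*/g) ?? []) {
    yield chunk;
  }
}

// Local helper for checking a stream: joins every chunk into one string.
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}
```

A provider that already streams can yield its own deltas from the generator instead of splitting a finished string.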

What is the difference between Speech Engine and ElevenAgents?

ElevenAgents is a fully-hosted platform where ElevenLabs provides the LLM, knowledge base, and tools. Speech Engine is for developers who want to bring their own LLM and control the conversation logic on their own server.

What server frameworks are supported?

In TypeScript, you can attach Speech Engine to any Node.js HTTP server (Express, Fastify, or plain http.createServer()), or run a standalone WebSocket server. In Python, the SDK provides a standalone server via engine.serve(), or you can integrate with FastAPI, Starlette, or any ASGI framework using engine.create_session().

Introducing Speech Engine

Turn your chat agent into a voice agent

Add human-like voice to your existing chat agent with a single prompt. We handle the voice. Your agent's LLM, RAG, and architecture all remain untouched.

Bring a voice layer to your existing stack

Speech Engine integrates on top of your existing stack. Nothing is rearchitected and your text-based agent remains untouched.

The full voice layer, in one integration

Speech Engine combines our leading speech, transcription, and voice orchestration models into a single pipeline - all custom-built to work best together.

Speech Engine components: Speech to Text, Turn Detection, Interrupt Detection, Text to Speech, Audio Orchestration

Elevate your chatbot with voice

Voice is the fastest and richest way to exchange information, making products and services more accessible for customers.

Seamless conversation flow

Our voice models are optimized for conversation, delivering ultra-low latency in real-world environments.

Turn-taking and interruption handling

Dedicated models handle overlapping speech and mid-sentence changes without custom logic on your end.

Global coverage across 70+ languages

Get expressive, human-like voices that support the full range of emotion across a broad range of languages.

A voice stack tested across millions of real conversations

Every component is built and optimized to work best together.

Speech to Text

Our transcription model is optimized for conversational accuracy and transcribes user speech at an ultra-low latency of 80 ms.

Text to Speech

Expressive, human-like voices across 70+ languages. Choose from our library of 11,000+ voices or create your own with Voice Cloning.

Turn detection

Distinguishes a user who has finished speaking from one who is only pausing, which controls exactly when the transcript is sent to your LLM.

Interruption handling

Monitors for user speech while the agent is talking. Stops playback and returns to listening instantly when the user cuts in.

Voice activity detection

Filters speech from background noise at the input level, so only clean audio reaches the transcription model.

Audio orchestration

Manages the full voice lifecycle - from capturing user audio to delivering the agent's spoken response.

Add Speech Engine to your agent with one prompt

Install with one command using our skill. The skill sets up everything you need so you can go from chat to voice in a single prompt.

Server SDK

Attach Speech Engine to your server. Receive transcripts, pass them to your LLM, and send the response back - all in a few lines.

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});
const engine = await elevenlabs.speechEngine.create({
  name: "My Speech Engine",
  speechEngine: {
    // Note we use the wss protocol instead of https
    wsUrl: "wss://abc123.ngrok.io/ws",
  },
});
console.log("Speech Engine ID:", engine.engineId);
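The comment in the example above notes that `wsUrl` uses the `wss` protocol rather than `https`. As a small sketch, the helper below derives that URL from a public HTTP(S) base URL, such as a tunnel address. `toWebSocketUrl` is a hypothetical name and not part of the SDK, and the `/ws` default path is an assumption taken from the example URL.

```typescript
// Hypothetical helper (not part of the SDK): derives the WebSocket URL for
// `wsUrl` from the public HTTP(S) base URL of your server.
function toWebSocketUrl(publicBaseUrl: string, path: string = "/ws"): string {
  const url = new URL(path, publicBaseUrl);
  if (url.protocol === "https:") {
    url.protocol = "wss:"; // TLS-protected HTTP maps to TLS-protected WebSocket
  } else if (url.protocol === "http:") {
    url.protocol = "ws:"; // plain HTTP maps to plain WebSocket
  } else if (url.protocol !== "wss:" && url.protocol !== "ws:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.toString();
}

// Example: the tunnel address from the snippet above.
console.log(toWebSocketUrl("https://abc123.ngrok.io"));
```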

ElevenLabs UI

Drop in pre-built UI components - agent orbs, waveforms, and chat widgets - or build your own on top of the same SDK.

Client SDK

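Start a conversation session from the browser or mobile app in three lines. The client integration is the same as for ElevenAgents, so upgrading later requires no changes. The example below is the server-side endpoint that issues the WebRTC token the client needs.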

import express from "express";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";

const app = express();
const elevenlabs = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });
const speechEngineId = "seng_8k3m9xr4hjnfg983brhmhkd98n6";

app.get("/api/token", async (req, res) => {
  const { token } = await elevenlabs.conversationalAi.conversations.getWebrtcToken({ agentId: speechEngineId });
  res.json({ token });
});

app.listen(3002, () => console.log("Token server listening on port 3002"));
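On the client side, the first step is to request a token from the `/api/token` endpoint defined above. The sketch below covers only that step. `fetchConversationToken` and `FetchLike` are hypothetical names, and the client SDK call that consumes the token is left out because it is not shown here.

```typescript
// Minimal shape of the fetch function this sketch relies on, so that a stub
// can be injected in tests. The global fetch satisfies it.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical helper: requests a conversation token from the token server.
async function fetchConversationToken(
  baseUrl: string,
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<string> {
  const response = await fetchImpl(new URL("/api/token", baseUrl).toString());
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  // The endpoint above responds with { token }, so only that field is read.
  const body = (await response.json()) as { token?: unknown } | null;
  const token = body?.token;
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("Token response did not contain a token");
  }
  return token;
}
```

Issuing the token on the server keeps the API key out of the browser or mobile app, which only ever sees the token.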

Enterprise-grade security

Our platform is designed for deployments at scale with enterprise-level data protections, including support for SOC 2, HIPAA, and GDPR compliance. EU Data Residency and Zero Retention Mode are available for stricter data control.
