Python SDK reference

Classes, methods, and events for the Speech Engine Python SDK.

This page documents the public API for the Speech Engine Python SDK (elevenlabs).

Getting a Speech Engine resource

Retrieve a SpeechEngineResource by its engine ID. The returned object provides methods to start a server, verify requests, or create individual sessions.

from elevenlabs import AsyncElevenLabs

elevenlabs = AsyncElevenLabs()
engine = await elevenlabs.speech_engine.get("seng_8k3m9xr4hjnfg983brhmhkd98n6")

SpeechEngineResource

Properties

| Property | Type | Description |
| --- | --- | --- |
| engine_id | str | The ID of the speech engine. |

serve

Start a standalone WebSocket server. Blocks until stopped.

await engine.serve(
    port=3001,
    path="/ws",
    debug=True,
    on_transcript=handle_transcript,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| port | int | 3001 | Port to listen on. |
| path | str | None | Restrict connections to this path. None accepts all. |
| debug | bool | False | Enable debug logging to stdout. |
| on_init | callable | | Called when a session is initialized. |
| on_transcript | callable | | Called when a user transcript arrives. |
| on_close | callable | | Called on clean disconnect. |
| on_disconnect | callable | | Called when the WebSocket drops unexpectedly. |
| on_error | callable | | Called on protocol or WebSocket errors. |

verify_request

Verify that an incoming request originates from the ElevenLabs Speech Engine API. Checks the X-Elevenlabs-Speech-Engine-Authorization header for a valid JWT signed with the SHA-256 hash of your API key.

Only needed when managing the WebSocket upgrade yourself. When using serve(), verification is handled automatically.

is_valid = engine.verify_request(headers)

| Parameter | Type | Description |
| --- | --- | --- |
| headers | dict | Request headers dictionary. |

Returns: bool. True if the request is valid.

create_session

Wrap an accepted WebSocket in a SpeechEngineSession. Use this for custom server integration (e.g. FastAPI, Starlette, or manual WebSocket handling).

session = engine.create_session(websocket, debug=True)
session.on("user_transcript", handle_transcript)
await session.run()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| ws | WebSocket | | An accepted WebSocket connection. |
| debug | bool | False | Enable debug logging. |

Returns: SpeechEngineSession

SpeechEngineSession

Wraps a single WebSocket connection. Each connection represents one conversation. The session emits events for transcripts and lifecycle changes, and provides methods to send LLM responses back.

When a new transcript arrives, the previous transcript handler is cancelled automatically, interrupting any in-flight LLM call.
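
The cancellation behavior described above can be pictured with plain asyncio. The following is a sketch of the general "cancel the previous task" pattern, not the SDK's internals; LatestOnlyRunner, slow_reply, and demo are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Optional


class LatestOnlyRunner:
    """Runs one handler task at a time, cancelling the previous one."""

    def __init__(self) -> None:
        self._current: Optional[asyncio.Task] = None

    def submit(self, coro) -> asyncio.Task:
        # Cancel the in-flight handler (if any) before starting the new one.
        if self._current is not None and not self._current.done():
            self._current.cancel()
        self._current = asyncio.ensure_future(coro)
        return self._current


async def slow_reply(text: str, log: list) -> None:
    # Stand-in for an LLM call; the sleep is where cancellation lands.
    try:
        await asyncio.sleep(0.05)
        log.append(f"replied:{text}")
    except asyncio.CancelledError:
        log.append(f"cancelled:{text}")
        raise


async def demo() -> list:
    log: list = []
    runner = LatestOnlyRunner()
    first = runner.submit(slow_reply("first", log))
    await asyncio.sleep(0.01)  # let the first handler start
    second = runner.submit(slow_reply("second", log))
    await asyncio.gather(first, second, return_exceptions=True)
    return log
```

A practical consequence of this pattern is that a transcript handler should tolerate asyncio.CancelledError at any await point and should not swallow it.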

Properties

| Property | Type | Description |
| --- | --- | --- |
| conversation_id | Optional[str] | The conversation ID assigned by the API. Available after init. |
| is_open | bool | Whether the session is still open. |

on

Register a handler for an event. Returns the session for chaining.

session.on("user_transcript", handler)

off

Remove a previously registered handler.

session.off("user_transcript", handler)

once

Register a handler that fires once then removes itself.

session.once("init", handler)
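
The semantics of on, off, and once can be illustrated with a tiny stand-alone emitter. This is a sketch of the registration behavior described above, not the session's actual implementation; TinyEmitter and its emit method are hypothetical, and only the chaining return value of on is taken from this page.

```python
from collections import defaultdict
from typing import Callable


class TinyEmitter:
    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable) -> "TinyEmitter":
        self._handlers[event].append(handler)
        return self  # returning self is what enables chaining

    def off(self, event: str, handler: Callable) -> None:
        if handler in self._handlers[event]:
            self._handlers[event].remove(handler)

    def once(self, event: str, handler: Callable) -> None:
        def wrapper(*args):
            # Unregister first so the handler cannot fire a second time.
            self.off(event, wrapper)
            handler(*args)

        self.on(event, wrapper)

    def emit(self, event: str, *args) -> None:
        # Iterate over a copy so handlers can unregister themselves safely.
        for handler in list(self._handlers[event]):
            handler(*args)
```

Note that off needs the same function object that was passed to on, so keep a reference to any handler you intend to remove later instead of registering an inline lambda.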

send_response

Send an LLM response back to the Speech Engine API for text-to-speech synthesis. Must be called inside an on_transcript handler. Calling it outside of a handler emits a warning and returns without sending.

# String response
await session.send_response("Hello, how can I help?")

# Streamed response (OpenAI, Anthropic, or Gemini)
stream = await openai_client.responses.create(model="gpt-4o", input=messages, stream=True)
await session.send_response(stream)

| Parameter | Type | Description |
| --- | --- | --- |
| response | str or async iterable | A complete string or an async iterable of text chunks / LLM stream events. |

The SDK auto-detects and extracts text from the following LLM stream formats:

| Provider | Event format |
| --- | --- |
| OpenAI Responses API | { type: "response.output_text.delta", delta: "text" } |
| OpenAI Chat Completions | { choices: [{ delta: { content: "text" } }] } |
| Anthropic Messages API | { type: "content_block_delta", delta: { type: "text_delta", text: "text" } } |
| Google Gemini API | { candidates: [{ content: { parts: [{ text: "text" }] } }] } |
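
As a rough illustration of what auto-detection over these shapes could look like, the sketch below pulls the text out of an event that follows one of the four formats in the table. It assumes events are plain dicts shaped exactly as shown; the provider SDKs typically yield typed objects, and the SDK's real detection logic may differ. extract_text is a hypothetical helper.

```python
from typing import Optional


def extract_text(event: dict) -> Optional[str]:
    """Return the text carried by a stream event, or None if there is none."""
    event_type = event.get("type")

    # OpenAI Responses API
    if event_type == "response.output_text.delta":
        return event.get("delta")

    # Anthropic Messages API
    if event_type == "content_block_delta":
        delta = event.get("delta") or {}
        return delta.get("text") if delta.get("type") == "text_delta" else None

    # OpenAI Chat Completions
    choices = event.get("choices")
    if choices:
        return (choices[0].get("delta") or {}).get("content")

    # Google Gemini API
    candidates = event.get("candidates")
    if candidates:
        parts = (candidates[0].get("content") or {}).get("parts") or []
        text = "".join(part.get("text", "") for part in parts)
        return text or None

    # Anything else (for example a stop or usage event) carries no text.
    return None
```

Events that carry no text return None, so a caller can skip them and forward only the actual text chunks.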

run

Run the receive loop until the WebSocket closes. This is the main entry point after constructing a session manually via create_session().

session = engine.create_session(websocket)
session.on("user_transcript", handle_transcript)
await session.run()

close

Close the session and the underlying WebSocket connection.

session.close()

Callbacks

These are the callback keyword arguments accepted by serve(). All callbacks are optional. Handlers can be synchronous functions or asynchronous (coroutine) functions.

| Callback | Signature | Description |
| --- | --- | --- |
| on_init | (conversation_id: str, session) -> None | Session initialized with a conversation ID. |
| on_transcript | (transcript: list, session) -> None | User speech transcribed. |
| on_close | (session) -> None | Clean disconnect from ElevenLabs. |
| on_disconnect | (session) -> None | WebSocket dropped unexpectedly. |
| on_error | (error: Exception, session) -> None | Protocol or WebSocket error. |
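
Accepting both synchronous and asynchronous handlers usually comes down to awaiting the result only when it is awaitable. The sketch below shows that common pattern; it is not taken from the SDK, and call_handler and the two example handlers are hypothetical names.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_handler(handler: Callable[..., Any], *args: Any) -> Any:
    # Call the handler, then await the result only if it is awaitable.
    result = handler(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_on_close(session: Any) -> str:
    # A plain function handler.
    return f"sync closed {session}"


async def async_on_close(session: Any) -> str:
    # A coroutine handler; the sleep stands in for real async work.
    await asyncio.sleep(0)
    return f"async closed {session}"


async def dispatch_both(session: Any) -> list:
    return [
        await call_handler(sync_on_close, session),
        await call_handler(async_on_close, session),
    ]
```

With a dispatcher like this, a synchronous handler that blocks (for example on a slow network call) still stalls the event loop, so long-running work is better written as a coroutine.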

Events

When using session.on() directly instead of callbacks, these are the event names and their handler signatures.

| Event | Handler signature |
| --- | --- |
| user_transcript | (transcript: list[ConversationMessage]) |
| init | (conversation_id: str) |
| close | () |
| disconnected | () |
| error | (error: Exception) |

Event name constants are available for type-safe usage:

from elevenlabs.speech_engine import USER_TRANSCRIPT, INIT, CLOSE, DISCONNECTED, ERROR

session.on(USER_TRANSCRIPT, handle_transcript)

ConversationMessage

A single message in the conversation history. The full transcript is passed to on_transcript on every turn.

| Property | Type | Description |
| --- | --- | --- |
| role | "user" or "agent" | Who sent the message. |
| content | str | The text content of the message. |
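
Because the transcript uses the role "agent", it generally needs a small mapping step before being passed to a chat-style LLM API. The sketch below converts a transcript into OpenAI Chat Completions style messages, where the model's turns use the role "assistant". Message is a hypothetical stand-in that mirrors the two documented ConversationMessage properties, and to_chat_messages is not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical stand-in mirroring the documented ConversationMessage fields.
    role: str  # "user" or "agent"
    content: str


def to_chat_messages(transcript: list) -> list:
    """Convert a transcript into a list of {"role", "content"} dicts."""
    messages = []
    for message in transcript:
        # Chat Completions uses "assistant" for the model's own turns.
        role = "assistant" if message.role == "agent" else "user"
        messages.append({"role": role, "content": message.content})
    return messages
```

Since the full transcript arrives on every turn, the converted list can be sent to the LLM as-is, without keeping separate conversation state on your side.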

Wire protocol

For reference, these are the JSON messages exchanged over the WebSocket connection. The SDK handles serialization and deserialization automatically.

Incoming (ElevenLabs API to developer server)

| Message type | Fields | Description |
| --- | --- | --- |
| init | conversation_id: string | Session initialized. |
| user_transcript | user_transcript: TranscriptMessage[], event_id: number | User speech transcribed. |
| ping | | Keep-alive. SDK responds with pong. |
| close | | Clean disconnect. |
| error | message: string | Error from the API. |

Outgoing (developer server to ElevenLabs API)

| Message type | Fields | Description |
| --- | --- | --- |
| agent_response | content: string, event_id: number, is_final: boolean | LLM response chunk for TTS synthesis. |
| pong | | Response to ping. |
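
For readers handling the socket without the SDK, the sketch below shows how these messages might be built and answered with the standard json module. It assumes the message type travels in a top-level "type" field, which the tables above do not spell out, so check real traffic before relying on it. build_agent_response and handle_incoming are hypothetical helpers.

```python
import json
from typing import Optional


def build_agent_response(content: str, event_id: int, is_final: bool) -> str:
    # ASSUMPTION: the message type is carried in a "type" field.
    return json.dumps(
        {
            "type": "agent_response",
            "content": content,
            "event_id": event_id,
            "is_final": is_final,
        }
    )


def handle_incoming(raw: str) -> Optional[str]:
    """Return the reply to send for an incoming message, or None if none is needed."""
    message = json.loads(raw)
    if message.get("type") == "ping":
        # Keep-alive: answer every ping with a pong.
        return json.dumps({"type": "pong"})
    # init, user_transcript, close, and error need application-level handling.
    return None
```

When using the SDK, none of this is necessary, since serialization, deserialization, and ping handling are done for you.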