Realtime | ElevenLabs Documentation

Realtime speech-to-text transcription service. This WebSocket API accepts streamed audio input and streams transcription results back.

Event Flow

- Audio chunks are sent as input_audio_chunk messages.
- Transcription results are streamed back in several forms (partial, committed, committed with timestamps).
- Transcripts are committed either manually or automatically, based on voice activity detection (VAD).

Authenticate either with a valid API key in the xi-api-key header or with a valid token in the token query parameter. Tokens are generated by the single use token endpoint (/docs/api-reference/single-use/create). Use a token when transcribing audio from the client side.
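The two authentication options can be sketched as a small helper that returns the headers and query parameters to attach to the connection. The function name auth_parts is hypothetical, and rejecting the case where both credentials are supplied is a design choice of this sketch rather than documented API behavior.

```python
def auth_parts(api_key=None, token=None):
    """Return (headers, query_params) carrying exactly one credential.

    Hypothetical helper: the API key goes in the xi-api-key header,
    a single use token goes in the token query parameter.
    """
    if (api_key is None) == (token is None):
        raise ValueError("provide exactly one of api_key or token")
    if token is not None:
        return {}, {"token": token}
    return {"xi-api-key": api_key}, {}


# Client-side usage: only the short-lived token is sent, never the API key.
headers, query = auth_parts(token="example-token")
```

Keeping the credential choice in one place makes it harder to send an API key from client-side code by accident.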


Handshake

WSS

/v1/speech-to-text/realtime

Headers

xi-api-key (string, required)

Query Parameters

model_id (string, required)

ID of the model to use for transcription.

token (string, optional)

Your authorization bearer token.

include_timestamps (boolean, optional, defaults to false)

Whether to receive the committed_transcript_with_timestamps event, which includes word-level timestamps.

audio_format (enum, optional, defaults to pcm_16000)

Audio encoding format for speech-to-text.

language_code (string, optional)

Language code in ISO 639-1 or ISO 639-3 format.

commit_strategy (enum, optional, defaults to manual)

Strategy for committing transcriptions.

Allowed values:

vad_silence_threshold_secs (double, optional, defaults to 1.5)

Silence threshold in seconds for VAD.

vad_threshold (double, optional, defaults to 0.4)

Threshold for voice activity detection.

min_speech_duration_ms (integer, optional, defaults to 250)

Minimum speech duration in milliseconds.

min_silence_duration_ms (integer, optional, defaults to 2500)

Minimum silence duration in milliseconds.

enable_logging (boolean, optional, defaults to true)

When enable_logging is set to false, zero retention mode is used for the request, so history features are unavailable for it. Zero retention mode is available only to enterprise customers.
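As an illustration of how the query parameters above combine into a connection URL, here is a minimal sketch using only the Python standard library. The helper name build_realtime_url and the model ID "example-model" are placeholders, and serializing booleans as lowercase true/false is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.elevenlabs.io/v1/speech-to-text/realtime"


def build_realtime_url(model_id, **options):
    """Build the WebSocket URL; options left as None fall back to server defaults."""
    query = {"model_id": model_id}  # the only required query parameter
    for name, value in options.items():
        if value is None:
            continue
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase strings.
            value = "true" if value else "false"
        query[name] = value
    return f"{BASE_URL}?{urlencode(query)}"


url = build_realtime_url(
    "example-model",  # placeholder, not a real model ID
    include_timestamps=True,
    language_code="en",
    vad_silence_threshold_secs=1.5,
)
```

Omitting a parameter rather than sending its default keeps the URL short and leaves the default under the server's control.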

Send

inputAudioChunk (object, required)
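This page does not list the fields of the inputAudioChunk object, so the sketch below is only a guess at its shape. The keys message_type, audio_base_64, and commit are assumptions, as is the reading of the default pcm_16000 format as 16-bit mono PCM at 16 kHz; check the message schema before relying on any of them.

```python
import base64
import json


def split_pcm(pcm_bytes, chunk_ms=100, sample_rate=16000, bytes_per_sample=2):
    """Yield fixed-duration slices of raw PCM (assumed 16-bit mono)."""
    size = sample_rate * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(pcm_bytes), size):
        yield pcm_bytes[start:start + size]


def audio_chunk_message(pcm_bytes, commit=False):
    """Serialize one chunk as JSON text; every key name here is assumed."""
    return json.dumps({
        "message_type": "input_audio_chunk",  # assumed discriminator key
        "audio_base_64": base64.b64encode(pcm_bytes).decode("ascii"),  # assumed
        "commit": commit,  # assumed flag for the manual commit strategy
    })


# One second of silence becomes ten 100 ms messages.
messages = [audio_chunk_message(chunk) for chunk in split_pcm(bytes(32000))]
```

With the manual commit strategy, a client would presumably set commit on the chunk that ends an utterance; with VAD-based commit the flag would stay false.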

Receive

sessionStarted (object, required)

partialTranscript (object, required)

committedTranscript (object, required)

committedTranscriptWithTimestamps (object, required)

scribeError (object, required)

scribeAuthError (object, required)

scribeQuotaExceededError (object, required)
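One way a client might consume the events listed above is to keep a small state object and dispatch on the event type. This is a sketch under assumptions: the message_type key, the text field, and the snake_case names for the session and error events are not given on this page (only input_audio_chunk and committed_transcript_with_timestamps appear in that form). The class name TranscriptState is hypothetical.

```python
import json

# Assumed wire names for the three error events.
ERROR_TYPES = {"scribe_error", "scribe_auth_error", "scribe_quota_exceeded_error"}


class TranscriptState:
    """Track the session, the latest partial text, and committed segments."""

    def __init__(self):
        self.session_started = False
        self.partial = ""
        self.committed = []
        self.timestamped = []  # raw events carrying word-level timestamps

    def handle(self, raw_message):
        event = json.loads(raw_message)
        kind = event.get("message_type")  # assumed discriminator key
        if kind == "session_started":
            self.session_started = True
        elif kind == "partial_transcript":
            # A partial result replaces the previous partial result.
            self.partial = event.get("text", "")
        elif kind == "committed_transcript":
            self.committed.append(event.get("text", ""))
            self.partial = ""
        elif kind == "committed_transcript_with_timestamps":
            # Stored separately so text is not counted twice if both
            # committed events arrive for the same segment.
            self.timestamped.append(event)
        elif kind in ERROR_TYPES:
            raise RuntimeError(f"{kind}: {event}")
        return kind  # unknown event types are ignored


state = TranscriptState()
state.handle('{"message_type": "session_started"}')
state.handle('{"message_type": "partial_transcript", "text": "hello wor"}')
state.handle('{"message_type": "committed_transcript", "text": "hello world"}')
```

Raising on the error events is a choice made for this sketch; a production client would more likely distinguish authentication and quota failures from transient errors.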

URL	wss://api.elevenlabs.io/v1/speech-to-text/realtime
Method	GET
Status	101 Switching Protocols