Realtime

Realtime speech-to-text transcription service. This WebSocket API enables streaming audio input and receiving transcription results.

## Event Flow

- Audio chunks are sent as `input_audio_chunk` messages
- Transcription results are streamed back in various formats (partial, committed, with timestamps)
- Supports manual commit or VAD-based automatic commit strategies

Authentication is done either by providing a valid API key in the `xi-api-key` header or by providing a valid token in the `token` query parameter. Tokens can be generated from the [single use token endpoint](/docs/api-reference/tokens/create). Use tokens if you want to transcribe audio from the client side.
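The two authentication options can be sketched as a small helper that returns the handshake URL and headers. This is a minimal illustration, not an official client: the host `api.example.com` is a placeholder (this section does not name the host), and `connection_args` is a hypothetical helper name. Only the path, the `xi-api-key` header, and the `model_id` and `token` query parameters come from this reference.

```python
from urllib.parse import urlencode

# Placeholder host: this section does not state the real one, so substitute it.
BASE_URL = "wss://api.example.com/v1/speech-to-text/realtime"


def connection_args(model_id, api_key=None, token=None):
    """Return (url, headers) for the WebSocket handshake.

    Server side: pass api_key, which is sent in the xi-api-key header.
    Client side: pass a single use token, which is sent as a query parameter.
    """
    if (api_key is None) == (token is None):
        raise ValueError("provide exactly one of api_key or token")
    query = {"model_id": model_id}
    headers = {}
    if token is not None:
        query["token"] = token
    else:
        headers["xi-api-key"] = api_key
    return f"{BASE_URL}?{urlencode(query)}", headers
```

The returned pair can be handed to whichever WebSocket client is in use. Keeping the API key out of the URL in the server-side case avoids leaking it into logs that record request URLs.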

Handshake

WSS
/v1/speech-to-text/realtime

Headers

xi-api-key (string, optional)

Query parameters

model_id (string, required)
ID of the model to use for transcription.
token (string, optional)

Single use token for authentication. Only used when initiating a session from the client. If provided, xi-api-key is no longer required for authentication.

include_timestamps (boolean, optional, defaults to false)

Whether to receive the committed_transcript_with_timestamps event, which includes word-level timestamps.

include_language_detection (boolean, optional, defaults to false)

Whether to include the detected language code in the committed_transcript_with_timestamps event.

audio_format (enum, optional, defaults to pcm_16000)

Audio encoding format for speech-to-text.

language_code (string, optional)

Language code in ISO 639-1 or ISO 639-3 format.

commit_strategy (enum, optional, defaults to manual)
Strategy for committing transcriptions.
Allowed values:
vad_silence_threshold_secs (double, optional, range 0.3-3, defaults to 1.5)
Silence threshold in seconds for VAD.
vad_threshold (double, optional, range 0.1-0.9, defaults to 0.4)
Threshold for voice activity detection.
min_speech_duration_ms (integer, optional, range 50-2000, defaults to 100)
Minimum speech duration in milliseconds.
min_silence_duration_ms (integer, optional, range 50-2000, defaults to 100)
Minimum silence duration in milliseconds.
enable_logging (boolean, optional, defaults to true)

When enable_logging is set to false, zero retention mode is used for the request, which means history features are unavailable for it. Zero retention mode may only be used by enterprise customers.
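As a rough sketch of how the query parameters above might be assembled, the helper below checks the documented numeric ranges before encoding. `build_query` is a hypothetical name, and the lowercase `true`/`false` spelling for booleans is an assumption that this section does not confirm.

```python
from urllib.parse import urlencode

# (min, max) ranges copied from the parameter list above.
RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def build_query(model_id, **options):
    """Encode query parameters, rejecting out-of-range VAD settings."""
    params = {"model_id": model_id}
    for name, value in options.items():
        if value is None:
            continue  # omit the parameter so the server default applies
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        params[name] = value
    return urlencode(params)
```

For example, `build_query("your_model_id", include_timestamps=True, vad_threshold=0.5)` yields a string that can be appended to the handshake path after a `?`. Validating locally gives a clearer error than a rejected handshake would.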

Send

publish (object, required)
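This section does not spell out the fields of the sent object, so the following is only a sketch of what sending an `input_audio_chunk` message could look like. The field names `message_type`, `audio_base_64`, and `commit` are assumptions, as is reading `pcm_16000` as 16-bit mono PCM at 16 kHz. Both helper names are hypothetical.

```python
import base64
import json


def chunk_pcm(pcm, sample_rate=16000, chunk_ms=100, bytes_per_sample=2):
    """Split raw mono PCM bytes into chunks of roughly chunk_ms each."""
    size = sample_rate * bytes_per_sample * chunk_ms // 1000
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def audio_chunk_message(chunk, commit=False):
    """Serialize one chunk as an input_audio_chunk message.

    Every field name here is an assumption; only the message name
    input_audio_chunk appears in this reference.
    """
    return json.dumps({
        "message_type": "input_audio_chunk",
        "audio_base_64": base64.b64encode(chunk).decode("ascii"),
        "commit": commit,
    })
```

With the manual commit strategy, a caller would presumably set `commit=True` on the chunk that ends an utterance. Check the payload schema before relying on that flag.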

Receive

One of the following payloads (each an object, required):

- Session Started Payload
- Partial Transcript Payload
- Committed Transcript Payload
- Committed Transcript with Timestamps Payload
- Scribe Error Payload
- Scribe Auth Error Payload
- Scribe Quota Exceeded Error Payload
- Scribe Throttled Error Payload
- Scribe Unaccepted Terms Error Payload
- Scribe Rate Limited Error Payload
- Scribe Queue Overflow Error Payload
- Scribe Resource Exhausted Error Payload
- Scribe Session Time Limit Exceeded Error Payload
- Scribe Input Error Payload
- Scribe Chunk Size Exceeded Error Payload
- Scribe Insufficient Audio Activity Error Payload
- Scribe Transcriber Error Payload
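Since every received message is one of the payloads listed above, a client typically needs a dispatch step. The sketch below groups messages into session, transcript, and error kinds. It assumes each payload is JSON with a `message_type` discriminator field, and that event names are the snake_case form of the payload titles. Only `committed_transcript_with_timestamps` is named explicitly in this reference, and `classify` is a hypothetical helper.

```python
import json

# Assumed event names derived from the payload titles; only
# committed_transcript_with_timestamps is named in this reference.
TRANSCRIPT_EVENTS = {
    "partial_transcript",
    "committed_transcript",
    "committed_transcript_with_timestamps",
}


def classify(raw):
    """Return (kind, payload); kind is 'session', 'transcript', or 'error'."""
    payload = json.loads(raw)
    event = payload.get("message_type")  # assumed discriminator field
    if event == "session_started":
        return "session", payload
    if event in TRANSCRIPT_EVENTS:
        return "transcript", payload
    # Every other payload listed is an error; unknown types also land here.
    return "error", payload
```

Treating anything unrecognized as an error is a conservative default for a sketch. A production client would more likely match each error type separately, so that retryable conditions such as throttling can be told apart from fatal ones such as authentication failures.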