Realtime speech-to-text transcription service. This WebSocket API accepts streamed audio input and returns transcription results.
Audio is sent to the service in input_audio_chunk messages.
Authentication is done either by providing a valid API key in the xi-api-key header or by providing a valid token in the token query parameter. Tokens can be generated from the single-use token endpoint. Use tokens if you want to transcribe audio from the client side.
Single-use token for authentication. Only used when initiating a session from the client. If provided, xi-api-key is no longer required for authentication.
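The two authentication paths described above can be sketched as follows. This is a minimal illustration, not the service's client library: the helper name build_connection is hypothetical, and base_url is a placeholder argument because the endpoint URL is not given in this section. Only the xi-api-key header name and the token query parameter name come from the text above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_connection(base_url, api_key=None, token=None):
    """Return (url, headers) for opening the WebSocket session.

    base_url is a caller-supplied placeholder; this sketch does not
    assume any particular endpoint address.
    """
    if api_key is None and token is None:
        raise ValueError("provide either an API key or a single-use token")

    headers = {}
    parts = urlsplit(base_url)
    query = parts.query

    if token is not None:
        # Client-side flow: the token travels in the `token` query
        # parameter, so no xi-api-key header is needed.
        extra = urlencode({"token": token})
        query = f"{query}&{extra}" if query else extra
    else:
        # Server-side flow: the API key travels in the xi-api-key header.
        headers["xi-api-key"] = api_key

    url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )
    return url, headers
```

The returned URL and headers can then be passed to whichever WebSocket client is in use. Preferring the token when both credentials are supplied is a design choice of this sketch, made so that an API key is never sent from client-side code by accident.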
Whether to receive the committed_transcript_with_timestamps event, which includes word-level timestamps.
Whether to include the detected language code in the committed_transcript_with_timestamps event.
Audio encoding format for speech-to-text.
Language code in ISO 639-1 or ISO 639-3 format.
List of keyterms to bias the model towards. Maximum 50 keyterms, each up to 20 characters. Adds a 20% premium to the base transcription cost.
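The keyterm limits and the pricing premium can be checked on the client before a session is opened. The sketch below is an illustration under stated assumptions: the function names are hypothetical, length is measured with Python's len() (Unicode code points, which may differ from how the service counts characters), and the premium is assumed to apply whenever at least one keyterm is supplied.

```python
MAX_KEYTERMS = 50          # limit stated in the parameter description
MAX_KEYTERM_LENGTH = 20    # limit stated in the parameter description
KEYTERM_PREMIUM = 0.20     # 20% premium on the base transcription cost


def validate_keyterms(keyterms):
    """Return the keyterms as a list, or raise ValueError if a limit is exceeded."""
    terms = list(keyterms)
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(
            f"too many keyterms: {len(terms)} (maximum {MAX_KEYTERMS})"
        )
    for term in terms:
        # Assumption: characters are counted as Unicode code points.
        if len(term) > MAX_KEYTERM_LENGTH:
            raise ValueError(
                f"keyterm {term!r} exceeds {MAX_KEYTERM_LENGTH} characters"
            )
    return terms


def cost_with_keyterms(base_cost, keyterms):
    """Estimate the cost, assuming the premium applies if any keyterm is set."""
    if keyterms:
        return base_cost * (1 + KEYTERM_PREMIUM)
    return base_cost
```

For example, under these assumptions a request with a base cost of 10 units and one keyterm would be estimated at about 12 units, while the same request without keyterms stays at 10.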
When enable_logging is set to false, zero retention mode is used for the request, which means history features are unavailable for that request. Zero retention mode is available only to enterprise customers.
Committed transcription result with word-level timestamps.