Multi-Context WebSocket
The Multi-Context Text-to-Speech WebSocket API generates audio from text input while managing multiple independent audio generation streams (contexts) over a single WebSocket connection. This is useful when generations must run concurrently or be interleaved, as in dynamic conversational AI applications.
Each context, identified by a context id, maintains its own state. You can send text to specific
contexts, flush them, or close them independently. A close_socket message can be used to terminate
the entire connection gracefully.
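The per-context message flow described above can be sketched as a set of small JSON builders. This is a minimal illustration, not the API's definitive schema: only the close_socket message and the notion of a context id come from the text, while the field names "context_id", "text", "flush", and "close_context", and the context ids "greeting" and "answer", are assumptions made for the example.

```python
import json

# Assumed field names for illustration only; verify against the API schema.

def text_message(context_id, text):
    """Frame that sends text to one specific context."""
    return json.dumps({"context_id": context_id, "text": text})

def flush_message(context_id):
    """Frame that flushes one context without touching the others."""
    return json.dumps({"context_id": context_id, "flush": True})

def close_context_message(context_id):
    """Frame that closes one context; the connection stays open."""
    return json.dumps({"context_id": context_id, "close_context": True})

def close_socket_message():
    """Frame that terminates the entire connection gracefully."""
    return json.dumps({"close_socket": True})

def interleaved_session():
    """Ordered frames for two contexts sharing a single connection."""
    return [
        text_message("greeting", "Hello there. "),
        text_message("answer", "Your order has shipped. "),
        flush_message("greeting"),
        close_context_message("greeting"),
        flush_message("answer"),
        close_socket_message(),
    ]
```

Each string returned by interleaved_session() would be sent as one text frame over the already-open WebSocket. Keeping the builders free of any network code makes the message ordering easy to inspect before wiring it to a real client.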
For best practices on using this API, see the multi-context WebSocket guide.
Your single-use token. Use it to initiate a session from the client. When this parameter is provided, xi-api-key is no longer required for authentication.
The ISO 639-1 language code (for specific models).
Inactivity timeout, in seconds, after which a context is closed. The maximum is 180 seconds.
Reduces latency by disabling the chunk schedule and buffers. Recommended when sending full sentences or phrases.
If specified, the system makes a best-effort attempt to sample deterministically. Must be an integer between 0 and 4294967295.
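The connection parameters described above can be validated on the client before the socket is opened. The sketch below is a hedged illustration: the limits (180 seconds, seed range 0 to 4294967295) come from the text, but the parameter names "authorization", "language_code", "inactivity_timeout", "auto_mode", and "seed" are assumed here, as is the lowercase "true"/"false" encoding of the boolean.

```python
from urllib.parse import urlencode

MAX_INACTIVITY_TIMEOUT = 180   # seconds, per the parameter description
MAX_SEED = 4294967295          # upper bound of the seed range

def build_query(token=None, language_code=None, inactivity_timeout=None,
                auto_mode=None, seed=None):
    """Build a query string from optional settings (names are assumed)."""
    if inactivity_timeout is not None and inactivity_timeout > MAX_INACTIVITY_TIMEOUT:
        raise ValueError("inactivity timeout can be at most 180 seconds")
    if seed is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise ValueError("seed must be an integer")
        if not 0 <= seed <= MAX_SEED:
            raise ValueError("seed must be between 0 and 4294967295")
    params = {
        "authorization": token,            # single-use token, if any
        "language_code": language_code,    # ISO 639-1 code
        "inactivity_timeout": inactivity_timeout,
        "auto_mode": None if auto_mode is None else str(bool(auto_mode)).lower(),
        "seed": seed,
    }
    # Drop unset parameters so the server-side defaults apply.
    return urlencode({k: v for k, v in params.items() if v is not None})
```

For example, build_query(language_code="en", inactivity_timeout=60, auto_mode=True, seed=42) yields a string that can be appended to the WebSocket URL after a "?". Rejecting out-of-range values locally avoids a failed handshake round trip.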
Message to initialize a new TTS context in a multi-context stream.
Message to initialize or re-initialize a TTS context with text and settings for multi-stream connections.
Controls text normalization. There are three modes: ‘auto’, ‘on’, and ‘off’. With ‘auto’, the system decides whether to apply text normalization (e.g., spelling out numbers). With ‘on’, text normalization is always applied; with ‘off’, it is skipped. For the ‘eleven_flash_v2_5’ model, text normalization can only be enabled with Enterprise plans. Defaults to ‘auto’.