Multi-Context WebSocket
This guide shows you how to build real-time voice agents using the multi-context WebSocket API.
Orchestrating voice agents using this multi-context WebSocket API is a complex task recommended for advanced developers. For a more managed solution, consider exploring our Agents Platform product, which simplifies many of these challenges.
eleven_v3 model.
Building responsive voice agents requires the ability to manage audio streams dynamically, handle interruptions gracefully, and maintain natural-sounding speech across conversational turns. Our multi-context WebSocket API for Text to Speech (TTS) is specifically designed for these scenarios.
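This API extends our standard TTS WebSocket functionality by introducing the concept of “contexts.” Each context operates as an independent audio generation stream within a single WebSocket connection. This allows you to manage several audio streams over one connection and to switch between them when the conversation changes direction, for example when a user interrupts the agent.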
The multi-context WebSocket API is optimized for voice applications and is not intended for generating multiple unrelated audio streams simultaneously. Each connection is limited to 5 concurrent contexts to reflect this.
This guide will walk you through connecting to the multi-context WebSocket, managing contexts, and applying best practices for building engaging voice agents.
These best practices are essential for building responsive, efficient voice agents with our multi-context WebSocket API.
Establish one WebSocket connection for each end-user session. This reduces overhead and latency compared to creating multiple connections. Within this single connection, you can manage multiple contexts for different parts of the conversation.
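As a rough illustration of the one-connection-per-session idea, the sketch below keeps a small in-memory record of the contexts opened on a single connection and enforces the limit of 5 concurrent contexts mentioned earlier. The class name `SessionContexts` and the `ctx_N` naming scheme are hypothetical and are not part of the API; no network calls are made.

```python
from dataclasses import dataclass, field

MAX_CONTEXTS = 5  # per-connection limit stated in this guide


@dataclass
class SessionContexts:
    """Tracks the contexts opened on one WebSocket connection (one per end-user session)."""

    open_ids: list[str] = field(default_factory=list)
    _counter: int = 0

    def open(self) -> str:
        """Register a new context and return its ID, refusing to exceed the limit."""
        if len(self.open_ids) >= MAX_CONTEXTS:
            raise RuntimeError("context limit reached; close a context first")
        self._counter += 1
        context_id = f"ctx_{self._counter}"  # hypothetical naming scheme
        self.open_ids.append(context_id)
        return context_id

    def close(self, context_id: str) -> None:
        """Forget a context once it has been closed on the server side."""
        self.open_ids.remove(context_id)
```

An application would typically create one such object when the user's session starts and discard it when the socket closes.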
When generating long responses, stream the text in smaller chunks and set the flush: true flag at the end of complete sentences. This improves both the quality of the generated audio and responsiveness.
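The chunk-and-flush pattern could be sketched as follows. This example only builds the JSON payloads and does not send them. The `flush` flag comes from this guide, while the `text` and `context_id` field names and the punctuation-based sentence check are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def build_messages(chunks: Iterable[str], context_id: str) -> Iterator[str]:
    """Yield one JSON message per text chunk, flushing when a chunk ends a sentence."""
    for chunk in chunks:
        payload = {
            "text": chunk,
            "context_id": context_id,  # assumed field name
        }
        # Naive check: treats any chunk ending in ., ! or ? as a sentence end,
        # so abbreviations such as "e.g." would also trigger a flush.
        if chunk.rstrip().endswith(SENTENCE_ENDINGS):
            payload["flush"] = True
        yield json.dumps(payload)
```

Feeding it the chunks `"Hello, "`, `"how are you? "`, `"I am "`, `"fine."` produces four messages, with `flush` set only on the second and fourth.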
Stream text into one context until an interruption occurs, then create a new context and close the existing one. This approach ensures smooth transitions when the conversation flow changes.
When a user interrupts your agent, you should close the current context and create a new one:
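A minimal sketch of that interruption sequence is shown here, assuming the server accepts a `close_context` flag to end a context and that sending text with an unseen `context_id` starts a new one. Both field names are assumptions, and the helper `interruption_messages` is hypothetical.

```python
import json


def interruption_messages(current_id: str, new_id: str, first_text: str) -> list[str]:
    """Build the two messages sent when the user interrupts the agent."""
    # 1. Stop generating audio for the response that was interrupted.
    close_old = {"context_id": current_id, "close_context": True}  # assumed field name
    # 2. Start the agent's new response in a fresh context.
    start_new = {"context_id": new_id, "text": first_text}  # assumed field names
    return [json.dumps(close_old), json.dumps(start_new)]
```

Sending the close message first keeps the number of open contexts low and makes it less likely that stale audio from the old context is played after the interruption.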
By default, contexts time out automatically after 20 seconds of inactivity. If you need to keep a context alive without generating text (for example, during a processing delay), you can send an empty text message to reset the timeout clock.
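The keep-alive idea might look like the sketch below. The 20-second default comes from this guide; the `text` and `context_id` field names, the 5-second safety margin, and both helper functions are assumptions for illustration.

```python
import json

CONTEXT_TIMEOUT_S = 20.0  # default inactivity timeout stated in this guide


def keep_alive_message(context_id: str) -> str:
    """Build an empty-text message that resets the context's inactivity timer."""
    return json.dumps({"context_id": context_id, "text": ""})  # assumed field names


def needs_keep_alive(last_sent_at: float, now: float, margin_s: float = 5.0) -> bool:
    """Return True once the context is within margin_s seconds of timing out."""
    return now - last_sent_at >= CONTEXT_TIMEOUT_S - margin_s
```

A caller would record a timestamp (for example from `time.monotonic()`) whenever it sends a message on a context and poll `needs_keep_alive` while waiting on a slow upstream step.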
When your conversation ends, you can clean up all contexts by closing the socket:
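One way this cleanup could be expressed, assuming the server recognizes a `close_socket` flag (an assumed field name), is the sketch below. The `send` parameter is a hypothetical stand-in for whatever function your WebSocket client uses to transmit a text frame.

```python
import json
from typing import Callable


def end_conversation(send: Callable[[str], None]) -> None:
    """Ask the server to close the socket, which also cleans up every open context."""
    send(json.dumps({"close_socket": True}))  # assumed field name
```

Passing the send function in as an argument keeps the helper independent of any particular WebSocket library.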
Install the necessary dependencies for your chosen language:
Create a .env file in your project directory to store your API key:
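To read the key back at runtime, something along these lines would work using only the standard library. The variable name `ELEVENLABS_API_KEY` is an assumption (use whatever name your .env file defines), and the parser handles only simple `KEY=value` lines.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines from a .env file, skipping blanks and comments."""
    values: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Return the API key from the environment, falling back to the .env file."""
    name = "ELEVENLABS_API_KEY"  # assumed variable name
    key = os.environ.get(name) or load_env(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or in {path}")
    return key
```

Keep the .env file out of version control so the key is not committed by accident.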