WebSocket
Create real-time, interactive voice conversations with AI agents
This documentation is for developers integrating directly with the ElevenLabs WebSocket API. For convenience, consider using the official SDKs provided by ElevenLabs.
The ElevenLabs Conversational AI WebSocket API enables real-time, interactive voice conversations with AI agents. By establishing a WebSocket connection, you can send audio input and receive audio responses in real-time, creating life-like conversational experiences.
wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}
Authentication
Using Agent ID
For public agents, you can use the agent_id directly in the WebSocket URL without additional authentication:
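For example, a small helper (illustrative, not part of any SDK) that builds the connection URL from an agent ID:

```typescript
// Build the conversation WebSocket URL for a public agent.
// The endpoint path matches the URL shown above; the helper itself is illustrative.
function buildConversationUrl(agentId: string): string {
  return `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${encodeURIComponent(agentId)}`;
}

// In the browser, open the connection directly:
// const socket = new WebSocket(buildConversationUrl("your-agent-id"));
```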
Using a signed URL
For private agents or conversations requiring authorization, obtain a signed URL from your server, which securely communicates with the ElevenLabs API using your API key.
Example using cURL
Request:
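A sketch of the request, assuming the signed-URL endpoint from the ElevenLabs API (`/v1/convai/conversation/get-signed-url`); replace the placeholders with your own agent ID and API key, and keep the API key server-side:

```bash
curl -X GET "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=<your-agent-id>" \
  -H "xi-api-key: <your-api-key>"
```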
Response:
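The response contains a signed WebSocket URL that your client can connect to directly. The shape below is a sketch; the token value is illustrative:

```json
{
  "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<your-agent-id>&token=<token>"
}
```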
WebSocket events
Client to server events
The following events can be sent from the client to the server:
Contextual Updates
Send non-interrupting contextual information to update the conversation state. This allows you to provide additional context without disrupting the ongoing conversation flow.
Use cases:
- Updating user status or preferences
- Providing environmental context
- Adding background information
- Tracking user interface interactions
Key points:
- Does not interrupt current conversation flow
- Updates are incorporated as tool calls in conversation history
- Helps maintain context without breaking the natural dialogue
Contextual updates are processed asynchronously and do not require a direct response from the server.
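A contextual update is sent as a JSON message over the open WebSocket using the `contextual_update` event type. The helper wrapping it here is illustrative:

```typescript
// Build a non-interrupting contextual update message.
// "contextual_update" is the client-to-server event type; the helpers are illustrative.
function buildContextualUpdate(text: string): string {
  return JSON.stringify({ type: "contextual_update", text });
}

// Works with a browser WebSocket or anything exposing a send(string) method.
function sendContextualUpdate(socket: { send: (data: string) => void }, text: string): void {
  socket.send(buildContextualUpdate(text));
}

// Usage: sendContextualUpdate(socket, "User navigated to the checkout page");
```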
Next.js implementation example
This example demonstrates how to implement a WebSocket-based conversational AI client in Next.js using the ElevenLabs WebSocket API.
While this example uses the voice-stream package for microphone input handling, you can implement your own solution for capturing and encoding audio. The focus here is on demonstrating the WebSocket connection and event handling with the ElevenLabs API.
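The core of the client is a handler for the server's JSON events. The sketch below routes the main event types (`audio`, `agent_response`, `user_transcript`, `ping`); the event field names follow the ElevenLabs protocol, while the callback structure and type definitions are illustrative assumptions:

```typescript
// Minimal shape of server events relevant to this sketch (illustrative subset).
type ServerEvent = {
  type: string;
  audio_event?: { audio_base_64: string };
  agent_response_event?: { agent_response: string };
  user_transcription_event?: { user_transcript: string };
  ping_event?: { event_id: number };
};

// Routes an incoming server event to the right callback.
// Returns a serialized message to send back (a pong reply) or null.
function handleServerEvent(
  event: ServerEvent,
  callbacks: {
    onAudio?: (base64Audio: string) => void;
    onAgentResponse?: (text: string) => void;
    onUserTranscript?: (text: string) => void;
  }
): string | null {
  switch (event.type) {
    case "audio":
      callbacks.onAudio?.(event.audio_event!.audio_base_64);
      return null;
    case "agent_response":
      callbacks.onAgentResponse?.(event.agent_response_event!.agent_response);
      return null;
    case "user_transcript":
      callbacks.onUserTranscript?.(event.user_transcription_event!.user_transcript);
      return null;
    case "ping":
      // Reply promptly so round-trip time stays measurable.
      return JSON.stringify({ type: "pong", event_id: event.ping_event!.event_id });
    default:
      return null;
  }
}
```

In the component, wire this to the socket: `socket.onmessage = (e) => { const reply = handleServerEvent(JSON.parse(e.data), callbacks); if (reply) socket.send(reply); }`.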
Install required dependencies
First, install the necessary packages:
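Assuming npm, the one extra runtime dependency the example needs is voice-stream:

```bash
npm install voice-stream
```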
The voice-stream package handles microphone access and audio streaming, automatically encoding the audio in base64 as required by the ElevenLabs API.
This example uses Tailwind CSS for styling. To add Tailwind to your Next.js project:
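A typical setup installs Tailwind as a dev dependency; the commands below assume Tailwind CSS v3 (newer versions use a different setup, so check the guide for your version):

```bash
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p
```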
Then follow the official Tailwind CSS setup guide for Next.js.
Alternatively, you can replace the className attributes with your own CSS styles.
Next steps
- Audio Playback: Implement your own audio playback system using the Web Audio API or a library. Handle audio queuing to prevent overlapping playback, since the WebSocket delivers audio in chunks.
- Error Handling: Add retry logic and error-recovery mechanisms.
- UI Feedback: Add visual indicators for voice activity and connection status.
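Because chunked audio events can arrive faster than they play, queue them and play one chunk at a time. A minimal sketch, with `playChunk` standing in for your own Web Audio playback code:

```typescript
// Plays queued audio chunks strictly one at a time so they never overlap.
// `playChunk` is an assumed async function that resolves when a chunk finishes playing.
class AudioQueue {
  private queue: string[] = [];
  private playing = false;

  constructor(private playChunk: (base64Audio: string) => Promise<void>) {}

  enqueue(base64Audio: string): void {
    this.queue.push(base64Audio);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.playing) return; // a drain loop is already running
    this.playing = true;
    while (this.queue.length > 0) {
      const chunk = this.queue.shift()!;
      await this.playChunk(chunk); // wait for playback before the next chunk
    }
    this.playing = false;
  }
}
```

Feed it from the socket's audio events, e.g. `onAudio: (b64) => audioQueue.enqueue(b64)`.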
Latency management
To ensure smooth conversations, implement these strategies:
- Adaptive Buffering: Adjust audio buffering based on network conditions.
- Jitter Buffer: Implement a jitter buffer to smooth out variations in packet arrival times.
- Ping-Pong Monitoring: Use ping and pong events to measure round-trip time and adjust accordingly.
- Optimized Chunking: Tune the audio chunk duration to balance latency and efficiency.
Security best practices
- Rotate API keys regularly and use environment variables to store them.
- Implement rate limiting to prevent abuse.
- Clearly explain the intention when prompting users for microphone access.
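As an example of the jitter-buffer strategy from the latency list above, the sketch below holds incoming chunks until a target depth has accumulated before releasing any, which absorbs variation in packet arrival times. The class and the threshold value are illustrative:

```typescript
// Holds incoming chunks until `targetDepth` have accumulated, then releases them in order.
// Buffering a few chunks before starting playback absorbs network jitter at the cost
// of a small startup delay.
class JitterBuffer<T> {
  private buffer: T[] = [];
  private started = false;

  constructor(private targetDepth: number) {}

  push(chunk: T): void {
    this.buffer.push(chunk);
    if (!this.started && this.buffer.length >= this.targetDepth) {
      this.started = true; // enough buffered: playback may begin
    }
  }

  // Returns the next chunk once playback has started, else null.
  pull(): T | null {
    if (!this.started || this.buffer.length === 0) return null;
    return this.buffer.shift()!;
  }
}

// Usage: const jitter = new JitterBuffer<string>(3); // start after ~3 chunks
```

Tune the depth against your chunk duration: a deeper buffer smooths worse networks but adds latency.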