Create real-time, interactive voice conversations with AI agents

This documentation is for developers integrating directly with the ElevenLabs WebSocket API. For convenience, consider using the official SDKs provided by ElevenLabs.

The ElevenLabs Conversational AI WebSocket API enables real-time, interactive voice conversations with AI agents. By establishing a WebSocket connection, you can send audio input and receive audio responses in real-time, creating life-like conversational experiences.

Endpoint: wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}

Authentication

Using Agent ID

For public agents, you can directly use the agent_id in the WebSocket URL without additional authentication:

$wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<your-agent-id>

Using a Signed URL

For private agents or conversations requiring authorization, obtain a signed URL from your server, which securely communicates with the ElevenLabs API using your API key.

Example using cURL

Request:

$curl -X GET "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=<your-agent-id>" \
> -H "xi-api-key: <your-api-key>"

Response:

1{
2 "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<your-agent-id>&token=<token>"
3}
Never expose your ElevenLabs API key on the client side.

WebSocket events

WebSocket API Reference

See the Conversational AI WebSocket API reference documentation for detailed message structures, parameters, and examples.

Latency Management

To ensure smooth conversations, implement these strategies:

  • Adaptive Buffering: Adjust audio buffering based on network conditions.
  • Jitter Buffer: Implement a jitter buffer to smooth out variations in packet arrival times.
  • Ping-Pong Monitoring: Use ping and pong events to measure round-trip time and adjust accordingly.

Security Best Practices

  • Rotate API keys regularly and use environment variables to store them.
  • Implement rate limiting to prevent abuse.
  • Clearly explain the intention when prompting users for microphone access.
  • Optimized Chunking: Tweak the audio chunk duration to balance latency and efficiency.

Additional Resources

Built with