WebSocket

Create real-time, interactive voice conversations with AI agents

This documentation is for developers integrating directly with the ElevenLabs WebSocket API. For convenience, consider using the official SDKs provided by ElevenLabs.

The ElevenLabs Conversational AI WebSocket API enables real-time, interactive voice conversations with AI agents. Once a WebSocket connection is established, you can send audio input and receive audio responses in real time, creating lifelike conversational experiences.

Endpoint: wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}

Authentication

Using Agent ID

For public agents, you can directly use the agent_id in the WebSocket URL without additional authentication:

wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<your-agent-id>

Using a Signed URL

For private agents or conversations requiring authorization, obtain a signed URL from your server, which securely communicates with the ElevenLabs API using your API key.

Example using cURL

Request:

curl -X GET "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=<your-agent-id>" \
  -H "xi-api-key: <your-api-key>"

Response:

{
  "signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=<your-agent-id>&token=<token>"
}

Never expose your ElevenLabs API key on the client side.
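
The same request can be issued from server-side code. The following is a minimal Python sketch using only the standard library; the function names `build_signed_url_request` and `fetch_signed_url` are illustrative and not part of any ElevenLabs SDK, and the response is assumed to have the shape shown above.

```python
import json
import urllib.parse
import urllib.request

SIGNED_URL_ENDPOINT = (
    "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url"
)


def build_signed_url_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request from the cURL example (hypothetical helper)."""
    query = urllib.parse.urlencode({"agent_id": agent_id})
    return urllib.request.Request(
        f"{SIGNED_URL_ENDPOINT}?{query}",
        headers={"xi-api-key": api_key},
        method="GET",
    )


def fetch_signed_url(agent_id: str, api_key: str) -> str:
    """Send the request and return the "signed_url" field of the JSON response."""
    request = build_signed_url_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["signed_url"]
```

Run this only on your server, reading the API key from an environment variable, and hand just the returned signed URL to the client.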

Communication

Client-to-Server Messages

User Audio Chunk

Send audio data from the user to the server.

Format:

{
  "user_audio_chunk": "<base64-encoded-audio-data>"
}

Notes:

  • Audio Format Requirements:

    • PCM 16-bit mono format
    • Base64 encoded
    • Sample rate of 16,000 Hz
  • Recommended Chunk Duration:

    • Send audio chunks approximately every 250 milliseconds (0.25 seconds)
    • This equates to chunks of about 4,000 samples at a 16,000 Hz sample rate
  • Optimizing Latency and Efficiency:

    • Balance Latency and Efficiency: Sending audio chunks every 250 milliseconds offers a good trade-off between responsiveness and network overhead.
    • Adjust Based on Needs:
      • Lower Latency Requirements: Decrease the chunk duration to send smaller chunks more frequently.
      • Higher Efficiency Requirements: Increase the chunk duration to send larger chunks less frequently.
    • Network Conditions: Adapt the chunk size if you experience network constraints or variability.
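
To make the chunk arithmetic above concrete, here is a small Python sketch that splits raw 16-bit mono PCM into 250 ms pieces and wraps each one in a user_audio_chunk message. The helper name `iter_user_audio_messages` and the `chunk_ms` parameter are illustrative assumptions; how the audio is captured and how the messages are sent over the socket is left out.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # required input sample rate
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM


def iter_user_audio_messages(pcm: bytes, chunk_ms: int = 250) -> Iterator[str]:
    """Yield one JSON user_audio_chunk message per chunk_ms of audio."""
    samples_per_chunk = SAMPLE_RATE_HZ * chunk_ms // 1000  # 4,000 at 250 ms
    bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE  # 8,000 at 250 ms
    for start in range(0, len(pcm), bytes_per_chunk):
        chunk = pcm[start:start + bytes_per_chunk]
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"user_audio_chunk": encoded})
```

Lowering `chunk_ms` produces smaller, more frequent messages; raising it produces larger, less frequent ones. The final chunk may be shorter than the rest if the input length is not a multiple of the chunk size.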

Pong Message

Respond to server ping messages by sending a pong message, ensuring the event_id matches the one received in the ping message.

Format:

{
  "type": "pong",
  "event_id": 12345
}
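
As an illustration, a pong can be derived directly from the incoming ping. This Python sketch assumes the ping message has the shape documented under Message Formats (a ping_event object carrying event_id); `make_pong` is a hypothetical helper name.

```python
import json


def make_pong(ping_message: str) -> str:
    """Return the pong JSON that echoes the event_id of the given ping."""
    ping = json.loads(ping_message)
    if ping.get("type") != "ping":
        raise ValueError("expected a message of type 'ping'")
    event_id = ping["ping_event"]["event_id"]
    return json.dumps({"type": "pong", "event_id": event_id})
```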

Server-to-Client Messages

conversation_initiation_metadata

Provides initial metadata about the conversation.

Format:

{
  "type": "conversation_initiation_metadata",
  "conversation_initiation_metadata_event": {
    "conversation_id": "conv_123456789",
    "agent_output_audio_format": "pcm_16000"
  }
}
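
A client will typically read this message once to learn the conversation ID and configure playback. The sketch below is one way to do that in Python. Note the assumption: it treats a format string such as "pcm_16000" as "pcm" followed by the sample rate in Hz, which matches the example above but is not stated by this page, so any other format yields no sample rate. The names `ConversationInfo` and `parse_initiation_metadata` are illustrative.

```python
import json
from typing import NamedTuple, Optional


class ConversationInfo(NamedTuple):
    conversation_id: str
    output_format: str
    output_sample_rate_hz: Optional[int]  # None if the format is not "pcm_<rate>"


def parse_initiation_metadata(raw: str) -> ConversationInfo:
    """Extract the fields of a conversation_initiation_metadata message."""
    message = json.loads(raw)
    if message.get("type") != "conversation_initiation_metadata":
        raise ValueError("expected conversation_initiation_metadata")
    event = message["conversation_initiation_metadata_event"]
    output_format = event["agent_output_audio_format"]
    # Assumption: "pcm_16000" means PCM at 16,000 Hz.
    codec, _, rate = output_format.partition("_")
    sample_rate = int(rate) if codec == "pcm" and rate.isdigit() else None
    return ConversationInfo(event["conversation_id"], output_format, sample_rate)
```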

Other Server-to-Client Messages

Type                Purpose
user_transcript     Transcriptions of the user’s speech
agent_response      Agent’s textual response
audio               Chunks of the agent’s audio response
interruption        Indicates that the agent’s response was interrupted
ping                Server pings to measure latency
client_tool_call    Initiate client tool call
client_tool_result  Response for the client tool call

Message Formats

user_transcript:

{
  "type": "user_transcript",
  "user_transcription_event": {
    "user_transcript": "Hello, how are you today?"
  }
}

agent_response:

{
  "type": "agent_response",
  "agent_response_event": {
    "agent_response": "Hello! I'm doing well, thank you for asking. How can I assist you today?"
  }
}

audio:

{
  "type": "audio",
  "audio_event": {
    "audio_base_64": "SGVsbG8sIHRoaXMgaXMgYSBzYW1wbGUgYXVkaW8gY2h1bms=",
    "event_id": 67890
  }
}
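
Before playback, the audio_base_64 payload has to be decoded back into raw bytes. A minimal Python sketch follows, assuming the payload is 16-bit mono PCM at the rate announced in conversation_initiation_metadata (16,000 Hz in the example above); `decode_audio_event` and `duration_ms` are hypothetical helper names.

```python
import base64
import json
from typing import Tuple


def decode_audio_event(raw: str) -> Tuple[int, bytes]:
    """Return (event_id, raw audio bytes) from an audio message."""
    message = json.loads(raw)
    if message.get("type") != "audio":
        raise ValueError("expected a message of type 'audio'")
    event = message["audio_event"]
    return event["event_id"], base64.b64decode(event["audio_base_64"])


def duration_ms(pcm: bytes, sample_rate_hz: int = 16_000) -> float:
    """Playback duration of 16-bit mono PCM, in milliseconds."""
    return len(pcm) * 1000 / (2 * sample_rate_hz)
```

Keeping the event_id alongside the decoded bytes makes it possible to relate queued audio to later events that carry an event_id, such as interruption.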

interruption:

{
  "type": "interruption",
  "interruption_event": {
    "event_id": 54321
  }
}

internal_tentative_agent_response:

{
  "type": "internal_tentative_agent_response",
  "tentative_agent_response_internal_event": {
    "tentative_agent_response": "I'm thinking about how to respond..."
  }
}

ping:

{
  "type": "ping",
  "ping_event": {
    "event_id": 13579,
    "ping_ms": 50
  }
}
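
One way to use these pings for latency monitoring is to keep a rolling average of the reported ping_ms values. The Python sketch below assumes ping_ms is a latency figure in milliseconds, which this page does not define precisely; the class name `PingLatencyTracker` and its `window` parameter are illustrative.

```python
import json
from collections import deque
from typing import Deque, Optional


class PingLatencyTracker:
    """Rolling average over the most recent ping_ms values."""

    def __init__(self, window: int = 10) -> None:
        self._samples: Deque[float] = deque(maxlen=window)

    def observe(self, raw: str) -> Optional[float]:
        """Record a ping message and return the current average in ms.

        Non-ping messages are ignored. Returns None until a sample exists.
        """
        message = json.loads(raw)
        if message.get("type") == "ping":
            ping_ms = message["ping_event"].get("ping_ms")
            if ping_ms is not None:
                self._samples.append(ping_ms)
        if not self._samples:
            return None
        return sum(self._samples) / len(self._samples)
```

The resulting average could feed decisions such as how much audio to buffer before playback or how large the outgoing chunks should be.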

client_tool_call:

{
  "type": "client_tool_call",
  "client_tool_call": {
    "tool_name": "<string>",
    "tool_call_id": "<string>",
    "parameters": {}
  }
}

client_tool_result:

{
  "type": "client_tool_result",
  "tool_call_id": "<string>",
  "result": "<string>",
  "is_error": false
}
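
The two messages pair up through tool_call_id. The following Python sketch shows one possible way to run a locally registered tool and build the matching client_tool_result. The registry of callables and the name `handle_client_tool_call` are assumptions made for illustration; the field names come from the formats above.

```python
import json
from typing import Any, Callable, Dict

ToolFn = Callable[[Dict[str, Any]], Any]


def handle_client_tool_call(raw: str, tools: Dict[str, ToolFn]) -> str:
    """Run the requested tool and return a client_tool_result JSON string."""
    call = json.loads(raw)["client_tool_call"]
    try:
        # An unknown tool name raises KeyError and is reported as an error.
        result = tools[call["tool_name"]](call["parameters"])
        is_error = False
    except Exception as exc:
        result = f"{type(exc).__name__}: {exc}"
        is_error = True
    return json.dumps({
        "type": "client_tool_result",
        "tool_call_id": call["tool_call_id"],
        "result": str(result),
        "is_error": is_error,
    })
```

Reporting failures through is_error rather than dropping the call means every tool_call_id receives a result.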

Latency Management

To ensure smooth conversations, implement these strategies:

  • Adaptive Buffering: Adjust audio buffering based on network conditions.
  • Jitter Buffer: Implement a jitter buffer to smooth out variations in packet arrival times.
  • Ping-Pong Monitoring: Use ping and pong events to measure round-trip time and adjust buffering accordingly.
  • Optimized Chunking: Tweak the audio chunk duration to balance latency and efficiency.

Security Best Practices

  • Rotate API keys regularly and use environment variables to store them.
  • Implement rate limiting to prevent abuse.
  • Clearly explain why microphone access is needed when prompting users for it.
