Understand and handle real-time communication events in your conversational applications.

Client events are system-level events that the server sends to the client to keep a conversation running in real time. They cover audio playback, transcription, interruptions, and more.

Overview

Client events are essential for maintaining the real-time nature of conversations. They handle everything from initialization to audio playback and user interactions.

These events are part of the WebSocket communication protocol and are automatically handled by our SDKs. Understanding them is crucial for advanced implementations and debugging.
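
The handler examples in this document are written as `websocket.on(type, handler)`. A raw WebSocket only delivers text frames, so without the SDK you need something that parses each frame and routes it by its `type` field. The following is a minimal sketch of such a router; `createEventRouter` and `handleMessage` are hypothetical names chosen for illustration and are not part of any SDK.

```javascript
// Hypothetical helper (not an SDK API): routes raw WebSocket text frames
// to handlers keyed by the event's "type" field.
function createEventRouter() {
  const handlers = new Map();

  return {
    // Register a handler for one event type, e.g. "audio" or "ping".
    on(type, handler) {
      handlers.set(type, handler);
    },

    // Call this with the raw text of every incoming WebSocket message.
    // Returns true if a handler ran, false otherwise.
    handleMessage(rawText) {
      let event;
      try {
        event = JSON.parse(rawText);
      } catch (err) {
        return false; // Ignore frames that are not valid JSON
      }
      const isObject = event !== null && typeof event === 'object';
      const handler = isObject ? handlers.get(event.type) : undefined;
      if (!handler) return false;
      handler(event);
      return true;
    },
  };
}

// Usage with made-up data:
const router = createEventRouter();
const seen = [];
router.on('user_transcript', (event) => {
  seen.push(event.user_transcription_event.user_transcript);
});
router.handleMessage(
  '{"type":"user_transcript","user_transcription_event":{"user_transcript":"Hello"}}'
);
console.log(seen); // [ 'Hello' ]
```

In a real client you would call `handleMessage` from the socket's message callback and register one handler per event type you care about.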

Client event types

conversation_initiation_metadata

  • Automatically sent when starting a conversation
  • Initializes conversation settings and parameters

    // Example initialization metadata
    {
      "type": "conversation_initiation_metadata",
      "conversation_initiation_metadata_event": {
        "conversation_id": "conv_123",
        "agent_output_audio_format": "pcm_44100", // TTS output format
        "user_input_audio_format": "pcm_16000" // ASR input format
      }
    }
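
The format strings in the example (`pcm_44100`, `pcm_16000`) look like a codec name followed by a sample rate in Hz. Assuming that `<codec>_<sample rate>` convention holds, a client could derive its player and recorder settings from the metadata as sketched below. `parseAudioFormat` is a hypothetical helper, and the parsing rule is an assumption rather than a documented guarantee.

```javascript
// Hypothetical helper: splits a format string such as "pcm_44100" into
// its parts. ASSUMPTION: formats follow "<codec>_<sample rate in Hz>".
function parseAudioFormat(format) {
  const match = /^([a-z0-9]+)_(\d+)$/i.exec(format);
  if (!match) {
    throw new Error(`Unrecognized audio format: ${format}`);
  }
  return { codec: match[1].toLowerCase(), sampleRate: Number(match[2]) };
}

// Usage with the example metadata event:
const metadata = {
  type: 'conversation_initiation_metadata',
  conversation_initiation_metadata_event: {
    conversation_id: 'conv_123',
    agent_output_audio_format: 'pcm_44100',
    user_input_audio_format: 'pcm_16000',
  },
};
const { agent_output_audio_format } = metadata.conversation_initiation_metadata_event;
console.log(parseAudioFormat(agent_output_audio_format));
// { codec: 'pcm', sampleRate: 44100 }
```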

ping

  • Health check event requiring immediate response
  • Automatically handled by SDK
  • Used to maintain WebSocket connection

    // Example ping event structure
    {
      "ping_event": {
        "event_id": 123456,
        "ping_ms": 50 // Optional, estimated latency in milliseconds
      },
      "type": "ping"
    }

    // Example ping handler: reply with a pong that echoes the ping's event_id
    websocket.on('ping', (event) => {
      const { event_id } = event.ping_event;
      websocket.send(JSON.stringify({ type: 'pong', event_id }));
    });

audio

  • Contains base64 encoded audio for playback
  • Includes numeric event ID for tracking and sequencing
  • Handles voice output streaming

    // Example audio event structure
    {
      "audio_event": {
        "audio_base_64": "base64_encoded_audio_string",
        "event_id": 12345
      },
      "type": "audio"
    }

    // Example audio event handler
    websocket.on('audio', (event) => {
      const { audio_event } = event;
      const { audio_base_64, event_id } = audio_event;
      audioPlayer.play(audio_base_64);
    });
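
Before playback, the base64 payload has to be decoded into samples, and chunks need to be kept in order. The sketch below shows one way to do both in Node.js. It assumes the audio is 16-bit little-endian PCM and that a lower `event_id` means earlier audio; neither is stated in this document. `decodePcm16` and `AudioQueue` are hypothetical names.

```javascript
// Decode a base64 string into signed 16-bit samples.
// ASSUMPTION: the payload is 16-bit little-endian PCM.
function decodePcm16(audioBase64) {
  const bytes = Buffer.from(audioBase64, 'base64');
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}

// Hypothetical playback queue that keeps chunks sorted by event_id.
// ASSUMPTION: a lower event_id means the chunk should play earlier.
class AudioQueue {
  constructor() {
    this.chunks = [];
  }

  // Accepts the "audio_event" object from an audio message.
  enqueue(audioEvent) {
    this.chunks.push({
      eventId: audioEvent.event_id,
      samples: decodePcm16(audioEvent.audio_base_64),
    });
    this.chunks.sort((a, b) => a.eventId - b.eventId);
  }

  // Returns the next chunk to play, or undefined when the queue is empty.
  next() {
    return this.chunks.shift();
  }

  // Drop everything that has not been played yet, e.g. on interruption.
  clear() {
    this.chunks.length = 0;
  }
}

// Usage with two made-up samples:
const raw = Buffer.alloc(4);
raw.writeInt16LE(1000, 0);
raw.writeInt16LE(-1000, 2);

const queue = new AudioQueue();
queue.enqueue({ audio_base_64: raw.toString('base64'), event_id: 2 });
queue.enqueue({ audio_base_64: raw.toString('base64'), event_id: 1 });
console.log(queue.next().eventId); // 1
queue.clear();
console.log(queue.next()); // undefined
```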

user_transcript

  • Contains finalized speech-to-text results
  • Represents complete user utterances
  • Used for conversation history

    // Example transcript event structure
    {
      "type": "user_transcript",
      "user_transcription_event": {
        "user_transcript": "Hello, how can you help me today?"
      }
    }

    // Example transcript handler
    websocket.on('user_transcript', (event) => {
      const { user_transcription_event } = event;
      const { user_transcript } = user_transcription_event;
      updateConversationHistory(user_transcript);
    });

agent_response

  • Contains the complete agent message
  • Sent with the first audio chunk
  • Used for display and history

    // Example response event structure
    {
      "type": "agent_response",
      "agent_response_event": {
        "agent_response": "Hello, how can I assist you today?"
      }
    }

    // Example response handler
    websocket.on('agent_response', (event) => {
      const { agent_response_event } = event;
      const { agent_response } = agent_response_event;
      displayAgentMessage(agent_response);
    });

agent_response_correction

  • Contains the truncated response after an interruption
  • Updates the displayed message
  • Maintains conversation accuracy

    // Example response correction event structure
    {
      "type": "agent_response_correction",
      "agent_response_correction_event": {
        "original_agent_response": "Let me tell you about the complete history...",
        "corrected_agent_response": "Let me tell you about..." // Truncated after interruption
      }
    }

    // Example response correction handler
    websocket.on('agent_response_correction', (event) => {
      const { agent_response_correction_event } = event;
      const { corrected_agent_response } = agent_response_correction_event;
      displayAgentMessage(corrected_agent_response);
    });
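
The `user_transcript`, `agent_response`, and `agent_response_correction` events together describe what was actually said. One way to keep a history accurate is to append the first two and let a correction overwrite the agent message it refers to. The sketch below assumes a correction can be matched to its message by exact text; `ConversationHistory` is a hypothetical class, not an SDK type.

```javascript
// Hypothetical in-memory history built from the three text events.
class ConversationHistory {
  constructor() {
    this.messages = [];
  }

  addUserTranscript(event) {
    const { user_transcript } = event.user_transcription_event;
    this.messages.push({ role: 'user', text: user_transcript });
  }

  addAgentResponse(event) {
    const { agent_response } = event.agent_response_event;
    this.messages.push({ role: 'agent', text: agent_response });
  }

  // Replace the most recent agent message whose text equals the original
  // response. ASSUMPTION: exact text match is enough to find the message.
  // Returns true if a message was replaced.
  applyCorrection(event) {
    const { original_agent_response, corrected_agent_response } =
      event.agent_response_correction_event;
    for (let i = this.messages.length - 1; i >= 0; i--) {
      const message = this.messages[i];
      if (message.role === 'agent' && message.text === original_agent_response) {
        message.text = corrected_agent_response;
        return true;
      }
    }
    return false;
  }
}

// Usage with the example payloads from this section:
const history = new ConversationHistory();
history.addUserTranscript({
  type: 'user_transcript',
  user_transcription_event: { user_transcript: 'Tell me the history.' },
});
history.addAgentResponse({
  type: 'agent_response',
  agent_response_event: { agent_response: 'Let me tell you about the complete history...' },
});
history.applyCorrection({
  type: 'agent_response_correction',
  agent_response_correction_event: {
    original_agent_response: 'Let me tell you about the complete history...',
    corrected_agent_response: 'Let me tell you about...',
  },
});
console.log(history.messages[1].text); // Let me tell you about...
```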

client_tool_call

  • Represents a function call the agent wants the client to execute
  • Contains the tool name, tool call ID, and parameters
  • Requires the client to execute the function and send the result back to the server

If you use the SDK, it provides callbacks that send the result back to the server for you.

    // Example tool call event structure
    {
      "type": "client_tool_call",
      "client_tool_call": {
        "tool_name": "search_database",
        "tool_call_id": "call_123456",
        "parameters": {
          "query": "user information",
          "filters": {
            "date": "2024-01-01"
          }
        }
      }
    }

    // Example tool call handler
    websocket.on('client_tool_call', async (event) => {
      const { client_tool_call } = event;
      const { tool_name, tool_call_id, parameters } = client_tool_call;

      try {
        const result = await executeClientTool(tool_name, parameters);
        // Send success response back to continue conversation.
        // WebSocket frames carry text, so the payload is serialized first.
        websocket.send(JSON.stringify({
          type: "client_tool_result",
          tool_call_id: tool_call_id,
          result: result,
          is_error: false
        }));
      } catch (error) {
        // Send error response if tool execution fails
        websocket.send(JSON.stringify({
          type: "client_tool_result",
          tool_call_id: tool_call_id,
          result: error.message,
          is_error: true
        }));
      }
    });
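
The handler above calls an `executeClientTool` function without showing it. One possible shape is a registry that maps tool names to async functions, plus a helper that always returns a well-formed `client_tool_result` payload, even for unknown tools. This is a sketch under those assumptions; `buildToolResult` and the `get_local_time` tool are made up for illustration.

```javascript
// Hypothetical registry of client tools, keyed by tool name.
const clientTools = new Map([
  // Made-up tool used only for this example.
  ['get_local_time', async ({ timezone }) => `12:00 in ${timezone}`],
]);

// Runs the requested tool and builds the client_tool_result payload.
// Never throws: failures are reported with is_error set to true.
async function buildToolResult(toolCallEvent, tools) {
  const { tool_name, tool_call_id, parameters } = toolCallEvent.client_tool_call;
  try {
    const tool = tools.get(tool_name);
    if (typeof tool !== 'function') {
      throw new Error(`Unknown client tool: ${tool_name}`);
    }
    const result = await tool(parameters);
    return { type: 'client_tool_result', tool_call_id, result, is_error: false };
  } catch (error) {
    return {
      type: 'client_tool_result',
      tool_call_id,
      result: error.message,
      is_error: true,
    };
  }
}

// Usage with a made-up tool call event:
buildToolResult(
  {
    type: 'client_tool_call',
    client_tool_call: {
      tool_name: 'get_local_time',
      tool_call_id: 'call_1',
      parameters: { timezone: 'UTC' },
    },
  },
  clientTools
).then((payload) => {
  // Serialize before sending, e.g. websocket.send(JSON.stringify(payload))
  console.log(JSON.stringify(payload));
});
```

Keeping the payload construction separate from the socket makes the success and error paths easy to test without a live connection.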

Event flow

Here’s a typical sequence of events during a conversation:

Best practices

  1. Error handling

    • Implement proper error handling for each event type
    • Log important events for debugging
    • Handle connection interruptions gracefully
  2. Audio management

    • Buffer audio chunks appropriately
    • Implement proper cleanup on interruption
    • Handle audio resource management
  3. Connection management

    • Respond to PING events promptly
    • Implement reconnection logic
    • Monitor connection health
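
For the reconnection logic mentioned above, a common general approach is exponential backoff with jitter, so that many clients do not reconnect at the same moment. The sketch below only computes the wait time. The function name, the default values, and the choice of backoff strategy are all assumptions for illustration, not recommendations from this document.

```javascript
// Hypothetical helper: how long to wait before reconnect attempt N.
// Uses exponential backoff capped at maxMs, with "equal jitter": half of
// the delay is fixed and half is random. `random` is injectable for tests.
function reconnectDelayMs(attempt, options = {}) {
  const { baseMs = 500, maxMs = 30000, random = Math.random } = options;
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.round(exponential / 2 + random() * (exponential / 2));
}

// Usage: upper bound of the delay for the first five attempts.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(reconnectDelayMs(attempt, { random: () => 1 }));
}
// Prints 500, 1000, 2000, 4000, 8000 (one per line)
```

A client would typically reset `attempt` to 0 once a connection has been re-established.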

Troubleshooting

Connection issues

  • Ensure proper WebSocket connection
  • Check PING/PONG responses
  • Verify API credentials

Audio problems

  • Check audio chunk handling
  • Verify audio format compatibility
  • Monitor memory usage

Event handling

  • Log all events for debugging
  • Implement error boundaries
  • Check event handler registration
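
Logging every event is easier to read if audio payloads are summarized instead of printed in full. A minimal sketch of such a formatter follows; `summarizeEvent` is a hypothetical helper, and the 200-character cutoff is an arbitrary choice.

```javascript
// Hypothetical helper: builds a short, log-friendly line for any event.
// Audio events are summarized so base64 payloads do not flood the log.
function summarizeEvent(event) {
  if (event.type === 'audio' && event.audio_event) {
    const { event_id, audio_base_64 } = event.audio_event;
    return `audio event_id=${event_id} base64_chars=${audio_base_64.length}`;
  }
  const text = JSON.stringify(event);
  return text.length > 200 ? `${text.slice(0, 200)}...` : text;
}

// Usage with made-up events:
console.log(summarizeEvent({ type: 'audio', audio_event: { event_id: 7, audio_base_64: 'AAAA' } }));
// audio event_id=7 base64_chars=4
console.log(summarizeEvent({ type: 'ping', ping_event: { event_id: 1 } }));
// {"type":"ping","ping_event":{"event_id":1}}
```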

For detailed implementation examples, check our SDK documentation.
