Client-to-server events

Send contextual information from the client to enhance conversational applications in real-time.

Client-to-server events are messages that your application proactively sends to the server to provide additional context during conversations. These events enable you to enhance the conversation with relevant information without interrupting the conversational flow.

For information on events the server sends to the client, see the Client events documentation.

Overview

Your application can send contextual information to the server to improve conversation quality and relevance at any point during the conversation. This does not have to be in response to a client event received from the server. This is particularly useful for sharing UI state, user actions, or other environmental data that may not be directly communicated through voice.

While our SDKs provide helper methods for sending these events, understanding the underlying protocol is valuable for custom implementations and advanced use cases.

Event types

Contextual updates

Contextual updates allow your application to send non-interrupting background information to the conversation.

Key characteristics:

  • Updates are incorporated as background information in the conversation.
  • They do not interrupt the current conversation flow.
  • Useful for sending UI state, user actions, or environmental data.

// Contextual update event structure
{
  "type": "contextual_update",
  "text": "User appears to be looking at pricing page"
}

// Example sending contextual updates
function sendContextUpdate(information) {
  websocket.send(
    JSON.stringify({
      type: 'contextual_update',
      text: information,
    })
  );
}

// Usage examples
sendContextUpdate('Customer status: Premium tier');
sendContextUpdate('User navigated to Help section');
sendContextUpdate('Shopping cart contains 3 items');

User messages

User messages allow you to send text directly to the conversation as if the user had spoken it. This is useful for text-based interactions or when you want to inject specific text into the conversation flow.

Key characteristics:

  • Text is processed as user input to the conversation.
  • Triggers the same response flow as spoken user input.
  • Useful for text-based interfaces or programmatic user input.

// User message event structure
{
  "type": "user_message",
  "text": "I would like to upgrade my account"
}

// Example sending user messages
function sendUserMessage(text) {
  websocket.send(
    JSON.stringify({
      type: 'user_message',
      text: text,
    })
  );
}

// Usage examples
sendUserMessage('I need help with billing');
sendUserMessage('What are your pricing options?');
sendUserMessage('Cancel my subscription');
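
In a text-based interface, typed input often carries stray whitespace or is empty. The following is a rough sketch of cleaning it up before building the `user_message` payload. The helper name `buildUserMessage` and the whitespace rules are assumptions made for illustration; the protocol itself only requires the `type` and `text` fields shown above.

```javascript
// Hypothetical helper: turns raw typed input into a user_message payload,
// or returns null when there is nothing to send.
function buildUserMessage(rawInput) {
  // Collapse runs of whitespace (including newlines) and trim the ends.
  const text = String(rawInput).replace(/\s+/g, ' ').trim();
  if (text === '') {
    return null;
  }
  return JSON.stringify({ type: 'user_message', text });
}

// Usage (assumes an open `websocket`, as in the examples above):
// const payload = buildUserMessage(inputElement.value);
// if (payload !== null) websocket.send(payload);
```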

User activity

User activity events signal that the user is still engaged, which prevents the agent from interrupting during periods of silence.

Key characteristics:

  • Resets the turn timeout timer.
  • Does not affect conversation content or flow.
  • Useful for maintaining long-running conversations during periods of silence.

// User activity event structure
{
  "type": "user_activity"
}

// Example sending user activity
function sendUserActivity() {
  websocket.send(
    JSON.stringify({
      type: 'user_activity',
    })
  );
}

// Usage example - send activity ping every 30 seconds
setInterval(sendUserActivity, 30000);
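
A fixed interval sends pings whether or not the user is actually present. As an alternative sketch, the ping can be driven by real input events and throttled so that at most one event goes out per interval. The helper name `createActivityNotifier`, the injectable `now` clock, and the 30-second default are assumptions for illustration; choose an interval that suits your configured turn timeout.

```javascript
// Hypothetical helper: returns a function that forwards at most one
// user_activity event per `minIntervalMs`, however often it is called.
function createActivityNotifier(send, minIntervalMs = 30000, now = Date.now) {
  let lastSentAt = -Infinity; // guarantees the first call always sends

  return function notifyActivity() {
    const currentTime = now();
    if (currentTime - lastSentAt < minIntervalMs) {
      return false; // an event was sent recently; skip this one
    }
    lastSentAt = currentTime;
    send(JSON.stringify({ type: 'user_activity' }));
    return true;
  };
}

// Browser usage (assumes an open `websocket`):
// const notify = createActivityNotifier((msg) => websocket.send(msg));
// ['mousemove', 'keydown'].forEach((name) => window.addEventListener(name, notify));
```

Passing the clock in as a parameter keeps the throttle easy to test without waiting for real time to pass.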

Best practices

  1. Contextual updates

    • Send relevant but concise contextual information.
    • Avoid overwhelming the LLM with too many updates.
    • Focus on information that affects the conversation flow, or on important context from UI activity that the voice agent cannot otherwise access.
  2. User messages

    • Use for text-based user input when audio is not available or appropriate.
    • Ensure text content is clear and well-formatted.
    • Consider the conversation context when injecting programmatic messages.
  3. User activity

    • Send activity pings during periods of user interaction to maintain the session.
    • Use reasonable intervals (e.g., 30-60 seconds) to avoid unnecessary network traffic.
    • Implement activity detection based on actual user engagement (mouse movement, typing, etc.).
  4. Timing considerations

    • Send updates when they become relevant to the conversation, not on every minor state change.
    • Consider grouping multiple contextual updates into a single update (instead of sending every small change separately).
    • Balance between keeping the session alive and avoiding excessive messaging.
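
The grouping advice above can be sketched as a small batcher that collects updates for a short window and sends them as one `contextual_update`. The helper name `createContextBatcher`, the 2-second default delay, and the `'; '` separator are all assumptions for illustration; the protocol does not prescribe a format for combined updates.

```javascript
// Hypothetical helper: buffers contextual updates and sends them together.
function createContextBatcher(send, delayMs = 2000) {
  let pending = [];
  let timer = null;

  // Sends everything collected so far as a single contextual_update.
  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending.length === 0) {
      return;
    }
    const text = pending.join('; ');
    pending = [];
    send(JSON.stringify({ type: 'contextual_update', text }));
  }

  // Queues an update; the first queued update starts the flush timer.
  function add(information) {
    pending.push(information);
    if (timer === null) {
      timer = setTimeout(flush, delayMs);
    }
  }

  return { add, flush };
}

// Usage (assumes an open `websocket`):
// const batcher = createContextBatcher((msg) => websocket.send(msg));
// batcher.add('User navigated to Help section');
// batcher.add('Shopping cart contains 3 items');
```

Calling `flush()` directly is useful before closing the connection so that queued updates are not lost.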

For detailed implementation examples, check our SDK documentation.