Client-to-server events
Send contextual information from the client to enhance conversational applications in real time.
Client-to-server events are messages that your application proactively sends to the server to provide additional context during conversations. These events enable you to enhance the conversation with relevant information without interrupting the conversational flow.
For information on events the server sends to the client, see the Client events documentation.
Overview
Your application can send contextual information to the server at any point during the conversation to improve its quality and relevance. It does not need to be responding to an event received from the server. This is particularly useful for sharing UI state, user actions, or other environmental data that may not be directly communicated through voice.
While our SDKs provide helper methods for sending these events, understanding the underlying protocol is valuable for custom implementations and advanced use cases.
Event types
Contextual updates
Contextual updates allow your application to send non-interrupting background information to the conversation.
Key characteristics:
- Updates are incorporated as background information in the conversation.
- Updates do not interrupt the current conversation flow.
- Useful for sending UI state, user actions, or environmental data.
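As a rough illustration, a contextual update could be serialized as a small JSON message before being written to the conversation connection. This is a minimal sketch, not the documented wire format: the `type` value `contextual_update`, the `text` field, and the helper name `build_contextual_update` are assumptions made for the example, so check the protocol reference for the exact payload.

```python
import json


def build_contextual_update(text: str) -> str:
    """Serialize a contextual update as JSON.

    The payload shape ("type" and "text" fields) is an assumption for
    illustration, not a confirmed part of the protocol.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("contextual update text must not be empty")
    return json.dumps({"type": "contextual_update", "text": cleaned})


# Example: describe a UI action the agent cannot observe through voice.
message = build_contextual_update("User opened the billing settings page")
```

The resulting string would then be sent over whatever connection your application already holds to the server.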
User messages
User messages allow you to send text directly to the conversation as if the user had spoken it. This is useful for text-based interactions or when you want to inject specific text into the conversation flow.
Key characteristics:
- Text is processed as user input to the conversation.
- Triggers the same response flow as spoken user input.
- Useful for text-based interfaces or programmatic user input.
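A user message could be sent in much the same way. The sketch below assumes a `user_message` event with a `text` field and takes the transport as a plain callable (hypothetical name `send`), so it can be wired to any connection object. Both the payload shape and the function name are illustrative assumptions.

```python
import json
from typing import Callable


def send_user_message(send: Callable[[str], None], text: str) -> None:
    """Send text to the conversation as if the user had spoken it.

    `send` is any callable that writes one string to the connection.
    The "user_message" payload shape is an assumption for illustration.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("user message text must not be empty")
    send(json.dumps({"type": "user_message", "text": cleaned}))


# Example: collect outgoing messages in a list instead of a real socket.
outbox: list = []
send_user_message(outbox.append, "I'd like to change my delivery address.")
```

Passing the transport in as a callable keeps the helper independent of any particular WebSocket library and makes it easy to test.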
User activity
User activity events signal that the user is still active, which prevents the agent from interrupting.
Key characteristics:
- Resets the turn timeout timer.
- Does not affect conversation content or flow.
- Useful for maintaining long-running conversations during periods of silence.
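Because a user activity event does not affect conversation content, its payload can plausibly be just a type marker. The following sketch assumes a `user_activity` event with no other fields. That shape and the helper name are assumptions, not a confirmed part of the protocol.

```python
import json


def build_user_activity() -> str:
    """Serialize a user activity event.

    Assumed shape: a bare type marker with no text, since the event
    carries no conversation content.
    """
    return json.dumps({"type": "user_activity"})


ping = build_user_activity()
```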
Best practices
Contextual updates
- Send relevant but concise contextual information.
- Avoid overwhelming the LLM with too many updates.
- Focus on information that affects the conversation flow, or on UI activity that the voice agent cannot otherwise observe.
User messages
- Use for text-based user input when audio is not available or appropriate.
- Ensure text content is clear and well-formatted.
- Consider the conversation context when injecting programmatic messages.
User activity
- Send activity pings during periods of user interaction to keep the session active.
- Use reasonable intervals (e.g., 30-60 seconds) to avoid unnecessary network traffic.
- Implement activity detection based on actual user engagement (mouse movement, typing, etc.).
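One way to combine these points is to call a throttled pinger from your UI event handlers, so that frequent mouse or keyboard events result in at most one activity event per interval. The class below is a sketch under stated assumptions: `ActivityPinger`, its parameters, and the `user_activity` payload shape are hypothetical names chosen for illustration.

```python
import json
import time
from typing import Callable, Optional


class ActivityPinger:
    """Send at most one user activity event per `min_interval` seconds."""

    def __init__(
        self,
        send: Callable[[str], None],
        min_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Optional[float] = None

    def notify(self) -> bool:
        """Call on each detected interaction; return True if a ping was sent."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # Too soon since the last ping; skip this one.
        # Assumed payload shape, for illustration only.
        self._send(json.dumps({"type": "user_activity"}))
        self._last_sent = now
        return True
```

Injecting the clock makes the interval logic testable without waiting in real time. In an application, `notify()` would be called from handlers for events such as typing or pointer movement.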
Timing considerations
- Send updates when they are relevant to the conversation, such as shortly after the user action they describe.
- Consider grouping multiple contextual updates into a single update (instead of sending every small change separately).
- Balance keeping the session alive against avoiding excessive messaging.
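Grouping several small changes into one contextual update can be sketched with a simple buffer that is flushed at a moment of your choosing, for example after a short pause in UI activity. `ContextBatcher`, the `"; "` separator, and the `contextual_update` payload shape are assumptions made for this example.

```python
import json
from typing import Callable, List


class ContextBatcher:
    """Collect small context notes and send them as one contextual update."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._pending: List[str] = []

    def add(self, note: str) -> None:
        """Queue a note, skipping blanks and notes that are already queued."""
        cleaned = note.strip()
        if cleaned and cleaned not in self._pending:
            self._pending.append(cleaned)

    def flush(self) -> bool:
        """Send all queued notes as a single update; return True if sent."""
        if not self._pending:
            return False
        text = "; ".join(self._pending)
        # Assumed payload shape, for illustration only.
        self._send(json.dumps({"type": "contextual_update", "text": text}))
        self._pending.clear()
        return True


# Example: three UI changes (one of them repeated) become one message.
outgoing: list = []
batcher = ContextBatcher(outgoing.append)
batcher.add("User selected the Pro plan")
batcher.add("User switched billing to yearly")
batcher.add("User selected the Pro plan")
batcher.flush()
```

When to call `flush()` is left to the application. A short debounce timer or a natural pause in the conversation are both reasonable triggers.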
For detailed implementation examples, check our SDK documentation.