1. Use the Turbo v2 model
Our cutting-edge Eleven Turbo v2 is ideally suited for tasks demanding extremely low latency. The new turbo model_id is
2. Use the streaming API
ElevenLabs provides three text-to-speech endpoints:
- A regular text-to-speech endpoint
- A streaming text-to-speech endpoint
- A websockets text-to-speech endpoint
The regular endpoint renders the audio file before returning it in the response. The streaming endpoint streams back the audio as it is being generated, resulting in much lower response time from request to first byte of audio received. For applications that require low latency, the streaming endpoint is therefore recommended.
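The difference between the two endpoints shows up as time to first byte. The sketch below is a minimal way to measure it on any iterator of audio chunks. The names time_to_first_chunk and fake_stream are hypothetical helpers, not part of any ElevenLabs SDK, and fake_stream only simulates a streaming response with sleeps.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Consume an audio chunk iterator.

    Returns (seconds until first chunk, seconds until last chunk, total bytes).
    """
    start = clock()
    first = None
    total = 0
    for chunk in chunks:
        if first is None:
            first = clock() - start
        total += len(chunk)
    end = clock() - start
    # If nothing arrived, report the full duration as the first-chunk time.
    return (end if first is None else first, end, total)


def fake_stream(n_chunks: int = 5, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming response: one chunk per `delay` seconds."""
    for _ in range(n_chunks):
        time.sleep(delay)
        yield b"\x00" * 1024


if __name__ == "__main__":
    first, last, size = time_to_first_chunk(fake_stream())
    print(f"first chunk after {first:.3f}s, all {size} bytes after {last:.3f}s")
```

With a regular (non-streaming) response the whole body arrives as one chunk, so the first and last timings coincide. With a streamed response, playback can start at the first timing.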
3. Use the input streaming Websocket
For applications where the text prompts can be streamed to the text-to-speech endpoint (such as LLM output), the input streaming websocket allows prompts to be fed to the endpoint while the speech is being generated. You can also configure the streaming chunk size when using the websocket, with smaller chunks generally rendering faster. We therefore recommend sending content word by word: our model and tooling leverage context to ensure that sentence structure and more are preserved in the generated audio, even if we receive only one word at a time.
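LLM output usually arrives as fragments that do not line up with word boundaries. The sketch below shows one way to re-chunk such fragments into whole words before sending them over the websocket. The helper names are hypothetical. The JSON message shape ({"text": ...}, with an empty string marking the end of input) is an assumption to check against the websocket API reference.

```python
import json
from typing import Iterable, Iterator


def words_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Re-chunk arbitrary text fragments into whole words, each ending in a space."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Everything before the last space is made of complete words;
        # the remainder may still be a partial word, so keep it buffered.
        *complete, buffer = buffer.split(" ")
        for word in complete:
            if word:
                yield word + " "
    if buffer:
        yield buffer + " "


def websocket_messages(tokens: Iterable[str]) -> Iterator[str]:
    """Serialize each word as one JSON message (assumed message format)."""
    for word in words_from_tokens(tokens):
        yield json.dumps({"text": word})
    # Assumed end-of-input marker: an empty text field.
    yield json.dumps({"text": ""})


if __name__ == "__main__":
    llm_fragments = ["Hel", "lo wor", "ld, how", " are you?"]
    for message in websocket_messages(llm_fragments):
        print(message)
```

Each message would then be passed to the send method of whichever websocket client the application uses.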
4. Increase the optimize_streaming_latency query parameter
This query parameter for the streaming and websockets endpoints configures the rendering process to trade some audio quality for reduced latency.
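As a rough illustration, the parameter can be appended to the request URL with the standard library. The base URL and the path layout below are assumptions for this sketch, and streaming_url is a hypothetical helper. Check the API reference for the exact endpoint and the accepted values.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout; verify against the API reference.
BASE_URL = "https://api.elevenlabs.io"


def streaming_url(voice_id: str, optimize_streaming_latency: int) -> str:
    """Build a streaming text-to-speech URL with the latency query parameter set."""
    path = f"/v1/text-to-speech/{quote(voice_id, safe='')}/stream"
    query = urlencode({"optimize_streaming_latency": optimize_streaming_latency})
    return f"{BASE_URL}{path}?{query}"


if __name__ == "__main__":
    # "my-voice-id" is a placeholder, not a real voice.
    print(streaming_url("my-voice-id", 3))
```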
5. Upgrade to the Enterprise plan
Enterprise customers receive top priority in the rendering queue, which ensures that they always experience the lowest possible latency, regardless of model usage load.
6. Use Premade and Synthetic Voices rather than Voice Clones
Premade and Synthetic voices generate speech faster than Instant Voice Clones. Professional Voice Clones have the highest latency of all voice types and are not recommended for low-latency applications.
7. Reuse HTTPS Sessions When Streaming
When streaming over HTTPS, reusing an established SSL/TLS session helps reduce latency by skipping the handshake process. This improves latency for all requests after the session’s first.
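One way to get this reuse with only the Python standard library is to keep a single http.client.HTTPSConnection open across requests. The sketch below is a minimal illustration. ReusableHTTPSClient is a hypothetical wrapper, and it reads each response fully for brevity instead of streaming it chunk by chunk.

```python
import http.client
from typing import Dict, Optional


class ReusableHTTPSClient:
    """Keeps one HTTPS connection open so later requests skip the TLS handshake."""

    def __init__(self, host: str, timeout: float = 30.0) -> None:
        self.host = host
        self.timeout = timeout
        self._conn: Optional[http.client.HTTPSConnection] = None

    def connection(self) -> http.client.HTTPSConnection:
        # Creating the object opens no socket; the TCP connect and TLS
        # handshake happen on the first request and are then reused.
        if self._conn is None:
            self._conn = http.client.HTTPSConnection(self.host, timeout=self.timeout)
        return self._conn

    def post(self, path: str, body: bytes, headers: Dict[str, str]) -> bytes:
        conn = self.connection()
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()
        # The body must be read completely before the connection can
        # carry the next request.
        return response.read()

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Creating a new client (or a new connection) per request would repeat the handshake every time, which is the overhead this tip avoids.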
7a. Limit the Number of Websocket Connection Closures
Similarly, websockets use the WSS protocol, so an SSL/TLS handshake takes place at the beginning of each connection, which adds overhead. We therefore recommend limiting, as far as possible, the number of times a connection is closed and reopened.
8. Leverage Servers Closer to the US
Today, our APIs are served from the US, so users communicating with these APIs from outside the United States may experience added latency from the longer network route.