Reducing Latency
Seven methods for reducing streaming latency, in order of highest to lowest effectiveness:
1. Use the Turbo v2.5 model
Our cutting-edge Eleven Turbo v2.5 is ideally suited for tasks demanding extremely low latency. The new turbo model_id is eleven_turbo_v2_5
.
2. Use the streaming API
ElevenLabs provides three text-to-speech endpoints:
- A regular text-to-speech endpoint
- A streaming text-to-speech endpoint
- A websockets text-to-speech endpoint
The regular endpoint renders the audio file before returning it in the response. The streaming endpoint streams back the audio as it is being generated, resulting in much lower response time from request to first byte of audio received. For applications that require low latency, the streaming endpoint is therefore recommended.
3. Use the input streaming Websocket
For applications where the text prompts can be streamed to the text-to-speech endpoints (such as LLM output), this allows for prompts to be fed to the endpoint while the speech is being generated. You can also configure the streaming chunk size when using the websocket, with smaller chunks generally rendering faster. As such, we recommend sending content word by word, our model and tooling leverages context to ensure that sentence structure and more are persisted to the generated audio even if we only receive a word at a time.
4. Update to the Enterprise plan
Enterprise customers receive top priority in the rendering queue, which ensures that they always experience the lowest possible latency, regardless of model usage load.
5. Use Default Voices, Synthetic Voices, & IVCs rather than PVCs
Based on our testing, we’ve observed that default voices (formerly called ‘premade’ voices), synthetic voices, and Instant Voice Clones tend to have lower latency compared to Professional Voice Clones. A Professional Voice Clone allows for a much more accurate clone of the original samples, albeit with a slight increase in latency at present. Unfortunately, it is a slight tradeoff and we’re currently working on optimizing the latency of the Professional Voice Clones for Turbo v2.5.
6. Reuse HTTPS Sessions When Streaming
When streaming through the websocket, reusing an established SSL/TLS session helps reduce latency by skipping the handshake process. This improves latency for all requests after the session’s first.
6a. Limit the Number of Websocket Connection Closures
Similarly, for websockets we leverage the WSS protocol and so an SSL/TLS handshake takes place at the beginning of a connection, which adds overhead. As such, we recommend to limit the number of times a connection is closed and reopened to the extent possible.
7. Leverage Servers Closer to the US
Today, our APIs are served from the US, and as such users may experience latency from increased network routing when communicating with these APIs outside of the United States.
Was this page helpful?