WebSocket

The Text-to-Speech WebSockets API is designed to generate audio from partial text input while ensuring consistency throughout the generated audio. Although highly flexible, the WebSockets API isn't a one-size-fits-all solution. It's well-suited for scenarios where:

* The input text is being streamed or generated in chunks.
* Word-to-audio alignment information is required.

However, it may not be the best choice when:

* The entire input text is available upfront. Because generations are partial, some buffering is involved, which can result in slightly higher latency than a standard HTTP request.
* You want to quickly experiment or prototype. Working with WebSockets is more complex than using a standard HTTP API, which might slow down rapid development and testing.
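
To make the buffering trade-off concrete, here is a minimal Python sketch of how partial text might be grouped before it is processed. It is an illustration only: the function name `buffer_chunks`, the `min_chars` threshold, and the flush rule are hypothetical and do not describe the service's actual buffering logic.

```python
from typing import Iterable, Iterator


def buffer_chunks(pieces: Iterable[str], min_chars: int = 50) -> Iterator[str]:
    """Group incoming text pieces into chunks of at least `min_chars` characters.

    Hypothetical illustration: the threshold and flush rule are assumptions,
    not the documented behaviour of the Text-to-Speech service.
    """
    pending = ""
    for piece in pieces:
        pending += piece
        # Nothing is released until enough text has accumulated,
        # which is where the extra latency comes from.
        if len(pending) >= min_chars:
            yield pending
            pending = ""
    if pending:
        # Flush whatever is left once the input stream ends.
        yield pending
```

With text that arrives piece by piece, the first chunk is only released after several pieces have accumulated. When the whole text is already available, that waiting buys nothing, which is why a single HTTP request can be the faster option in that case.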