Our new, fastest model generates speech at ≈400 ms latency and is more than twice as fast as our V1 models. It doesn't compromise on quality, which stays on par with Multilingual V2.
For users of VoIP services, we now also support μ-law (mulaw) 8 kHz output, which brings an even greater speed boost. See our API documentation to learn more.
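As a rough sketch of what a VoIP-oriented request might look like, the Python snippet below builds (but does not send) an HTTP request asking for μ-law 8 kHz audio. The endpoint URL, the `xi-api-key` header, the `eleven_turbo_v2` model ID, and the `ulaw_8000` output format value are all assumptions made for illustration, and `build_tts_request` is a hypothetical helper; check the API documentation for the exact names before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the API documentation.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key,
                      model_id="eleven_turbo_v2",   # assumed model identifier
                      output_format="ulaw_8000"):   # assumed name for mu-law 8 kHz
    """Build (but do not send) a text-to-speech HTTP request."""
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"{API_BASE}/{urllib.parse.quote(voice_id, safe='')}?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_tts_request("Hello from the phone line.",
                            "YOUR_VOICE_ID", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # Sending is left to the caller, for example:
    # with urllib.request.urlopen(req) as resp:
    #     audio = resp.read()  # mu-law bytes at 8 kHz, if the assumptions hold
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before any audio is generated.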
We're also working on adding multilingual support to Turbo v2.
If you need help with integration or would like to speak about scale and support, feel free to contact our sales team.