Client-side streaming
How-to guide · Assumes you have completed the Speech to Text quickstart.
The ElevenLabs Realtime Speech to Text API transcribes audio streams in real time with ultra-low latency using the Scribe v2 Realtime model. Whether you're building voice assistants, transcription services, or any other application that requires live speech recognition, this WebSocket-based API delivers partial transcripts as you speak and committed transcripts when speech segments are complete.
Scribe v2 Realtime can run on the client side to transcribe audio in real time, either from the microphone or by manually chunking the audio.
The client-side implementation differs from the server-side implementation in a few ways:
For streaming audio from a URL, see the Server-side streaming guide.
This guide assumes you have set up your API key. Complete the quickstart first if you haven't.
To use the client-side SDK, you need to create a single-use token: a temporary token that lets the client connect to the API without exposing your API key. Create it on the server side via the ElevenLabs API.
Never expose your API key to the client.
A single-use token automatically expires after 15 minutes.
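The token rules above (one use, 15-minute lifetime) can be sketched as a small server-side helper. This is a minimal illustration rather than the SDK's own API: `IssuedToken`, `token_for_connection`, and the `request_token` callable are hypothetical names, and `request_token` stands in for whatever server-side call you use to mint a token (see the API reference for the real endpoint).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# 15-minute lifetime, as stated in this guide.
TOKEN_TTL_SECONDS = 15 * 60


@dataclass
class IssuedToken:
    """A single-use token plus the bookkeeping needed to decide if it is still usable."""

    value: str
    issued_at: float  # seconds, on the same clock that is passed to is_usable()
    used: bool = False

    def is_usable(self, now: float) -> bool:
        # Usable only if it was never consumed and is still inside the TTL window.
        return not self.used and (now - self.issued_at) < TOKEN_TTL_SECONDS


def token_for_connection(
    current: Optional[IssuedToken],
    request_token: Callable[[], str],
    now: Optional[float] = None,
) -> IssuedToken:
    """Return a token that is safe to hand to the client for one connection.

    `request_token` is a hypothetical callable that mints a new single-use
    token on your server; it is an assumption, not part of the ElevenLabs SDK.
    """
    now = time.time() if now is None else now
    if current is not None and current.is_usable(now):
        return current
    # Either no token yet, it was already used, or it expired: mint a new one.
    return IssuedToken(value=request_token(), issued_at=now)
```

Mark the token as used (`token.used = True`) once the client has opened its connection, so that the next connection attempt triggers a fresh request instead of reusing a consumed token.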
Transcription can be done either via the microphone or by manually chunking your own audio, which can be a file or a stream.
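As a rough illustration of manual chunking, the sketch below splits raw PCM audio into fixed-duration pieces and encodes each piece as base64 text. The audio format (16 kHz, 16-bit mono), the 100 ms chunk size, and the use of base64 are assumptions made for the example; `chunk_pcm` and `encode_chunk` are hypothetical helper names. Check the API reference for the formats and message shape the API actually expects.

```python
import base64
from typing import Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate: int = 16_000,  # assumed sample rate in Hz
    sample_width: int = 2,      # assumed bytes per sample (16-bit mono)
    chunk_ms: int = 100,        # assumed chunk duration in milliseconds
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM audio, each about `chunk_ms` long."""
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    # Keep every chunk boundary aligned to a whole sample.
    bytes_per_chunk -= bytes_per_chunk % sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for the given audio format")
    for start in range(0, len(audio), bytes_per_chunk):
        # The final chunk may be shorter than the others.
        yield audio[start:start + bytes_per_chunk]


def encode_chunk(chunk: bytes) -> str:
    """Base64-encode one chunk so it can travel inside a text message."""
    return base64.b64encode(chunk).decode("ascii")


if __name__ == "__main__":
    one_second_of_silence = bytes(16_000 * 2)
    chunks = list(chunk_pcm(one_second_of_silence))
    print(len(chunks), len(chunks[0]))
```

The same generator works for a file read fully into memory or for buffers accumulated from a stream; for a live stream you would typically call it on each buffer as it arrives.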
For a full list of parameters and options the API supports, please refer to the API reference.