Server-side streaming
How-to guide · Assumes you have completed the Speech to Text quickstart.
The ElevenLabs Realtime Speech to Text API lets you transcribe audio streams in real time with ultra-low latency using the Scribe v2 Realtime model. Whether you're building voice assistants, transcription services, or any application requiring live speech recognition, this WebSocket-based API delivers partial transcripts as you speak and committed transcripts when speech segments are complete.
Scribe v2 Realtime can be run on the server side to transcribe audio in real time from a URL, a file, or your own audio stream.
The server-side implementation differs from the client-side one in a few ways:
For streaming audio directly from the microphone, see the Client-side streaming guide.
This guide assumes you have set up your API key and SDK. Complete the quickstart first if you haven't.
The SDK provides two ways to transcribe audio in real time: streaming from a URL, or manually chunking the audio from a file or your own audio stream.
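To make the manual chunking option concrete, here is a minimal sketch of splitting raw audio into fixed-duration chunks before sending them. It is independent of the SDK: the helper name `chunk_pcm`, the 16 kHz 16-bit mono PCM format, and the 100 ms chunk length are all assumptions chosen for illustration, not values taken from the API reference.

```python
from typing import Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate: int = 16_000,   # assumed sample rate in Hz
    chunk_ms: int = 100,         # assumed chunk duration in milliseconds
    bytes_per_sample: int = 2,   # 16-bit mono PCM (assumption)
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw mono PCM audio.

    Hypothetical helper: the last chunk may be shorter than the others
    when the audio length is not a multiple of the chunk size.
    """
    if sample_rate <= 0 or chunk_ms <= 0 or bytes_per_sample <= 0:
        raise ValueError("sample_rate, chunk_ms and bytes_per_sample must be positive")

    # Whole samples per chunk, then converted to bytes so that a chunk
    # never splits a sample in half.
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = samples_per_chunk * bytes_per_sample
    if chunk_size == 0:
        raise ValueError("chunk_ms is too small for the given sample_rate")

    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


if __name__ == "__main__":
    one_second_of_silence = b"\x00" * 32_000  # 16 kHz * 2 bytes * 1 s
    for index, chunk in enumerate(chunk_pcm(one_second_of_silence)):
        # In a real integration, each chunk would be passed to the SDK here.
        print(index, len(chunk))
```

Each yielded chunk would then be handed to whatever send method the SDK exposes; check the API reference for the audio formats and chunk sizes it actually accepts.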
For a full list of parameters and options the API supports, please refer to the API reference.
This example shows how to stream an audio file from a URL using the official SDK.
The ffmpeg tool is required when streaming from a URL. Visit the ffmpeg website for installation instructions.
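Because URL streaming depends on ffmpeg, it can help to fail early with a clear message when the binary is missing. The following sketch uses only the Python standard library; the function name `ffmpeg_available` is hypothetical and is not part of the SDK.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH.

    Hypothetical helper: it only checks that the binary is discoverable,
    not that it is a working or recent ffmpeg build.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found; install it before streaming from a URL")
```

Running this check before opening the connection turns a missing dependency into an explicit message instead of a failure partway through streaming.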
Create a new file named example.py or example.mts, depending on your language of choice, and add the following code: