
Server-side streaming

This guide shows you how to transcribe audio in real time on the server side using ElevenLabs.

How-to guide · Assumes you have completed the Speech to Text quickstart.

Overview

The ElevenLabs Realtime Speech to Text API transcribes audio streams in real time with ultra-low latency using the Scribe v2 Realtime model. Whether you’re building voice assistants, transcription services, or any other application that needs live speech recognition, this WebSocket-based API delivers partial transcripts as you speak and committed transcripts when speech segments are complete.

Scribe v2 Realtime can run on the server side to transcribe audio in real time from a URL, a file, or your own audio stream.

The server side implementation differs from client side in a few ways:

  • Uses an ElevenLabs API key instead of a single-use token.
  • Supports streaming from a URL directly, without the need to manually chunk the audio.
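
Because the server-side flow authenticates with an API key rather than a single-use token, the key is best kept out of source code. A minimal sketch, assuming the key is stored in an environment variable named ELEVENLABS_API_KEY (the name the quickstart example reads); the helper require_api_key is hypothetical and not part of the SDK:

```python
import os


def require_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or fail fast with a clear error.

    Hypothetical helper for illustration; it is not part of the ElevenLabs SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it or add it to your .env file")
    return key
```

Failing at startup makes a missing key visible immediately instead of surfacing later as a connection error.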

For streaming audio directly from the microphone, see the Client-side streaming guide.

Quickstart

This guide assumes you have set up your API key and SDK. Complete the quickstart first if you haven’t.

1. Configure the SDK

The SDK provides two ways to transcribe audio in realtime: streaming from a URL or manually chunking the audio from either a file or your own audio stream.

For a full list of parameters and options the API supports, please refer to the API reference.
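
For the manual chunking path, the audio has to be split into pieces before it is sent. The sketch below only illustrates the splitting step. It assumes raw audio bytes are already in memory and that the endpoint accepts base64-encoded chunks, which should be confirmed against the API reference. The function name iter_base64_chunks and the 4096-byte default are illustrative choices, not SDK values:

```python
import base64
from typing import Iterator


def iter_base64_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[str]:
    """Yield consecutive chunk_size-byte slices of audio as base64 ASCII strings.

    Hypothetical helper: the chunk size and the base64 encoding are assumptions,
    not values taken from the ElevenLabs documentation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield base64.b64encode(audio[offset:offset + chunk_size]).decode("ascii")
```

Each yielded string would then be passed to whatever send call the SDK exposes for audio chunks.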

Stream from URL

This example shows how to stream an audio file from a URL using the official SDK.

The ffmpeg tool is required when streaming from a URL. See the ffmpeg website for installation instructions.
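
Since URL streaming depends on ffmpeg, it can help to check for it before opening a connection. A small sketch using only the Python standard library; the function name ffmpeg_available is illustrative, and it assumes ffmpeg is expected to be discoverable on PATH:

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None
```

Calling ffmpeg_available() at startup and exiting with a clear message when it returns False avoids a less obvious failure once streaming begins.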

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code:

from dotenv import load_dotenv
import os
import asyncio
from elevenlabs import ElevenLabs, RealtimeEvents, RealtimeUrlOptions

load_dotenv()

async def main():
    elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

    # Create an event to signal when to stop
    stop_event = asyncio.Event()

    # Connect to a streaming audio URL
    connection = await elevenlabs.speech_to_text.realtime.connect(RealtimeUrlOptions(
        model_id="scribe_v2_realtime",
        url="https://npr-ice.streamguys1.com/live.mp3",
        include_timestamps=True,
    ))

    # Set up event handlers
    def on_session_started(data):
        print(f"Session started: {data}")

    def on_partial_transcript(data):
        print(f"Partial: {data.get('text', '')}")

    def on_committed_transcript(data):
        print(f"Committed: {data.get('text', '')}")

    # Committed transcripts with word-level timestamps. Only received when include_timestamps is set to True.
    def on_committed_transcript_with_timestamps(data):
        print(f"Committed with timestamps: {data.get('words', '')}")

    # Errors - will catch all errors, both server and websocket specific errors
    def on_error(error):
        print(f"Error: {error}")
        # Signal to stop on error
        stop_event.set()

    def on_close():
        print("Connection closed")
        # Signal to stop when the connection closes, so main() does not wait forever
        stop_event.set()

    # Register event handlers
    connection.on(RealtimeEvents.SESSION_STARTED, on_session_started)
    connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, on_partial_transcript)
    connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, on_committed_transcript)
    connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS, on_committed_transcript_with_timestamps)
    connection.on(RealtimeEvents.ERROR, on_error)
    connection.on(RealtimeEvents.CLOSE, on_close)

    print("Transcribing audio stream... (Press Ctrl+C to stop)")

    try:
        # Wait until error occurs or connection closes
        await stop_event.wait()
    except KeyboardInterrupt:
        print("\nStopping transcription...")
    finally:
        await connection.close()

if __name__ == "__main__":
    asyncio.run(main())
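
The timestamp handler in the example prints the raw words value. To make that output easier to read, a formatter along these lines could be used. This is a sketch under stated assumptions: it assumes each entry in words is a dict with text, start, and end keys, with times in seconds. Those key names are not confirmed by this guide, so check the event reference before relying on them. The function name format_words is hypothetical:

```python
def format_words(data: dict) -> list[str]:
    """Render word-level timestamps as '[start - end] text' lines.

    Assumes each word is a dict with 'text', 'start', and 'end' keys
    (an assumption; the actual payload shape may differ).
    """
    lines = []
    for word in data.get("words") or []:
        text = word.get("text", "")
        start = word.get("start")
        end = word.get("end")
        if start is None or end is None:
            # Fall back to the bare text when timing information is missing.
            lines.append(text)
        else:
            lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines
```

Inside the handler, the print call could then become print("\n".join(format_words(data))).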
2. Execute the code

python example.py

The transcription of the audio stream is printed to the console, first as partial transcripts and then as committed transcripts once each speech segment is complete.
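
Printing every event produces repetitive output, because each partial transcript tends to restate the previous one. One way to keep a single running transcript is sketched below. It assumes that each partial supersedes the previous partial and that a committed transcript finalizes the current segment. The class TranscriptBuffer is hypothetical and not part of the SDK:

```python
class TranscriptBuffer:
    """Keep finalized segments plus the latest in-progress partial.

    Hypothetical helper; assumes event payloads are dicts with a 'text' key,
    as in the handlers of the example above.
    """

    def __init__(self) -> None:
        self.committed: list[str] = []
        self.partial: str = ""

    def on_partial(self, data: dict) -> None:
        # A new partial replaces the previous one instead of being appended.
        self.partial = data.get("text", "")

    def on_committed(self, data: dict) -> None:
        text = data.get("text", "").strip()
        if text:
            self.committed.append(text)
        # The segment is final, so the in-progress partial is cleared.
        self.partial = ""

    def render(self) -> str:
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

The on_partial and on_committed methods can be registered as the handlers for the partial and committed transcript events, and render() called whenever the display needs refreshing.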

Next steps

Transcripts and commit strategies

Control when transcripts are committed and how to handle partial results.

Event reference

Full list of events and error types from the realtime STT API.