The ElevenLabs API provides the ability to stream responses back to a client in order to allow partial results for certain requests. To achieve this, we follow the Server-sent events standard. Our official Node and Python libraries include helpers to make parsing these events simpler.

Streaming is supported for the Text to Speech API, Voice Changer API & Audio Isolation API. This section focuses on how streaming works for requests made to the Text to Speech API.

In Python, a streaming request looks like:

from elevenlabs import stream
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

audio_stream = client.text_to_speech.convert_as_stream(
    text="This is a test",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2"
)

# option 1: play the streamed audio locally
stream(audio_stream)

# option 2: process the audio bytes manually
for chunk in audio_stream:
    if isinstance(chunk, bytes):
        print(chunk)

In Node / Typescript, a streaming request looks like:

import { ElevenLabsClient, stream } from "elevenlabs";
import { Readable } from "stream";

const client = new ElevenLabsClient();

async function main() {
  const audioStream = await client.textToSpeech.convertAsStream(
    "JBFqnCBsd6RMkjVDRZzb",
    {
      text: "This is a test",
      model_id: "eleven_multilingual_v2",
    }
  );

  // option 1: play the streamed audio locally
  await stream(Readable.from(audioStream));

  // option 2: process the audio manually
  for await (const chunk of audioStream) {
    console.log(chunk);
  }
}

main();