POST /v1/text-to-speech/{voice_id}/stream
curl --request POST \
  --url https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream \
  --header 'Content-Type: application/json' \
  --header 'xi-api-key: <string>' \
  --data '{
  "text": "<string>",
  "model_id": "<string>",
  "voice_settings": {
    "stability": 123,
    "similarity_boost": 123,
    "style": 123,
    "use_speaker_boost": true
  },
  "pronunciation_dictionary_locators": [
    {
      "pronunciation_dictionary_id": "<string>",
      "version_id": "<string>"
    }
  ]
}'
This response has no body data.

Headers

xi-api-key
string

Your API key. Most endpoints require it for programmatic access to the API. You can view your xi-api-key on the 'Profile' tab of the website.

Path Parameters

voice_id
string
required

The ID of the voice to use. Call https://api.elevenlabs.io/v1/voices to list all available voices.

Query Parameters

optimize_streaming_latency
integer
default: 0

Turns on latency optimizations at some cost of quality. The best possible final latency varies by model. Possible values:
0 - default mode (no latency optimizations)
1 - normal latency optimizations (about 50% of the possible latency improvement of option 3)
2 - strong latency optimizations (about 75% of the possible latency improvement of option 3)
3 - max latency optimizations
4 - max latency optimizations, with the text normalizer also turned off for even more latency savings (best latency, but can mispronounce e.g. numbers and dates)

Defaults to 0.
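As a minimal sketch of how the query parameter attaches to the endpoint URL, the helper below builds the request URL and rejects values outside the documented 0-4 range. The function name `stream_url` is hypothetical and not part of the API; only the URL shape and the parameter name come from this page.

```shell
# Hypothetical helper (not part of the API): build the streaming URL for a
# voice, with an optional optimize_streaming_latency level (default 0).
stream_url() {
  local voice_id="$1" latency="${2:-0}"
  case "$latency" in
    0|1|2|3|4) ;;  # the five documented values
    *)
      echo "optimize_streaming_latency must be 0-4, got: $latency" >&2
      return 1
      ;;
  esac
  printf 'https://api.elevenlabs.io/v1/text-to-speech/%s/stream?optimize_streaming_latency=%s\n' \
    "$voice_id" "$latency"
}
```

Calling `stream_url "$VOICE_ID" 3` prints a URL that can be passed to `curl --url`; validating locally simply avoids sending a request the server would likely reject.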

output_format
string
default: mp3_44100_128

Output format of the generated audio. Must be one of:
mp3_22050_32 - MP3 with 22.05kHz sample rate at 32kbps.
mp3_44100_32 - MP3 with 44.1kHz sample rate at 32kbps.
mp3_44100_64 - MP3 with 44.1kHz sample rate at 64kbps.
mp3_44100_96 - MP3 with 44.1kHz sample rate at 96kbps.
mp3_44100_128 - default output format, MP3 with 44.1kHz sample rate at 128kbps.
mp3_44100_192 - MP3 with 44.1kHz sample rate at 192kbps. Requires you to be subscribed to Creator tier or above.
pcm_16000 - PCM format (S16LE) with 16kHz sample rate.
pcm_22050 - PCM format (S16LE) with 22.05kHz sample rate.
pcm_24000 - PCM format (S16LE) with 24kHz sample rate.
pcm_44100 - PCM format (S16LE) with 44.1kHz sample rate. Requires you to be subscribed to Pro tier or above.
ulaw_8000 - μ-law format (sometimes written mu-law, often approximated as u-law) with 8kHz sample rate. Note that this format is commonly used for Twilio audio inputs.
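When saving the streamed audio to disk, it can help to derive the file name from the chosen format. The sketch below maps each documented output_format value to a file extension. The function name `output_extension` and the extensions themselves (`mp3`, `pcm`, `ulaw`) are assumptions made for illustration; the API only defines the format names.

```shell
# Hypothetical helper (not part of the API): choose a file extension for a
# documented output_format value. The extensions are illustrative choices.
output_extension() {
  case "$1" in
    mp3_22050_32|mp3_44100_32|mp3_44100_64|mp3_44100_96|mp3_44100_128|mp3_44100_192)
      echo mp3 ;;
    pcm_16000|pcm_22050|pcm_24000|pcm_44100)
      echo pcm ;;   # raw S16LE samples, no container header
    ulaw_8000)
      echo ulaw ;;
    *)
      echo "unknown output_format: $1" >&2
      return 1 ;;
  esac
}

# Example use (commented out because it performs a real request;
# VOICE_ID and XI_API_KEY are placeholder environment variables):
# fmt=pcm_16000
# curl --request POST \
#   --url "https://api.elevenlabs.io/v1/text-to-speech/$VOICE_ID/stream?output_format=$fmt" \
#   --header 'Content-Type: application/json' \
#   --header "xi-api-key: $XI_API_KEY" \
#   --data '{"text": "Hello"}' \
#   --output "speech.$(output_extension "$fmt")"
```

Because the PCM and μ-law formats are headerless, most players need the sample rate and encoding supplied separately when opening those files.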

Body

application/json
text
string
required

The text that will get converted into speech.

model_id
string
default: eleven_monolingual_v1

Identifier of the model to use. You can list the available models with GET /v1/models. The model must support text to speech, which you can check via its can_do_text_to_speech property.

voice_settings
object

Voice settings that override the stored settings for the given voice. They apply only to this request.

pronunciation_dictionary_locators
object[]

A list of pronunciation dictionary locators (pronunciation_dictionary_id, version_id) to apply to the text. They are applied in order. You may use up to 3 locators per request.
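To illustrate the shape of this field and the three-locator limit, the sketch below assembles the JSON array from `id:version` pairs and refuses more than three. The function name `locators_json` and the `id:version` argument convention are hypothetical. It also assumes the IDs contain no colons, quotes, or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper (not part of the API): turn "id:version" arguments into
# the pronunciation_dictionary_locators JSON array, preserving their order.
# Assumes IDs need no JSON escaping (no quotes, backslashes, or colons).
locators_json() {
  if [ "$#" -gt 3 ]; then
    echo "at most 3 pronunciation dictionary locators per request" >&2
    return 1
  fi
  local out="" pair
  for pair in "$@"; do
    # ${pair%%:*} is the text before the first colon, ${pair#*:} the text after it.
    out="${out:+$out,}$(printf '{"pronunciation_dictionary_id":"%s","version_id":"%s"}' \
      "${pair%%:*}" "${pair#*:}")"
  done
  printf '[%s]\n' "$out"
}
```

The output can be spliced into the request body passed to `curl --data`. Argument order matters because the locators are applied in the order given.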