Using pronunciation dictionaries

Learn how to manage pronunciation dictionaries programmatically.

Overview

Pronunciation dictionaries allow you to customize how your AI agent pronounces specific words or phrases. This is particularly useful for:

  • Correcting pronunciation of names, places, or technical terms
  • Ensuring consistent pronunciation across conversations
  • Customizing regional pronunciation variations

ElevenLabs supports both IPA and CMU alphabets.

Phoneme tags work only with the eleven_flash_v2, eleven_turbo_v2, and eleven_monolingual_v1 models. Other models silently skip any word that carries a phoneme tag.
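Because an unsupported model skips the word without raising an error, a small guard before synthesis can surface the mismatch early. The following is a minimal sketch that hard-codes the three model IDs listed above. The names PHONEME_TAG_MODELS, supports_phoneme_tags, and require_phoneme_support are hypothetical helpers, not part of the ElevenLabs SDK.

```python
# Model IDs copied from the note above; update this set if the supported list changes.
PHONEME_TAG_MODELS = frozenset(
    {"eleven_flash_v2", "eleven_turbo_v2", "eleven_monolingual_v1"}
)


def supports_phoneme_tags(model_id: str) -> bool:
    """Return True if the model is one of those listed as honoring phoneme tags."""
    return model_id in PHONEME_TAG_MODELS


def require_phoneme_support(model_id: str) -> str:
    """Return model_id unchanged, or raise instead of letting words be skipped."""
    if not supports_phoneme_tags(model_id):
        raise ValueError(
            f"{model_id!r} is not listed as supporting phoneme tags; "
            "use an alias rule or switch models"
        )
    return model_id
```

Calling require_phoneme_support(model_id) just before a text-to-speech request turns a silent skip into an explicit error.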

Phoneme tags (IPA/CMU) only work for English. For other languages, use Alias tags instead, which replace words with alternative spellings or phrases that produce the desired pronunciation.
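As an illustration of alias rules, the sketch below builds a small PLS document in which each written form is replaced by an alternative spelling, using the alias element from the W3C PLS format. The function name build_alias_lexicon and the sample entry are assumptions made for this example, and the code uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: dict, lang: str = "en-US") -> str:
    """Serialize {written form: spoken replacement} pairs as a PLS lexicon."""
    lexicon = ET.Element(
        "lexicon",
        {"version": "1.0", "xmlns": PLS_NS, "alphabet": "ipa", "xml:lang": lang},
    )
    for written, spoken in aliases.items():
        lexeme = ET.SubElement(lexicon, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = written
        # <alias> substitutes text instead of specifying phonemes
        ET.SubElement(lexeme, "alias").text = spoken
    body = ET.tostring(lexicon, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical entry: read the abbreviation "UN" as "United Nations"
    print(build_alias_lexicon({"UN": "United Nations"}))
```

The returned string can be written to a .pls file encoded as UTF-8 and uploaded in the same way as a phoneme-based dictionary.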

Quickstart

1

Create an API key

Create an API key in the ElevenLabs dashboard. You’ll use it to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file, or directly in your app’s configuration, depending on your preference.

.env
ELEVENLABS_API_KEY=<your_api_key_here>
2

Install the SDK

We’ll also use the dotenv library to load our API key from an environment variable.

pip install elevenlabs
pip install python-dotenv
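Once both packages are installed, the key can be read roughly as follows. This is a sketch rather than required setup: load_env_file and get_api_key are hypothetical helper names, and the placeholder check assumes the .env file still contains the <your_api_key_here> template value until you replace it.

```python
import os


def load_env_file() -> None:
    """Copy variables from a local .env file into os.environ (needs python-dotenv)."""
    from dotenv import load_dotenv

    load_dotenv()


def get_api_key(env=os.environ) -> str:
    """Return ELEVENLABS_API_KEY, failing loudly if it is missing or still a placeholder."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key or key.startswith("<"):
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; add it to your .env file or environment"
        )
    return key
```

Call load_env_file() once at startup, then pass get_api_key() to the client. Failing early here gives a clearer message than an authentication error from the first API request.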
3

Create a pronunciation dictionary file

In this example, we will create a pronunciation dictionary file for the word tomato.

This rule uses the IPA alphabet to set the pronunciation of both tomato and Tomato. PLS files are case-sensitive, which is why the word appears both with and without a capital “T”.

You can use AI tools like Claude or ChatGPT to help generate IPA or CMU notations for specific words.

dictionary.pls
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
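Because every capitalization needs its own lexeme, writing the file by hand becomes repetitive as the word list grows. The sketch below generates the lowercase and capitalized entries for a word with Python's standard library. The helper names case_variants and build_phoneme_lexicon are assumptions for illustration, and the output omits the optional xsi:schemaLocation attributes shown in the file above.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def case_variants(word: str) -> list:
    """Return the distinct lowercase and Capitalized spellings of word, in that order."""
    variants = []
    for spelling in (word.lower(), word.capitalize()):
        if spelling not in variants:
            variants.append(spelling)
    return variants


def build_phoneme_lexicon(word: str, ipa: str, lang: str = "en-US") -> str:
    """Build a PLS document with one IPA lexeme per capitalization of word."""
    lexicon = ET.Element(
        "lexicon",
        {"version": "1.0", "xmlns": PLS_NS, "alphabet": "ipa", "xml:lang": lang},
    )
    for spelling in case_variants(word):
        lexeme = ET.SubElement(lexicon, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = spelling
        ET.SubElement(lexeme, "phoneme").text = ipa
    body = ET.tostring(lexicon, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Write the same two tomato rules as the hand-written dictionary.pls
    with open("dictionary.pls", "w", encoding="utf-8") as f:
        f.write(build_phoneme_lexicon("tomato", "/tə'meɪtoʊ/"))
```

Extending case_variants with word.upper() would also cover all-caps text, at the cost of one more lexeme per word.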
4

Create a pronunciation dictionary from a file via the SDK

Create a new file named example.py (or example.mts if you are using TypeScript) and add the following code. The Python version is shown here:

import os

from dotenv import load_dotenv
from elevenlabs import PronunciationDictionaryVersionLocator
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

# load ELEVENLABS_API_KEY from the .env file and create the API client
load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("dictionary.pls", "rb") as f:
    # this dictionary changes how tomato is pronounced
    pronunciation_dictionary = elevenlabs.pronunciation_dictionaries.create_from_file(
        file=f.read(), name="example"
    )

# baseline: no pronunciation dictionary attached
audio_1 = elevenlabs.text_to_speech.convert(
    text="Without the dictionary: tomato",
    voice_id="aMSt68OGf4xUZAnLpTU8",
    model_id="eleven_turbo_v2",
)

# same request, with the uploaded dictionary version applied
audio_2 = elevenlabs.text_to_speech.convert(
    text="With the dictionary: tomato",
    voice_id="aMSt68OGf4xUZAnLpTU8",
    model_id="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)

# play the audio
play(audio_1)
play(audio_2)
5

Execute the code

1python example.py

You should hear two versions of the audio play through your speakers: the first without the pronunciation dictionary and the second with it.
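If you would rather compare the two results as files than listen to them live, the audio can be written to disk. The sketch below assumes that text_to_speech.convert yields the audio as an iterable of bytes chunks. save_audio is a hypothetical helper, not an SDK function, and the output file names in the usage note are placeholders.

```python
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to path and return the number of bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                total += len(chunk)
    return total
```

For example, save_audio(audio_1, "without_dictionary.mp3") and save_audio(audio_2, "with_dictionary.mp3") would replace the two play calls. A streamed result can only be consumed once, so either play it or save it, or collect it first with b"".join(audio_1) if you need both.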

Next steps

To learn more about pronunciation dictionaries, please refer to the API reference.