Keyterm prompting | ElevenLabs Documentation

How-to guide · Assumes you have completed the Speech to Text quickstart.

Overview

Keyterm prompting is available with the Scribe v2 model (batch and realtime) and comes at an additional cost. See the API pricing page for detailed pricing information.

Keyterm prompting lets you supply words or phrases that bias the model towards transcribing them. This is useful for terms that are rare in everyday speech, such as product names, people's names, or domain-specific terminology. Keyterms are more powerful than the biased keywords or custom vocabularies offered by other models, because the model uses the surrounding context to decide whether a keyterm should be transcribed.

| Limit | Batch (Scribe v2) | Realtime (Scribe v2 Realtime) |
|---|---|---|
| Max keyterms | 1000 | 50 |
| Max characters per keyterm | 50 | 20 |
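The limits in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper, `validate_keyterms`, that is not part of the ElevenLabs SDK. It assumes characters are counted as Python string length (code points) and that stripping whitespace and dropping duplicates is acceptable for your use case. How the API itself treats duplicate or over-limit keyterms is not described here, so treat this as a pre-flight check rather than a mirror of server-side validation.

```python
# Hypothetical pre-flight check; not part of the ElevenLabs SDK.
# Limit values are taken from the table above.
LIMITS = {
    # mode: (max number of keyterms, max characters per keyterm)
    "batch": (1000, 50),    # Scribe v2
    "realtime": (50, 20),   # Scribe v2 Realtime
}


def validate_keyterms(keyterms: list[str], mode: str = "batch") -> list[str]:
    """Strip whitespace, drop empty strings and duplicates, then enforce limits."""
    if mode not in LIMITS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(LIMITS)}")
    max_terms, max_chars = LIMITS[mode]

    # Assumption: duplicates and surrounding whitespace carry no meaning.
    cleaned: list[str] = []
    seen: set[str] = set()
    for term in keyterms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)

    if len(cleaned) > max_terms:
        raise ValueError(f"{len(cleaned)} keyterms exceeds the {mode} limit of {max_terms}")
    too_long = [t for t in cleaned if len(t) > max_chars]
    if too_long:
        raise ValueError(f"keyterms longer than {max_chars} characters: {too_long}")
    return cleaned


print(validate_keyterms(["ElevenLabs", " ElevenLabs ", "Scribe"]))  # → ['ElevenLabs', 'Scribe']
print(validate_keyterms(["ElevenLabs"], mode="realtime"))           # → ['ElevenLabs']
```

The cleaned list can then be passed as the `keyterms` argument in the batch or realtime examples below.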

For example, if your company name is not a common phrase or has an unusual spelling or pronunciation, you can use keyterms to make sure the model transcribes it correctly. Consider an audio clip in which the speaker says “I work at ElevenLabs.”

Without keyterm prompting, the model might transcribe the clip as:

I work at eleven labs.

This uses the wrong spelling and capitalization for the company name. With keyterm prompting, you can ensure the model transcribes the clip with the correct spelling and style:

I work at ElevenLabs.

Context

The model uses context to decide whether a keyterm should be transcribed. With the keyterm “ElevenLabs” provided, the audio above is transcribed as expected, yet the model still handles a different clip correctly when “eleven labs” refers to a number of laboratories rather than the company.

For that clip, the model outputs the following transcription:

I've worked at many labs. In fact I've worked at eleven labs.

Batch transcription

To use keyterm prompting with the batch Speech to Text API, pass the keyterms parameter to the convert method.

```python
import os
from dotenv import load_dotenv
from io import BytesIO
import requests
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/documentation_assets/audio/stt-keyterm-prompting.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)

transcription = elevenlabs.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v2", # Model to use
    # Keyterms to prompt the model with.
    # Up to 1000 keyterms can be provided, with a maximum length of 50 characters each
    keyterms=["ElevenLabs"],
)

print(transcription)
```

Realtime streaming

Keyterm prompting is also available for the realtime Speech to Text WebSocket API. Pass the keyterms parameter when connecting.

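```python
# `elevenlabs` is the client created in the batch example above.
# `await` must be called from inside an async function (or an async-aware REPL).
connection = await elevenlabs.speech_to_text.realtime.connect(RealtimeUrlOptions(
    model_id="scribe_v2_realtime",
    keyterms=["ElevenLabs"],
))
```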

When using the WebSocket API directly, pass keyterms as query parameters:

wss://api.elevenlabs.io/v1/speech-to-text/realtime?model_id=scribe_v2_realtime&keyterms=ElevenLabs&keyterms=AnotherTerm
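If you build the query string yourself, Python's standard `urllib.parse.urlencode` can repeat the `keyterms` parameter once per term, matching the URL format above. This is a minimal sketch: the helper name `build_realtime_url` is hypothetical, it assumes the server decodes standard percent-encoding (for example `%20` for a space in a multi-word keyterm), and authentication and opening the WebSocket connection are left out.

```python
from urllib.parse import quote, urlencode

# Base endpoint taken from the WebSocket URL shown above.
REALTIME_URL = "wss://api.elevenlabs.io/v1/speech-to-text/realtime"


def build_realtime_url(keyterms: list[str], model_id: str = "scribe_v2_realtime") -> str:
    """Return the realtime WebSocket URL with one keyterms query parameter per term."""
    # Realtime limits from the table above: at most 50 keyterms, 20 characters each.
    if len(keyterms) > 50:
        raise ValueError("realtime transcription accepts at most 50 keyterms")
    for term in keyterms:
        if len(term) > 20:
            raise ValueError(f"keyterm {term!r} exceeds 20 characters")

    # A list of (name, value) pairs lets the same parameter name repeat.
    params = [("model_id", model_id)] + [("keyterms", term) for term in keyterms]
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    return f"{REALTIME_URL}?{urlencode(params, quote_via=quote)}"


print(build_realtime_url(["ElevenLabs", "AnotherTerm"]))
# → wss://api.elevenlabs.io/v1/speech-to-text/realtime?model_id=scribe_v2_realtime&keyterms=ElevenLabs&keyterms=AnotherTerm
```

Passing a list of pairs rather than a dict is what allows `keyterms` to appear more than once in the resulting query string.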

Next steps

API reference

Full Speech to Text API reference and parameters.

Entity detection

Automatically detect and label entities like names, dates, and locations in transcripts.