Keyterm prompting
How-to guide · Assumes you have completed the Speech to Text quickstart.
Overview
Keyterm prompting is available with the Scribe v2 model (batch and realtime) and comes at an additional cost. See the API pricing page for detailed pricing information.
Keyterm prompting lets you highlight words or phrases to bias the model towards transcribing them. This is useful for words or phrases that are uncommon in everyday speech, such as product names, personal names, or other specialized terms. Keyterms are more powerful than the biased keywords or custom vocabularies offered by other models, because the model relies on context to decide whether to transcribe a term or not.
For example, if your company name is not a common phrase or has a unique spelling or pronunciation, you can use keyterms to ensure the model transcribes it correctly. Take the following audio:
Without keyterm prompting, the model might transcribe the above as:
This uses the wrong style for the company name. With keyterm prompting, you can ensure the model transcribes the audio with the correct spelling and style:
Context
The model uses context to determine whether a term should be transcribed. With the keyterm “ElevenLabs” provided, the above audio transcribes as expected, yet the model can still transcribe the following correctly based on the context:
This outputs the following transcription:
Batch transcription
To use keyterm prompting with the batch Speech to Text API, pass the keyterms parameter to the convert method.
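As a minimal sketch of what this can look like in Python, the helper below forwards a list of keyterms to a client's `speech_to_text.convert` method. The helper name `transcribe_with_keyterms`, the `scribe_v2` model identifier, and the clean-up of blank or duplicate terms are assumptions made for illustration, not part of the API; check the API reference for the exact model ID and any limits on keyterms.

```python
from typing import Any, BinaryIO, Sequence


def transcribe_with_keyterms(
    client: Any,
    audio: BinaryIO,
    keyterms: Sequence[str],
    model_id: str = "scribe_v2",  # assumed model identifier
) -> Any:
    """Call the batch convert method with a cleaned list of keyterms.

    `client` is expected to expose `speech_to_text.convert(...)`,
    as the ElevenLabs Python SDK client does.
    """
    # Drop blank entries and duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(t.strip() for t in keyterms if t.strip()))
    return client.speech_to_text.convert(
        file=audio,
        model_id=model_id,
        keyterms=cleaned,
    )


# Example usage (requires the `elevenlabs` package and an API key):
#
# from elevenlabs.client import ElevenLabs
# client = ElevenLabs(api_key="YOUR_API_KEY")
# with open("meeting.mp3", "rb") as f:
#     result = transcribe_with_keyterms(client, f, ["ElevenLabs"])
# print(result.text)
```

Taking the client as an argument keeps the helper independent of how the client is configured, so the same function can be exercised with a stub in tests.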
Realtime streaming
Keyterm prompting is also available for the realtime Speech to Text WebSocket API. Pass the keyterms parameter when connecting.
When using the WebSocket API directly, pass keyterms as query parameters:
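The sketch below shows one way to build such a connection URL with the Python standard library. The endpoint path, the `scribe_v2_realtime` model identifier, and the convention of repeating `keyterms` once per term are assumptions for illustration; confirm all three against the WebSocket API reference before relying on them. `build_realtime_url` is a hypothetical helper name.

```python
from typing import Sequence
from urllib.parse import quote, urlencode

# Assumed endpoint; confirm against the realtime API reference.
REALTIME_URL = "wss://api.elevenlabs.io/v1/speech-to-text/realtime"


def build_realtime_url(
    keyterms: Sequence[str],
    model_id: str = "scribe_v2_realtime",  # assumed model identifier
    base_url: str = REALTIME_URL,
) -> str:
    """Return a WebSocket URL with one `keyterms` query pair per term."""
    # doseq=True expands the list into repeated keyterms=... pairs;
    # quote_via=quote percent-encodes spaces as %20 instead of "+".
    query = urlencode(
        {"model_id": model_id, "keyterms": list(keyterms)},
        doseq=True,
        quote_via=quote,
    )
    return f"{base_url}?{query}"


print(build_realtime_url(["ElevenLabs", "Scribe v2"]))
```

The resulting string can be passed to any WebSocket client, along with whatever authentication the API requires.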