Create transcript

Transcribe an audio or video file.

Headers

xi-api-key string Required

Query parameters

enable_logging boolean Optional Defaults to true

When enable_logging is set to false, zero retention mode is used for the request. History features, including request stitching, are then unavailable for this request. Zero retention mode may only be used by enterprise customers.

Request

This endpoint expects a multipart form.

model_id string Required

The ID of the model to use for transcription. Currently only ‘scribe_v1’ and ‘scribe_v1_experimental’ are available.

file string Optional format: "binary"

The file to transcribe. All major audio and video formats are supported. Exactly one of the file or cloud_storage_url parameters must be provided. The file size must be less than 1GB.

language_code string Optional

An ISO-639-1 or ISO-639-3 language code corresponding to the language of the audio file. Providing it can sometimes improve transcription performance. Defaults to null, in which case the language is predicted automatically.

tag_audio_events boolean Optional Defaults to true

Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.

num_speakers integer Optional >=1 <=32

The maximum number of speakers talking in the uploaded file. Can help with predicting who speaks when. At most 32 speakers can be predicted. Defaults to null, in which case the number of speakers is set to the maximum value the model supports.

timestamps_granularity enum Optional Defaults to word

The granularity of the timestamps in the transcription. ‘word’ provides word-level timestamps and ‘character’ provides character-level timestamps per word.

Allowed values include: word, character

diarize boolean Optional Defaults to false

Whether to annotate which speaker is currently talking in the uploaded file.

additional_formats list of objects Optional

A list of additional formats to export the transcript to.

file_format enum Optional Defaults to other

The format of the input audio. Options are ‘pcm_s16le_16’ or ‘other’. For pcm_s16le_16, the input audio must be 16-bit PCM at a 16kHz sample rate, single channel (mono), and little-endian byte order. Latency will be lower than when passing an encoded waveform.

Allowed values: pcm_s16le_16, other
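
The pcm_s16le_16 constraints above (16-bit, 16kHz, mono) can be checked locally before choosing a file_format. The following is a minimal sketch using Python's standard wave module; the helper names are hypothetical, and the idea that the endpoint expects headerless sample bytes for pcm_s16le_16 is an assumption, not something this reference states.

```python
import io
import wave


def is_pcm_s16le_16(wav_bytes: bytes) -> bool:
    """Return True if a WAV payload is uncompressed 16-bit, 16 kHz, mono PCM."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getsampwidth() == 2            # 2 bytes per sample = 16-bit
            and wav.getframerate() == 16000    # 16 kHz sample rate
            and wav.getnchannels() == 1        # single channel (mono)
            and wav.getcomptype() == "NONE"    # uncompressed PCM
        )


def raw_pcm_frames(wav_bytes: bytes) -> bytes:
    """Return the sample bytes without the WAV header (assumed upload shape)."""
    if not is_pcm_s16le_16(wav_bytes):
        raise ValueError("not 16-bit, 16 kHz, mono PCM; use file_format 'other'")
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())
```

WAV files store samples in little-endian order, so no byte swapping is needed in this sketch. Audio that fails the check can still be sent with file_format set to other.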

cloud_storage_url string Optional

A valid AWS S3 or Google Cloud Storage URL of the file to transcribe. Exactly one of the file or cloud_storage_url parameters must be provided. The URL must be publicly accessible and may be pre-signed. The file size must be less than 2GB.
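
Several of the request constraints above can be enforced on the client before anything is uploaded: the model ID, the "exactly one of file or cloud_storage_url" rule, and the 1 to 32 range for num_speakers. The sketch below is one way to do that; build_form_fields and its has_file flag are hypothetical names, and sending booleans as lowercase "true"/"false" strings is an assumption about how the multipart form is parsed.

```python
from typing import Dict, Optional

# Model IDs this reference lists as currently available.
AVAILABLE_MODELS = {"scribe_v1", "scribe_v1_experimental"}


def build_form_fields(
    model_id: str,
    has_file: bool,
    cloud_storage_url: Optional[str] = None,
    language_code: Optional[str] = None,
    num_speakers: Optional[int] = None,
    diarize: bool = False,
) -> Dict[str, str]:
    """Validate the documented constraints and return the text form fields.

    Hypothetical helper: the binary `file` part is attached by the caller.
    """
    if model_id not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model_id: {model_id!r}")
    # Exactly one of file / cloud_storage_url must be provided.
    if has_file == (cloud_storage_url is not None):
        raise ValueError("provide exactly one of file or cloud_storage_url")
    if num_speakers is not None and not 1 <= num_speakers <= 32:
        raise ValueError("num_speakers must be between 1 and 32")

    # Assumption: booleans are serialized as lowercase strings.
    fields = {"model_id": model_id, "diarize": "true" if diarize else "false"}
    # Optional fields are left out so the documented null defaults apply.
    if cloud_storage_url is not None:
        fields["cloud_storage_url"] = cloud_storage_url
    if language_code is not None:
        fields["language_code"] = language_code
    if num_speakers is not None:
        fields["num_speakers"] = str(num_speakers)
    return fields
```

The returned fields would then be sent as a multipart form together with the xi-api-key header. The endpoint URL is not given in this section, so it is left out here.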

Response

Successful Response

language_code string

The detected language code (e.g. ‘eng’ for English).

language_probability double

The confidence score of the language detection (0 to 1).

text string

The raw text of the transcription.

words list of objects

List of words with their timing information.

additional_formats list of optional objects Optional

Requested additional formats of the transcript.
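
As a rough illustration of consuming the response fields listed above, the sketch below reduces a decoded response body to a small summary. The function name and the 0.5 confidence threshold are hypothetical choices, and the code treats each entry in words as opaque because this section does not describe its fields.

```python
from typing import Any, Dict


def summarize_transcript(
    payload: Dict[str, Any], min_probability: float = 0.5
) -> Dict[str, Any]:
    """Summarize a decoded response using only the fields documented above."""
    return {
        "text": payload["text"],
        "language_code": payload["language_code"],
        # The threshold is an illustrative choice, not a documented value.
        "language_is_confident": payload["language_probability"] >= min_probability,
        "word_count": len(payload["words"]),
        # additional_formats is optional, so it may be absent or empty.
        "has_additional_formats": bool(payload.get("additional_formats")),
    }
```

A low language_probability may be a reason to retry with an explicit language_code in the request.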

Errors