Create transcript

POST /v1/speech-to-text

Transcribe an audio or video file.

This endpoint is currently available only to alpha users and is subject to change.

Request

This endpoint expects a multipart form containing a file.
file (file, required)

The audio or video file to transcribe.

model_id (string, required)

The ID of the model to use for transcription. Currently, only ‘scribe_v1’ is available.

language_code (string, optional)

An ISO-639-1 or ISO-639-3 language code corresponding to the language of the audio file. Supplying it when the language is known in advance can sometimes improve transcription performance. Defaults to null, in which case the language is detected automatically.

tag_audio_events (boolean, optional, defaults to true)

Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.

num_speakers (integer, optional, minimum 1, maximum 31)

The maximum number of speakers talking in the uploaded file. Setting it can help the model predict who speaks when. The model can distinguish at most 31 speakers. Defaults to null, in which case the number of speakers is set to the maximum the model supports.
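The parameters above can be assembled into a multipart request with nothing but the Python standard library. The sketch below builds the request without sending it. Several details are assumptions not stated in this reference: the base URL `https://api.elevenlabs.io`, the `xi-api-key` authentication header, and encoding booleans as the strings "true"/"false" in the form. The helper names `build_form_fields`, `encode_multipart`, and `build_request` are hypothetical.

```python
import urllib.request
import uuid

# Assumption: base URL and "xi-api-key" header are not given in this reference.
BASE_URL = "https://api.elevenlabs.io"


def build_form_fields(model_id="scribe_v1", language_code=None,
                      tag_audio_events=True, num_speakers=None):
    """Hypothetical helper: collect the text form fields, skipping unset optionals."""
    if num_speakers is not None and not 1 <= num_speakers <= 31:
        raise ValueError("num_speakers must be between 1 and 31")
    fields = {
        "model_id": model_id,
        # Assumption: form booleans are sent as lowercase "true"/"false".
        "tag_audio_events": "true" if tag_audio_events else "false",
    }
    if language_code is not None:
        fields["language_code"] = language_code
    if num_speakers is not None:
        fields["num_speakers"] = str(num_speakers)
    return fields


def encode_multipart(fields, file_name, file_bytes,
                     content_type="application/octet-stream"):
    """Encode the text fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8"))
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, file_name, file_bytes, **options):
    """Build (but do not send) a POST request to /v1/speech-to-text."""
    body, content_type = encode_multipart(build_form_fields(**options),
                                          file_name, file_bytes)
    return urllib.request.Request(
        BASE_URL + "/v1/speech-to-text", data=body, method="POST",
        headers={"xi-api-key": api_key, "Content-Type": content_type})
```

For example, `build_request(key, "meeting.mp3", audio_bytes, num_speakers=2)` returns a request that `urllib.request.urlopen` could send. Checking `num_speakers` locally mirrors the documented 1 to 31 range, so an out-of-range value fails before any upload happens.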

Response

Successful Response

language_code (string)

The detected language code (e.g. ‘eng’ for English).

language_probability (double)

The confidence score of the language detection (0 to 1).

text (string)

The raw text of the transcription.

words (list of objects)

List of words with their timing information.
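A successful response can be loaded into a small typed record. This is a minimal sketch: the `Transcript` class and `parse_transcript` function are hypothetical, and the sample JSON in the test is invented for illustration. Because this reference does not list the keys inside each word object, the sketch keeps words as raw dictionaries rather than guessing their timing fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transcript:
    language_code: str
    language_probability: float
    text: str
    # Word objects stay as raw dicts: their timing keys are not documented here.
    words: list[dict] = field(default_factory=list)


def parse_transcript(raw: str) -> Transcript:
    """Parse a successful /v1/speech-to-text response body."""
    data = json.loads(raw)
    prob = float(data["language_probability"])
    # The documented range for the detection confidence is 0 to 1.
    if not 0.0 <= prob <= 1.0:
        raise ValueError("language_probability must be in [0, 1]")
    return Transcript(
        language_code=data["language_code"],
        language_probability=prob,
        text=data["text"],
        words=list(data.get("words", [])),
    )
```

A response read with `urllib.request.urlopen` can be decoded and passed straight to `parse_transcript`.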

Errors
