A guide on how to transcribe audio with ElevenLabs

Speech to Text product feature

Overview

With Speech to Text, you can transcribe spoken audio into text with state-of-the-art accuracy. With automatic language detection, you can transcribe audio in 99 languages.

Creating a transcript

1

Upload audio

In the ElevenLabs dashboard, navigate to the Speech to Text page and click the “Transcribe files” button. From the modal, you can upload an audio or video file to transcribe.

2

Select options

Select the primary language of the audio and the maximum number of speakers. If you don’t know either, you can leave the defaults, which attempt to detect the language and number of speakers automatically.

Finally, choose whether you wish to tag audio events like laughter or applause, then click the “Transcribe” button.

3

View results

Click the name of the audio file you uploaded in the center pane to view the results. You can click a word to start playback of the audio at that point.

Click the “Export” button in the top right to download the results in a variety of formats.
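To make the idea of a timed export concrete, here is a minimal sketch that turns word-level timings into SRT subtitles. It is an illustration only: the `Word` structure, the function names, and the choice of SRT are assumptions, and this guide does not specify which formats the “Export” button actually offers.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record; not the dashboard's own data model."""
    text: str     # the transcribed word
    start: float  # start time in seconds
    end: float    # end time in seconds


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1  # SRT cue numbers start at 1
        timing = f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}"
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Each cue spans from the start of its first word to the end of its last word, which is also the information needed to jump playback to a clicked word.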

Transcript Editor

1

Open transcript

In the ElevenLabs dashboard, navigate to the Speech to Text page and click any transcript to open the Transcript Editor.

2

Edit basic details

You can rename your transcript by clicking the name and typing a new one.

3

Follow along with the audio

Click the play button at the bottom of the screen to start playing the audio. The editor automatically highlights the text to show you where you are.

4

Select edit mode

Select Edit mode using the tabs in the top left. This reveals the editing features.

5

Edit transcript

Click the pencil icon next to a transcribed segment to edit the text. When you press Enter, the system automatically updates the timecodes for the segment.

6

Manage speakers

Our transcript editor comes with powerful features for managing speaker allocation.

Click the ‘Manage Speakers’ button, and you’ll see a list of speakers in the left pane. You can rename speakers, add new ones, and transfer lines attributed to one speaker to another.

7

Split and merge segments

Select ‘adjust segments’ in the toolbar to switch to segment editing mode.

This mode allows you to split and merge segments. This is useful if you want to edit the transcription only for a certain part of a segment, or assign a certain part of a segment to a different speaker.
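The split and merge operations described above can be pictured with a small data model. The sketch below is not the editor's implementation; `Word`, `Segment`, `split_segment`, and `merge_segments` are hypothetical names chosen for illustration, and it assumes each segment is a non-empty list of timed words attributed to one speaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Segment:
    """Hypothetical segment: one speaker and a non-empty run of words."""
    speaker: str
    words: list[Word]

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end


def split_segment(segment: Segment, at_word: int) -> tuple[Segment, Segment]:
    """Split before the word at index at_word; both halves keep the speaker."""
    if not 0 < at_word < len(segment.words):
        raise ValueError("split point must fall strictly inside the segment")
    left = Segment(segment.speaker, segment.words[:at_word])
    right = Segment(segment.speaker, segment.words[at_word:])
    return left, right


def merge_segments(first: Segment, second: Segment,
                   speaker: Optional[str] = None) -> Segment:
    """Join two adjacent segments; defaults to the first segment's speaker."""
    return Segment(speaker or first.speaker, first.words + second.words)
```

Splitting first and then changing the `speaker` of one half corresponds to assigning part of a segment to a different speaker; because timings live on the words, the start and end of each resulting segment follow automatically.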

FAQ

Supported languages

The Scribe v1 model supports the following 99 languages:

Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (cmn), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).

Uploading video files

The tool supports uploading both audio and video files. The maximum file size for either is 1 GB.
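If you prepare uploads programmatically, the same choices made in the dashboard (language, speaker count, audio event tagging) and the file size limit can be checked before anything is sent. The sketch below only builds and validates a set of form fields; it makes no network call. The field names (`model_id`, `language_code`, `num_speakers`, `tag_audio_events`), the `scribe_v1` identifier, and the reading of “1GB” as 10^9 bytes are assumptions, so confirm them against the Speech to Text API reference before relying on them.

```python
from typing import Optional

# Assumption: the "1GB" limit is read as 10**9 bytes.
MAX_UPLOAD_BYTES = 1_000_000_000


def build_transcription_fields(
    size_bytes: int,
    language_code: Optional[str] = None,
    num_speakers: Optional[int] = None,
    tag_audio_events: bool = True,
) -> dict[str, str]:
    """Validate an upload and build hypothetical form fields for it.

    Leaving language_code or num_speakers as None omits the field,
    mirroring the dashboard defaults that rely on automatic detection.
    """
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")

    fields = {
        "model_id": "scribe_v1",  # assumed identifier for the Scribe v1 model
        "tag_audio_events": "true" if tag_audio_events else "false",
    }
    if language_code is not None:
        fields["language_code"] = language_code  # e.g. "eng" from the list above
    if num_speakers is not None:
        if num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")
        fields["num_speakers"] = str(num_speakers)
    return fields
```

For a local file, `os.path.getsize(path)` supplies `size_bytes`.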

Renaming speakers

You can rename speakers by clicking the “edit” button next to the “Speakers” label.
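As a rough illustration of why renaming a speaker is cheap while transferring lines touches the transcript itself, the sketch below keeps display names separate from the speaker IDs attached to each line. The function names and the `(speaker_id, text)` tuple layout are hypothetical and do not describe how the editor stores its data.

```python
def rename_speaker(display_names: dict[str, str], speaker_id: str,
                   new_name: str) -> dict[str, str]:
    """Return a copy of the name table with one speaker renamed."""
    if speaker_id not in display_names:
        raise KeyError(speaker_id)
    cleaned = new_name.strip()
    if not cleaned:
        raise ValueError("speaker name must not be empty")
    return {**display_names, speaker_id: cleaned}


def transfer_lines(lines: list[tuple[str, str]], from_id: str,
                   to_id: str) -> list[tuple[str, str]]:
    """Reattribute every line spoken by from_id to to_id."""
    return [(to_id if sid == from_id else sid, text) for sid, text in lines]
```

Renaming changes only the lookup table, so every line attributed to that speaker picks up the new name at once; transferring lines rewrites the attribution on each affected line.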
