Speech to Text
A guide on how to transcribe audio with ElevenLabs

Overview
Speech to Text transcribes spoken audio into text with state-of-the-art accuracy. With automatic language detection, you can transcribe audio in any of the 99 languages supported by the Scribe v1 model.
Creating a transcript
Upload audio
In the ElevenLabs dashboard, navigate to the Speech to Text page and click the “Transcribe files” button. From the modal, you can upload an audio or video file to transcribe.
Select options
Select the primary language of the audio and the maximum number of speakers. If you don’t know either, leave the defaults, and the language and number of speakers will be detected automatically.
Finally, choose whether to tag audio events such as laughter or applause, then click the “Transcribe” button.
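The options above can also be thought of as a small set of request parameters. The following is a minimal sketch, not the documented API: the helper `build_transcription_options`, the field names (`model_id`, `language_code`, `num_speakers`, `tag_audio_events`), the `scribe_v1` identifier, and the default for event tagging are all assumptions modeled on the dashboard choices. It only assembles a dictionary and sends nothing.

```python
from typing import Optional


def build_transcription_options(
    language_code: Optional[str] = None,
    num_speakers: Optional[int] = None,
    tag_audio_events: bool = True,  # assumed default; the guide does not state one
) -> dict:
    """Assemble transcription options as a plain dict.

    All field names are assumptions for illustration. Options left as None
    are omitted, mirroring the dashboard defaults where the language and the
    number of speakers are detected automatically.
    """
    if num_speakers is not None and num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")

    options = {
        "model_id": "scribe_v1",  # assumed identifier for the Scribe v1 model
        "tag_audio_events": tag_audio_events,
    }
    if language_code is not None:
        options["language_code"] = language_code
    if num_speakers is not None:
        options["num_speakers"] = num_speakers
    return options


# Defaults: leave language and speaker count to automatic detection.
auto_options = build_transcription_options()

# Explicit: English audio, at most two speakers, no audio event tags.
explicit_options = build_transcription_options("eng", 2, tag_audio_events=False)
```

Omitting a field, rather than sending an empty value, is one way to signal "detect this for me"; confirm the actual parameter names and defaults against the API reference before relying on them.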
Transcript Editor
Open transcript
In the ElevenLabs dashboard, navigate to the Speech to Text page and click any transcript to open the Transcript Editor.
Follow along with the audio
Click the play button at the bottom of the screen to start playing the audio. The editor automatically highlights the text as it is spoken, so you can see where you are in the transcript.
Edit transcript
Click the pencil icon next to a transcribed segment to edit the text. When you press Enter, the timecodes for the segment are updated automatically.
Manage speakers
The Transcript Editor includes tools for managing which speaker each line is attributed to.
Click the “Manage Speakers” button to see a list of speakers in the left pane. You can rename speakers, add new ones, and transfer lines attributed to one speaker to another.
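To make the difference between renaming a speaker and transferring lines concrete, here is a small illustrative model. It is a sketch under stated assumptions, not the editor's implementation: the `Line` class, the numeric speaker IDs, and both helper functions are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Line:
    speaker_id: int  # hypothetical ID pointing into the speakers table
    text: str


def rename_speaker(speakers: Dict[int, str], speaker_id: int, new_name: str) -> Dict[int, str]:
    """Change a speaker's display name; no line is reassigned."""
    if speaker_id not in speakers:
        raise KeyError(speaker_id)
    return {**speakers, speaker_id: new_name}


def transfer_lines(lines: List[Line], from_id: int, to_id: int) -> List[Line]:
    """Reattribute every line of one speaker to another speaker."""
    return [
        replace(line, speaker_id=to_id) if line.speaker_id == from_id else line
        for line in lines
    ]


speakers = {0: "Speaker 1", 1: "Speaker 2", 2: "Speaker 3"}
lines = [
    Line(0, "Welcome back."),
    Line(1, "Thanks for having me."),
    Line(2, "Glad to be here."),
]

# Renaming only touches the label shown for speaker 0.
speakers = rename_speaker(speakers, 0, "Host")

# Transferring moves speaker 2's lines to speaker 1, for example when one
# person was detected as two different speakers.
lines = transfer_lines(lines, from_id=2, to_id=1)
```

Keeping names in a separate table means a rename is a single change, while a transfer rewrites the attribution of each affected line.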
Split and merge segments
Select “adjust segments” in the toolbar to switch to segment editing mode.
In this mode you can split and merge segments. This is useful when you want to edit the transcription for only part of a segment, or assign part of a segment to a different speaker.
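Splitting and merging can be pictured as operations on a list of timed words. The sketch below is illustrative only: the `Word` and `Segment` classes and the two helper functions are hypothetical and do not describe how the editor stores transcripts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Segment:
    speaker: str
    words: List[Word]

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end

    @property
    def text(self) -> str:
        return " ".join(word.text for word in self.words)


def split_segment(segment: Segment, word_index: int) -> Tuple[Segment, Segment]:
    """Split before words[word_index]; both halves keep the original speaker."""
    if not 0 < word_index < len(segment.words):
        raise ValueError("split point must leave at least one word on each side")
    return (
        Segment(segment.speaker, segment.words[:word_index]),
        Segment(segment.speaker, segment.words[word_index:]),
    )


def merge_segments(first: Segment, second: Segment) -> Segment:
    """Join two adjacent segments; the result keeps the first segment's speaker."""
    return Segment(first.speaker, first.words + second.words)


segment = Segment(
    "Speaker 1",
    [Word("Hello", 0.0, 0.4), Word("there.", 0.5, 0.9), Word("Hi!", 1.2, 1.5)],
)

# Split off the last word and assign it to a different speaker.
left, right = split_segment(segment, 2)
right.speaker = "Speaker 2"
```

Because each segment's start and end are derived from its words, the timecodes of both halves follow from the split point without further bookkeeping.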
FAQ
What languages are supported?
Supported languages
The Scribe v1 model supports the following 99 languages:
Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (cmn), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).
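If you need these three-letter codes in a script, for example to check a language choice before submitting a file, the list format above is straightforward to parse. This is a minimal sketch: the function name `parse_language_list` is hypothetical, and the sample string is a short excerpt of the list rather than the full set.

```python
import re

# One entry looks like "Mandarin Chinese (cmn)", optionally ending in a period.
ENTRY = re.compile(r"^(?P<name>.+?)\s*\((?P<code>[a-z]{3})\)\.?$")


def parse_language_list(text: str) -> dict:
    """Map each three-letter code to its language name."""
    codes = {}
    # Entries are separated by commas, with "and" before the last one.
    for entry in re.split(r",\s*|\s+and\s+", text.strip()):
        match = ENTRY.match(entry.strip())
        if match:
            codes[match.group("code")] = match.group("name")
    return codes


excerpt = "Afrikaans (afr), Mandarin Chinese (cmn), Māori (mri), Xhosa (xho) and Zulu (zul)."
languages = parse_language_list(excerpt)


def is_supported(code: str) -> bool:
    """Check a code against the parsed excerpt (not the full list)."""
    return code in languages
```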
Can I upload video files?
Yes, you can upload both audio and video files. The maximum file size for either is 1 GB.
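When preparing many files, it can help to check sizes locally before uploading. The sketch below assumes that 1 GB means 1,000,000,000 bytes; the guide does not say whether the limit is decimal or binary, so treat `MAX_UPLOAD_BYTES` and both function names as illustrative assumptions.

```python
import os

# Assumption: 1 GB = 10**9 bytes. If the limit is binary, it would be 2**30.
MAX_UPLOAD_BYTES = 1_000_000_000


def is_within_upload_limit(size_in_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True for a non-empty file no larger than the limit."""
    return 0 < size_in_bytes <= limit


def can_upload(path: str) -> bool:
    """Check a file on disk against the assumed upload limit."""
    return is_within_upload_limit(os.path.getsize(path))
```

For example, `can_upload("interview.mp4")` would return False for a 1.2 GB recording, which you could then compress or trim before uploading.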
Can I rename speakers?
Renaming speakers
Yes, you can rename speakers by clicking the “edit” button next to the “Speakers” label.