Question 1

What languages does Scribe support?

Accepted Answer

Excellent Accuracy (≤5% Word Error Rate, WER)
Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Indonesian, Italian, Japanese, Kannada, Macedonian, Malay, Malayalam, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, Vietnamese

High Accuracy (>5% to ≤10% WER)
Bengali, Belarusian, Bosnian, Cantonese, Estonian, Filipino, Gujarati, Hungarian, Kazakh, Latvian, Lithuanian, Mandarin, Marathi, Nepali, Odia, Persian, Slovenian, Tamil, Telugu

Good Accuracy (>10% to ≤25% WER)
Afrikaans, Arabic, Armenian, Assamese, Asturian, Azerbaijani, Burmese, Cebuano, Croatian, Georgian, Hausa, Hebrew, Icelandic, Javanese, Kabuverdianu, Korean, Kyrgyz, Lingala, Maltese, Mongolian, Māori, Occitan, Punjabi, Sindhi, Swahili, Tajik, Thai, Urdu, Uzbek, Welsh

Moderate Accuracy (>25% to ≤50% WER)
Amharic, Chichewa, Fulah, Ganda, Igbo, Irish, Khmer, Kurdish, Lao, Luxembourgish, Luo, Northern Sotho, Pashto, Shona, Somali, Umbundu, Wolof, Xhosa, Zulu
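The tiers above are defined by word error rate, which is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that calculation and of the tier boundaries listed above. The function names are illustrative and not part of any ElevenLabs API, and the tokenization used for the published figures is not stated here, so the sketch takes pre-tokenized lists.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of tokens. How to tokenize (by word, by
    character, ...) is left to the caller; that choice is an assumption.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix seen
    # so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


def accuracy_tier(wer):
    """Map a WER fraction (0.059 means 5.9%) to the tier names above."""
    if wer <= 0.05:
        return "Excellent"
    if wer <= 0.10:
        return "High"
    if wer <= 0.25:
        return "Good"
    if wer <= 0.50:
        return "Moderate"
    return "Outside the listed tiers"
```

For example, a reference of six words with one wrong word in the transcript gives a WER of 1/6, which falls in the Good tier.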

Question 2

What is Cantonese speech to text and how does it work?

Accepted Answer

Cantonese speech to text is a technology that transcribes spoken Cantonese into written text using automatic speech recognition (ASR). It processes the audio signal, identifies speech patterns, and converts them into text. Scribe lists Cantonese in its High Accuracy tier (>5% to ≤10% WER).

ElevenLabs' AI-powered speech to text software is designed to transcribe audio and video content with human-like precision, making it ideal for voice-to-text conversion, audio transcription, and real-time speech recognition.

Speech to text technology is used in:
 ✔ Audio-to-text transcription for podcasts, meetings, and interviews.
 ✔ Captions and subtitles in video content.
 ✔ Voice-to-text software for hands-free typing and accessibility tools.

ElevenLabs ASR offers fast, reliable, and highly accurate speech to text conversion for multiple languages and accents.

Question 3

How do I transcribe Cantonese video to text?

Accepted Answer

ElevenLabs video transcription converts spoken Cantonese dialogue into text, making it easy to create subtitles, captions, and searchable transcripts.

Steps to transcribe video to text:
1. Upload your video file to ElevenLabs ASR.
2. Speech recognition technology processes the audio.
3. A transcript is generated automatically, with timestamps.
4. Download the text file or export subtitles for editing.

This AI-powered video transcription model helps content creators, businesses, and educators quickly transcribe video speech into accurate text for accessibility and content repurposing.
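The last step, exporting subtitles from a timestamped transcript, can be sketched in a few lines. This is an illustration only: it assumes the transcript is available as a list of words, each a dictionary with `text`, `start`, and `end` keys (times in seconds). Those key names, the function names, and the grouping rule of a fixed number of words per subtitle are assumptions, not a description of how ElevenLabs builds its subtitle export.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=8, joiner=" "):
    """Group timestamped words into numbered SRT cues.

    words: list of dicts with 'text', 'start', 'end' (assumed shape).
    joiner: string placed between words; pass "" for text written
    without spaces.
    """
    cues = []
    for first in range(0, len(words), max_words):
        group = words[first:first + max_words]
        text = joiner.join(w["text"] for w in group)
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [
        {"text": "你好", "start": 0.0, "end": 0.5},
        {"text": "世界", "start": 0.5, "end": 1.25},
    ]
    print(words_to_srt(sample, max_words=2, joiner=""))
```

Each cue runs from the start of its first word to the end of its last word. A production export would more likely break cues at pauses or sentence boundaries than at a fixed word count.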

Question 4

Does ElevenLabs support real-time speech-to-text conversion?

Accepted Answer

Scribe currently works well for use cases where the input audio is available upfront. A low-latency, real-time version will be released soon.

Question 5

How much does Scribe cost?

Accepted Answer

Scribe costs $0.40 per hour of transcribed audio. With Enterprise plans, the per-hour rate falls well below this at scale.
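As a rough guide, the listed rate can be turned into a cost estimate for a given amount of audio. The sketch below assumes billing is strictly proportional to audio duration, with no rounding, minimum charge, or plan allowance, which may not match actual billing. The function name is illustrative.

```python
PRICE_PER_HOUR_USD = 0.40  # listed rate per hour of transcribed audio


def estimate_cost_usd(duration_seconds, price_per_hour=PRICE_PER_HOUR_USD):
    """Estimate transcription cost, assuming linear per-second billing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    # A 90-minute recording at the listed rate.
    print(f"${estimate_cost_usd(90 * 60):.2f}")
```

Passing a different `price_per_hour` allows the same estimate for a negotiated Enterprise rate.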

Model	FLEURS WER
Scribe v1	5.9%
Deepgram Nova 2	19.3%
Gemini Flash 2	17.6%
Whisper Large v3	13.2%

Free Cantonese Speech to Text Transcription

Every word, perfectly captured

Cantonese Transcription Benchmark

Powerful Cantonese Audio to Text features for your app

Industry-leading accuracy

Smart speaker diarization

Precise word-level timestamps

Dynamic audio tagging

Global language support

Language Overview

Cantonese Language Information

Developers

Integrate ElevenLabs Scribe

AI Speech to Text transcription in 99 languages

Frequently asked questions