
Transcribe Speech to Text with the world's most accurate ASR model
Scribe, our first Speech to Text model, is the world’s most accurate transcription model. Built to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages, featuring word-level timestamps, speaker diarization, and audio-event tagging—all delivered in a structured response for seamless integration.
Scribe is engineered for precision. In FLEURS & Common Voice benchmark tests across 99 languages, it consistently outperforms leading models like Gemini 2.0 Flash, Whisper Large V3 and Deepgram Nova-3. Whether it’s meeting summaries, movie subtitles, or even song lyrics, Scribe delivers the lowest automated transcription word error rate in Italian (98.7% accuracy), English (96.7% accuracy) and 97 other languages.
Scribe makes ASR universally accessible—dramatically reducing errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often exceed 40% word error rates.
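For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the evaluation code behind the benchmarks above. The lowercasing and whitespace tokenization are assumptions, and reading the accuracy figures as 1 minus WER is likewise an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}")
```

Under this definition, a transcript with one wrong word out of every thirty reference words has a WER of about 3.3%.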
Developers can integrate Scribe today via our Speech to Text API to get structured JSON transcripts with speaker diarization, word-level timestamps, and non-speech event markers (e.g. laughter). A low-latency version for real-time applications will be released soon.
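As a rough illustration of working with such a structured transcript, the sketch below groups word-level entries into per-speaker segments. The JSON shape and field names (`words`, `text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for this example, as is the sample data; consult the API documentation for the actual response schema.

```python
import json
from typing import Dict, List

# Hypothetical response body. The field names are assumed for illustration
# and are not confirmed by this post.
SAMPLE_RESPONSE = """
{
  "language_code": "en",
  "words": [
    {"text": "Hello", "start": 0.0, "end": 0.4, "type": "word", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.45, "end": 0.8, "type": "word", "speaker_id": "speaker_0"},
    {"text": "(laughter)", "start": 0.9, "end": 1.6, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hi", "start": 1.7, "end": 1.9, "type": "word", "speaker_id": "speaker_1"}
  ]
}
"""


def group_by_speaker(transcript: dict) -> List[Dict]:
    """Merge consecutive entries from the same speaker into one segment."""
    segments: List[Dict] = []
    for item in transcript["words"]:
        speaker = item["speaker_id"]
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker as the previous entry: extend the open segment.
            segments[-1]["text"] += " " + item["text"]
            segments[-1]["end"] = item["end"]
        else:
            # Speaker changed (or first entry): start a new segment.
            segments.append({
                "speaker": speaker,
                "start": item["start"],
                "end": item["end"],
                "text": item["text"],
            })
    return segments


transcript = json.loads(SAMPLE_RESPONSE)
segments = group_by_speaker(transcript)
# Non-speech markers such as laughter, selected by the assumed "type" field.
audio_events = [w for w in transcript["words"] if w["type"] == "audio_event"]

for seg in segments:
    print(f'[{seg["start"]:.2f}-{seg["end"]:.2f}] {seg["speaker"]}: {seg["text"]}')
```

Keeping the grouping logic separate from the request itself means the same function can be reused for subtitles or meeting notes once the real field names are substituted.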
Creators and businesses can use Scribe directly via the ElevenLabs dashboard to upload audio or video files and generate formatted transcripts.
Start building with Scribe:
API Documentation | Try in the ElevenLabs Dashboard
Research lead, training, architecture
Flavio Schneider
Project lead, pre-training data, fine-tuning data
Tim von Känel
Inference, Optimizations
Maximiliano Levi
Research Contributors
Johan Nordberg, Piotr Dabkowski
Frontend
Austin Malerba
Backend
Hristo Stoychev
Data Acquisition
Alex George