Meet Scribe

Written by: Tim von Känel; Flavio Schneider
Published: Feb 26, 2025


Scribe, our first Speech to Text model, is the world’s most accurate transcription model. Built to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages, featuring word-level timestamps, speaker diarization, and audio-event tagging—all delivered in a structured response for seamless integration.

Scribe is engineered for precision. In FLEURS and Common Voice benchmark tests across 99 languages, it consistently outperforms leading models like Gemini 2.0 Flash, Whisper Large V3 and Deepgram Nova-3. Whether it’s meeting summaries, movie subtitles, or even song lyrics, Scribe delivers the highest automated transcription accuracy in Italian (98.7%), English (96.7%) and 97 other languages.
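For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. An accuracy figure is then 100% minus WER. The snippet below is a minimal sketch of that standard definition, not the evaluation code behind the benchmarks in this article. The function name is illustrative, and real evaluations usually normalize casing and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Because WER counts insertions as well as substitutions and deletions, it can exceed 100% when the output is much longer than the reference.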

Scribe makes ASR universally accessible—dramatically reducing errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often exceed 40% word error rates.

The world's most accurate ASR model, by ElevenLabs.

Developers can integrate Scribe today via our Speech to Text API to get structured JSON transcripts with speaker diarization, word-level timestamps, and non-speech event markers (e.g. laughter). A low-latency version for real-time applications will be released soon.
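To illustrate how a structured transcript of this kind might be consumed, the sketch below groups word-level entries into speaker turns. The response shape shown here is an assumption made for illustration: the field names `words`, `text`, `start`, `end`, `type`, and `speaker_id`, and the sample values, are hypothetical, so check the API documentation for the actual schema. No network call is made.

```python
from typing import Any

# Hypothetical transcript payload; field names and values are assumed.
SAMPLE_RESPONSE: dict[str, Any] = {
    "words": [
        {"text": "Hello", "start": 0.0, "end": 0.4,
         "type": "word", "speaker_id": "speaker_0"},
        {"text": "there", "start": 0.45, "end": 0.8,
         "type": "word", "speaker_id": "speaker_0"},
        {"text": "(laughter)", "start": 0.9, "end": 1.6,
         "type": "audio_event", "speaker_id": "speaker_1"},
        {"text": "Hi", "start": 1.7, "end": 1.9,
         "type": "word", "speaker_id": "speaker_1"},
    ]
}


def speaker_turns(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge consecutive entries from the same speaker into one turn."""
    turns: list[dict[str, Any]] = []
    for item in response["words"]:
        speaker = item.get("speaker_id")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + item["text"]
            turns[-1]["end"] = item["end"]
        else:
            # Speaker changed: open a new turn at this entry's start time.
            turns.append({
                "speaker": speaker,
                "start": item["start"],
                "end": item["end"],
                "text": item["text"],
            })
    return turns


for turn in speaker_turns(SAMPLE_RESPONSE):
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] '
          f'{turn["speaker"]}: {turn["text"]}')
```

Keeping non-speech events inline, as this sketch does, preserves their position in the conversation. A subtitle pipeline could instead filter on the assumed `type` field to drop or restyle them.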

Creators and businesses can use Scribe directly via the ElevenLabs dashboard to upload audio or video files and generate formatted transcripts.

Start building with Scribe:

API Documentation | Try in the ElevenLabs Dashboard

Benchmarks

FLEURS - Word Error Rate % - 102 Languages

Bar chart comparing word error rates for different languages and speech recognition models.

Common Voice - Word Error Rate % - 102 Languages

Bar chart comparing word error rates for different speech recognition models across various languages.

Contributions

Research lead, training, architecture: Flavio Schneider

Project lead, pre-training data, fine-tuning data: Tim von Känel

Inference, Optimizations: Maximiliano Levi

Research Contributors: Johan Nordberg, Piotr Dabkowski

Frontend: Austin Malerba

Backend: Hristo Stoychev

Data Acquisition: Alex George
