Meet Scribe

Transcribe speech to text with the world's most accurate ASR model

Introducing Scribe v1, the world's most accurate speech-to-text model.

Scribe, our first Speech to Text model, is the world’s most accurate transcription model. Built to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages and provides word-level timestamps, speaker diarization, and audio-event tagging, all delivered in a structured response for seamless integration.

Scribe is engineered for precision. In FLEURS and Common Voice benchmark tests across 99 languages, it consistently outperforms leading models like Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. Whether it’s meeting summaries, movie subtitles, or even song lyrics, Scribe delivers the lowest automated-transcription word error rate in Italian (98.7% accuracy), English (96.7% accuracy), and 97 other languages.

Scribe makes ASR universally accessible by dramatically reducing errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often exceed a 40% word error rate.
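
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate (WER) can be computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name, the lowercase-and-split normalization, and the sample sentences are assumptions for illustration; benchmark suites such as FLEURS and Common Voice typically apply more careful text normalization, and this is not the evaluation code behind the numbers above. The sketch also assumes that accuracy is reported as 1 minus WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Simplistic normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: two substituted words out of nine.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Under this definition, a 40% word error rate means roughly two in every five reference words are substituted, dropped, or accompanied by a spurious insertion.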

The world's most accurate ASR model by ElevenLabs.

Developers can integrate Scribe today via our Speech to Text API to get structured JSON transcripts with speaker diarization, word-level timestamps, and non-speech event markers (e.g. laughter). A low-latency version for real-time applications will be released soon.
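
To give a feel for working with a structured transcript of this kind, here is a minimal sketch that groups word-level tokens into speaker turns and pulls out non-speech events. It makes no network call. The response shape and every field name in it (`words`, `type`, `speaker_id`, `start`, `end`, and the `audio_event` and `spacing` token types) are assumptions for illustration, not a confirmed schema; check the API documentation for the actual format.

```python
# Hypothetical transcript payload; the shape is assumed for illustration only.
sample_response = {
    "language_code": "en",
    "text": "Hello there. (laughter) Hi!",
    "words": [
        {"text": "Hello", "start": 0.00, "end": 0.42, "type": "word", "speaker_id": "speaker_0"},
        {"text": " ", "start": 0.42, "end": 0.48, "type": "spacing", "speaker_id": "speaker_0"},
        {"text": "there.", "start": 0.48, "end": 0.90, "type": "word", "speaker_id": "speaker_0"},
        {"text": "(laughter)", "start": 1.10, "end": 1.80, "type": "audio_event", "speaker_id": "speaker_1"},
        {"text": "Hi!", "start": 2.00, "end": 2.30, "type": "word", "speaker_id": "speaker_1"},
    ],
}


def speaker_turns(response: dict) -> list[dict]:
    """Merge consecutive spoken tokens from the same speaker into one turn."""
    turns: list[dict] = []
    for token in response["words"]:
        if token["type"] == "audio_event":
            continue  # non-speech events are reported separately
        same_speaker = bool(turns) and turns[-1]["speaker"] == token["speaker_id"]
        if same_speaker:
            turns[-1]["text"] += token["text"]
            turns[-1]["end"] = token["end"]
        elif token["type"] == "word":  # never open a turn on whitespace
            turns.append({
                "speaker": token["speaker_id"],
                "start": token["start"],
                "end": token["end"],
                "text": token["text"],
            })
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns


def audio_events(response: dict) -> list[tuple[float, str]]:
    """Return (start_time, label) pairs for non-speech markers such as laughter."""
    return [(t["start"], t["text"]) for t in response["words"] if t["type"] == "audio_event"]


for turn in speaker_turns(sample_response):
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] {turn["speaker"]}: {turn["text"]}')
print(audio_events(sample_response))
```

Keeping audio events out of the speaker turns is a design choice for this sketch; a subtitle generator might instead prefer to render them inline at their timestamps.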

Creators and businesses can use Scribe directly via the ElevenLabs dashboard to upload audio or video files and generate formatted transcripts.

Start building with Scribe:

API Documentation | Try in the ElevenLabs Dashboard

Benchmarks

FLEURS - Word Error Rate % - 102 Languages

Bar chart comparing word error rates for different languages and speech recognition models.

Common Voice - Word Error Rate % - 102 Languages

Bar chart comparing word error rates for different voice recognition models across various countries.

Contributions

Research lead, training, architecture

Flavio Schneider

Project lead, pre-training data, fine-tuning data

Tim von Känel

Inference, Optimizations

Maximiliano Levi

Research Contributors

Johan Nordberg, Piotr Dabkowski

Frontend

Austin Malerba

Backend

Hristo Stoychev

Data Acquisition

Alex George
