Speech to Text
The most accurate Speech to Text models
Scribe is the most accurate Speech to Text model, and Scribe v2 Realtime sets the benchmark for live transcription, powering agents and real-time applications. Both are available via the API.
Transcribe live speech in under 150 ms with Scribe v2 Realtime
Scribe v2 Realtime uses ElevenLabs’ streaming-first architecture to turn live speech into text instantly, across 90 languages.

Transcribe live speech
Scribe v2 Realtime captures live speech in under 150 ms with exceptional accuracy, built for agents, meetings, and other applications that demand instant understanding.
High accuracy and ultra-low latency
Scribe v2 Realtime delivers industry-leading accuracy with sub-150 ms latency, setting a new benchmark for real-time speech recognition.
Voice Activity Detection
Automatically detect when speech starts and stops, segmenting speech with precision for smoother live processing.
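For readers curious what detecting speech start and stop involves, here is a minimal sketch of a generic energy-threshold detector. It is not Scribe's implementation, which this page does not describe; the 20 ms frame size, the 0.02 RMS threshold, the five-frame silence hangover, and the function names are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def frame_rms(samples: List[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech_segments(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,          # illustrative frame size
    threshold: float = 0.02,     # illustrative RMS threshold
    min_silence_frames: int = 5, # quiet frames needed to close a segment
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame energy exceeds threshold.

    A segment closes only after min_silence_frames consecutive quiet frames
    (a "hangover"), so brief pauses inside a phrase do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame is shorter than one sample")
    segments: List[Tuple[float, float]] = []
    start = None
    silent_run = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = frame_rms(frame) >= threshold
        if voiced:
            if start is None:
                start = i * frame_ms / 1000  # speech starts here
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # End the segment where the last voiced frame ends.
                end = (i - silent_run + 1) * frame_ms / 1000
                segments.append((start, end))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-segment: close it at the last voiced frame.
        segments.append((start, (n_frames - silent_run) * frame_ms / 1000))
    return segments
```

Real detectors typically use trained models rather than a fixed energy threshold, which is easily fooled by background noise.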
Transcribe in 90 languages
Scribe v2 Realtime delivers exceptional accuracy across accents, dialects, and recording conditions.
Live in the API
Build Scribe v2 Realtime into your products with the API, with full streaming support and commit control.
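As a rough sketch of what streaming with commit control can look like on the client side, the code below splits 16 kHz, 16-bit mono PCM into 100 ms chunks (16,000 samples/s × 2 bytes × 0.1 s = 3,200 bytes each) and marks only the final chunk as a commit. The JSON field names are hypothetical placeholders, not the documented Scribe v2 Realtime schema, and the WebSocket transport itself is left out.

```python
import base64
import json
from typing import Iterator, List


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Split mono PCM audio into fixed-duration chunks (last may be shorter)."""
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def build_messages(pcm: bytes, sample_rate: int = 16000,
                   chunk_ms: int = 100) -> List[str]:
    """Wrap each chunk in a JSON message; only the last one requests a commit.

    NOTE: "message_type", "audio_base_64", "commit", and "sample_rate" are
    hypothetical field names for illustration; check the API reference.
    """
    chunks = list(pcm_chunks(pcm, sample_rate, chunk_ms))
    messages = []
    for i, chunk in enumerate(chunks):
        messages.append(json.dumps({
            "message_type": "input_audio_chunk",
            "audio_base_64": base64.b64encode(chunk).decode("ascii"),
            "commit": i == len(chunks) - 1,  # finalize after the last chunk
            "sample_rate": sample_rate,
        }))
    return messages
```

Committing only at the end is the simplest policy; a live app might instead commit whenever voice activity detection reports the end of an utterance.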
Transcribe, caption and edit audio and video content with Scribe v1
Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy in Studio or via API.
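If a transcript includes per-word start and end times, turning it into subtitles is mostly formatting. This minimal sketch assumes each word is a dict with "text", "start", and "end" (seconds) keys, which is an assumption about the response shape, and groups words into SRT cues by a fixed word count. Production captioning would also split on pauses and line length.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timed words (assumed "text"/"start"/"end" keys) into SRT cues."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        group = words[i:i + max_words]
        text = " ".join(w["text"] for w in group)
        start = srt_timestamp(group[0]["start"])
        end = srt_timestamp(group[-1]["end"])
        # Each cue: index, time range, text; cues are separated by a blank line.
        cues.append(f"{n}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```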



Transcribe audio and video
Upload audio or video in any format, including MP4, MOV, MP3, and WAV. Scribe v1 automatically converts speech into precise text, ready for captions, subtitles, or editing.
Over 95% transcription accuracy
Scribe achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.
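The page does not define how accuracy is measured. Speech recognition accuracy is commonly reported as 1 minus the word error rate (WER), where WER = (substitutions + deletions + insertions) / number of reference words. Assuming that convention, 95% accuracy corresponds to a WER of 5%, or roughly one error per 20 reference words. A minimal word-level edit-distance sketch (the function name is illustrative):

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and hyp[:j]; starts as the row for an empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Benchmarks usually normalize case and punctuation before scoring, so reported figures depend on that preprocessing as well.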
Powerful transcription tools
Edit and finalize transcripts directly in ElevenLabs, or use our managed services team to reach 100% accuracy.
Dynamic audio tagging
From laughter to footsteps, Scribe tags every sound event, enriching your transcripts with the full context.
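Assuming the transcript arrives as a list of timed tokens, each with a "type" such as "word", "spacing", or "audio_event" (an assumption about the response shape that this page does not confirm), tagged sound events can be pulled out, or left out of the spoken text, with a couple of list comprehensions. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_audio_events(tokens: List[Dict]) -> List[Tuple[float, str]]:
    """Return (start_seconds, label) for each token typed as an audio event.

    Assumes "type", "text", and "start" keys on every token.
    """
    return [(t["start"], t["text"]) for t in tokens
            if t.get("type") == "audio_event"]


def spoken_text(tokens: List[Dict]) -> str:
    """Join only tokens typed as spoken words, dropping events and spacing."""
    return " ".join(t["text"] for t in tokens if t.get("type") == "word")
```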
Smart speaker diarization
In any conversation, even the busiest ones, Scribe intuitively distinguishes and labels every speaker.
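Diarized output is easiest to read as speaker turns. This sketch assumes each spoken token carries "text", "type", and a "speaker_id" label such as "speaker_0"; those field names are assumptions, not taken from this page. It merges consecutive words from the same speaker into one turn.

```python
from typing import Dict, List, Tuple


def speaker_turns(tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Merge consecutive spoken words from the same speaker into turns.

    Assumes "text", "type", and "speaker_id" keys; non-word tokens are skipped.
    """
    turns: List[Tuple[str, List[str]]] = []
    for t in tokens:
        if t.get("type") != "word":
            continue
        speaker = t["speaker_id"]
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(t["text"])  # same speaker keeps talking
        else:
            turns.append((speaker, [t["text"]]))  # speaker change: new turn
    return [(speaker, " ".join(words)) for speaker, words in turns]
```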
Enterprise-grade security and infrastructure at scale
Built for every workflow, from API to agents
Speech to Text APIs and SDKs
Integrate Scribe v1 and Scribe v2 Realtime into your product with the API or SDKs.
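As a minimal sketch of calling the file transcription endpoint with only the Python standard library, the function below builds (but does not send) a multipart POST request. The URL, the "xi-api-key" header, the "scribe_v1" model ID, and the "file" field follow ElevenLabs' public API reference as recalled here; treat them as assumptions and confirm them against the current docs. The function name and its "fields" parameter are hypothetical.

```python
import urllib.request
import uuid
from typing import Dict, Optional

# Assumed endpoint; confirm against the current ElevenLabs API reference.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_transcription_request(
    api_key: str,
    audio: bytes,
    filename: str,
    model_id: str = "scribe_v1",  # assumed model ID for Scribe v1
    fields: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST for file transcription.

    Hypothetical helper: extra form fields are passed through unchecked.
    """
    boundary = uuid.uuid4().hex
    form = {"model_id": model_id, **(fields or {})}
    parts = []
    # One text part per form field.
    for name, value in form.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio goes in a binary "file" part.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        STT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually transcribe, pass the request to urllib.request.urlopen with a real API key and parse the JSON response. Optional settings such as a language code can go in "fields"; check the reference for their exact names.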

ElevenLabs Agents
Enable real-time voice interactions with instant, low-latency transcription.
ElevenLabs Studio
Convert recordings into editable text, captions, and repurposable content.

Frequently asked questions
AI Speech to Text transcription in 99 languages
Our AI speech to text transcription supports 99 languages. Just select the language and upload your audio file.
