Speech to Text
The most accurate Speech to Text models
Scribe v2 is the most accurate Speech to Text model. Scribe v2 Realtime sets the benchmark for live transcription, powering agents and real-time applications. Both are available via API.
Real-time Speech to Text in under 150 ms with Scribe v2 Realtime
Scribe v2 Realtime uses ElevenLabs’ streaming-first architecture to turn live speech into text instantly, across 90+ languages.

Transcribe live speech
Scribe v2 Realtime captures live speech in under 150 ms with exceptional accuracy – built for meetings and AI agents that demand instant understanding.
High accuracy and ultra-low latency
Scribe v2 Realtime delivers industry-leading accuracy with sub-150 ms latency, setting a new benchmark for real-time speech recognition.
Voice Activity Detection
Automatically detect when speech starts and stops, segmenting speech with precision for smoother live processing.
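To illustrate what segmenting speech by start and stop points means, here is a minimal sketch of a simple energy-threshold detector. It is not how Scribe v2 Realtime implements Voice Activity Detection, which this page does not describe. The function names, the frame length, and the threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 10 ms frames at 16 kHz
    threshold: float = 0.1,    # assumed energy threshold, tune per input
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs whose energy meets the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        is_speech = frame_rms(samples[i:i + frame_len]) >= threshold
        if is_speech and start is None:
            start = i                      # speech begins at this frame
        elif not is_speech and start is not None:
            segments.append((start, i))    # speech ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # audio ended mid-speech
    return segments


if __name__ == "__main__":
    audio = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
    print(detect_speech_segments(audio))
```

Production detectors typically add smoothing and a hangover period so that short pauses do not split a single utterance. This sketch omits both to keep the idea visible.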
Transcribe in 90+ languages
Scribe v2 Realtime delivers exceptional accuracy across accents, dialects, and recording conditions.
Live in the API
Build Scribe v2 Realtime into your products with the API, with full streaming support and commit control.
Convert speech to text, caption, and edit audio and video with Scribe v2
Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy in Studio or via API.



Transcribe audio and video
Upload audio or video in formats including MP4, MOV, MP3, WAV, and more. Scribe v2 automatically converts speech into precise text, ready for captions, subtitles, or editing.
Industry-leading transcription accuracy
Scribe v2 achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.
Keyterm prompting
Select up to 100 specific words or sentences for Scribe to accurately transcribe based on context.
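Before sending a keyterm list, it can help to clean it and check it against the 100-item limit stated above. The sketch below does only that. The helper name `prepare_keyterms` is hypothetical, and this page does not show how the Scribe v2 API accepts keyterms, so no request is made here.

```python
from typing import Iterable, List

MAX_KEYTERMS = 100  # limit stated on this page


def prepare_keyterms(terms: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Trim whitespace, drop empty and duplicate terms, and enforce the limit.

    Hypothetical helper for illustration; not part of any ElevenLabs SDK.
    """
    seen = set()
    cleaned: List[str] = []
    for term in terms:
        normalized = " ".join(term.split())   # collapse internal whitespace
        key = normalized.casefold()           # case-insensitive duplicate check
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} keyterms given, limit is {limit}")
    return cleaned


if __name__ == "__main__":
    print(prepare_keyterms(["  ElevenLabs ", "elevenlabs", "", "Scribe  v2"]))
```

Raising an error instead of silently truncating the list keeps the choice of which terms to drop with the caller.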
Dynamic audio tagging
From laughter to footsteps, Scribe v2 tags every sound event, enriching your transcripts with the full context.
Speaker & entity detection
Scribe v2 distinguishes and labels each speaker and provides timestamps for detected entities.
Enterprise-grade security and infrastructure at scale

Built for every workflow, from API to agents
Speech to Text APIs and SDKs
Integrate Scribe v2 and Scribe v2 Realtime into your product with the API or SDKs.

ElevenLabs Agents
Enable real-time voice interactions with instant, low-latency transcription.
ElevenLabs Studio
Convert recordings into editable text, captions, and repurposable content.

Frequently asked questions
AI Speech to Text transcription across 90+ languages
Our AI speech to text transcription supports 90+ languages. Select the language and upload your audio file.

