
Realtime Speech to Text
Transcribe live speech instantly
Scribe v2 Realtime is the most accurate real-time transcription model, with ~150 ms latency and support for 90+ languages. Available via API.
Introducing Scribe v2 Realtime, built for speed and accuracy
Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for agents, meetings, and conversational AI.
High Accuracy
Trained on diverse global data and fine-tuned for natural speech, Scribe achieves industry-best word error rates (WER) across major languages and accents.
Ultra-low Latency
Stream audio and receive transcriptions in ~150 ms, enabling real-time understanding for live agents, meetings, and conversational AI.
Real-time speech for agents, apps, and every language

Purpose-built for agents and voice apps
Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.
Capture speech accurately in 90 languages
Scribe v2 Realtime delivers consistent, exceptional accuracy across 90 languages and handles diverse accents, dialects, and acoustic conditions with ease.
Multiple audio formats
Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
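μ-law is the 8-bit companding scheme from ITU-T G.711, widely used for 8 kHz telephony audio in North America and Japan. As a general illustration of what that encoding does (not of how Scribe v2 Realtime expects μ-law audio to be framed or sent, which this page does not describe), here is a minimal sketch of the standard G.711 encode/decode pair for signed 16-bit PCM. The function names are hypothetical.

```python
import struct

# Standard G.711 mu-law constants for 16-bit linear PCM input.
BIAS = 0x84   # 132, added to the magnitude before finding the segment
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number: how many bit positions the highest set bit sits above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    # Four mantissa bits taken just below the highest set bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law byte to the midpoint of its quantization step."""
    u = ~code & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def pcm16le_to_ulaw(data: bytes) -> bytes:
    """Convert little-endian 16-bit PCM bytes to mu-law, one byte per sample."""
    count = len(data) // 2  # a trailing odd byte, if any, is ignored
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

For example, `pcm16le_to_ulaw(struct.pack("<3h", 0, 32767, -32768))` returns `b"\xff\x80\x00"`: silence maps to `0xFF` and the two extremes map to `0x80` and `0x00`. Halving the bytes per sample is what makes μ-law attractive on bandwidth-constrained telephony links, at the cost of coarser quantization for loud samples.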
Voice Activity Detection
Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
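This page does not describe how Scribe v2 Realtime's voice activity detection works internally. To illustrate the general idea of finding where speech starts and stops, here is a minimal energy-threshold VAD sketch. It is a swapped-in textbook technique, not the model's method, and the `threshold` and `hangover_frames` values are hypothetical parameters that would need tuning for real audio.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_speech_segments(
    samples: Sequence[int],
    frame_len: int,
    threshold: float,
    hangover_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of speech, end exclusive.

    A segment opens on the first frame whose RMS reaches `threshold` and
    closes once `hangover_frames` consecutive quiet frames follow, so short
    pauses inside an utterance do not split it.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # frame index where the current segment began
    quiet = 0     # consecutive quiet frames since the last speech frame
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover_frames:
                end = i - quiet + 1  # first quiet frame after the speech
                segments.append((start * frame_len, end * frame_len))
                start, quiet = None, 0
    if start is not None:  # speech ran to the end of the buffer
        segments.append((start * frame_len, (n_frames - quiet) * frame_len))
    return segments
```

The hangover counter is the key design choice: without it, every brief pause between words would end a segment. A production VAD would typically use a trained classifier rather than raw energy, which is easily fooled by background noise.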
Manual commit control
Lets developers decide when to finalize transcripts, ideal for custom streaming logic and fine-tuning accuracy.
Speech recognition engineered for real-time performance

Built on a new generation of models
Building on Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.
Predictive transcription for low latency
Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation, keeping transcripts accurate in real time.
Complex vocabulary
Built-in support for complex vocabulary, including technical language, medication names, and proper nouns.
Streaming support
Send audio in continuous chunks and receive live transcriptions instantly: no buffering, just real-time understanding.
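The page does not say how large each streamed chunk should be or how it is wrapped for transport. As a hedged illustration of the arithmetic involved, the sketch below computes how many bytes of raw audio cover a given duration and slices a buffer into fixed-duration chunks. The function names and the chunk durations used in the examples are assumptions, and the actual sending step (for example, the WebSocket message format) is deliberately not shown.

```python
from typing import Iterator


def chunk_size_bytes(
    sample_rate_hz: int,
    chunk_ms: int,
    channels: int = 1,
    bytes_per_sample: int = 2,  # 16-bit PCM; use 1 for 8-bit mu-law
) -> int:
    """Bytes of raw audio that cover `chunk_ms` milliseconds."""
    frames = sample_rate_hz * chunk_ms // 1000  # whole sample frames only
    return frames * channels * bytes_per_sample


def iter_chunks(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive `size`-byte chunks; the last one may be shorter."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]
```

For instance, 100 ms of 16 kHz mono 16-bit PCM is `chunk_size_bytes(16000, 100) == 3200` bytes, while 20 ms of 8 kHz μ-law is `chunk_size_bytes(8000, 20, bytes_per_sample=1) == 160` bytes. Counting whole sample frames before multiplying guarantees a chunk boundary never splits a sample.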
Text conditioning
Scribe v2 Realtime continues transcription seamlessly, even when the connection resets.
Enterprise-grade security and infrastructure at scale
Unmatched accuracy, even in the most complex environments

Natural Speech
Filler words, pauses, and emotional cues

Low-quality audio
Background noise or low-bandwidth audio

Accents
Diverse accents and pronunciations

Domain terms
Acronyms, brands, financial or medical terms
Built for every workflow, from agents to production
ElevenLabs Agents
Power real-time voice interactions and conversational AI with instant, low-latency transcription. Scribe v2 Realtime enables agents to listen, understand, and respond faster than ever.

Scribe Realtime API
Integrate ultra-fast Speech-to-Text directly into your product with a simple WebSocket or REST API. Stream audio as it happens and receive accurate text in ~150 ms.

Flexible pricing based on your needs
Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.
$0.28 per hour or less
on annual Business plans
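To turn the headline rate into a budget estimate, multiply hours of audio by the hourly price. The sketch below assumes billing is strictly proportional to audio duration with no minimums or rounding, which this page does not state; $0.28 is the annual Business plan figure quoted above, and the usage numbers are hypothetical.

```python
def transcription_cost_usd(audio_hours: float, rate_per_hour: float = 0.28) -> float:
    """Estimated cost, assuming billing is linear in audio duration."""
    return audio_hours * rate_per_hour


# Hypothetical workload: 10 hours of audio per day over a 30-day month.
monthly_hours = 10 * 30
print(f"{monthly_hours} h -> ${transcription_cost_usd(monthly_hours):.2f}")  # 300 h -> $84.00
```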


