Scribe v2 Realtime: the most accurate model for live transcription
Scribe v2 Realtime sets a new standard for low-latency Speech to Text.
Designed for live use cases such as voice agents, meeting assistants, and real-time captioning, it transcribes speech in under 150 ms across 90 languages, including English, French, German, Italian, Spanish, and Portuguese.
Scribe v2 Realtime is specifically built for agentic use cases. On 500 hard samples containing background noise and complex information, it significantly outperforms all other models.
Key features
- Negative latency: Next word and punctuation prediction
- Automatic language detection: Speak in any supported language and switch languages mid-conversation
- Text conditioning: Scribe v2 Realtime continues the transcription based on the previous batch, useful when restarting a connection
- Voice Activity Detection (VAD)
- Manual commit: Full control over when to finalize transcript segments
- Multiple audio formats: Support for PCM (48 kHz) and μ-law encoding
- Enterprise ready: SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR compliance; EU and India data residency options; and a zero retention mode for sensitive workloads
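To make a few of these features concrete, here is a minimal Python sketch of how a client might frame PCM audio for streaming, flag a manual commit on the final chunk, and pass earlier transcript text along when reconnecting. It is an illustration under stated assumptions, not the actual wire format: the message field names (`type`, `audio_base64`, `sample_rate`, `commit`, `previous_text`), the 100 ms chunk length, and the 16-bit mono sample layout are all hypothetical. Check the documentation for the real message schema.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 48_000  # PCM rate named in the feature list
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
CHUNK_MS = 100           # assumption: 100 ms of audio per message


def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Bytes of PCM audio covering chunk_ms milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def build_messages(pcm: bytes, previous_text: str = "") -> Iterator[str]:
    """Yield one JSON message per audio chunk.

    All field names below are hypothetical placeholders. The last chunk
    carries commit=True (manual commit); the first chunk optionally carries
    text from an earlier session (text conditioning after a reconnect).
    """
    size = chunk_size_bytes()
    offsets = range(0, len(pcm), size)
    for index, start in enumerate(offsets):
        message = {
            "type": "audio_chunk",  # hypothetical message type
            "audio_base64": base64.b64encode(pcm[start:start + size]).decode("ascii"),
            "sample_rate": SAMPLE_RATE_HZ,
            "commit": index == len(offsets) - 1,  # finalize on the last chunk
        }
        if index == 0 and previous_text:
            message["previous_text"] = previous_text
        yield json.dumps(message)


# Usage: 250 ms of silence becomes two full chunks plus one partial chunk.
silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 4)
messages = [json.loads(m) for m in build_messages(silence, previous_text="Hello,")]
print(len(messages), [m["commit"] for m in messages])
```

Smaller chunks reduce the delay before audio reaches the model but increase the number of messages sent. In a real client, each yielded string would be sent over the streaming connection in order.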
Scribe v2 Realtime delivers human-level understanding in real time, enabling natural conversation and immediate response in live environments. It achieves 93.5% accuracy across 30 commonly used European and Asian languages.
Build with the API
Scribe v2 Realtime is available today through the ElevenLabs API.
Explore the documentation: https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming
Use Scribe v2 Realtime in ElevenLabs Agents
Deploy natural, human-sounding agents powered by Scribe v2 Realtime. Build voice assistants for support, sales, or in-product experiences that can understand and respond in real time.
Learn more: https://elevenlabs.io/agents
Start building today
Use Scribe v2 Realtime through our API or directly within ElevenLabs Agents.
Sign up here: https://elevenlabs.io/app/sign-up