When will Scribe v2 Realtime be integrated into your Agents Platform?

Scribe v2 Realtime is already integrated with the Agents Platform as an option, but it is not the default model yet.

What is the concurrency limit for Scribe v2 Realtime?

It will be 30+ for enterprise clients, similar to Turbo/Flash TTS.
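If you open many sessions at once, it can help to cap concurrency on the client side so you stay under your plan's limit. The sketch below assumes a limit of 30 purely for illustration; `run_limited` and the job callables are hypothetical names, not part of any ElevenLabs SDK.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_limited(
    jobs: Sequence[Callable[[], Awaitable[str]]],
    limit: int = 30,  # assumed cap; use the limit that applies to your account
) -> List[str]:
    """Run all jobs, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    # gather() preserves the order of the input jobs in its results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job would be a coroutine function that opens one transcription session and returns its transcript.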

Do you offer speaker diarization?

Not for realtime at the moment. Some providers that offer realtime speaker diarization, such as Deepgram, have major issues with non-English languages, and diarization isn’t a priority for a realtime model right now.

Will you support dual channels with Scribe v2 Realtime?

No, dual channel support is not planned.

Realtime Speech to Text

Transcribe live speech instantly

Scribe v2 Realtime is the most accurate real-time transcription model, with ~150 ms latency across 90+ languages. Available via API.

Introducing Scribe v2 Realtime, built for speed and accuracy

Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for agents, meetings, and conversational AI.

High Accuracy

Trained on diverse global data and fine-tuned for natural speech, Scribe achieves industry-best Word Error Rates across major languages and accents.

Ultra-low Latency

Stream audio and receive transcriptions in ~150 ms, enabling real-time understanding for live agents, meetings, and conversational AI.

Real-time speech for agents, apps, and every language

Live call (example transcript): “I’m happy to help. What’s your address?” “It’s john.doe@me.com.” “Thanks. And your phone number?” “1-800-404”

Purpose-built for Agents and voice apps

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.

Japanese, Hindi, Polish, Swedish, Mandarin, Vietnamese, French

Capture speech accurately in 90 languages

Scribe v2 Realtime ensures consistent understanding everywhere, delivering exceptional accuracy across 90 languages, handling diverse accents, dialects, and acoustic conditions with ease.

Multiple audio formats

Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
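For readers unfamiliar with μ-law, here is a minimal sketch of the companding curve it is based on, with μ = 255. This is the continuous textbook formula applied to samples normalized to [-1, 1]; telephony codecs such as G.711 use a segmented 8-bit approximation of this curve, so the functions below illustrate the idea and do not produce wire-format bytes. The function names are illustrative.

```python
import math

MU = 255  # mu value used by 8-bit mu-law telephony


def mulaw_compress(x: float) -> float:
    """Compand a normalized sample in [-1, 1]; quiet samples get more resolution."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress: recover the linear sample from a companded one."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A quiet sample such as 0.01 maps to roughly 0.23 after compression, which is why μ-law preserves low-level speech well at 8 bits per sample.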

Voice Activity Detection

Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
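To make the idea concrete, here is a deliberately simple energy-based detector. It is not how Scribe v2 Realtime detects speech (the model's method is not described here); it only shows what "segmenting audio by speech start and stop" means. The function name, the 20 ms frame size, and the RMS threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.02,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the frame RMS is at or above `threshold`."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame_ms is too short for this sample rate")

    segments: List[Tuple[float, float]] = []
    start = None  # start time of the segment currently open, if any
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        t = i * frame_len / sample_rate  # time at the start of this frame
        if rms >= threshold and start is None:
            start = t  # speech begins
        elif rms < threshold and start is not None:
            segments.append((start, t))  # speech ends
            start = None

    if start is not None:  # audio ended while speech was still active
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments
```

Production detectors are usually model-based and add smoothing so that short pauses do not split a sentence.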

Manual Commit control

Gives developers control over when to finalize transcripts – ideal for custom streaming and fine-tuned accuracy.
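On the client side, manual commit usually means keeping two kinds of text: partial text that may still change, and committed text that is final. The sketch below models only that bookkeeping. `TranscriptBuffer` and its methods are hypothetical names and do not mirror the actual API messages, which you should take from the docs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptBuffer:
    """Hypothetical client-side store for partial and committed transcript text."""

    committed: List[str] = field(default_factory=list)
    partial: str = ""

    def update_partial(self, text: str) -> None:
        # A newer partial hypothesis replaces the previous one entirely.
        self.partial = text

    def commit(self) -> str:
        # Freeze the current partial as a finalized segment, then reset it.
        segment = self.partial.strip()
        if segment:
            self.committed.append(segment)
        self.partial = ""
        return segment

    def text(self) -> str:
        # Everything finalized so far, followed by the still-changing tail.
        tail = [self.partial] if self.partial else []
        return " ".join(self.committed + tail)
```

An application would call `commit()` at the moment it decides a segment is complete, for example at the end of a caller's turn.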

Speech recognition engineered for real-time performance

Built on a new generation of models

Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Predictive transcription for low latency

Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation – enabling real-time accuracy.

Complex vocabulary

Built-in support for complex vocabulary including technical language, medications, and proper nouns.

Streaming support

Send audio in continuous chunks and receive live transcriptions instantly – no buffering, just real-time understanding.
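As a rough illustration of "continuous chunks", the sketch below slices raw PCM into fixed-duration pieces that a client could send one after another. The 16 kHz, 16-bit mono format and the 100 ms chunk length are assumptions for the example, and `pcm_chunks` is an illustrative helper; the transport itself is left out because it depends on the API described in the docs.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,   # assumed: 16 kHz
    bytes_per_sample: int = 2,   # assumed: 16-bit samples
    channels: int = 1,           # assumed: mono
    chunk_ms: int = 100,         # assumed chunk duration
) -> Iterator[bytes]:
    """Yield successive fixed-duration chunks of raw PCM; the last may be shorter."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample * channels
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too short for this audio format")
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
```

With these defaults each chunk is 3,200 bytes, so one second of audio becomes ten chunks.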

Text conditioning

Scribe v2 Realtime continues transcription seamlessly, even when the connection resets.

Enterprise-grade security and infrastructure at scale

Unmatched accuracy, even in the most complex environments

Natural Speech

Filler words, pauses and emotional cues

Low-quality Audio

Background noise or low-bandwidth audio

Accents

Diverse accents and pronunciations

Domain Terms

Acronyms, brands, financial or medical terms

Built for every workflow, from agents to production

ElevenLabs Agents

Power real-time voice interactions and conversational AI with instant, low-latency transcription. Scribe v2 Realtime enables agents to listen, understand, and respond faster than ever.

Scribe Realtime API

Integrate ultra-fast Speech-to-Text directly into your product with a simple WebSocket or REST API. Stream audio as it happens and receive accurate text in under 100 ms.
