Realtime Speech to Text

Transcribe live speech instantly

Scribe v2 Realtime is the most accurate real-time transcription model, with ~150 ms latency across 90+ languages. Available via API.

Introducing Scribe v2 Realtime, built for speed and accuracy

Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for agents, meetings, and conversational AI.

High Accuracy

Trained on diverse global data and fine-tuned for natural speech, Scribe achieves industry-best Word Error Rates across major languages and accents.

Scribe beats all competing models in accuracy benchmarks

Ultra-low Latency

Stream audio and receive transcriptions in ~150 ms, enabling real-time understanding for live agents, meetings, and conversational AI.

Real-time speech for agents, apps, and every language

Live call

I’m happy to help. What’s your email address?
It’s john.doe@me.com
Thanks. And your phone number?
1-800-404

Purpose-built for Agents and voice apps

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.

Japanese, Hindi, Polish, Swedish, Mandarin, Vietnamese, French

Capture speech accurately in 90 languages

Scribe v2 Realtime ensures consistent understanding everywhere, delivering exceptional accuracy across 90 languages, handling diverse accents, dialects, and acoustic conditions with ease.

Multiple audio formats

Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
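To illustrate what μ-law encoding does to a signal, here is a minimal sketch of the continuous μ-law companding curve with μ = 255, applied to samples normalized to [-1.0, 1.0]. It is an illustration only: it is not the bit-exact G.711 byte encoding, it is not taken from the Scribe API, and the function names are hypothetical.

```python
import math

MU = 255.0  # value conventionally used for 8-bit mu-law telephony audio


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1.0, 1.0] onto the mu-law curve (result is also in [-1.0, 1.0])."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress: recover the linear sample from a companded value."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The curve gives quiet samples a larger share of the output range, which is why μ-law is common on 8-bit telephony links, while linear PCM is the usual choice for browser and studio capture.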

Voice Activity Detection

Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
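The idea of segmenting audio by speech activity can be sketched with a simple energy threshold. This is a toy illustration, not the detector Scribe uses; the function names, the fixed frame layout, and the 0.02 threshold are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(
    frames: Sequence[Sequence[float]], threshold: float = 0.02
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, end exclusive, where RMS >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, frame in enumerate(frames):
        active = frame_rms(frame) >= threshold
        if active and start is None:
            start = i  # speech begins
        elif not active and start is not None:
            segments.append((start, i))  # speech ends
            start = None
    if start is not None:
        segments.append((start, len(frames)))  # audio ended mid-speech
    return segments
```

A production detector would typically add smoothing and a hangover period so that short pauses inside a sentence do not split a segment.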

Manual Commit control

Gives developers control over when to finalize transcripts – ideal for custom streaming and fine-tuned accuracy.
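One way to picture manual commit on the client side is a buffer that keeps overwriting a partial transcript until the application decides to finalize it. The sketch below is purely illustrative: `TranscriptBuffer` and its methods are hypothetical names and do not correspond to any message or method in the Scribe API.

```python
from typing import List


class TranscriptBuffer:
    """Holds the latest partial transcript until the caller commits it."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # finalized segments, in order
        self.partial: str = ""          # latest unfinalized text

    def update_partial(self, text: str) -> None:
        """Replace the current partial text with a newer hypothesis."""
        self.partial = text

    def commit(self) -> str:
        """Finalize the current partial text and return it ('' if nothing to commit)."""
        text = self.partial.strip()
        if text:
            self.committed.append(text)
        self.partial = ""
        return text
```

An application might call `commit()` at the end of a speaker turn or after a form field is filled, rather than relying only on automatic segmentation.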

Speech recognition engineered for real-time performance

Built on a new generation of models

Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Predictive transcription for low latency

Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation, keeping latency low without sacrificing accuracy.

Complex vocabulary

Built-in support for complex vocabulary including technical language, medications, and proper nouns.

Streaming support

Send audio in continuous chunks and receive live transcriptions instantly – no buffering, just real-time understanding.
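Sending audio "in continuous chunks" comes down to slicing a PCM byte stream into fixed-duration pieces. The sketch below shows only that arithmetic; the helper names, the 100 ms chunk duration, and the 16-bit mono assumption are illustrative and not taken from the Scribe documentation.

```python
from typing import Iterator


def chunk_size_bytes(
    sample_rate_hz: int, chunk_ms: int, bytes_per_sample: int = 2, channels: int = 1
) -> int:
    """Number of bytes covering chunk_ms of linear PCM audio."""
    return sample_rate_hz * chunk_ms * bytes_per_sample * channels // 1000


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of pcm, each at most `size` bytes long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


# Example: 100 ms of 16 kHz, 16-bit mono PCM.
size = chunk_size_bytes(16_000, 100)
one_second = bytes(16_000 * 2)  # one second of silence as placeholder audio
chunks = list(iter_chunks(one_second, size))
```

Each chunk would then be sent over the streaming connection as soon as it is captured, so smaller chunks trade a little overhead for lower end-to-end delay.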

Text conditioning

Scribe v2 Realtime continues transcription seamlessly, even when the connection resets.

Enterprise-grade security and infrastructure at scale

Unmatched accuracy, even in the most complex environments

  • Natural Speech

    Filler words, pauses and emotional cues

  • Low-quality audio

    Background noise or low-bandwidth audio

  • Accents

    Diverse accents and pronunciations

  • Domain terms

    Acronyms, brands, financial or medical terms

Built for every workflow, from agents to production

ElevenLabs Agents

Power real-time voice interactions and conversational AI with instant, low-latency transcription. Scribe v2 Realtime enables agents to listen, understand, and respond faster than ever.

Scribe Realtime API

Integrate ultra-fast Speech-to-Text directly into your product with a simple WebSocket or REST API. Stream audio as it happens and receive accurate text in ~150 ms.

Flexible pricing based on your needs

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.

$0.28 per hour or lower on annual Business plans
