Speech to Text

The most accurate Speech to Text models

Scribe is the most accurate Speech to Text model. Scribe v2 Realtime sets the benchmark for live transcription, powering agents and real-time applications. Both are available via API.

Scribe v2 Realtime

Transcribe live speech in under 150 ms with Scribe v2 Realtime

Scribe v2 Realtime uses ElevenLabs’ streaming-first architecture to turn live speech into text instantly, across 90 languages.

Live call

I’m happy to help. What’s your email address?
It’s john.doe@me.com
Thanks. And your phone number?
1-800-404

Transcribe live speech

Scribe v2 Realtime captures live speech in under 150 ms with exceptional accuracy – built for voice agents, meetings, and other applications that demand instant understanding.

A bar chart showing Scribe Realtime outperforming Gemini, OpenAI and Deepgram Speech to Text models on accuracy.

High accuracy and ultra-low latency

Scribe v2 Realtime delivers industry-leading accuracy with sub-150 ms latency, setting a new benchmark for real-time speech recognition.

Voice Activity Detection

Automatically detect when speech starts and stops, segmenting speech with precision for smoother live processing.
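
To illustrate the general idea of detecting where speech starts and stops, here is a minimal energy-threshold sketch. It is not how Scribe v2 Realtime works internally (the page does not describe that); the function name, frame size, threshold, and silence window are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,           # assumed frame size
    threshold: float = 0.02,      # assumed RMS level that counts as speech
    min_silence_frames: int = 3,  # assumed number of quiet frames that ends a segment
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS energy reaches the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start = None   # index of the first frame of the current segment
    silence = 0    # consecutive quiet frames seen inside the current segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end_frame = i - silence + 1  # first quiet frame after the speech
                segments.append((start * frame_len / sample_rate,
                                 end_frame * frame_len / sample_rate))
                start, silence = None, 0

    if start is not None:  # audio ended while a segment was still open
        end_frame = n_frames - silence
        segments.append((start * frame_len / sample_rate,
                         end_frame * frame_len / sample_rate))
    return segments
```

Production voice activity detection typically relies on trained models rather than a fixed energy threshold; the sketch only shows what segmenting a stream by speech boundaries means.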

Transcribe in 90 languages

Delivering exceptional accuracy across accents, dialects, and recording conditions.

Live in the API

Build Scribe v2 Realtime into your products with the API, with full streaming support and commit control.

Scribe v1

Transcribe, caption and edit audio and video content with Scribe v1

Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy in Studio or via API.

Amidst the outer atmosphere of the planet Aurora, the sky shimmered with fractured light, as though the planet’s veil were made of stained glass suspended in space.
Sensors pulsed with irregular patterns, the kind no algorithm could quite reconcile.

Transcribe audio and video

Upload audio or video in any format — MP4, MOV, MP3, WAV, and more. Scribe v1 automatically converts speech into precise text, ready for captions, subtitles, or editing.

A bar chart showing Scribe v1 outperforming Gemini, OpenAI and Deepgram Speech to Text models on accuracy.

Over 95% transcription accuracy

Scribe achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.

Powerful transcription tools

Edit and finalize transcripts directly in ElevenLabs, or use our managed services team to get to 100% accuracy.

Dynamic audio tagging

From laughter to footsteps, Scribe tags every sound event, enriching your transcripts with the full context.

Smart speaker diarization

In any conversation, even the busiest ones, Scribe intuitively distinguishes and labels every speaker.
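
As a rough sketch of how speaker labels might be used downstream, the snippet below merges consecutive words from the same speaker into turns. The input shape (a list of dicts with "speaker" and "text" keys) and both function names are hypothetical, chosen for illustration; consult the API reference for the actual response format.

```python
from typing import Dict, List


def group_by_speaker(words: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge consecutive words that share a speaker label into one turn."""
    turns: List[Dict[str, str]] = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({"speaker": word["speaker"], "text": word["text"]})
    return turns


def format_transcript(words: List[Dict[str, str]]) -> str:
    """Render the turns as 'speaker: text' lines."""
    return "\n".join(f'{t["speaker"]}: {t["text"]}' for t in group_by_speaker(words))
```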

Enterprise-grade security and infrastructure at scale

Built for every workflow, from API to agents

Speech to Text APIs and SDKs

Integrate Scribe v1 and Scribe v2 Realtime into your product with the API or SDKs.
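
A minimal sketch of what a Scribe v1 transcription request could look like using only the Python standard library. The endpoint URL, the xi-api-key header, and the form field names (model_id, diarize, tag_audio_events, file) are assumptions that do not appear on this page and should be checked against the official API reference; build_request and transcribe are illustrative helper names, not SDK functions.

```python
import json
import os
import urllib.request
import uuid

# Assumed endpoint; verify against the official API reference.
API_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_request(api_key: str, audio_bytes: bytes, filename: str,
                  model_id: str = "scribe_v1", diarize: bool = True,
                  tag_audio_events: bool = True) -> urllib.request.Request:
    """Build a multipart/form-data POST request (field names are assumptions)."""
    boundary = uuid.uuid4().hex
    fields = {
        "model_id": model_id,
        "diarize": str(diarize).lower(),
        "tag_audio_events": str(tag_audio_events).lower(),
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file itself goes in a part named "file".
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode() + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed header name
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(api_key: str, path: str) -> dict:
    """Send the file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio:
        request = build_request(api_key, audio.read(), os.path.basename(path))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling transcribe("YOUR_API_KEY", "interview.mp3") would perform the network request. Keeping build_request separate makes the payload easy to inspect without sending anything.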

ElevenLabs Agents

Enable real-time voice interactions with instant, low-latency transcription.

ElevenLabs Studio

Convert recordings into editable text, captions, and repurposable content.

