What is Scribe v2 Realtime?

Scribe v2 Realtime is a streaming Speech to Text model built for live transcription. It delivers 150 ms latency with 93.5% accuracy across 30 languages - outperforming Gemini Flash 2.5, GPT-4o Mini Transcribe, and Deepgram Nova 3 on the FLEURS benchmark.

How does Scribe v2 Realtime differ from Scribe v2?

Scribe v2 Realtime is optimized for streaming with 150 ms latency. Scribe v2 (batch) is built for recorded audio and adds features such as speaker diarization, dynamic audio tagging, and support for 99 languages. Use Realtime for agents and live applications; use batch for post-processing workflows.

How accurate is Scribe for realtime transcription?

Scribe v2 Realtime achieves best-in-class accuracy across 99 languages and is robust to challenging audio conditions, accents, and recording quality. It outperforms previous generation models and other leading APIs on public benchmarks.

What is the latency of Scribe v2 Realtime?

Approximately 150 ms end-to-end, excluding application and network latency. This is roughly 3x faster than GPT-4o Mini Transcribe at 500 ms.

What is negative latency / predictive transcription?

Scribe anticipates the next word and punctuation before they're spoken. This allows transcripts to commit without waiting for silence, resulting in smoother real-time output.

What languages are supported?

90+ languages with automatic language detection. The model handles mid-conversation language switches without configuration changes.

What audio formats are supported?

PCM audio from 8 kHz to 48 kHz sample rates, and μ-law encoding. Compatible with telephony, browser, and studio sources.
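
As a rough illustration of preparing browser audio for a PCM input, the sketch below converts floating-point samples (the form the Web Audio API produces) to signed 16-bit PCM and checks a sample rate against the 8 kHz to 48 kHz range quoted above. The 16-bit depth and the helper names `floatTo16BitPcm` and `isSupportedSampleRate` are assumptions for illustration; the page only says "PCM".

```typescript
// Convert float samples in [-1, 1] to signed 16-bit PCM.
// Assumption: 16-bit depth. The page does not state a bit depth.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((value, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, value));
    // Negative values scale by 32768 and positive by 32767 to use the full range.
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  });
  return out;
}

// Check a sample rate (in Hz) against the 8 kHz to 48 kHz range quoted above.
function isSupportedSampleRate(hz: number): boolean {
  return hz >= 8000 && hz <= 48000;
}
```

Clamping before scaling matters because microphone pipelines can occasionally produce values slightly outside [-1, 1].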

Does Scribe v2 Realtime support speaker diarization?

Not currently. For multi-speaker identification, use Scribe v2 (batch), which supports up to 48 speakers.

What is the concurrency limit?

30+ concurrent streams on Business plans. Enterprise plans include elevated limits. Contact sales for high-volume requirements.
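
On the client side, one way to stay within a stream cap is to track open streams locally before opening a new one. The sketch below is a plain counter, not part of any ElevenLabs SDK; the class name `StreamSlots` and the cap of 30 are illustrative assumptions, since actual limits depend on the plan.

```typescript
// Tracks how many streams are open against a fixed cap.
// Hypothetical helper; not an ElevenLabs API.
class StreamSlots {
  private active = 0;
  private readonly limit: number;

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  // Reserve a slot; returns false when the cap is already reached.
  tryAcquire(): boolean {
    if (this.active >= this.limit) return false;
    this.active += 1;
    return true;
  }

  // Free a slot when a stream closes. Extra calls are ignored.
  release(): void {
    if (this.active > 0) this.active -= 1;
  }

  get inUse(): number {
    return this.active;
  }
}
```

A caller would check `tryAcquire()` before opening a stream and call `release()` in the stream's close handler, queueing or rejecting new sessions when it returns `false`.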

Is Scribe v2 Realtime available in ElevenLabs Agents?

Yes. Scribe v2 Realtime is integrated into the Agents platform by default.

What compliance certifications are available?

SOC 2, ISO 27001, PCI DSS Level 1, HIPAA, and GDPR. Zero retention mode and EU/India data residency available for Enterprise.

Realtime Speech to Text API

Transcribe speech live with Scribe v2 Realtime

Scribe v2 Realtime is the most accurate real-time Speech to Text model, with ~150 ms latency across 90+ languages. Available via API.

Lovable
Veed
Synthesia
Stripe
Perplexity
Twilio

Built for speed and accuracy

Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for realtime use-cases.

Highest-accuracy realtime transcription

Scribe v2 Realtime achieves industry-leading transcription accuracy with ~150ms latency, even in challenging audio conditions or across diverse accents.

Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!

Natural Speech

Low-quality Audio

Accents

Domain Terms

Designed for every scenario

Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.

Speech recognition engineered for real-time performance

Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Purpose-built for Agents and voice apps

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.

Can I get a refund?

Sure. Can you share your order number please?

It's EL4543490

Thank you. I have initiated the order refund process.

Refund completed

Predictive transcription for low latency

Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation, so transcripts can commit without waiting for silence.

Voice Activity Detection

Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
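
To make the idea of segmenting audio by speech activity concrete, here is a minimal energy-threshold sketch. It is not Scribe's detector, whose internals are not described on this page; the threshold, the hangover length, and the names `frameRms` and `detectSpeech` are all illustrative assumptions.

```typescript
// Root-mean-square energy of one frame of samples in [-1, 1].
function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  frame.forEach((s) => {
    sum += s * s;
  });
  return Math.sqrt(sum / frame.length);
}

// A detected stretch of speech, as frame indices. endFrame is exclusive.
interface SpeechSegment {
  startFrame: number;
  endFrame: number;
}

// Simple energy-based detector (an illustration only).
// A segment opens on the first loud frame and closes once more than
// `hangoverFrames` consecutive quiet frames follow it.
function detectSpeech(
  frames: Float32Array[],
  threshold: number = 0.02,
  hangoverFrames: number = 2
): SpeechSegment[] {
  const segments: SpeechSegment[] = [];
  let start = -1; // first frame of the open segment, or -1 if none is open
  let quiet = 0; // consecutive quiet frames seen inside the open segment

  frames.forEach((frame, i) => {
    const loud = frameRms(frame) >= threshold;
    if (loud) {
      if (start < 0) start = i;
      quiet = 0;
    } else if (start >= 0) {
      quiet += 1;
      if (quiet > hangoverFrames) {
        // Close the segment just after the last loud frame.
        segments.push({ startFrame: start, endFrame: i - quiet + 1 });
        start = -1;
        quiet = 0;
      }
    }
  });

  // Close a segment that is still open when the audio ends.
  if (start >= 0) {
    segments.push({ startFrame: start, endFrame: frames.length - quiet });
  }
  return segments;
}
```

The hangover keeps short pauses inside a sentence from splitting it into separate segments; production detectors typically use trained models rather than a fixed energy threshold.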

Manual Commit Control

Gives developers control over when to finalize transcripts – ideal for custom streaming and fine-tuned accuracy.
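
The distinction between in-progress and finalized text can be sketched with a small buffer that holds the latest partial result until the application decides to commit it. The `TranscriptBuffer` class and its method names are hypothetical and do not mirror the Scribe API; they only illustrate the bookkeeping an application might do around a commit.

```typescript
// Holds finalized segments plus the latest in-progress text.
// Hypothetical helper; not an ElevenLabs API.
class TranscriptBuffer {
  private committed: string[] = [];
  private partial = "";

  // Replace the in-progress text with the latest partial result.
  updatePartial(text: string): void {
    this.partial = text;
  }

  // Finalize the current partial text.
  // Returns the committed text, or null if nothing was pending.
  commit(): string | null {
    const text = this.partial.trim();
    this.partial = "";
    if (text === "") return null;
    this.committed.push(text);
    return text;
  }

  // Committed segments followed by the current partial, for display.
  render(): string {
    return this.committed
      .concat([this.partial.trim()])
      .filter((s) => s !== "")
      .join(" ");
  }
}
```

Each partial result overwrites the previous one, so revisions to words mid-utterance never leak into the finalized history; only `commit()` appends to it.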

Multiple Audio Formats

Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
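
For telephony sources, 16-bit PCM is commonly compressed to 8-bit μ-law. The sketch below follows the widely used G.711 μ-law encoding procedure; it is included only to show what the format involves, and the function names `linearToMuLaw` and `encodeMuLaw` are illustrative, not part of any ElevenLabs SDK.

```typescript
const MULAW_BIAS = 0x84; // 132, added before finding the segment
const MULAW_CLIP = 32635; // largest magnitude that fits after adding the bias

// Encode one signed 16-bit PCM sample as an 8-bit mu-law byte (G.711 style).
function linearToMuLaw(sample: number): number {
  // Bit 7 of the output carries the sign.
  const sign = (sample >> 8) & 0x80;
  let magnitude = sign !== 0 ? -sample : sample;
  if (magnitude > MULAW_CLIP) magnitude = MULAW_CLIP;
  magnitude += MULAW_BIAS;

  // The exponent is the position of the highest set bit, from 7 down to 0.
  let exponent = 7;
  for (let mask = 0x4000; exponent > 0 && (magnitude & mask) === 0; mask >>= 1) {
    exponent--;
  }

  // Four mantissa bits sit just below the leading bit.
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encode a whole buffer of 16-bit PCM samples.
function encodeMuLaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  pcm.forEach((s, i) => {
    out[i] = linearToMuLaw(s);
  });
  return out;
}
```

μ-law halves the bandwidth of 16-bit audio by spending more resolution on quiet samples than on loud ones, which is why it is typical on 8 kHz phone lines.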

Models optimized for every use-case

Scribe v2 for bulk use-cases, and Scribe v2 Realtime for low-latency use-cases

Scribe v2

Highest accuracy, designed for batch workloads.

>95% Accuracy
90+ Languages
Non-Speech Event Detection
Entity Detection
Keyterm Prompting

Scribe v2 Realtime

Lowest latency, for realtime workloads.

~150 ms Latency
90+ Languages
Transcription Streaming
Voice Activity Detection
Automatic Language Recognition

Transcribe speech in 90+ languages and a wide range of accents

Delivering exceptional accuracy across accents, dialects, and recording conditions.

Change the languageCode to preview languages

import { useScribe } from "@elevenlabs/react";

const scribe = useScribe({
  modelId: "scribe_v2_realtime",

  languageCode: "en", // Set language, e.g. "en" for English

  onSessionStarted: () =>
    console.log("Session started"),
  onPartialTranscript: (data) =>
    console.log("Partial:", data.text)
});

English

Chinese

Spanish

French

Portuguese

German

Japanese

Italian

Hindi

Powering the world’s leading companies and brands

“From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.”
“Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.”
“ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.”
“Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.”

APIs built for production

Flexible pricing based on your needs

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.

$0.28 per hour & lower

on annual Business plans
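
As a back-of-the-envelope illustration of the quoted rate, the sketch below estimates cost from audio duration. It assumes the $0.28 figure is billed per hour of audio at a flat rate, which this page does not spell out, and the name `estimateCostUsd` is illustrative; actual plan pricing may be lower or structured differently.

```typescript
// Rate quoted on the page for annual Business plans ("& lower" applies).
const USD_PER_HOUR = 0.28;

// Estimate cost in USD for a given amount of audio.
// Assumption: flat billing per hour of audio, with no minimums or tiers.
function estimateCostUsd(
  audioSeconds: number,
  usdPerHour: number = USD_PER_HOUR
): number {
  if (audioSeconds < 0) {
    throw new RangeError("audioSeconds must be non-negative");
  }
  return (audioSeconds / 3600) * usdPerHour;
}
```

Under that assumption, 1,000 hours of audio would come to about $280 at the quoted rate.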
