
Scribe v2 just got an upgrade
Scribe v2 Realtime is the most accurate real-time speech-to-text (STT) model, with ~150 ms latency across 90+ languages. Available via API.
Ultra-fast, ultra-accurate, and built for live speech: Scribe v2 Realtime transcribes audio as it streams in, for real-time use cases.
Scribe v2 Realtime achieves industry-leading transcription accuracy with ~150 ms latency, even in challenging audio conditions and across diverse accents.
Sample transcript: “Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!”
Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.
Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.
Key capabilities:
- Predictive transcription: anticipates the most probable next words and punctuation, keeping transcripts accurate as speech arrives.
- Voice activity detection: detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
- Manual finalization: gives developers control over when transcripts are finalized, useful for custom streaming logic.
- Audio formats: supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
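Predictive transcription and manual finalization, as described above, imply two kinds of text on the client: interim partials that may still be revised, and finalized segments that stay fixed. A minimal sketch that keeps them apart follows. The class TranscriptBuffer and its methods are hypothetical and not part of the ElevenLabs SDK; feeding onPartial from a partial-transcript callback and calling commit() whenever your app decides a segment is done is an assumed integration pattern.

```typescript
// Hypothetical client-side buffer: not an ElevenLabs SDK type.
class TranscriptBuffer {
  private committed: string[] = []; // finalized segments, never revised
  private partial = ""; // latest interim guess for the current segment

  // Each partial replaces the previous one: later partials revise earlier guesses.
  onPartial(text: string): void {
    this.partial = text;
  }

  // Freeze the current partial. With manual finalization, the app decides when.
  commit(): string {
    const finalized = this.partial.trim();
    if (finalized.length > 0) this.committed.push(finalized);
    this.partial = "";
    return finalized;
  }

  // What a UI would show: finalized text followed by the in-progress guess.
  display(): string {
    return [...this.committed, this.partial.trim()]
      .filter((s) => s.length > 0)
      .join(" ");
  }
}
```

Keeping the partial separate means a UI can style it differently (for example, greyed out) and overwrite it on every update without touching text that has already been committed.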
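The page does not say how Scribe v2 Realtime's voice activity detection works internally. For intuition only, here is a basic energy-threshold VAD, a different and far simpler technique: it splits audio into fixed-size frames, treats frames whose RMS level reaches a threshold as speech, and tolerates a few quiet frames (a "hangover") before closing a segment so short pauses don't split an utterance. The function name detectSpeech and the default frame size, threshold, and hangover values are illustrative assumptions.

```typescript
// A speech segment in sample indices; end is exclusive.
interface Segment {
  start: number;
  end: number;
}

// Toy energy-based VAD (hypothetical helper, not Scribe's algorithm).
function detectSpeech(
  samples: Float32Array,
  frameSize = 160, // 10 ms per frame at 16 kHz
  threshold = 0.02, // RMS level treated as speech (illustrative)
  hangoverFrames = 3, // quiet frames tolerated before a segment closes
): Segment[] {
  const segments: Segment[] = [];
  let start = -1; // -1 means "not currently inside speech"
  let lastVoicedEnd = 0; // end of the most recent frame above threshold
  let silentRun = 0; // consecutive quiet frames inside the current segment

  for (let from = 0; from < samples.length; from += frameSize) {
    const to = Math.min(from + frameSize, samples.length);
    let sumSq = 0;
    for (let i = from; i < to; i++) sumSq += samples[i] * samples[i];
    const rms = Math.sqrt(sumSq / (to - from));

    if (rms >= threshold) {
      if (start < 0) start = from; // speech onset
      lastVoicedEnd = to;
      silentRun = 0;
    } else if (start >= 0) {
      silentRun++;
      if (silentRun > hangoverFrames) {
        // Pause lasted too long: close the segment at the last voiced frame.
        segments.push({ start, end: lastVoicedEnd });
        start = -1;
        silentRun = 0;
      }
    }
  }
  // Audio ended while still inside speech.
  if (start >= 0) segments.push({ start, end: lastVoicedEnd });
  return segments;
}
```

A fixed threshold like this is easily fooled by background noise or music, so treat it as a way to understand what segmentation does, not as a substitute for a model-based detector.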
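In the browser, Web Audio delivers 32-bit float samples in [-1, 1], while PCM streaming typically means 16-bit signed integers. A minimal conversion sketch, assuming 16-bit mono PCM (the exact format identifiers the API accepts are not given here). The helpers floatToPcm16, pcm16ToLittleEndianBytes, and chunkBytes are hypothetical names. DataView is used so the byte order is explicitly little-endian rather than whatever the host platform uses.

```typescript
// Convert Web Audio float samples to 16-bit signed PCM (clamped to [-1, 1]).
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative range reaches -32768; positive range tops out at 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Serialize samples as little-endian bytes, independent of host endianness.
function pcm16ToLittleEndianBytes(pcm: Int16Array): Uint8Array {
  const bytes = new Uint8Array(pcm.length * 2);
  const view = new DataView(bytes.buffer);
  pcm.forEach((v, i) => view.setInt16(i * 2, v, true)); // true = little-endian
  return bytes;
}

// Bytes needed for a chunk of the given duration.
function chunkBytes(
  sampleRateHz: number,
  durationMs: number,
  bytesPerSample: number,
  channels = 1,
): number {
  const samplesPerChunk = Math.round((sampleRateHz * durationMs) / 1000);
  return samplesPerChunk * bytesPerSample * channels;
}
```

For 16 kHz mono 16-bit PCM, 100 ms of audio is 16,000 × 0.1 × 2 = 3,200 bytes; for 8 kHz μ-law at 1 byte per sample, a 20 ms telephony frame is 160 bytes.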
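μ-law (ITU-T G.711) compresses 16-bit linear samples into 8-bit codes along a logarithmic curve, which is why it is common in telephony. If a telephony source hands you linear PCM and you need μ-law, or the reverse, the classic G.711 bit-manipulation encoder and decoder look like this. This is a generic implementation of the standard codec, not ElevenLabs code; the function names are hypothetical.

```typescript
const BIAS = 0x84; // 132: added before encoding so the segment math works near zero
const CLIP = 32635; // largest magnitude that survives BIAS without overflowing 15 bits

// Encode one 16-bit linear sample to an 8-bit G.711 μ-law code.
function linearToMulaw(sample: number): number {
  let s = Math.max(-32768, Math.min(32767, Math.trunc(sample)));
  const sign = s < 0 ? 0x80 : 0x00;
  if (s < 0) s = -s;
  if (s > CLIP) s = CLIP;
  s += BIAS;
  // Exponent = position of the highest set bit among bits 14..7 (7 down to 0).
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  // G.711 transmits the bitwise complement.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Decode an 8-bit μ-law code back to a 16-bit linear sample.
function mulawToLinear(code: number): number {
  const u = ~code & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return sign ? -magnitude : magnitude;
}
```

The round trip is lossy by design: 1000 encodes to 0xCE and decodes back to 988, and the step size grows with amplitude, so quiet sounds keep relatively more precision than loud ones.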
Use Scribe v2 for batch use cases and Scribe v2 Realtime for low-latency use cases:

- Scribe v2: highest accuracy, designed for batch workloads.
- Scribe v2 Realtime: lowest latency, for real-time workloads.
Scribe v2 Realtime delivers exceptional accuracy across accents, dialects, and recording conditions.
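Set languageCode to choose the transcription language. A minimal React setup with the useScribe hook:

    import { useScribe } from "@elevenlabs/react";

    // React hooks must be called from inside a component.
    function LiveTranscript() {
      const scribe = useScribe({
        modelId: "scribe_v2_realtime",
        languageCode: "en", // Set language ("en" is an example value)
        onSessionStarted: () => console.log("Session started"),
        // Interim text that may still change as more audio arrives
        onPartialTranscript: (data) => console.log("Partial:", data.text),
      });

      return null; // UI and session start/stop controls omitted
    }

“From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.”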
“Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.”
“ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.”

“Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.”

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.
$0.28 per hour and lower on annual Business plans.
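At a per-hour rate, cost scales linearly with audio duration. A rough estimator follows, treating $0.28 per hour as the upper bound quoted for annual Business plans and assuming simple per-minute proration; actual billing granularity and other plan tiers are not stated here, and estimateCostUsd is a hypothetical helper. Working in integer cents avoids binary floating-point drift.

```typescript
// Quoted ceiling for annual Business plans: $0.28 per hour = 28 cents.
const RATE_CENTS_PER_HOUR = 28;

// Estimated cost in USD for a number of audio minutes (assumes per-minute proration).
function estimateCostUsd(
  minutes: number,
  rateCentsPerHour: number = RATE_CENTS_PER_HOUR,
): number {
  const cents = Math.round((minutes * rateCentsPerHour) / 60);
  return cents / 100;
}

// 10,000 calls × 3 minutes = 30,000 minutes = 500 hours
console.log(estimateCostUsd(10000 * 3)); // 140
```

So 500 hours of live transcription comes to at most about $140 at this rate, and a two-hour meeting to about $0.56.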