
Scribe v2 Realtime is the most accurate real-time speech-to-text (STT) model, with ~150 ms latency across 90+ languages. Available via API.
Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for real-time use cases.
Scribe v2 Realtime achieves industry-leading transcription accuracy with ~150 ms latency, even in challenging audio conditions or across diverse accents.
Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!
Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.
Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.
Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation – enabling real-time accuracy.
Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
Gives developers control over when to finalize transcripts – ideal for custom streaming and fine-tuned accuracy.
Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
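As a rough illustration of the audio formats listed above, the sketch below shows two conversions a client might need before streaming: browser-style Float32 samples to 16-bit PCM, and G.711 μ-law bytes (common in telephony) to 16-bit PCM. The function names are hypothetical and not part of any ElevenLabs SDK, and whether an integration needs either conversion depends on the audio format it configures, so treat this as a sketch under those assumptions.

```typescript
// Convert Web Audio style Float32 samples in [-1, 1] to 16-bit signed PCM.
// Hypothetical helper, not an ElevenLabs API.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Decode one G.711 mu-law byte to a 16-bit linear PCM sample.
function muLawToPcm16(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // Rebuild the magnitude: add the bias (0x84), shift by the segment, remove the bias.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes.
function decodeMuLaw(bytes: Uint8Array): Int16Array {
  const out = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = muLawToPcm16(bytes[i]);
  return out;
}
```

Clamping before scaling keeps a clipped microphone signal from wrapping into the opposite polarity, which would otherwise show up as loud clicks in the transcribed audio.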
Scribe v2 for bulk use-cases, and Scribe v2 Realtime for low-latency use-cases

Highest accuracy, designed for batch workloads.

Lowest latency, for realtime workloads.
Delivering exceptional accuracy across accents, dialects, and recording conditions.
Change the languageCode to preview languages
import { useScribe } from "@elevenlabs/react";

// Start a realtime session and log partial transcripts as they arrive.
const scribe = useScribe({
  modelId: "scribe_v2_realtime",
  languageCode: "en", // Set language ("en" is an example value)
  onSessionStarted: () => console.log("Session started"),
  onPartialTranscript: (data) => console.log("Partial:", data.text),
});

“From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.”
“Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.”
“ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.”

“Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.”

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.
$0.28 per hour and lower on annual Business plans
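For budgeting, the arithmetic behind the quoted rate can be sketched as follows. This treats $0.28 per hour as an upper-bound rate and assumes cost scales linearly with audio duration; the billing granularity is an assumption, and the function name is hypothetical.

```typescript
// Upper-bound rate quoted above, in cents per audio hour (28 cents = $0.28).
const RATE_CENTS_PER_HOUR = 28;

// Estimate transcription cost in US dollars for a given amount of audio.
// Assumes linear scaling with duration (an assumption, not a documented billing rule).
function estimateCostUsd(
  audioSeconds: number,
  rateCentsPerHour: number = RATE_CENTS_PER_HOUR
): number {
  if (audioSeconds < 0) throw new RangeError("audioSeconds must be non-negative");
  const cents = (audioSeconds / 3600) * rateCentsPerHour;
  return Math.round(cents) / 100; // round to the nearest cent
}
```

Under these assumptions, 1,000 hours of audio comes to at most $280: `estimateCostUsd(1000 * 3600)` returns `280`.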

Scribe v2 Realtime is a streaming Speech to Text model built for live transcription. It delivers 150 ms latency with 93.5% accuracy across 30 languages - outperforming Gemini Flash 2.5, GPT-4o Mini Transcribe, and Deepgram Nova 3 on the FLEURS benchmark.
Scribe v2 Realtime is optimized for streaming with 150 ms latency. Scribe v2 (batch) is built for recorded audio with additional features like speaker diarization, dynamic audio tagging, and 99 language support. Use Realtime for agents and live applications; use batch for post-processing workflows.
Scribe v2 Realtime achieves best-in-class accuracy across 99 languages and is robust to challenging audio conditions, accents, and recording quality. It outperforms previous generation models and other leading APIs on public benchmarks.
Approximately 150 ms end-to-end, excluding application and network latency. This is 3x faster than GPT-4o Mini Transcribe at 500 ms.
Scribe anticipates the next word and punctuation before they're spoken. This allows transcripts to commit without waiting for silence, resulting in smoother real-time output.
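On the client side, one way to display this kind of output is to keep committed segments separate from the latest partial guess. The sketch below is a minimal, framework-free illustration: the event shapes and function names are hypothetical (only a partial transcript carrying a `text` field appears in the snippet on this page), so the actual SDK events should be checked before adapting it.

```typescript
// Hypothetical event shapes used for illustration only.
type TranscriptEvent =
  | { kind: "partial"; text: string }
  | { kind: "committed"; text: string };

interface TranscriptState {
  committed: string[]; // finalized segments, never rewritten
  partial: string;     // latest in-progress guess, replaced on every update
}

const emptyTranscript: TranscriptState = { committed: [], partial: "" };

// Pure state transition: returns a new state instead of mutating the old one.
function applyEvent(state: TranscriptState, event: TranscriptEvent): TranscriptState {
  if (event.kind === "partial") {
    // A new partial supersedes the previous one rather than appending to it.
    return { committed: state.committed, partial: event.text };
  }
  // A commit freezes the segment and clears the in-progress text.
  return { committed: [...state.committed, event.text], partial: "" };
}

// Join committed segments and the current partial into one display string.
function renderTranscript(state: TranscriptState): string {
  return [...state.committed, state.partial].filter((s) => s.length > 0).join(" ");
}
```

Keeping the partial text in its own slot means revised guesses overwrite each other, while committed text only ever grows, so the displayed transcript does not flicker backwards.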
90+ languages with automatic language detection. The model handles mid-conversation language switches without configuration changes.
PCM audio from 8 kHz to 48 kHz sample rates, and μ-law encoding. Compatible with telephony, browser, and studio sources.
Not currently. For multi-speaker identification, use Scribe v2 (batch), which supports up to 48 speakers.
30+ concurrent streams on Business plans. Enterprise plans include elevated limits. Contact sales for high-volume requirements.
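An application that opens many sessions may want to stay under its plan's concurrency limit on its own side. The following is a minimal sketch of a client-side limiter; the class name and the limit passed to it are hypothetical, and it is not part of any ElevenLabs SDK.

```typescript
// Caps how many streaming sessions run at once; extra callers wait in FIFO order.
// Hypothetical helper for illustration.
class StreamLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    if (Math.floor(limit) !== limit || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  // Number of sessions currently holding a slot.
  get activeCount(): number {
    return this.active;
  }

  // Runs `task` once a slot is free and releases the slot when it settles.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();   // hand the slot directly to the next waiter
      else this.active--; // nobody waiting: free the slot
    }
  }
}
```

A caller would wrap each session in `limiter.run(() => startSession(...))`, where `startSession` stands in for whatever function opens and awaits a stream; sessions beyond the limit then queue instead of being rejected by the service.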
Yes. Scribe v2 Realtime is integrated into the Agents platform by default.
SOC 2, ISO 27001, PCI DSS Level 1, HIPAA, and GDPR. Zero retention mode and EU/India data residency available for Enterprise.







