Realtime Speech to Text API

Transcribe speech live with Scribe v2 Realtime

Scribe v2 Realtime is the most accurate real-time transcription model with 150ms latency across 90+ languages. Available via API.

  • Lovable
  • Veed model
  • Synthesia
  • Stripe
  • Perplexity
  • Twilio

Built for speed and accuracy

Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for realtime use-cases.

Highest-accuracy realtime transcription

Scribe v2 Realtime achieves industry-leading transcription accuracy with ~150ms latency, even in challenging audio conditions or across diverse accents.

Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!

Designed for every scenario

Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.

Speech recognition engineered for real-time performance

Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.

Can I get a refund?
Sure. Can you share your order number please?
It's EL4543490
Thank you. I have initiated the order refund process.
Refund completed

Purpose-built for Agents and voice apps

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.

Predictive transcription for low latency

Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation – enabling real-time accuracy.

Voice Activity Detection

Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
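
To make the idea of speech start and stop detection concrete, here is a minimal sketch of an energy-threshold segmenter. This is a deliberately simple stand-in technique, not how Scribe v2 Realtime detects speech, which the page does not describe. The names `SpeechSegment`, `frameRms`, and `segmentSpeech`, the threshold of 0.02, and the two-frame hangover are all assumptions chosen for illustration.

```typescript
// Hypothetical toy VAD: marks a frame as voiced when its RMS energy
// reaches a threshold, and closes a segment after a run of quiet frames.
interface SpeechSegment {
  startFrame: number; // index of the first voiced frame
  endFrame: number;   // index of the last voiced frame
}

// Root-mean-square energy of one frame of samples in [-1, 1].
function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

function segmentSpeech(
  frames: Float32Array[],
  threshold: number = 0.02, // assumed energy threshold
  hangoverFrames: number = 2 // quiet frames tolerated inside a segment
): SpeechSegment[] {
  const segments: SpeechSegment[] = [];
  let start = -1;    // -1 means "not currently inside speech"
  let silentRun = 0; // consecutive quiet frames since the last voiced one

  for (let i = 0; i < frames.length; i++) {
    const voiced = frameRms(frames[i]) >= threshold;
    if (voiced) {
      if (start < 0) start = i; // speech starts here
      silentRun = 0;
    } else if (start >= 0) {
      silentRun++;
      if (silentRun > hangoverFrames) {
        // Speech stopped: the segment ends at the last voiced frame.
        segments.push({ startFrame: start, endFrame: i - silentRun });
        start = -1;
        silentRun = 0;
      }
    }
  }
  // Close a segment that is still open when the audio ends.
  if (start >= 0) {
    segments.push({ startFrame: start, endFrame: frames.length - 1 - silentRun });
  }
  return segments;
}
```

The hangover keeps short pauses inside a single segment, so a brief hesitation does not split one utterance into two.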

Manual Commit Control

Gives developers control over when to finalize transcripts – ideal for custom streaming and fine-tuned accuracy.
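
The difference between provisional and finalized text can be sketched on the client side. The example below assumes only that partial transcripts replace one another until something is committed. It does not show the Scribe commit API, which this page does not document, and `TranscriptBuffer` and its methods are hypothetical names.

```typescript
// Hypothetical client-side model: partial text is provisional and gets
// overwritten; committed text is final and never changes afterwards.
class TranscriptBuffer {
  private committed: string[] = [];
  private partial: string = "";

  // Replace the provisional text with the latest partial transcript.
  updatePartial(text: string): void {
    this.partial = text;
  }

  // Finalize the current partial text and start a fresh segment.
  // Returns the text that was committed (empty if there was none).
  commit(): string {
    const finalized = this.partial.trim();
    if (finalized.length > 0) this.committed.push(finalized);
    this.partial = "";
    return finalized;
  }

  // Committed segments followed by the in-progress partial, if any.
  render(): string {
    return this.committed
      .concat(this.partial.trim())
      .filter((segment) => segment.length > 0)
      .join(" ");
  }
}
```

An application would decide when to call `commit()`, for example when the user releases a push-to-talk button or an agent finishes a turn.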

Multiple Audio Formats

Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
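
For readers unfamiliar with μ-law, the sketch below expands 8-bit μ-law bytes (the G.711 encoding common in telephony) into 16-bit linear PCM samples. It is a generic decoder rather than part of any ElevenLabs SDK, and `muLawToPcm16` and `decodeMuLaw` are placeholder names for this example. Since the page lists μ-law as a supported input, decoding may not be needed before sending audio; the code is only meant to show what the format contains.

```typescript
// Bias added during G.711 mu-law encoding, removed again on decode.
const MULAW_BIAS = 0x84;

// Expand one mu-law byte into a signed 16-bit linear PCM sample.
function muLawToPcm16(byte: number): number {
  const u = ~byte & 0xff;           // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;            // top bit: sign
  const exponent = (u >> 4) & 0x07; // next three bits: segment number
  const mantissa = u & 0x0f;        // low four bits: step within the segment
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign !== 0 ? -magnitude : magnitude;
}

// Decode a whole buffer of mu-law bytes into PCM samples.
function decodeMuLaw(input: Uint8Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = muLawToPcm16(input[i]);
  }
  return out;
}
```

Each μ-law byte becomes one 16-bit sample, so the decoded buffer is twice the size in bytes of the input.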

Models optimized for every use-case

Scribe v2 for bulk use-cases, and Scribe v2 Realtime for low-latency use-cases

Scribe v2

Highest accuracy, designed for batch workloads.

  • >95% Accuracy
  • 90+ Languages
  • Non-Speech Event Detection
  • Entity Detection
  • Keyterm Prompting

Scribe v2 Realtime

Lowest latency, for realtime workloads.

  • Under 150ms Latency
  • 90+ Languages
  • Transcription Streaming
  • Voice Activity Detection
  • Automatic Language Recognition

Transcribe speech in 90+ languages and a wide range of accents

Delivering exceptional accuracy across accents, dialects, and recording conditions.

Change the languageCode to preview languages

import { useScribe } from "@elevenlabs/react";

const scribe = useScribe({
  modelId: "scribe_v2_realtime",
  languageCode: "en", // Set language (English is the selected preview)
  onSessionStarted: () => console.log("Session started"),
  onPartialTranscript: (data) => console.log("Partial:", data.text),
});

  • English (en)
  • Chinese (zh)
  • Spanish (es)
  • French (fr)
  • Portuguese (pt)
  • German (de)
  • Japanese (ja)
  • Italian (it)
  • Hindi (hi)

Powering the world’s leading companies and brands

  • From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.
    Meta
  • Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.
    Fieldy
  • ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.
    Stream
  • Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.
    Twilio

APIs built for production

Flexible pricing based on your needs

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.

$0.28 per hour & lower

on annual Business plans
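
As a rough budgeting aid, the sketch below estimates cost from audio duration at the $0.28 per hour figure quoted above. Actual rates depend on the plan (the page says "& lower"), and how usage is metered or rounded is not stated here, so the per-second proration and the name `estimateCostUsd` are assumptions.

```typescript
// Rate quoted on this page for annual Business plans, in USD per audio hour.
const QUOTED_RATE_USD_PER_HOUR = 0.28;

// Estimate cost in USD for a given amount of audio, rounded to whole cents.
// Assumes cost is prorated by the second, which the page does not confirm.
function estimateCostUsd(
  audioSeconds: number,
  ratePerHour: number = QUOTED_RATE_USD_PER_HOUR
): number {
  const hours = audioSeconds / 3600;
  return Math.round(hours * ratePerHour * 100) / 100;
}
```

At the quoted rate, 1,000 hours of audio (3,600,000 seconds) comes to $280, and a 90-minute meeting comes to $0.42.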
