
Speech to Text API
Transcribe speech with ElevenLabs Scribe v2
Highest accuracy speech to text for bulk applications. Detect emphasis & sound effects, and guide transcription with keyterm prompting.
Uh, hi! So, um, I was wondering if you wanted to meet up for coffee? Maybe tomorrow morning? [nervous laugh] Totally fine if not!
- Lovable
- Veed
- Synthesia
- Stripe
- Perplexity
- Twilio
Most accurate Speech to Text API for batch workloads
Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy via API.
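As a rough illustration of the captioning use case, the sketch below turns word-level timestamps into SRT subtitle cues. It assumes the transcription result exposes a list of word entries with `text`, `start`, and `end` (in seconds). That shape, and the helper names `toSrtTime` and `wordsToSrt`, are assumptions made for this example and should be checked against the actual API response.

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Group timed words into cues of at most `maxWords` words each.
// Assumed input shape (hypothetical): [{ text, start, end }, ...],
// containing only spoken words (filter out spacing or event entries first).
function wordsToSrt(words, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    cues.push({
      start: chunk[0].start,
      end: chunk[chunk.length - 1].end,
      text: chunk.map((w) => w.text).join(" "),
    });
  }
  // SRT cues are numbered from 1 and separated by a blank line.
  return cues
    .map((c, i) => `${i + 1}\n${toSrtTime(c.start)} --> ${toSrtTime(c.end)}\n${c.text}`)
    .join("\n\n");
}
```

A fixed word count per cue keeps the sketch short; a production captioner would more likely break cues on pauses, punctuation, or a character limit.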
Unprecedented transcription accuracy
Scribe v2 achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.
Designed for every scenario
Transcription that works in noisy environments, with background music, strong accents, and low-quality audio.
Fine-grained control over timing, speakers, and non-speech events.
The ElevenLabs Transcription API can detect laughter, emotion, and sound effects. Use keyterm prompting to guide transcription with domain-specific terms.
Transcribe audio and video
Clean, editable transcripts
Keyterm prompting

Dynamic audio tagging
Capture non-speech events like laughter, applause, music, and background noise. Transcripts include the full context of your audio, not just the words.
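Tags such as `[nervous laugh]` in the sample transcript could be pulled out of a result along these lines. This is a minimal sketch that assumes each entry in the result's word list carries a `type` field, with `"audio_event"` marking non-speech events. The field name, its values, and the helper `listAudioEvents` are assumptions for illustration rather than details confirmed on this page.

```javascript
// Assumed input shape (hypothetical): [{ text, start, end, type }, ...]
// where type is "word", "spacing", or "audio_event".
function listAudioEvents(words) {
  return words
    .filter((w) => w.type === "audio_event")
    .map((w) => ({ label: w.text, start: w.start, end: w.end }));
}

// Usage with hand-written sample data (not real API output):
const sampleWords = [
  { text: "Maybe", start: 0.0, end: 0.3, type: "word" },
  { text: " ", start: 0.3, end: 0.35, type: "spacing" },
  { text: "[nervous laugh]", start: 0.35, end: 1.1, type: "audio_event" },
];
const events = listAudioEvents(sampleWords);
console.log(events);
```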
Smart speaker diarization
Automatically identify and label up to 48 speakers. Clear attribution of who said what, organized into readable transcripts.
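To show how diarized output might become a readable, speaker-attributed transcript, the sketch below merges consecutive entries from the same speaker into turns. It assumes each entry in the word list has a `speakerId` field and that spacing entries are included, so concatenating `text` values preserves spaces. Both points, along with the helper name `groupBySpeaker`, are assumptions for illustration.

```javascript
// Assumed input shape (hypothetical): [{ text, speakerId }, ...],
// including whitespace entries between words.
function groupBySpeaker(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speakerId) {
      // Same speaker as the previous entry: extend the current turn.
      last.text += w.text;
    } else {
      // Speaker changed (or first entry): start a new turn.
      turns.push({ speaker: w.speakerId, text: w.text });
    }
  }
  return turns.map((t) => `${t.speaker}: ${t.text.trim()}`);
}

// Usage with hand-written sample data (not real API output):
const diarizedSample = [
  { text: "Hello", speakerId: "speaker_0" },
  { text: " ", speakerId: "speaker_0" },
  { text: "there", speakerId: "speaker_0" },
  { text: " ", speakerId: "speaker_1" },
  { text: "Hi", speakerId: "speaker_1" },
];
const speakerLines = groupBySpeaker(diarizedSample);
console.log(speakerLines.join("\n"));
```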
Entity detection
Automatically identify and tag 56 entity types including names, dates, locations, and organizations within your transcripts.

Scribe v2
Highest accuracy, designed for batch workloads.
- >95% Accuracy
- 90+ Languages
- Non-Speech Event Detection
- Entity Detection
- Keyterm Prompting

Scribe v2 Realtime
Lowest latency, for realtime workloads.
- Under 150ms Latency
- 90+ Languages
- Transcription Streaming
- Voice Activity Detection
- Automatic Language Recognition
Transcribe speech in 90+ languages and a wide range of accents
Delivering exceptional accuracy across accents, dialects, and recording conditions.
Set languageCode to the language of your audio:
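// Requires an ES module context (top-level await) with global fetch and Blob, e.g. Node 18+.
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const elevenlabs = new ElevenLabsClient({
  apiKey: "<your_api_key>",
});

// Download the sample audio and wrap it in a Blob for upload.
const response = await fetch(
  "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await response.arrayBuffer()], { type: "audio/mp3" });

const transcription = await elevenlabs.speechToText.convert({
  file: audioBlob,
  modelId: "scribe_v2",
  tagAudioEvents: true, // Tag non-speech events such as laughter
  languageCode: "eng", // Example value (an assumption; the original left this blank). Set your audio's language.
  diarize: true, // Label who is speaking
});

console.log(transcription);

Powering the world’s leading companies and brands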
“From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.”
“Scribe’s unmatched accuracy across so many languages lets Fieldy understand every daily conversation and easily scale across continents. Fieldy has increased user retention by 50% after moving to ElevenLabs Scribe.”
“ElevenLabs made it easy for us to quickly bring powerful text-to-speech capabilities to our SDK, allowing Agents to respond in real time with expressive voices to user questions or as feedback to what it’s seeing.”

“Twilio has integrated ElevenLabs’ generative AI voice technology into its CPaaS, enhancing ConversationRelay. This integration allows businesses and developers to create conversational AI voice interactions that sound human, feel expressive, and respond in real time directly from the Twilio CPaaS platform. We at ElevenLabs are excited that Twilio has chosen ElevenLabs to enhance ConversationRelay with the most expressive, human sounding voices available.”
APIs built for production
