SPEECH TO TEXT

Transcribe, caption, and edit speech with the world’s most accurate ASR model

Achieve industry-leading transcription accuracy in 99 languages with Scribe. Go beyond transcription with auto-generated captions, video alignment, text-based editing, and seamless API and Studio integration.

Speaker 1: Quick check-in. Maple Street is a mess. Time to fix it.
Speaker 2: Totally. Some of those potholes could swallow a small car.
Speaker 1: Or a very brave skateboarder.
Speaker 2: We start next week. Jonas, four-week timeline?
Speaker 3: Yep, unless the concrete throws a tantrum.
Speaker 1: I'll handle flyers, maybe toss in a joke. "Maple Street, soon less bumpy."
Speaker 2: Perfect. Keep it simple and positive.
Speaker 3: And no squirrels on sight, please.
Speaker 1: Agreed. Let's roll. Thanks, team.
The world's most accurate ASR model by ElevenLabs.

Every word, perfectly captured

Scribe listens to every nuance, capturing each word with unmatched precision across 99 languages. With character-level timestamps, speaker diarization, and audio-event tagging, it delivers structured transcripts ready for integration or editing.

Transcribe audio and video

Video and audio transcription

Upload video or audio in MP4, MOV, MP3, WAV, and more. Scribe automatically converts speech into accurate text, ready for captions, subtitles, or editing.

Captions and Subtitles

Auto-generate captions and subtitles

Create captions for any video in one click. Generate multilingual subtitles for YouTube, TikTok, and more—improving accessibility and reach.

Voiceovers

Edit voiceovers by editing text

Fix mistakes or refine narration without re-recording. Edit transcripts directly and Scribe updates the audio, streamlining video and podcast production.

Timeline

Timeline editor for precision

Align dialogue, background music, and sound effects with video. Use our timeline editor to place audio exactly where it belongs.

Powerful Audio to Text features for your app

Transform your audio into flawless text with Scribe, the world's most advanced ASR (automatic speech recognition) model, with the simplest speech to text API integration.

Image: gradient comparison bar labeled "Scribe V1," "Gemini 2.0 Flash," and "Whisper Large v3."

Industry-leading accuracy

Achieve precision like never before: Scribe delivers the industry's lowest word error rate for perfectly accurate transcription.
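For readers who want to see what word error rate measures, here is a minimal sketch in Python. It computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the lowercase-and-split normalization are assumptions for illustration; published benchmarks typically apply their own text normalization, so results from this sketch are not comparable to the figures quoted on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplistic normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "maple street is a mess time to fix it"
hypothesis = "maple street is a mess time to fix"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

In this example the hypothesis drops one of nine reference words, so the score is one error over nine words.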

Smart speaker diarization

In any conversation, even the busiest ones, Scribe intuitively distinguishes and labels every speaker for clear, organized transcripts.
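As a rough illustration of how diarized output might be turned into a readable transcript, the sketch below merges consecutive words from the same speaker into turns. The item shape (`text`, `start`, `end`, `type`, `speaker_id`) and the `speaker_turns` helper are assumptions made for this example, not a confirmed response schema; check the API reference for the actual field names.

```python
from typing import Dict, List


def speaker_turns(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns: List[Dict] = []
    for word in words:
        # Skip non-word items such as spacing or audio events (assumed types).
        if word.get("type", "word") != "word":
            continue
        if turns and turns[-1]["speaker"] == word["speaker_id"]:
            turns[-1]["tokens"].append(word["text"])
            turns[-1]["end"] = word["end"]
        else:
            turns.append({
                "speaker": word["speaker_id"],
                "start": word["start"],
                "end": word["end"],
                "tokens": [word["text"]],
            })
    return [
        {"speaker": t["speaker"], "start": t["start"], "end": t["end"],
         "text": " ".join(t["tokens"])}
        for t in turns
    ]


# Hypothetical sample data shaped like a diarized word list.
sample_words = [
    {"text": "Let's", "start": 0.0, "end": 0.3, "type": "word", "speaker_id": "speaker_1"},
    {"text": " ", "start": 0.3, "end": 0.35, "type": "spacing", "speaker_id": "speaker_1"},
    {"text": "roll.", "start": 0.35, "end": 0.7, "type": "word", "speaker_id": "speaker_1"},
    {"text": "Totally.", "start": 1.0, "end": 1.5, "type": "word", "speaker_id": "speaker_2"},
]

for turn in speaker_turns(sample_words):
    print(f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}")
```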

Precise word-level timestamps

Capture the exact moment each word is spoken for seamless subtitle syncing and interactive audio experiences.
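To make the subtitle-syncing use case concrete, here is a small sketch that converts word-level timestamps into SRT cues. The word dictionaries (`text`, `start`, `end` in seconds) and the fixed `max_words` grouping are assumptions for illustration; a production captioner would also consider line length, pauses, and reading speed.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for index in range(0, len(words), max_words):
        chunk = words[index:index + max_words]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


# Hypothetical timed words for one sentence.
timed_words = [
    {"text": "We", "start": 0.0, "end": 0.2},
    {"text": "start", "start": 0.2, "end": 0.5},
    {"text": "next", "start": 0.5, "end": 0.8},
    {"text": "week.", "start": 0.8, "end": 1.25},
]

print(words_to_srt(timed_words, max_words=2))
```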

Dynamic audio tagging

From laughter to footsteps, Scribe’s transcription model tags every sound event, enriching your transcripts with the full context of your audio.
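One way an application might use tagged sound events is to pull them out of the transcript along with their timing. The sketch below assumes each transcript item carries a `type` field and that sound events appear as items with type `audio_event` and text such as `(laughter)`. Both the field names and the parenthesized format are assumptions for this example and should be checked against the API reference.

```python
from typing import Dict, List


def audio_events(items: List[Dict]) -> List[Dict]:
    """Return tagged sound events with their start and end times."""
    return [
        {"event": item["text"].strip("()"), "start": item["start"], "end": item["end"]}
        for item in items
        if item.get("type") == "audio_event"  # assumed type label
    ]


# Hypothetical transcript items mixing words and a sound event.
transcript_items = [
    {"text": "Totally.", "start": 0.0, "end": 0.6, "type": "word"},
    {"text": "(laughter)", "start": 1.0, "end": 1.8, "type": "audio_event"},
    {"text": "Agreed.", "start": 2.0, "end": 2.4, "type": "word"},
]

for event in audio_events(transcript_items):
    print(f"{event['event']} from {event['start']}s to {event['end']}s")
```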

99 Languages supported

Global language support

Break language barriers with support for 99 languages. Scribe unlocks AI transcription capabilities for languages previously out of reach.

Voice cleanup and editing tools

Remove background noise, reverb, and unwanted sounds for clean dialogue. Change narrator voices instantly with AI voice changer.

Developers

Integrate ElevenLabs Scribe

Seamlessly integrate the world’s most accurate speech to text model into your application. Get started with our developer-friendly examples that showcase features like diarization, character-level timestamps, and audio-event tagging for flawless transcriptions.
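A minimal sketch of what a transcription request could look like in Python follows. Treat every specific here as an assumption to verify against the official API reference: the endpoint URL, the `xi-api-key` header, the `scribe_v1` model ID, and the `diarize`, `tag_audio_events`, and `timestamps_granularity` form fields. The helper names are hypothetical, and the third-party `requests` package is needed only for the network call.

```python
from typing import Dict

# Assumed endpoint; confirm against the official API reference.
API_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_form_fields(
    diarize: bool = True,
    tag_audio_events: bool = True,
    timestamps_granularity: str = "character",
) -> Dict[str, str]:
    """Build the (assumed) multipart form fields for a transcription request."""
    if timestamps_granularity not in {"none", "word", "character"}:
        raise ValueError(f"unsupported granularity: {timestamps_granularity}")
    return {
        "model_id": "scribe_v1",  # assumed model identifier
        "diarize": str(diarize).lower(),
        "tag_audio_events": str(tag_audio_events).lower(),
        "timestamps_granularity": timestamps_granularity,
    }


def transcribe(path: str, api_key: str) -> dict:
    """Upload an audio or video file and return the parsed JSON response."""
    import requests  # third-party; imported here so the helper above has no dependencies

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"xi-api-key": api_key},  # assumed auth header
            data=build_form_fields(),
            files={"file": audio},
            timeout=300,
        )
    response.raise_for_status()
    return response.json()


# Example usage (requires a valid API key and a local file):
# result = transcribe("meeting.mp3", "YOUR_API_KEY")
# print(result)
```

Keeping the form fields in a separate helper makes the request options easy to inspect and test without touching the network.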

Bar chart showing word error rates for different languages and speech recognition models.

FLEURS Benchmark Performance

Scribe's performance is state of the art on the FLEURS benchmark.

A bar chart comparing word error rates for different voice recognition models across various countries.

Common Voice Benchmark Performance

Scribe's performance is state of the art on the Common Voice benchmark.

Benchmarks

The world's most accurate ASR model, supporting 99 languages

Benchmarked against other ASR models, Scribe delivers over 98% transcription accuracy in major languages while dramatically reducing errors in traditionally underserved ones, such as Serbian, Cantonese, and Malayalam.

Start transcribing free

Speech to Text Pricing Plans

Free: $0/mo

Hours included: 2 hours 30 minutes
Price per included hour and price per additional hour: no values listed

Free tier requires attribution and does not include commercial licensing.

Recent Speech to Text Guides & How To's

Research
Introducing Scribe v1, the world's most accurate speech-to-text model.

Meet Scribe

Resources
Best Speech to Text Apps 2025

ElevenLabs

Create with the highest quality AI Audio
