Speech to Text

Every word, perfectly captured

Scribe listens to every nuance, capturing each word with unmatched precision. Delivering audio transcription in 99 languages—with character-level timestamps, speaker diarization, and audio-event tagging—it returns structured results for seamless integration

Start transcribing free

Powerful Audio to Text features for your app

Transform your audio into flawless text with Scribe, the world's most advanced ASR (automatic speech recognition) model with the simplest speech to text API integration

[Image: gradient comparison bar labeled "Scribe v1," "Gemini 2.0 Flash," and "Whisper Large v3"]

Industry-leading accuracy

Achieve precision like never before—Scribe delivers the industry's lowest word error rate for perfectly accurate transcription

Smart speaker diarization

In any conversation, even the busiest ones, Scribe intuitively distinguishes and labels every speaker for clear, organized transcripts
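As a rough illustration of what labeled speakers make possible, the sketch below merges consecutive words from the same speaker into readable turns. The item shape (a list of words carrying `text` and `speaker_id`) is an assumption made for this example, not a confirmed response schema, so check the API reference for the actual field names.

```python
from itertools import groupby

# Hypothetical diarized output: the field names `text` and `speaker_id`
# are assumptions for this sketch, not a confirmed response schema.
words = [
    {"text": "Hi", "speaker_id": "speaker_0"},
    {"text": "there.", "speaker_id": "speaker_0"},
    {"text": "Hello!", "speaker_id": "speaker_1"},
    {"text": "Ready?", "speaker_id": "speaker_0"},
]


def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker_id"]):
        turns.append((speaker, " ".join(w["text"] for w in group)))
    return turns


turns = speaker_turns(words)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks again later gets a new turn, which keeps the transcript in conversation order.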

Precise word-level timestamps

Capture the exact moment each word is spoken. Scribe’s detailed timestamps enable seamless subtitle syncing and interactive audio experiences
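To make the subtitle use case concrete, here is a minimal sketch that turns word-level timestamps into SubRip (SRT) cues. It assumes each word arrives as a dict with `text`, `start`, and `end` in seconds; those field names and the fixed four-words-per-cue grouping are illustrative choices, not part of the product.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings (field names are assumptions for this sketch).
words = [
    {"text": "Every", "start": 0.0, "end": 0.32},
    {"text": "word,", "start": 0.32, "end": 0.61},
    {"text": "perfectly", "start": 0.70, "end": 1.25},
    {"text": "captured.", "start": 1.25, "end": 1.90},
    {"text": "Thanks!", "start": 2.50, "end": 3.05},
]
srt = words_to_srt(words)
print(srt)
```

A production subtitle tool would more likely break cues on pauses or punctuation than on a fixed word count; the fixed count just keeps the example short.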

Dynamic audio tagging

From laughter to footsteps, Scribe’s transcription model tags every sound event, enriching your transcripts with the full context of your audio
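One way tagged sound events might be consumed is to separate them from the spoken words. The sketch below assumes each transcript item has a `type` of either `"word"` or `"audio_event"`; both the field and its values are assumptions for illustration rather than a confirmed schema.

```python
# Hypothetical transcript items: the `type` values "word" and "audio_event"
# are assumptions for this sketch, not a confirmed response schema.
items = [
    {"type": "word", "text": "Welcome", "start": 0.0},
    {"type": "audio_event", "text": "(laughter)", "start": 0.9},
    {"type": "word", "text": "back", "start": 1.6},
    {"type": "audio_event", "text": "(footsteps)", "start": 2.4},
]


def split_events(items):
    """Separate spoken words from tagged sound events."""
    spoken = [i["text"] for i in items if i["type"] == "word"]
    events = [(i["start"], i["text"]) for i in items if i["type"] == "audio_event"]
    return " ".join(spoken), events


text, events = split_events(items)
print(text)
print(events)
```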

Global language support

Break language barriers with support for 99 languages—Scribe unlocks AI transcription capabilities for languages previously out of reach

Developers

Integrate ElevenLabs Scribe

Seamlessly integrate the world’s most accurate speech to text model into your application. Get started with our developer-friendly examples that showcase features like diarization, character-level timestamps, and audio-event tagging for flawless transcriptions

Quickstart Speech to Text API reference
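For orientation, a request to the Speech to Text API might look like the sketch below. The endpoint URL, the `xi-api-key` header, the `scribe_v1` model identifier, and the form field names are all assumptions recalled from public documentation, not details stated on this page, so verify them against the API reference before relying on them. The `transcribe` helper needs the third-party `requests` package and is defined but not called here.

```python
API_URL = "https://api.elevenlabs.io/v1/speech-to-text"  # assumed endpoint


def build_form_fields(diarize=True, tag_audio_events=True, granularity="character"):
    """Return multipart form fields for a transcription request (assumed names)."""
    return {
        "model_id": "scribe_v1",  # assumed model identifier
        "diarize": str(diarize).lower(),
        "tag_audio_events": str(tag_audio_events).lower(),
        "timestamps_granularity": granularity,
    }


def transcribe(path, api_key):
    """Upload an audio file and return the parsed JSON response."""
    import requests  # third-party dependency, imported only when a call is made

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"xi-api-key": api_key},  # assumed header name
            data=build_form_fields(),
            files={"file": audio},
            timeout=300,
        )
    response.raise_for_status()
    return response.json()


fields = build_form_fields()
print(fields)
```

Keeping the field construction separate from the network call makes it easy to check which options are being sent before uploading any audio.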

FLEURS Benchmark Performance

Scribe's performance is state of the art on the FLEURS benchmark

Common Voice Benchmark Performance

Scribe's performance is state of the art on the Common Voice benchmark

Benchmarks

The world's most accurate ASR model, supporting 99 languages

Benchmarked against other ASR models, Scribe delivers over 98% transcription accuracy in major languages while dramatically reducing errors in traditionally underserved ones—such as Serbian, Cantonese and Malayalam

AI Speech to Text transcription in 99 languages

Our AI speech to text transcription supports 99 languages. Just select the language and upload your audio file.

Frequently asked questions

What languages does Scribe support?

Excellent Accuracy (≤5% Word Error Rate, WER)
Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Indonesian, Italian, Japanese, Kannada, Malay, Malayalam, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, Vietnamese

High Accuracy (>5% to ≤10% WER)
Bengali, Belarusian, Bosnian, Cantonese, Estonian, Filipino, Gujarati, Hungarian, Kazakh, Latvian, Lithuanian, Mandarin, Marathi, Nepali, Odia, Persian, Slovenian, Tamil, Telugu

Good (>10% to ≤25% WER)
Afrikaans, Arabic, Armenian, Assamese, Asturian, Azerbaijani, Burmese, Cebuano, Croatian, Georgian, Hausa, Hebrew, Icelandic, Javanese, Kabuverdianu, Korean, Kyrgyz, Lingala, Maltese, Mongolian, Māori, Occitan, Punjabi, Sindhi, Swahili, Tajik, Thai, Urdu, Uzbek, Welsh

Moderate (>25% to ≤50% WER)
Amharic, Chichewa, Fulah, Ganda, Igbo, Irish, Khmer, Kurdish, Lao, Luxembourgish, Luo, Northern Sotho, Pashto, Shona, Somali, Umbundu, Wolof, Xhosa, Zulu
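For readers unfamiliar with the metric behind these tiers, the sketch below computes word error rate in its standard form (word-level edit distance divided by the number of reference words) and maps the result onto the tier boundaries listed above. It is a generic illustration, not the evaluation code behind the published figures, and the "Unrated" label for values above 50% is a hypothetical placeholder.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer):
    """Map a WER (as a fraction) to the tier names used on this page."""
    if wer <= 0.05:
        return "Excellent"
    if wer <= 0.10:
        return "High"
    if wer <= 0.25:
        return "Good"
    if wer <= 0.50:
        return "Moderate"
    return "Unrated"  # hypothetical label; the page defines no tier above 50%


wer = word_error_rate(
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumped over lazy dog",
)
print(f"{wer:.3f} -> {accuracy_tier(wer)}")
```

In this example one word is substituted and one is dropped out of nine reference words, so the WER is 2/9 (about 22%), which falls in the "Good" band.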

What is speech-to-text and how does it work?

Speech-to-text (STT) is a technology that converts spoken language into written text using automatic speech recognition (ASR). It processes audio signals, identifies speech patterns, and transcribes them into text with high accuracy. ElevenLabs' AI-powered speech-to-text software is designed to transcribe audio and video content with human-like precision, making it ideal for voice-to-text conversion, audio transcription, and real-time speech recognition.

Speech-to-text technology is used in:
✔ Audio-to-text transcription for podcasts, meetings, and interviews.
✔ Captions and subtitles in video content.
✔ Voice-to-text software for hands-free typing and accessibility tools.

ElevenLabs ASR offers fast, reliable, and highly accurate speech-to-text conversion for multiple languages and accents.

How do I transcribe video to text?

ElevenLabs provides video transcription to convert spoken dialogue into text format, making it easy to create subtitles, captions, and searchable transcripts.

Steps to transcribe video to text:
1. Upload your video file to ElevenLabs ASR.
2. Speech recognition technology processes the audio.
3. A transcript is generated automatically, with timestamps.
4. Download the text file or export subtitles for editing.

This AI-powered video transcription model helps content creators, businesses, and educators quickly convert video speech into accurate text for accessibility and content repurposing.

Does ElevenLabs support real-time speech-to-text conversion?

Scribe currently works well for use cases where the input audio is available upfront. A low-latency, real-time version will be released soon.

How much does Scribe cost?

Pricing starts at $0.40 per hour of transcribed audio and falls well below this at scale with Enterprise plans.
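As a back-of-the-envelope aid, the sketch below estimates cost from audio duration at the $0.40 per hour starting rate. It assumes billing scales linearly with duration, which is an assumption for illustration only; actual plans, quotas, and Enterprise pricing may work differently.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("0.40")  # starting rate quoted on this page, in USD


def estimate_cost(audio_seconds, rate_per_hour=RATE_PER_HOUR):
    """Estimate transcription cost in USD, rounded to the cent."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * rate_per_hour).quantize(Decimal("0.01"))


cost = estimate_cost(90 * 60)  # a 90-minute recording
print(cost)
```

At this rate a 90-minute recording (1.5 hours) comes to $0.60.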

Recent Speech to Text Guides & How To's

Research

Meet Scribe

Transcribe Speech to Text with the world's most accurate ASR model

Feb 26, 2025

Flavio Schneider, Tim von Känel

Resources

Text to Speech vs Speech to Text: What is the Difference?

Learn all about the differences between text to speech and speech to text technology.

Dec 31, 2023

Resources

Best Speech to Text Apps 2025

Discover the 10 best speech to text apps currently on the market. Find the perfect dictation/transcription tool, whatever your requirements or budget.

Dec 31, 2023

Product

Scribe comparison to OpenAI’s 4o Speech to Text model

One month after its launch, Scribe keeps proving it’s the most advanced speech to text model in the industry.

Mar 24, 2025

Badi Badkoube, Growth

You might be interested in

Audio to Text, Video to Text, Dubbing, Voice Isolator, Voice Cloning, Voice Design
