Forced Alignment

Learn how to turn spoken audio and text into a time-aligned transcript with ElevenLabs.

Overview

The ElevenLabs Forced Alignment API turns spoken audio and text into a time-aligned transcript. It is useful when you have an audio recording and its transcript but need exact timestamps for each word or phrase in the transcript. Typical uses include:

  • Matching subtitles to a video recording
  • Generating timings for an audiobook recording of an ebook
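
As an illustration of the subtitle use case, the sketch below groups word-level timings into SRT cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) is an assumption made for this example, not a documented response format, and the names `srt_timestamp` and `words_to_srt` are hypothetical helpers.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict[str, object]], max_words: int = 7) -> str:
    """Group aligned words into numbered SRT cues of at most max_words each.

    Each word is assumed to be a dict with "text", "start", and "end" keys
    (hypothetical shape; adapt it to the actual alignment output).
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(float(chunk[0]["start"]))
        end = srt_timestamp(float(chunk[-1]["end"]))
        text = " ".join(str(w["text"]).strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + ("\n" if cues else "")
```

Splitting on a fixed word count keeps the sketch short; a production subtitle pipeline would more likely break cues on punctuation or pauses.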

Usage

The Forced Alignment API is used by calling the ElevenLabs API directly.

Developers

Learn how to integrate Forced Alignment into your application.

API reference

Full API reference for the Forced Alignment endpoint.
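
A minimal sketch of what a direct call might look like, using only the Python standard library. The endpoint path, the `xi-api-key` header, and the multipart field names `file` and `text` are assumptions not stated on this page; confirm them against the API reference before relying on them. `build_alignment_request` is a hypothetical helper that only constructs the request and does not send it.

```python
import urllib.request
import uuid

# Assumed endpoint URL; verify it in the API reference.
ALIGN_URL = "https://api.elevenlabs.io/v1/forced-alignment"


def build_alignment_request(
    audio: bytes,
    transcript: str,
    api_key: str,
    filename: str = "audio.mp3",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request.

    The transcript is sent as a plain string, not wrapped in JSON.
    Field names "file" and "text" are assumptions for this sketch.
    """
    boundary = uuid.uuid4().hex
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    text_head = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n\r\n'
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = file_head + audio + text_head + transcript.encode("utf-8") + tail
    return urllib.request.Request(
        ALIGN_URL,
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header name
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned request can be passed to `urllib.request.urlopen` and the response body parsed with `json.load`; the structure of that response is not described on this page, so inspect it before depending on particular fields.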

Supported languages

Our multilingual v2 models support 29 languages:

English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.

Key facts

  • Input text format: Plain string only; do not wrap the input text in JSON or any other structure
  • Diarization: Not supported; providing diarized text will produce unexpected results
  • Pricing: Same rate as the Speech to Text API
  • Maximum file size: 3 GB
  • Maximum audio duration: 10 hours
  • Maximum text length: 675,000 characters
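
The limits above can be checked locally before uploading. The following is a minimal sketch, assuming the caller already knows the file size and audio duration; `check_alignment_input` is a hypothetical helper, and "3 GB" is interpreted conservatively as 3 × 10^9 bytes because the page does not say whether decimal or binary gigabytes are meant.

```python
from typing import List

MAX_FILE_BYTES = 3 * 10**9           # 3 GB, conservative decimal interpretation (assumption)
MAX_DURATION_SECONDS = 10 * 60 * 60  # 10 hours
MAX_TEXT_CHARS = 675_000


def check_alignment_input(
    file_bytes: int, duration_seconds: float, text: object
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not isinstance(text, str):
        # The transcript must be a plain string, not a dict, list, or other structure.
        problems.append("text must be a plain string")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    if file_bytes > MAX_FILE_BYTES:
        problems.append(f"file exceeds {MAX_FILE_BYTES} bytes")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"audio exceeds {MAX_DURATION_SECONDS} seconds")
    return problems
```

For example, `check_alignment_input(1_000_000, 95.0, "Hello world")` returns an empty list, while passing a dict as `text` reports that a plain string is required.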