Voice changer | ElevenLabs Documentation

Overview

The ElevenLabs voice changer API transforms any source audio (recorded or uploaded) into a different, fully cloned voice while preserving the performance nuances of the original. It captures whispers, laughs, cries, accents, and subtle emotional cues for a realistic, human feel, and can be used to:

Change any voice while preserving emotional delivery and nuance
Create consistent character voices across multiple languages and recording sessions
Fix or replace specific words and phrases in existing recordings

Explore our voice library to find the perfect voice for your project.

Supported languages

Our multilingual v2 models support 29 languages:

English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, and Russian.

The eleven_english_sts_v2 model only supports English.

Best practices

Audio quality

Record in a quiet environment to minimize background noise
Maintain appropriate microphone levels: avoid audio that is too quiet or that clips
Use remove_background_noise=true if environmental sounds are present

Recording guidelines

Keep segments under 5 minutes for optimal processing
Feel free to include natural expressions (laughs, sighs, emotions)
The source audio’s accent and language will be preserved in the output

Parameters

Style: Set to 0% when input audio is already expressive
Stability: Use 100% for maximum voice consistency
Language: Choose source audio that matches your desired accent and language
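
As a rough illustration of the style and stability guidance above, the sketch below turns slider percentages into a JSON settings string. It assumes the API expects these values as fractions between 0 and 1 under the keys `style` and `stability`; that shape, and the helper names `percent_to_fraction` and `build_voice_settings`, are assumptions for illustration and should be checked against the API reference.

```python
import json


def percent_to_fraction(percent: float) -> float:
    """Convert a 0-100 slider percentage into a 0.0-1.0 fraction."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return percent / 100


def build_voice_settings(style_percent: float = 0,
                         stability_percent: float = 100) -> str:
    """Return a JSON string of voice settings.

    The defaults follow the guidance above for expressive input:
    style at 0% and stability at 100%. The key names are assumed.
    """
    settings = {
        "style": percent_to_fraction(style_percent),
        "stability": percent_to_fraction(stability_percent),
    }
    return json.dumps(settings)


# Settings for source audio that is already expressive.
expressive_input_settings = build_voice_settings()
```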

FAQ

Can I convert more than 5 minutes of audio?

Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability and consistent output.
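
One way to plan the split is to compute chunk boundaries up front and hand them to whatever audio tool you use for cutting. The sketch below is a minimal example under that assumption; `chunk_boundaries` is a hypothetical helper, and the 290-second default is simply an arbitrary margin below the 5-minute limit.

```python
def chunk_boundaries(total_seconds: float,
                     max_chunk_seconds: float = 290.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole recording.

    Every chunk is at most max_chunk_seconds long; the last one may be shorter.
    """
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")
    if total_seconds <= 0:
        return []

    boundaries = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


# A 700-second recording becomes three chunks, each under 5 minutes.
chunks = chunk_boundaries(700.0)
```

Fixed-time cuts can land in the middle of a word, so in practice you may want to move each boundary to the nearest pause before cutting.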

Can I use my own custom/cloned voice for output?

Absolutely. Provide your custom voice’s voice_id and specify the correct model_id.
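
The sketch below shows how `voice_id` and `model_id` might fit into a request. The base URL, the `speech-to-speech` path, the `xi-api-key` header, and the `audio` form field are assumptions that are not stated on this page, so verify them against the API reference. The helper names are hypothetical, and sending the request relies on the third-party `requests` package.

```python
API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_voice_changer_request(voice_id: str,
                                api_key: str,
                                model_id: str = "eleven_multilingual_sts_v2",
                                remove_background_noise: bool = False) -> dict:
    """Assemble URL, headers, and form fields without sending anything."""
    return {
        # Assumed endpoint shape: the output voice is selected in the path.
        "url": f"{API_BASE}/speech-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key},  # assumed auth header name
        "data": {
            "model_id": model_id,
            # Form fields are sent as strings.
            "remove_background_noise": "true" if remove_background_noise else "false",
        },
    }


def convert_voice(audio_path: str, output_path: str, **request_kwargs) -> None:
    """Upload source audio and save the converted audio to output_path."""
    import requests  # third-party; imported here so the builder above has no dependencies

    request = build_voice_changer_request(**request_kwargs)
    with open(audio_path, "rb") as audio_file:
        response = requests.post(
            request["url"],
            headers=request["headers"],
            data=request["data"],
            files={"audio": audio_file},  # assumed form field name
            timeout=300,
        )
    response.raise_for_status()
    with open(output_path, "wb") as output_file:
        output_file.write(response.content)
```

A call might look like `convert_voice("take1.mp3", "converted.mp3", voice_id="YOUR_VOICE_ID", api_key="YOUR_API_KEY")`, where both file names and both identifiers are placeholders.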

How is billing handled?

You’re charged 1,000 characters’ worth of usage per minute of processed audio. There’s no additional fee based on file size.
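
To make the rate concrete, the sketch below estimates usage from audio duration. It assumes usage scales linearly with duration; whether partial minutes are prorated or rounded is not stated here, so treat the result as an estimate. `estimate_characters` is a hypothetical helper.

```python
CHARACTERS_PER_MINUTE = 1000  # rate stated above


def estimate_characters(duration_seconds: float) -> float:
    """Estimate character usage for a clip, assuming linear proration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return duration_seconds / 60 * CHARACTERS_PER_MINUTE


# A clip of 2 minutes 30 seconds comes to about 2,500 characters of usage.
usage = estimate_characters(150)
```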

Does the model reproduce background noise?

Possibly. Use remove_background_noise=true or the Voice Isolator tool to minimize environmental sounds in the final output.

Which model is best for English audio?

Though eleven_english_sts_v2 is available, our eleven_multilingual_sts_v2 model often outperforms it, even for English material.

How do style and stability work?

“Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances in the source audio, turn style down and stability up.