Voice changer
Learn how to transform audio between voices while preserving emotion and delivery.
Overview
The ElevenLabs voice changer API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It captures whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel, and can be used to:
- Change any voice while preserving emotional delivery and nuance
- Create consistent character voices across multiple languages and recording sessions
- Fix or replace specific words and phrases in existing recordings
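For a concrete starting point, here is a minimal sketch of a conversion request in Python using the requests library. The endpoint path, xi-api-key header, and form-field names follow the ElevenLabs speech-to-speech API reference at the time of writing; the API key, voice ID, and file names are placeholders, so check the current API documentation before relying on the details.

```python
# Minimal sketch: convert a recording into another voice via the
# speech-to-speech (voice changer) endpoint.
import requests

API_KEY = "YOUR_XI_API_KEY"        # placeholder
VOICE_ID = "YOUR_TARGET_VOICE_ID"  # the voice the output should use

url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}"

with open("source_recording.mp3", "rb") as audio_file:
    response = requests.post(
        url,
        headers={"xi-api-key": API_KEY},
        data={"model_id": "eleven_multilingual_sts_v2"},
        files={"audio": audio_file},
        timeout=300,
    )

response.raise_for_status()
with open("converted.mp3", "wb") as out:
    out.write(response.content)  # the response body is the converted audio
```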
Explore our voice library to find the perfect voice for your project.
Supported languages
Our v2 models support 29 languages:
English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.
The `eleven_english_sts_v2` model only supports English.
Best practices
Audio quality
- Record in a quiet environment to minimize background noise
- Maintain appropriate microphone levels: avoid audio that is too quiet or that peaks and clips
- Use `remove_background_noise=true` if environmental sounds are present
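If denoising is needed, the flag is sent alongside the other form fields of the conversion request. A brief sketch, reusing the requests-style call from the overview; multipart form values travel as strings, so the boolean is written as "true":

```python
# Form fields for a conversion request that also denoises the source audio.
data = {
    "model_id": "eleven_multilingual_sts_v2",
    "remove_background_noise": "true",
}
```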
Recording guidelines
- Keep segments under 5 minutes for optimal processing
- Feel free to include natural expressions (laughs, sighs, emotions)
- The source audio’s accent and language will be preserved in the output
Parameters
- Style: Set to 0% when input audio is already expressive
- Stability: Use 100% for maximum voice consistency
- Language: Choose source audio that matches your desired accent and language
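In an API call, these percentages map to values between 0 and 1. A minimal sketch, assuming the settings are passed as a JSON-encoded voice_settings form field with stability and style keys (verify the exact field names against the current API reference):

```python
import json

# 0% style and 100% stability, expressed on the API's 0-1 scale.
voice_settings = {
    "stability": 1.0,  # maximum voice consistency
    "style": 0.0,      # rely on the expressiveness already in the source audio
}

data = {
    "model_id": "eleven_multilingual_sts_v2",
    "voice_settings": json.dumps(voice_settings),
}
```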
FAQ
Can I convert more than 5 minutes of audio?
Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability and consistent output.
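One way to do the splitting, sketched here with the pydub library (pydub is just one option; any tool that cuts audio on time boundaries works equally well):

```python
# Split a long recording into chunks that stay safely under five minutes.
from pydub import AudioSegment

CHUNK_MS = 4 * 60 * 1000  # 4-minute chunks leave a margin below the 5-minute limit

audio = AudioSegment.from_file("long_recording.mp3")
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    chunk = audio[start:start + CHUNK_MS]
    chunk.export(f"chunk_{i:02d}.mp3", format="mp3")
    # each chunk_XX.mp3 can then be sent through the conversion endpoint separately
```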
Can I use my own custom/cloned voice for output?
Absolutely. Provide your custom voice’s `voice_id` and specify the correct `model_id`.
How is billing handled?
You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no additional fee based on file size.
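For example, a 3-minute recording deducts 3,000 characters of quota, whether the underlying file is 3 MB or 30 MB.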
Does the model reproduce background noise?
Possibly. Use `remove_background_noise=true` or the Voice Isolator tool to minimize environmental sounds in the final output.
Which model is best for English audio?
Though `eleven_english_sts_v2` is available, our `eleven_multilingual_sts_v2` model often outperforms it, even for English material.
How do style and stability work?
“Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances in the source audio, turn style down and stability up.