Voice changer

Learn how to transform audio between voices while preserving emotion and delivery.

Overview

The ElevenLabs voice changer API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It captures whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel. You can use it to:

  • Change any voice while preserving emotional delivery and nuance
  • Create consistent character voices across multiple languages and recording sessions
  • Fix or replace specific words and phrases in existing recordings

Explore our voice library to find the perfect voice for your project.

Supported languages

Our v2 models support 29 languages:

English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, and Russian.

The eleven_english_sts_v2 model only supports English.

Best practices

Audio quality

  • Record in a quiet environment to minimize background noise
  • Maintain appropriate microphone levels: avoid audio that is too quiet or that peaks (clips)
  • Use remove_background_noise=true if environmental sounds are present

Recording guidelines

  • Keep segments under 5 minutes for optimal processing
  • Feel free to include natural expressions (laughs, sighs, emotions)
  • The source audio’s accent and language will be preserved in the output

Parameters

  • Style: Set to 0% when input audio is already expressive
  • Stability: Use 100% for maximum voice consistency
  • Language: Choose source audio that matches your desired accent and language
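As a minimal sketch of the Style and Stability guidance above, the helper below turns percentage values into a settings payload. It assumes the API expects these settings as fractions between 0.0 and 1.0, serialized as a JSON string under keys named stability and style. That mapping and the function name build_voice_settings are illustrative assumptions, not something this page confirms.

```python
import json


def build_voice_settings(stability_pct: float = 100.0, style_pct: float = 0.0) -> str:
    """Convert percentage values into a JSON string of 0.0-1.0 fractions.

    Defaults follow the guidance above: stability at 100%, style at 0%.
    The key names and the 0.0-1.0 scale are assumptions for illustration.
    """
    for name, value in (("stability", stability_pct), ("style", style_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    settings = {
        "stability": stability_pct / 100.0,
        "style": style_pct / 100.0,
    }
    return json.dumps(settings)
```

Calling build_voice_settings() with no arguments yields the maximum-consistency configuration recommended for source audio that is already expressive.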

FAQ

Yes, longer recordings can be converted, but you must split them into smaller chunks (each under 5 minutes). This helps ensure stability and consistent output.
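One way to do the splitting is sketched below for uncompressed WAV input, using only Python's standard wave module. The function names and the 290-second default (a safety margin under the 5-minute limit) are illustrative choices, not part of the API.

```python
import wave


def chunk_frame_ranges(total_frames: int, frame_rate: int, max_seconds: float = 290.0):
    """Return (start, end) frame ranges, each at most max_seconds long."""
    frames_per_chunk = int(frame_rate * max_seconds)
    if frames_per_chunk <= 0:
        raise ValueError("max_seconds is too small for this frame rate")
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


def split_wav(src_path: str, out_prefix: str, max_seconds: float = 290.0):
    """Write consecutive chunks of a WAV file and return the chunk paths."""
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        ranges = chunk_frame_ranges(src.getnframes(), src.getframerate(), max_seconds)
        for index, (start, end) in enumerate(ranges):
            # Frames are read sequentially, so each read continues where the last ended.
            frames = src.readframes(end - start)
            path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(path, "wb") as dst:
                dst.setparams(params)  # the frame count in the header is corrected on close
                dst.writeframes(frames)
            paths.append(path)
    return paths
```

Fixed-length cuts can land in the middle of a word. For better results, consider moving each boundary to a nearby pause before converting the chunks.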

Absolutely, custom voices are supported. Provide your custom voice’s voice_id and specify the correct model_id.
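The sketch below shows how voice_id and model_id might fit into a request. It only assembles the URL, headers, and form fields and sends nothing. The base URL, the speech-to-speech path, the xi-api-key header, and the form field names are assumptions here and should be checked against the API reference. build_voice_changer_request is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed base URL; verify against the official API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_voice_changer_request(
    voice_id: str,
    model_id: str = "eleven_multilingual_sts_v2",
    remove_background_noise: bool = False,
    api_key: str = "YOUR_API_KEY",
):
    """Assemble (url, headers, form_fields) for a voice changer call.

    Path, header name, and field names are assumptions for illustration.
    """
    url = f"{API_BASE}/speech-to-speech/{quote(voice_id, safe='')}"
    headers = {"xi-api-key": api_key}
    fields = {
        "model_id": model_id,
        # Form fields are sent as text, so the boolean is spelled out.
        "remove_background_noise": "true" if remove_background_noise else "false",
    }
    return url, headers, fields
```

With an HTTP client such as requests, the result could then be sent as a multipart form, for example requests.post(url, headers=headers, data=fields, files={"audio": audio_file}), where the audio field name is likewise an assumption.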

You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no additional fee based on file size.
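The arithmetic can be sketched as follows. The rate of 1000 characters per minute comes from the text above. Pro-rating partial minutes by the second and rounding up are assumptions, since this page does not say how partial minutes are billed.

```python
import math


def estimate_character_cost(duration_seconds: float, chars_per_minute: int = 1000) -> int:
    """Estimate character usage for a clip of the given duration.

    Assumes partial minutes are pro-rated and the total is rounded up.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds / 60 * chars_per_minute)
```

Under these assumptions, a 90-second clip would count as 1500 characters, regardless of the file's size on disk.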

Possibly. Background noise in the source audio can carry over into the result. Use remove_background_noise=true or the Voice Isolator tool to minimize environmental sounds in the final output.

Though eleven_english_sts_v2 is available, our eleven_multilingual_sts_v2 model often outperforms it, even for English material.

“Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances in the source audio, turn style down and stability up.
