A guide on using our voice changer tool for the most natural-sounding speech-to-speech conversion

Voice changer (previously Speech-to-Speech) allows you to convert one voice (source voice) into another (cloned voice) while preserving the tone and delivery of the original voice.

The possibilities are endless! Voice changer can complement Text-to-Speech (TTS) by fixing pronunciation errors or by adding a specific performance that is hard to get from text alone. It is especially useful for reproducing the subtle, idiosyncratic characteristics of a voice that give speech a more emotive and human feel. Some key features include:

  • Greater accuracy with whispering
  • The ability to create audible sighs, laughs, or cries
  • Greatly improved detection of tone and emotion
  • Accurate tracking of the input speaking cadence
  • Language/accent retention

Source audio (Brian):

Output audio (Lily):

Record or Upload

Audio can be provided either by uploading an audio file or by recording live through a microphone. An uploaded file must be smaller than 50 MB, and neither an uploaded file nor a live recording may exceed 5 minutes in length. These limits are the same across all subscription tiers and are there to ensure a stable output. If you have material longer than 5 minutes, we recommend breaking it up into smaller sections and generating them separately. Additionally, if your file is too large, you may need to compress it or convert it to MP3.
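As a rough illustration of the splitting advice above, the following Python sketch plans how to cut a long recording into sections that stay within the 5-minute limit. The function and constant names are hypothetical, and treating "50 MB" as 50,000,000 bytes is an assumption.

```python
MAX_SEGMENT_SECONDS = 5 * 60        # the 5-minute limit described above
MAX_FILE_BYTES = 50 * 1000 * 1000   # assumption: 50 MB counted as decimal megabytes


def plan_segments(total_seconds, max_seconds=MAX_SEGMENT_SECONDS):
    """Return (start, end) pairs in seconds, each at most max_seconds long."""
    total_seconds = float(total_seconds)
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


def fits_upload_limit(size_bytes, max_bytes=MAX_FILE_BYTES):
    """True if a file of size_bytes is below the upload size limit."""
    return size_bytes < max_bytes


# A 12-minute recording becomes two 5-minute sections and one 2-minute section.
print(plan_segments(12 * 60))
```

Fixed boundaries like these can land in the middle of a word, so in practice you may prefer to move each cut to the nearest pause in the speech.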

To upload, either click the “Upload Audio” button in the audio box, or drag and drop your audio file directly onto it.

To record, first press the “Record Audio” button in the audio box, and then once you are ready to begin recording, press the Microphone button to start. After you’re finished recording, press the “Stop” button.

You will then see the audio file of this recording, which you can play back. This helps you judge whether you are happy with your performance, and whether there is background noise that may inhibit the AI’s ability to produce a clean output. The character cost is displayed in the bottom-left corner. Recording does not use any of your quota; you are only charged when you press “Generate”. The cost of a voice changer generation is based solely on duration, at 1000 characters per minute.
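To make the duration-based pricing concrete, here is a small Python sketch that estimates the character cost of a clip from its length. The function name is hypothetical, and rounding partial minutes up to the next whole character is an assumption; the displayed cost in the tool is the authoritative figure.

```python
import math

CHARACTERS_PER_MINUTE = 1000  # rate stated above


def estimate_character_cost(duration_seconds):
    """Estimate the character cost of a clip of the given length in seconds."""
    # Assumption: fractional characters are rounded up.
    return math.ceil(duration_seconds * CHARACTERS_PER_MINUTE / 60)


print(estimate_character_cost(90))   # a 90-second clip
print(estimate_character_cost(300))  # a full 5-minute clip
```

Under these assumptions, a 90-second clip costs 1500 characters and a clip at the 5-minute maximum costs 5000.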

If you need to redo the recording, simply press the trash icon to remove it and start over. Once you’re happy with your recording, you can try any voice or model you prefer without re-recording the input audio.

Models

Voice changer is now available for all 29 languages currently supported by the Multilingual v2 model. The English v2 model is also available specifically for English speech, but the Multilingual v2 model generally performs better, even for English audio.

The settings for each model are consistent with the settings in our TTS v2 models. If the input audio is very expressive and energetic with lots of dynamic range, it’s best to keep Style all the way down at 0% and Stability all the way up at 100%. We don’t want the AI’s interpretation to inhibit the performance, so this will give the most consistent and stable results.
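As a rough illustration of the guidance above, the sketch below converts the percentage sliders into fractional values and builds the recommended preset for expressive input. The function names and the dictionary keys `stability` and `style` are assumptions made for this example, not a confirmed settings schema.

```python
def percent_to_fraction(percent):
    """Convert a 0-100 slider percentage into a 0.0-1.0 fraction."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100.0


def expressive_input_preset():
    """Settings suggested above for expressive, energetic input audio."""
    return {
        "stability": percent_to_fraction(100),  # Stability all the way up
        "style": percent_to_fraction(0),        # Style all the way down
    }


print(expressive_input_preset())
```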

Other Tips and Tricks

Voice changer is exceptional at preserving accents and natural speech cadences, whichever output voice you choose. For example, if you upload an audio sample from a native Portuguese speaker, the output will keep that same language and accent. Again, the input sample is the most important factor, because it is the data that voice changer works with. If a British voice is chosen (let’s take our Default voice “George” as an example), but your recorded voice has an American accent, the final output will be George’s voice with an American accent.

When recording your voice, ensure that the input gain of your microphone is suitable. A recording that is too quiet may make it more difficult for the AI to pick up what is being said, while a recording that is too loud can cause audio clipping, which is also undesirable. Additionally, try your best to keep background noise out of the recording, as the AI will pick up everything, and it may try to “voice” any miscellaneous noises that it hears.
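If you want to check a recording's level before uploading it, a sketch along these lines can help. It reads a 16-bit PCM WAV file using only the Python standard library and reports the peak level as a fraction of full scale. The function names and the "too quiet" and "clipping" thresholds are assumptions chosen for illustration, not values used by voice changer.

```python
import array
import sys
import wave


def peak_level(path):
    """Return the peak amplitude of a 16-bit PCM WAV file as a 0.0-1.0 fraction."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0


def describe_level(peak, quiet_below=0.1, clip_at=0.99):
    """Classify a peak level using illustrative (assumed) thresholds."""
    if peak >= clip_at:
        return "likely clipping"
    if peak < quiet_below:
        return "too quiet"
    return "ok"
```

For example, `describe_level(peak_level("take1.wav"))` would classify a hypothetical file named `take1.wav`. Peak level is only a coarse indicator, so listening back to the recording remains the most reliable check.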

Optional: If you’re recording in a noisy environment, you may want to use our Voice Isolator tool on the recording. You can then add the edited audio file to voice changer as an upload.

Be expressive! Whether you’re shouting, crying, laughing, or anything in between, voice changer will copy that performance to a tee. We’re constantly striving to increase the realism of AI through many different features, and voice changer is our most useful tool in this regard. You can get really creative here!