Professional Voice Cloning
Learn how to clone a voice using the Clone Voice API.
This guide will show you how to create a Professional Voice Clone (PVC) using the PVC API. To create a PVC via the dashboard, refer to the Professional Voice Clone product guide.
Creating a PVC requires you to be on the Creator plan or above.
For an outline of the differences between Instant Voice Clones and Professional Voice Clones, refer to the Voices capability guide.
If you are unsure about what is permissible from a legal standpoint, please consult the Terms of Service and our AI Safety information.
Creating a PVC via the API involves considerably more steps than creating an Instant Voice Clone, because PVCs are more complex and require more data and fine-tuning to produce a high-quality clone.
Using the Professional Voice Clone API
Create an API key
Create an API key in the dashboard. You’ll use this key to securely access the API.
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file, or directly in your app’s configuration, depending on your preference.
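As a minimal sketch of the environment-variable approach, the helper below reads the key at startup and fails early if it is missing. The variable name `ELEVENLABS_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration; use whatever name your secret manager or .env file defines.

```python
import os


def load_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key stored in `env_var` (an assumed variable name).

    Raises RuntimeError when the variable is unset or blank, so a missing
    key is caught before any API call is attempted.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API")
    return key
```

Reading the key in one place keeps it out of source control and makes a missing secret an obvious startup error.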
Create a PVC voice
Create a new file named example.py or example.mts, depending on your language of choice, and add the following code to create a PVC voice:
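The sketch below shows roughly what a create-voice request could look like when built with the Python standard library instead of the SDK. The base URL, the `/v1/voices/pvc` path, the `xi-api-key` header, and the `name`, `language`, and `description` fields are assumptions here; confirm them against the API reference before relying on them. The function only builds the request and does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def build_create_pvc_request(api_key, name, language, description=""):
    """Build (but do not send) a POST request that would create a PVC voice.

    The path and JSON field names are assumptions for illustration.
    """
    body = {"name": name, "language": language}
    if description:
        body["description"] = description
    return urllib.request.Request(
        url=f"{API_BASE}/v1/voices/pvc",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response is expected to include the new voice's ID, which the later steps need.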
Upload audio files
Next we’ll upload the audio sample files that will be used to train the PVC. Review the Tips and suggestions section of the PVC product guide for more information on how to get best results from your audio files.
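When uploading several files at once, it helps to make sure every file handle is closed even if the upload fails. The following sketch does that with `contextlib.ExitStack`. The `upload` argument is a hypothetical callable standing in for whichever SDK method or HTTP call you use to upload samples; its `voice_id` and `files` keyword names are assumptions.

```python
from contextlib import ExitStack
from pathlib import Path


def upload_samples(upload, voice_id, paths):
    """Open each sample in binary mode and pass the open handles to `upload`.

    `upload` is a hypothetical callable (voice_id=..., files=...) that wraps
    the real upload call. All handles are closed when this function returns,
    whether or not the upload succeeded.
    """
    with ExitStack() as stack:
        files = [stack.enter_context(Path(p).open("rb")) for p in paths]
        return upload(voice_id=voice_id, files=files)
```

Keep whatever the upload call returns, since the sample IDs are needed for speaker separation.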
Begin speaker separation
This step will attempt to separate the audio files into individual speakers. This is required if you are uploading audio with multiple speakers.
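Because separation runs asynchronously, a script typically starts it for each sample and then polls until every sample has finished. The following is a minimal polling sketch. `get_status` is a hypothetical callable that returns the separation status for one sample ID, and the status strings `"completed"` and `"failed"` are assumptions to check against the API reference.

```python
import time


def wait_for_separation(get_status, sample_ids, interval=5.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status(sample_id)` until every sample reports "completed".

    Raises RuntimeError if any sample reports "failed" and TimeoutError if
    samples are still pending once `timeout` seconds have elapsed.
    """
    pending = set(sample_ids)
    deadline = clock() + timeout
    while True:
        for sample_id in sorted(pending):
            status = get_status(sample_id)
            if status == "failed":
                raise RuntimeError(f"Speaker separation failed for sample {sample_id}")
            if status == "completed":
                pending.discard(sample_id)
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"Still waiting on samples: {sorted(pending)}")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.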
Retrieve speaker audio
Speaker separation takes some time to complete, so run the following step in a separate process once separation has finished.
Once speaker separation is complete, you will have a list of speakers for each sample. In the case of samples with multiple speakers, you will have to pick the speaker you want to use for the PVC. To identify the speaker, you can retrieve the audio for each speaker and listen to them.
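To make the clips easy to audition, you can write each speaker's audio to its own file. This sketch assumes the audio for each speaker arrives as a base64-encoded string and that the clips are MP3; both are assumptions to verify against the API reference. `save_speaker_clips` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_speaker_clips(clips, out_dir, extension=".mp3"):
    """Decode base64 clips and write one file per speaker.

    `clips` maps a speaker ID to its base64-encoded audio (assumed format).
    Returns the list of written paths in the order the speakers were given.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for speaker_id, audio_b64 in clips.items():
        path = out / f"{speaker_id}{extension}"
        path.write_bytes(base64.b64decode(audio_b64))
        written.append(path)
    return written
```

Naming each file after its speaker ID makes it straightforward to map the voice you hear back to the ID you need to select.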
Update samples with speaker IDs
Once speaker separation is complete, you can update the samples to select which speaker you want to use for the PVC.
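One way to organize this step is to validate every choice first and only then send the updates, so a typo in a speaker ID does not leave the samples half-updated. In the sketch below, `update_sample` is a hypothetical callable wrapping the real update call, and its keyword names (`voice_id`, `sample_id`, `selected_speaker_ids`) are assumptions.

```python
def apply_speaker_choices(update_sample, voice_id, speakers_by_sample, choices):
    """Pick one speaker per sample, then push each selection via `update_sample`.

    `speakers_by_sample` maps sample ID -> list of detected speaker IDs.
    `choices` maps sample ID -> chosen speaker ID and is only consulted for
    samples that do not have exactly one speaker.
    """
    selected = {}
    for sample_id, speaker_ids in speakers_by_sample.items():
        if len(speaker_ids) == 1:
            chosen = speaker_ids[0]  # nothing to decide
        else:
            chosen = choices.get(sample_id)
            if chosen not in speaker_ids:
                raise ValueError(f"No valid speaker chosen for sample {sample_id}")
        selected[sample_id] = chosen
    # Only send updates once every sample has a valid selection.
    for sample_id, chosen in selected.items():
        update_sample(voice_id=voice_id, sample_id=sample_id,
                      selected_speaker_ids=[chosen])
    return selected
```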
Verify the PVC
Before training can begin, a verification step is required to ensure you have permission to use the voice. First, request the verification CAPTCHA.
The image contains several lines of text that the voice owner will need to read out loud and record. Once done, submit the recording to verify the identity of the voice’s owner.
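The two halves of verification can be sketched as a pair of small helpers: one saves the CAPTCHA image so the voice owner can read it, and one submits the finished recording. `fetch_captcha` and `verify` are hypothetical callables standing in for the real API calls. The sketch also assumes the CAPTCHA image is returned as a base64-encoded string and that `verify` takes `voice_id` and `recording` keywords.

```python
import base64
from pathlib import Path


def save_captcha(fetch_captcha, voice_id, out_path):
    """Fetch the CAPTCHA for `voice_id` and write the decoded image to `out_path`.

    `fetch_captcha(voice_id)` is assumed to return base64-encoded image data.
    """
    path = Path(out_path)
    path.write_bytes(base64.b64decode(fetch_captcha(voice_id)))
    return path


def submit_recording(verify, voice_id, recording_path):
    """Submit the voice owner's recording of the CAPTCHA text via `verify`."""
    with open(recording_path, "rb") as recording:
        return verify(voice_id=voice_id, recording=recording)
```

The recording itself has to be made by the voice owner between the two calls, so these are deliberately separate functions rather than one automated flow.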
(Optional) Request manual verification
If you are unable to verify the CAPTCHA, you can request manual verification. Note that this will take longer to process.
This should only be used if the previous verification steps have failed or are not possible, for instance if the voice owner is visually impaired.
For a list of the files that are required for manual verification, please contact support as each case may be unique.
Train the PVC
Next, begin the training process. How long training takes depends on the length and number of samples provided.
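Since training is long-running, a script usually kicks it off and then checks the voice's state at intervals. Below is a minimal sketch of that loop. `start_training` and `get_state` are hypothetical callables wrapping the real API calls, and the state names `"fine_tuned"` and `"failed"` are assumptions to confirm in the API reference.

```python
import time


def train_and_wait(start_training, get_state, voice_id,
                   interval=30.0, max_polls=240, sleep=time.sleep):
    """Start training, then poll `get_state(voice_id)` until it finishes.

    Returns the number of polls it took to see "fine_tuned". Raises
    RuntimeError on a "failed" state and TimeoutError after `max_polls`
    polls without completion.
    """
    start_training(voice_id)
    for attempt in range(1, max_polls + 1):
        state = get_state(voice_id)
        if state == "fine_tuned":
            return attempt
        if state == "failed":
            raise RuntimeError(f"Training failed for voice {voice_id}")
        sleep(interval)
    raise TimeoutError(f"Voice {voice_id} not trained after {max_polls} polls")
```

A generous interval is reasonable here, as checking more often does not make training finish sooner.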
Use the newly created voice
Once the PVC is verified, you can use it in the same way as any other voice. See the Text to Speech quickstart for more information on how to use a voice.
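Because a PVC is addressed by its voice ID like any other voice, generating speech with it could look like the following standard-library sketch. The base URL, the `/v1/text-to-speech/{voice_id}` path, the `xi-api-key` header, the `text` and `model_id` fields, and the default model name are assumptions here; check them against the API reference. The function only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def build_tts_request(api_key, voice_id, text, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for `voice_id`.

    The path, header, and field names are assumptions for illustration.
    """
    safe_voice_id = urllib.parse.quote(voice_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/text-to-speech/{safe_voice_id}",
        data=json.dumps({"text": text, "model_id": model_id}).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and writing the response body to a file would give you the generated audio.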
Next steps
Explore the API reference for more information on creating a Professional Voice Clone.