Professional Voice Cloning | ElevenLabs Documentation

This guide will show you how to create a Professional Voice Clone (PVC) using the PVC API. To create a PVC via the dashboard, refer to the Professional Voice Clone product guide.

Creating a PVC requires you to be on the Creator plan or above.

For an outline of the differences between Instant Voice Clones and Professional Voice Clones, refer to the Voices capability guide.

If you are unsure about what is legally permissible, consult the Terms of Service and our AI Safety information.

Creating a PVC via the API involves considerably more steps than creating an Instant Voice Clone, because PVCs are more complex and require more data and fine-tuning to produce a high-quality clone.

Using the Professional Voice Clone API

Create an API key

Create an API key in the dashboard, which you’ll use to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file or directly in your app’s configuration, depending on your preference.

.env

ELEVENLABS_API_KEY=<your_api_key_here>
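Before creating the client, it can help to fail fast when the key is missing. The following is a minimal sketch, not part of the SDK: `require_api_key` is a hypothetical helper, and treating a value that starts with `<` as an unreplaced placeholder is an assumption based on the .env template above.

```python
import os


def require_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is unusable.

    Hypothetical helper for illustration; it only inspects the environment
    mapping and never contacts the API.
    """
    key = env.get(name, "").strip()
    # Reject an empty value or an unreplaced "<your_api_key_here>" placeholder.
    if not key or key.startswith("<"):
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Call it after `load_dotenv()` and pass the result as `api_key` when constructing the client.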

Install the SDK

We’ll also use the dotenv library to load our API key from an environment variable.

pip install elevenlabs
pip install python-dotenv

Create a PVC voice

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code to create a PVC voice:

# example.py
import os
import time
import base64
from contextlib import ExitStack
from io import BytesIO
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

voice = elevenlabs.voices.pvc.create(
    name="My Professional Voice Clone",
    language="en",
    description="A professional voice clone of my voice"
)

print(voice)

Upload audio files

Next we’ll upload the audio sample files that will be used to train the PVC. Review the Tips and suggestions section of the PVC product guide for more information on how to get the best results from your audio files.

# Define the list of file paths explicitly
# Replace with the paths to your audio and/or video files.
# The more files you add, the better the clone will be.
sample_file_paths = [
    "/path/to/your/first_sample.mp3",
    "/path/to/your/second_sample.wav",
    "relative/path/to/another_sample.mp4"
]

samples = None

files_to_upload = []
# Use ExitStack to manage multiple open files
with ExitStack() as stack:
    for filepath in sample_file_paths:
        # Open each file and add it to the stack
        audio_file = stack.enter_context(open(filepath, "rb"))

        # Read the file into an in-memory buffer for the SDK
        files_to_upload.append(
            BytesIO(audio_file.read())
        )

    samples = elevenlabs.voices.pvc.samples.create(
        voice_id=voice.voice_id,
        files=files_to_upload  # Pass the list of in-memory buffers
    )
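Because the upload reads every file before making the request, a single bad path stops the whole batch. As a rough sketch, the paths could be checked first. `check_sample_paths` is a hypothetical helper, not an SDK function, and it only tests that each file exists and is non-empty; it does not validate audio formats.

```python
import os


def check_sample_paths(paths):
    """Split paths into (usable, problems) before any upload is attempted.

    `problems` is a list of (path, reason) tuples. Hypothetical helper for
    illustration; it does not inspect the audio content itself.
    """
    usable, problems = [], []
    for path in paths:
        if not os.path.isfile(path):
            problems.append((path, "not found"))
        elif os.path.getsize(path) == 0:
            problems.append((path, "empty file"))
        else:
            usable.append(path)
    return usable, problems
```

Passing `sample_file_paths` through this check and uploading only the `usable` list keeps a typo in one path from interrupting the rest.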

Begin speaker separation

This step will attempt to separate the audio files into individual speakers. This is required if you are uploading audio with multiple speakers.

sample_ids_to_check = []
for sample in samples:
    if sample.sample_id:
        print(f"Starting separation for sample: {sample.sample_id}")
        elevenlabs.voices.pvc.samples.speakers.separate(
            voice_id=voice.voice_id,
            sample_id=sample.sample_id
        )
        sample_ids_to_check.append(sample.sample_id)

while sample_ids_to_check:
    # Create a copy of the list to iterate over, so we can remove items from the original
    ids_in_batch = list(sample_ids_to_check)
    for sample_id in ids_in_batch:
        status_response = elevenlabs.voices.pvc.samples.speakers.get(
            voice_id=voice.voice_id,
            sample_id=sample_id
        )
        status = status_response.status
        print(f"Sample {sample_id} status: {status}")
        if status in ("completed", "failed"):
            sample_ids_to_check.remove(sample_id)

    if sample_ids_to_check:
        # Wait 5 seconds before the next poll cycle
        time.sleep(5)

print("All samples have been processed or removed from polling.")
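The loop above keeps polling until every sample reports `completed` or `failed`, so it would not return if a sample never reaches either status. One possible variation, sketched here, adds an overall timeout. `poll_until_done` and its parameters are hypothetical names, and the status lookup is passed in as a callable so the sketch makes no assumptions about the SDK beyond what the loop above already uses.

```python
import time


def poll_until_done(sample_ids, get_status, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(sample_id) until all samples are terminal or time runs out.

    Returns a dict mapping each sample ID to the last status seen. Samples that
    are still not "completed" or "failed" at the deadline keep their last status.
    """
    terminal = {"completed", "failed"}
    pending = list(sample_ids)
    last_status = {}
    deadline = clock() + timeout
    while pending:
        # Iterate over a copy so finished samples can be removed from `pending`.
        for sample_id in list(pending):
            last_status[sample_id] = get_status(sample_id)
            if last_status[sample_id] in terminal:
                pending.remove(sample_id)
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    return last_status
```

Here `get_status` could be a small function that wraps the `elevenlabs.voices.pvc.samples.speakers.get(...)` call and returns its `status` attribute. The `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.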

Retrieve speaker audio

Speaker separation takes some time to complete, so run the following step in a separate process once separation has finished.

Once speaker separation is complete, you will have a list of speakers for each sample. In the case of samples with multiple speakers, you will have to pick the speaker you want to use for the PVC. To identify the speaker, you can retrieve the audio for each speaker and listen to them.

# Get the list of samples from the voice created in the "Create a PVC voice" step.
# This runs in a separate process, so supply the voice ID yourself
# (placeholder below; use the voice_id returned when the voice was created).
voice_id = "<your_voice_id>"
voice = elevenlabs.voices.get(voice_id=voice_id)

samples = voice.samples

# Loop over each sample and save the audio for each speaker to a file
speaker_audio_output_dir = "path/to/speakers/"
os.makedirs(speaker_audio_output_dir, exist_ok=True)

for sample in samples:
    speaker_info = elevenlabs.voices.pvc.samples.speakers.get(
        voice_id=voice.voice_id,
        sample_id=sample.sample_id
    )

    # Proceed only if separation is actually complete
    if getattr(speaker_info, 'status', 'unknown') != "completed":
        continue

    if hasattr(speaker_info, 'speakers') and speaker_info.speakers:
        speaker_list = speaker_info.speakers
        if isinstance(speaker_info.speakers, dict):
            speaker_list = speaker_info.speakers.values()

        for speaker in speaker_list:
            audio_response = elevenlabs.voices.pvc.samples.speakers.audio.get(
                voice_id=voice.voice_id,
                sample_id=sample.sample_id,
                speaker_id=speaker.speaker_id
            )

            # The audio is returned base64-encoded; decode it before writing
            audio_base64 = audio_response.audio_base_64
            audio_data = base64.b64decode(audio_base64)
            output_filename = os.path.join(speaker_audio_output_dir, f"sample_{sample.sample_id}_speaker_{speaker.speaker_id}.mp3")

            with open(output_filename, "wb") as f:
                f.write(audio_data)
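Because this step runs in a separate process, the voice ID from the creation step has to be carried over somehow. One simple option, sketched below, is to write it to a small JSON file in the first script and read it back in the second. The helper names and the `pvc_state.json` filename are assumptions made for illustration.

```python
import json
from pathlib import Path


def save_voice_id(voice_id, path="pvc_state.json"):
    """Write the voice ID to a JSON file so a later process can pick it up."""
    Path(path).write_text(json.dumps({"voice_id": voice_id}))


def load_voice_id(path="pvc_state.json"):
    """Read back the voice ID written by save_voice_id."""
    return json.loads(Path(path).read_text())["voice_id"]
```

The first script would call `save_voice_id(voice.voice_id)` after creating the voice, and the second would start with `voice_id = load_voice_id()`.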

Update samples with speaker IDs

Once speaker separation is complete, you can update the samples to select which speaker you want to use for the PVC.

elevenlabs.voices.pvc.samples.update(
    voice_id=voice.voice_id,
    sample_id=sample.sample_id,
    selected_speaker_ids=[speaker.speaker_id]
)
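The snippet above relies on `sample` and `speaker` from the previous loop, so on its own it updates only whichever pair was assigned last. After listening to the saved files, you will likely want to record one choice per sample and apply them all. The sketch below assumes a plain dict of sample ID to speaker ID. `apply_speaker_choices` is a hypothetical helper, and the update call is passed in as a callable that takes the same keyword arguments as the snippet above.

```python
def apply_speaker_choices(voice_id, choices, update_sample):
    """Apply one selected speaker per sample.

    choices: dict mapping sample_id -> chosen speaker_id.
    update_sample: callable accepting voice_id, sample_id and
        selected_speaker_ids keyword arguments.
    Returns the list of sample IDs that were updated.
    """
    updated = []
    for sample_id, speaker_id in choices.items():
        update_sample(
            voice_id=voice_id,
            sample_id=sample_id,
            selected_speaker_ids=[speaker_id],
        )
        updated.append(sample_id)
    return updated
```

With the client from earlier, `update_sample` would be `elevenlabs.voices.pvc.samples.update`.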

Verify the PVC

Before training can begin, a verification step is required to ensure you have permission to use the voice. First, request the verification CAPTCHA.

captcha_response = elevenlabs.voices.pvc.verification.captcha.get(voice.voice_id)

# Save captcha image to file
captcha_buffer = base64.b64decode(captcha_response)
with open('captcha.png', 'wb') as f:
    f.write(captcha_buffer)
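If the saved image does not open, it may help to check the decoded bytes before writing them. The sketch below assumes the response is a base64 string, as the `b64decode` call above implies, and that the image is a PNG, which is inferred only from the `captcha.png` filename. `decode_captcha` is a hypothetical helper.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_captcha(captcha_b64):
    """Decode a base64 CAPTCHA payload and confirm it looks like a PNG image."""
    data = base64.b64decode(captcha_b64)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded CAPTCHA does not start with a PNG signature")
    return data
```

If the check fails, inspecting the type and first few bytes of `captcha_response` is a reasonable next step.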

The image contains several lines of text that the voice owner will need to read out loud and record. Once done, submit the recording to verify the identity of the voice’s owner.

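# Open the recording in a context manager so the file handle is closed afterwards
with open('path/to/recording.mp3', 'rb') as recording:
    elevenlabs.voices.pvc.verification.captcha.verify(
        voice_id=voice.voice_id,
        recording=recording
    )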

(Optional) Request manual verification

If you are unable to verify the CAPTCHA, you can request manual verification. Note that this will take longer to process.

This should only be used if the previous verification steps have failed or are not possible, for instance if the voice owner is visually impaired.

For a list of the files that are required for manual verification, please contact support as each case may be unique.

elevenlabs.voices.pvc.verification.request(
    voice_id=voice.voice_id,
    files=[open('path/to/verification/files.txt', 'rb')],
)

Train the PVC

Next, begin the training process. This will take some time to complete based on the length and number of samples provided.

elevenlabs.voices.pvc.train(
    voice_id=voice.voice_id,
    # Specify the model the PVC should be trained on
    model_id="eleven_multilingual_v2"
)

# Poll the fine tuning status until it is complete or fails
# This example specifically checks for the eleven_multilingual_v2 model
while True:
    voice_details = elevenlabs.voices.get(voice_id=voice.voice_id)
    fine_tuning_state = None
    if voice_details.fine_tuning and voice_details.fine_tuning.state:
        fine_tuning_state = voice_details.fine_tuning.state.get("eleven_multilingual_v2")

    if fine_tuning_state:
        progress = None
        if voice_details.fine_tuning.progress and voice_details.fine_tuning.progress.get("eleven_multilingual_v2"):
            progress = voice_details.fine_tuning.progress.get("eleven_multilingual_v2")
        print(f"Fine tuning progress: {progress}")

        if fine_tuning_state == "fine_tuned" or fine_tuning_state == "failed":
            print("Fine tuning completed or failed")
            break
    # Wait for 5 seconds before polling again
    time.sleep(5)
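The polling loop reads two per-model mappings, `state` and `progress`, and treats `fine_tuned` and `failed` as final. That lookup can be pulled into a small pure function, sketched below, which makes it easier to reuse for another model ID. `fine_tuning_status` is a hypothetical helper and assumes both mappings behave like dicts keyed by model ID, as the `.get(...)` calls above suggest.

```python
def fine_tuning_status(state, progress, model_id="eleven_multilingual_v2"):
    """Return (state, progress, done) for one model.

    state / progress: dict-like mappings keyed by model ID, or None.
    done is True once the state is "fine_tuned" or "failed".
    """
    model_state = (state or {}).get(model_id)
    model_progress = (progress or {}).get(model_id)
    done = model_state in ("fine_tuned", "failed")
    return model_state, model_progress, done
```

Inside the loop this would be called with `voice_details.fine_tuning.state` and `voice_details.fine_tuning.progress`, after checking that `voice_details.fine_tuning` is present, and the loop would break when `done` is true.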

Use the newly created voice

Once the PVC is verified and fine-tuning has completed, you can use it in the same way as any other voice. See the Text to Speech quickstart for more information on how to use a voice.

Next steps

Explore the API reference for more information on creating a Professional Voice Clone.