Professional Voice Cloning

Learn how to clone a voice using the Clone Voice API.

This guide will show you how to create a Professional Voice Clone (PVC) using the PVC API. To create a PVC via the dashboard, refer to the Professional Voice Clone product guide.

Creating a PVC requires you to be on the Creator plan or above.

For an outline of the differences between Instant Voice Clones and Professional Voice Clones, refer to the Voices capability guide.

If you are unsure about what is permissible from a legal standpoint, please consult the Terms of Service and our AI Safety information.

Creating a PVC via the API involves considerably more steps than creating an Instant Voice Clone, because PVCs are more complex and require more data and fine-tuning to produce a high-quality clone.

Using the Professional Voice Clone API

1

Create an API key

Create an API key in the dashboard, which you’ll use to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file, or directly in your app’s configuration, depending on your preference.

.env
ELEVENLABS_API_KEY=<your_api_key_here>
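
A missing or blank key usually only shows up later as an authentication error, so it can help to check for it at startup. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of the ElevenLabs SDK.

```python
import os


def require_api_key(name: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper (not part of the SDK): raises a clear error when the
    variable is unset or blank instead of failing later with an auth error.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment."
        )
    return value
```

If you adopt something like this, call it after `load_dotenv()` and pass the result as `api_key` when constructing the client.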
2

Install the SDK

We’ll also use the dotenv library to load our API key from an environment variable.

pip install elevenlabs
pip install python-dotenv
3

Create a PVC voice

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code to create a PVC voice:

# example.py
import os
import time
import base64
from contextlib import ExitStack
from io import BytesIO
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

voice = elevenlabs.voices.pvc.create(
    name="My Professional Voice Clone",
    language="en",
    description="A professional voice clone of my voice"
)

print(voice)
4

Upload audio files

Next, we’ll upload the audio sample files that will be used to train the PVC. Review the Tips and suggestions section of the PVC product guide for more information on how to get the best results from your audio files.

# Define the list of file paths explicitly
# Replace with the paths to your audio and/or video files.
# The more files you add, the better the clone will be.
sample_file_paths = [
    "/path/to/your/first_sample.mp3",
    "/path/to/your/second_sample.wav",
    "relative/path/to/another_sample.mp4"
]

samples = None

files_to_upload = []
# Use ExitStack to manage multiple open files
with ExitStack() as stack:
    for filepath in sample_file_paths:
        # Open each file and add it to the stack
        audio_file = stack.enter_context(open(filepath, "rb"))

        # Read the file contents into an in-memory buffer for the SDK
        files_to_upload.append(
            BytesIO(audio_file.read())
        )

    samples = elevenlabs.voices.pvc.samples.create(
        voice_id=voice.voice_id,
        files=files_to_upload  # Pass the list of in-memory file objects
    )
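
Because the upload opens every path in the list, a single bad path fails the whole step. A quick local check before uploading can catch that early. The sketch below is illustrative only: `find_invalid_samples` is a hypothetical helper, and the extension allow-list is an assumption based on the file types used in this guide, not a statement of what the API supports.

```python
import os

# Assumption: an illustrative allow-list based on the extensions used in this
# guide; check the PVC product guide for the formats actually supported.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mp4"}


def find_invalid_samples(paths, allowed=ALLOWED_EXTENSIONS):
    """Return (path, reason) pairs for files that should not be uploaded."""
    problems = []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in allowed:
            problems.append((path, f"unexpected extension {ext!r}"))
        elif not os.path.isfile(path):
            problems.append((path, "file not found"))
        elif os.path.getsize(path) == 0:
            problems.append((path, "file is empty"))
    return problems
```

Calling `find_invalid_samples(sample_file_paths)` before the upload and stopping if the returned list is non-empty keeps the failure local and easy to read.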
5

Begin speaker separation

This step will attempt to separate the audio files into individual speakers. This is required if you are uploading audio with multiple speakers.

sample_ids_to_check = []
for sample in samples:
    if sample.sample_id:
        print(f"Starting separation for sample: {sample.sample_id}")
        elevenlabs.voices.pvc.samples.speakers.separate(
            voice_id=voice.voice_id,
            sample_id=sample.sample_id
        )
        sample_ids_to_check.append(sample.sample_id)

while sample_ids_to_check:
    # Create a copy of the list to iterate over, so we can remove items from the original
    ids_in_batch = list(sample_ids_to_check)
    for sample_id in ids_in_batch:
        status_response = elevenlabs.voices.pvc.samples.speakers.get(
            voice_id=voice.voice_id,
            sample_id=sample_id
        )
        status = status_response.status
        print(f"Sample {sample_id} status: {status}")
        if status in ("completed", "failed"):
            sample_ids_to_check.remove(sample_id)

    if sample_ids_to_check:
        # Wait before the next poll cycle
        time.sleep(5)  # Wait for 5 seconds

print("All samples have been processed or removed from polling.")
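
The loop above keeps polling until every sample reaches `completed` or `failed`, so it never exits if a sample stays in another state. One option is to add a deadline. The sketch below is a generic, hypothetical helper (`poll_until_done` is not part of the SDK); it takes any callable that returns a status string, so it makes no assumptions about the API beyond the two terminal status values used in this guide.

```python
import time


def poll_until_done(get_status, ids, interval=5.0, timeout=600.0,
                    terminal=("completed", "failed")):
    """Poll get_status(id) for each id until all are terminal or time runs out.

    Returns a dict of id -> last seen status. IDs still pending at the
    deadline keep their last non-terminal status.
    """
    deadline = time.monotonic() + timeout
    pending = list(ids)
    results = {item_id: None for item_id in pending}
    while pending:
        # Iterate over a copy so finished IDs can be removed safely
        for item_id in list(pending):
            results[item_id] = get_status(item_id)
            if results[item_id] in terminal:
                pending.remove(item_id)
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return results
```

Here `get_status` could be a small lambda that wraps the `speakers.get` call shown above and returns its `status` attribute.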
6

Retrieve speaker audio

Since the previous step will take some time to complete, the following step should be run in a separate process after the previous step has completed.

Once speaker separation is complete, you will have a list of speakers for each sample. In the case of samples with multiple speakers, you will have to pick the speaker you want to use for the PVC. To identify the speaker, you can retrieve the audio for each speaker and listen to them.

# Get the list of samples from the voice created in Step 3.
# Replace the placeholder with the voice_id printed in Step 3.
voice_id = "<your_voice_id>"
voice = elevenlabs.voices.get(voice_id=voice_id)

samples = voice.samples

# Loop over each sample and save the audio for each speaker to a file
speaker_audio_output_dir = "path/to/speakers/"
os.makedirs(speaker_audio_output_dir, exist_ok=True)

for sample in samples:
    speaker_info = elevenlabs.voices.pvc.samples.speakers.get(
        voice_id=voice.voice_id,
        sample_id=sample.sample_id
    )

    # Proceed only if separation is actually complete
    if getattr(speaker_info, 'status', 'unknown') != "completed":
        continue

    if hasattr(speaker_info, 'speakers') and speaker_info.speakers:
        speaker_list = speaker_info.speakers
        if isinstance(speaker_info.speakers, dict):
            speaker_list = speaker_info.speakers.values()

        for speaker in speaker_list:
            audio_response = elevenlabs.voices.pvc.samples.speakers.audio.get(
                voice_id=voice.voice_id,
                sample_id=sample.sample_id,
                speaker_id=speaker.speaker_id
            )

            # The audio is returned base64-encoded; decode it before saving
            audio_base64 = audio_response.audio_base_64
            audio_data = base64.b64decode(audio_base64)
            output_filename = os.path.join(speaker_audio_output_dir, f"sample_{sample.sample_id}_speaker_{speaker.speaker_id}.mp3")

            with open(output_filename, "wb") as f:
                f.write(audio_data)
7

Update samples with speaker IDs

Once speaker separation is complete, you can update the samples to select which speaker you want to use for the PVC.

elevenlabs.voices.pvc.samples.update(
    voice_id=voice.voice_id,
    sample_id=sample.sample_id,
    selected_speaker_ids=[speaker.speaker_id]
)
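
The call above relies on `sample` and `speaker` already being defined, which in practice means whatever values they held at the end of the previous step. With several samples it may be clearer to record each choice explicitly after listening to the exported files. The sketch below assumes a plain dictionary of sample ID to chosen speaker ID(s); `build_speaker_updates` is a hypothetical helper that only prepares keyword arguments matching the update call shown above and does not contact the API.

```python
def build_speaker_updates(voice_id, choices):
    """Turn {sample_id: speaker_id or [speaker_ids]} into update kwargs.

    Each returned dict mirrors the keyword arguments of the update call in
    this step; the helper itself is hypothetical and makes no API requests.
    """
    updates = []
    for sample_id, selected in choices.items():
        # Accept either a single speaker ID or a list of them
        speaker_ids = [selected] if isinstance(selected, str) else list(selected)
        if not speaker_ids:
            raise ValueError(f"No speaker selected for sample {sample_id}")
        updates.append({
            "voice_id": voice_id,
            "sample_id": sample_id,
            "selected_speaker_ids": speaker_ids,
        })
    return updates
```

Each returned dictionary can then be unpacked into the update call, for example `elevenlabs.voices.pvc.samples.update(**kwargs)` inside a loop.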
8

Verify the PVC

Before training can begin, a verification step is required to ensure you have permission to use the voice. First, request the verification CAPTCHA.

captcha_response = elevenlabs.voices.pvc.verification.captcha.get(voice.voice_id)

# Save captcha image to file
captcha_buffer = base64.b64decode(captcha_response)
with open('captcha.png', 'wb') as f:
    f.write(captcha_buffer)

The image contains several lines of text that the voice owner will need to read out loud and record. Once done, submit the recording to verify the identity of the voice’s owner.

# Open the recording in a with block so the file handle is always closed
with open('path/to/recording.mp3', 'rb') as recording:
    elevenlabs.voices.pvc.verification.captcha.verify(
        voice_id=voice.voice_id,
        recording=recording
    )
9

(Optional) Request manual verification

If you are unable to verify the CAPTCHA, you can request manual verification. Note that this will take longer to process.

This should only be used if the previous verification steps have failed or are not possible, for instance if the voice owner is visually impaired.

For a list of the files that are required for manual verification, please contact support as each case may be unique.

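# Open the verification file in a with block so the handle is always closed
with open('path/to/verification/files.txt', 'rb') as verification_file:
    elevenlabs.voices.pvc.verification.request(
        voice_id=voice.voice_id,
        files=[verification_file],
    )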
10

Train the PVC

Next, begin the training process. This will take some time to complete based on the length and number of samples provided.

elevenlabs.voices.pvc.train(
    voice_id=voice.voice_id,
    # Specify the model the PVC should be trained on
    model_id="eleven_multilingual_v2"
)

# Poll the fine tuning status until it is complete or fails
# This example specifically checks for the eleven_multilingual_v2 model
while True:
    voice_details = elevenlabs.voices.get(voice_id=voice.voice_id)
    fine_tuning_state = None
    if voice_details.fine_tuning and voice_details.fine_tuning.state:
        fine_tuning_state = voice_details.fine_tuning.state.get("eleven_multilingual_v2")

    if fine_tuning_state:
        progress = None
        if voice_details.fine_tuning.progress and voice_details.fine_tuning.progress.get("eleven_multilingual_v2"):
            progress = voice_details.fine_tuning.progress.get("eleven_multilingual_v2")
            print(f"Fine tuning progress: {progress}")

        if fine_tuning_state == "fine_tuned" or fine_tuning_state == "failed":
            print("Fine tuning completed or failed")
            break
    # Wait for 5 seconds before polling again
    time.sleep(5)
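
The loop above looks up the same model ID in two mappings that may each be missing. If you train against more than one model, the lookup can be pulled into a small function. This is a sketch under the assumption, taken from the loop above, that `fine_tuning.state` and `fine_tuning.progress` behave like dictionaries keyed by model ID; `fine_tuning_status` is a hypothetical helper name.

```python
def fine_tuning_status(state_map, progress_map, model_id):
    """Return (state, progress, done) for one model.

    Either mapping may be None or lack the model ID; in that case the
    corresponding value is None. `done` is True only for the two terminal
    states checked in this guide.
    """
    state = (state_map or {}).get(model_id)
    progress = (progress_map or {}).get(model_id)
    done = state in ("fine_tuned", "failed")
    return state, progress, done
```

In the polling loop, this could be called with `voice_details.fine_tuning.state` and `voice_details.fine_tuning.progress`, after checking that `voice_details.fine_tuning` itself is set.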
11

Use the newly created voice

Once the PVC is verified and training has completed, you can use it in the same way as any other voice. See the Text to Speech quickstart for more information on how to use a voice.

Next steps

Explore the API reference for more information on creating a Professional Voice Clone.