Pacing and Emotion

Effective techniques to guide ElevenLabs AI in pacing the speech and conveying emotions.

Pacing

Based on varying user feedback and test results, it’s been theorized that using a singular long sample for voice cloning has brought more success for some, compared to using multiple smaller samples. The current theory is that the AI stitches these samples together without any separation, causing pacing issues and faster speech. This is likely why some people have reported fast-talking clones.

To control the pacing of the speaker, you can write in a style similar to that of a book. While it’s not a perfect solution, it can help improve the pacing and ensure that the AI generates a voiceover at the right speed. With this technique, you can create high-quality voiceovers that are both customized and easy to listen to.

"I wish you were right, I truly do, but you're not," he said slowly.

Emotion

If you want the AI to express a specific emotion, the best approach is to write in a style similar to that of a book. To find good prompts to use, you can flip through some books and identify words and phrases that convey the desired emotion.

For instance, you can use dialogue tags to express emotions, such as he said, confused, or he shouted angrily. These types of prompts will help the AI understand the desired emotional tone and try to generate a voiceover that accurately reflects it. With this approach, you can create highly customized voiceovers that are perfect for a variety of applications.

"Are you sure about that?" he said, confused.
"Don’t test me!" he shouted angrily.

You will also have to somehow remove the prompt as the AI will read exactly what you give it. The AI can also sometimes infer the intended emotion from the text’s context, even without the use of tags.

"That is funny!"
"You think so?"

This is not always perfect since you are relying on the AI to understand if something is sarcastic, funny etc from the context of the text.