Our models are non-deterministic, meaning outputs can vary between generations even when the input is the same. While we strive to improve predictability, some variability is inherent. This guide outlines common issues and preventive measures.
If the generated voice output varies in volume or tone, it is often due to inconsistencies in the voice clone training audio.
To minimize these issues, consider breaking your text into smaller segments. This helps maintain consistent volume and reduces degradation over longer audio generations. Use our ElevenCreative Studio feature to generate several smaller audio segments simultaneously for better quality and consistency.
Refer to our guides for optimizing Instant and Professional Voice Clones for best practices and advice.
The multilingual models may occasionally mispronounce certain words, even in English. This issue appears somewhat arbitrary but seems to be voice- and text-dependent. It occurs more frequently with certain voices and texts, especially when using words that also appear in other languages.
The AI can sometimes switch languages or accents within a single generation, especially in longer generations. This issue is similar to the mispronunciation problem and is something we are actively working to improve.
The models may mispronounce certain numbers, symbols, and acronyms. For example, the numbers “1, 2, 3” might be pronounced as “one,” “two,” “three” in English even when the surrounding text is in another language. To ensure correct pronunciation in another language, write them out phonetically or in words as you want them to be spoken.
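Writing numbers, symbols, and acronyms out in words can be done as a preprocessing step before the text is sent for generation. The following is a minimal sketch, not an official utility: the `spell_out` function and the `SPOKEN_FORMS` table are hypothetical names, and the Spanish entries are illustrative only. You would supply the spoken forms appropriate to your own language and text.

```python
import re

# Hypothetical replacement table (an assumption for illustration).
# Spanish spoken forms are used here only as an example.
SPOKEN_FORMS = {
    "1": "uno",
    "2": "dos",
    "3": "tres",
    "%": " por ciento",
    "ONU": "Organización de las Naciones Unidas",
}


def spell_out(text: str, spoken_forms: dict[str, str]) -> str:
    """Replace each listed token with the words it should be spoken as."""
    if not spoken_forms:
        return text

    def token_pattern(token: str) -> str:
        escaped = re.escape(token)
        # Alphanumeric tokens must stand alone, so "1" does not match inside "10".
        if token.isalnum():
            return rf"(?<!\w){escaped}(?!\w)"
        # Symbols such as "%" are matched wherever they appear.
        return escaped

    # Try longer tokens first so they win over shorter ones they contain.
    tokens = sorted(spoken_forms, key=len, reverse=True)
    pattern = re.compile("|".join(token_pattern(t) for t in tokens))
    return pattern.sub(lambda m: spoken_forms[m.group(0)], text)


print(spell_out("1, 2, 3", SPOKEN_FORMS))
```

Only tokens listed in the table are rewritten; anything else, such as a multi-digit number, passes through unchanged and would need its own entry.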
Corrupt speech is a rare issue where the model generates muffled or distorted audio. This occurs unpredictably, and we have not identified a cause. If encountered, regenerate the section to resolve the issue.
Audio quality may degrade during extended text-to-speech conversions, especially with the Multilingual v1 model. To mitigate this, break text into sections under 800 characters.
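Splitting text into sections under 800 characters can be automated before generation. The sketch below is one possible approach, not an official tool: `split_text` is a hypothetical helper, and the choice to break on sentence-ending punctuation is an assumption about what produces natural-sounding segment boundaries.

```python
import re


def split_text(text: str, max_chars: int = 800) -> list[str]:
    """Split text into chunks shorter than max_chars, preferring sentence boundaries."""
    # Split after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence that is too long is cut at a fixed width.
        while len(sentence) >= max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[: max_chars - 1])
            sentence = sentence[max_chars - 1 :]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) < max_chars:
            # The sentence still fits in the current chunk.
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "This is a sentence. " * 100
for chunk in split_text(sample):
    print(len(chunk))
```

Each returned chunk can then be generated separately. Breaking at sentence boundaries rather than at a fixed character count avoids cutting a word or phrase in half, which would likely affect intonation at the seam.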
For some voices, this voice setting can lead to instability, including inconsistent speed, mispronunciation and the addition of extra sounds. We recommend keeping this setting at 0, especially if you find you are experiencing these issues in your generated audio.
The import function attempts to import the file or web page you provide. Because website structures and book formatting, including images, vary widely, always verify the imported content for accuracy.
Occasionally, glitches or sharp breaths may occur between paragraphs. This is rare and differs from standard Text to Speech issues. If encountered, regenerate the preceding paragraph, as the problem often originates there.
If an issue persists after following this troubleshooting guide, please contact our support team.