We're thrilled to announce the addition of six new voices to our pre-made collection, available for all users: Bill, Drew, George, Lily, Molly, and Paul. These voices are versatile and suitable for various styles such as News, Narration, and Documentaries. Please note that these voices might sound slightly different next week as we upgrade our model. Concurrently, we're deprecating the voices of Elli, Bella, and Matthew as part of our ongoing effort to enhance and update our voice offerings.
We're excited to unveil our latest innovation: Speech to Speech, powered by our advanced Eleven English v2 model. This tool allows you to merge the content and style of an audio clip you upload with a voice of your choice. The Eleven English v2 model is now accessible to all our platform users. Experience the future of voice synthesis today!
We are very excited to announce the release of AudioNative, our state-of-the-art audio player. AudioNative is designed to provide publishers, bloggers, and content creators with a seamless way to integrate audio into their websites. With AudioNative, you can effortlessly convert your articles into audio and embed them into your website with a single line of code. AudioNative is now available to everyone with a Creator subscription or higher and comes with an advanced metrics and analytics dashboard for each AudioNative-enabled project.
We are excited to introduce our latest offering: the Eleven v2 Turbo model. Engineered exclusively for English, this model marries the exceptional speech quality of our Eleven Multilingual v2 with an impressive latency of approximately 400ms.
We've now incorporated the option for 8kHz μ-law encoded audio output through our API.
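For readers unfamiliar with μ-law, the following is a minimal sketch of the companding idea behind the format, using the continuous μ-law formula with μ = 255. It is an illustration only: the function names are hypothetical, it is not the exact G.711 bit layout, and it says nothing about how our API produces the audio.

```python
import math

MU = 255  # companding constant used by 8-bit mu-law

def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to the companded range [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_quantize(x: float) -> int:
    """Quantize a linear sample to an 8-bit code in 0..255 (illustrative layout)."""
    y = mulaw_compress(x)
    return int(round((y + 1.0) / 2.0 * MU))

def mulaw_expand(code: int) -> float:
    """Approximately invert mulaw_quantize back to a linear sample."""
    y = 2.0 * code / MU - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

if __name__ == "__main__":
    for sample in (-1.0, -0.1, 0.0, 0.01, 0.5, 1.0):
        code = mulaw_quantize(sample)
        print(f"{sample:+.2f} -> code {code:3d} -> {mulaw_expand(code):+.4f}")
```

The logarithmic curve spends more of the 256 codes on quiet samples, which is why the format holds up for speech at telephony sample rates.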
We've enhanced our Projects feature to now support automatic ACX audio normalization.
We've introduced a feature that allows users to specify metadata for Projects. This metadata will be automatically embedded in the downloaded files.
We've enhanced the accuracy of our dubbing system's translations and addressed issues related to talking speed.
We're thrilled to announce the launch of "Dubbing," our latest state-of-the-art technology innovation. Designed with utmost precision and advanced algorithms, Dubbing seamlessly converts your content into a diverse range of languages with automatic voiceovers.
We've introduced a feature that allows for pauses in speech through our API and the Speech Synthesis page. To add a 1.5-second pause using SSML, you can for example input: "Martin walked down the street <break time="1.5s"/> he looked into the camera and laughed." Breaks currently have a maximum duration of three seconds and can be included directly in the text sent to our API. They will be billed at 10 characters each.
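As a rough sketch of how a client might build break tags and estimate their cost, the helpers below enforce the three-second maximum and count each break as 10 characters. The function names are hypothetical, and the billing estimate assumes that the 10-character charge replaces the literal length of the tag, which this entry does not spell out.

```python
import re

MAX_BREAK_SECONDS = 3.0   # maximum break duration stated in the entry
BREAK_BILLED_CHARS = 10   # characters billed per break stated in the entry

# Matches tags of the form <break time="1.5s"/>
BREAK_TAG = re.compile(r'<break time="(\d+(?:\.\d+)?)s"\s*/>')

def make_break(seconds: float) -> str:
    """Return a break tag, rejecting durations outside (0, 3] seconds."""
    if not 0 < seconds <= MAX_BREAK_SECONDS:
        raise ValueError(f"break must be in (0, {MAX_BREAK_SECONDS}] seconds")
    return f'<break time="{seconds:g}s"/>'

def estimate_billed_characters(text: str) -> int:
    """Estimate billed characters: plain text plus 10 per break (assumption)."""
    breaks = BREAK_TAG.findall(text)
    plain = BREAK_TAG.sub("", text)
    return len(plain) + BREAK_BILLED_CHARS * len(breaks)

if __name__ == "__main__":
    text = (
        "Martin walked down the street "
        + make_break(1.5)
        + " he looked into the camera and laughed."
    )
    print(text)
    print(estimate_billed_characters(text))
```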
We're excited to introduce Projects – our state-of-the-art long-form speech synthesis editor. With Projects, you can effortlessly transform articles or entire books into audio in mere minutes. This feature is now available to everyone with a Creator subscription or higher.
We added the ability for users to protect their accounts using Two-Factor Authentication.
We have introduced a new output audio format, PCM, for the normal, streaming, and websockets endpoints. This format comes with 4 sampling rate options: 16kHz, 22.05kHz, 24kHz and 44.1kHz. The default audio format remains mp3.
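Raw PCM has no header, so most players need it wrapped in a container first. The sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module. It assumes 16-bit mono samples, which this entry does not state, and the function name is hypothetical.

```python
import io
import wave

# Sampling rates listed in the entry, in Hz
SUPPORTED_RATES = (16000, 22050, 24000, 44100)

def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumes 16-bit (sample_width=2) mono audio by default; adjust these
    if the audio you receive uses a different layout.
    """
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()

if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * 16000
    wav_bytes = pcm_to_wav(one_second_of_silence, 16000)
    with open("silence.wav", "wb") as f:
        f.write(wav_bytes)
```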
We have reworked the Voice Library to make it easier to use and more accessible. This includes a new design, top-level use-case categories, search, a review of all existing voices, and a moderation queue for new ones.
We are excited to introduce Eleven Multilingual v2, our latest state-of-the-art speech synthesis model. With unparalleled speech synthesis capabilities, Eleven Multilingual v2 supports 29 languages. In the upcoming days, we will release an update that further enhances its speed.
It is now possible to share professionally cloned voices through the Voice Library.
A dashboard displaying in-depth character usage trends and additional metrics is now accessible to users subscribed to the Independent Publisher tier and higher.
We've enhanced our platform by incorporating over 30 additional voices, covering an extensive array of styles and applications, now readily accessible to all users at no cost. These voices are professionally crafted, boasting exceptional quality to elevate your user experience.
We're releasing Projects, our advanced speech synthesis editor, into alpha for select users.
We are thrilled to announce the release of our AI Speech Classifier. This tool empowers users to effortlessly discern whether an audio clip has been generated by ElevenLabs. Best of all, the AI Speech Classifier is accessible to everyone, absolutely free of charge.
We have introduced the Voice Library feature, which offers our subscribed users an exciting opportunity to share their voices and earn characters concurrently. Moving forward, we have plans to augment the Voice Library by incorporating the ability to share voices that have been professionally replicated with high fidelity. This will open up new avenues for users to exchange and experience a diverse range of high-quality voices.
The form for adding professional voices now includes a language selection option, which allows you to specify the language used in the uploaded samples. The text displayed in the voice verification prompt reflects the selected language.
You can now share your generated voices and earn free characters for their usage by the community!
The GET history items endpoint now supports pagination.
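A paginated endpoint is typically consumed with a loop like the one sketched below. The page shape (`items`, `next_cursor`) and the `fetch_page` callable are hypothetical placeholders, not the actual request or response fields of the history endpoint; an in-memory fake stands in for the HTTP call.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical page shape: {"items": [...], "next_cursor": str or None}
FetchPage = Callable[[Optional[str], int], dict]

def iter_history(fetch_page: FetchPage, page_size: int = 100) -> Iterator[dict]:
    """Yield every history item by following cursors until none is left."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break

def make_fake_fetch(items: List[dict]) -> FetchPage:
    """In-memory stand-in for the real HTTP request, for demonstration."""
    def fetch(cursor: Optional[str], page_size: int) -> dict:
        start = 0 if cursor is None else int(cursor)
        end = start + page_size
        next_cursor = str(end) if end < len(items) else None
        return {"items": items[start:end], "next_cursor": next_cursor}
    return fetch

if __name__ == "__main__":
    fake_items = [{"id": i} for i in range(7)]
    for item in iter_history(make_fake_fetch(fake_items), page_size=3):
        print(item)
```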
We added an endpoint to get a history item by its ID.
Both text-to-speech endpoints (streaming and non-streaming) now also return the history_item_id as a response header. Please note that there is a slight delay between calling the text-to-speech endpoint and the sample becoming accessible through the history.
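Because of that delay, a client may want to retry the history lookup a few times before giving up. The sketch below shows one way to do so. The exact header spelling, the `get_item` callable, and the retry timings are all assumptions for illustration.

```python
import time
from typing import Callable, Mapping, Optional

# Assumed header spelling; matching below ignores case and "_" vs "-"
HEADER_NAME = "history-item-id"

def extract_history_item_id(headers: Mapping[str, str]) -> Optional[str]:
    """Read the history item ID from response headers, if present."""
    for key, value in headers.items():
        if key.lower().replace("_", "-") == HEADER_NAME:
            return value
    return None

def wait_for_history_item(
    get_item: Callable[[str], Optional[dict]],
    item_id: str,
    attempts: int = 5,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call get_item until it returns an item or attempts run out.

    get_item is a hypothetical callable that returns None while the
    sample is not yet available in the history.
    """
    for attempt in range(attempts):
        item = get_item(item_id)
        if item is not None:
            return item
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.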
Our new multilingual model for Speech Synthesis now supports 7 additional languages: Spanish, French, Italian, German, Polish, Hindi and Portuguese. Multilingual TTS now also allows for generating speech in multiple languages using a single prompt.
Creator subscribers and above now enjoy higher quality 96 kbps audio outputs.
Professional Voice Cloning will be released later this year, allowing users on the Creator, Independent Publisher and Growing Business plans to create a near-perfect digital version of their own voice. While Instant Cloning lets you clone voices from very short samples, Professional Cloning requires more audio data for training but it produces much higher fidelity output - the speech produced by the model is almost indistinguishable from the original.
You can submit your recordings for training right now through VoiceLab. We will sequentially release the Pro Voices to individual accounts on a first-come, first-served basis, beginning in July. Read more in the FAQ.
We are introducing a rewards system for providing feedback. With each generation, the audio player now displays thumbs-up/down icons next to the download button. Users can now earn character rewards for providing feedback on model output.
We are introducing Voice Design, a random voice generator that can create an infinite number of voices based on your conditions.
Voice settings now appear in history items for API users.
We are lowering the maximum character limit to 2,500 characters per generation. This limit is temporary and applies across all tiers; we are working to raise it again for the Creator tier over the next few days.
With this release, speech synthesis gets even better! We have changed how we train the model, resulting in even better results on longer fragments. You can head to usualpanel to test it out straight away! Our core changes include:
You can expect minor changes in how your cloned or default voices sound. Enjoy!
To give you more control over your voices, we are adding Voice Settings as part of Speech Synthesis, where you can control how similar and varied the voice is!
We are adding 2 settings that will help you control your voices - they have the biggest effect on the voices you clone. One is Stability, which lets you decide on the variability of the voice: whether you want it to be more stable and similar between regenerations, or more varied and expressive. More variation might result in slight artifacts, while higher stability gives cleaner speech. The other is Similarity: the more similar it is, the more the output will depend on the voice samples you have uploaded, so if the samples aren't clean, it might result in speech artifacts. If you prefer the voice to be clearer but closer to generic voices, move this slider to the left!