- Voice AI platform ElevenLabs takes a major step toward eliminating linguistic barriers to content with the release of a new foundational deep learning model, Eleven Multilingual v2, which supports 28 languages
- The advancement will allow media companies, game developers, publishers and independent creators across the world to dramatically improve the accessibility of their content
- These new capabilities, which follow a slew of new feature releases and improvements since the platform launched in January, also mark the official end of the company’s Beta phase
- ElevenLabs’ mission is to make all content universally accessible in any language and in any voice
ElevenLabs, the world leader in voice AI software, has today launched a new multilingual voice generation model capable of accurately producing ‘emotionally rich’ AI audio in nearly 30 languages.
The advancement, based entirely on in-house research, will allow creators to produce localized audio content for international markets across Europe, Asia and the Middle East. ElevenLabs has spent the last 18 months analyzing the markers of human speech, building new mechanisms for understanding context and conveying emotions in speech generation, as well as synthesizing new, unique voices.
With Eleven Multilingual v2, when text is entered into the ElevenLabs text-to-speech platform, the new model can automatically identify nearly 30 written languages and generate speech in them with an unprecedented level of authenticity.
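In practice, selecting the multilingual model comes down to passing a model identifier alongside the input text; no language parameter is required, since the model detects the language itself. A minimal sketch of how such a request might be assembled, assuming the shape of the public ElevenLabs REST API (the endpoint path, `model_id` value, and header names are assumptions, not confirmed by this release):

```python
# Hypothetical sketch of preparing a call to the Eleven Multilingual v2 model
# via the ElevenLabs text-to-speech REST API. Endpoint path, model id, and
# field names are assumptions and may differ from the live API.

def build_tts_request(text, voice_id, api_key):
    """Build the URL, headers, and JSON payload for a text-to-speech call."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {
        # Input text in any supported language; the model detects it.
        "text": text,
        # Selects the multilingual model rather than the default one.
        "model_id": "eleven_multilingual_v2",
    }
    return url, headers, payload

# The same cloned voice can then speak, e.g., Japanese text:
url, headers, payload = build_tts_request("こんにちは、世界", "my-voice-id", "API_KEY")
```

The returned values would then be sent as an HTTP POST (for example with the `requests` library), with the audio bytes coming back in the response body.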
At the same time, regardless of whether a synthetic voice or cloned voice is being used, the speaker’s unique voice characteristics are maintained across all languages, including their original accent. This means the same voice can be used to bring content to life across 28 separate languages.
This roll-out follows the public release of Professional Voice Cloning to all creators on the platform. This product update, which was made available alongside additional safety and security features, allows users to create a perfect digital copy of their own voice, one that’s virtually indistinguishable from the original. Today’s release means that cloned voices can now speak across the almost 30 languages offered by the multilingual model.
Supported languages now include: Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Slovak, Croatian, Classical Arabic and Tamil.
Following recent feature launches and ongoing improvements to the platform, ElevenLabs has also confirmed today that the platform is officially coming out of Beta. This transition marks a pivotal moment in the company’s dedication to providing reliable and cutting-edge tools for its 1 million+ global users.
Looking ahead, ElevenLabs plans to introduce a mechanism that allows users to share voices on the platform and benefit from the development of new audio, fostering opportunities for human-AI collaboration.
Mati Staniszewski, CEO and co-founder of ElevenLabs, comments:
“ElevenLabs was started with the dream of making all content universally accessible in any language and in any voice. With the release of Eleven Multilingual v2, we are one step closer to making this dream a reality and to making human-quality AI voices available in every dialect.
“Our text-to-speech generation tools help level the playing field, bringing top-quality spoken audio capabilities to all the creators out there. Those benefits now extend to multilingual applications across almost 30 languages. Eventually we hope to cover even more languages and voices with the help of AI, and eliminate the linguistic barriers to content. At ElevenLabs, we believe these leaps in accessibility will ultimately foster greater creativity, innovation, and diversity.”
By lowering the cost and resources needed to create high-quality audio content in multiple languages, ElevenLabs is enabling companies and creators to produce more imaginative and accessible content which resonates across cultures and languages.
For independent game developers and publishers, the multilingual speech generation tool provides new opportunities to translate game experiences and audio content for international audiences, connecting with players and listeners in their own languages without compromising on quality or accuracy of the spoken audio.
Similarly, educational institutions now have the means to provide learners with accurate audio content in target languages instantly, bolstering language comprehension and pronunciation skills, as well as catering to different teaching styles and learning needs for international students.
Creators of all types can use ElevenLabs’ tool to improve content accessibility for people with visual impairments or additional learning needs by supplementing visual content with speech available in multiple languages.
ElevenLabs’ initial suite of AI voice tools, unveiled in January 2023, included the ability to turn any text into speech via a selection of pre-designed, synthetic voices, as well as the ability to create a clone of your own voice. The multilingual speech synthesis tool is another step forward in ElevenLabs’ mission to make all content universally accessible in any language and in any voice.
The technology has already been embraced across multiple creative verticals and sectors: enabling indie authors to create audiobooks, voicing secondary characters in video games, helping visually impaired people access online written content, and powering the world’s first AI radio channel. ElevenLabs has also partnered with a range of leading content creators and studios, including AI video generation company D-ID; Storytel, one of the world’s largest audiobook publishers; ScienceCast, an open-access science video platform whose video generation tool condenses research papers published on arXiv; the global content creator platform TheSoul Publishing; game developers Embark Studios and Paradox Interactive; and the media platform MNTN.