WaveForms AI announces mission to pass the speech Turing Test
New startup from OpenAI and Google veterans shares ambitious plans for audio AI, with products still in development
Today, WaveForms AI, a startup founded by OpenAI and Google veterans, announced its mission to develop audio AI systems whose speech is indistinguishable from a human's. CEO Alexis Conneau emphasized the company's goal of passing the “Speech Turing Test,” targeting a 50% preference score, the point at which listeners cannot tell human speech from AI-generated speech. The company is currently in the development phase, with plans to reveal specific products next year.
Note: WaveForms AI shared plans for audio AI, with products still in development. Meanwhile, ElevenLabs' audio AI is available today, delivering production-grade quality.
What is the Speech Turing Test?
The Speech Turing Test is a benchmark for AI audio systems, measuring whether humans can distinguish between AI-generated and human speech. A system passes this test when it achieves a 50% preference score, meaning listeners can’t tell if they’re hearing a person or an AI. ElevenLabs has already made significant strides in achieving this level of indistinguishability, with voices widely recognized for their human-like realism.
How WaveForms AI is tackling the Speech Turing Test
WaveForms AI, founded by OpenAI and Google veterans, aims to create audio AI systems capable of seamless, human-like communication. Led by Alexis Conneau, the startup focuses on developing models that not only replicate human speech but also capture emotional nuance, making interactions feel more natural and engaging. ElevenLabs’ Text-to-Speech models have set the standard for combining speed and expressiveness, already delivering nuanced and contextually aware speech at scale.
Create human-like voices with our Text to Speech (TTS) system, built for high-quality narration, gaming, video, and accessibility. Expressive voices, multilingual support, and API integration make it easy to scale from personal projects to enterprise workflows.
What is a preference score in AI speech systems?
The preference score gauges the indistinguishability of AI-generated speech from human speech. A 50% score signifies that listeners show no clear preference, effectively marking parity between the two. ElevenLabs has consistently achieved high preference scores, with industry-leading adoption by creators, media, and accessibility organizations.
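As a rough illustration of the arithmetic behind a preference score, the sketch below computes the score from paired-comparison trials and checks whether it differs meaningfully from 50%. The function names and the trial counts are hypothetical, and the exact binomial test is one reasonable way to judge parity, not a method attributed to WaveForms AI or ElevenLabs.

```python
from math import comb


def preference_score(ai_preferred: int, total_trials: int) -> float:
    """Fraction of paired comparisons in which the listener preferred the AI clip."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    if not 0 <= ai_preferred <= total_trials:
        raise ValueError("ai_preferred must be between 0 and total_trials")
    return ai_preferred / total_trials


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value against the null of a 50% preference.

    With p = 0.5 the distribution is symmetric, so we sum the probability of
    every outcome at least as far from trials / 2 as the observed count.
    """
    observed_gap = abs(successes - trials / 2)
    tail = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= observed_gap
    )
    return tail / 2 ** trials


# Hypothetical listening study: 200 paired comparisons, AI clip preferred 96 times.
score = preference_score(96, 200)
p_value = two_sided_binomial_p(96, 200)
print(f"preference score: {score:.1%}")
print(f"p-value vs. 50%: {p_value:.3f}")
```

A large p-value means the data are consistent with listeners having no preference. It does not prove parity, so a real study would also need enough trials to detect a small preference if one existed.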
Why does emotional nuance matter in AI audio?
Current AI voice systems often lose emotional subtleties, limiting their ability to convey empathy or engage meaningfully. WaveForms AI claims to address this with its Audio LLMs, which process audio natively to capture context and emotion, enabling richer communication. ElevenLabs has already demonstrated the importance of emotional nuance, offering tools that allow users to fine-tune tone, expressiveness, and pacing to suit any context.
How is WaveForms AI different from existing AI audio systems?
Unlike traditional Text-to-Speech systems, WaveForms AI's end-to-end Audio LLMs aim to capture the depth and complexity of human interaction. The company's focus on Emotional General Intelligence (EGI) introduces a social-emotional layer to AI, prioritizing connection and empathy over basic functionality. ElevenLabs has pioneered breakthroughs in emotional depth and flexibility, with tools designed to handle complex, real-world scenarios while being accessible and available today.
What challenges come with achieving the Speech Turing Test?
Developing indistinguishable AI speech systems poses both technical and ethical challenges. Conneau highlights risks like users forming attachments to AI characters and the broader societal implications of AI’s increasing realism. Addressing these issues responsibly is a key focus for WaveForms AI. ElevenLabs has built safeguards, such as “no-go” voice policies and rigorous content moderation, to responsibly navigate these challenges while delivering cutting-edge technology.
Applications of AI systems designed to pass the Speech Turing Test
WaveForms AI envisions its technology being used across a broad spectrum of applications, including education, customer support, and entertainment. The ability to create human-like voice interactions opens possibilities for more immersive, empathetic experiences in these areas. ElevenLabs is already powering applications across these fields, from accessible education tools to multilingual media localization, showcasing what’s possible with today’s technology.
The future of AI audio systems
While WaveForms AI's products remain in development, its ambition to redefine AI audio interactions has attracted significant attention, including $40 million in seed funding led by Andreessen Horowitz. As the company works toward solving the Speech Turing Test, its potential to reshape how we interact with technology is immense. ElevenLabs continues to lead in shaping the future of audio AI, delivering solutions that are transforming industries and meeting the needs of users right now.
How WaveForms AI Audio compares to ElevenLabs
WaveForms AI aims to support a wide range of audio generation use cases and could become a capable general-purpose audio AI toolkit. For now, it remains an announcement rather than a product. ElevenLabs, on the other hand, is available today, offering production-grade quality and customization.
Let's briefly assess how WaveForms AI compares against ElevenLabs in key areas like Text-to-Speech and sound effect generation.
Text-to-Speech
ElevenLabs stands as the clear industry leader in Text-to-Speech technology, offering:
Support for 70+ languages with authentic accents and cultural nuances
Advanced emotional intelligence that responds to textual context
Control over voice characteristics
High-quality, human-like speech that maintains consistency across long-form content
Sound effects
ElevenLabs also provides a streamlined and precise approach to sound effect generation, offering:
Instant generation of four different samples for each prompt
Precise control through detailed text descriptions
High-quality output suitable for commercial projects
A comprehensive library of common sound effects
The ability to create distinctive effects directly from text descriptions
ElevenLabs delivers specialized excellence in both voice and sound effect generation. As one of the best AI sound effect generators, it produces reliable, production-ready output that better serves professional content creators' needs.
How to use ElevenLabs for Text-to-Speech
Transform your content into professional-quality voiceovers with these simple steps:
Choose your voice: Select from a diverse library of natural-sounding voices
Input your text: Paste or type your script into the interface
Customize settings: Adjust the speed, tone, and emphasis to match your needs
Preview and generate: Listen to a sample and generate your final audio output
Download: Download your high-quality voiceover
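For readers who prefer the API to the web interface, the steps above can be sketched roughly as follows. This is a minimal sketch, not official sample code: the endpoint path, the `xi-api-key` header, the `eleven_multilingual_v2` model ID, and the `stability` and `similarity_boost` settings reflect the public REST API as commonly documented and should be checked against the current API reference. The helper names and the environment variable names are hypothetical.

```python
import json
import os
import urllib.request

# Assumed base URL; confirm against the current API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
) -> urllib.request.Request:
    """Build (but do not send) a Text-to-Speech request.

    voice_id corresponds to "Choose your voice", text to "Input your text",
    and voice_settings to "Customize settings".
    """
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and save the returned audio bytes ("Generate" and "Download")."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio_file:
        audio_file.write(response.read())


if __name__ == "__main__":
    # Hypothetical environment variables; the network call runs only if both are set.
    api_key = os.environ.get("ELEVENLABS_API_KEY")
    voice_id = os.environ.get("ELEVENLABS_VOICE_ID")
    if api_key and voice_id:
        request = build_tts_request("Hello from the API.", voice_id, api_key)
        synthesize_to_file(request, "voiceover.mp3")
```

Separating request construction from sending keeps the credentials and settings easy to inspect before any audio is generated.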
Final thoughts
The emergence of AI audio tools like WaveForms AI and ElevenLabs marks an exciting evolution in content creation. However, while WaveForms AI has announced impressive ambitions in experimental sound generation and audio manipulation, it's not yet available to use.
ElevenLabs, on the other hand, is available and production-grade. It's also the leading solution currently on the market for AI Text-to-Speech voice and sound effects generation.
Ready to test out ElevenLabs' AI technology? Sign up today to get started.
FAQs
WaveForms AI's products are not yet available. The company is still in the development phase, focusing on creating advanced audio models capable of real-time, emotionally resonant interactions.
The Speech Turing Test measures whether users can distinguish between human speech and AI-generated speech. WaveForms AI aims to pass this test by creating a system that matches human intonation, emotion, and nuance.
The company’s mission is to solve the Speech Turing Test and develop Emotional General Intelligence (EGI), enabling AI to understand and respond to human emotions in natural and meaningful ways.
Unlike traditional Text-to-Speech systems, WaveForms AI is building end-to-end audio models that process sound natively. This approach aims to capture the full emotional and contextual depth of human conversations. ElevenLabs' contextual audio AI has been live since January 2023.
WaveForms AI was co-founded by Alexis Conneau, a former OpenAI engineer, and Coralie Lemaitre, who previously worked in product strategy at Google. Together, they bring expertise in AI and product development to the company.