![](/_next/image?url=https%3A%2F%2Feleven-public-cdn.elevenlabs.io%2Fpayloadcms%2Felevenlabs-voice-cloning-card.jpg&w=3840&q=95)
Automate video voiceovers, ad reads, podcasts, and more, in your own voice
OpenAI has been expanding its portfolio with new products, and one of the most talked about is its Voice Assistant technology. It could change how we interact with machines by voice, yet much about its broader deployment remains under wraps.
Reportedly, OpenAI is developing a technology that integrates audio, text, and image recognition capabilities into a single product. This technology could, for example, assist children with their math homework or provide users with practical information about their immediate environment, such as language translation or vehicle repair guidance.
The rumored Voice Assistant is reportedly designed to interact with users naturally through speech. It builds on advances in Automatic Speech Recognition (ASR), Large Language Models (LLMs), and Text to Speech (TTS) systems. Together, these technologies allow the Voice Assistant to understand spoken input, process the information in context, and respond in a natural, human-like voice.
OpenAI is expected to demo a real-time voice assistant tomorrow. What does it take to deliver an immersive, or even magical experience?
Almost all voice AI go through 3 stages:
1. Speech recognition or "ASR": audio -> text1, think Whisper;
2. LLM that plans what to say next:… pic.twitter.com/q41KlGKM42
— Jim Fan (@DrJimFan) May 12, 2024
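Almost all voice AI systems follow three steps:
1. Speech recognition (ASR) converts the user's audio into text.
2. An LLM takes that text and plans what to say next.
3. Text to speech (TTS) converts the reply into audio.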
Running these three stages strictly one after another can lead to significant delays, because the user hears nothing until the last stage finishes. If users have to wait five seconds for each response, the interaction becomes cumbersome and unnatural, diminishing the user experience even if the audio sounds realistic.
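To make the delay concrete, here is a minimal sketch of the arithmetic for a strictly sequential pipeline. The per-stage latencies are hypothetical numbers chosen only to add up to the five-second example; they are not measurements of any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    latency_s: float  # assumed, illustrative latency for this stage


def sequential_latency(stages: List[Stage]) -> float:
    """Time until the user hears anything when each stage waits
    for the previous one to finish completely."""
    return sum(stage.latency_s for stage in stages)


# Hypothetical numbers, picked only to illustrate the sum.
PIPELINE = [
    Stage("asr", 1.0),  # audio -> text
    Stage("llm", 3.0),  # text -> reply text
    Stage("tts", 1.0),  # reply text -> audio
]

print(f"time to first audio: {sequential_latency(PIPELINE):.1f} s")
# prints: time to first audio: 5.0 s
```

In this model, speeding up any single stage helps only by that stage's share of the total, which is why the sequential design itself becomes the bottleneck.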
Effective natural dialogue doesn't operate sequentially.
Enhancing real-time dialogue isn't just about speeding up each neural network process; it requires a fundamental redesign of the entire system. We need to maximize the overlap of these components and learn to make real-time adjustments effectively.
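One way to picture overlapping components is a streaming pipeline, where speech synthesis starts on the first sentence while the language model is still producing the rest. The sketch below uses plain Python generators with stand-in functions; `llm_tokens`, `sentences`, `tts_chunks`, and `respond` are hypothetical names, the "audio" is just a tagged string, and a real system would use actual streaming models with concurrency and interruption handling.

```python
from typing import Iterable, Iterator, List


def llm_tokens(reply: str, log: List[str]) -> Iterator[str]:
    """Stand-in for a streaming LLM: yields the reply one word at a time
    and records in `log` how much has been generated so far."""
    for word in reply.split():
        log.append(word)
        yield word


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed words into sentences so TTS can start early."""
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if token.endswith((".", "?", "!")):
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush a trailing fragment without end punctuation
        yield " ".join(buffer)


def tts_chunks(sentence_stream: Iterable[str]) -> Iterator[str]:
    """Stand-in for streaming TTS: one 'audio chunk' per sentence."""
    for sentence in sentence_stream:
        yield f"<audio:{sentence}>"


def respond(reply: str, log: List[str]) -> Iterator[str]:
    """Chain the stages lazily so each consumes the previous one's stream."""
    return tts_chunks(sentences(llm_tokens(reply, log)))


REPLY = "Sure. The capital of France is Paris."
log: List[str] = []
stream = respond(REPLY, log)
first_chunk = next(stream)
print(first_chunk, "after", len(log), "of", len(REPLY.split()), "words")
# prints: <audio:Sure.> after 1 of 7 words
```

The first audio chunk is available after a single word has been generated, rather than after the whole reply. This only shows the pipelining idea; making real-time adjustments such as handling interruptions needs more than a chain of generators.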
OpenAI seems to be working on having phone calls inside of chatGPT. This is probably going to be a small part of the event announced on Monday.
(1/n) pic.twitter.com/KT8Hb54DwA
— Ananay (@ananayarora) May 11, 2024
Apparently, the Apple - OpenAI deal just closed! One day before the voice assistant announcement :)
Guess Apple decided that it couldn't make it on its own 🤷
The new Siri will be from OpenAI pic.twitter.com/Yfr6oCJiwQ
— Bindu Reddy (@bindureddy) May 13, 2024
The potential applications of this technology are vast, ranging from personal and business uses to helping community health workers provide better services by interacting in local languages or aiding individuals with speech impairments.
Rumors suggest that this technology could be integrated into platforms such as Apple's iOS, offering a more seamless and interactive user experience than Siri does today. However, neither such collaborations nor the full capabilities of the Voice Assistant have been officially confirmed.
One thing that is certain to feature in any advanced voice assistant is cutting-edge voice AI. ElevenLabs models combine proprietary methods for context awareness and high compression to deliver ultra-realistic, lifelike speech across a range of emotions and languages. Our contextual text to speech model is built to understand word relationships and adjust delivery based on context. It also has no hardcoded features, meaning it can dynamically predict thousands of voice characteristics while generating speech. Our models are optimised for particular applications, such as long-form and multilingual speech generation or latency-sensitive tasks.
Sign up to access a professional AI audio toolkit and start creating content or building applications now!