Summary:
- Introduction to Generative AI and Its Branches
- General Generative AI Terms
- Audio-specific Generative AI Terms
- Video-specific Generative AI Terms
- Other Specific Applications
- Frequently Asked Questions (FAQ)
Introduction to Generative AI
Lately it seems everybody is talking about generative AI. Large language and text-to-image models like ChatGPT, Stable Diffusion, and Midjourney have caused a stir in the tech world and beyond. Many count them among the most significant recent developments in AI. Whether or not you agree, the general sentiment seems to be that something very powerful has appeared.
Broadly speaking, generative AI refers to a class of machine learning models that can create new content, whether that be text, images, music, or voices. This 'generative' process involves the model learning patterns from existing data and then using those patterns to generate new content. The type of content a model can produce depends on the content it has been trained on.
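As a toy illustration of this learn-then-generate loop, here is a minimal sketch in Python. It uses a simple word-pair (Markov chain) model, which is far simpler than the neural networks behind tools like ChatGPT. The function names `train` and `generate` and the sample sentence are made up for this example.

```python
import random
from collections import defaultdict


def train(text):
    """Learn, for each word in the training text, which words follow it."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return dict(model)


def generate(model, start, length, seed=0):
    """Generate up to `length` words by repeatedly sampling a learned follower."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # the word never had a successor in the training text
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical training data: the model can only ever reuse these words.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train(corpus)
sample = generate(model, "the", 8)
print(sample)
```

The output is a new sequence, yet every word and every word pair in it comes from the training sentence, which is why what a model can produce depends on what it was trained on.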
The groundwork for this explosion of AI capabilities was laid when “deep learning” became popular and the combination of vast datasets and powerful computers running neural networks dramatically improved computers’ abilities to recognise images, process audio and play games. The improvement was so great that by the late 2010s computers could perform some of these tasks better than any human.
At ElevenLabs, we primarily focus on the audio aspect, but generative AI has made significant advancements in various fields:
- Text: Examples include ChatGPT and Bard.
- Image: Noteworthy technologies are Stable Diffusion, Midjourney, and DALL-E.
- Voice: ElevenLabs