Generative AI: Terms and Definitions

Aug 17, 2023 • 8 minutes reading time

Everything to do with the magic mix of vast datasets and powerful computers.

Summary:

Introduction to Generative AI
General AI Terms
Generative Audio AI Terms
Generative Text AI Terms
Other AI Terms
Conclusion
FAQs

Introduction to Generative AI

Lately it seems everybody is talking about generative AI. Large language models and text-to-image models such as ChatGPT, Stable Diffusion, and Midjourney have caused a stir in the tech world and beyond. Many count them among the most significant recent developments in AI. Whether or not you agree, the general sentiment is that something very powerful has arrived.

Broadly speaking, generative AI refers to a class of machine learning models that are capable of creating new content, whether that be text, images, music, or voices. This 'generative' process involves the model learning from existing data and then using its understanding to generate new content. The type of content these models can produce depends on the content they've been trained on.

The groundwork for this explosion of AI capabilities was laid when “deep learning” became popular. The magic mix of vast datasets and powerful computers running neural networks dramatically improved computers’ abilities to recognize images, process audio, and play games, so much so that by the late 2010s computers could perform many of these tasks better than any human.

At ElevenLabs, we primarily focus on the audio aspect, but generative AI has made significant advancements in various fields:

Text: Examples include ChatGPT and Bard.
Image: Noteworthy technologies are Stable Diffusion, Midjourney, DALL-E.
Voice: ElevenLabs
Music: MusicLM is making waves, and soon, ElevenLabs will be joining the scene.
Video: Runway's Gen-1 is a notable mention.
Code: Codex is a leader in generative code AI.
Chemistry: AlphaFold, which predicts protein structures, is changing how molecular structures are studied.

General AI Terms

Artificial intelligence (AI): The simulation of human intelligence in machines, enabling them to perform tasks that typically require human intelligence, such as visual perception and decision-making.
AI as a service (AIaaS): Offering AI services via cloud computing, allowing individuals and businesses to use AI tools without the associated infrastructure costs.
AI bias: Unwanted and often harmful biases in AI outputs due to biases in data, training, or algorithms.
AI governance: The framework for ensuring AI systems operate within defined ethical and technical bounds.
Data privacy: Ensuring that personal data shared with AI systems remains confidential and is not misused.
Deep learning: A subset of ML that uses neural networks with multiple layers to learn increasingly abstract features from data.
Enterprise AI: AI tools and applications specifically designed and implemented for business operations.
Explainability/interpretability: The extent to which a machine's actions and decisions can be understood by humans.
Fine-tuning: The process of refining a pre-trained model on a smaller, specific dataset.
Model: In machine learning, a model is the output of a machine learning algorithm run on data.
Machine learning (ML): An AI method where algorithms allow computers to learn from and act on data.
Neural networks: Systems of interconnected nodes arranged in layers, loosely inspired by the brain, that learn to recognize patterns in datasets.
Supervised learning: A type of machine learning where the model is trained on labeled data.
Training: The process by which a machine learning algorithm learns from data.
Unsupervised learning: ML where the model looks for patterns in a dataset with no labels.
Robustness: The ability of an AI system to continue functioning accurately under adversarial or changing conditions.
Token: A sequence of characters, such as a word, part of a word, or a punctuation mark, that text-processing software treats as a single unit.
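
To make the terms "training", "model", and "supervised learning" more concrete, here is a minimal sketch in Python. It assumes a deliberately simple algorithm (a nearest-centroid classifier) and a hypothetical toy dataset of pitch and duration values; the function names `train` and `predict` are illustrative and not part of any ElevenLabs product.

```python
from collections import defaultdict


def train(examples):
    """Training: learn one centroid (mean feature vector) per label.

    `examples` is a list of (features, label) pairs, i.e. labeled data.
    The returned dict is the "model": the output of the algorithm run on data.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical toy data: [pitch in Hz, duration in seconds] -> label
labeled_data = [
    ([110.0, 0.9], "low"),
    ([120.0, 1.1], "low"),
    ([220.0, 1.0], "high"),
    ([240.0, 0.8], "high"),
]
model = train(labeled_data)
print(predict(model, [130.0, 1.0]))  # prints "low"
```

Because every example carries a label, this is supervised learning. An unsupervised method would receive only the feature lists and would have to discover the two groups on its own.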

Generative Audio AI Terms

Speech Synthesis: The artificial production of human speech. Typically achieved through computer algorithms, speech synthesis is used in a variety of applications, from voice assistants to screen readers. The term is often used as a synonym for Text to Speech, Voice Generation, Text Reader, and similar terms.
Instant Voice Cloning: An advanced feature provided by ElevenLabs, it allows for the rapid replication of a voice based on a small sample. This voice clone can then be used to generate new speech using Speech Synthesis technology.
Professional Voice Cloning (PVC): Developed by ElevenLabs, PVC goes beyond instant voice cloning by creating a detailed, high-fidelity digital replica of a person's voice. It involves a process called fine-tuning, which often requires a more comprehensive set of voice samples and additional training to achieve the highest fidelity.

Voice Design: A voice creation feature developed by ElevenLabs. Voice Design generates new synthetic voices based on user-chosen parameters, such as age, gender, and accent. These voices are produced using complex algorithms that sample voice characteristics at random and do not replicate any person's real voice. Voices created this way remain consistent in speech characteristics across the languages supported by the Eleven Multilingual v1 and v2 speech synthesis models.
VoiceLab: A proprietary platform by ElevenLabs that facilitates the creation and manipulation of voice models, especially in the realm of voice cloning and Voice Design.
Voice Library: An initiative by ElevenLabs, the Voice Library is a platform that allows users to share, discover, and collaborate using a vast collection of voices. Users can earn rewards when their shared voices are used by others.

Eleven Multilingual v1: The initial version of ElevenLabs' multilingual model, which lets users generate speech in 8 languages with a single voice model: English, Polish, German, Spanish, French, Italian, Hindi, and Portuguese.
Eleven Multilingual v2: The advanced version of ElevenLabs' multilingual offering. It expands on the features of the v1 model and adds support for Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, and Tamil.

Speaker embedding: A mechanism used for encoding the characteristics of a specific voice. Speaker embeddings act as the identity carrier for a voice model. They provide a vector representation of a speaker's unique voice characteristics, ensuring that generated speech maintains the voice's unique attributes.
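
As a rough illustration of what "a vector representation of a speaker's unique voice characteristics" means in practice, the sketch below compares embeddings with cosine similarity, one common way to measure how close two vectors are. The four-dimensional vectors and speaker names are hypothetical, and this is not a description of how ElevenLabs computes or compares embeddings; real speaker embeddings typically have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for illustration only
alice_clip_1 = [0.9, 0.1, 0.3, 0.2]
alice_clip_2 = [0.8, 0.2, 0.4, 0.1]
bob_clip = [0.1, 0.9, 0.2, 0.7]

same_speaker = cosine_similarity(alice_clip_1, alice_clip_2)
different_speaker = cosine_similarity(alice_clip_1, bob_clip)
print(same_speaker > different_speaker)  # prints True
```

Two clips from the same voice should map to nearby vectors, while clips from different voices should land further apart. That property is what lets an embedding act as the identity carrier for a voice.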

Generative Text AI Terms

Chatbot: A computer program designed to simulate human conversation.
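Generative pre-trained transformer (GPT): A type of large language model built on the transformer architecture and pre-trained on large amounts of text so that it can generate language.
Hallucination: When a model generates content that sounds plausible but is false or not supported by its training data or input.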
Language models (LMs): Models that can predict the next word in a sentence.
Large language models (LLMs): LMs with a very large number of parameters, trained on vast amounts of text, capable of understanding and generating human-like text.
Natural language processing (NLP): The branch of AI that helps machines understand and respond to human language.
Sentiment analysis: The use of natural language processing to determine the sentiment or mood conveyed in a piece of text.
Transformer models: A type of deep learning model primarily used in NLP tasks.
Self-attention: A mechanism in transformer models that lets each part of the input weigh how relevant every other part is, so the model can focus on different parts of the input data.
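
The self-attention entry above can be sketched in a few lines of Python. This is a simplified version of scaled dot-product attention under one loudly stated assumption: each input vector acts as its own query, key, and value, whereas real transformer models first apply learned weight matrices. The token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Simplification: each vector is used directly as query, key, and value.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    scale = math.sqrt(len(vectors[0]))
    weights = []
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all value vectors
        outputs.append([
            sum(w * value[i] for w, value in zip(row, vectors))
            for i in range(len(query))
        ])
    return outputs, weights


# Hypothetical 2-dimensional token vectors
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row of `weights` sums to 1. In this toy input the first token puts more weight on the second token, which points in a similar direction, than on the third, which is what "focusing on different parts of the input" amounts to numerically.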

Other AI Terms

Automated machine learning (AutoML): The automation of the end-to-end machine learning process, from preparing data to selecting and tuning models.
Data augmentation: Techniques that increase the amount of training data using information only in the original training set.
Edge AI: AI algorithms that are processed locally on a hardware device.
Reinforcement learning: A type of machine learning where agents learn by interacting with their environment.
Transformer: A model architecture, particularly in NLP, known for its self-attention mechanism.
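
Data augmentation, defined above, is easy to show on a tiny audio-like example. The sketch below derives three new training variants from one original waveform by changing its volume, shifting it in time, and adding a little random noise. The waveform values, the parameter ranges, and the function name `augment` are assumptions chosen for illustration, not a recommended recipe.

```python
import random


def augment(samples, rng):
    """Return three variants derived only from the original `samples`."""
    # 1. Volume change: scale every sample by a random gain
    gain = rng.uniform(0.8, 1.2)
    rescaled = [s * gain for s in samples]

    # 2. Time shift: rotate the waveform (circular shift)
    shift = rng.randrange(1, len(samples))
    time_shifted = samples[shift:] + samples[:shift]

    # 3. Additive noise: small Gaussian perturbation per sample
    noisy = [s + rng.gauss(0.0, 0.01) for s in samples]

    return [rescaled, time_shifted, noisy]


# Hypothetical waveform: a handful of amplitude values in [-1, 1]
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
rng = random.Random(0)  # fixed seed so the run is repeatable
variants = augment(original, rng)
print(len(variants), all(len(v) == len(original) for v in variants))  # prints "3 True"
```

No new recordings are collected here: every variant is computed from the original example, so the training set grows without any additional data gathering.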

TEXT TO SPEECH

Our AI text to speech technology delivers thousands of high-quality, human-like voices in 70+ languages. Whether you’re looking for a free text to speech solution or a premium voice AI generator for commercial projects, our TTS tools and APIs can meet your needs.

Conclusion

As the world continues to evolve at a rapid pace, so does the landscape of artificial intelligence. Generative AI, a cornerstone in this new wave of technology, offers vast potential in transforming how we create, communicate, and consume information. From the intricacies of speech synthesis and voice design to the complexities of large language models and transformers, generative AI is reshaping industries and redefining limits.

At ElevenLabs, we're proud to be at the forefront of this technological surge, especially in the domain of audio AI. With our suite of offerings, from Professional Voice Cloning to the expansive Eleven Multilingual models, we strive to harness the power of generative AI for practical, groundbreaking applications.

Ready to get started? Sign up for ElevenLabs today.

FAQs

Deep learning is a subset of machine learning that utilizes neural networks, especially deep neural networks with many layers, to analyze and process data.

AI bias can result in discriminatory, unfair, or harmful outputs which may perpetuate existing stereotypes or inaccuracies.

Generative AI is specifically designed to create new content, whether it be text, images, voice, or other forms, often resembling or based on its training data.

AI governance establishes ethical and technical guidelines that AI systems must adhere to, ensuring they operate within responsible and defined bounds.

No single AI model suits every task, because specific models are optimized for particular tasks. It's essential to select a model that aligns with the desired application for optimal results.