Avatars | ElevenLabs Documentation

Overview

Avatars are persistent visual identities that combine a person, character, or animal with any ElevenLabs voice to generate talking-head videos with synchronized lip movement. Create reusable identities once and pair them with any voice or script to produce consistent video content at scale.

Avatars are available on all paid plans.

Some Avatar models and reference image upload capabilities are restricted in the United States due to regulatory or provider requirements.

Key capabilities

Persistent identities: Create avatars once and reuse them across unlimited videos
Voice flexibility: Pair any avatar with any voice from your library, including cloned voices
Style variations: Generate multiple styles from a single avatar with different angles, outfits, backgrounds, or lighting
Integrated text to speech: Convert text directly to speech within the avatar interface
Flows integration: Automate avatar video generation at scale using the Avatar node
Multiple lip-sync models: The platform automatically selects the optimal model based on input format and quality requirements, and you can change the model before generating

Creating an avatar

Generate a new avatar from reference images or a text prompt:

Navigate to Avatar creation

Go to Image & Video, and in the Avatar section, click New.

Upload reference images or describe your avatar

Upload multiple reference images of the same person or character. Higher-quality images with varied perspectives produce better results.

Upload 3-5 images from different angles for optimal avatar quality.

Alternatively, you can describe your avatar using a text prompt.

Configure and create

Name your avatar and optionally set a default voice. Click Create Avatar to generate the base identity.

Once created, the avatar appears in your library and can be used across projects.

Styles

Styles are variations of an existing avatar that represent different visual contexts:

Camera angles and framing
Outfits and accessories
Backgrounds and environments
Lighting conditions

Creating a style

To create a style for an existing avatar, click View Avatar, then click New Style.

You can create styles in two ways:

Prompt it: Describe the new style using a text prompt.

Upload: Upload a reference image to guide the new style while maintaining the core identity.

Styles allow you to maintain brand consistency across different contexts without regenerating the entire avatar.
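The relationship between an avatar and its styles can be pictured as one base identity that owns a list of variations. The sketch below is only a mental model written with plain Python dataclasses; it is not an ElevenLabs API (API access is not available at launch), and every class, field, and example value in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Style:
    # One visual variation of an avatar: angle, outfit, background, or lighting.
    name: str
    description: str


@dataclass
class Avatar:
    # The persistent base identity. Styles are attached to it rather than
    # replacing it, so the core identity is created only once.
    name: str
    default_voice: Optional[str] = None
    styles: list[Style] = field(default_factory=list)

    def add_style(self, name: str, description: str) -> Style:
        """Attach a new style to this avatar and return it."""
        style = Style(name=name, description=description)
        self.styles.append(style)
        return style


# Hypothetical example data.
presenter = Avatar(name="Presenter", default_voice="narrator-voice")
presenter.add_style("Studio", "Front-facing, neutral backdrop, soft lighting")
presenter.add_style("Outdoor", "Three-quarter angle, park background, daylight")

print([style.name for style in presenter.styles])
```

Keeping styles as children of the avatar mirrors the point above: a new context adds a style entry while the base identity stays unchanged.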

Generating videos

Generate talking-head videos by pairing any avatar with a voice and script:

Select an avatar

Choose an avatar from your library, and select Create Lip Sync. Choose the style you want to use.

Choose a voice

If you set a default voice for your avatar, this will be pre-selected. You can also use any voice from your library, including community voices, cloned voices, or designed voices.

Add your script

Enter the text you want the avatar to speak, then click Generate speech. You can listen to the generated speech before moving on to the next step, and regenerate it if needed.

You can also select a previous Text to Speech generation from your History.

Once you’re happy with your selection, click Use Speech.

Generate

In the next step, you can add an optional prompt to guide the visuals of the lip sync. The platform selects the optimal lip-sync model based on your input and quality requirements, but you can change the model before generating the video. When you’re ready, click Generate to create your lip sync.

Flows integration

The Avatar node in Flows enables automated avatar video generation at scale.

Use cases include:

Personalized video campaigns with dynamic scripts
Batch video generation with consistent branding
Automated content pipelines with voice and visual swapping
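As an illustration of the "dynamic scripts" idea, the following minimal sketch prepares one personalized script per recipient from a shared template using Python's standard `string.Template`. It only produces text that could then be supplied as the script for each video; it does not call Flows or any ElevenLabs API, and the template wording, field names, and recipient data are all made up for the example.

```python
from string import Template

# Hypothetical script template; $first_name and $product are placeholders.
SCRIPT_TEMPLATE = Template(
    "Hi $first_name, thanks for trying $product. Here is what is new this month."
)

# Hypothetical recipient data, e.g. loaded from a spreadsheet or CRM export.
recipients = [
    {"first_name": "Ana", "product": "Acme Notes"},
    {"first_name": "Ben", "product": "Acme Calendar"},
]


def render_scripts(template, rows):
    """Return one script per row.

    Template.substitute raises KeyError if a row lacks a placeholder,
    which surfaces incomplete data before any video is generated.
    """
    return [template.substitute(row) for row in rows]


scripts = render_scripts(SCRIPT_TEMPLATE, recipients)
for script in scripts:
    print(script)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing field fails loudly instead of producing a video that reads out a literal placeholder.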

Learn more about Flows.

Credit costs

Avatar generation follows the existing Image & Video pricing structure. Costs vary by:

Selected lip-sync model
Output resolution
Video duration

Credits are deducted per generation. You can review your consumption in your usage analytics.
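To show how the three cost factors combine, here is a rough budgeting sketch. It assumes cost scales linearly with duration at a per-second rate set by model and resolution; that linear form, the model names, and every rate below are placeholders chosen for illustration, not published ElevenLabs prices. Substitute the actual figures from the Image & Video pricing before relying on the result.

```python
# Placeholder rates in credits per second, keyed by (model, resolution).
# These numbers are invented for the example and are NOT real pricing.
HYPOTHETICAL_CREDITS_PER_SECOND = {
    ("model-a", "720p"): 10,
    ("model-a", "1080p"): 15,
    ("model-b", "720p"): 20,
    ("model-b", "1080p"): 30,
}


def estimate_credits(model, resolution, duration_seconds):
    """Estimate credits for one generation, assuming linear per-second cost."""
    rate = HYPOTHETICAL_CREDITS_PER_SECOND[(model, resolution)]
    return rate * duration_seconds


def estimate_batch(jobs):
    """Sum the estimates for a list of (model, resolution, seconds) jobs."""
    return sum(estimate_credits(*job) for job in jobs)


planned_jobs = [
    ("model-a", "720p", 30),
    ("model-b", "1080p", 12),
]
print(estimate_batch(planned_jobs))
```

Because credits are deducted per generation, regenerated videos count again; a batch estimate like this is a lower bound if retries are expected.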

Key facts

Availability: All paid plans
Lip-sync models: The platform automatically selects the optimal model; you can change it before generating
Voice compatibility: Works with all ElevenLabs voices, including cloned voices
Reusability: Avatars and styles persist across unlimited generations
Flows support: Available as an Avatar node for automation
API access: Not available at launch; planned for future release