How to bring Veo 2 videos to life with ElevenLabs voiceovers and sound effects

This article explores how to use ElevenLabs' AI voiceovers and sound effects to enhance Google's Veo 2 photorealistic videos, creating truly immersive viewing experiences.

Google's Veo 2 makes generating photorealistic videos easier than ever, but visuals alone aren't enough. Sound transforms a silent sequence into a fully immersive experience, and that's where ElevenLabs comes in. With ElevenLabs, generating a dynamic AI voiceover in a range of languages or adding sound effects can turn a simple video into a captivating story.

I tried to do just that when I used Veo 2 from Google's DeepMind lab to tell the story of a city that never sleeps. I generated 18 different clips, each around 5-8 seconds long, with a focus on urban settings. The clips feature neon signs, rain, a train, and various street scenes. To bring these fragmented moments together, I added a voiceover and sound effects using ElevenLabs.

Crafting a Captivating Voiceover

A well-crafted AI voiceover brings structure and emotional depth to your video. While Veo 2 might be the best video generator for realism, its clips often lack scene or character consistency, making narration the perfect unifying element.

Instead of leaving the viewer to interpret fragmented visuals, a carefully designed voiceover provides clarity, guiding them through the story. You can either start with the voiceover script and then create clips to match, or start with the shots (usually from a storyboard) and then write the narration to fit them. For the city video, I started with the shots, creating the video prompts first.

ElevenLabs' text-to-speech technology ensures professional-grade narration without the need for expensive recording setups. The flexibility to control tone, pacing, and emotion means you can fine-tune your voiceover to fit the mood of your project effortlessly. There are also thousands of voices to choose from to get exactly the right character.

Planning Your Narration

Before generating a voiceover, it's important to plan how narration will complement your video. If, like mine, your Veo 2 sequence is a cinematic urban montage, voiceover can establish setting, add poetic reflection, or enhance the atmosphere.

For example, in my video, I have a scene of neon-lit streets and flickering signs. So I wrote: "The city never sleeps — it barely even blinks. It inhales exhaust fumes and exhales neon light, a beast of steel and glass pulsing with the footsteps of a million restless souls." This bridges together several shots.

Scripting Your Voiceover

Once you've outlined your narration, the next step is scripting for the entire video. A well-written script ensures your voiceover aligns with the timing of your clips. Since Veo 2 scenes are often 5 to 8 seconds long, your narration should be concise and well-paced. A 5-second clip allows for around 12-15 words, while an 8-second clip fits approximately 20-25 words.
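The word counts above imply a speaking rate of roughly 2.5 to 3 words per second. The following is a minimal sketch for checking a script line against a clip length. It assumes the rate scales linearly with duration, and the function names and constants are illustrative rather than part of any ElevenLabs tool. Actual pacing depends on the voice and speed setting you choose.

```python
# Approximate speaking rates derived from the figures above
# (12-15 words in 5 s, 20-25 words in 8 s). These are assumptions.
MIN_WORDS_PER_SECOND = 2.5
MAX_WORDS_PER_SECOND = 3.0


def word_budget(clip_seconds: float) -> tuple[int, int]:
    """Return an approximate (low, high) word count for a clip of this length."""
    low = int(clip_seconds * MIN_WORDS_PER_SECOND)   # truncates toward zero
    high = int(clip_seconds * MAX_WORDS_PER_SECOND)
    return low, high


def fits_clip(line: str, clip_seconds: float) -> bool:
    """True if the line has no more words than the upper budget for the clip."""
    _, high = word_budget(clip_seconds)
    return len(line.split()) <= high


steam_line = "The city exhales, steam twisting into the cold night air"
print(word_budget(5), fits_clip(steam_line, 5))
```

A line that fails the check for a short clip can either be trimmed or allowed to run across two adjacent shots.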

The tone of your narration should match your video — poetic for atmospheric visuals, documentary-style for informative sequences, and cinematic for high-energy storytelling. For example, a slow-motion shot of steam rising from a manhole might be narrated with, "The city exhales, steam twisting into the cold night air," while a train pulling into a station might warrant, "A gust of wind. The screech of metal. Another train pulls in, just like the hundreds before it."

Generating Your Voiceover with ElevenLabs

Once your script is polished, it's time to generate your AI voiceover with ElevenLabs. Head over to the text-to-speech page in the ElevenLabs app. Here you can paste your script, or you can write it directly. You can then select a voice on the right, as well as set its speed, stability, and other features. I like to add 10-20% style exaggeration as it improves characterization.

A deep, cinematic tone works well for dramatic urban sequences, while a soft, reflective voice enhances poetic narratives. For fast-paced visuals, an energetic delivery keeps the rhythm engaging. For my video, I used Lamar Lincoln, a premium voice that gave a more natural feel to the story. I wanted it to sound like someone reflecting on something they care about.

After entering your script, fine-tune the speed and emotion of the voiceover to match your visuals. A slower, deliberate pace suits dramatic moments, while a more conversational tone complements an energetic montage. Once you're satisfied with the result, download the audio file and prepare to sync it with your video.

I prefer to fine-tune the voice on just one or two sentences, then generate the full script. In this case, though, the script was only three paragraphs, so using the entire script wasn't much of an issue. It also worked well from the start.
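If you would rather script this step than use the app, the same settings can be sent to the ElevenLabs API. The sketch below is a hedged illustration, not the method used for the city video: the endpoint path, the `xi-api-key` header, the `voice_settings` field names, and the `eleven_multilingual_v2` model ID reflect my understanding of the public REST API and should be checked against the current API reference. The voice ID and API key are placeholders, and the default `style` of 0.15 simply mirrors the 10-20% style exaggeration mentioned earlier.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_tts_request(text, voice_id, api_key, style=0.15, stability=0.5,
                      similarity_boost=0.75, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request."""
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,  # style exaggeration, 0.0 to 1.0
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path):
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Example (makes a network call, so it is left commented out):
# request = build_tts_request("The city never sleeps.", "YOUR_VOICE_ID", "YOUR_API_KEY")
# synthesize_to_file(request, "voiceover_test.mp3")
```

Separating request building from sending makes it easy to try one or two sentences first, adjust the settings, and only then submit the full script.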

Syncing Your Voiceover

Syncing the AI voiceover with your Veo 2 clips is a straightforward process using editing software such as Adobe Premiere Pro, DaVinci Resolve, Final Cut Pro, or CapCut.

  • Import your video clips, add the voiceover to the timeline, and adjust the start and end points to align with the visuals.
  • Use crossfades or time-stretching if necessary to ensure a seamless blend between narration and motion.
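Before opening the editor, it can help to work out where each narration segment should start and whether it will overrun its clip. The following is a small planning sketch under stated assumptions: clips play back to back with no gaps, each narration segment starts at the beginning of its clip, and the durations are numbers you supply yourself. The names `Placement` and `plan_timeline` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    clip_index: int         # which clip the narration belongs to
    start_seconds: float    # where the clip (and narration) starts on the timeline
    overrun_seconds: float  # how far the narration runs past the end of the clip


def plan_timeline(clip_durations, narration_durations):
    """Compute narration start times for clips laid end to end.

    clip_durations: list of clip lengths in seconds, in timeline order.
    narration_durations: dict mapping clip index to narration length in seconds.
    """
    placements = []
    cursor = 0.0
    for index, clip_length in enumerate(clip_durations):
        if index in narration_durations:
            overrun = max(0.0, narration_durations[index] - clip_length)
            placements.append(Placement(index, cursor, overrun))
        cursor += clip_length
    return placements


for placement in plan_timeline([5.0, 8.0, 6.0], {0: 4.0, 2: 7.5}):
    print(placement)
```

Any segment with a non-zero overrun is a candidate for trimming, time-stretching, or letting the narration carry over into the next shot.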

Enhancing with Sound Effects

Once the voiceover is in place, it's time to enhance your video with sound effects. AI-generated sound effects complete the auditory experience by adding realism and texture. A video clip on its own can look as real as something filmed with a phone, but without sound it falls into that unreality chasm, leaving you with the feeling that something is missing.

Creating Sound Effects with ElevenLabs

ElevenLabs' text-to-sfx generator allows you to create custom audio elements, from ambient city noise to subtle environmental sounds. You can describe a full soundscape with a complex prompt, or generate multiple files each with an individual set of sounds that you then layer in your video editor.

To create the sound effects, head to the ElevenLabs SFX generator. You can explore a list of pre-made sound effects in our library, or create a custom sound using the text-to-sfx generator. You can even simplify the process by trying our video-to-sound experiment, which lets you upload a single clip and returns four sound effects you can download.

If you want more control over the sounds, head to the sound effects generator. Here you type in a prompt and click generate. You can also set the duration of the clip to anywhere from 0.5 to 22 seconds by clicking the Settings button.
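The same prompt and duration can also be sent through the API. This sketch assumes the public sound-generation endpoint and its `text` and `duration_seconds` fields as I understand them, so check the current API reference before relying on it. The 0.5 to 22 second range is taken from the Settings limits described above, and `build_sfx_request` is a hypothetical helper name.

```python
import json
import urllib.request

SFX_URL = "https://api.elevenlabs.io/v1/sound-generation"  # assumed endpoint
MIN_SECONDS, MAX_SECONDS = 0.5, 22.0  # duration range stated in the article


def build_sfx_request(prompt, api_key, duration_seconds=None):
    """Build (but do not send) a sound effect request.

    Leaving duration_seconds as None omits the field from the payload.
    """
    payload = {"text": prompt}
    if duration_seconds is not None:
        if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
            raise ValueError(
                f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
            )
        payload["duration_seconds"] = duration_seconds
    return urllib.request.Request(
        url=SFX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Example (makes a network call, so it is left commented out):
# request = build_sfx_request("Ambient city noise", "YOUR_API_KEY", duration_seconds=8)
# with urllib.request.urlopen(request) as response, open("city.mp3", "wb") as f:
#     f.write(response.read())
```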

Prompting for Sound Effects

While you can give a complex prompt with a fully descriptive explanation of the entire soundscape, I've found it better to create a series of prompts and layer them on top of one another. This lets you control the point at which each sound plays based on the contents of the video.

A well-placed sound effect makes a scene feel real — footsteps echoing in an alley, the distant honk of a car, or the rhythmic drip of rain on pavement. Pairing these sounds with your visuals enhances immersion, making each frame more impactful.

If your video features a flickering neon sign, a faint electrical buzz in the background reinforces its presence. If a subway train screeches to a halt, layering in metal-on-metal friction adds authenticity.

Prompting Examples:

  • Descriptive Prompt: "Soft ticking of a watch's second hand, faint rustle of coat sleeve adjusting, ambient city noise in the background — muted horns, distant conversation, occasional flicker of neon signs, slight metallic scrape as the wrist turns."
  • Layered Prompts:
    • "Soft ticking of a watch's second hand"
    • "Faint rustle of a coat sleeve adjusting"
    • "Ambient city noise"

You can then stack these on top of each other in your video editor.
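To make the idea of layering concrete, here is a minimal sketch of what a video editor does when it stacks audio tracks: each layer is offset to its start point, scaled by its volume, and summed. It assumes audio is represented as plain lists of float samples between -1.0 and 1.0 at a shared sample rate, and `mix_layers` is a hypothetical name. In practice your editor handles this for you.

```python
def mix_layers(layers, length):
    """Mix several audio layers into one track.

    layers: list of (samples, start_index, gain) tuples, where samples is a
            list of floats in [-1.0, 1.0], start_index is the sample position
            at which the layer begins, and gain is its volume multiplier.
    length: total number of samples in the output track.
    """
    mixed = [0.0] * length
    for samples, start_index, gain in layers:
        for offset, sample in enumerate(samples):
            position = start_index + offset
            if 0 <= position < length:
                mixed[position] += sample * gain
    # Clamp the sum so overlapping loud layers do not exceed the valid range.
    return [max(-1.0, min(1.0, value)) for value in mixed]


ambience = [0.5] * 6    # stands in for "Ambient city noise"
ticking = [1.0, 1.0]    # stands in for "Soft ticking of a watch's second hand"
track = mix_layers([(ambience, 0, 0.5), (ticking, 2, 0.5)], length=6)
print(track)
```

Keeping each sound in its own layer is what lets you move the ticking earlier or lower the ambience without regenerating anything.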

Once you have the sounds layered and the volume of each clip set to your liking, export and share your video.

Whether you're crafting a cinematic montage, a poetic city reflection, or a documentary-style short film, AI-generated audio brings your vision to life. Try ElevenLabs today and transform your Veo 2 video into a fully immersive experience with the power of voice and sound.
