What makes Fugatto different from other AI audio tools?

While some AI models focus on a single use case, Fugatto combines and transforms different types of sounds to create entirely new audio experiences.

Can Fugatto replace professional voice actors?

While Fugatto can generate voice content, it's better suited to experimental audio creation than to professional voiceover work, where specialized tools like ElevenLabs excel.

What hardware is required to run Fugatto?

Fugatto was developed using NVIDIA's advanced hardware, but specific requirements for public use haven't been announced as the tool isn't yet publicly available.

How does ElevenLabs compare to Fugatto for voice generation?

ElevenLabs offers superior voice quality and more precise control for professional content creation, while Fugatto provides broader but less specialized audio capabilities.

What kind of projects is Fugatto best suited for?

Fugatto is ideal for experimental sound design, game development, and creative audio projects where unique and novel sound combinations are desired.

What is Audio AI Fugatto from NVIDIA?

Sep 17, 2024 • 8 minutes reading time

Key takeaways:

NVIDIA has released a research preview of Fugatto, a new AI model that can generate, transform, and manipulate any combination of music, voices, and sounds using text and audio inputs
The model promises to be a "Swiss Army knife for sound," giving users advanced control over audio creation and manipulation through simple text prompts

Note: Fugatto is currently just a research preview. NVIDIA has not announced plans for a public release. Meanwhile, ElevenLabs' audio AI is available today, delivering production-grade quality.

NVIDIA has released a research preview of Fugatto (short for Foundational Generative Audio Transformer Opus 1), a new AI model that promises to transform how creators generate and manipulate sound.

According to the research preview, the model can generate music, modify voices, create sound effects, and even produce entirely new sounds that have never been heard before, all from simple text prompts and audio files.

But what could Fugatto potentially be used for, and how does it compare to other leading Text-to-Speech and AI sound generation tools like ElevenLabs?

Use cases for AI Audio Fugatto

If the research preview is anything to go by, NVIDIA's foundational generative AI model can be used for audio creation across multiple domains. From enabling video game developers to generate dynamic soundscapes, to helping musicians experiment with unconventional compositions, the model's applications span a wide range of creative and technical possibilities.

Let's explore the key use cases that make this AI model particularly compelling for content creators and audio professionals.

1. Create sounds, speech, and music

Fugatto allows users to transform text and audio files into an array of auditory outputs. Whether you’re crafting sound effects for a game, dialogue for a virtual assistant, or background music for a project, Fugatto makes it easy to produce high-quality audio. This versatility helps creators streamline workflows and explore new artistic directions.

2. Design unexpected sound effects

With Fugatto's generative AI model, users can turn familiar sounds into imaginative and unique effects. For example, a rumbling bass might be combined with high-pitched chirps to create entirely new auditory experiences. This feature is ideal for sound designers looking to push creative boundaries or evoke specific emotional responses.

3. Direct soundscapes

Fugatto excels in creating dynamic soundscapes, blending environmental sounds with music for film or audio productions. For instance, the sound of a train merging seamlessly into a string orchestra can add depth and immersion to storytelling, making it a powerful tool for filmmakers and audio producers.

4. Extract audio elements from audio samples

Fugatto simplifies audio editing by enabling users to isolate specific elements from audio samples. Whether you need to extract a voice track from a song or separate background noise, Fugatto makes this process intuitive and efficient, saving time for editors and musicians.

5. Generate new speech samples

Using text input, Fugatto can produce realistic voice samples. You can also adjust the tone, speed, and emotional delivery to fit the context. For instance, the same sentence can be delivered in a calm or excited tone, making it useful for voiceovers, virtual assistants, or dialogue in media projects.

6. Musical experimentation

Musicians can use Fugatto to create electronic music in just a few clicks. Experiment with existing tracks by adding new instruments or changing the style of a melody. For example, enhance a techno track with drum beats, or transform a simple piano piece into a pop or operatic vocal arrangement. This opens up creative possibilities for reimagining compositions.

7. Combine unusual instrumentations

Fugatto enables users to create a music snippet from a text prompt. For example, by pairing sounds that wouldn’t typically be heard together, like a harp and an electric guitar, creators can craft unique arrangements that stand out and captivate listeners.

8. Produce completely new sounds

For creators exploring uncharted territories, Fugatto can bring abstract concepts to life. It allows users to generate entirely new and imaginative sounds based on their prompts, such as futuristic tones or alien-like noises, making it an invaluable tool for experimental artists and game developers.

How AI Audio Fugatto compares to ElevenLabs

Supporting numerous audio generation use cases, Fugatto looks like a fantastic general-purpose audio AI. It's an impressive research preview – but as things stand, it's only that. ElevenLabs, on the other hand, is available today and is production-grade.

Let's briefly assess how Fugatto's research preview compares against ElevenLabs in key areas like Text-to-Speech and sound generation.

Text-to-Speech

ElevenLabs stands as the clear industry leader in Text-to-Speech technology, offering:

Support for 70+ languages with authentic accents and cultural nuances
Advanced emotional intelligence that responds to textual context
Control over voice characteristics
High-quality, human-like speech that maintains consistency across long-form content
An extensive library of natural-sounding voices
The ability to clone and customize voices

While Fugatto can generate speech with different accents and emotions, ElevenLabs' focused development in voice technology delivers more reliable, production-ready output that meets professional standards. Its specialized approach consistently produces more natural-sounding voices that capture the subtle nuances of human speech.

Sound Effects

While Fugatto excels at experimental sound creation by combining different audio elements, ElevenLabs provides a more streamlined and precise approach to sound effect generation. ElevenLabs offers:

Instant generation of four different samples for each prompt
Precise control through detailed text descriptions
High-quality output suitable for commercial projects
A comprehensive library of common sound effects
The ability to create distinctive effects directly from text descriptions

Where Fugatto takes a broad approach to audio manipulation, ElevenLabs delivers specialized excellence in both voice and sound effect generation. As one of the best AI sound effect generators, it produces reliable, production-ready output that better serves professional content creators' needs.

How to use ElevenLabs for Text-to-Speech

Transform your content into professional-quality voiceovers with these simple steps:

Sign up: Create a free or paid account with ElevenLabs
Choose your voice: Select from a diverse library of natural-sounding voices
Input your text: Paste or type your script into the interface
Customize settings: Adjust the speed, tone, and emphasis to match your needs
Preview and generate: Listen to a sample and generate your final audio output
Download: Download your high-quality voiceover
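
For readers who would rather script these steps than use the web interface, here is a minimal Python sketch. It rests on several assumptions that should be checked against the current ElevenLabs API reference before use: the base URL, the `/text-to-speech/{voice_id}` path, the `xi-api-key` header, and the `voice_settings` fields (`stability`, `similarity_boost`) are written from memory of the public REST API, and these settings do not map one-to-one onto the speed, tone, and emphasis controls described above. The helper names `build_tts_request` and `synthesize_to_file` are hypothetical, as are the placeholder key and voice ID.

```python
import json
import urllib.request

# Assumed base URL; verify against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, text, stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a Text-to-Speech HTTP request.

    The field names in the payload are assumptions, not confirmed by the article.
    """
    payload = {
        "text": text,  # step 3: input your text
        "voice_settings": {  # step 4: customize settings (assumed field names)
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",  # step 2: choose your voice
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path):
    """Send the request and save the returned audio bytes (steps 5 and 6)."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Hypothetical placeholders: substitute a real API key and voice ID.
request = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello from a script.")
# synthesize_to_file(request, "voiceover.mp3")  # uncomment to make a real call
```

Keeping request construction separate from the network call makes it possible to inspect the URL and payload before spending any quota.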

Final thoughts

The emergence of AI audio tools like Fugatto and ElevenLabs marks an exciting evolution in content creation. However, while Fugatto's research preview displays impressive versatility in experimental sound generation and audio manipulation, it's not yet available to use.

ElevenLabs, on the other hand, is available and production-grade. It's also the leading solution currently on the market for AI Text-to-Speech and sound effect generation.

Ready to test out ElevenLabs' AI technology? Sign up today to get started.
