Image & Video

Overview
Image & Video enables you to create high-quality visual content from simple text descriptions or reference images. Generate static images or dynamic videos in any style, then refine them iteratively with additional prompts, upscale for high-resolution output, and even add lip-sync with audio. Export finished assets as standalone files or import them directly into Studio projects.
Guide
Follow these steps to create your first visual asset:
Select your mode
Use the toggle in the upper right corner of the prompt box to choose between Image and Video generation.
Provide a prompt or reference
Describe your desired output using natural language in the prompt box. For more control, drag existing images or videos from the Explore or History tabs into the reference slots, or upload your own reference images in common formats such as JPG, PNG, and WEBP.
Choose a model and settings
Select the ideal generative model for your goal (e.g., OpenAI Sora 2 Pro, Google Veo 3.1, Kling 2.5, Flux 1 Kontext Pro). See the Models section for detailed information on each model. Adjust settings like aspect ratio, resolution, duration (for video), and the number of variations to generate.
Generate your asset
Click the Generate button. Your assets will be created and displayed in the History tab for review.
Workflow
The creation process moves you from inspiration to finished asset in four stages:
Explore
Discover community creations to find inspiration, study effective prompts, or pull references directly into your own work.
Generate
Use the prompt box to describe what you want to create, select a model, fine-tune your settings, and bring your idea to life.
History
Review your generations in the History tab to iterate and enhance. Recreate variations, reuse prompts, and apply enhancements like upscaling and lip-syncing.
Export
Download finished assets in various formats or send them directly to Studio to use in your projects.
Explore
The Explore tab displays a gallery of community creations where you can find inspiration and visuals to use as references.
Search: Use the search bar to find images and videos based on keywords.
Sort: Toggle between Trending and Newest to see what’s popular or recently added.
Drag-and-drop: Pull any result from the grid directly into the prompt box to use as a start frame, end frame, or style reference.
Preview details: Click any tile to see the full prompt and settings used to create it.
Generate

The prompt box is anchored at the bottom of the page and provides all controls for creating visual content.
Set mode and prompt
Select mode: Use the toggle in the upper right corner to switch between Image and Video generation.
Write your prompt: In the main field, describe what you want to generate using natural language. Be clear and descriptive for best results.
Choose models and settings

Select model: Open the model menu to browse available options like OpenAI Sora 2 Pro, Google Veo 3.1, Kling 2.5, or Flux 1 Kontext Pro. Each model has unique strengths and capabilities listed for easy comparison. See the Models section for detailed information.
Adjust settings: Fine-tune your generation with settings that appear below the prompt. These vary by model but often include:
- Aspect Ratio: Control the dimensions of your output
- Resolution: Set the quality level (see the sketch after this list for how resolution and aspect ratio combine)
- Duration: Specify video length (for video mode)
- Number of Generations: Create up to 4 variations at once
Use controls: On supported models, enable Audio, add a Negative Prompt to exclude unwanted elements, or adjust Sound Control.
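The Aspect Ratio and Resolution settings together determine the pixel dimensions of your output. The sketch below is illustrative only; it assumes the resolution label (e.g., 1080p) refers to the shorter side of the frame, and the exact dimensions each model produces may differ slightly.

```python
# Illustrative only: approximate pixel dimensions from a resolution label
# and an aspect ratio. Actual model output may differ slightly.
def approx_dimensions(resolution: str, aspect_ratio: str) -> tuple[int, int]:
    short_side = {"480p": 480, "720p": 720, "1080p": 1080}[resolution]
    w, h = (int(n) for n in aspect_ratio.split(":"))
    if w >= h:
        # Landscape or square: the label is the frame height.
        return round(short_side * w / h), short_side
    # Portrait: the label is the frame width.
    return short_side, round(short_side * h / w)

print(approx_dimensions("1080p", "16:9"))  # (1920, 1080)
print(approx_dimensions("720p", "9:16"))   # (720, 1280)
```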
Add references

For greater control over output, add visual references to guide generation. Availability depends on the selected model. Supported image formats include JPG, PNG, and WEBP, among others.
Start Frame (Video): Sets the opening image of your video.
End Frame (Video): Sets the final image, influencing the transition.
Image Refs (Image or Video): Provide one or more images to guide overall style and look.
Drag and drop items directly from the Explore or History tabs into reference slots for a faster workflow.
Generate
Before you generate, a cost indicator shows the total cost for the number of assets you’ve chosen to create. When ready, click Generate. Your new creations will appear in the History tab.
History

The History tab provides a chronological log of everything you’ve generated and serves as a workspace for refining previous work.
Browse: View all past images and videos.
Inspect: Click any asset to see the original prompt, model, and settings used to create it.
Reuse: Drag items from History back into the prompt box to use as references for new generations.
Iterate: Click Recreate to run the same prompt and settings again for a new variation, or adjust settings to guide generation in a new direction.
Share: Click Share to generate a unique link for your asset. Send it to teammates and collaborators for feedback.
Export: Download your asset as a standalone file, or click Edit in Studio to import it into a Studio project.
Export
Once you have a generation you’re satisfied with, use built-in enhancement tools before exporting.
Enhancing your creations
Upscale: Use Topaz Upscale to increase resolution by up to 4x while preserving sharp details.
LipSync: Apply realistic lip-syncing to your visuals:
- Omnihuman 1.5: Animate a static image with an audio track
- Veed LipSync: Dub an existing video with new audio
Exporting your assets

Export finished assets by downloading them locally or sending them directly to Studio.
Edit in Studio: Import the asset directly into a Studio project.
Download: Save the asset to your local machine.
Supported download formats
Video:
- MP4: Codecs H.264, H.265. Quality up to 4K (with upscaling)
Image:
- PNG: High-resolution, lossless output
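If you need to confirm the codec or dimensions of a downloaded video before handing it to another tool, a general-purpose utility such as ffprobe (part of FFmpeg) can report both. This is a standard FFmpeg workflow rather than a feature of Image & Video, and export.mp4 below is only a placeholder file name.

```python
import json
import subprocess

# Inspect the first video stream of a downloaded file with ffprobe
# (requires FFmpeg installed locally). "export.mp4" is a placeholder.
result = subprocess.run(
    [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height",
        "-of", "json",
        "export.mp4",
    ],
    capture_output=True, text=True, check=True,
)
stream = json.loads(result.stdout)["streams"][0]
print(stream["codec_name"], f'{stream["width"]}x{stream["height"]}')  # e.g. h264 1920x1080
```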
Models
Image & Video provides access to specialized models optimized for different use cases. Each model offers unique capabilities, from rapid iteration to production-ready quality.
Post-processing models operate on an existing generated output or on an image or video file you upload.
Video generative models
OpenAI Sora 2 Pro
The most advanced, high-fidelity video model available, built for cinematic results.
Generation inputs:
- Text-to-Video
- Start Frame
Features:
- Highest-fidelity, professional-grade output with synced audio
- Precise multi-shot control
- Excels at complex motion and prompt adherence
- Fixed durations: 4s, 8s, and 12s
- Batch creation with up to 4 generations at a time
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- Cinematic, professional-grade video content
Cost: Starts at 12,000 credits for a generation
End frames and image references are not currently supported. Sound is enabled by default.
OpenAI Sora 2
The standard, high-speed version of OpenAI’s advanced video model, tuned for everyday content creation.
Generation inputs:
- Text-to-Video
- Start Frame
Features:
- Realistic, physics-aware videos with synced audio
- Fine scene control
- Fixed durations: 4s, 8s, and 12s
- Batch creation with up to 4 generations at a time
- Strong narrative and character consistency
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- Everyday content creation with realistic physics
Cost: Starts at 4,000 credits for default settings
End frames and image references are not currently supported. Sound is enabled by default.
Google Veo 3.1
A professional-grade model for high-quality, cinematic video generation.
Generation inputs:
- Text-to-Video
- Start Frame
- End Frame
- Image References
Features:
- Excellent quality and creative control with negative prompts
- Fully integrated and synchronized audio
- Realistic dialogue, lip-sync, and sound effects
- Fixed durations: 4s, 6s, and 8s
- Batch creation with up to 4 generations at a time
- Dedicated sound control
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- High-quality, cinematic video generation with full creative control
Cost: Starts at 8,000 credits for default settings
Enabling or disabling sound changes the generation cost in credits.
Kling 2.5
A balanced and versatile model for high-quality, full-HD video generation.
Generation inputs:
- Text-to-Video
- Start Frame
Features:
- Excels at simulating complex motion and realistic physics
- Accurately models fluid dynamics and expressions
- Fixed durations: 5s and 10s
- Batch creation with up to 4 generations at a time
Output options:
- Resolution: 1080p
- Aspect ratios: 16:9, 1:1, 9:16
Ideal for:
- Realistic physics simulations and complex motion
Cost: Starts at 3,500 credits for default settings
End frames and image references are not currently supported, and sound control is not available.
Google Veo 3.1 Fast
A high-speed model optimized for rapid previews and generations, delivering sharper visuals with lower latency.
Generation inputs:
- Text-to-Video
- Start Frame
- End Frame
Features:
- Advanced creative control with negative prompts and dedicated sound control
- Fixed durations: 4s, 6s, and 8s
- Batch creation with up to 4 generations at a time
- Accurately models real-world physics for realistic motion and interactions
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- Quick iteration and A/B testing visuals
- Fast-paced social media content creation
Cost: Starts at 4,000 credits for default settings
Google Veo 3
A production-ready model delivering exceptional quality, strong physics realism, and coherent narrative audio.
Generation inputs:
- Text-to-Video
- Start Frame
Features:
- Advanced integrated “narrative audio” generation that matches video tone and story
- Granular creative control with negative prompts and dedicated sound control
- Fixed durations: 4s, 6s, and 8s
- Batch creation with up to 4 generations at a time
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- Final renders and professional marketing content
- Short-form storytelling
Cost: Starts at 8,000 credits for default settings
Google Veo 3 Fast
A high-speed, cost-efficient model for generating audio-backed video from text or a starting image.
Generation inputs:
- Text-to-Video
- Start Frame
Features:
- Granular creative control with negative prompts and dedicated sound control
- Fixed durations: 4s, 6s, and 8s
- Batch creation with up to 4 generations at a time
Output options:
- Resolutions: 720p, 1080p
- Aspect ratios: 16:9, 9:16
Ideal for:
- Rapid iteration and previews
- Cost-effective content creation
Cost: Starts at 4,000 credits for default settings
Seedance 1 Pro
A specialized model for creating dynamic, multi-shot sequences with large movement and action.
Generation inputs:
- Text-to-Video
- Start Frame
- End Frame
Features:
- Highly stable physics and seamless transitions between shots
- Fixed durations: 3s, 4s, 5s, 6s, 7s, 8s, 9s, 10s, 11s, and 12s
- Batch creation with up to 4 generations at a time
- Maximum creative flexibility with numerous aspect ratio options
Output options:
- Resolutions: 480p, 720p, 1080p
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16
Ideal for:
- Storytelling and action scenes requiring stable physics
Cost: Starts at 4,800 credits for default settings
Aspect ratio and resolution do not affect generation credits, but duration does.
Wan 2.5
A versatile model that delivers cinematic motion and high prompt fidelity from text or a starting image.
Generation inputs:
- Text-to-Video
- Start Frame (Image-to-Video)
Features:
- Granular creative control with negative prompts and dedicated sound control
- Fixed durations: 5s and 10s
- Batch creation with up to 4 generations at a time
Output options:
- Resolutions: 480p, 720p, 1080p
- Aspect ratios: 16:9, 1:1, 9:16
Ideal for:
- Cinematic content with strong prompt adherence
Cost: Starts at 2,500 credits for default settings
Generation cost varies based on selected settings.
Image generative models
Google Nano Banana
A high-speed model for high-quality image generation and editing directly from text prompts.
Features:
- Supports multiple image references to guide generation
- Generates up to 4 images at a time
Output options:
- Aspect ratios: 21:9, 16:9, 5:4, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16
Ideal for:
- Rapid image creation and iteration
Cost: Starts at 2,000 credits for default settings; varies based on number of generations
Seedream 4
A specialized image model for generating multi-shot sequences or scenes with large movement and action.
Features:
- Excels at creating images with stable physics and coherent transitions
- Supports multiple image references to guide generation
- Generates up to 4 images at a time
Output options:
- Aspect ratios: auto, 16:9, 4:3, 1:1, 3:4, 9:16
Ideal for:
- Action scenes and dynamic compositions
Cost: Starts at 1,200 credits for default settings; varies based on number of generations
Flux 1 Kontext Pro
A professional model for advanced image generation and editing, offering strong scene coherence and style control.
Features:
- Image-based style control, using a reference image to guide the visual aesthetic
- Generates up to 4 images at a time
Output options:
- Aspect ratios: 21:9, 16:9, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16, 9:21
Ideal for:
- Professional content with precise style requirements
Cost: Starts at 1,600 credits; varies based on settings and number of generations
Wan 2.5
An image model with strong prompt fidelity and motion awareness, ideal for capturing dynamic action in a still frame.
Features:
- Granular control with negative prompts
- Supports multiple image references to guide generation
- Generates up to 4 images at a time
Output options:
- Aspect ratios: 16:9, 4:3, 1:1, 3:4, 9:16
Ideal for:
- Dynamic still images with motion awareness
Cost: Starts at 2,000 credits; varies based on settings
OpenAI GPT Image 1
A versatile model for precise, high-quality image creation and detailed editing guided by natural language prompts.
Features:
- Supports multiple image references to guide generation
- Generates up to 4 images at a time
Output options:
- Aspect ratios: 3:2, 1:1, 2:3
- Quality options: low, medium, high
Ideal for:
- Creating and editing images with precise, text-based control
Cost: Starts at 2,400 credits for default settings; varies based on settings and number of generations
Lip-sync models
Omnihuman 1.5
A dedicated utility model for generating exceptionally realistic, humanlike lip-sync.
Inputs:
- Static source image
- Speech audio file
Features:
- Animates the mouth on the source image to match provided audio
- Creates high-fidelity “talking” video from still images
- Lip-sync specific tool, not a full video generation model
Ideal for:
- Creating talking avatars
- Adding dialogue to still images
- Professional dubbing workflows
Cost: Depends on generation input
For best results, the image should contain a detectable figure.
Veed LipSync
A fast, affordable, and precise utility model for applying realistic lip-sync to videos.
Inputs:
- Source video
- New speech audio file
Features:
- Re-animates mouth movements in source video to match new audio
- Video-to-video lip-sync tool, not a full video generator
Ideal for:
- High-volume, cost-effective dubbing
- Translating content
- Correcting audio in video clips with realistic results
Cost: Depends on generation input
For best results, the video should contain a detectable figure.
Upscaling model
Topaz Upscale
A dedicated utility model for image and video upscaling, designed to enhance resolution and detail up to 4x.
Features:
- Enhancement tool that processes existing media
- Increases media size while preserving natural textures and minimizing artifacts
- Highly granular upscale factors: 1x, 1.25x, 1.5x, 1.75x, 2x, 3x, 4x
- Video-specific: Flexible frame rate control (keep source or convert to 24, 25, 30, 48, 50, or 60 fps)
Ideal for:
- Improving quality of generated media
- Restoring legacy footage or photos
- Preparing assets for high-resolution displays
Cost: Depends on generation input
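To get a rough sense of what the upscale factors mean in practice, the sketch below simply multiplies source dimensions by a chosen factor. The exact rounding Topaz applies may differ slightly; the numbers are illustrative.

```python
# Illustrative arithmetic only: how an upscale factor scales frame dimensions.
def upscaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    return round(width * factor), round(height * factor)

print(upscaled_size(1920, 1080, 2))    # (3840, 2160) -- 1080p source to 4K UHD
print(upscaled_size(1280, 720, 1.5))   # (1920, 1080)
print(upscaled_size(960, 540, 4))      # (3840, 2160)
```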