Suno + Veo 3.1 Is an INSANE Combo for AI Music Videos
By Isa does AI
Summary
## Key takeaways

- **Suno + Veo 3.1 Hollywood Transformation**: A basic song transforms into a Hollywood-quality music video using nothing but AI, combining Suno AI with OpenArt's Veo 3.1, which creates the most cinematic results I've ever seen. [00:00], [00:20]
- **Custom Mode for Maximum Song Control**: We're going with custom mode for maximum control since your prompt might be too large for simple mode. The more specific you are, the better your results. [00:49], [01:16]
- **Upscale Crucial for Character Quality**: Upscale using the 2x plus face option. Do not skip this step. It's crucial for quality. [01:49], [01:52]
- **Lip-Sync Hack with Audio Mute**: Veo 3.1 has audio capabilities that let you make your character actually lip-sync to specific lyrics. Generate videos with audio enabled so the character lip-syncs, then mute that audio in editing and replace it with your Suno track. [02:40], [02:54]
- **Consistent Character Across Scenes**: Add "maintain the exact same facial features, hairstyle, and hair color as the reference image" to your prompts. Now we have our scene images with the same character maintained throughout the entire sequence. [02:01], [03:56]
- **Match Shots to Song Structure**: Match intimate verses with close-ups and explosive choruses with wide dynamic shots. The lip movements from Veo won't match your Suno lyrics perfectly, but they'll look natural enough to make it seem like your character is actually performing. [05:05], [05:22]
Topics Covered
- Specific Prompts Maximize AI Results
- Upscale Crucial for Character Quality
- Veo 3.1 Enables Lip-Sync Trick
- Match Shots to Song Dynamics
Full Transcript
You're about to see how a basic song transforms into a Hollywood quality music video using nothing but AI.
[music] >> What you're looking at right now took me less than 10 minutes to create, and I'm about to walk you through the exact process. We're combining Suno AI with OpenArt's Veo 3.1, Google's newest video model, which creates the most cinematic results I've ever seen. I'm showing you the advanced method that gives you complete creative control over every single frame, plus a game-changing technique for making your character actually lip-sync to your lyrics.
Links for both Suno and OpenArt are in the description below. First, let's create our music with Suno AI. Head to Suno's homepage and click create. We're going with custom mode for maximum control, since your prompt might be too large for simple mode. For the style description, I'm typing: "Modern upbeat electro pop with glossy synths, punchy beats, and vibrant modern production. Bright, expressive female vocals full of charisma and playful swagger drive catchy hooks and a powerful, empowering chorus built on self-confidence and energy. A bold, cinematic pop sound with polished edge, glamour, and high-energy star power." The more specific you are, the better your results. Hit create. Listen to your options, pick the one that feels right, and download it.
Now we have our soundtrack. Head over to OpenArt. Click on image in the left sidebar, then select create image.
Switch to the OpenArt photorealistic model. Type something detailed like: "Confident young woman with sleek dark hair in a high ponytail, striking features, wearing a leather jacket, bold expression, cinematic lighting, professional photography style." Turn on auto-enhance. Click create. Then upscale using the 2x plus face option. Do not skip this step. It's crucial for quality.
Next, switch to Seedream 4.0 with the omni feature. Use your upscaled image as reference and generate multiple angles. Add "maintain the exact same facial features, hairstyle, and hair color as the reference image" to your prompts. Create at least four different angles: "Maya looking over her shoulder, 3/4 view, same features, dramatic lighting." "Maya facing camera, close-up portrait, same styling, confident expression." "Maya in profile, side angle, same character, cinematic mood." "Maya laughing, candid moment, same features, natural lighting."
Click character in the side panel. Select start with four-plus images. Name your character. I'm calling mine Maya. Upload your images with the upscaled one first and click create character.
Here's where things get really interesting. Veo 3.1 has audio capabilities that let you make your character actually lip-sync to specific lyrics. Here's the workflow. First, we'll generate videos with audio enabled so the character lip-syncs. Then, we'll mute that audio in editing and replace it with our Suno track. This way, it looks like your character is singing your actual song.
Now, in the character section, select Maya and click prompt
and reference. Change aspect ratio to cinema 16:9. Set prompt adherence to two. Let's create our scene images. I'm going for an energetic pop vibe: "Maya in a neon-lit urban alley at night, wearing a leather jacket and ripped jeans. Confident pose. Vibrant pink and blue neon signs. Cinematic lighting. Edgy atmosphere." "Maya on a rooftop at golden hour. City skyline behind her. Wind in her hair, wearing a crop top and high-waisted pants. Energetic vibe. Warm lighting." "Maya in a modern loft with floor-to-ceiling windows. Natural light streaming in. Dancing pose, contemporary setting, bright and airy." "Maya in an underground parking garage, dramatic lighting from overhead lights, leather jacket, confident stance, urban aesthetic, moody atmosphere." Perfect.
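
The character-consistency trick above comes down to ending every prompt with the same sentence. Here is a minimal sketch of that idea in Python, assuming you draft your prompts in a script and paste them into the tool by hand. The function and variable names are hypothetical and are not part of OpenArt or Seedream; only the consistency sentence and the four angle descriptions come from the video.

```python
# Hypothetical helper for drafting prompts; it does not call any OpenArt API.
CONSISTENCY_CLAUSE = (
    "Maintain the exact same facial features, hairstyle, and hair color "
    "as the reference image."
)


def build_prompts(character, variations, clause=CONSISTENCY_CLAUSE):
    """Return one prompt per variation, each ending with the same clause."""
    prompts = []
    for variation in variations:
        # Normalize trailing whitespace and periods so every prompt reads the same way.
        description = variation.strip().rstrip(".")
        prompts.append(f"{character} {description}. {clause}")
    return prompts


# The four reference angles described in the video.
ANGLES = [
    "looking over her shoulder, 3/4 view, dramatic lighting",
    "facing camera, close-up portrait, confident expression",
    "in profile, side angle, cinematic mood",
    "laughing, candid moment, natural lighting",
]

angle_prompts = build_prompts("Maya", ANGLES)
for prompt in angle_prompts:
    print(prompt)
```

The same helper works for the scene prompts: pass the alley, rooftop, loft, and garage descriptions as `variations`.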
Now we have our scene images with the same character maintained throughout the entire sequence. Click videos in the left panel. Go to image to video and select Google Veo 3.1 as your model. Upload your first scene image, the neon alley, as your start frame. For your video prompt, you can describe the camera movement and include a line that your song has, like I do right here. Hit create and Veo 3.1 will generate your character with mouth movements that match the audio. Now, in your editing software, you'll mute this audio track and replace it with your Suno song. When we sync it properly to the beat, it'll look like Maya is singing our actual track.
For the rooftop scene, upload that image as your start frame and prompt: "Camera circles around Maya as she sings at golden hour. Hair moving in wind. Energetic performance. City skyline bokeh in background. Dynamic movement." For the loft scene: "Wide shot pulling back as Maya dances and sings in the bright loft. Natural light streaming through windows. Energetic choreography. Contemporary vibe." For the parking garage: "Tracking shot moving with Maya as she walks and sings through the garage. Dramatic overhead lighting. Confident performance. Urban atmosphere." Create as many clips as you need for your song.
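
How many clips is "as many as you need"? A rough estimate is the song length divided by the number of seconds of each clip that actually ends up on screen. The sketch below is only that arithmetic. The 3-minute song and the 8-second clip length are assumptions for illustration, not figures from the video, so check what your generator actually produces.

```python
import math


def clips_needed(song_seconds, seconds_used_per_clip):
    """Estimate how many clips it takes to cover the whole song."""
    if song_seconds <= 0 or seconds_used_per_clip <= 0:
        raise ValueError("durations must be positive")
    # Round up: a partially covered final stretch still needs one more clip.
    return math.ceil(song_seconds / seconds_used_per_clip)


SONG_SECONDS = 180  # assumed 3-minute track
CLIP_SECONDS = 8    # assumed length of one generated clip

full_length_count = clips_needed(SONG_SECONDS, CLIP_SECONDS)
quick_cut_count = clips_needed(SONG_SECONDS, 4)  # using only 4 s of each clip

print(full_length_count)  # 23
print(quick_cut_count)    # 45
```

Shorter shots mean more clips, unless you reuse a clip in several places.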
Match intimate verses with close-ups and explosive choruses with wide dynamic shots. Import all your clips into any editing software. Here's the key. Mute the original Veo audio from each clip. Then add your Suno track as the main audio. Align your video cuts to the beat of your music. The lip movements from Veo won't match your Suno lyrics perfectly, but they'll look natural enough to make it seem like your character is actually performing. Quick cuts on the beat make this even more convincing.
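
The video does this step by hand in an editor. As an alternative for anyone who prefers scripting, here is a minimal Python sketch of the same two ideas: snapping rough cut points to the nearest beat, and building an ffmpeg command that keeps the picture but swaps in the song as the only audio. The 120 BPM tempo, the cut times, and the file names are assumptions for illustration. The script only builds the command list and does not run ffmpeg.

```python
def beat_times(bpm, duration_seconds, offset_seconds=0.0):
    """List the time in seconds of every beat up to the end of the song."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    interval = 60.0 / bpm  # seconds between beats
    times = []
    n = 0
    while offset_seconds + n * interval <= duration_seconds:
        times.append(round(offset_seconds + n * interval, 3))
        n += 1
    return times


def snap_to_beat(cut_seconds, beats):
    """Move a rough cut point to the closest beat."""
    return min(beats, key=lambda beat: abs(beat - cut_seconds))


def replace_audio_command(video_path, song_path, output_path):
    """Build an ffmpeg command: video from input 0, audio from input 1."""
    return [
        "ffmpeg",
        "-i", video_path,   # assembled clips; their own audio is not mapped
        "-i", song_path,    # the downloaded song
        "-map", "0:v:0",    # keep only the first video stream of the first input
        "-map", "1:a:0",    # use the first audio stream of the second input
        "-c:v", "copy",     # do not re-encode the picture
        "-shortest",        # stop when the shorter of the two streams ends
        output_path,
    ]


beats = beat_times(bpm=120, duration_seconds=10)       # assumed tempo
rough_cuts = [2.1, 4.4, 7.8]                           # assumed cut points
snapped = [snap_to_beat(cut, beats) for cut in rough_cuts]
print(snapped)

# Hypothetical file names.
command = replace_audio_command("assembled.mp4", "suno_track.mp3", "music_video.mp4")
print(" ".join(command))
```

Because only `0:v:0` and `1:a:0` are mapped, the generated clip audio is dropped without a separate mute step. To execute the command, pass the list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.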
>> Electric glowing. [music and singing]
>> Look at that. The character stays perfectly consistent. The lip movements look realistic. The camera work is cinematic. The energy matches the beat. This is professional quality content created in minutes. If you want to create professional music videos, hit the links in the description for Suno and OpenArt.