
Animate ANY Reference Image with a Video | WAN 2.2 ANIMATE ComfyUI (+workflow)

By Max Novak

Summary

## Key takeaways

- **WAN 2.2 animates reference images**: You can use WAN 2.2 to animate a reference image using a video as a motion-capture source, creating digital doubles with control over the AI character's performance. [00:37], [00:43]
- **Similar composition yields predictable results**: For best results, keep the reference video and image similar in composition; even with differences like the samurai looking right vs. left, it still turns out very well. [02:58], [03:15]
- **AI blends details creatively**: The final generation shows influences from both video and image bleeding over, like combining the dance with a hand kept in a pocket, or improvising new motions on a spaceship's steering wheel. [03:23], [03:46]
- **Realistic clothing physics emerges**: Clothing reacts in realistic ways to the transferred movement, as seen in the author's generations and a Reddit video that emphasized this feature. [04:33], [04:39]
- **Simple workflow: drag, load, run**: Download the JSON workflow, drag it into ComfyUI, install missing nodes via Model Manager, load the WAN 2.2 models into their folders, add your video/image/prompt, and click run. [04:52], [05:52]
- **Extend beyond 77 frames**: For long generations past 77 frames, un-bypass the context options node, set the context frames, and match the frame window size to the loaded video's frame count. [10:32], [10:54]

Topics Covered

  • ComfyUI Invades Traditional Pipelines
  • Ray 3 Reasons Complex Motions
  • Match Compositions for Predictability
  • AI Nails Clothing Physics

Full Transcript

What is going on, guys? Today I want to talk about an extremely powerful open-source AI tool, WAN 2.2. If you were wondering where I've been for the past few months, I was actually working on my first movie: The Wizard of Oz for the Vegas Sphere. That was a super fun project, and it gave me a unique look into how we can start to incorporate a tool like ComfyUI into traditional pipelines. So I figured that warranted a tutorial covering some fun ways we can use these tools.

There are a lot of different things you can do with this model. You can use WAN 2.2 for character replacement with masking tools, as a style transfer tool, or to animate a reference image, essentially almost like a motion capture tool. That last use is what we're focusing on today. I put together a nice, easy-to-use workflow for you, which you can download in the description if you'd like to follow along. Other than that, you're going to need your reference footage and your reference image. I recorded some examples just with my webcam, and for the rest I used the new Luma Ray 3 model, which, if you're interested in learning more about it, just so happens to be the sponsor of today's video. What you're looking at here is my favorite result from the new Ray 3 video model, converting an image to a video.

Luma Labs has been on my radar for a while. I was a big fan of the Ray 2 model in the past; I thought it performed really well with motion, especially when generating realistic human movement. With the Ray 3 model, Luma is introducing exciting new features that improve quality, allow for faster iteration, and are geared toward professional workflows. The first new feature is the reasoning mode, which can understand intent and help you plan complex compositions. This really impressed me when I tested it out. I used an image of a bunch of differently colored birds and specifically wanted the yellow bird to fly towards the camera. You can see the reasoning breakdown here: it analyzes the scene and hones in on little details, such as how the other birds will react when the middle one takes off. An image like this could previously really throw off an AI video model, but it performed really well and added a lot of extra things I didn't expect.

Another new quality-of-life feature is draft mode, which lets you generate a visual concept at 20 times the speed of a normal generation. That way you can pick, choose, and refine your concepts instead of just hoping things turn out the way you want and burning through credits. There is also a new HDR capability that can transform standard footage into 10-, 12-, and 16-bit high dynamic range. I tested this on my bird shot and it was really easy to work with in Premiere: I just had to right-click the footage, go to Modify, then Color, click Override Media Color Space, and switch over to Rec. 2020. From there, I can go to the Lumetri Color panel for further control over the highlights and shadows, as well as vibrant color, without blowing out the rest of the image. If you want to check out Ray 3 and try it yourself, click the link at the top of my description.

So, before I show you installation and how to use this workflow, here is a quick showcase of some of my results from testing. For the best results, you want to keep the reference video and image similar in composition if possible. You can still get great results if they are different. For example, I did a test with a samurai reference image where he's looking to the right, while in my reference footage I was looking to the left. It still turned out very well, but you'll find that the closer the reference image and video are, the more predictable the outcome will be.

In terms of the final result, you can see a lot of influences from both the video and the image bleed over into the final generation. Sometimes that can give you some really cool things. For example, in this dancing test, she has her hand in her pocket in the reference image and my reference is a dancing video, and it ended up combining both of those: she's doing the dance, but she still has her hand in her pocket. Another example: I have these images of people with their hands on what look like steering wheels in spaceships, and it would create new motions that I thought looked pretty cool. But if our goal is control, sometimes you may not like those. Personally, I was really impressed by those little improvisations and details. I also noticed things like focus pulls or different reflections in some results, very subtle little details that added a lot and really impressed me.

Now, on the flip side, if there are differences between the reference video and image, for example in bone structure, you can get some of those unwanted changes transferring over to your results. You can combat this by trying a different seed, experimenting with the multiplier values here for the pose, or adjusting the prompt and the CFG.

Another thing that really blew my mind is the clothing physics. I saw this a lot in my own generations, but I found a video on Reddit that I thought really emphasized it: the clothing can react in such a realistic way when you're transferring over the movement.

So, that's my quick showcase and some pros and cons. Let's hop in and let me show you how to get up and running with this workflow. All you need is to download ComfyUI and the ComfyUI Model Manager; I'll leave the links down below.

All right, go ahead and download the JSON workflow down below; you can just drag and drop it into ComfyUI to pop it up like this. You may need to install some custom nodes first, so come up here to Model Manager. If you're not seeing it, make sure you've installed the ComfyUI Model Manager. If you do have it, you can just click here and it will show everything that's missing from the workflow, and you can install it from there.

The biggest thing to keep in mind: if we go into the ComfyUI folder and down to custom_nodes, and you've used WAN before, you may already have Kijai's WanVideoWrapper custom nodes. Make sure you update them so that everything works with WAN 2.2. If you ever need to update something and Model Manager isn't working for any reason, just look the node pack up on Google, find its GitHub repository, go into your custom_nodes folder, type cmd in the address bar to open a terminal, then type git clone and paste the repository link.
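If you'd rather script that step than type it by hand, here is a minimal Python sketch. The install path and the repository URL (Kijai's ComfyUI-WanVideoWrapper) are assumptions on my part; substitute your own ComfyUI location and whatever repo Model Manager reports as missing.

```python
# Minimal sketch: clone (or update) a custom node repo into ComfyUI's custom_nodes folder.
# The path and repo URL below are assumptions -- adjust them to your own install.
import subprocess
from pathlib import Path

CUSTOM_NODES = Path("ComfyUI/custom_nodes")                      # assumed ComfyUI install location
REPO_URL = "https://github.com/kijai/ComfyUI-WanVideoWrapper"    # repo reported missing by Model Manager

target = CUSTOM_NODES / REPO_URL.rstrip("/").split("/")[-1]
if target.exists():
    # Repo already present: pull the latest commits so it works with WAN 2.2.
    subprocess.run(["git", "-C", str(target), "pull"], check=True)
else:
    # Fresh install: clone the repository next to your other custom nodes.
    subprocess.run(["git", "clone", REPO_URL, str(target)], check=True)
```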

Now, with this workflow, it's again very simple: it just goes from left to right. On the left are the models you're going to need to download. They're all listed right here, so click through and download them.

If we come back into our ComfyUI folder and go into the models folder: this first one goes into diffusion_models. This LoRA goes into the loras folder; we're using the LightX2V LoRA for this. There's also a relight LoRA, but I think that link is down right now, so I just set it to none. Then you also have your text encoders, which go into the text_encoders folder, and your VAE, which goes into the vae folder.
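If it helps to see that layout in one place, here is a small sketch of where each download lands under ComfyUI/models. The filenames are placeholders, not the exact files from the workflow's download links.

```python
# Sketch of where each download goes under ComfyUI/models/.
# Filenames are placeholders -- use whatever the workflow's download links give you.
from pathlib import Path

MODELS_DIR = Path("ComfyUI/models")  # assumed install location

placement = {
    "diffusion_models": "<wan2.2_animate_model>.safetensors",  # the main WAN 2.2 Animate model
    "loras":            "<lightx2v_lora>.safetensors",         # LightX2V LoRA (relight LoRA optional / set to none)
    "text_encoders":    "<text_encoder>.safetensors",          # text encoder used by the workflow
    "vae":              "<wan_vae>.safetensors",               # WAN VAE
}

for folder, filename in placement.items():
    print(f"{filename}  ->  {MODELS_DIR / folder}")
```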

Once you have placed all of those models into the correct folders, you can press Ctrl+F5; that will refresh things without having to completely restart. If for any reason you're not seeing a model, you can click over to models here, hit refresh, and check that it's in there. Then you should be able to drag and drop or select them from these nodes. For the WAN model loader, you load the diffusion model straight in here. You place your LoRA here, along with any other LoRAs you may want. This is where you put the CLIP, and this is where you put the VAE. If there's anything listed here that you don't have, for example CLIP Vision H, just Google "clip vision h safetensors", download it, and put it in the clip folder. That should be about it. Down here you can put in your text encoder; this is where we place our prompts. But again, let's go left to right and give a quick explanation.

These nodes over here are just some optimizations for if you're struggling with VRAM. I actually don't have Sage Attention on my computer right now, so I'm just using SDPA. But if you want a bit of a speed boost, you can install Triton and Sage Attention; for those, I recommend just Googling how to install Triton and Sage Attention, and that will walk you through it.
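As a rough illustration of what that toggle amounts to, you could pick the mode based on what's importable. The "sageattn" / "sdpa" option names and the sageattention package name are assumptions based on common WanVideoWrapper setups, not something spelled out in the video.

```python
# Rough sketch: fall back to PyTorch's built-in SDPA when Sage Attention isn't installed.
# The "sageattn" / "sdpa" strings mirror typical attention-mode options; treat them as assumptions.
def pick_attention_mode() -> str:
    try:
        import sageattention  # noqa: F401 -- only importable if you installed Triton + Sage Attention
        return "sageattn"
    except ImportError:
        return "sdpa"  # scaled_dot_product_attention, available in any recent PyTorch

print(pick_attention_mode())
```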

through that. Moving on to the right, this is where we're going to load in our video as well as our reference image. So

again, this is just going to be whatever we're pulling the motion or the performance from. And this is going to

performance from. And this is going to be what we are animating. You can also set your custom height and width right here for both the video and the image.
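If you do set a custom size, one reasonable convention (my assumption, not something stated in the video) is to keep the reference image's aspect ratio and round both dimensions to a multiple of 16, which video diffusion models generally expect. A quick sketch:

```python
# Sketch: pick a custom width/height that keeps the reference image's aspect ratio
# and rounds to a multiple of 16 (a common requirement for video diffusion models;
# this convention is an assumption, not something specified in the video).
from PIL import Image

def target_size(image_path: str, target_width: int = 832, multiple: int = 16) -> tuple[int, int]:
    with Image.open(image_path) as img:
        w, h = img.size
    scale = target_width / w
    new_w = round(target_width / multiple) * multiple
    new_h = round(h * scale / multiple) * multiple
    return new_w, new_h

# For a hypothetical 1024x1536 portrait reference, this prints (832, 1248).
print(target_size("reference_image.png"))
```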

Over here is a little preview of the processing that's going on. We're using the DWPose estimator to extract the motion from the body, the face, and the hands. If you want to pick up hand motions, make sure you enable that. There's also a set-images area here, which will show an isolated view of the facial performance once you run everything. For this workflow in particular, I'm hiding a lot of the behind-the-scenes processing and keeping it very simple, so people can go in and use it without needing to understand all of the different things I'm covering up here.

Essentially, this workflow is a modified version of Kijai's WAN 2.2 example workflow. Let me pop that up; I'll leave a link to it down below as well if you want to check it out. This is the original, and it uses SAM segmentation to act as a character replacement or compositing tool. What I did was remove all of that, so if you want to do character replacement, check that one out. I'm really only interested in this part here, which essentially comes down to removing the get background image node. That makes it so you're only using the video to drive this image and nothing else. In the future I'll do some testing with masking, using this to create character replacements and so on; I've seen some really insane results with that, but we're only focusing on this for now. After your processing, let's go down to the generation section.

This is where you can see the WanVideo Animate Embeds node and how things plug into it: our reference image, our pose images, and our face tracking right here. There's also a background image section, which I've left in place for now. If you want to add your own custom background image instead, you can create a Load Image node and a Set node, name it something like "set background image", and plug the image into the Set node so it's feeding the background image, which you can then plug into the background images socket right here.

Again, that's optional. Of course, we need our prompt right here. What I like to do is toss my reference image into something like Gemini, ask it to describe the image, and then take that description and plop it in here. Pretty simple, and you can add on to it if you want.
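If you'd rather script that describe-this-image step than use the Gemini web UI, a sketch like the one below works with Google's google-generativeai package. The model name, filename, and API-key handling are assumptions, and any vision-capable model or local captioner would do the same job.

```python
# Sketch: generate a text prompt by asking Gemini to describe the reference image.
# Model name and filename are assumptions; paste the printed text into the workflow's prompt box.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")             # assumes you have a Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")   # any vision-capable model would work
reference = Image.open("reference_image.png")       # hypothetical filename

response = model.generate_content(
    [reference, "Describe this image in one detailed paragraph suitable as a video prompt."]
)
print(response.text)
```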

For the WanVideo Sampler, I just kept the default settings and thought the results were pretty good. You can change them if you want; I didn't think there was a need to. And of course, your final output will pop up right here, along with the audio from your original reference if it had any.

Now, there's an additional little section down here. If you want to work with long generations, anything past 77 frames, you can select this WanVideo context options node and press Ctrl+B to un-bypass it. Then you can set your context frames to push past 77; there's a little explanation for that in the node as well. If you do set your context frames here so that the quality doesn't degrade over a longer generation, go back over to the animate embeds and adjust the frame window size so that it reflects whatever frame count you have here. The same goes for when you originally load your video: you want to keep these numbers the same. So: frame load cap 45, frame window size 45.
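The only real constraint is keeping those numbers in sync. Here is a tiny sanity check; the variable names simply mirror the node widgets, not any actual ComfyUI API.

```python
# Tiny sanity check for long generations: keep the loader's frame load cap and the
# animate-embeds frame window size in sync (names mirror the node widgets, not an API).
frame_load_cap = 45     # "frame_load_cap" on the video loader node
frame_window_size = 45  # "frame_window_size" on the animate embeds node

assert frame_load_cap == frame_window_size, (
    f"Mismatch: frame_load_cap={frame_load_cap} vs frame_window_size={frame_window_size}; "
    "set them to the same value so quality does not degrade over long generations."
)
print("Frame settings are consistent.")
```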

And again, it's nice to have a similar composition. That's really about it. Very, very simple: just load in your models, load in your video and image, set your text prompt, and click run. Hope you guys enjoyed the video. Let me know in the comments what you'd like to see next. As always, thank you so much for watching, thank you for supporting, and I'll see you in the next one.
