Animate ANY Reference Image with a Video | WAN 2.2 ANIMATE ComfyUI (+workflow)
By Max Novak
Summary
## Key takeaways
- **WAN 2.2 animates reference images**: You can use WAN 2.2 to animate a reference image using a video as a motion-capture source, creating digital doubles with control over the AI character's performance. [00:37], [00:43]
- **Similar composition yields predictable results**: For best results, keep the reference video and image similar in composition; even with differences, like the samurai looking right while the reference footage looks left, it still turns out very well. [02:58], [03:15]
- **The AI blends details creatively**: The final generation shows influences from both the video and the image bleeding over, like combining the dance with the hand kept in the pocket, or improvising new motions on the spaceship steering wheel. [03:23], [03:46]
- **Realistic clothing physics emerges**: Clothing reacts in realistic ways to the transferred movement, as seen in the generations and in a Reddit video that emphasized this feature. [04:33], [04:39]
- **Simple workflow: drag, load, run**: Download the JSON workflow, drag it into ComfyUI, install missing nodes via the Model Manager, place the WAN 2.2 models into their folders, add your video, image, and prompt, and click run. [04:52], [05:52]
- **Extend beyond 77 frames**: For long generations past 77 frames, un-bypass the context options node, set the context frames, and match the frame window size to the loaded video's frame count. [10:32], [10:54]
Topics Covered
- ComfyUI Invades Traditional Pipelines
- Ray 3 Reasons Complex Motions
- Match Compositions for Predictability
- AI Nails Clothing Physics
Full Transcript
What is going on, guys? Today I want to talk about an extremely powerful open-source AI tool, WAN 2.2. If you were wondering where I've been for the past few months, I was actually working on my first movie: The Wizard of Oz for the Vegas Sphere. That was a super fun project, and it gave me a unique look into the beginnings of how we can start to incorporate a tool like ComfyUI into traditional pipelines. So I figured that warranted a tutorial covering some fun ways we can use these tools.
There are a lot of different things you can do with this model in particular. You can use WAN 2.2 for character replacement with masking tools, as a style transfer tool, or to animate a reference image, essentially like a motion capture tool. That last use is what we're going to be focusing on today. I have a nice, easy-to-use workflow that I put together for you, so you can download that in the description if you'd like to follow along. Other than that, you're going to need your reference footage and your reference image. I recorded some examples just with my webcam. And for the rest, I used the new Luma Ray 3 model, which, if you're interested in learning more about it, just so happens to be the sponsor of today's video. What you're looking at here is my favorite result from the new Ray 3 video model, converting an image to a video.
Luma Labs has been on my radar for a while. I was a big fan of the Ray 2 model in the past; I thought it performed really well with motion, especially when generating realistic human movement. With the Ray 3 model, Luma is introducing exciting new features which improve quality, allow for faster iteration, and are geared toward professional workflows. The first new feature is the reasoning mode, which can understand intent and help you plan complex compositions. This really impressed me when I tested it out. I used this image of a bunch of birds with different colors, and I wanted specifically the yellow bird to fly towards the camera. You can see the reasoning breakdown here: it's analyzing the scene and honing in on little details, such as how the other birds are going to react when the middle one takes off. Again, I used an image that before could really throw off an AI video model, and it performed really well and added in a lot of extra things I didn't think it would.
Another new quality-of-life feature is draft mode, which lets you generate a visual concept at 20 times the speed of a normal generation. That way you can pick, choose, and then refine your concepts instead of just hoping things turn out the way you want and burning through credits. There is also a new HDR capability that can transform standard footage into 10, 12, and 16-bit high dynamic range. I tested this on my bird shot, and it was really easy to work with in Premiere: I just had to right-click on the footage, go to Modify, then Color, click Override Media Color Space, and switch over to Rec. 2020. From there, I can go to my Lumetri Color panel for further control over the highlights and shadows, as well as vibrant color, without blowing out the rest of the image. If you want to check out Ray 3 and try it yourself, click the link at the top of my description. So, before I show you installation and how to use this workflow, here is a quick showcase of some of my results from testing. For the
best results, you want to try to keep the reference video and image similar in composition if possible. You can still get great results if they are different.
For example, I did this test with a samurai reference image where he's looking off to the right, while in my reference footage I was looking to the left. It still turned out very well, but you're going to find that the closer the reference image and video are, the more predictable the outcome will be. In terms of the final result, you can see a lot of influences from both the video and the image bleed over into the final generation. Sometimes that can give you some really cool things. For example, in this dancing test the reference image has her hand in her pocket, and I have a dancing video; it ended up combining both of those, so she's doing the dance but still has her hand in her pocket. Sometimes it can do things like that. Another example: I have these images of people with their hands on a steering wheel in these spaceships, and it would create new motions that I thought looked pretty cool.
But again, if our goal is control here, sometimes you may not like those. That said, I was really impressed by those little improvisations and details. I also noticed things like focus pulls or different reflections in some results, very subtle little details that added a lot and really impressed me. Now, on the flip side, if there are differences between the reference video and image, for example in bone structure, you can get some of those unwanted changes transferring over to your results. You can combat this by trying a different seed, experimenting with these multiplier values here for the pose, or adjusting the prompt and the CFG.
Another thing that really blew my mind is the clothing physics. I saw this a lot in my own generations, but I found this video on Reddit and I thought it really emphasized it: the clothing can react in such a realistic way when you're transferring over the movement, which was really cool. So, that's my quick showcase and some pros and cons. Let's hop in and let me show you how to get up and running with this workflow. All you're going to need is to download ComfyUI and the ComfyUI Model Manager.
I'll leave the links to that down below.
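If you'd rather set both up from a terminal instead of using the links, here's a minimal sketch, assuming git and Python are already installed and that you want the standard ComfyUI-Manager; the portable and desktop builds work just as well:

```
# Clone ComfyUI and install its Python dependencies
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# The Manager lives inside custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager custom_nodes/ComfyUI-Manager

# Start the server and open the printed local URL in your browser
python main.py
```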
All right. So, go ahead and download the JSON workflow down below; you can just drag and drop it into ComfyUI to pop it up like this. You may need to install some custom nodes first, so you want to come up here to the Model Manager. If you're not seeing this, make sure you've installed the ComfyUI Model Manager. If you do have the Model Manager, you can just click here and it'll show everything that's missing from the workflow, and you can install it from there. The biggest thing to keep in mind: if we go to the ComfyUI folder here and down to our custom_nodes folder, and you've used WAN before, you may already have Kijai's WanVideoWrapper custom nodes. Make sure you update these so that everything will work with WAN 2.2. And if you ever need to install or update things and the Model Manager isn't working for any reason, just look the node pack up on Google, find its GitHub, then go into your custom_nodes folder here, type cmd in the address bar, and run git clone with the repository link pasted in.
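As a concrete example, here is what that looks like for Kijai's wrapper nodes; the repository name below is the one I believe is current, so double-check it against the link in the description:

```
# Open a terminal inside ComfyUI/custom_nodes (typing cmd in the address bar does this on Windows)
cd ComfyUI/custom_nodes

# Fresh install: clone the wrapper that provides the WanVideo nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper

# Already installed: pull the latest commits so WAN 2.2 Animate is supported
cd ComfyUI-WanVideoWrapper
git pull

# Install or refresh its Python dependencies, then restart ComfyUI
pip install -r requirements.txt
```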
So, with this workflow, again, it's very simple; it just goes from left to right. On the left here are the models you're going to need to download. They will all be linked right here, so click through and go ahead and download these.
So, if we come back into our ComfyUI folder, we go into the models folder. This first one here, you're going to want to put into diffusion_models. This LoRA here, you can put into the loras folder; we're using lightx2v for this. There's also a relight LoRA, but I think that download is down right now, so I just set it to none. Then you also have your text encoders here, which go into the text_encoders folder, and your VAE, which goes into the vae folder.
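For reference, here is roughly how the placement maps onto the models folder; the comments describe which download goes where, and the exact file names will be whatever the links in the workflow give you:

```
ComfyUI/models/
├── diffusion_models/   # the WAN 2.2 Animate diffusion model (the first download)
├── loras/              # the lightx2v LoRA (plus the relight LoRA, if you use it)
├── text_encoders/      # the text encoder
├── vae/                # the WAN VAE
└── clip_vision/        # clip_vision_h.safetensors, covered a little further down
```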
So, once you have placed all of those models into the correct folders, you can go ahead and refresh ComfyUI. That'll just reload the model lists without having to completely restart everything. And if for any reason you're not seeing a model, you can click over to Models here, click refresh, and check that it's in there, and
then you should be able to drag and drop or select them from these nodes. So, for
the WanVideo model loader, again, you load it in straight here. You place your LoRA here, as well as any other LoRAs you may want, into here. This is where you put the CLIP, and this is where you put the VAE. If something listed here is missing from your install, for example CLIP Vision H, just Google "clip_vision_h safetensors", download it, and put it in your clip_vision folder. That should be about it.
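If you prefer the terminal for that, here's a hedged sketch; I won't vouch for which Hugging Face repository hosts clip_vision_h.safetensors, so substitute whatever download URL you find:

```
# Run from the folder that contains ComfyUI/
# Placeholder URL: replace <repo> with the actual Hugging Face repo you find for this file
wget -O ComfyUI/models/clip_vision/clip_vision_h.safetensors \
  "https://huggingface.co/<repo>/resolve/main/clip_vision_h.safetensors"
```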
Down here, you can put in your text encoder. This is where we'll place our prompts. But again, let's just go left to right and give a quick explanation.
These nodes over here are just some optimizations for if you're struggling with VRAM. I actually don't have Sage Attention on my computer right now, so I'm just using SDPA. But if you want a bit of a speed boost, you can install Triton and Sage Attention. For those, I recommend you just Google how to install Triton and Sage Attention; that'll walk you through it.
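For what it's worth, on Linux the install is often just two pip commands; treat this as a sketch rather than a guarantee, since Windows setups usually need a prebuilt triton-windows wheel and a matching PyTorch/CUDA version:

```
# Triton, the compiler backend SageAttention builds on
pip install triton

# SageAttention itself
pip install sageattention
```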
Moving on to the right, this is where we're going to load in our video as well as our reference image. Again, the video is whatever we're pulling the motion or performance from, and the image is what we are animating. You can also set your custom height and width right here for both the video and the image.
And then over here is a little preview of the processing that's going on. We're using the DWPose estimator to actually extract the motion from the body, the face, or the hands. If you want to pick up the hand motions, make sure you enable that. There's also a set images area here, which will show an isolated view of the facial performance once you run everything.
For this workflow in particular, I'm hiding a lot of the behind-the-scenes processes and keeping it very simple, so people can go in and use it without really understanding all of the different things I'm covering up here. Essentially, this workflow is a modified version of Kijai's WAN 2.2 example workflow. Let me pop that up; I'm going to leave a link to it down below as well if you want to check it out. This is the original, and it essentially uses SAM segmentation to act as a character replacement or compositing tool. What I did was remove all of that. So if you want to do character replacement, you can check that out; I'm really only interested in this part here, which essentially means removing the get background image node. That makes it so you're only using the video to drive this image here and nothing else. In the future, I'll do some testing with masking, using this to create character replacements and so on; I've seen some really insane results with that, but we're only focusing on this for now. So, after your processing, let's go down to the generation section.
This is where you can see the WanVideo Animate Embeds node and how things plug into it: our reference image, our pose images, and our face tracking right here. There's also this background image section, which I've left here for now. If you want to add your own custom background image instead, you can create a Load Image node, create a Set node and name it something like set background image, plug the loaded image into the Set node, and then the matching Get node calls that background image, which you can plug into this background images socket right here.
So again, that's optional. Of course, we need our prompt right here. What I like to do is toss my reference image into something like Gemini, ask it to describe the image, and then take that description and plop it into here. Pretty simple; you can add on to that if you want. For the WanVideo Sampler, I just kept the default settings and thought the results were pretty good. You can change them if you want, but I didn't think there was a need. And of course, your final output will pop up right here, along with the audio from your original reference if you had any.
Now, there's an additional little section down here. If you want to work with long generations, anything past 77 frames, select this WanVideo Context Options node and press Ctrl+B to un-bypass it. Then you can set your context frames to push past 77; there's a little explanation for that as well. If you do set your context frames here so that quality doesn't degrade over a longer generation, go back over to the Animate Embeds node and adjust the frame window size so it reflects whatever frame count you set here. The same goes for when you originally load your video: try to keep the numbers matching, so frame load cap 45, frame window size 45.
And again, it's nice to have a similar composition. And that's really about it. Very, very simple: just load in your models, load in your video and your image, set your text prompt, and click run. Hope you guys enjoyed the video. Let me know in the comments what you'd like to see next. As always, guys, thank you so much for watching, thank you for supporting, and I'll see you in the next one.