
ComfyUI Basic Tutorial (3) -- Applying ControlNet to Precisely Control Image Generation


I. Preface

Have you ever seen pictures like the one below:

It looks ordinary, but if you view it from a distance or squint, you will notice something special hidden in it. This effect comes from using ControlNet in AI painting to exert more precise control over the image.
The previous post covered the basic text-to-image workflow and the core built-in nodes. With prompts we can describe the image we want to generate, but text alone cannot describe an image precisely. If you say "a girl wearing a pink dress", a hundred people will picture a hundred different images; no matter how many scenes, details, and modifiers you add, you can never make everyone imagine the same thing. ControlNet, introduced today, can guide the Stable Diffusion generation process to a certain extent and achieve some special effects.

II. Concepts related to ControlNet

2.1 What is ControlNet

ControlNet is a neural network that adds control to pre-trained image diffusion models such as Stable Diffusion. It takes a conditioning image as input and uses it to steer image generation. There are many kinds of conditioning images: scribbles, edge maps, pose keypoints, depth maps, normal maps, segmentation maps, and so on. Any of these can serve as a conditional input to guide the content of the generated image.

To be clear, ControlNet is a method for controlling the output of a pre-trained image diffusion model. It works with Stable Diffusion, but it is not limited to Stable Diffusion.

2.2 ControlNet Usage Scenarios

There are many scenarios where ControlNet is useful. For example, to control the pose of a character in a generated image, you can supply a reference photo and use a pose-control model to extract and apply the pose; to generate an interior-design rendering, you can constrain the output with a given normal map; or you can simply draw a rough sketch and let the AI paint over it. Since AI drawing is creative work, it is impossible to list every scenario here. Once you have a general impression, try the different ControlNet models, use your imagination, and apply them wherever they fit.

2.3 ControlNet Official Address

The official address of ControlNet: https://github.com/lllyasviel/ControlNet

The official page explains the principles behind ControlNet. My own understanding is limited, so I will not go into them here to avoid misleading anyone. If you are interested, check the official page or look up other references.

III. How to Use ControlNet

For those of us painting with AI, what matters most is how to apply it. The following walks through an example of using ControlNet.

3.1 Installing the ControlNet plug-in

ComfyUI-Advanced-ControlNet
The first step is to install the ControlNet nodes. If you use the Autumn Leaves (Qiuye) ComfyUI integration package, ControlNet nodes are already included. If not, install the ComfyUI-Advanced-ControlNet plugin. I assume anyone reading this far already knows how to install plugins; if not, go back to my previous article, which covers plugin installation.

3.2 Model Download

Important: ControlNet models are versioned to match the base model. If your base model is SD1.5, your ControlNet model must also be an SD1.5 version; mismatched models will cause an error.

The official ControlNet models for Stable Diffusion 1.5 can be downloaded here:
https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main


There are quite a few models here, and you can basically tell what each one does from its name.

Download the model and put it in the following path: ComfyUI\models\controlnet

If you need SDXL or other versions of the ControlNet models, you can find them by searching model download sites.

All-in-One Model

controlnet-union-sdxl-1.0
This model is for SDXL only. Its advantage is that it integrates 12 types of ControlNet control in a single model, so there is no need to download them one by one, and it also saves memory.
Official address: https://huggingface.co/xinsir/controlnet-union-sdxl-1.0

3.3 ControlNet Usage

This is followed by a practical example showing how ControlNet can be used.

3.3.1 Applying ControlNet Nodes

First of all, there is no need to build the workflow entirely from scratch; just load the default text-to-image workflow and modify it on top of that.

Then we add an Apply ControlNet node.
New -> Conditioning -> ControlNet; there may be several related nodes under this menu.

Here we use the most common one, the Apply ControlNet (Advanced) node. Once you master this node, the others work in much the same way, so you can try them on your own.

Compared with the plain Apply ControlNet node, Apply ControlNet (Advanced) adds a negative conditioning input as well as a start time and an end time. We can add both nodes side by side to compare them more intuitively.

The positive and negative conditioning inputs and outputs are self-explanatory, so let's focus on the start time and end time. They define when ControlNet intervenes during the sampler's denoising process, ranging from 0 (the very beginning) to 1 (the very end). With a start time of 0 and an end time of 1, ControlNet influences the generation from start to finish.

Look again at the remaining parameters and input conditions:

  • strength: the strength of the ControlNet influence. The larger the value, the more strongly the generated image follows the reference and the closer the result is to the reference effect. In practice, though, stronger is not always better. For example, with the light-and-shadow text effect mentioned in the preface, we use a depth map of the text as the ControlNet reference to generate a landscape photo; if the strength is too high, the text becomes very obvious and the landscape has too little room to vary, which is usually not what we want. Adjust it based on the actual results during generation.
  • control_net: this input connects to a ControlNet loader, where one of the models downloaded earlier is selected as needed.
  • image: this input takes a reference image that has already been processed by a ControlNet preprocessor.
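To make these inputs more concrete, here is a minimal sketch of how this node appears in a workflow exported with "Save (API Format)", written as a Python dict. The node IDs and the nodes they link to are placeholders for illustration only, not a ready-made workflow:

# Apply ControlNet (Advanced) node in ComfyUI API format (Python dict sketch).
# Links are written as [source_node_id, output_index]; IDs are placeholders.
apply_controlnet = {
    "10": {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "positive": ["6", 0],      # positive conditioning from a CLIP Text Encode node
            "negative": ["7", 0],      # negative conditioning from a CLIP Text Encode node
            "control_net": ["11", 0],  # from the ControlNet loader
            "image": ["12", 0],        # reference image processed by a preprocessor
            "strength": 0.8,           # ControlNet strength
            "start_percent": 0.0,      # start time (0 = first sampling step)
            "end_percent": 1.0,        # end time (1 = last sampling step)
        },
    }
}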

3.3.2 ControlNet Model Loader

There are several different ControlNet model loaders as well; here we pick the most commonly used one.

It has only one parameter, which selects the model to use. As emphasized earlier, the model version chosen here must match your base model.
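In the same API-format notation, the loader is a one-parameter node; the filename below is just an example and must match a file you actually placed in ComfyUI\models\controlnet:

# ControlNet loader node: its only parameter is the model filename (placeholder).
controlnet_loader = {
    "11": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11p_sd15_openpose.pth"},
    }
}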

3.3.3 ControlNet Preprocessor

ControlNet has a wide variety of preprocessors, and the one you choose must correspond to the loaded ControlNet model. For example, the model loaded earlier is of the openpose type, so here you need to choose a preprocessor from the faces and poses category.

Here we use the DWPose preprocessor as an illustration.

The parameters are simple: enable a detection item if you need it, disable it if you don't. Set the resolution according to the base model, typically 512 for SD1.5 and 1024 for SDXL. The bbox detector and pose estimator models are used to detect the various body parts and are also used in other scenarios; you can look into the differences between the model options on your own.

The first time you use a ControlNet preprocessor, it takes a little longer because the required model is downloaded automatically, so keep your network connected; you can see the download task and its progress in the ComfyUI launcher console. Once the model is available locally, subsequent loads are fast.

For the image input, you need to feed in a reference image.
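As a sketch, the DWPose preprocessor node might look like this in API format. It comes from the comfyui_controlnet_aux custom-node pack, so the exact class and parameter names can differ between versions, and the detector and estimator filenames below are assumptions to check against your own dropdowns:

# DWPose preprocessor node (comfyui_controlnet_aux); names may vary by version.
dwpose_node = {
    "12": {
        "class_type": "DWPreprocessor",
        "inputs": {
            "image": ["13", 0],        # reference image from a Load Image node
            "detect_body": "enable",   # enable/disable each detection item as needed
            "detect_hand": "enable",
            "detect_face": "enable",
            "resolution": 512,         # 512 for SD1.5, 1024 for SDXL
            "bbox_detector": "yolox_l.onnx",            # assumed default filename
            "pose_estimator": "dw-ll_ucoco_384.onnx",   # assumed default filename
        },
    }
}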

Aux Integrated Preprocessor

Here is another all-in-one ControlNet preprocessor, the Aux Integrated Preprocessor.

It has only two parameters. For the preprocessor, choose the one you need; for example, choosing DWPose preprocessing gives basically the same result as above. It likewise needs a reference image as input.

Image Load Node

The reference image input is an image loaded locally or from the network and then passed to the ControlNet preprocessor. An image loading node is used for this.

  • Load Local Image
    Use the Load Image node: New -> Image -> Load Image.

    Click choose file to upload and select a local picture, or simply drag a picture onto the node; in practice, dragging and dropping a web image also works.

  • Load Network Image
    Sometimes we only know the URL of a web image, in which case we can use a node dedicated to loading images from the network. Search your installed nodes first; if none is available, you will need to install one yourself. Many plugins provide nodes for loading network images.

These are all the nodes needed for ControlNet, just add them all and connect them correctly.

3.4 Example Demonstrations

3.4.1 Character pose control

Here is a local picture:

The following generates an image of a girl with a pose that is consistent with this image. The workflow is as follows:

The nodes are the same as those explained above. Note that the positive and negative conditioning outputs of the CLIP text encoders connect to the inputs of the Apply ControlNet node, and the outputs of the Apply ControlNet node connect to the corresponding inputs of the KSampler.
In addition, the output of the ControlNet preprocessor is also sent to a Preview Image node, so we can easily inspect the preprocessed result.

The checkpoint chosen is xxmix9realistic, a realistic style, and the positive prompt is simply "1 girl". The final result is a girl whose pose matches our reference image. Of course, this image is not perfect: it is rather blurry when zoomed in and lacks detail. To generate higher-quality images we need better prompts as well as some other nodes, such as LoRA models (a later article will explain how to use them), upscaling nodes, and local detail fixes (e.g. hand repair, face repair).
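To tie the pieces together, below is a minimal sketch of this pose-control workflow in API format, plus how it could be queued through ComfyUI's HTTP interface on the default port 8188. The checkpoint, ControlNet, and image filenames, the prompts, and the sampler settings are placeholders to replace with your own, and the DWPreprocessor node again assumes the comfyui_controlnet_aux pack is installed:

import json
from urllib import request

# Minimal sketch of the pose-control workflow in ComfyUI API format.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "xxmix9realistic_v40.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",               # positive prompt
          "inputs": {"clip": ["1", 1], "text": "1 girl"}},
    "3": {"class_type": "CLIPTextEncode",               # negative prompt
          "inputs": {"clip": ["1", 1], "text": "lowres, bad anatomy"}},
    "4": {"class_type": "LoadImage",
          "inputs": {"image": "pose_reference.png"}},
    "5": {"class_type": "DWPreprocessor",               # from comfyui_controlnet_aux
          "inputs": {"image": ["4", 0], "detect_body": "enable",
                     "detect_hand": "enable", "detect_face": "enable",
                     "resolution": 512,
                     "bbox_detector": "yolox_l.onnx",
                     "pose_estimator": "dw-ll_ucoco_384.onnx"}},
    "6": {"class_type": "ControlNetLoader",
          "inputs": {"control_net_name": "control_v11p_sd15_openpose.pth"}},
    "7": {"class_type": "ControlNetApplyAdvanced",
          "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                     "control_net": ["6", 0], "image": ["5", 0],
                     "strength": 0.8, "start_percent": 0.0, "end_percent": 1.0}},
    "8": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 768, "batch_size": 1}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["7", 0], "negative": ["7", 1],
                     "latent_image": ["8", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "10": {"class_type": "VAEDecode",
           "inputs": {"samples": ["9", 0], "vae": ["1", 2]}},
    "11": {"class_type": "SaveImage",
           "inputs": {"images": ["10", 0], "filename_prefix": "controlnet_pose"}},
}

# Queue the workflow on a locally running ComfyUI instance (default port 8188).
req = request.Request("http://127.0.0.1:8188/prompt",
                      data=json.dumps({"prompt": workflow}).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
print(request.urlopen(req).read().decode("utf-8"))

Building the same graph by hand in the UI is of course equivalent; the API form is just a compact way to see every connection at once.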

3.4.2 Local redraw

There are many ways to do local redrawing; this example shows a simple workflow that does it with ControlNet.
The effect we want is a partial redraw of the image below, for example giving the girl different pants.

The basic idea is this: load the image above, create a mask (the same concept as a mask in Photoshop) over the area that needs to be redrawn, and then redraw only the masked area, in this case the girl's pants. The ControlNet model to use here is the inpaint type. For this example we'll use the SDXL version of the model and build the effect step by step.
Picture first:

The workflow is basically similar to the previous openpose one. I chose the SDXL union model. When loading the ControlNet model here, the ControlNet loader cannot connect directly to the Apply ControlNet node, because the union model integrates the functions of 12 model types and you need to specify which type to use. This should be easy to understand.
So the loader connects to the Set UnionControlNet Type node, whose output is then connected to the Apply ControlNet node.

In the Set UnionControlNet Type node, since we are repainting, we choose the repaint type. Correspondingly, the preprocessor node should be the inpaint preprocessor, which requires an input image and a mask. As just described, we create a mask region and redraw only that region.
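As a rough sketch, the extra nodes for the union/repaint case might look like this in API format. SetUnionControlNetType is the core node mentioned above, InpaintPreprocessor comes from comfyui_controlnet_aux, and the filename and the exact type string are assumptions to check against the dropdowns in your own install:

# Extra nodes for the SDXL union + repaint case (API-format sketch).
union_inpaint_nodes = {
    "20": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "controlnet-union-sdxl-1.0.safetensors"}},
    "21": {"class_type": "SetUnionControlNetType",   # choose which of the 12 control types to use
           "inputs": {"control_net": ["20", 0], "type": "repaint"}},
    "22": {"class_type": "InpaintPreprocessor",      # needs the image and the painted mask
           "inputs": {"image": ["4", 0], "mask": ["4", 1]}},   # Load Image outputs image and mask
}
# Node "21" then feeds the control_net input of the Apply ControlNet (Advanced) node.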

How do I create an image mask?
Right click on the image and click on "Open in Mask Editor".

The Mask Editor then opens automatically and is very simple to operate. The toolbar at the bottom left lets you, in order, clear the mask, set the brush width, set the mask opacity, and set the mask color. If you paint a wrong area, click clear, or hold the right mouse button to erase. Finally, remember to click save in the bottom right corner.

Also remember to select an SDXL checkpoint for the base model; for SDXL the resolution can be set to around 1024. For the positive prompt, write "1 girl, short red_dress,". Finally, run it:

Sure enough, something unexpected happens: the generated image is pitch black, even though the preview of the preprocessor output clearly shows the image and the mask.
Don't panic. There is a small detail here: the image produced by the inpaint preprocessor cannot be fed directly into the Apply ControlNet node; it needs to be converted to RGB first.

Run it again and you can see that the character in the picture has changed into a red dress.
Ok, no problem this time.

Note that this Image to RGB node is the one from the WAS node suite. Do not use the Convert Image To RGB node from other custom plugins; I have tested it myself and it does not work.

We can refine this workflow further. In the workflow above, the KSampler receives an empty latent whose width and height we set manually, which has two consequences:

  1. The values we set may not match the width and height of the original image.
  2. An empty latent means the entire image is redrawn from scratch. I'll write a later article about the differences between the various redraw approaches.

Since this is a partial redraw, we can instead encode the original image with a VAE encoder and pass the resulting latent to the sampler. Here we use the VAE Encode (for Inpainting) node. On the input side it takes the image, the VAE, and the mask, and its latent output connects to the sampler. It also has a parameter, mask expansion, which works like feathering in Photoshop and helps the redrawn part blend better with the original image. Tweak this value iteratively until you are satisfied with the result.
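For completeness, a sketch of this change in API format: the empty latent is replaced by a VAE Encode (for Inpainting) node, whose grow_mask_by parameter is the mask expansion described above (node IDs are placeholders):

# VAE Encode (for Inpainting): encode the original image and mask instead of an empty latent.
vae_encode_inpaint = {
    "23": {"class_type": "VAEEncodeForInpaint",
           "inputs": {
               "pixels": ["4", 0],   # the original image
               "vae": ["1", 2],      # VAE output of the checkpoint loader
               "mask": ["4", 1],     # mask painted in the Mask Editor
               "grow_mask_by": 6,    # mask expansion, similar to feathering in PS
           }},
}
# The KSampler's latent_image input then comes from node "23" instead of the empty latent.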

The final complete workflow is as follows:

3.5 Using a ControlNet Stack

Sometimes we want to apply several different ControlNet types to the same image. The most primitive way is to add the various ControlNet nodes one after another and chain them in sequence, but that means a lot of nodes and is not very intuitive. In ComfyUI, a ControlNet stack node can unify the process, and it is very simple to use:

If you don't have this custom plugin, you will need to install it separately. As you can see from the node layout, it is very simple to use: it supports up to three ControlNets at once, integrates the ControlNet model loader, and exposes a switch, strength, and intervention time for each. If three is not enough, you can chain additional ControlNet stack nodes in series.
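For comparison, here is what the "primitive" serial approach looks like in API form: the positive and negative outputs of the first Apply ControlNet (Advanced) node feed the conditioning inputs of the second, and the last node in the chain feeds the KSampler. Node IDs and settings are placeholders:

# Chaining two ControlNets without a stack node (API-format sketch).
chained_controlnets = {
    "30": {"class_type": "ControlNetApplyAdvanced",   # first ControlNet, e.g. openpose
           "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                      "control_net": ["6", 0], "image": ["5", 0],
                      "strength": 0.8, "start_percent": 0.0, "end_percent": 1.0}},
    "31": {"class_type": "ControlNetApplyAdvanced",   # second ControlNet, e.g. depth
           "inputs": {"positive": ["30", 0], "negative": ["30", 1],
                      "control_net": ["32", 0], "image": ["33", 0],
                      "strength": 0.5, "start_percent": 0.0, "end_percent": 0.8}},
}
# The KSampler then takes its positive/negative conditioning from node "31".

The stack node simply collapses this chain into a single, tidier node.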

IV. Concluding remarks

ControlNet is very important in AI painting. In most production settings images are not generated at random; with ControlNet we can control them more precisely and keep modifying and improving them to meet real needs. This article only covers how to get started with ControlNet; the rest has to be mastered through plenty of practice and exploration.
Going back to the opening image, how can ControlNet be used to generate the effect of integrating text into the image?
Leave it to everyone to learn and explore on their own. Finally, the Mid-Autumn Festival is coming soon, so I wish you all a happy Mid-Autumn Festival in advance!