
ComfyUI Basic Tutorial (II) -- Stable Diffusion: an introduction to the basic text-to-image workflow and common nodes.


The previous article explained that the first time you start ComfyUI, it automatically opens a very basic text-to-image workflow. We can also reopen this default workflow at any time from the menu options, or with the shortcut key Ctrl + D. The default workflow looks like this:

This is the most basic text-to-image workflow. This article walks through it to give you a basic understanding of how a ComfyUI workflow is put together.

I. Core nodes of the text-to-image workflow

The previous article already recommended Akiba's video explaining the underlying principles. If you are not familiar with how diffusion works and have not watched it yet, it is worth a look: Akiba's explainer video on Bilibili.
If you have a rough grasp of the principle, you know that the large model processes the image in latent space, while the rendered image that humans can recognize is a pixel image. The input conditions are encoded into latent space, the sampler removes noise step by step in latent space, and the result is finally decoded back into a pixel image. The sampler is therefore one of the most central nodes for generating an image, so the nodes in the default workflow are described in turn below, starting with the sampler.

1.1 [K-sampler]

There are various sampler nodes; the default workflow uses the simplest one, KSampler. Its parameters are described below (a sketch of these settings in ComfyUI's API format follows the list):

  • Random seed: the seed value used for each generated image; the default is 0. The control options are: randomize every time, fixed, increment every time, decrement every time.
  • Steps: the number of denoising iterations in latent space, usually set to around 25-40. More steps produce a more detailed image but take longer to generate, and the improvement becomes much less obvious beyond about 35; too few steps generally lowers image quality. You need to keep experimenting to find the most suitable value for each large model.
  • CFG: prompt relevance. The larger the value, the more closely the image follows the prompt; the smaller the value, the more room the AI has to improvise and the more the image can deviate from the prompt. A value around 10 generally works well; the default is 8.
  • Sampler and scheduler: the sampler is used together with the scheduler. In general, choose one of the optimized/newer samplers: euler_ancestral (euler a for short), the dpmpp_2m series, or the dpmpp_3m series; for the scheduler, normal or karras is the usual choice.
  • Denoise: related to the number of steps. 1 means running 100% of the steps set above, 0.1 means 10%. For plain text-to-image you can simply leave it at the default of 1.
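To make these parameters concrete, here is a minimal sketch of how a K-sampler node looks in ComfyUI's API-format workflow (the JSON you can export with "Save (API Format)" after enabling dev mode). The node ids are arbitrary labels chosen for illustration; each link is a [node_id, output_index] pair pointing at another node.

```python
# Hedged sketch: a KSampler node expressed as a Python dict in ComfyUI's API format.
# The ids "3", "4", "5", "6", "7" are placeholders I picked for this example.
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 0,                        # random seed
            "steps": 25,                      # denoising iterations
            "cfg": 8.0,                       # prompt relevance (CFG scale)
            "sampler_name": "euler_ancestral",
            "scheduler": "normal",
            "denoise": 1.0,                   # 1.0 = run all steps (text-to-image)
            "model": ["4", 0],                # MODEL output of the checkpoint loader
            "positive": ["6", 0],             # positive conditioning
            "negative": ["7", 0],             # negative conditioning
            "latent_image": ["5", 0],         # empty latent image
        },
    }
}
```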

1.2 [Master Model Loader]

The K sampler's left-hand inputs require a model, so a main model loader, Load Checkpoint, needs to be added.
The main model loader has only one parameter, which selects a main model. The main model determines the overall quality and the dominant style of the output image.
The main model loader has three output items. The MODEL output is the model input of the K sampler; connect these two points with a line.

1.3 [Clip Text Encoder]

The K sampler also needs positive and negative conditions as inputs on the left. The positive condition tells the sampler which elements the generated image should contain; the negative condition tells it which elements the image should not contain. Since this is text-to-image, the instructions are given as text prompts. Note that the prompts must be in English (if your English is not good, don't worry, a translation node is explained later). However, plain input text cannot be used by the sampler directly; it must first be encoded into instructions the sampler can recognize. That is done here by the CLIP Text Encode (Prompt) node. The two CLIP text encoders serve the same purpose. The text encoder needs to work together with the model, so its clip input should be connected to the CLIP point of the main model loader's outputs. The conditioning outputs of the two CLIP text encoders are connected to the positive and negative condition inputs of the K sampler, respectively.

1.4 [Empty Latent]

The K sampler is now missing only a Latent input, so an empty Latent is used here as that input.
The Empty Latent Image node has three parameters: width and height specify the width and height of the image generated in latent space (note that the unit here is pixels), and batch size means how many images are generated at a time.

1.5 [VAE Decode]

The left inputs and parameters of the K sampler are now all filled in, so the K sampler needs to output its result after generating the image. As mentioned earlier, the image in latent space is a digital signal that must be decoded into a pixel image, and that is what the VAE Decode node is for.
The VAE decoder has two inputs. The Latent input is connected to the Latent output of the K sampler.
The other input is the VAE model to use. Generally the main model already has a VAE embedded in it, so connecting to the main model is enough; in the default workflow this input is connected directly to the VAE output of the main model loader. Of course, you can also load a separate VAE model instead.

1.6 [Save Image]

After the VAE decodes the image, it is finally output to the Save Image node, where it can be previewed and saved. The default save path is the output folder in the ComfyUI installation directory.
This node has one parameter, which sets the prefix of the saved file name.
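Putting the pieces together: the six nodes above can also be written out in ComfyUI's API-format JSON and queued over its HTTP interface. The sketch below is only illustrative, not the exact JSON ComfyUI exports: the node ids are arbitrary labels, the checkpoint file name is a placeholder you must replace with a model from your own checkpoints folder, and it assumes a local ComfyUI server listening on the default port 8188.

```python
import json
import urllib.request

# Hedged sketch of the default text-to-image workflow in ComfyUI's API format.
# "v1-5-pruned-emaonly.safetensors" is a placeholder checkpoint name.
workflow = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "6": {"class_type": "CLIPTextEncode",      # positive prompt
          "inputs": {"text": "a cat sitting on a windowsill", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",      # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 25, "cfg": 8.0,
                     "sampler_name": "euler_ancestral", "scheduler": "normal",
                     "denoise": 1.0, "model": ["4", 0],
                     "positive": ["6", 0], "negative": ["7", 0],
                     "latent_image": ["5", 0]}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "ComfyUI"}},
}

# Queue the workflow on a locally running ComfyUI instance (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```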

II. ComfyUI model knowledge

2.1 Differences in Model Suffix Names

There are many suffix names for models. The common ones are .ckpt, .pt, .pth, .pkl, .safetensors, etc.

CheckPoint is a concept from model training. There is no guarantee that training will succeed; many factors can cause it to fail, and "training succeeded" is itself a fuzzy notion, since for some algorithms more training steps are not always better and too much training leads to overfitting. So what we can do is save the training result every certain number of steps, similar to saving a game. On the one hand, if the result at some checkpoint already meets expectations, we can stop training and publish that model. On the other hand, if training fails partway through, we can resume from a previous CheckPoint instead of starting over.

  • .ckpt is simply short for checkpoint; it is just an abbreviation of the word. Do not assume that a model with the .ckpt suffix is a main model; the suffix has nothing to do with that. .ckpt is the format used to store model parameters in TensorFlow, the deep learning framework released by Google.

  • .pt is the format in which model parameters are stored by PyTorch, the deep learning framework published by Meta (Facebook). Besides .pt, PyTorch also stores models in other formats such as .pth and .pkl. There is no fundamental difference between .pt and .pth, while .pkl just adds an extra step of serialization with Python's pickle.

  • The .safetensors format, as the name suggests, is more secure. As mentioned before, to be able to resume training from a CheckPoint, a .ckpt model has to store extra training information such as model weights, optimizer state, and Python code. That makes information leakage easy and also makes it easy to embed malicious code, so .ckpt models are not safe, and they are also relatively large. .safetensors is a newer model storage format introduced by Hugging Face and widely used with Stable Diffusion; it contains only the model weights, so the files are smaller, safer, and faster to load, and it is usually used for the final released version of a model, whereas the .ckpt format is better suited to fine-tuning and continued training.
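To illustrate the difference, here is a small hedged sketch in Python: loading a .ckpt/.pt file goes through torch.load, which runs Python's pickle machinery and can execute arbitrary code embedded in the file, while the safetensors library only reads raw tensors. The file names are placeholders.

```python
import torch
from safetensors.torch import load_file

# .ckpt/.pt: unpickles Python objects (older torch versions do this by default),
# which is why such files can carry malicious code.
ckpt_state = torch.load("model.ckpt", map_location="cpu")

# .safetensors: only tensor data is deserialized, nothing is executed.
safe_state = load_file("model.safetensors", device="cpu")

print(type(ckpt_state), type(safe_state))
```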

2.2 Classification of models

There are many kinds of models used in ComfyUI. To make them easier to understand and learn, we can roughly divide them into the following categories:
1. Checkpoint: the main model, also called the big model, base model or foundation model. Through specific training it eventually becomes a comprehensive algorithmic ensemble that retains the characteristics of its domain and can accurately match your needs, for example photorealistic models, 3D models, anime-style models, interior design models, and so on. Checkpoints are difficult to train, require large datasets, and the resulting files are large, taking up several gigabytes of disk space.
2. Lora: a Lora model is a supplement to the big model that fine-tunes a single feature of the generated images, such as producing characters with the same facial features, a specific piece of clothing, or a particular painting style. Lora models are small, usually tens to a few hundred megabytes, and an individual machine can train a Lora model for its owner's own needs.
3. VAE: VAE models can be thought of as filters. There are currently two mainstream VAE models for Stable Diffusion: for anime-style images use kl-f8-anime2, and for realistic style use vae-ft-mse-840000-ema-pruned.
4. EMBEDDING: the text embedding model, applied to prompts. It is a trained bundle of prompt words, mainly used to improve image quality and avoid bad images. For example, badhandv4, Bad_picture, bad_prompt, NG_DeepNegative and EasyNegative are frequently used negative-prompt embedding models that can be used alone or in combination (see the sketch after this list).
5. Other: for example ControlNet models, IPAdapter models, face-swap models, upscaling models, and so on.
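As a concrete example of item 4: in ComfyUI, an embedding is referenced directly inside the prompt text by its file name with an "embedding:" prefix. Below is a minimal hedged sketch of a negative-prompt node in API format, assuming an EasyNegative file has been placed in the embeddings folder; the node id and the clip link are placeholders.

```python
# Negative-prompt CLIP Text Encode node (API format); "embedding:EasyNegative"
# tells ComfyUI to look up the EasyNegative file under models/embeddings.
negative_prompt_node = {
    "7": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "embedding:EasyNegative, bad hands, blurry",
            "clip": ["4", 1],   # CLIP output of the checkpoint loader
        },
    }
}
```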

There are many versions of the Stable Diffusion model. The common ones are SD1.5, SD2.0/2.1 and SDXL. Among them, SDXL generates the highest-quality images but has correspondingly high hardware requirements, while SD1.5 has the richest ecosystem; these two versions are the most widely used.
It is important to emphasize that the versions of the different types of models need to match each other. For example, if the main model is an SDXL version, an SD1.5 Lora will not work properly with it.

2.3 Model storage path

The ComfyUI installation directory has a dedicated folder for storing models. You only need to put the downloaded models into the corresponding subfolder.
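For reference, the subfolders live under models in the ComfyUI installation directory; a typical (not exhaustive) layout looks roughly like this:

```
ComfyUI/
└── models/
    ├── checkpoints/      # main models (checkpoint)
    ├── loras/            # Lora models
    ├── vae/              # standalone VAE models
    ├── embeddings/       # embedding (textual inversion) models
    ├── controlnet/       # ControlNet models
    └── upscale_models/   # upscaling models
```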

2.4 Where to download the model

Two mainstream model download sites are recommended:
Civitai: https://civitai.com
Hugging Face: https://huggingface.co

Almost any model you want can be downloaded from these two sites.
On Civitai, models can be filtered by category.

2.5 Architecture of the model

Architectural categorization of common models:

Latent Diffusion Model Architecture Classification

Model   Text encoder       Denoising backbone   Codec   Variants
SD1.5   CLIP               UNET                 VAE     --
SDXL    ClipL/ClipG        UNET                 VAE     Kolors / Pony / Playground
SD3     ClipL/ClipG/T5     DiT                  VAE     Hunyuan-DiT / AuraFlow / Flux

SD accelerated models

  • LCM: Latent Consistency Models, a new type of generative model developed at Tsinghua University, generating images 5-10 times faster. Variants include LCM-SD1.5, LCM-SDXL, LCM-Lora, AnimateDiff LCM and SVD LCM.
  • Turbo: an official distillation scheme based on SDXL 1.0 that produces high-quality images in 1-4 steps, with a network architecture consistent with SDXL. It exists only for SDXL; SD1.5 has no model of this type.
  • Lightning: a diffusion-distillation method refined by ByteDance on top of SDXL 1.0, combined with progressive adversarial distillation; it is essentially still an SDXL model.

Model architecture is a complex topic and I only understand it superficially. Not knowing this part has little impact on drawing with ComfyUI, but knowing the architectural information gives you some insight into model compatibility and helps solve problems caused by model versions and compatibility later on.

III. Brief introduction of other commonly used nodes

The installation of plugins was described in the previous article; if you are not sure how, go back and check it.

3.1 Chinese Plug-in AIGODLIKE-COMFYUI-TRANSLATION

If you use the Akiba integration pack, this localization node is installed by default. Its purpose is to translate the menus, settings and other interface elements into Chinese. After installing the node, restart ComfyUI, then find the language option in the settings and switch it to Chinese.

3.2 Prompt Translation Plugin ComfyUI_Custom_Nodes_AlekPet

As mentioned before, the CLIP text encoder requires English prompts. If your English is not good, you can install this prompt translation plugin.

Usage:

  1. Hover the mouse over the CLIP text encoder node and right click -> Convert to input -> Convert text to input.
    This converts the text parameter into an input point.

  2. Create a new prompt translation node.
    In an empty area, right click -> New node -> AlekPet Nodes -> Text -> Translate text (advanced).

    Alternatively, double left-click in an empty area and type translate; matching nodes will be suggested automatically, and you can select the right one.

  3. Connect the prompt translation node to the CLIP text encoder.

This way we can fill in Chinese prompts in the prompt translation node. Note that the plugin uses Google Translate and therefore requires access to the external network.

Tips for use:

  • Nodes can be copied and pasted. For example, if you need the same translation node for the negative prompt, just copy the existing one.

3.3 Metanodes primitive

The primitive node comes in several variants: decimal, text, boolean, integer and multi-line text.
Simply put, a primitive node outputs its input data unchanged. What is that good for? Most typically, reuse: one value can feed several nodes at once.

Important note: an input point can accept only one connection, but an output point can be connected to multiple inputs.

Tips for use:
Node parameters and node inputs can be converted into each other. In many cases you can convert the same parameter on several different nodes into inputs and then drive them all from one shared primitive node.

3.4 Preview Image node Preview Image

As mentioned earlier, the Save Image node automatically saves images to the output folder, but sometimes we only want to look at an image without saving it, for example to inspect an intermediate result or when we are not sure the image is what we want. In that case, the Preview Image node can be used instead.

postscript

This article introduced the core nodes of the default text-to-image workflow, as well as the classification of Stable Diffusion models. More nodes will be introduced in the following articles, so stay tuned.