AI upscalers such as ESRGAN are essential for improving the quality of images generated by Stable Diffusion. They are so widely used that many Stable Diffusion graphical user interfaces (GUIs) have built-in support for them.
Here we will learn what image upscalers are, how they work, and how to use them.
Why do we need an image upscaler?
The default image size for Stable Diffusion v1 is 512 x 512 pixels. That is low by today's standards. Take the iPhone 12, for example. Its camera produces 12-megapixel images, that is, 4032 x 3024 pixels, and its screen displays 2,532 x 1,170 pixels. A Stable Diffusion image that has not been upscaled therefore looks low-resolution by comparison and falls short for many modern uses.
A low-resolution image also lacks fine detail, which limits how it can be used.
Why can't we use traditional upscalers?
Traditional upscalers can certainly be used, but the results won't be as good.
Traditional image-resizing algorithms, such as nearest neighbor interpolation and Lanczos interpolation, have been criticized for using only the pixel values of the image. They enlarge the canvas and fill in the new pixels by performing mathematical operations on the existing pixel values. If the image itself is corrupted or distorted, these algorithms cannot accurately fill in the missing information.
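To make this limitation concrete, here is a minimal sketch of nearest neighbor upscaling in plain Python. The function name and the tiny 2 x 2 test image are illustrative assumptions, and real image libraries implement this far more efficiently. The point is that every output pixel is just a copy of an existing one.

```python
def nearest_neighbor_upscale(pixels, factor):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Each output pixel copies the source pixel it maps back to,
    so no new information is created.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    return [
        [pixels[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]


# A tiny grayscale "image" used only for illustration.
small = [
    [0, 255],
    [255, 0],
]
large = nearest_neighbor_upscale(small, 2)
for row in large:
    print(row)
```

The enlarged grid contains only the values 0 and 255 that were already in the input, which is why this kind of resizing cannot restore detail that the source image does not have.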
How do AI upscalers work?
AI upscalers are neural network models trained on large amounts of data. They can fill in details while enlarging an image.
During training, images are artificially corrupted to simulate real-world degradation. The AI upscaler model is then trained to recover the original image.
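The idea of building training pairs from artificially degraded images can be sketched as follows. This is a deliberately simplified illustration, assuming the degradation is just block averaging plus random noise; the function names `degrade` and `make_training_pair` are hypothetical, and real upscaler training pipelines use more elaborate degradations.

```python
import random


def degrade(pixels, factor=2, noise=8, seed=0):
    """Make a low-quality copy: average factor x factor blocks, then add noise."""
    rng = random.Random(seed)
    height, width = len(pixels), len(pixels[0])
    low = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [
                pixels[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            value = sum(block) / len(block) + rng.uniform(-noise, noise)
            # Clamp to the valid 8-bit range.
            row.append(max(0, min(255, round(value))))
        low.append(row)
    return low


def make_training_pair(original, factor=2):
    """Return (model input, training target): degraded image and the original."""
    return degrade(original, factor), original


original = [
    [10, 20, 200, 210],
    [30, 40, 220, 230],
    [100, 110, 50, 60],
    [120, 130, 70, 80],
]
low_quality, target = make_training_pair(original)
print(len(low_quality), "x", len(low_quality[0]), "input ->",
      len(target), "x", len(target[0]), "target")
```

A model trained on many such pairs learns to map the degraded input back to the original, which is the behavior described above.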
A large amount of prior knowledge is embedded in the model, which lets it fill in missing information. This is similar to how humans can remember a person's face without studying it in detail: we focus on a few key features.
How do I use an AI upscaler?
Let's learn how to use an AI upscaler in the AUTOMATIC1111 WebUI for Stable Diffusion.
Switch to the Extras page and select Single Image.
Upload the image you want to enlarge to the Source canvas.
Set the Resize factor. Many AI upscalers natively enlarge an image by a factor of 4, so 4 is a good choice. If you don't want the image to be that large, you can set a lower value, such as 2.
If your image is 512 x 512 pixels, a 2x upscale gives 1024 x 1024 pixels and a 4x upscale gives 2048 x 2048 pixels.
Select R-ESRGAN 4x+, an AI upscaler that works well for most images.
Click Generate to start upscaling.
When finished, the enlarged image will appear in the output window on the right. Right-click on the image to save it.
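The size arithmetic behind the resize factor is simple enough to check in a few lines. The helper name `upscaled_size` is an illustrative assumption, not part of the WebUI.

```python
def upscaled_size(width, height, factor):
    """Return the output dimensions for a given resize factor."""
    return round(width * factor), round(height * factor)


for factor in (2, 4):
    print(f"{factor}x:", upscaled_size(512, 512, factor))
```

Note that the pixel count grows with the square of the factor: a 4x upscale of a 512 x 512 image has 16 times as many pixels as the original.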
As you can see on the page, there is also an Upscaler 2 option, which lets you blend the results of two upscalers.
The Upscaler 2 visibility slider that follows it controls how much of the Upscaler 2 result is blended into the final image.
0 means Upscaler 2 is not used at all; 1 means only Upscaler 2 is used.
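One way to picture the visibility setting is as the weight in a linear blend of the two upscaled results. The sketch below assumes exactly that, with both results represented as same-sized grids of pixel values; `blend_pixels` is a hypothetical helper, not WebUI code.

```python
def blend_pixels(first, second, visibility):
    """Linearly mix two same-sized pixel grids.

    visibility = 0 returns the first result, 1 returns the second.
    """
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must be between 0 and 1")
    return [
        [(1 - visibility) * a + visibility * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(first, second)
    ]


result_1 = [[10, 20], [30, 40]]   # stand-in for the first upscaler's output
result_2 = [[30, 40], [50, 60]]   # stand-in for Upscaler 2's output
print(blend_pixels(result_1, result_2, 0.5))
```

Under this assumption, a value such as 0.5 gives an even mix, which can be useful when one upscaler is sharper and the other is smoother.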
AI Upscaler Options
Let's take a look at some noteworthy AI upscaler options.
LDSR
The Latent Diffusion Super Resolution (LDSR) upscaler was originally released with Stable Diffusion 1.4. It is a latent diffusion model trained to perform upscaling tasks. Although it provides excellent quality, it is very slow. I do not recommend using it.
ESRGAN 4x
Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) is an upscaling network that won the 2018 Perceptual Image Restoration and Manipulation challenge. It is an enhanced version of the earlier SRGAN model. It tends to preserve details and produce clear, sharp images. ESRGAN is the base model for many other upscalers.
R-ESRGAN 4x
Real-ESRGAN (R-ESRGAN) is an enhancement of ESRGAN that can restore a variety of real-world images. It models various degrees of distortion introduced by cameras and digital compression.
It tends to produce smoother images than ESRGAN.
R-ESRGAN performs best on realistic photos.
Installing a new upscaler
To install a new upscaler in the AUTOMATIC1111 GUI, download a model from the upscaler model database and place it in the following folder:
stable-diffusion-webui/models/ESRGAN
Restart the GUI. Your upscaler should now be selectable from the upscaler drop-down menu.
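If you prefer to script the copy step, a minimal sketch might look like this. The helper `install_upscaler` and the file name `example_upscaler.pth` are hypothetical, and the demo runs inside a temporary directory with a placeholder file so that it does not touch a real installation. Only the `models/ESRGAN` subfolder comes from the instructions above.

```python
import shutil
import tempfile
from pathlib import Path


def install_upscaler(model_file, webui_root):
    """Copy a downloaded model file into <webui_root>/models/ESRGAN."""
    target_dir = Path(webui_root) / "models" / "ESRGAN"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(model_file, target_dir))


# Demo in a scratch directory; replace these paths with your own.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    webui_root = root / "stable-diffusion-webui"
    model_file = root / "example_upscaler.pth"  # hypothetical file name
    model_file.write_bytes(b"placeholder, not a real model")

    installed = install_upscaler(model_file, webui_root)
    installed_ok = installed.exists()
    relative = installed.relative_to(root).as_posix()
    print(relative)
```

After copying a real model this way, the GUI still needs to be restarted before the new entry shows up.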
Upscaling example
Using Extras in AUTOMATIC1111 is just a very simple way to enlarge an image.
If you want to upscale and enhance detail at the same time, consider using SD Upscale for this more complex case.
SD Upscale is a script that comes with AUTOMATIC1111. It uses an upscaler to enlarge the image and then runs image-to-image to enhance detail.
The steps are as follows:
Step 1. Navigate to the Img2img page.
Step 2. Upload the image to the img2img canvas. (Alternatively, use the Send to Img2img button to send the image to the img2img canvas.)
Step 3. In the Script drop-down menu at the bottom, select SD Upscale.
Step 4. Set Scale factor to 4 to enlarge the image to 4 times the original size.
Step 5. Set the denoising strength between 0.1 and 0.3. The higher it is, the more the image changes.
Step 6. Set the number of sampling steps to 100. More steps improve detail.
Step 7. You can use the original prompt and negative prompt. If you don't have them, use "highly detailed" as the prompt.
Step 8. Click Generate.
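The settings from these steps can be collected in one place, as in the sketch below. The function `sd_upscale_settings` and the dictionary keys are illustrative assumptions for note-taking purposes; they are not the WebUI's actual parameter names.

```python
def sd_upscale_settings(prompt="", negative_prompt="", denoising_strength=0.2):
    """Collect the SD Upscale settings recommended in the steps above."""
    if not 0.1 <= denoising_strength <= 0.3:
        raise ValueError("denoising strength should be between 0.1 and 0.3")
    return {
        "script": "SD Upscale",
        "scale_factor": 4,
        "denoising_strength": denoising_strength,
        "sampling_steps": 100,
        # Fall back to a generic prompt when the original one is unknown.
        "prompt": prompt or "highly detailed",
        "negative_prompt": negative_prompt,
    }


settings = sd_upscale_settings()
for name, value in settings.items():
    print(f"{name}: {value}")
```

Keeping the denoising strength inside the recommended range is the main safeguard here, since higher values change the image more.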
Take a look at the comparison results:
It works well and adds some detail to the image.
The SD Upscale script helps improve detail and reduce upscaling artifacts.
Hires. fix on the txt2img page
Another way to upscale an image is Hires. fix in txt2img.
You can choose to upscale every generated image on the txt2img page. To do this, check the Hires. fix checkbox.
Additional options will appear under the checkbox. They are similar to those of the SD Upscale script.
The whole Hires. fix process can be understood as upscaling an image and then generating a second image based on it.
Hires steps is the number of steps in this second generation pass. If the value is left at 0, the second pass uses the same number of steps as the original generation.
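The rule for the Hires steps value can be written as a one-line fallback. This is a sketch of the behavior described above; the function name is hypothetical.

```python
def second_pass_steps(sampling_steps, hires_steps=0):
    """Return the step count for the second pass; 0 means reuse the original."""
    return hires_steps if hires_steps > 0 else sampling_steps


print(second_pass_steps(20))      # Hires steps left at 0
print(second_pass_steps(20, 10))  # Hires steps set explicitly
```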
Hires. fix upscales every image as it is generated, so it slows down image generation.
The suggested practice is to generate images first, then send the selected ones to img2img for SD Upscale.
ControlNet Tile Upscale
SD Upscale can also be combined with ControlNet Tile to get better detail in the upscaled image.
Specifically, enable ControlNet in img2img:
For Control Type, select Tile.
For Preprocessor, select tile_resample.
For Model, select control_***tile.
Also enable SD Upscale and click Generate.
It may take longer, but the results should be the best.