RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs (IEEE, 2023, 8)
Paper
GitHub
Motivation: The authors argue that previous models focused only on the texture information of the image and neglected the detail information of the face. This paper adopts a multi-scale, cross-attention approach to introduce semantic information into the model.
It can be divided into three main parts.
- Encoder and Decoder part. The overall structure is similar to the Transformer, except that Q, K, and V are produced by Conv2D layers rather than Linear layers.
- VQVAE part. Vector quantization is performed in the latent space between the Encoder and Decoder, and the VQVAE output feeds the Decoder's cross-attention. Consistent with the "Undegraded Key-Value Pairs" in the title, the quantized high-quality features supply K and V, while the queries come from the degraded features. The authors believe that the Facial Component Dictionary constructed in earlier work does not contain enough semantic information, whereas the ROHQD encoded by the VQVAE can carry more detailed information.
- EDM (Extending Degrading Model). To construct degraded datasets that resemble real-world images, a model that simulates the real-world degradation process is needed. In this paper, the EDM combines stitching, Gaussian noise, fogging, and other processes.
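A minimal sketch of how the first two parts could fit together, assuming (as the paper title suggests) that the dictionary supplies the keys and values while the degraded features supply the queries. It uses NumPy, a single attention head, and a 1x1 convolution written as an `einsum`. The function names, shapes, random weights, and the residual connection are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def quantize(z, codebook):
    """Replace every spatial feature of z (C, H, W) with its nearest
    codebook entry; codebook is (K, C). Returns (z_q, indices)."""
    c, h, w = z.shape
    flat = z.reshape(c, -1).T                                   # (H*W, C)
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dist.argmin(axis=1)                                   # (H*W,)
    return codebook[idx].T.reshape(c, h, w), idx.reshape(h, w)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(z_deg, z_prior, wq, wk, wv):
    """Queries from degraded features, keys/values from dictionary features."""
    c, h, w = z_deg.shape
    q = conv1x1(z_deg, wq).reshape(c, -1).T                     # (N, C)
    k = conv1x1(z_prior, wk).reshape(c, -1).T                   # (N, C)
    v = conv1x1(z_prior, wv).reshape(c, -1).T                   # (N, C)
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)               # (N, N)
    fused = (attn @ v).T.reshape(c, h, w)
    return z_deg + fused                                        # assumed residual

# Toy usage with random stand-ins for encoder features and the dictionary.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 16
z_deg = rng.normal(size=(C, H, W))
codebook = rng.normal(size=(K, C))
z_prior, indices = quantize(z_deg, codebook)
wq, wk, wv = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))
out = cross_attention(z_deg, z_prior, wq, wk, wv)
```

The 1x1 convolution acts on each spatial position independently, so it plays the same role as a Linear projection while keeping the feature map in `(C, H, W)` layout.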
Another point: this paper uses a large number of auxiliary loss functions, apparently to improve the metrics. Specifically, it uses a perceptual loss, a discriminator (adversarial) loss, and an identity loss. The discriminator loss is applied not only to the whole image but also to key facial components.
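A rough sketch of how such a combined objective might be assembled, with the adversarial term applied both to the whole image and to cropped facial components. The weights, component names, crop boxes, the non-saturating form of the adversarial loss, and the stand-in networks are all hypothetical; the notes do not record the paper's actual values.

```python
import numpy as np

# Hypothetical weights; the notes do not record the values used in the paper.
WEIGHTS = {"perceptual": 1.0, "identity": 1.0,
           "adv_image": 0.1, "adv_component": 0.1}

def crop(img, box):
    """Cut a (C, H, W) image to box = (top, left, height, width)."""
    top, left, h, w = box
    return img[:, top:top + h, left:left + w]

def adv_generator_loss(logits):
    """Non-saturating generator loss, mean softplus(-D(x)) (assumed form)."""
    return float(np.mean(np.logaddexp(0.0, -np.asarray(logits, dtype=float))))

def feature_distance(extract, restored, target):
    """Mean squared distance between features from a given extractor."""
    return float(np.mean((extract(restored) - extract(target)) ** 2))

def total_loss(restored, target, nets, boxes, weights=WEIGHTS):
    """Weighted sum of perceptual, identity, and adversarial terms."""
    terms = {
        "perceptual": feature_distance(nets["perceptual"], restored, target),
        "identity": feature_distance(nets["identity"], restored, target),
        "adv_image": adv_generator_loss(nets["disc_image"](restored)),
        # One discriminator per facial component, averaged over components.
        "adv_component": float(np.mean([
            adv_generator_loss(nets["disc_" + name](crop(restored, box)))
            for name, box in boxes.items()
        ])),
    }
    return sum(weights[k] * v for k, v in terms.items()), terms

# Stand-in "networks": simple callables in place of trained models.
nets = {
    "perceptual": lambda x: x,
    "identity": lambda x: x.mean(axis=(1, 2)),
    "disc_image": lambda x: np.zeros(1),
    "disc_left_eye": lambda x: np.zeros(1),
    "disc_mouth": lambda x: np.zeros(1),
}
boxes = {"left_eye": (2, 2, 4, 4), "mouth": (10, 4, 4, 8)}
img = np.random.default_rng(0).random((3, 16, 16))
loss, terms = total_loss(img, img, nets, boxes)
```

In a real setup the stand-ins would be replaced by a pretrained feature network, a face-recognition embedding, and trained discriminators.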
TODO: Keep this EDM approach to dataset construction in mind for later super-resolution tasks.
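As a starting point for that TODO, here is a minimal sketch of a synthetic degradation step for building low-quality/high-quality training pairs. It is the generic blur, downsample, add-noise recipe on a grayscale image in NumPy, not the paper's EDM; the function names and default parameters are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius of about 3 sigma."""
    radius = max(int(3 * sigma), 1)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur on an (H, W) image with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)   # blur along width
    return np.apply_along_axis(conv, 0, rows)     # blur along height

def degrade(img, sigma=2.0, scale=4, noise_std=0.05, rng=None):
    """Blur, downsample by `scale`, add Gaussian noise, upsample back."""
    rng = np.random.default_rng() if rng is None else rng
    low = blur(img, sigma)[::scale, ::scale]
    noisy = low + rng.normal(0.0, noise_std, low.shape)
    up = np.repeat(np.repeat(noisy, scale, axis=0), scale, axis=1)
    return np.clip(up, 0.0, 1.0)[:img.shape[0], :img.shape[1]]

# Toy usage: a random "high-quality" image in [0, 1] and its degraded pair.
rng = np.random.default_rng(0)
hq = rng.random((32, 32))
lq = degrade(hq, rng=rng)
```

Sampling `sigma`, `scale`, and `noise_std` randomly per image is the usual way to cover a range of degradation strengths.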