
A brief analysis of the VAE model and its essence (principle and code)

Published: 2025-01-21

1. Preface

This blog post mainly records the principles behind the VAE.
On the one hand it will help me review and study later; on the other hand, I hope it helps everyone learn and discuss.
If anything is wrong, please point it out in the comments so that we can all learn and make progress together!

The figures are quoted from the blogs listed in Section 4 (Acknowledgments).

2. Text

This post combines the strengths of several blogs and tries to stay concise and easy to understand: some posts explain the principle clearly but make the loss function confusing, while others state the formulas clearly but muddle the principle. Here I summarize the parts I personally found most intuitive from each.
AE (Auto-Encoder): autoencoder
VAE (Variational Auto-Encoder): variational autoencoder
So where does the "variational" part come in?
image

2.1 Overall structure

The encoder maps an input into the latent space, which is the encoding process: it extracts the input's features and represents them as a vector, which makes subsequent computation convenient.

The structure of an ordinary autoencoder:
image
The structure of a VAE:
image

2.2 Main purpose

Suppose any portrait can be uniquely determined by the values of a few features such as expression, skin color, gender, and hairstyle. Then after we feed a portrait into the autoencoder, we obtain a vector X' holding this picture's values for those features, and the decoder reconstructs the original input portrait from those feature values.
image
But if you feed in a photo of the Mona Lisa, fixing the smile feature to a single specific value (which amounts to deciding once and for all whether the Mona Lisa is smiling) is clearly less appropriate than letting the smile feature take a range of values (say, a number between x and y, where some values in that range mean the Mona Lisa is smiling and others mean she is not). So:
image
An event can instead be described by a probability distribution:
image
Then the so-called latent variable Z is finally sampled from that distribution:
image
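The idea above, encoding a feature as a distribution and then sampling a latent value from it, can be sketched in a few lines of PyTorch. The specific numbers here (mean 0.3, log-variance -1.0 for a "smile" feature) are made-up illustrations, not values from the post:

```python
import torch

# A hypothetical "smile" feature encoded as a distribution N(mu, sigma^2)
# rather than as a single fixed value.
mu = torch.tensor([0.3])       # assumed mean of the smile feature
logvar = torch.tensor([-1.0])  # assumed log-variance

# Sample the latent variable z: z = mu + sigma * eps, with eps ~ N(0, 1)
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std
```

Each time this runs, z lands at a slightly different point in the range around mu, so different samples can represent "smiling" or "not smiling."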

2.3 Loss function

Let’s take a look at the network structure:
image
The VAE loss function has two terms, the reconstruction loss (reconstruct loss) and the KL-divergence regularization term (KL loss), corresponding to the two goals the model hopes to achieve during training.
image

2.3.1 Reconstruction loss

The reconstruction loss (reconstruct loss) encourages the output generated by the VAE to differ as little as possible from the input.

2.3.2 KL-divergence regularizer

The KL-divergence regularization term (KL loss) encourages the latent variables produced by the encoder to follow the standard normal distribution as closely as possible.
Why? For the details, please see the formula derivations in other blogs; since this article aims for simplicity, the derivation is not repeated here.
image
Roughly, it looks like the picture below:
image
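For a Gaussian posterior N(mu, sigma^2) against the standard normal N(0, 1), the KL term has the well-known closed form KL = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2). A minimal sketch, using all-zero mu and logvar as an assumed example batch:

```python
import torch

mu = torch.zeros(2, 20)      # assumed batch of latent means
logvar = torch.zeros(2, 20)  # log-variances; zeros mean sigma = 1

# Closed form of KL(N(mu, sigma^2) || N(0, 1)), summed over batch and latent dims
kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
print(kld.item())  # 0.0: mu = 0 and sigma = 1 is already the standard normal
```

When mu drifts from 0 or sigma from 1, the term grows, which is exactly the regularization pressure described above.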

2.4 Code implementation

Here is the implementation in PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()

        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)  # latent mean
        self.fc22 = nn.Linear(400, 20)  # latent log-variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
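The two loss terms from Section 2.3 can be combined into a single training objective. This is a minimal sketch in the style of the standard PyTorch VAE example; the fake random batch is only for illustration, not real data:

```python
import torch
import torch.nn.functional as F

def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: how far the decoded image is from the input
    bce = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    # KL term: push q(z|x) = N(mu, sigma^2) toward the standard normal N(0, 1)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# Hypothetical batch of 784-dim "images" with pixel values in [0, 1]
x = torch.rand(4, 784)
recon = torch.sigmoid(torch.randn(4, 784))  # stand-in for the decoder output
mu, logvar = torch.zeros(4, 20), torch.zeros(4, 20)
loss = loss_function(recon, x, mu, logvar)
```

In a real training loop, `recon, mu, logvar = model(x)` would come from the VAE above, followed by `loss.backward()` and an optimizer step.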

Here is a simple computation graph:
image
Reading the code and the computation graph side by side works best!
Note: the "reparam" part of the code and of the diagram corresponds to this part of my earlier structure diagram:
image

3. Postscript

That's it for this post. I will keep adding to it so that after reading you roughly understand the principle without being confused as with other blogs, since I have distilled the essence of many of them.
zsy 2025.1.21

4. Acknowledgments

This article references the following blogs:
/p/64485020
/p/578619659
/p/345360992
/A2321161581/article/details/140632339
The following blog post is very detailed:
/archives/5253