preamble
This article covers the first half of neural networks.
Also, I realized there were still quite a few ambiguities in my previous post. I've already fixed some of them, but there may be others I haven't noticed, so please read it with that in mind.
The goal is to learn how to develop neural networks; as for the math part, just get through it as best you can.
neural network
Let's start with an example
First, understand this example in the context of our previous knowledge; once you understand it, neural networks will be much easier to understand later.
class NeuralNet1(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet1, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)  # x columns -> hidden-layer columns
        self.relu = nn.ReLU()  # ReLU (Rectified Linear Unit) is used as the activation function
        self.linear2 = nn.Linear(hidden_size, 1)  # hidden-layer columns -> 1 column
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        y_pred = torch.sigmoid(out)  # sigmoid at the end
        return y_pred
model = NeuralNet1(input_size=28*28, hidden_size=5)
criterion = nn.BCELoss()
Combined with our previous knowledge, the code above defines a class that inherits from nn.Module.
The initialization function accepts two parameters, the number of x columns and the number of hidden-layer columns, and then defines three objects: linear1, linear2, and relu.
forward is the implementation. The conversion logic in the middle is that the x columns are converted into hidden columns, and the hidden columns are converted into 1 column, with an activation function applied in between. For now, don't worry about what the activation function does; the code structure roughly follows this logic.
criterion = nn.BCELoss() defines the loss function; the full name of BCELoss is Binary Cross Entropy Loss.
ps: Have you noticed that since we started using models, we haven't used requires_grad to turn on gradient tracking for tensors anymore? That's because the model's own parameters are created with requires_grad already enabled (something like tensor(0.0, requires_grad=True)), so they are tracked during the calculation.
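For reference, here is a minimal sketch of how the model and criterion defined above could be used in one training step (the optimizer, learning rate, and dummy data are assumptions added here purely for illustration):
import torch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer and learning rate are just example choices

x = torch.rand(4, 28*28)                 # a dummy batch of 4 flattened 28x28 inputs
y = torch.randint(0, 2, (4, 1)).float()  # dummy binary labels (0 or 1)

y_pred = model(x)            # forward() runs: linear1 -> relu -> linear2 -> sigmoid
loss = criterion(y_pred, y)  # BCELoss expects y_pred values in (0, 1)
loss.backward()              # works because the model's parameters already have requires_grad=True
optimizer.step()
optimizer.zero_grad()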
activation function
An activation function is really just a function; it simply applies a small data transformation to x.
We already used Sigmoid in the last post to convert data into probabilities (percentages).
The most popular activation functions are listed below:
# Most popular activation functions
# 1. Step function
# 2. Sigmoid
# 3. TanH
# 4. ReLU (if you don't know which one to use, just use this one)
# 5. Leaky ReLU
# 6. Softmax
Each activation function converts x into y in its own pattern.
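As a quick numeric illustration (a minimal sketch; the input values are arbitrary), applying a few of these functions to the same tensor shows how each one maps x to y:
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))         # squashed into (0, 1)
print(torch.tanh(x))            # squashed into (-1, 1)
print(torch.relu(x))            # negative values become 0
print(F.leaky_relu(x))          # negative values are multiplied by a small slope instead of zeroed
print(torch.softmax(x, dim=0))  # values become a probability distribution that sums to 1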
Within a network, activation functions are used as in the following code:
import torch
import torch.nn as nn
import numpy as np
import torch.nn.functional as F  # if it's not in nn, look for the activation function here
# Method 1 (create nn modules)
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        out = self.sigmoid(out)
        return out
# Method 2 (use activation functions directly in the forward pass)
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)
    def forward(self, x):
        # F.leaky_relu()  # this is how leaky_relu would be used
        out = torch.relu(self.linear1(x))
        out = torch.sigmoid(self.linear2(out))
        return out
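Both methods define the same network structure. Here is a minimal sketch of using one of them (the layer sizes and the random input below are just example values):
model = NeuralNet(input_size=4, hidden_size=3)  # example sizes
sample = torch.rand(2, 4)                       # a dummy batch of 2 samples with 4 features
print(model(sample))                            # outputs fall in (0, 1) because of the final sigmoid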
Functions (the math)
Let's build a logical understanding of the following functions:
sigmoid, MSELoss, BCELoss
Earlier we used MSELoss as a loss function. Its logic is to take the mean of the squares of the differences between the predicted y and the actual y, as follows:
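$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
(where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $N$ the number of samples)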
Later in our example, we replaced MSELoss with BCELoss. At the time we just treated it as a function that computes the loss value and didn't study its logic.
The full name of BCELoss is Binary Cross Entropy Loss. Its formula is this:
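$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$$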
BCELoss expects y_pred to already be in the range (0, 1), and the sigmoid conversion is what puts it there.
That's why, when writing the example, sigmoid was added to do this data transformation during forward propagation.
The formula for sigmoid looks like this:
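$$\sigma(x) = \frac{1}{1 + e^{-x}}$$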
softmax and cross_entropy
cross_entropy is the cross-entropy loss function; its formula looks like this:
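$$\mathrm{CE}(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$$
(where $y$ is the actual distribution, e.g. one-hot class labels, and $\hat{y}$ the predicted probability distribution)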
Understanding it in the context of the code:
loss = nn.CrossEntropyLoss()
Y = torch.tensor([0])  # this Y has a single element; the value 0 indicates the class, e.g. 0=cat, 1=dog, 2=rabbit
# nsamples x nclasses = 1x3, 1 row and 3 columns
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])  # in this predicted y, 2.0 is the largest and its index is 0, so this prediction is most likely a cat
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])   # in this predicted y, 2.0 is the largest and its index is 1, so this prediction is most likely a dog
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)
print(l1.item())
print(l2.item())
_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)
print(predictions1)
print(predictions2)
Predictions for multiple samples (across multiple categories) are shown below:
loss = nn.CrossEntropyLoss()
Y = torch.tensor([2, 0, 1])  # this Y has three elements; each value is the class index of one sample, e.g. 0=cat, 1=dog, 2=rabbit
# nsamples x nclasses = 3x3, 3 rows and 3 columns
Y_pred_good = torch.tensor([[2.0, 1.0, 2.1], [2.0, 1.0, 0.1], [2.0, 3.0, 0.1]])  # the indexes of the maxima of the three rows are 2, 0, 1, which match Y, so this is a good prediction
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3], [0.5, 2.0, 0.3], [0.5, 2.0, 0.3]])   # this prediction doesn't match Y, so it's a bad prediction
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)
print(l1.item())
print(l2.item())
_, predictions1 = torch.max(Y_pred_good, 1)  # values, indices = torch.max(input, dim)
_, predictions2 = torch.max(Y_pred_bad, 1)
print(predictions1)
print(predictions2)
Softmax activation function
Suppose you have a vector of model outputs [2.0, 1.0, 0.1]; applying the Softmax function converts it to roughly [0.7, 0.2, 0.1], representing the probability distribution over the individual categories.
The formula is as follows:
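$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$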
Understanding in the context of the code:
# Previously the predicted y's were converted to probability values between 0 and 1; that can be done with softmax
# softmax
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)
x = np.array([2.0, 1.0, 0.1])
outputs = softmax(x)
print('softmax numpy:', outputs)
Using softmax with torch:
x = torch.tensor([2.0, 1.0, 0.1])
outputs = torch.softmax(x, dim=0)
print(outputs)
CrossEntropyLoss internally applies the Softmax function first (as LogSoftmax) before running its own logarithmic computation, so you pass it the raw outputs (logits) rather than softmax results.
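A small check of that statement (a sketch; the logit values are arbitrary): passing raw logits to CrossEntropyLoss gives the same result as applying log_softmax manually and then NLLLoss:
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw outputs, no softmax applied
target = torch.tensor([0])

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(ce.item(), nll.item())  # the two values match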
Portal:
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (I)
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (II)
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (III)
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (IV)
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (V)
Learning Artificial Intelligence from Zero - Python-Pytorch Learning (VI)