Detailed example of a simple convolutional network
Suppose you have an image and want to do image classification or recognition: take an input image \(x\) and decide whether there is a cat in the picture, denoted by 1 or 0. This is a classification problem, and the task is to build a convolutional neural network suited to it. For this example a small picture is used, of size 39 x 39 x 3; this size was chosen to make some of the numbers work out nicely. So \(n_{H}^{[0]} = n_{W}^{[0]} = 39\), that is, both the height and the width are 39, and \(n_{c}^{[0]} = 3\), i.e., the number of channels in layer 0 is 3.
Suppose the first layer uses 3×3 filters to extract features, so \(f^{[1]} = 3\), because each filter is a 3×3 matrix. With \(s^{[1]} = 1\) and \(p^{[1]} = 0\), the height and width follow a valid convolution. If there are 10 filters, the activations of the next layer are 37 x 37 x 10. The 10 comes from using 10 filters, and the 37 comes from the formula \(\frac{n + 2p - f}{s} + 1 = \frac{39 + 0 - 3}{1} + 1 = 37\), so the output of this valid convolution is 37 × 37. The first layer is labeled \(n_{H}^{[1]} = n_{W}^{[1]} = 37\) and \(n_{c}^{[1]} = 10\); \(n_{c}^{[1]}\) equals the number of filters in the first layer, and 37 × 37 × 10 is the dimension of the activations in the first layer.
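As a minimal sketch, the output-size formula above can be written as a small Python helper (the function name is just illustrative):

```python
def conv_output_size(n, f, p, s):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# Layer 1 of the example: 39x39 input, 3x3 filter, no padding, stride 1
print(conv_output_size(n=39, f=3, p=0, s=1))  # -> 37
```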
Suppose there is another convolutional layer, this time using 5×5 filters. In our notation the next layer has \(f^{[2]} = 5\). The stride is 2, i.e. \(s^{[2]} = 2\), the padding is 0, i.e. \(p^{[2]} = 0\), and there are 20 filters. The output is a new volume of size 17×17×20: because the stride is 2, the dimensions shrink quickly, from 37×37 down to 17×17, less than half the size, and since there are 20 filters the number of channels is also 20. This 17×17×20 is the dimension of the activations \(a^{[2]}\). Therefore \(n_{H}^{[2]} = n_{W}^{[2]} = 17\) and \(n_{c}^{[2]} = 20\).
Finally, construct the last convolutional layer. Suppose the filters are still 5 × 5 with a stride of 2, i.e. \(f^{[3]} = 5\) and \(s^{[3]} = 2\), the padding is 0, and 40 filters are used. Applying the same formula, \(\frac{17 + 0 - 5}{2} + 1 = 7\), so the final output is 7 × 7 × 40.
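For a concrete check of these shapes, here is a minimal PyTorch sketch of the three convolutional layers; the variable names are illustrative, and nonlinearities are omitted to keep the focus on the dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 39, 39)   # one 39x39x3 image in NCHW layout

conv1 = nn.Conv2d(3, 10, kernel_size=3, stride=1, padding=0)
conv2 = nn.Conv2d(10, 20, kernel_size=5, stride=2, padding=0)
conv3 = nn.Conv2d(20, 40, kernel_size=5, stride=2, padding=0)

a1 = conv1(x)    # shape (1, 10, 37, 37)
a2 = conv2(a1)   # shape (1, 20, 17, 17)
a3 = conv3(a2)   # shape (1, 40, 7, 7)
print(a1.shape, a2.shape, a3.shape)
```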
At this point, the 39×39×3 input image has been processed into a 7×7×40 volume of features, which amounts to 7 × 7 × 40 = 1960 features. This volume can then be flattened, or expanded, into a vector of 1960 units. That vector is fed into a logistic regression unit or a softmax unit, depending on whether you want to recognize whether there is a cat in the picture or to recognize one of \(K\) different kinds of objects; \(\hat{y}\) denotes the final predicted output of the network. To be clear, this last step simply takes all 1960 numbers, expands them into a very long vector, and feeds that vector into the softmax (or logistic) regression function to predict the final output.
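A minimal sketch of this last step, again in PyTorch; the variable names are illustrative, and the sigmoid output corresponds to the binary cat / no-cat case:

```python
import torch
import torch.nn as nn

a3 = torch.randn(1, 40, 7, 7)      # stand-in for the last conv layer's activations
a3_flat = a3.reshape(1, -1)        # flatten: 7 * 7 * 40 = 1960 features

fc = nn.Linear(1960, 1)            # single logistic ("cat / no cat") unit
y_hat = torch.sigmoid(fc(a3_flat)) # predicted probability that the image contains a cat
# for K classes, use nn.Linear(1960, K) followed by a softmax instead
print(y_hat.shape)                 # -> torch.Size([1, 1])
```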
This is a typical example of a convolutional neural network. Much of the work in designing a convolutional network goes into choosing these hyperparameters: the filter size, the stride, the padding, and how many filters to use.
One thing to grasp is that as the network gets deeper, the height and width typically shrink while the number of channels grows. The image starts out fairly large, at 39×39; the height and width stay roughly the same for a while and then taper off as the depth increases, from 39 to 37, to 17, and finally to 7. Meanwhile the number of channels increases, from 3 to 10 to 20, and finally to 40. This trend can be seen in many other convolutional neural networks as well. That concludes the first example of a convolutional neural network and how to determine its parameters.
A typical convolutional neural network usually has three types of layers. The first is the convolutional layer, often labeled Conv; the previous example used only CONV layers. There are two other common types: one is the pooling layer, denoted POOL, and the last is the fully connected layer, denoted FC. Although it is possible to build a good neural network using only convolutional layers, most network architectures still include pooling and fully connected layers. Fortunately, pooling and fully connected layers are easier to design than convolutional layers.
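As a rough sketch of how these three layer types fit together, here is a small PyTorch stack; the specific sizes are assumptions for illustration, not the network from the example above:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=3, stride=1),  # CONV: 39x39x3 -> 37x37x10
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),      # POOL: 37x37x10 -> 18x18x10
    nn.Flatten(),                               # expand into a vector
    nn.Linear(10 * 18 * 18, 2),                 # FC: vector -> class scores
)

print(model(torch.randn(1, 3, 39, 39)).shape)   # -> torch.Size([1, 2])
```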