Detailed edge detection example
Convolutional operations are the most basic components of a convolutional neural network, using edge detection as an introductory sample. In this blog, it will be seen how convolution is operated.
In the previous blog, it was said how the first few layers of a neural network detect edges, and then, the later layers have the potential to detect parts of an object, and some of the layers further back may detect the complete object, which in this case is the face. In this blog, it will be seen how edge detection can be done in a single image.
Let as an example, given a picture like this and asked the computer to figure out what objects are in this picture, the first thing it might do is detect vertical edges in the picture. For example, the railings in this picture correspond to vertical lines, and at the same time, these pedestrian contour lines are somehow also vertical lines, and these lines are the output of the vertical edge detector. Similarly, one might want to detect horizontal edges as well, for example, these railings are clearly horizontal lines, and they can be detected as well, and the results are here. So how do you detect these edges in an image?
Look at an example that is a 6×6 grayscale image. Because it's a grayscale image, it's a 6×6×1 matrix, not 6×6×3, because there's noRGBThree channels. To detect vertical edges in an image, a 3×3 matrix can be constructed. In the shared idiom, in the terminology of convolutional neural networks, it is called a filter. To construct a 3×3 filter, like this\(\begin{bmatrix}1 & 0 & -1\\ 1 & 0 & -1\\ 1 & 0 & -1\end{bmatrix}\). In papers it is sometimes referred to as a kernel rather than a filter, but in this blog the term filter will be used. Perform a convolution operation on this 6×6 image, the convolution operation is performed with "\(*\)" to represent it and convolve it with a 3×3 filter.
There are some questions about symbolic representation, in math "\(*\)" is the standard sign of convolution, but in thePythonin which this identifier is often used to indicate multiplication or elemental multiplication. So this "\(*\)" has multiple layers of meaning, it is an overloaded symbol, in this blog, when "\(*\)" indicates that the convolution is specified.
The output of this convolution operation will be a 4×4 matrix, which can be thought of as a 4×4 image. Here's an explanation of how the calculation was done to get this 4×4 matrix. To compute the first element, the one in the upper left corner of the 4×4 is overlaid on the input image using a 3×3 filter, as shown below. Then multiply the elements (element-wise products) operation, so\(\begin{bmatrix} 3 \times 1 & 0 \times 0 & 1 \times \left(1 \right) \\ 1 \times 1 & 5 \times 0 & 8 \times \left( - 1 \right) \\ 2 \times1 & 7 \times 0 & 2 \times \left( - 1 \right) \\ \end{bmatrix} = \begin{bmatrix}3 & 0 & - 1 \\ 1 & 0 & - 8 \\ 2 & 0 & - 2 \\\end{bmatrix}\), and then each element of this matrix is summed to get the top-leftmost element, i.e.\(3+1+2+0+0 +0+(-1)+(-8) +(-2)=-5\)。
Add the 9 numbers to get -5. Of course, you can add the 9 numbers in any order, just write the first column, then the second, and the third.
Next, to figure out what the second element is, take the blue square, move it one step to the right, like so, and remove those green markings:
Keep doing the same multiplication of elements and add them up, so it's $0×1+5×1+7×1+1×0+8×0+2×0+2×(-1) + 9×(-1) + 5×(-1) = -4 $.
The next step is the same, continue to move one step to the right and add up the dot product of the 9 numbers to get 0.
Keep moving to get 8 and verify:\(2×1+9×1+5×1+7×0+3×0+1×0+4×(-1)+ 1×(-1)+ 3×(-1)=8\)。
Next in order to get the next row of elements, now move the blue block down, now the blue block is in this position:
Repeat the multiplication of the elements and then add them up. By doing this you get -10. then shift it to the right to get -2, followed by 2, 3. and so on, and so on, after calculating the other elements of the matrix.
To make it a little clearer, this -16 is obtained by the 3×3 area in the bottom right corner.
Thus a 6×6 matrix and a 3×3 matrix are convolved to get a 4×4 matrix. These images and filters are matrices of different dimensions, but the left matrix is easily understood as a picture, this one in the middle is understood as a filter, and the image on the right can be understood as another picture. This one is the vertical edge detector.
Before moving on, one more thing, if you are going to use a programming language to implement this operation, there are different functions for different programming languages, rather than using "\(*\)" to represent convolution. So in a programming exercise, one would use a program calledconv_forwardfunction. If a function is specified in thetensorflowUnder this, the function is calledtf.conv2d. In other deep learning frameworks, as will be seen in later blogsKerasThis framework, in which theConv2DImplementing Convolutional Operations. All programming frameworks have some function to implement convolution operations.
Why does this do vertical edge detection? Let's look at another example. A simple example will be used to make it clear. This is a simple 6×6 image with the left half being a 10 and the right generally being a 0. If you think of it as a picture, the left portion looks white with the pixel value of 10 being the brighter pixel value and the right pixel value being darker, using gray to represent the 0, even though it could be drawn as black as well. In the picture, there is a particularly distinct vertical edge in the middle of the image, and this vertical line is the transition line from black to white, or from white to darker colors.
So, when a convolution operation is performed with a 3×3 filter, this 3×3 filter is visualized as the following, with bright pixels on the left, and then there's a transition, with 0 in the middle, and then dark colors on the right. After the convolution operation, you get the matrix on the right. This can be verified by math operations if you wish. For example, the element 0 in the top leftmost corner is obtained from this 3×3 block (marked by the green box) by an elemental product operation and then summing that\(10×1+10×1+10×1+10×0+10×0+10×0+10×(-1)+10×(-1)+10×(-1)=0\)
. Instead this 30 is obtained from this (marked by the red box), the
\(10×1+10×1+10×1+10×0+10×0+10×0+0×(-1)+0×(-1)+ 0×(-1)=30\)。
If you think of the rightmost matrix as an image, it looks like this. There is a brighter section in the middle that corresponds to a vertical edge checked into the middle of this 6×6 image. The dimensions seem a bit incorrect here, and the detected edges are too thick. This is because the image is too small in this example. If a 1000×1000 image is used instead of a 6×6 image, it will be found that it will detect the vertical edges in the image very well. In this example, the bright spot in the middle of the output image indicates that there is a particularly noticeable vertical edge in the middle of the image. The takeaway from vertical edge detection is that because a 3×3 matrix (filter) is used, the vertical edge is a 3×3 region with bright pixels on the left side, the middle ones don't need to be considered, and dark pixels on the right side. In the middle part of this 6×6 image, the bright pixels on the left and the dark pixels on the right are considered as a vertical edge, and the convolution operation provides a convenient way to discover vertical edges in an image.