More on edge detection
We have already seen vertical edge detection implemented with a convolution operation. In this post, we will see how to distinguish positive from negative edges, which is the difference between light-to-dark and dark-to-light transitions. We will also look at other types of edge detection, and at how to implement these algorithms without always hand-writing an edge detector yourself.
This 6×6 image, which is brighter on the left and darker on the right, is convolved with a vertical edge detection filter, and the detected edge shows up in the center of the output image on the right.
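The computation can be sketched in NumPy. The image, filter, and helper function below are my own minimal reconstruction of the figure, not code from the post:

```python
import numpy as np

# A 6x6 image that is bright (10) on the left half and dark (0) on the right.
image = np.zeros((6, 6))
image[:, :3] = 10

# The 3x3 vertical edge detection filter: bright on the left, dark on the right.
vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

def convolve2d(img, kernel):
    """'Valid' convolution as used in deep learning (no kernel flip)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

result = convolve2d(image, vertical_filter)
print(result)
# The two middle columns of the 4x4 output are 30: a strong vertical
# edge detected in the center of the image.
```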
Now what has changed in this picture? Its colors have been flipped, so it is darker on the left and brighter on the right. The pixels with brightness 10 are now on the right, and the pixels with brightness 0 are on the left. If you convolve this image with the same filter, you end up with -30 in the middle of the output instead of 30. Rendered as a picture, that output looks like the image below the matrix: the transition in the middle is reversed, and the previous 30 becomes -30, indicating a dark-to-light transition rather than a light-to-dark transition.
If you don't care about the difference, you can take the absolute value of the output matrix. But this particular filter does distinguish between these two kinds of transitions, light-to-dark versus dark-to-light.
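A quick sketch of this sign flip, again with my own reconstruction of the image rather than code from the post:

```python
import numpy as np

image = np.zeros((6, 6))
image[:, :3] = 10            # bright on the left
flipped = image[:, ::-1]     # flipped: bright on the right

vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

def convolve2d(img, kernel):
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

print(convolve2d(image, vertical_filter)[0])    # middle values are  30
print(convolve2d(flipped, vertical_filter)[0])  # middle values are -30

# If only the presence of an edge matters, not its direction,
# take the absolute value of the output:
print(np.abs(convolve2d(flipped, vertical_filter))[0])  # back to 30
```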
Coming back to more examples of edge detection: we have seen this 3×3 filter, which detects vertical edges. The filter on the right, as you might guess, detects horizontal edges. As a reminder, the vertical edge filter corresponds to a 3×3 region that is relatively bright on the left and relatively dark on the right. Similarly, the horizontal edge filter corresponds to a 3×3 region that is relatively bright at the top and relatively dark at the bottom.
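The relationship between the two filters can be written out directly; this snippet is my own illustration:

```python
import numpy as np

# Vertical edge filter: bright (1s) on the left, dark (-1s) on the right.
vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

# Horizontal edge filter: bright (1s) on top, dark (-1s) on the bottom.
horizontal_filter = np.array([[ 1,  1,  1],
                              [ 0,  0,  0],
                              [-1, -1, -1]])

# One way to see the relationship: for this filter, transposing the
# vertical detector yields the horizontal one.
print((vertical_filter.T == horizontal_filter).all())
```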
Here's a more complicated example: the top-left and bottom-right quadrants both contain pixels with brightness 10, while the rest of the image contains pixels with brightness 0. If you plot it as a picture and shade the darker regions, the top-left and bottom-right quadrants are relatively bright and the other two quadrants are dark. Convolving this image with the horizontal edge filter produces the matrix on the right.
For example, the 30 here (the element marked with a green box in the right matrix) corresponds to this 3×3 region on the left (the portion marked with a green box in the left matrix), which is indeed brighter at the top and darker at the bottom, so the filter finds a positive edge there. And the -30 (the element marked with a purple box in the right matrix) corresponds to another region on the left (the portion marked with a purple box in the left matrix), which is brighter at the bottom and darker at the top, so it is a negative edge.
Again, the image used here is quite small, only 6×6. Intermediate values such as this 10 (the element marked with a yellow box in the matrix on the right) correspond to regions like the one marked with a yellow box in the 6×6 matrix on the left. The left two columns of that region lie on a positive edge and the right column lies on a negative edge, so the positive and negative contributions blend into an intermediate value. But if this were a very large 1000×1000 checkerboard-style image, you would not really notice these transition bands with brightness 10, because they would occupy only a tiny fraction of such a large image.
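The whole example can be reproduced numerically. The image layout below is my reconstruction of the figure described above:

```python
import numpy as np

# A 6x6 "checkerboard" image: bright (10) patches in the
# top-left and bottom-right quadrants, dark (0) elsewhere.
image = np.zeros((6, 6))
image[:3, :3] = 10
image[3:, 3:] = 10

horizontal_filter = np.array([[ 1,  1,  1],
                              [ 0,  0,  0],
                              [-1, -1, -1]])

def convolve2d(img, kernel):
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

print(convolve2d(image, horizontal_filter))
# [[  0.   0.   0.   0.]
#  [ 30.  10. -10. -30.]
#  [ 30.  10. -10. -30.]
#  [  0.   0.   0.   0.]]
```

The 30 and -30 are the strong positive and negative horizontal edges, while the 10 and -10 are exactly the intermediate values discussed above, where positive and negative edge contributions overlap within one 3×3 window.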
In a nutshell, different filters allow you to find vertical or horizontal edges. But the 3×3 vertical edge filter we have used is just one possible combination of numbers.
Historically, there has been a fair amount of debate in the computer vision literature about which combination of numbers is best. One alternative is:\(\begin{bmatrix}1 & 0 & - 1 \\ 2 & 0 & - 2 \\ 1 & 0 & - 1 \\\end{bmatrix}\)This is called the Sobel filter. Its advantage is that it puts more weight on the middle row, which makes the result a bit more robust.
Computer vision researchers also routinely use other combinations of numbers, such as:\(\begin{bmatrix} 3 & 0 & - 3 \\ 10 & 0 & - 10 \\ 3 & 0 & - 3 \\\end{bmatrix}\)This is called the Scharr filter. It has somewhat different properties, but it is still a vertical edge detector, and if you rotate it 90 degrees you get the corresponding horizontal edge detector.
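A small sketch of these filters and of the 90-degree rotation (my own illustration, not from the post):

```python
import numpy as np

# Sobel vertical edge filter: extra weight on the middle row.
sobel_vertical = np.array([[1, 0, -1],
                           [2, 0, -2],
                           [1, 0, -1]])

# Scharr vertical edge filter: even stronger center weighting.
scharr_vertical = np.array([[ 3, 0,  -3],
                            [10, 0, -10],
                            [ 3, 0,  -3]])

# Rotating a vertical edge filter by 90 degrees (clockwise here)
# gives the corresponding horizontal edge filter:
# bright weights on top, dark weights on the bottom.
sobel_horizontal = np.rot90(sobel_vertical, k=-1)
print(sobel_horizontal)
# [[ 1  2  1]
#  [ 0  0  0]
#  [-1 -2 -1]]
```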
One of the lessons of deep learning is that when you really want to detect edges in complicated images, you don't have to restrict yourself to the nine numbers that researchers have hand-picked; you can benefit greatly from learning them instead. Treat the nine numbers of the filter matrix as nine parameters and learn them with the backpropagation algorithm; the goal of learning is to find those nine parameters.
Given this 6×6 image on the left, convolving it with a 3×3 filter can produce an excellent edge detector. By treating the nine numbers of the filter as parameters and learning them through backpropagation, a network can learn the\(\begin{bmatrix}1 & 0 & - 1 \\ 1 & 0 & - 1 \\ 1 & 0 & - 1 \\\end{bmatrix}\)filter, the Sobel filter, or the Scharr filter. It can also learn some other filter that outperforms any of these hand-designed ones on the data, detecting edges at 45°, 70°, 73°, or any other angle, rather than only purely vertical and horizontal edges.

So if you set all nine numbers of the matrix as parameters, feed the network data, and let it learn them automatically, you will find that the neural network can learn low-level features such as these edges, often more robust ones than researchers could design by hand. The foundation of all this is still the convolution operation, which allows backpropagation to learn whatever 3×3 filter the network needs and apply it across the entire image, here, here, and here (the parts of the matrix on the left marked with blue boxes), outputting whatever features it detects: vertical edges, horizontal edges, edges at other odd angles, or even filters that don't have a name.
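A toy sketch of this idea (entirely my own construction, not from the post): we generate target outputs using the known vertical edge filter, start from a random 3×3 filter, and fit it by gradient descent on a squared-error loss. In a real network the gradients would come from backpropagation; here the convolution is linear in the filter, so the gradient can be written down directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def convolve2d(img, kernel):
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# "Ground truth" filter used only to generate training targets.
true_filter = np.array([[1, 0, -1]] * 3, dtype=float)
images = [rng.random((6, 6)) for _ in range(20)]       # random 6x6 images
targets = [convolve2d(img, true_filter) for img in images]

W = rng.normal(size=(3, 3))                            # random initial filter
lr = 0.05
for step in range(500):
    for img, target in zip(images, targets):
        err = convolve2d(img, W) - target              # 4x4 error map
        grad = np.zeros_like(W)
        for a in range(3):                             # dLoss / dW[a, b]
            for b in range(3):
                grad[a, b] = 2 * np.sum(err * img[a:a + 4, b:b + 4])
        W -= lr * grad / err.size

print(np.round(W, 2))  # recovers the hand-designed vertical edge filter
```

Because the loss is quadratic in the nine parameters and the random images make the solution unique, gradient descent recovers the filter that generated the data; in a real network the learned filters are whatever best serves the task.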
So this idea of treating the nine numbers of the filter as parameters to be learned has become one of the most effective ideas in computer vision.