
Designing Convolutional Neural Networks (CNNs): Why Not Programming?


Preface: Now let's step into the world of Convolutional Neural Networks (CNNs). Unlike traditional programming, AI code contains no explicit algorithmic rules; you see only the configuration of a neural network. The code doesn't spell out the specific implementation of each function the way traditional programs do. For example, if you want a computer to distinguish a cat from a dog, you don't write code describing what cats and dogs look like; instead, you describe the configuration of a neural network and let it learn from the data during training. Description is like art: how do you add just the right touch to produce extraordinary results? That is the essence of designing AI!


In the previous post, we created a neural network capable of recognizing fashion images. For convenience, here is the complete code:

import tensorflow as tf

data = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = data.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)

To convert this to a convolutional neural network, we simply add convolutional layers to the model definition. We will also add pooling layers.

To implement a convolutional layer, you use the tf.keras.layers.Conv2D layer type. It accepts a number of parameters, such as the number of filters (convolutions) to learn in the layer, the size of each filter, the activation function, and so on.

For example, here is a convolutional layer that serves as an input layer to a neural network:

tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                       input_shape=(28, 28, 1)),

Here, we want this layer to learn 64 convolutions. It will randomly initialize these filters and, over time, learn the filter values best suited to matching the input images to their labels. The (3, 3) denotes the size of each filter; I showed a 3 × 3 filter example earlier.

Here we specify a filter size of 3 × 3, which is the most common. You can change it as needed, but you'll usually see odd sizes such as 5 × 5 or 7 × 7, since an odd-sized filter has a well-defined center pixel. Filtering removes pixels from the edges of the image; you'll see the exact effect later.

The activation and input_shape parameters are the same as before. Because we are using Fashion MNIST in this example, the shape is still 28 × 28. Note, however, that the Conv2D layer expects a color-channel dimension, so for our grayscale images we specify the third dimension as 1, giving an input shape of 28 × 28 × 1. For color images the third dimension is usually 3, because they are stored as the values of the R, G, and B channels.
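For comparison, here is a minimal sketch of what the first layer would look like for color input. The 224 × 224 size is a hypothetical choice for illustration, not a shape used anywhere in this post:

# A minimal sketch, assuming hypothetical 224 x 224 RGB input
# (not the Fashion MNIST shapes used in this post):
tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                       input_shape=(224, 224, 3)),  # 3 = R, G, B channels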

The next step is to add a pooling layer to the network. This usually comes immediately after the convolutional layer:

tf.keras.layers.MaxPooling2D(2, 2),

In the example in Figure 3-4, we divide the image into 2 × 2 chunks and pick the maximum value in each chunk. This operation can be parameterized to define the size of the pool. Here (2, 2) means that our pool size is 2 × 2.
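To make the operation concrete, here is a small standalone sketch (using NumPy; my own addition, not code from the original post) that max-pools a 4 × 4 array in 2 × 2 blocks:

import numpy as np

# A 4x4 "image"; each 2x2 block is reduced to its maximum value.
img = np.array([[1, 3, 2, 0],
                [5, 2, 1, 4],
                [0, 1, 7, 2],
                [3, 2, 6, 8]])

# Split into 2x2 blocks, then take the max within each block.
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[5 4]
               #  [3 8]]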

Now, let's take a look at the complete code for processing Fashion MNIST with a CNN:

import tensorflow as tf

data = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = data.load_data()

training_images = training_images.reshape(60000, 28, 28, 1)
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=50)

model.evaluate(test_images, test_labels)

classifications = model.predict(test_images)
print(classifications[0])
print(test_labels[0])

Here are a few things to keep in mind. Remember earlier when I said the shape of the input image must match what the Conv2D layer expects, and we updated it to 28 × 28 × 1? The data has to be reshaped accordingly. 28 × 28 is the number of pixels in the image, and 1 is the number of color channels; for a grayscale image this is usually 1, while for a color image it is 3, one channel each for red, green, and blue, with each value indicating the intensity of that color.

Therefore, before normalizing the images, we also need to reshape each array to add the extra channel dimension. The following code changes our training dataset from 60,000 images of 28 × 28 each (a 60,000 × 28 × 28 array) to 60,000 images of 28 × 28 × 1 each:

training_images = training_images.reshape(60000, 28, 28, 1)

We then perform the same operation on the test dataset.
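As a side note, if you'd rather not hard-code the dataset sizes (60,000 and 10,000), an equivalent and commonly used alternative is to let NumPy infer the batch dimension. This variation is my suggestion, not part of the original listing:

# Equivalent reshape without hard-coding the number of images:
# -1 tells NumPy to infer the batch dimension from the array's size.
training_images = training_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)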

Note also that in the original deep neural network (DNN), we used a Flatten layer before passing the input to the first Dense layer. Here, by contrast, there is no Flatten at the input; we simply specify the input shape on the first Conv2D layer. After convolution and pooling, the data is flattened before it reaches the Dense layer.

Training this network on the same data for 50 epochs, we see a significant improvement in accuracy over the network shown in Chapter 2. The earlier example achieved 89% accuracy on the test set after 50 epochs, while this network reaches 99% in about half as many (around 24 or 25). So adding convolutional layers clearly improves the neural network's ability to classify images. Next, let's look at how an image passes through the network to gain a deeper understanding of how it works.
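If you're curious what those 64 filters actually look like after training, you can pull them out of the first layer's weights. This inspection step is my own addition, not part of the original walkthrough:

# After model.fit(...), inspect the learned filters of the first Conv2D layer.
weights, biases = model.layers[0].get_weights()
print(weights.shape)        # (3, 3, 1, 64): 3x3 filters, 1 input channel, 64 filters
print(biases.shape)         # (64,): one bias per filter
print(weights[:, :, 0, 0])  # the first learned 3x3 filter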

Exploring Convolutional Networks

You can inspect your model with the model.summary() command. When you run it on the Fashion MNIST convolutional network we've been working with, you'll see something like this:

Model: "sequential"


Layer (type) Output Shape Param #

=================================================================

conv2d (Conv2D) (None, 26, 26, 64) 640


max_pooling2d (MaxPooling2D) (None, 13, 13, 64) 0


conv2d_1 (Conv2D) (None, 11, 11, 64) 36928


max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0


flatten (Flatten) (None, 1600) 0


dense (Dense) (None, 128) 204928


dense_1 (Dense) (None, 10) 1290

=================================================================

Total params: 243,786

Trainable params: 243,786

Non-trainable params: 0

Let's look at the Output Shape column to see what's going on here. Our first layer processes a 28 × 28 image and applies 64 filters. However, because the filters are 3 × 3, we lose a 1-pixel border around the image, reducing the output to 26 × 26 pixels. Referring to Figure 3-6, if we consider each box as a pixel in the image, the first possible filtering operation starts at the second row and second column, since a 3 × 3 filter centered on an edge pixel would extend outside the image. The same happens on the right side and at the bottom.

                              Figure 3-6: Pixel Loss When Running Filters

Thus, an image of A × B pixels becomes (A-2) × (B-2) pixels after a 3 × 3 filter. Similarly, a 5 × 5 filter turns it into (A-4) × (B-4), and so on. Since we are using a 28 × 28 image and a 3 × 3 filter, our output is now 26 × 26.

Next, the 2 × 2 pooling layer halves the image on each axis, to 13 × 13. The next convolutional layer shrinks it further to 11 × 11, and the next pooling layer rounds down again, reducing the image to 5 × 5.
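To make this arithmetic easy to check, here is a small helper (my own sketch, not code from the original post) that traces the output size through the stack, assuming "valid" 3 × 3 convolutions and 2 × 2 pooling:

def conv_out(size, k=3):
    """Output size of a 'valid' convolution: lose k-1 pixels of border."""
    return size - (k - 1)

def pool_out(size, p=2):
    """Output size of pooling: divide by the pool size and round down."""
    return size // p

s = 28
s = conv_out(s)   # 26: a 3x3 filter trims 1 pixel from each edge
s = pool_out(s)   # 13: 2x2 max pooling halves each axis
s = conv_out(s)   # 11
s = pool_out(s)   # 5: 11 // 2 rounds down
print(s)          # 5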

In this way, after the image has passed through the two convolutional layers, the result is a set of small 5 × 5 images. How many? We can work that out from the parameter counts in the Param # column.

Each convolution is a 3 × 3 filter plus a bias. Remember the dense layer earlier, where the formula was Y = mX + c? There, m was our parameter (the weight) and c was the bias. The principle here is similar, except that the filter is 3 × 3, so there are 9 weights to learn. With a bias, each filter has 10 parameters, and since we defined 64 convolutions, there are 640 parameters in total.

The MaxPooling layer learns nothing; it just reduces the size of the image, so it has no parameters to learn and shows up as 0.

The next convolutional layer has 64 filters, but each filter is applied across all 64 channels output by the previous layer, with 9 weights per channel. Adding a bias for each of the new 64 filters, the total number of parameters is (64 × (64 × 9)) + 64, yielding 36,928 parameters for the network to learn.

If this seems confusing, try changing the number of convolutions in the first layer to a different value, such as 10. You'll see the number of parameters in the second layer change to 5,824, which is (64 × (10 × 9)) + 64.
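These counts all follow one formula, so a quick sanity check is easy to write. The helper below is my own sketch, not code from the post:

def conv2d_params(filters_in, filters_out, k=3):
    """Parameters of a Conv2D layer: each output filter has a k x k kernel
    per input channel, plus one bias."""
    return filters_out * (k * k * filters_in + 1)

print(conv2d_params(1, 64))   # 640: first layer, 1 grayscale channel
print(conv2d_params(64, 64))  # 36928: second layer over 64 channels
print(conv2d_params(10, 64))  # 5824: second layer if the first had 10 filters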

After the second convolution, the image is 5 × 5 and we have 64 of them. Multiplying these out gives 1,600 values, which are passed into a dense layer of 128 neurons. Each neuron has one weight per input plus a bias, so the number of parameters to learn is ((5 × 5 × 64) × 128) + 128, yielding 204,928.

The last dense layer has 10 neurons and receives the output of the previous 128 neurons, so the number of parameters learned is (128 × 10) + 10, or 1,290.

The total number of parameters is the sum of these: 243,786.
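Putting it all together, a few lines of arithmetic (again my own check, not from the original post) confirm the summary's total:

conv1 = 64 * (3 * 3 * 1 + 1)       # 640
conv2 = 64 * (3 * 3 * 64 + 1)      # 36928
dense1 = (5 * 5 * 64) * 128 + 128  # 204928
dense2 = 128 * 10 + 10             # 1290
print(conv1 + conv2 + dense1 + dense2)  # 243786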

Training this network means learning the best combination of these 243,786 parameters to match input images to their labels. Training is slower because of the extra parameters, but as the results show, it also builds a more accurate model!

Of course, with this dataset we are still limited to 28 × 28 grayscale images with centered subjects. Next, we'll use convolutions to explore a more complex dataset of color images of horses and people, and try to determine which is in a given image. In this case the subject won't always be centered as in Fashion MNIST, so we'll have to rely on the convolutions to capture the discriminative features.

In this post, we described a convolutional neural network; in the next post, we'll use this CNN to distinguish humans from horses.