Previous: "Exploring Vocabulary Size and Dimensionality of Models for Training Artificial Intelligence Models
Preface: Dropout is a technique in neural network design, sometimes translated as "random deactivation." If a neural network is trained without Dropout, the model can easily "learn by rote", i.e., overfit, which may lead to project failure.
What does Dropout do? It's actually very simple: during training, it randomly turns off some neurons in a hidden layer so that they don't contribute to the output. Nothing fancy, it really is that straightforward. In each training round (epoch), some neurons are randomly picked to "go quiet": they temporarily rest, and their output values are set to 0. Note that which neurons are turned off is random and different each time, rather than a fixed set being dropped every time. The advantage is that the model has to rely on all of its neurons working together to learn more general patterns, instead of rote-memorizing a few specific features. Dropout is therefore a good remedy for a model that "learns by rote", making it more flexible and better able to handle new data it has never seen before.
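To make this concrete, here is a minimal, self-contained sketch (not part of the book's model) showing what the Keras Dropout layer does to a batch of activations. It is only active when training=True; at inference time it passes values through unchanged:

import tensorflow as tf

# A single example with 8 activations, all 1.0, so the effect is easy to see.
x = tf.ones((1, 8))

dropout = tf.keras.layers.Dropout(0.25)

# At inference time Dropout is a no-op.
print(dropout(x, training=False).numpy())  # [[1. 1. 1. 1. 1. 1. 1. 1.]]

# During training, roughly 25% of the values are zeroed at random, and the
# survivors are scaled by 1/(1 - 0.25) so the expected sum stays the same.
print(dropout(x, training=True).numpy())   # e.g. [[1.33 0. 1.33 1.33 0. 1.33 1.33 1.33]]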
Using Dropout
A common technique for minimizing overfitting is to incorporate Dropout into a fully connected neural network, and we explored its application to convolutional neural networks in Chapter 3. It may be tempting to use Dropout directly to see how it works against overfitting, but here I've chosen not to rush into it until I've adjusted the vocabulary size, embedding dimensions, and architectural complexity. After all, these tweaks tend to have a bigger impact on the model's results than using Dropout, and we've already seen good results from these tweaks.
Now that our architecture has been simplified to only 8 neurons in the middle fully connected layer, Dropout may have less of an effect, but let's give it a try anyway. Here's the updated model code, with a Dropout of 0.25 added (equivalent to dropping 2 of our 8 neurons):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dropout(0.25),   # randomly zero 25% of the 8 neurons' outputs during training
    tf.keras.layers.Dense(1, activation='sigmoid')
])
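For completeness, here is a sketch of how this model might be compiled and trained for the 100 epochs discussed below. The names training_padded, training_labels, testing_padded, and testing_labels are assumed to be the tokenized and padded data prepared earlier in the chapter; they are illustrative, not taken from this section:

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(training_padded, training_labels,
                    epochs=100,
                    validation_data=(testing_padded, testing_labels))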
Figure 6-14 shows the accuracy results after 100 epochs of training. This time we see that the training-set accuracy climbs above its previous plateau, while the validation-set accuracy slowly declines. This indicates that we are drifting back into overfitting.
This is verified by the loss curve in Figure 6-15.
Figure 6-14: Accuracy after adding Dropout
Figure 6-15: Losses after adding Dropout
As you can see, the model's validation loss is once again trending upward over time, as it did before. It's not as steep as before, but it's clearly heading in the wrong direction.
In this case, adding Dropout is probably not an appropriate choice, given how few neurons the layer has. Still, Dropout is a great tool to keep in your toolbox for architectures more complex than this one, as sketched below.
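As a rough illustration (my own sketch, not a model from the book), this is how Dropout is more commonly placed in a larger stack of fully connected layers, where there are enough neurons for it to meaningfully regularize; vocab_size and embedding_dim are assumed to be defined as before:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),   # larger hidden layer
    tf.keras.layers.Dropout(0.5),                   # drop half of the 64 neurons' outputs
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])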
Summary: The examples in this section demonstrate the effect of introducing Dropout in a network. We can see from the experiments that Dropout is an effective tool, but its usefulness depends on the model architecture and the specific scenario. For a simplified model like the one in this example, Dropout has less impact. However, in more complex models, it is often a key tool to prevent overfitting. In the next section, we will introduce several optimization techniques to help further solve the problem of model overfitting.