Previous: "Artificial Intelligence Model Training Techniques: Stochastic Deactivation, Dropout Method, Dropout
Preface: One way to make AI models "smarter" is to reduce the problem of "overfitting" (rote memorization), thereby improving the model's "generalizability", that is, its ability to adapt to new problems. In the previous section we explained the most commonly used method, dropout (random deactivation); in this section we introduce another important method, "regularization".
So what does regularization actually do? To use a real-life analogy, regularization works like a teacher in a school, whose goal is to guide students in the right direction rather than letting them memorize textbooks or score well only because certain exam questions happen to be familiar. By disciplining how students study, such as standardizing learning methods and correcting bad habits, the teacher ensures that students learn general rules and problem-solving skills instead of drifting off track or absorbing useless information. Similarly, regularization constrains a model's weights (among other mechanisms) so that the model avoids overfitting and performs better when it faces new data.
Using Regularization
Regularization is a technique for preventing overfitting by penalizing extreme weights. If a neuron's weights grow too large, regularization "punishes" them by adding a penalty to the loss. In general, there are two main types of regularization: L1 and L2.
- L1 regularization is often referred to as lasso (least absolute shrinkage and selection operator) regularization. It tends to drive small weights to zero or near zero, so those values are effectively "thrown away" when the layer's output is calculated.
- L2 regularization is often referred to as ridge regression because it squares the weights, amplifying the difference between non-zero and zero (or near-zero) values and thereby creating a "ridge" effect.
These two methods can also be combined into a technique called elastic net regularization.
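As a rough illustration of how the three penalties are expressed in Keras, the minimal sketch below creates an L1, an L2, and a combined (elastic net) regularizer; the 0.01 factors are placeholder values, not recommendations:
import tensorflow as tf

# L1 (lasso): penalizes the sum of absolute weight values, pushing small weights toward exactly zero
l1_reg = tf.keras.regularizers.l1(0.01)

# L2 (ridge): penalizes the sum of squared weights, keeping weights small but rarely exactly zero
l2_reg = tf.keras.regularizers.l2(0.01)

# Elastic net: combines both penalties
l1_l2_reg = tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01)

# Any of these can be passed to a layer's kernel_regularizer argument
layer = tf.keras.layers.Dense(8, activation='relu', kernel_regularizer=l2_reg)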
L2 regularization is the type most commonly used for natural language processing problems like the one we are working on here. You can add it to a Dense layer via the kernel_regularizer argument, passing a floating-point regularization factor to the l2 regularizer. This factor is another hyperparameter you can use to tune the model, and it is worth experimenting with!
Below is sample code:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(8, activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
In a simple model like this, adding regularization doesn't have a particularly large impact, but it does smooth out the training loss and validation loss curves somewhat. It may be overkill in this scenario, but as with dropout, understanding how to use regularization to keep the model from becoming overspecialized is a very important skill.
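If you want to see that smoothing effect for yourself, one option is to compile and train the regularized model and plot its loss curves. The sketch below assumes the padded training and validation data from the earlier preprocessing steps are available under the names padded, training_labels, testing_padded, and testing_labels; those names are assumptions, not fixed:
import matplotlib.pyplot as plt

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Assumed variable names for the preprocessed data from earlier in the chapter
history = model.fit(padded, training_labels, epochs=30,
                    validation_data=(testing_padded, testing_labels))

# With L2 regularization the two curves should diverge more gently
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.legend()
plt.show()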
Other Optimization Considerations
Although the previous modifications have given us a model that overfits less and performs better, there are other hyperparameters worth experimenting with. For example, we previously set the maximum sentence length to 100, but that value was chosen arbitrarily and may not be optimal. A good idea is to explore the corpus and see whether there is a more appropriate sentence length.
Below is a code snippet that measures the length of each sentence and plots the lengths sorted from shortest to longest:
xs = []
ys = []
current_item = 1
for item in sentences:
    xs.append(current_item)   # sentence index
    current_item += 1
    ys.append(len(item))      # sentence length
newys = sorted(ys)            # sort lengths from shortest to longest

import matplotlib.pyplot as plt
plt.plot(xs, newys)
plt.show()
Figure 6-16 illustrates the results of this code.
Figure 6-16: Exploring Sentence Length
Across the entire corpus of more than 26,000 sentences, fewer than 200 are 100 words or longer. Setting the maximum sentence length to 100 therefore introduces a lot of unnecessary padding, which affects the model's performance. If we reduce the maximum length to 85, we still cover more than 99% of the corpus while using far less padding.
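To verify a cutoff like this against your own data, you can count how many sentences fit within the proposed maximum length. The sketch below reuses the sentences list from the snippet above and treats 85 as the candidate cutoff:
max_length = 85

# Count how many sentences fit within the proposed cutoff
covered = sum(1 for item in sentences if len(item) <= max_length)
print(f"{covered} of {len(sentences)} sentences "
      f"({covered / len(sentences):.1%}) fit within max_length={max_length}")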
Summary: Regularization is another way to make models smarter, that is, to enhance their generalization ability. Its role is like that of the teachers we grew up with, who guide us and set the framework and direction of our learning. In the next section, we will take the model we have optimized with these various methods and use it in a real-world application: classifying news sentences and making predictions.