
Pytorch Handwriting Digit Recognition Deep Learning Basics Sharing



This is an internal sharing session that explains what deep learning is to project-development colleagues, using the simplest example, handwritten digit recognition, to walk through the general principles.

Handwriting Recognition

First, a demonstration of the digit recognition project in use. The project is implemented in three steps:

  1. Train the model.
  2. Prepare an HTML handwriting pad as the front end
  3. Build a simple back end with the Flask framework



Introduction to Deep Learning Essential Knowledge

Concepts of Machine Learning

In layman's terms
One of the key ideas of machine learning is to use the computing power of computers to discover a pattern in a large amount of data, and then use that pattern to make predictions or judgments.

Classification of deep learning algorithms

Grouped by algorithm, deep learning applications fall into three main categories:

  • Convolutional neural networks (CNN): commonly used for analyzing and processing image data
  • Recurrent neural networks (RNN): commonly used for text analysis and natural language processing
  • Generative adversarial networks (GAN): commonly used for data generation

The main applications of convolutional neural networks (CNN) fall into image classification, object detection, and semantic segmentation.

How Images Are Stored

Images are stored in a computer as matrices of numbers. A grayscale image is a single matrix in which each entry is the intensity of one pixel.
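
For example, a tiny grayscale image can be written out directly as a matrix. This is a sketch for illustration (the original post shows the idea with a figure); the values are made up:

# A grayscale image is just a matrix of pixel intensities (0 = black, 255 = white).
# This tiny 5x5 "image" is a plus sign.
import numpy as np

img = np.array([[  0,   0, 255,   0,   0],
                [  0,   0, 255,   0,   0],
                [255, 255, 255, 255, 255],
                [  0,   0, 255,   0,   0],
                [  0,   0, 255,   0,   0]], dtype=np.uint8)

print(img.shape)  # (5, 5) -- height x width, one number per pixel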

Generic steps for model training

The idea of model training:

  1. Prepare the dataset
  2. Construct a neural network model (a class defined in object-oriented style)
  3. Select a loss function and an optimizer
  4. Train the model
    • The model produces predicted values
    • The loss function measures the gap between the predicted and actual values
    • The optimizer adjusts the parameters in the model so that the results become more and more accurate
    • Repeat the above steps in a loop

Loss function: a function that measures the deviation between the training result and the actual value; a larger value means a larger gap.
Optimizer: an algorithm that adjusts the model so that the loss function becomes smaller.

Q&A


Pytorch Handwriting Digit Recognition Explained

Model training here uses the PyTorch framework; the same can also be implemented with TensorFlow or Keras.

Data Set Acquisition

Handwriting recognition uses the MNIST dataset of handwritten digit images. MNIST consists of images of the handwritten digits 0 to 9, each 28 × 28 pixels. There are 70,000 images in total: 60,000 form the training set and 10,000 the test set. Each image shows a white digit on a black background.

pytorch provides the torchvision package, which can be used to download datasets.

import torchvision
import matplotlib.pyplot as plt

# Training dataset
train_data = torchvision.datasets.MNIST(
    root="data",       # save MNIST in the data folder
    download=True,     # download from the web; once downloaded, it will not be downloaded again
    train=True,        # this is the training dataset
    transform=torchvision.transforms.ToTensor()
                       # convert the data in the dataset into the Tensor type that PyTorch can use
)

# Test dataset
test_data = torchvision.datasets.MNIST(
    root="data",       # save MNIST in the data folder
    download=True,     # download from the web; once downloaded, it will not be downloaded again
    train=False,       # this is the test dataset
    transform=torchvision.transforms.ToTensor()
                       # convert the data in the dataset into the Tensor type that PyTorch can use
)

Demonstration
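
The original demonstration is a screenshot; below is a minimal sketch (assuming the train_data object defined above) that inspects one sample: each item is a 1×28×28 tensor plus its digit label.

# A sketch: look at the first sample of the training set
img, label = train_data[0]
print(img.shape, label)                  # torch.Size([1, 28, 28]) and the digit label

plt.imshow(img.squeeze(), cmap="gray")   # drop the channel dimension for plotting
plt.title(f"label: {label}")
plt.show()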


Model Definition

The model is a convolutional neural network. The neural network model is defined as follows:

import torch.nn as nn


# Define the convolutional neural network class
class RLS_CNN(nn.Module):
    def __init__(self):
        super(RLS_CNN, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16,  # number of input and output channels; the output channels can be read as the number of extracted features
                      kernel_size=(3, 3),  # convolution kernel size
                      stride=(1, 1),       # how many pixels the kernel moves each step
                      padding=1),          # how many blank pixels are added around the edges of the original image
                                           # input image size is 1×28×28
                                           # first convolution, size 16×28×28
            nn.MaxPool2d(kernel_size=2),   # first pooling, size 16×14×14
            nn.Conv2d(16, 32, 3, 1, 1),    # second convolution, size 32×14×14
            nn.MaxPool2d(2),               # second pooling, size 32×7×7
            nn.Flatten(),                  # flatten the 3-dimensional tensor into a 1-dimensional one
            nn.Linear(32*7*7, 16),         # fully connected layer: reduce the 32*7*7 features to 16 values
            nn.ReLU(),                     # activation function: y=0 for x<0, y=x for x>0; used during backpropagation
            nn.Linear(16, 10)              # turn the 16 values into 10 scores, the predictions for the digits 0-9
        )

    def forward(self, x):
        return self.net(x)

Convolutional Neural Network Model Components

Convolutional neural networks are usually composed of 3 parts: a convolutional layer, a pooling layer, and a fully connected layer. The function of each part:

  • Convolutional Layer: responsible for extracting features from an image and can output many kinds of features from a single image.
  • Pooling layer: used to reduce the size, drastically reduce the parameter order of magnitude, and reduce the computational effort
  • Fully connected layer: merge features and output results
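
As a quick sanity check (a sketch, not from the original post), a dummy 1×28×28 image can be passed through the RLS_CNN class defined above to confirm the shape comments: the network ends with 10 scores, one per digit.

import torch

model = RLS_CNN()
dummy = torch.zeros(1, 1, 28, 28)   # a batch of one 1-channel 28x28 image
print(model(dummy).shape)           # torch.Size([1, 10]) -- one score per digit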

Beauty-camera apps work on the same principle of extracting image features: one filter can blur an image's outlines while another highlights them.

Convolution

What convolution does: extract multiple kinds of feature information from an image.
How convolution works: a new matrix is obtained by multiplying a convolution kernel with the image matrix; the new matrix is a new feature.
Convolution kernel
The convolution kernel is also a matrix, usually 3×3 or 5×5; it slides across the image and, at each position, the overlapping values are multiplied and summed.

Image edge extraction
The edge-contour features of an image can be extracted with a suitable edge-detection convolution kernel.
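
The kernel from the original figure is not reproduced here, so below is a sketch using a common Laplacian-style edge kernel (an illustrative choice, not necessarily the post's kernel). It shows the convolution procedure: the 3×3 kernel slides over the image and, thanks to padding, produces a new matrix of the same size.

import torch
import torch.nn.functional as F

# An edge-detection kernel: it responds strongly where pixel intensity changes sharply
edge_kernel = torch.tensor([[-1., -1., -1.],
                            [-1.,  8., -1.],
                            [-1., -1., -1.]]).reshape(1, 1, 3, 3)

img, _ = train_data[0]                                      # a 1x28x28 digit from the dataset above
edges = F.conv2d(img.unsqueeze(0), edge_kernel, padding=1)  # slide the kernel over the image
print(edges.shape)                                          # torch.Size([1, 1, 28, 28]) -- a new feature map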

Note
A 3×3 convolution kernel contains 9 parameters in total. These parameters are generated and adjusted by the model automatically: what is called "tuning the parameters" partly refers to adjusting the values of the kernel matrix so that the features it extracts lead to more accurate predictions.

Pooling

What pooling does: pooling shrinks the matrix, drastically reducing the number of parameters and the amount of computation for the following operations. A pooling layer is usually placed between adjacent convolutional layers.
How pooling works: for example, max pooling a 4×4 matrix into a 2×2 matrix takes the largest value in each corresponding region of the 4×4 matrix (see the sketch after the list below).

There are usually two types of pooling:

  • Maximum pooling (max pooling): select the maximum value of the image region as the value of the region after pooling.
  • Average pooling: Calculate the average value of an image region as the pooled value of the region.
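
A small worked example of the 4×4 to 2×2 max pooling described above (the numbers are made up for illustration): each output value is the maximum of the corresponding 2×2 region of the input.

import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2)   # take the max of each 2x2 region
print(pool(x))
# tensor([[[[6., 8.],
#           [3., 4.]]]])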

Full Connection

The role of the fully connected layer is to combine features and classify.
In the previous two steps, multiple features are extracted from a single image and the feature matrices are compressed, so what reaches the fully connected layer is a set of features describing one image.
A single feature does not say what the whole picture is; relying on one alone would be like the blind men and the elephant. The fully connected layer therefore combines the features into a complete representation and, based on it, calculates the probability that the picture belongs to each class.
The final output of the fully connected layer is these probabilities. For handwritten digit recognition, it outputs one score for each digit from 0 to 9, for example:

tensor([[ 0.949,  3.032,  0.771, -2.173, -0.038, -0.236,  0.013,  0.614, -1.125, -2.6991]])
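
Strictly speaking, the ten numbers above are raw scores rather than probabilities. A sketch of how they become probabilities with softmax, and how the predicted digit is read off with argmax:

import torch

logits = torch.tensor([[0.949, 3.032, 0.771, -2.173, -0.038,
                        -0.236, 0.013, 0.614, -1.125, -2.6991]])
probs = torch.softmax(logits, dim=1)   # values between 0 and 1 that sum to 1
print(probs.argmax(dim=1))             # tensor([1]) -- the model predicts the digit 1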

How full connection works
The fully connected layer implements the combination of features. In principle it is similar to convolution: a weight matrix (playing the role of the convolution kernel) operates on the feature values, and the result is a one-dimensional array of scores for the digits 0-9.

Note: this weight matrix is also part of the model's parameters, so tuning the parameters includes adjusting it as well.

Model Definition of Handwritten Digit Recognition

The convolutional neural network for handwritten digit recognition is analyzed below as a convolution + pooling + full connection process:

Q&A


Selection of loss functions and optimizers

Role of the loss function: measures the deviation between the training result and the actual value; a larger value means a larger gap.
Role of the optimizer: keeps adjusting the model so that the loss function decreases.

The loss function and optimizer used in handwritten digit recognition are as follows:

# Cross-entropy loss function: the chosen method for calculating the error value
loss_func = nn.CrossEntropyLoss()

# Optimizer: stochastic gradient descent algorithm
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)

Loss Function

Handwriting recognition uses the cross-entropy loss function. PyTorch provides 19 loss functions in total; the easiest one to understand is the mean squared error (squared difference) loss.

Optimizer

Handwriting recognition uses the stochastic gradient descent (SGD) algorithm, which applies the parameter updates computed by backpropagation. There are 11 optimizers available in PyTorch.

Model Training

The process of model training:

  1. Define the number of training epochs
  2. Iterate through the training set, pass the images to the model, and get the probability results
  3. Calculate the loss value with the loss function
  4. Tune the parameters with the optimizer
  5. Save the model when training is complete
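
The loop below iterates over train_dataloader, which is not defined in the snippets above. A minimal sketch of how it would typically be created from the datasets (batch size 64 is an assumption, not a value from the original post):

# Wrap the datasets in DataLoaders so training can run over shuffled mini-batches
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=False)
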
# Define the number of training epochs
cnt_epochs = 5  # train for 5 epochs

# Train for 5 epochs
for cnt in range(cnt_epochs):
    # Go through the training set once
    for imgs, labels in train_dataloader:
        outputs = model(imgs)              # output scores for the digits 0 to 9
        loss = loss_func(outputs, labels)  # compare with the labels to get the error
        optimizer.zero_grad()              # clear the gradients first to prevent accumulation across batches
        loss.backward()                    # backpropagation to compute the gradients
        optimizer.step()                   # update the parameters once

# Save the result of training (the model structure and its parameters)
torch.save(model, "my_cnn.nn")

Points to note:

  • The repeating pattern of the training loop
  • What the saved my_cnn.nn file contains (the model structure and its parameters)

Q&A

Model Validation

  • The model's accuracy on the test set
  • A demonstration of the accuracy on a single batch (see the sketch below)
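
The original post only shows screenshots of the results. A sketch of how the test-set accuracy could be computed, assuming the trained model and the test_dataloader sketched earlier:

import torch

model.eval()                          # switch to evaluation mode
correct, total = 0, 0
with torch.no_grad():                 # no gradients are needed for evaluation
    for imgs, labels in test_dataloader:
        preds = model(imgs).argmax(dim=1)          # the most likely digit for each image
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print(f"accuracy on the test set: {correct / total:.4f}")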

Summary

  1. The dataset is very important: problems encountered with the HTML handwriting pad (color, size) and how they were solved.
  2. Math knowledge: the math encountered during training, such as matrix multiplication.
  3. Why GPUs are needed and how to use them.
  4. The process of model training: convolution + pooling + full connection + loss function + optimizer.
  5. How does the training process for object detection differ from handwriting recognition?
    Image classification: LeNet, AlexNet, VGG, GoogLeNet
    Object detection: RCNN, Fast RCNN, Faster RCNN, YOLO, YOLOv2, SSD