This is an internal sharing session to explain to project development colleagues what deep learning is. The simplest case, handwritten digit recognition, is used as an example to explain the general principles.
Handwriting Recognition
First, a demonstration of the digit recognition project. The project is implemented in the following steps:
- Train the model.
- Prepare an HTML handwriting pad
- Build a simple backend with the Flask framework
Essential Deep Learning Knowledge
Concepts of Machine Learning
A layman's explanation
The core of machine learning is to use the computing power of computers to discover patterns in a large amount of data, and then use those patterns to make predictions or judgments.
Classification of deep learning algorithms
Grouped by algorithm, deep learning applications fall into three main categories:
- Convolutional neural networks (CNN): commonly used for analyzing and processing image data
- Recurrent neural networks (RNN): commonly used for text analysis and natural language processing
- Generative adversarial networks (GAN): commonly used for data generation
The main applications of convolutional neural networks (CNN) fall into image classification, object detection, and semantic segmentation.
How Images Are Stored
An image is stored in a computer as a matrix of numbers: each entry is the brightness of one pixel.
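For example, a minimal sketch (using NumPy, not taken from the original material): a tiny 5×5 grayscale picture of the digit 1 is nothing more than a matrix of brightness values.

import numpy as np

# A tiny 5×5 grayscale "image" of the digit 1:
# each entry is a pixel brightness from 0 (black) to 255 (white)
img = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
])
print(img.shape)   # (5, 5) -- the picture is literally a matrix of numbers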
Generic steps for model training
The idea of model training:
- Prepare the dataset
- Construct a neural network model (defined as a class, object-oriented)
- Select a loss function and an optimizer
- Train the model
- The model produces predicted values
- The loss function measures the gap between the predicted values and the actual values
- The optimizer adjusts the parameters in the model to make the results more and more accurate
- Loop through the above steps
Loss function: a function that measures the deviation between the training results and the actual values; a larger value means a larger gap.
Optimizer: an algorithm that optimizes the model so that the loss function becomes smaller.
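As a toy illustration of how the loss function and optimizer cooperate (a minimal sketch with a single made-up parameter, not part of the handwriting project):

import torch

# Toy model with one parameter w; we want it to learn the rule y = 2x
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.05)        # the optimizer adjusts w
x, y_true = torch.tensor(3.0), torch.tensor(6.0)

for step in range(20):
    y_pred = w * x                   # value produced by the "model"
    loss = (y_pred - y_true) ** 2    # loss function: squared gap between prediction and truth
    optimizer.zero_grad()
    loss.backward()                  # compute how w should change
    optimizer.step()                 # adjust w so the loss shrinks
print(w.item())                      # close to 2.0 after the loop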
Q&A
PyTorch Handwritten Digit Recognition Explained
Model training is done with the PyTorch framework; the same can also be implemented with TensorFlow or Keras.
Data Set Acquisition
Handwriting recognition uses the MNIST dataset of handwritten digit images. MNIST consists of images of the handwritten digits 0 to 9, each 28 × 28 pixels. There are 70,000 images in total, of which 60,000 form the training set and 10,000 the test set. Each image shows a white digit on a black background.
PyTorch provides the torchvision package, which can be used to download the datasets.
import torchvision
import matplotlib.pyplot as plt

# Training dataset
train_data = torchvision.datasets.MNIST(
    root="data",       # save MNIST in the data folder
    download=True,     # download from the web; once downloaded, it will not be downloaded again
    train=True,        # this is the training dataset
    transform=torchvision.transforms.ToTensor()
                       # convert the data in the dataset into the Tensor type that PyTorch can use
)

# Test dataset
test_data = torchvision.datasets.MNIST(
    root="data",       # save MNIST in the data folder
    download=True,     # download from the web; after downloading once, it will not be repeated
    train=False,       # this is the test dataset
    transform=torchvision.transforms.ToTensor()
                       # convert the data in the dataset into the Tensor type that PyTorch can use
)
Demonstration
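A minimal sketch of such a demonstration (the index 0 and the layout are illustrative), showing the first training image and its label with matplotlib:

# Show the first image of the training set together with its label
img, label = train_data[0]             # img is a 1×28×28 Tensor, label is an int
plt.imshow(img.squeeze(), cmap="gray")
plt.title(f"label: {label}")
plt.show()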
Model Definition
The model uses a convolutional neural network. The network is defined as follows:
import torch.nn as nn

# Define the convolutional neural network class
class RLS_CNN(nn.Module):
    def __init__(self):
        super(RLS_CNN, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16,
                      # number of input and output channels; the output channels can be read
                      # as the number of features extracted
                      kernel_size=(3, 3),   # convolution kernel size
                      stride=(1, 1),        # how many pixels the kernel moves each step
                      padding=1),           # how many blank pixels are padded around the original image
            # Input image size is 1×28×28
            # After the first convolution, size is 16×28×28
            nn.MaxPool2d(kernel_size=2),    # first pooling, size 16×14×14
            nn.Conv2d(16, 32, 3, 1, 1),     # second convolution, size 32×14×14
            nn.MaxPool2d(2),                # second pooling, size 32×7×7
            nn.Flatten(),                   # flatten the 3-dimensional array into a 1-dimensional array
            nn.Linear(32*7*7, 16),          # fully connected layer: combine the 1568 values into 16 numbers
            nn.ReLU(),                      # activation function: y=0 for x<0, y=x for x>0; used with backpropagation
            nn.Linear(16, 10)               # map the 16 numbers to 10 scores, one for each digit 0-9
        )

    def forward(self, x):
        return self.net(x)
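To check that the sizes in the comments above are correct, a dummy 1×28×28 image can be pushed through the network (a quick sanity check, not part of the original code); this also creates the model instance used in the later steps:

import torch

model = RLS_CNN()
dummy = torch.randn(1, 1, 28, 28)   # a batch containing one fake 28×28 grayscale image
print(model(dummy).shape)           # torch.Size([1, 10]) -- one score per digit 0-9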
Convolutional Neural Network Model Components
Convolutional neural networks are usually composed of 3 parts: a convolutional layer, a pooling layer, and a fully connected layer. The function of each part:
- Convolutional Layer: responsible for extracting features from an image and can output many kinds of features from a single image.
- Pooling layer: reduces the size of the feature maps, drastically cutting the number of parameters and the amount of computation
- Fully connected layer: merge features and output results
Beauty-camera filters work on the same principle of extracting image features: in the example pictures below, the second picture blurs the outline and the third highlights it.
Convolution
Function of convolution: extract multiple kinds of feature information from an image.
Principle of convolution: a new matrix is obtained by multiplying a convolution kernel with the image matrix; the new matrix is a new feature.
Convolution kernel: the kernel is itself a matrix, usually 3×3 or 5×5. The procedure of the convolution operation is as follows:
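A minimal numeric sketch of that procedure (the values are made up for illustration): slide a 3×3 kernel over a 4×4 image; at each position, multiply element-wise and sum, producing one entry of the new feature matrix.

import numpy as np

image = np.array([[1, 2, 0, 1],
                  [0, 1, 3, 1],
                  [2, 1, 0, 0],
                  [1, 0, 1, 2]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])      # a 3×3 convolution kernel (9 parameters)

out = np.zeros((2, 2))               # 4×4 image, 3×3 kernel, stride 1 -> 2×2 result
for i in range(2):
    for j in range(2):
        patch = image[i:i+3, j:j+3]          # region currently covered by the kernel
        out[i, j] = (patch * kernel).sum()   # element-wise multiply and sum
print(out)                           # the new matrix = one extracted feature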
Image Edge Extraction
The edge (outline) features of an image can be extracted with a suitable convolution kernel, for example:
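The exact kernel from the original material is not reproduced here; a commonly used Laplacian-style edge kernel gives the idea. A minimal PyTorch sketch, assuming img is the 1×28×28 Tensor taken from train_data above:

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

edge_kernel = torch.tensor([[-1., -1., -1.],
                            [-1.,  8., -1.],
                            [-1., -1., -1.]]).reshape(1, 1, 3, 3)
img, _ = train_data[0]                        # a 1×28×28 Tensor
edges = F.conv2d(img.unsqueeze(0), edge_kernel, padding=1)
plt.imshow(edges.squeeze(), cmap="gray")      # bright pixels trace the digit's outline
plt.show()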
Note: a 3×3 convolution kernel contains 9 parameters. These parameters are learned by the model itself; "tuning the parameters" partly means adjusting the values in the kernel matrices so that the features they extract lead to more accurate predictions.
Pooling
Function of pooling: pooling shrinks the matrix, which reduces the number of parameters in the subsequent operations. A pooling layer is usually inserted between adjacent convolutional layers.
Principle of pooling: for example, max-pooling a 4×4 matrix into a 2×2 matrix takes the largest value in each corresponding region of the 4×4 matrix (see the sketch after the list below).
There are usually two types of pooling:
- Maximum pooling (max pooling): select the maximum value of the image region as the value of the region after pooling.
- Average pooling: Calculate the average value of an image region as the pooled value of the region.
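A minimal sketch of max pooling: a 4×4 matrix (values are illustrative) reduced to 2×2 by keeping the largest value of each 2×2 region:

import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 0.],
                  [4., 6., 1., 2.],
                  [0., 2., 5., 7.],
                  [1., 1., 3., 4.]]).reshape(1, 1, 4, 4)
pool = nn.MaxPool2d(kernel_size=2)   # keep the largest value of each 2×2 region
print(pool(x).squeeze())
# tensor([[6., 2.],
#         [2., 7.]])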
Full Connection
The role of the fully connected layer is to combine features and perform classification.
In the previous two steps, multiple features are extracted from a single image and the feature matrices are compressed. What arrives at the fully connected layer is therefore a set of features describing one image.
A single feature by itself cannot say what the whole picture is; relying on one would be like the blind men and the elephant. The fully connected layer combines the individual features into a complete representation and, based on it, computes how likely the picture is to belong to each class.
The final output of the fully connected layer is this likelihood. For handwritten digit recognition, it is a score for each of the digits 0-9:
tensor([[ 0.949, 3.032, 0.771, -2.173, -0.038, -0.236, 0.013, 0.614, -1.125, -2.6991]])
The principle of full connection
The fully connected layer combines the features; the principle is similar to convolution: a weight matrix is multiplied with the feature matrix, and the result is a one-dimensional array, i.e. the scores for the digits 0-9.
Note: the fully connected layer also works with weight matrices, so these matrices are part of the model's parameters as well, and parameter tuning includes them.
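A minimal sketch of how the last part of the network above turns the 32×7×7 feature maps into ten scores (softmax here is only for illustration, to turn the scores into probabilities; during training, CrossEntropyLoss applies it internally):

import torch
import torch.nn as nn

features = torch.randn(1, 32, 7, 7)       # pretend output of the last pooling layer
fc = nn.Sequential(
    nn.Flatten(),                         # 32×7×7 -> a 1-dimensional array of 1568 values
    nn.Linear(32 * 7 * 7, 16),
    nn.ReLU(),
    nn.Linear(16, 10),                    # 10 scores, one per digit
)
scores = fc(features)                     # similar to the tensor([[ 0.949, 3.032, ... ]]) above
probs = torch.softmax(scores, dim=1)      # probabilities for the digits 0-9, summing to 1
print(probs)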
Model Definition of Handwritten Digit Recognition
The convolutional neural network for handwritten digit recognition is analyzed below as the process convolution + pooling + full connection:
Q&A
Selection of loss functions and optimizers
Loss function: a function that measures the deviation between the training results and the actual values; a larger value means a larger gap.
Optimizer: an algorithm that keeps optimizing the model so that the loss function decreases.
The loss function and optimizer used in handwritten digit recognition are as follows:
import torch

# Cross-entropy loss function: one way of calculating the error value
loss_func = nn.CrossEntropyLoss()
# Optimizer: stochastic gradient descent algorithm; model is the RLS_CNN instance created above
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)
Loss function
Handwriting recognition uses the cross-entropy loss function. PyTorch offers 19 loss functions in total; an easier one to understand is the mean squared error loss function.
Optimizer
Handwriting recognition uses the stochastic gradient descent (SGD) algorithm, which applies the parameter updates computed by backpropagation. PyTorch offers 11 optimizers in total.
Model Training
The process of model training:
- Define the number of training epochs
- Iterate through the training set, call the model class to pass in the images and get the probability results
- Calculate the loss value through the loss function
- Parameter tuning via optimizer
- Save the model when training is complete
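The loop below reads mini-batches from train_dataloader; a minimal sketch of building it from the train_data downloaded earlier (the batch size of 64 is an assumption, not from the original material):

from torch.utils.data import DataLoader

# Wrap the training set so it can be iterated in shuffled mini-batches
train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)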
# Define the number of training epochs
cnt_epochs = 5  # train for 5 epochs

# Train for 5 epochs
for cnt in range(cnt_epochs):
    # Go through the entire training set once
    for imgs, labels in train_dataloader:
        outputs = model(imgs)              # scores for the digits 0 to 9
        loss = loss_func(outputs, labels)  # compare with the labels to get the error
        optimizer.zero_grad()              # clear the optimizer's gradients to prevent accumulation
        loss.backward()                    # backpropagation: compute the gradients
        optimizer.step()                   # update the parameters once

# Save the result of training (model structure and parameters)
torch.save(model, "my_cnn.nn")
Points to note:
- The pattern that emerges during training
- What the saved my_cnn.nn model file contains (see the reload sketch below)
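On the second point: torch.save(model, ...) stores the whole model object, i.e. both its structure and its parameters. A minimal sketch of reloading it later for prediction (the RLS_CNN class definition must still be importable):

import torch

model = torch.load("my_cnn.nn")   # on recent PyTorch versions weights_only=False may be required
model.eval()                      # switch to inference mode before predicting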
Q&A
Model Validation
- Accuracy of the model on the test set (a minimal sketch follows below)
- Demonstration of the model's accuracy on one batch
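A minimal sketch of measuring accuracy on the test set, assuming a test_dataloader built from test_data in the same way as train_dataloader:

import torch
from torch.utils.data import DataLoader

test_dataloader = DataLoader(test_data, batch_size=64)

correct, total = 0, 0
with torch.no_grad():                     # no gradients are needed for evaluation
    for imgs, labels in test_dataloader:
        outputs = model(imgs)
        preds = outputs.argmax(dim=1)     # the digit with the highest score
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"accuracy on the test set: {correct / total:.2%}")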
Summary
- The dataset is very important. Problems encountered with the HTML handwriting input (color, size) and how they were solved
- Math knowledge encountered during training: matrix multiplication
- Why do you need GPUs and how do you use them?
- The process of model training: convolution + pooling + full connection + loss function + optimizer
- How does the training process for object detection differ from handwriting recognition?
Image classification: LeNet, AlexNet, VGG, GoogLeNet
Target detection: RCNN, Fast RCNN, Faster RCNN, YOLO, YOLOv2, SSD