The development of Artificial Intelligence began around 1950, and it has passed through many milestones and shifts since then. Neural network technology has evolved along the way, from the earliest single-layer perceptron to complex multi-layer and convolutional neural networks, steadily pushing the field forward, until the release of ChatGPT in 2022 finally captured the general public's attention.
Artificial Intelligence (AI) technology has gone through a long iterative process, but however it has changed, it has never escaped the influence of the earliest neural network model, the Perceptron. As an important milestone in the field of AI, the Perceptron is what this article focuses on.
Let's start by understanding the perceptron, and figure out once and for all what a neural network actually is.
01. Neural Networks: The Perceptron
Neural network technology draws its inspiration from neuroscience and tries to capture some of the unconscious processes behind what is sometimes called "fast perception", such as the human brain's automatic recognition of faces or speech.
In the late 1950s, psychologist Frank Rosenblatt was inspired by the way neurons in the human brain process information; a neuron is a cell in the brain that receives electrical or chemical input signals from other neurons connected to it.
Simply put, a neuron adds up all the input signals it receives from other neurons and is activated if it reaches a particular threshold level.
Importantly, a given neuron's connections (synapses) to other neurons have different strengths; when summing its input signals, the neuron gives less weight to inputs arriving over weaker connections and more weight to inputs arriving over stronger ones.
For computer scientists, the way a neuron processes information can be simulated by a computer program with multiple inputs and one output: a perceptron.
Analogy between a neuron and a perceptron: a neuron in the brain (A) and a simple perceptron (B)
A neuron with its dendrites (the structures that carry input signals into the cell), cell body, and axon (the output channel) is shown in figure (A); the simple structure of a perceptron is shown in figure (B).
Like a neuron, the perceptron adds up the input signals it receives; if the resulting sum is equal to or greater than the perceptron's threshold, the perceptron outputs 1 (activated), otherwise it outputs 0 (not activated).
To model the different connection strengths of neurons, Rosenblatt proposed assigning a weight to each of the perceptron's inputs, and multiplying each input by its weight before adding it into the sum.
The perceptron's threshold is a value set by the programmer, though it can also be learned by the perceptron itself; this will be explained later in this post.
In short, a perceptron is a simple program that makes a yes or no (output 1 or 0) decision based on whether the sum of weighted inputs satisfies a threshold.
In everyday life, you may make some decisions in a similar way.
For example, you might ask some of your friends how much they liked a particular movie, but you believe a few of them have better taste in movies than the rest, so you give their opinions more weight.
If the weighted total of your friends' enthusiasm is large enough (i.e., greater than some unconscious threshold of yours), you decide to go see the movie. If a perceptron had friends, this is how it would decide whether to see a movie.
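To make the weighted-sum-and-threshold idea concrete, here is a minimal sketch in Python; the ratings, weights, and threshold are made-up illustrative values, not anything from the original article.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 (activated) if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Three friends rate a movie from 0 (hated it) to 1 (loved it) ...
ratings = [0.9, 0.4, 0.8]
# ... and you trust the first friend's taste more than the others'.
weights = [0.6, 0.2, 0.4]

print(perceptron(ratings, weights, threshold=0.8))  # 1 -> go see the movie
```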
Disclaimer of originality, the original address of the article:
02. Image Recognition
Inspired by the networks of neurons in the brain, Rosenblatt proposed that networks of perceptrons could be applied to visual tasks such as face and object recognition. To understand how perceptron networks work, we will next explore how a perceptron can perform a specific visual task, for example, recognizing a handwritten digit like the one shown below.
We design the perceptron as an "8" detector: it outputs 1 if its input is an image of the digit 8, and 0 if the input image shows any other digit.
Designing such a detector requires that we first figure out how to convert an image into a set of numerical inputs, and then determine the weights and threshold that allow the perceptron to produce the correct output (1 for an 8, 0 for any other digit).
Note: each pixel in the 18×18-pixel image corresponds to one input of the perceptron, for a total of 324 (18×18) inputs.
Figure (A) above shows an enlarged handwritten 8 laid over a grid with 18 squares along the x-axis and 18 squares along the y-axis, giving 18×18 = 324 grid squares in total.
Each grid square (pixel) in the figure has an intensity value that can be expressed as a number - pixel intensity. In a black and white image, a pure white square has a pixel intensity of 255; a pure black square has a pixel intensity of 0; and a gray square has a pixel intensity somewhere in between.
The perceptron has 324 (18 × 18) inputs, each corresponding to a pixel intensity in the grid, while each input has its own weight.
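As a small sketch of that conversion (assuming, purely for illustration, that the image is already loaded as an 18×18 array of grayscale values):

```python
import numpy as np

# A hypothetical 18x18 grayscale image: 0 = pure black, 255 = pure white.
image = np.random.randint(0, 256, size=(18, 18))

# Flatten the grid into a single vector of 324 pixel intensities,
# one value per perceptron input. Scaling to the 0..1 range keeps
# the weighted sums (and therefore the weights) from growing too large.
inputs = image.flatten() / 255.0
print(inputs.shape)  # (324,)
```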
So how do we set the right weights and threshold for a given task? Once again, Rosenblatt had a brain-inspired answer: the perceptron should acquire these values through its own learning.
In behavioral psychology, positive and negative reinforcement can be used to train rats and pigeons to perform tasks.
A perceptron should be trained on samples in a similar way: rewarded when it produces the right behavior and punished when it makes mistakes. Today, this form of conditioned training is known in the field of artificial intelligence as supervised learning.
Disclaimer of originality, the original address of the article:
03. The Perceptron Learning Algorithm
During training, given a sample, the learning system produces an output, and is then given a "supervisory signal" that indicates how much the output deviates from the correct output, and the system adjusts its weights and thresholds accordingly.
Supervised learning usually requires a large number of positive samples (e.g., a collection of the number 8 written by different people) and negative samples (e.g., a collection of other handwritten numbers that do not include 8). Each sample is labeled by a person with a category - in this case, the categories "8" and "not 8" - and these labels are used as supervised signals.
The positive and negative samples used to train the system are called the training set; the remaining samples, the test set, are used to evaluate the system's performance after training, to see how well it answers correctly in general, not just on the training samples.
One of the most important terms in computer science is algorithm, which refers to a "recipe" of steps that a computer takes to solve a particular problem. Rosenblatt's primary contribution to artificial intelligence was his design of a specific algorithm, the perceptron-learning algorithm, by which a perceptron can be trained from samples to determine the weights and thresholds that will produce the correct answer.
Initially, the perceptron's weights and threshold are set to random numbers between -1 and 1. In our example, the weight of the first input might be set to 0.2, the weight of the second input to -0.6, and the threshold to 0.7. These initial values can easily be produced by a random number generator; any random weights and threshold will do before training begins.
Now training can begin. The first training sample is fed to the perceptron, which at this point knows nothing about the correct classification label. The perceptron multiplies each input by its weight, sums all the results, compares the sum to its threshold, and outputs 1 or 0: an output of 1 means the input is an 8, and an output of 0 means it is not.
Next, the output of the perceptron is compared to the human-labeled correct answer ("8" or "not 8"). If the answer given by the perceptron is correct, the weights and thresholds do not change, but if the perceptron is wrong, its weights and thresholds change to bring the answer given by the perceptron on this training sample closer to the correct answer.
In addition, the amount of change in each weight depends on the value of the input to which it relates, i.e., the assignment of "blame" for the error depends on which input is more or less influential.
Of the 324 pixels in the "8" above, a pixel with an intensity of 0 contributes nothing to the weighted sum, so its weight receives no adjustment at all, while a pixel at the full intensity of 255 has the largest effect.
Readers interested in this mathematical principle can consult the details of the calculations below:
From a mathematical point of view, the perceptron learning algorithm works as follows. For each weight w_j: w_j ← w_j + η(t - y)x_j, where t is the correct output (1 or 0), y is the perceptron's actual output for the given input, x_j is the input associated with weight w_j, η is a learning rate set by the programmer, and the arrow denotes an update. The threshold is folded in by adding an extra input x_0 that is always the constant 1, whose corresponding weight is w_0 = -threshold (known as the bias). The perceptron is then triggered only when the dot product of the input vector and the weight vector is greater than or equal to 0. In general, the input values are scaled down or otherwise transformed to prevent the weights from becoming too large.
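Here is a minimal sketch of that update rule in Python; the random toy data, learning rate, and initialization are illustrative assumptions, not the article's original setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 324                                     # one input per pixel
weights = rng.uniform(-1, 1, size=n_inputs + 1)    # +1 for the bias weight w_0 = -threshold
eta = 0.1                                          # learning rate

def predict(pixels, weights):
    """Fire (output 1) if the dot product of [1, pixels] and the weights is >= 0."""
    x = np.concatenate(([1.0], pixels))            # x_0 = 1 folds the threshold into w_0
    return 1 if np.dot(x, weights) >= 0 else 0

def update(pixels, target, weights, eta):
    """Apply w_j <- w_j + eta * (t - y) * x_j to every weight, bias included."""
    x = np.concatenate(([1.0], pixels))
    y = predict(pixels, weights)
    return weights + eta * (target - y) * x

# Toy training set: random "images" with made-up labels (1 = "8", 0 = "not 8").
train_images = rng.random((20, n_inputs))
train_labels = rng.integers(0, 2, size=20)

for epoch in range(50):                            # run through all samples many times
    for pixels, t in zip(train_images, train_labels):
        weights = update(pixels, t, weights, eta)
```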
The entire process is then repeated on the next training sample. The perceptron runs through this training process many times over all the training samples, and each time it makes an error it adjusts its weights and threshold slightly.
As the behavioral psychologist B. F. Skinner discovered when training pigeons, learning works better when it proceeds incrementally over a large number of trials; if the weights and threshold are changed too much in a single trial, the system may end up learning the wrong thing.
For example, it might become fixated on a spurious regularity, such as the top and bottom loops of an 8 always being exactly the same size. After many rounds of training on every training sample, the system eventually arrives at a set of weights and a threshold that give the correct answer on all the training samples. At that point, we can evaluate the perceptron on the test samples to see how it performs on images it has never been trained on.
Given enough training samples, the perceptron's recognition accuracy keeps improving; as the rate of correctly recognizing an "8" climbs to a certain level, the weights and threshold change less and less and eventually settle around equilibrium values.
At this point we can say that the system has been trained to recognize "8" with XX% accuracy.
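Continuing the toy sketch above, measuring that accuracy on the held-out test set might look like this (again with made-up data):

```python
def accuracy(images, labels, weights):
    """Fraction of samples on which the perceptron's output matches the human label."""
    correct = sum(predict(pixels, weights) == t for pixels, t in zip(images, labels))
    return correct / len(labels)

# Samples held back from training form the test set.
test_images = rng.random((10, n_inputs))
test_labels = rng.integers(0, 2, size=10)
print(f"test accuracy: {accuracy(test_images, test_labels, weights):.0%}")
```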
04. The "Subsymbolic" School
Rosenblatt proved mathematically that for a certain class of tasks, a perceptron can, in principle, learn to perform them accurately if it is trained sufficiently. However, because a perceptron's "knowledge" consists of a learned set of weights and a threshold, it is very difficult to uncover the rules the perceptron uses to perform a recognition task.
The weights and threshold of a perceptron do not represent specific concepts, and these numbers are hard to translate into meanings humans can understand. The situation is even more complicated in contemporary neural networks with millions of weights.
One might draw a rough analogy between a perceptron and the human brain. If I could open up your brain and watch some of its hundreds of billions of neurons firing, I would still have no clear idea of what you are thinking or what rules you used to reach a particular decision.
However, the human brain has produced language, which allows you to use symbols (words and phrases) to tell me what you are thinking, or why you are doing something.
In this sense, our neural activity can be thought of as subsymbolic; on top of it, the brain somehow creates symbols. By analogy with the brain's subsymbolic neural networks, perceptrons, as well as the more complex networks of simulated neurons that followed, have also been called "subsymbolic".
Proponents of this school argue that if artificial intelligence is to be achieved, it must emerge from neuron-like structures, much as intelligent symbol processing emerges from the brain.
But a single neuron-like structure such as the perceptron is clearly not enough. Can a machine learn to translate languages and texts given enough training samples? How can machines also walk, talk, and recognize people?
And so models such as multi-layer neural networks, convolutional neural networks, recurrent neural networks, and deep belief networks came rushing in.
05. What Is a Large Model?
Understanding the principles of the perceptron and the single neuron is the foundation for a better understanding of machine learning based on deep neural networks.
The large models we keep hearing about have tens or hundreds of billions of parameters; these parameters are simply the neural network's input weights and output thresholds, counted together.
Suppose a neuron has 9 input weights and 1 output threshold; then we say that this neuron has 10 parameters.
If we have 10 billion such neurons, our neural network model has 100 billion parameters, and that is what is meant by a large model with 100 billion parameters.
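That arithmetic, written out as a back-of-the-envelope check (the 9-weights-per-neuron figure is just the toy assumption above):

```python
weights_per_neuron = 9        # toy assumption from the example above
thresholds_per_neuron = 1
neurons = 10_000_000_000      # 10 billion neurons

params = neurons * (weights_per_neuron + thresholds_per_neuron)
print(f"{params:,}")          # 100,000,000,000 -> a "100-billion-parameter" model
```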
Pretty simple, isn't it? All the talk about large models with tens or hundreds of billions of parameters turns out to mean exactly this.
Let's extend this a little~
The perceptron (a single neuron) we discussed above can, with the right learning algorithm, exhibit a simple kind of intelligence, such as recognizing an individual digit.
Now imagine: if a single neuron can show a simple intelligent effect after learning, what kind of intelligence would 10 billion such neurons working together produce?
Each neuron only needs to remember a small piece of its own rules and recognize one tiny part of the whole; scale that up to 10 billion or 100 billion neurons and you arrive at what we now often hear called collective intelligence, that is, the emergence of intelligence!
A typical example of "emergent intelligence" in nature is the ant: an individual ant is a very simple creature, but a colony of ants together can build an extremely complex nest structure. (Search for it yourself if you are interested.)
What about the human brain? According to scientific estimates, the human brain contains about 100 billion neurons, and this huge number of neurons forms an extremely complex neural network, which is the basis of human intelligence.
So now you know why, even on a workday without any exercise, you still need to eat plenty of food: running that enormous network of neurons also consumes energy. AI burns electricity to replenish its energy, while we burn food to replenish ours (a bit of odd trivia 🤔).
06. A Thought-Provoking Question to Leave You With
Suppose the neural network we need to train has 1 billion neurons, and we train it to recognize whether an image shows a dog or a cat.
Each neuron is only responsible for recognizing a very small piece. For example, some neurons are responsible for recognizing the outline of the cat's face and others for recognizing the cat's eyes; their results are then passed down to another group of neurons in the next layer, which combine the outline results and the eye results to judge whether the image really is a cat.
Now imagine that, out of these billions of neurons, the final recognition result turns out to be wrong. How do we pinpoint which group of neurons misrecognized its piece and ultimately caused the whole result to be wrong?
Take the figure above as an example: each ⚪️ represents a neuron. The recognition result of a neuron is pushed to all the neurons downstream of it; those downstream neurons judge and process the inputs from their upstream neurons and then pass their own outputs on to all the neurons downstream of them in turn.
The network executes layer by layer in this way until, finally, the neuron responsible for the output produces a final result based on the information coming from upstream.
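As a sketch of that layer-by-layer flow (the layer sizes, random weights, and thresholds below are purely illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, weights, thresholds):
    """Each neuron in a layer fires (1) if its weighted input sum reaches its threshold."""
    return (weights @ x >= thresholds).astype(float)

# A tiny 3-layer network: 324 pixel inputs -> 16 neurons -> 8 neurons -> 1 output neuron.
sizes = [324, 16, 8, 1]
weights = [rng.uniform(-1, 1, (n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
thresholds = [rng.uniform(-1, 1, n_out) for n_out in sizes[1:]]

x = rng.random(324)                      # a made-up flattened input image
for W, b in zip(weights, thresholds):    # each layer's outputs feed the next layer
    x = layer(x, W, b)
print(x)                                 # the final output neuron's yes/no answer
```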
At this point, if the final output is wrong, how do we locate the neuron in the middle whose faulty recognition led to the incorrect final result?
Only by pinpointing, after the final result has been judged wrong, which neurons were responsible for the error can we adjust the weights of exactly those neurons and so improve the correctness of the final recognition result.
How can it be done? Think it over yourself, but don't rush to an answer: the problem is hard, so hard that at one point it stalled the progress of the entire field of neural networks for years.
Imagine a group of the world's brightest people being held up on this one problem and delaying the progress of AI for years, and you get a sense of what level of problem this is.
Of course, I will reveal the answer and explain the background in my next post. Interested readers, don't forget to follow our official account at the bottom of the article; more exciting content awaits!
07. Conclusion
AI development history (Image source: Wisdom Source Institute)
From its beginnings around 1950 to what it is today, AI has in fact been developing for only a few short decades.
For the general public, applications of AI have only become widespread in the last 20 years, alongside the domestic growth of the internet.
The earliest applications were mainly chatbots and customer-service bots based on NLP technology, which is also the field the blogger has been working in for the past few years. (People used to complain that the bots were stupid; now the talk is that AI is about to replace human beings. Public opinion really does change fast, haha.)
Breakthroughs then followed in Chinese-English translation, speech recognition, face recognition, and other technologies, and these breakthroughs are now widely used in our daily lives, for example in voice assistants, smart translation devices, and face-recognition payment systems.
However, most of those earlier breakthroughs were still confined to specific domains, and the models had relatively narrow applications: a translation bot can only translate once its training is complete, and a customer-service bot can only give you an accurate response within a specific context.
With the breakthrough of large language models such as OpenAI's ChatGPT, however, a new development path has emerged: through large-scale pre-training, a wide range of intelligent abilities can appear in a single model, which can chat, translate, solve math problems, write code, tutor writing, and even offer emotional counselling.
This multi-functional, human-like kind of AI model points to a new technological direction for the future of AI, and at the same time brings humans a new anxiety:
The deal was that AI would help humans sweep the floors and wash the dishes while humans wrote poems and painted! How is it that AI is now writing poems and painting while we humans are still sweeping floors and washing dishes? 😂
The blogger also found this bit of online banter pretty funny the first time he saw it, but once you think about it you realize that "human-machine symbiosis" is almost an inevitability of human development.
What young person leaves home without a cell phone these days? Who doesn't use a computer at work? Cell phones and computers are machines too.
We have been living alongside these machines for decades without even noticing, and far from lowering our standard of living, they have made life much more convenient.
When computers first became widespread they replaced some repetitive jobs, but they also provided new momentum for the rise of the software industry: software engineers who understood how computers work, product managers who used computers to gain insight into human needs, and white-collar workers who used software to boost their efficiency all moved up a level once computers became ubiquitous.
The arrival of new AI technology will inevitably go through the same process, so rather than worrying about the present, we should bet on the future. Human-machine symbiosis is bound to be the trend, and a large part of an individual's value may come from how smoothly they can cooperate with intelligent machines.
Only through a deep understanding of the principles of AI can we better master AI, so let's learn AI together, walk hand in hand, and better meet the arrival of the intelligent era!
Follow us and don't miss out on every bit of knowledge 🔔
If you liked this post, please follow, like, tap "in view", and forward it to more friends!