Mathematical Foundations of Machine Learning

One important difference between machine learning and traditional programming is that machine learning relies much more heavily on mathematics.
However, with the rapid development of machine learning and the many frameworks that have emerged, it seems that less and less math is needed to apply off-the-shelf libraries and frameworks to tasks such as data analytics, and working this way is becoming the norm.

In fact, off-the-shelf libraries and frameworks only simplify the task of building machine learning models.
If you want to tune and optimize training results, or transform and filter the training data, understanding the underlying mathematics will definitely help.

Machine learning models may look like a pile of unreadable symbols and formulas, but they are not that complex in nature; perhaps most people give up simply because they do not have the patience to learn the mathematical notation involved.
I think a minimal understanding of linear algebra is enough to read most of these formulas.

In this article, we try to present in a simple way the two most basic structures in machine learning (vectors and matrices) and their basic rules of arithmetic.

1. Vector

1.1 Definitions

The training data that machine learning deals with almost never has just a single attribute (i.e., each record consisting of only one number or one string).
Rather, each piece of data contains multiple attributes, such as meteorological data (containing temperature, humidity, wind direction, etc.), financial data (opening price, closing price, trading volume, etc.), and sales data (price, inventory, quantity sold, etc.).

To represent this multi-attribute, or multidimensional, data, vectors are the most appropriate structure.
A vector is several numbers arranged horizontally or vertically, where each number represents one attribute.
A vector is similar to a one-dimensional array in a programming language, and numpy stores it in the same way.
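
As a minimal sketch of this idea, the snippet below stores one multi-attribute record as a one-dimensional NumPy array. The weather values and the name `weather` are made up for illustration.

```python
import numpy as np

# Hypothetical weather record: temperature (°C), humidity (%), wind speed (m/s)
weather = np.array([21.5, 63.0, 4.2])

print(weather.ndim)   # 1, a one-dimensional array
print(weather.shape)  # (3,), three attributes
print(weather[0])     # 21.5, the first attribute (temperature)
```

Each position in the array plays the role of one attribute, so the order of the attributes has to stay the same for every record.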

1.2 Transpose

A vector can be written as a row or as a column.
For example: \(\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}\) or \(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\).
Whether a vector is written as a row or as a column depends largely on the calculations that follow; there is little essential difference.
The transpose operation converts between a row vector and a column vector.
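
A small sketch of the transpose in NumPy follows. It assumes the vector is stored as a two-dimensional array so that it has an explicit row or column orientation; the variable names are illustrative only.

```python
import numpy as np

row = np.array([[1, 2, 3]])  # shape (1, 3): a row vector
col = row.T                  # shape (3, 1): the same numbers as a column vector

print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)

# A plain 1-D array has no row/column orientation,
# so transposing it leaves the shape unchanged.
flat = np.array([1, 2, 3])
print(flat.T.shape)  # (3,)
```

Transposing twice returns the original vector, which is why the choice between row and column form is mostly a matter of convenience.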

1.3 Addition and subtraction

When adding or subtracting vectors, the vectors must have the same length and must both be row vectors or both be column vectors.

In short, adding or subtracting vectors means adding or subtracting the elements at corresponding positions.
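
The element-by-element rule can be sketched with NumPy as follows. The example vectors are arbitrary, and the final part shows what happens when the lengths differ.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = a + b  # elements: 11, 22, 33
diff = b - a   # elements: 9, 18, 27
print(total, diff)

# Vectors of different lengths cannot be added.
try:
    a + np.array([1, 2])
except ValueError as err:
    print("cannot add:", err)
```

One caveat specific to NumPy: adding a `(1, 3)` row to a `(3, 1)` column does not raise an error but broadcasts to a 3x3 array, so the mathematical rule that both vectors must have the same orientation is something you need to check yourself.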

1.4 Product operations

There are two product operations for vectors. One is between a vector and a numeric value, known as scalar multiplication;
the other is between two vectors, known as the inner product.

After scalar multiplication, the result is still a vector: each element of the vector is multiplied by the scalar.

After an inner product, the result is a single numeric value (a scalar).

The rule is to multiply the values at corresponding positions of the two vectors and then add up the products: \(x \cdot y = \sum_{i=1}^{n} x_i y_i\).
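
The two products can be compared side by side in a short NumPy sketch. The vectors `x` and `y` are arbitrary examples.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Vector times a number: every element is multiplied, the result is a vector.
scaled = 2 * x
print(scaled.tolist())  # [2, 4, 6]

# Inner product: multiply matching positions, then sum. The result is a scalar.
inner = np.dot(x, y)    # 1*4 + 2*5 + 3*6
print(inner)            # 32

# The same rule written out by hand, for comparison.
manual = sum(xi * yi for xi, yi in zip(x, y))
print(manual == inner)  # True
```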

1.5 Norm

A vector also has a norm operation (also called the modulus or magnitude), which quantifies a vector by converting it to a single numeric value.
This makes it easy to compare the magnitudes of different vectors.

The norm is written with double vertical bars, \(||x||\); the rule is to compute the inner product of the vector with itself and then take the square root: \(||x|| = \sqrt{x \cdot x} = \sqrt{\sum_{i=1}^{n} x_i^2}\).
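
As a quick check of this rule, the sketch below computes the norm (modulus) of a vector by hand as the square root of its inner product with itself, and compares it with NumPy's `np.linalg.norm`. The vector is chosen so the result is a round number.

```python
import numpy as np

x = np.array([3.0, 4.0])

# Inner product of x with itself, then the square root: sqrt(9 + 16)
norm_manual = np.sqrt(np.dot(x, x))

# NumPy's built-in norm (Euclidean norm by default for a 1-D array)
norm_builtin = np.linalg.norm(x)

print(norm_manual)   # 5.0
print(norm_builtin)  # 5.0
```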

2. Matrix

2.1 Definitions

A matrix can be viewed as a collection of row vectors or column vectors of the same length.
It is similar to a two-dimensional array in a programming language.

A matrix is structured as follows, with the data arranged in a rectangular array, which is what the original Chinese term for matrix (矩阵, literally "rectangular array") refers to.

\(A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}\)

This is a 3x4 matrix, that is, a matrix with 3 rows and 4 columns.
Note that the number of rows and the number of columns do not have to be the same; when they are the same, the matrix is also called a square matrix.

Similar to vectors, matrices can be transposed, and the transpose of a matrix also swaps rows and columns: \((A^T)_{ij} = A_{ji}\), so an \(M\times N\) matrix becomes an \(N\times M\) matrix.
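
A minimal NumPy sketch of a 3x4 matrix and its transpose follows. The entries are simply the numbers 1 to 12, chosen for illustration.

```python
import numpy as np

# A 3x4 matrix: 3 rows, 4 columns
A = np.arange(1, 13).reshape(3, 4)
print(A.tolist())   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(A.shape)      # (3, 4)

# The transpose swaps rows and columns, giving a 4x3 matrix.
At = A.T
print(At.shape)     # (4, 3)

# The element in row 0, column 1 of A sits in row 1, column 0 of the transpose.
print(A[0, 1], At[1, 0])  # 2 2
```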

2.2 Addition and subtraction

As with vectors, adding or subtracting matrices means adding or subtracting the elements at corresponding positions.
This requires that the two matrices involved have the same number of rows and the same number of columns: \((A + B)_{ij} = A_{ij} + B_{ij}\).

Matrix subtraction operations are similar.

Matrices with different numbers of rows or columns cannot be added or subtracted.
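
The sketch below illustrates matrix addition and subtraction in NumPy with two arbitrary 2x2 matrices, and then shows that a 2x2 matrix and a 3x2 matrix cannot be added.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

added = A + B       # element-wise sum
subtracted = B - A  # element-wise difference
print(added.tolist())       # [[11, 22], [33, 44]]
print(subtracted.tolist())  # [[9, 18], [27, 36]]

# A 3x2 matrix does not match the 2x2 shape, so the addition fails.
C = np.ones((3, 2))
try:
    A + C
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)    # False
```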

2.3 Product operations

Product operations on matrices are likewise divided into scalar multiplication and the inner product (usually called the matrix product, or matrix multiplication).
Scalar multiplication works as it does for vectors: each element of the matrix is multiplied by the scalar.

The inner product is slightly more complex and places a requirement on the matrices involved: the number of columns of the first matrix must equal the number of rows of the second matrix.
That is, if the inner product of matrices \(A\) and \(B\) can be computed, then \(A\) and \(B\) are \(M\times N\) and \(N\times K\) matrices, respectively.
Their inner product is an \(M\times K\) matrix whose element in row \(i\) and column \(j\) is the inner product of row \(i\) of \(A\) with column \(j\) of \(B\): \((AB)_{ij} = \sum_{k=1}^{N} A_{ik}B_{kj}\).
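
To make the shape rule concrete, here is a small sketch that multiplies a 2x3 matrix by a 3x2 matrix with NumPy's `@` operator. The matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2x3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # 3x2

# Multiplying a matrix by a number scales every element.
doubled = 2 * A
print(doubled.tolist())  # [[2, 4, 6], [8, 10, 12]]

# Matrix product: columns of A (3) match rows of B (3), result is 2x2.
C = A @ B
print(C.shape)           # (2, 2)
print(C.tolist())        # [[58, 64], [139, 154]]

# Each entry is a row of A times a column of B, e.g. the top-left one:
print(np.dot(A[0, :], B[:, 0]))  # 58 = 1*7 + 2*9 + 3*11
```

Note that `A * B` in NumPy is element-wise multiplication, not the matrix product described here, so `@` (or `np.matmul`) is the operator to use.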

2.4 Identity matrices and inverse matrices

There is an extremely important special kind of matrix known as the identity matrix (also called the unit matrix).
The identity matrix is a square matrix in which the elements on the main diagonal are 1 and all other elements are 0. For example: \(I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\)

Although the identity matrix is simple, it is not trivial: it plays an important role in matrix factorization and in operations such as Gaussian elimination.

If, for a square matrix \(A\), there exists a matrix \(B\) such that \(AB=I\), where \(I\) is the identity matrix,
then \(B\) is the inverse matrix of \(A\), and \(A\) is also the inverse matrix of \(B\); \(B\) is generally written as \(A^{-1}\).
That is: \(AA^{-1}=A^{-1}A=I\)
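
The relationship between a matrix, its inverse, and the identity matrix can be checked numerically. The sketch below uses `np.eye` and `np.linalg.inv` on an arbitrary invertible 2x2 matrix; comparisons use `np.allclose` because floating-point results may not be exact.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])  # determinant is 1, so A is invertible
I = np.eye(2)               # 2x2 identity matrix

# Multiplying by the identity matrix leaves A unchanged.
print(np.allclose(A @ I, A))      # True

# The inverse satisfies A A^{-1} = A^{-1} A = I.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, I))  # True
print(np.allclose(A_inv @ A, I))  # True
```

Not every square matrix has an inverse; for a matrix such as one with two identical rows, no such \(B\) exists.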

3. Summary

Vectors and matrices are two of the most used structures in machine learning, for example:

Linear regression model: \(f(x) = w_0+w_1x_1+w_2x_2+...+w_nx_n=w^Tx\) (with \(x_0 = 1\) prepended to \(x\))
Squared Euclidean distance used in clustering: \(d(X_i, C_j) = ||X_i - C_j||^2\)
L2 norm used for data regularization: \(\parallel x \parallel_2=\sqrt{\sum_{i=1}^m \mid x_{i} \mid^2}\)
And so on.
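
The three formulas above map directly onto the vector operations covered earlier. The sketch below evaluates each one with NumPy; all weights, features, and points are hypothetical values chosen for illustration, and the feature vector is extended with a leading 1 so that \(w_0\) acts as the intercept.

```python
import numpy as np

# Linear regression: f(x) = w^T x, with x_0 = 1 prepended for the intercept w_0
w = np.array([0.5, 2.0, -1.0])          # w_0, w_1, w_2 (hypothetical weights)
features = np.array([3.0, 4.0])         # x_1, x_2
x = np.concatenate(([1.0], features))   # [1, x_1, x_2]
prediction = w @ x                      # 0.5 + 2*3 - 1*4
print(prediction)                       # 2.5

# Squared Euclidean distance between a point and a cluster center
point = np.array([1.0, 2.0])
center = np.array([4.0, 6.0])
sq_dist = np.dot(point - center, point - center)  # (-3)^2 + (-4)^2
print(sq_dist)                          # 25.0

# L2 norm: square root of the sum of squared elements
l2 = np.sqrt(np.sum(features ** 2))     # sqrt(9 + 16)
print(l2)                               # 5.0
```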

Looking closely at the formulas in machine learning models, most of them are built from operations on vectors and matrices, including addition and subtraction, scalar multiplication, inner products, and so on.
The reason they feel difficult is that almost all the calculations we do in everyday life are scalar operations, so the arithmetic of vectors and matrices is unfamiliar.
Add to that a lack of familiarity with the mathematical symbols, and the whole thing reads like an indecipherable script.

In fact, a vector can be thought of as a special matrix: a row vector is a \(1\times N\) matrix,
and a column vector is an \(N\times 1\) matrix.
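
One way to see this in practice is to reshape two vectors into a \(1\times N\) and an \(N\times 1\) matrix and multiply them as matrices. In the sketch below (arbitrary example values), the resulting 1x1 matrix holds the same number as the vector inner product.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

row = x.reshape(1, -1)  # 1x3 matrix (row vector)
col = y.reshape(-1, 1)  # 3x1 matrix (column vector)

# (1x3) times (3x1) gives a 1x1 matrix.
product = row @ col
print(product.shape)    # (1, 1)
print(product[0, 0])    # 32

# Its single entry equals the inner product of the two vectors.
print(product[0, 0] == np.dot(x, y))  # True
```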

Finally, I will leave you with a small question.
Q: Vectors and matrices have addition, subtraction, and multiplication operations, but no operation corresponding to division. Why?