
Forward Differentiation and Inverse Automatic Differentiation Computation of Composite Functions



About

  • First published: 2024-09-13
  • References:
    • /2016-12-30/reverse-mode-automatic-differentiation
    • Calculus Early Transcendentals 9e - James Stewart (2020)
    • /wiki/Automatic_differentiation
  • Please do not hesitate to point out any mistakes.

Forward and reverse automatic differentiation: the mathematics

Let's first review the differentiation rules from calculus.

Review of differentiation rules

Product rule

\[f(x) = u(x) \times v(x) \]

\[\begin{aligned} f'(x) &= u'v + v'u \\ \frac{df}{dx} &= \frac{du}{dx} v + \frac{dv}{dx} u \end{aligned} \]

\[\begin{aligned} f(x)&=(3 x-5) \times(4 x+7) \\ u&=3 x-5 \quad v=4 x+7 \\ u^{\prime}&=3 \quad v^{\prime}=4 \\ f^{\prime}(x)&=3(4 x+7)+4(3 x-5) \\ &=12 x+21+12 x-20 \\ &=24 x+1 \end{aligned} \]
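As a quick numerical sanity check of the result above (a sketch; the `numeric_derivative` helper is our own, not from the text):

```python
def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: (3 * x - 5) * (4 * x + 7)
f_prime = lambda x: 24 * x + 1  # result derived via the product rule above

for x in (0.0, 1.5, -2.0):
    assert abs(numeric_derivative(f, x) - f_prime(x)) < 1e-4
```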

Quotient rule

\[f(x) = \frac{u(x)}{v(x)} \]

\[\begin{aligned} f'(x) &= \frac{u'v - v'u}{v^2} \\ \frac{df}{dx} &= \frac{\frac{du}{dx}v - \frac{dv}{dx}u}{v^2} \end{aligned} \]

\[\begin{aligned} f(x)&=\frac{3 x-5}{4 x+7} \\ u&=3 x-5 \quad v=4 x+7 \\ u^{\prime}&=3 \quad v^{\prime}=4 \\ f^{\prime}(x)&=\frac{3(4 x+7)-4(3 x-5)}{(4 x+7)^2} \\ &=\frac{12 x+21-12 x+20}{(4 x+7)^2} \\ &=\frac{41}{(4 x+7)^2} \end{aligned} \]

Derivatives of sin and cos

\[\begin{aligned} y &= \sin(x) \\ \frac{dy}{dx} &= \cos(x) \end{aligned} \]

\[\begin{aligned} y &= \cos(x) \\ \frac{dy}{dx} &= -\sin(x) \end{aligned} \]

Chain rule (univariate composite function)

\[y = f(u) \quad u = g(x) \]

\[\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \]

\[\begin{aligned} y&=(2 x+4)^3 \\ y&=u^3 \text { and } u=2 x+4 \\ \frac{d y}{d u}&=3 u^2 \quad \frac{d u}{d x}=2 \\ \frac{d y}{d x}&=3 u^2 \times 2=2 \times 3(2 x+4)^2 \\ &=6(2 x+4)^2 \end{aligned} \]
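The same kind of numerical check works for the chain-rule example (again a sketch with our own helper function):

```python
def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

y = lambda x: (2 * x + 4) ** 3
dy_dx = lambda x: 6 * (2 * x + 4) ** 2  # result derived via the chain rule above

assert abs(numeric_derivative(y, 1.0) - dy_dx(1.0)) < 1e-3
```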

Multivariate chain rule (Case 1)

\[\begin{aligned} z &= f(x,y) \\ x &= g(t) \\ y &= h(t) \\ \end{aligned} \]

\[\frac{d z}{d t}=\frac{\partial f}{\partial x} \frac{d x}{d t}+\frac{\partial f}{\partial y} \frac{d y}{d t} \]

Multivariate chain rule (Case 2)

\[\begin{aligned} z &= f(x,y) \\ x & = g(s,t) \\ y &= h(s,t) \end{aligned} \]

\[\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x} \frac{\partial x}{\partial s}+\frac{\partial z}{\partial y} \frac{\partial y}{\partial s} \quad \frac{\partial z}{\partial t}=\frac{\partial z}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial z}{\partial y} \frac{\partial y}{\partial t} \]

When computing \(\frac{\partial z}{\partial s}\), we hold \(t\) fixed and compute the ordinary derivative of \(z\) with respect to \(s\); that is, we apply the multivariate chain rule (Case 1). The same reasoning applies when computing \(\frac{\partial z}{\partial t}\).
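Case 2 can be verified numerically with a toy choice of \(f\), \(g\), \(h\) (all our own, not from the text): take \(z = xy\), \(x = s + 2t\), \(y = st\), so Case 2 predicts \(\partial z/\partial s = y \cdot 1 + x \cdot t\).

```python
# Toy instance of Case 2 (our own choice of functions):
#   z = x * y,  x = s + 2t,  y = s * t
s, t, h = 1.3, 0.7, 1e-6
z = lambda s, t: (s + 2 * t) * (s * t)

x, y = s + 2 * t, s * t
predicted = y * 1 + x * t                        # chain rule, Case 2
numeric = (z(s + h, t) - z(s - h, t)) / (2 * h)  # hold t fixed, vary s
assert abs(predicted - numeric) < 1e-6
```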

Multivariate chain rule (generalized version)

\[\begin{aligned} u &= f(x_1, x_2, \ldots, x_n) \\ x_k &= g(t_1, t_2, \ldots, t_m) \qquad \text{for } 1 \leq k \leq n \end{aligned} \]

\[\begin{aligned} &\frac{\partial u}{\partial t_i}=\frac{\partial u}{\partial x_1} \frac{\partial x_1}{\partial t_i}+\frac{\partial u}{\partial x_2} \frac{\partial x_2}{\partial t_i}+\cdots+\frac{\partial u}{\partial x_n} \frac{\partial x_n}{\partial t_i} \end{aligned} \qquad \text{for } 1 \leq i \leq m \]

Composite functions, partial differentiation, the chain rule, and forward and reverse automatic differentiation

Order of computation: forward vs. reverse

For the composite function:

\[\begin{aligned} y & =f(g(h(x)))=f\left(g\left(h\left(w_0\right)\right)\right)=f\left(g\left(w_1\right)\right)=f\left(w_2\right)=w_3 \\ w_0 & =x \\ w_1 & =h\left(w_0\right) \\ w_2 & =g\left(w_1\right) \\ w_3 & =f\left(w_2\right)=y \end{aligned} \]

The chain rule gives:

\[\begin{aligned} \frac{\partial y}{\partial x}&=\frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x}=\frac{\partial f\left(w_2\right)}{\partial w_2} \frac{\partial g\left(w_1\right)}{\partial w_1} \frac{\partial h\left(w_0\right)}{\partial x} \end{aligned} \]

Calculation order:

  • Forward differentiation computes \(\partial w_1 / \partial x\) first, then \(\partial w_2/\partial w_1\), and finally \(\partial y / \partial w_2\).
  • Reverse differentiation computes \(\partial y / \partial w_2\) first, then \(\partial w_2/\partial w_1\), and finally \(\partial w_1 / \partial x\).

Forward differentiation

For the composite function:

\[\begin{aligned} r &= ? \\ s &= ? \\ t &= ? \\ x &= g(r,s,t) \\ y & = h(r,s,t) \\ z &= i(r,s,t) \\ u &= f(x,y,z) \end{aligned} \]

Forward differentiation computes:

\[\begin{aligned} \frac{\partial r}{\partial v} &= ? \\ \frac{\partial s}{\partial v} &= ? \\ \frac{\partial t}{\partial v} &= ? \\ \\ \frac{\partial x}{\partial v} &= \frac{\partial x}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial x}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial x}{\partial t}\frac{\partial t}{\partial v} \\ \frac{\partial y}{\partial v} &= \frac{\partial y}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial y}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial y}{\partial t}\frac{\partial t}{\partial v} \\ \frac{\partial z}{\partial v} &= \frac{\partial z}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial z}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial z}{\partial t}\frac{\partial t}{\partial v} \\ \\ \frac{\partial u}{\partial v}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial v}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial v}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial v} \end{aligned} \]

When \(v=r\), i.e., taking \(r\) as the independent variable while holding \(s\) and \(t\) fixed, we get

\[\begin{aligned} \frac{\partial r}{\partial v} &= 1 \\ \frac{\partial s}{\partial v} &= 0 \\ \frac{\partial t}{\partial v} &= 0 \\ \frac{\partial u}{\partial r}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial r}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial r}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial r} \end{aligned} \]

When \(v=s\), i.e., taking \(s\) as the independent variable while holding \(r\) and \(t\) fixed, we get

\[\begin{aligned} \frac{\partial r}{\partial v} &= 0 \\ \frac{\partial s}{\partial v} &= 1 \\ \frac{\partial t}{\partial v} &= 0 \\ \frac{\partial u}{\partial s}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial s}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial s}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial s} \end{aligned} \]

When \(v=t\), i.e., taking \(t\) as the independent variable while holding \(r\) and \(s\) fixed, we get

\[\begin{aligned} \frac{\partial r}{\partial v} &= 0 \\ \frac{\partial s}{\partial v} &= 0 \\ \frac{\partial t}{\partial v} &= 1 \\ \frac{\partial u}{\partial t}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial t}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial t} \end{aligned} \]

Reverse differentiation

For the composite function:

\[\begin{aligned} u_1 &= r(x_1, x_2) \\ u_2 &= s(x_1, x_2) \\ y_1 &= f(u_1, u_2) \\ y_2 &= g(u_1, u_2) \\ y_3 &= h(u_1, u_2) \end{aligned} \]

Reverse differentiation computes:

\[\begin{aligned} \frac{\partial s}{\partial y_1} &= ? \\ \frac{\partial s}{\partial y_2} &= ? \\ \frac{\partial s}{\partial y_3} &= ? \\ \\ \frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_1} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_1} \\ \frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_2} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_2} \\ \\ \frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\ \frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2} \end{aligned} \]

One can imagine introducing a function \(s=function(y_1,y_2,y_3)\).

When \(s=y_1\), i.e., taking \(y_1\) as the independent variable while holding \(y_2\) and \(y_3\) fixed, we get

\[\begin{aligned} \frac{\partial s}{\partial y_1} &= 1 \\ \frac{\partial s}{\partial y_2} &= 0 \\ \frac{\partial s}{\partial y_3} &= 0 \\ \\ \frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1}\\ \frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} \\ \\ \frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\ \frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2} \end{aligned} \]

Automatic differentiation by example

Example

Suppose there are 2 input variables (\(x_1\), \(x_2\)) and 2 output variables (\(y_1\), \(y_2\)):

\[\begin{aligned} m_1 &= x_1 \cdot x_2 + \sin(x_1) \\ m_2 &= 4x_1 + 2x_2 + \cos(x_2) \\ y_1 &= m_1 + m_2 \\ y_2 &= m_1 \cdot m_2 \end{aligned} \tag{1} \]

That is:

\[\begin{aligned} y_1 &= x_1 \cdot x_2 + \sin(x_1) + 4x_1 + 2x_2 + \cos(x_2) \\ y_2 &= (x_1 \cdot x_2 + \sin(x_1)) \cdot (4x_1 + 2x_2 + \cos(x_2)) \end{aligned} \]

where:

\[\begin{aligned} \frac{\partial y_1}{\partial x_1} &= x_2 + \cos(x_1) + 4 \\ \frac{\partial y_1}{\partial x_2} &= x_1 + 2 - \sin(x_2) \\ \frac{\partial y_2}{\partial x_1} &= (x_2 + \cos(x_1)) \cdot m_2 + 4 \cdot m_1 \\ \frac{\partial y_2}{\partial x_2} &= x_1 \cdot m_2 + (2 - \sin(x_2)) \cdot m_1 \end{aligned} \]

Next, we use this example to illustrate forward and reverse automatic differentiation.

Forward automatic differentiation

We will use the following chain rule:

\[\begin{aligned} \frac{\partial w}{\partial t} &= \sum_i \left(\frac{\partial w}{\partial u_i} \cdot \frac{\partial u_i}{\partial t}\right) \\ &= \frac{\partial w}{\partial u_1} \cdot \frac{\partial u_1}{\partial t} + \frac{\partial w}{\partial u_2} \cdot \frac{\partial u_2}{\partial t} + \cdots \end{aligned} \]

where:

  • \(w\) represents an output variable
    • In the example, \(y_1\) or \(y_2\)
  • \(u_i\) represents the variables that directly affect \(w\)
    • In the example, \(m_1\) and \(m_2\)
  • \(t\) represents the input variable that is yet to be specified
    • In the example, one of \(x_1\) or \(x_2\)

Before calculating, we break Eq. (1) down into elementary arithmetic operations:

\[\begin{aligned} x_1 &= ? \\ x_2 &= ? \\ \\ a &= x_1 \cdot x_2 \\ b &= \sin(x_1) \\ \\ c &= 4x_1 + 2x_2 \\ d &= \cos(x_2) \\ \\ m_1 &= a + b \\ m_2 &= c + d \\ \\ y_1 &= m_1 + m_2 \\ y_2 &= m_1 \cdot m_2 \end{aligned} \tag{2} \]
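Decomposition (2) translates directly into code; a minimal sketch (the function name `evaluate` is our own):

```python
import math

def evaluate(x1, x2):
    # The elementary operations of decomposition (2), in order
    a = x1 * x2
    b = math.sin(x1)
    c = 4 * x1 + 2 * x2
    d = math.cos(x2)
    m1 = a + b
    m2 = c + d
    y1 = m1 + m2
    y2 = m1 * m2
    return y1, y2
```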

Now we differentiate each variable with respect to a yet-to-be-specified variable \(t\):

\[\begin{aligned} \frac{\partial x_1}{\partial t} &= ? \\ \frac{\partial x_2}{\partial t} &= ? \\ \\ \frac{\partial a}{\partial t} &= x_2\frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} \\ \frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} \\ \\ \frac{\partial c}{\partial t} &= 4\frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} \\ \frac{\partial d}{\partial t} &= -\sin(x_2)\frac{\partial x_2}{\partial t} \\ \\ \frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} \\ \frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} \\ \\ \frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} \\ \frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1 \end{aligned} \]

As mentioned, \(t\) has not yet been specified; now we specify it:

  • Substituting \(t=x_1\) into the equations above gives \(\frac{\partial x_1}{\partial t} = 1\) and \(\frac{\partial x_2}{\partial t}=0\), from which \(\frac{\partial y_1}{\partial x_1}\) and \(\frac{\partial y_2}{\partial x_1}\) can be computed:

\[\begin{aligned} \frac{\partial x_1}{\partial t} &= 1 \\ \frac{\partial x_2}{\partial t} &= 0 \\ \\ \frac{\partial a}{\partial t} &= x_2\frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} = x_2 \\ \frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} = \cos(x_1) \\ \\ \frac{\partial c}{\partial t} &= 4\frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} = 4 \\ \frac{\partial d}{\partial t} &= -\sin(x_2)\frac{\partial x_2}{\partial t} = 0\\ \\ \frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} = x_2 + \cos(x_1) \\ \frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} = 4 \\ \\ \frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} = x_2 + \cos(x_1) + 4 \\ \frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1 = (x_2 + \cos(x_1)) \cdot m_2 + 4 \cdot m_1 \end{aligned} \]

  • Substituting \(t=x_2\) into the equations above gives \(\frac{\partial x_1}{\partial t} = 0\) and \(\frac{\partial x_2}{\partial t}=1\), from which \(\frac{\partial y_1}{\partial x_2}\) and \(\frac{\partial y_2}{\partial x_2}\) can be computed.
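The forward pass above amounts to propagating a (value, tangent) pair through decomposition (2); a minimal sketch (the function `forward_ad` and its seeding convention are our own):

```python
import math

def forward_ad(x1, x2, dx1, dx2):
    # Propagate (value, tangent) pairs through decomposition (2).
    # Seed (dx1, dx2) = (1, 0) to get d/dx1, or (0, 1) to get d/dx2.
    a,  da  = x1 * x2,         x2 * dx1 + x1 * dx2
    b,  db  = math.sin(x1),    math.cos(x1) * dx1
    c,  dc  = 4 * x1 + 2 * x2, 4 * dx1 + 2 * dx2
    d,  dd  = math.cos(x2),    -math.sin(x2) * dx2
    m1, dm1 = a + b,           da + db
    m2, dm2 = c + d,           dc + dd
    y1, dy1 = m1 + m2,         dm1 + dm2
    y2, dy2 = m1 * m2,         dm1 * m2 + dm2 * m1
    return (y1, y2), (dy1, dy2)

# One pass per input variable: seed t = x1, then t = x2
_, (dy1_dx1, dy2_dx1) = forward_ad(1.0, 2.0, 1.0, 0.0)
_, (dy1_dx2, dy2_dx2) = forward_ad(1.0, 2.0, 0.0, 1.0)
```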

It can be inferred:

  • With \(n\) input variables (two in this example), the above formulas must be evaluated \(n\) times.
  • If the input to a neural network is a 1280 × 720 image and the output is 51 floating-point numbers, forward differentiation requires 921,600 passes, one per input pixel.

Reverse automatic differentiation

We will use the following chain rule:

\[\begin{aligned} \frac{\partial s}{\partial u} &= \sum_i \left(\frac{\partial w_i}{\partial u} \cdot \frac{\partial s}{\partial w_i}\right) \\ &= \frac{\partial w_1}{\partial u} \cdot \frac{\partial s}{\partial w_1} + \frac{\partial w_2}{\partial u} \cdot \frac{\partial s}{\partial w_2} + \cdots \end{aligned} \]

where:

  • \(u\) represents an input variable
  • \(w_i\) represents the output variables that depend on \(u\)
  • \(s\) represents the variable that is yet to be specified

Recall the decomposition (2) into elementary operations:

\[\begin{aligned} x_1 &= ? \\ x_2 &= ? \\ \\ a &= x_1 \cdot x_2 \\ b &= \sin(x_1) \\ \\ c &= 4x_1 + 2x_2 \\ d &= \cos(x_2) \\ \\ m_1 &= a + b \\ m_2 &= c + d \\ \\ y_1 &= m_1 + m_2 \\ y_2 &= m_1 \cdot m_2 \end{aligned} \tag{2} \]

Now compute the reverse-mode derivatives:

\[\begin{aligned} \frac{\partial s}{\partial y_1} &= ? \\ \frac{\partial s}{\partial y_2} &= ? \\ \\ \frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_1} \\ \frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_2} \\ \\ \frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} \\ \frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} \\ \frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} \\ \frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} \\ \\ \frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} \\ \frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2} \end{aligned} \]

When \(s=y_1\):

\[\begin{aligned} \frac{\partial s}{\partial y_1} &= 1 \\ \frac{\partial s}{\partial y_2} &= 0 \\ \\ \frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_1} = 1 \\ \frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_2} = 1 \\ \\ \frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} = 1 \\ \frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} = 1 \\ \frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} = 1 \\ \frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} = 1 \\ \\ \frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} = 1 \cdot x_2 + 1 \cdot \cos(x_1) + 1 \cdot 4 = x_2 + \cos(x_1) + 4 \\ \frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2} = 1 \cdot x_1 + 1 \cdot 2 + 1 \cdot (-\sin(x_2)) = x_1 + 2 -\sin(x_2) \end{aligned} \]

Similarly, one can compute the case \(s=y_2\).
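The whole reverse pass, including the primal (forward) evaluation it relies on, can be sketched as follows (the function `reverse_ad` and its seeding convention are our own):

```python
import math

def reverse_ad(x1, x2, sy1, sy2):
    # Primal pass: evaluate decomposition (2)
    a = x1 * x2
    b = math.sin(x1)
    c = 4 * x1 + 2 * x2
    d = math.cos(x2)
    m1 = a + b
    m2 = c + d
    y1 = m1 + m2
    y2 = m1 * m2
    # Reverse pass: seed (sy1, sy2) = (1, 0) for s = y1, or (0, 1) for s = y2
    sm1 = sy1 * 1 + sy2 * m2   # from y1 = m1 + m2 and y2 = m1 * m2
    sm2 = sy1 * 1 + sy2 * m1
    sa = sm1                   # from m1 = a + b
    sb = sm1
    sc = sm2                   # from m2 = c + d
    sd = sm2
    sx1 = sa * x2 + sb * math.cos(x1) + sc * 4
    sx2 = sa * x1 + sc * 2 + sd * (-math.sin(x2))
    return (y1, y2), (sx1, sx2)

# One pass per output variable: seed s = y1, then s = y2
_, (dy1_dx1, dy1_dx2) = reverse_ad(1.0, 2.0, 1.0, 0.0)
_, (dy2_dx1, dy2_dx2) = reverse_ad(1.0, 2.0, 0.0, 1.0)
```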

It can be inferred:

  • With \(n\) output variables (two in this example), the above formulas must be evaluated \(n\) times.
  • If the input to a neural network is a 1280 × 720 image and the output is 51 floating-point numbers, reverse differentiation requires only 51 passes, one per output.