\( \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb{F}} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end {bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix} #1\end{vmatrix}} \newcommand{\lims}{\mathop{\overline{\lim}}} \newcommand{\limi}{\mathop{\underline{\lim}}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\limsn}{\lims_{n\to\infty}} \newcommand{\limin}{\limi_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right \rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\toto}{\rightrightarrows} \newcommand{\upto}{\nearrow} \newcommand{\downto}{\searrow} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \newcommand{\bm}{\boldsymbol} \newcommand{\cupp}{\bigcup} \newcommand{\capp}{\bigcap} \newcommand{\sqcupp}{\bigsqcup} \newcommand{\re}{\mathop{\mathrm{Re}}} \newcommand{\im}{\mathop{\mathrm{Im}}} \newcommand{\comma}{\text{,}} \newcommand{\foot}{\text{。}} \)

Sunday, September 27, 2020

Formulas Revisited

When I studied Andrew Ng's course I got used to notation such as $dZ^{[\ell]}$, $dA$, etc. Recently I revisited the topic and have since become used to the notation \[u_i^{[\ell]} = W^{[\ell]}_{i:}\cdot y^{[\ell-1]} + b_i^{[\ell]}\quad\text{and}\quad y^{[\ell]} = \Phi^{[\ell]}(u^{[\ell]}),\] so I want to record the corresponding formulas for computation. Since the notation $dW$ doesn't look any cleaner than $\displaystyle\frac{\partial \mathcal L}{\partial W}$, in what follows I write everything explicitly.

Here $U^{[\ell]}$ and $Y^{[\ell]}$ stack the $m$ training examples as columns. The factor $\frac{1}{m}$ in the formulas for $W^{[\ell]}$ and $b^{[\ell]}$ means that $\displaystyle\frac{\partial \mathcal L}{\partial U^{[\ell]}}$ holds the per-example gradients, with the cost being the average over the $m$ examples.

\[ \boxed{\frac{\partial \mathcal L}{\partial W^{[\ell]}} = \frac{1}{m}\cdot \frac{\partial \mathcal L }{\partial U^{[\ell]}}\cdot Y^{[\ell-1]T}} \] \[ \boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \brac{ W^{[\ell +1]T} \cdot \frac{\partial \mathcal L}{\partial U^{[\ell+1]}} }* \Phi^{[\ell]}{}'(U^{[\ell]})} \] where $*$ denotes the entrywise product of matrices. \[ \boxed{\frac{\partial \mathcal L}{ \partial Y^{[\ell - 1]}} = W^{[\ell]T}\cdot \frac{\partial \mathcal L}{\partial U^{[\ell]}}} \]

Substituting the last formula, with $\ell$ replaced by $\ell+1$, into the one before it yields the following, where $\ell<L$ is a hidden layer and $\Phi^{[\ell]}:\R\to \R$ is the activation function of the $\ell$-th layer, applied entrywise: \[ \boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \frac{\partial \mathcal L}{\partial Y^{[\ell]}} * \Phi^{[\ell]}{}'(U^{[\ell]})} \] and finally \[ \boxed{\frac{\partial \mathcal L}{\partial b^{[\ell]} } =\frac{1}{m}\cdot \sum_{i=1}^m \frac{\partial \mathcal L}{\partial u^{[\ell](i)}} =\frac{1}{m}\cdot \text{np.sum}\brac{\frac{\partial \mathcal L}{\partial U^{[\ell]}},\text{axis = 1}}} \]

For the derivation of these formulas, see my other post: https://checkerlee.blogspot.com/2019/11/important-formulas-in-backward.html#more
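The boxed formulas above can be sketched in NumPy roughly as follows. This is only a minimal illustration under assumptions that are not in the post: every layer uses a sigmoid as $\Phi^{[\ell]}$, the per-example loss is the squared error $\frac12\|y^{[L]}-t\|^2$, the cost is its average over the $m$ columns, and the names `forward`, `backward`, `cost`, `sigmoid` and `sigmoid_prime` are hypothetical helpers chosen for this example.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

def forward(params, Y0):
    """Compute U^[l] = W^[l] Y^[l-1] + b^[l] and Y^[l] = Phi(U^[l]) for every layer.

    params is a list of (W, b) pairs; Y0 has one training example per column.
    Assumption: Phi is a sigmoid at every layer.
    """
    Us, Ys = [], [Y0]
    for W, b in params:
        U = W @ Ys[-1] + b          # b has shape (n_l, 1) and broadcasts over the m columns
        Us.append(U)
        Ys.append(sigmoid(U))
    return Us, Ys

def cost(params, Y0, T):
    """Average over the m examples of the assumed loss 0.5 * ||y^[L] - t||^2."""
    _, Ys = forward(params, Y0)
    m = Y0.shape[1]
    return 0.5 * np.sum((Ys[-1] - T) ** 2) / m

def backward(params, Y0, T):
    """Return a list of (dW, db) pairs, one per layer, using the boxed formulas."""
    Us, Ys = forward(params, Y0)
    m = Y0.shape[1]
    dY = Ys[-1] - T                  # per-example dL/dY^[L] for the assumed squared error
    grads = []
    for k in reversed(range(len(params))):
        W, _ = params[k]             # params[k] is layer k+1; its input is Ys[k]
        dU = dY * sigmoid_prime(Us[k])                 # dL/dU = dL/dY * Phi'(U), entrywise
        dW = (dU @ Ys[k].T) / m                        # dL/dW = (1/m) dL/dU Y^[l-1]T
        db = np.sum(dU, axis=1, keepdims=True) / m     # dL/db = (1/m) np.sum(dL/dU, axis=1)
        dY = W.T @ dU                                  # dL/dY^[l-1] = W^[l]T dL/dU^[l]
        grads.append((dW, db))
    return grads[::-1]

# Usage: a small 3-4-2 network on 5 random examples.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal((n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
Y0 = rng.standard_normal((3, 5))
T = rng.uniform(size=(2, 5))
grads = backward(params, Y0, T)
```

The only deviation from the bias formula as written is `keepdims=True`, which keeps `db` as a column of shape `(n_l, 1)` so that it matches `b`. A finite-difference comparison against `cost` is a cheap way to confirm that the $\frac{1}{m}$ factors sit where the formulas put them.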
