格仔 Blog: Formulas Revisit

Sunday, September 27, 2020

Formulas Revisit

When I studied Andrew Ng's course I got used to notation such as $dZ^{[\ell]}$, $dA$, etc. Recently I revisited the topic and have since become used to the notation \[u_i^{[\ell]} = W^{[\ell]}_{i:}\cdot y^{[\ell-1]} + b_i^{[\ell]}\quad\text{and}\quad y^{[\ell]} = \Phi^{[\ell]}(u^{[\ell]}),\] so I want to record the corresponding formulas for computation. Since the notation $dW$ doesn't look any cleaner than $\displaystyle\frac{\partial \mathcal L}{\partial W}$, in the sequel I write everything explicitly. Here $m$ is the number of training examples, and $U^{[\ell]}$, $Y^{[\ell]}$ stack the per-example vectors $u^{[\ell](i)}$, $y^{[\ell](i)}$ as columns.

\[ \boxed{\frac{\partial \mathcal L}{\partial W^{[\ell]}} = \frac{1}{m}\cdot \frac{\partial \mathcal L }{\partial U^{[\ell]}}\cdot Y^{[\ell-1]T}} \]

\[ \boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \left( W^{[\ell +1]T} \cdot \frac{\partial \mathcal L}{\partial U^{[\ell+1]}} \right) * \Phi^{[\ell]}{}'(U^{[\ell]})} \]

where $*$ denotes the entrywise product of matrices.

\[ \boxed{\frac{\partial \mathcal L}{ \partial Y^{[\ell - 1]}} = W^{[\ell]T}\cdot \frac{\partial \mathcal L}{\partial U^{[\ell]}}} \]

The last two (taking the latter with $\ell$ replaced by $\ell+1$) yield the following for a hidden layer $\ell<L$, where $\Phi^{[\ell]}:\mathbb{R}\to \mathbb{R}$ is the activation function at the $\ell$-th layer, applied entrywise:

\[ \boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \frac{\partial \mathcal L}{\partial Y^{[\ell]}} * \Phi^{[\ell]}{}'(U^{[\ell]})} \]

and finally

\[ \boxed{\frac{\partial \mathcal L}{\partial b^{[\ell]} } =\frac{1}{m}\cdot \sum_{i=1}^m \frac{\partial \mathcal L}{\partial u^{[\ell](i)}} =\frac{1}{m}\cdot \text{np.sum}\left(\frac{\partial \mathcal L}{\partial U^{[\ell]}},\ \text{axis}=1\right)} \]

For the derivation of these formulas, see my other post: https://checkerlee.blogspot.com/2019/11/important-formulas-in-backward.html#more
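As a rough illustration, the boxed formulas can be turned into a single backward step in NumPy. This is only a sketch under some assumptions of mine that the post does not spell out: the examples are stacked as columns, the activation is a sigmoid, and the loss is a mean squared error averaged over the $m$ examples, so that $\partial \mathcal L/\partial Y$ holds the per-example gradients $Y - T$. The names `backward_layer`, `sigmoid`, `sigmoid_prime`, `loss` and the target matrix `T` are hypothetical, and `keepdims=True` is my own choice to keep the bias gradient as a column vector.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)


def backward_layer(dY, U, Y_prev, W, phi_prime):
    """One backward step for layer l (hypothetical helper).

    dY     : dL/dY^{[l]},  shape (n_l, m)
    U      : U^{[l]},      shape (n_l, m)
    Y_prev : Y^{[l-1]},    shape (n_{l-1}, m)
    W      : W^{[l]},      shape (n_l, n_{l-1})
    """
    m = U.shape[1]
    dU = dY * phi_prime(U)                         # dL/dU = dL/dY * Phi'(U)
    dW = (dU @ Y_prev.T) / m                       # dL/dW = (1/m) dL/dU . Y_prev^T
    db = np.sum(dU, axis=1, keepdims=True) / m     # dL/db = (1/m) np.sum(dL/dU, axis=1)
    dY_prev = W.T @ dU                             # dL/dY^{[l-1]} = W^T . dL/dU
    return dW, db, dY_prev, dU


# Tiny demo on random data (assumed setup: one sigmoid layer, squared-error loss).
rng = np.random.default_rng(0)
n_prev, n, m = 3, 2, 5
Y_prev = rng.normal(size=(n_prev, m))
W = rng.normal(size=(n, n_prev))
b = rng.normal(size=(n, 1))
T = rng.uniform(size=(n, m))                       # hypothetical targets


def loss(W, b):
    Y = sigmoid(W @ Y_prev + b)
    return np.sum((Y - T) ** 2) / (2 * m)          # mean over the m examples


U = W @ Y_prev + b
Y = sigmoid(U)
dW, db, dY_prev, dU = backward_layer(Y - T, U, Y_prev, W, sigmoid_prime)
print(dW.shape, db.shape, dY_prev.shape)           # (2, 3) (2, 1) (3, 5)
```

With this averaging convention, a central finite difference of `loss` with respect to a single entry of `W` or `b` should agree with the corresponding entry of `dW` or `db`, which is a handy sanity check on the $\frac{1}{m}$ factors.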
