Sunday, September 27, 2020
Formulas Revisit
When I studied Andrew Ng's course I got used to notation such as $dZ^{[\ell]}$, $dA$, etc. Recently I revisited the topic and have since become used to the notation \[u_i^{[\ell]} = W^{[\ell]}_{i:}\cdot y^{[\ell-1]} + b_i^{[\ell]}\quad\text{and}\quad y^{[\ell]} = \Phi^{[\ell]}(u^{[\ell]}),\] so I want to record the corresponding formulas for computation.
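As a quick illustration of this notation in vectorized form, here is a minimal NumPy sketch of one layer. It assumes the $m$ samples are stacked as columns, so that $U^{[\ell]} = W^{[\ell]} Y^{[\ell-1]} + b^{[\ell]}$ with $b^{[\ell]}$ stored as a column; the function name `layer_forward` and the choice of `np.tanh` as activation are only for illustration.

```python
import numpy as np

def layer_forward(W, b, Y_prev, phi):
    """One layer: U = W @ Y_prev + b and Y = phi(U), with samples as columns."""
    U = W @ Y_prev + b  # b has shape (n, 1) and broadcasts over the m columns
    Y = phi(U)          # activation applied entrywise
    return U, Y

# Toy example: 2 units, 2 inputs, m = 2 samples (values are arbitrary).
W = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0], [1.0]])
Y_prev = np.array([[1.0, 0.0], [1.0, 2.0]])
U, Y = layer_forward(W, b, Y_prev, np.tanh)
```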
Since the notation $dW$ doesn't look any cleaner than $\displaystyle\frac{\partial \mathcal L}{\partial W}$, in the sequel I write everything explicitly.
\[
\boxed{\frac{\partial \mathcal L}{\partial W^{[\ell]}} = \frac{1}{m}\cdot \frac{\partial \mathcal L }{\partial U^{[\ell]}}\cdot Y^{[\ell-1]T}}
\]
\[
\boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \brac{ W^{[\ell +1]T} \cdot \frac{\partial \mathcal L}{\partial U^{[\ell+1]}} }* \Phi^{[\ell]}{}'(U^{[\ell]})}
\]
where $*$ denotes the entrywise product of matrices.
\[
\boxed{\frac{\partial \mathcal L}{ \partial Y^{[\ell - 1]}} = W^{[\ell]T}\cdot \frac{\partial \mathcal L}{\partial U^{[\ell]}}}
\]
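Together with the next formula, this yields the recursion for $\displaystyle\frac{\partial \mathcal L}{\partial U^{[\ell]}}$ above, valid for the hidden layers $\ell<L$; here $\Phi^{[\ell]}:\R\to \R$ is the activation function of the $\ell$-th layer, applied entrywise.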
\[
\boxed{\frac{\partial \mathcal L}{\partial U^{[\ell]}} = \frac{\partial \mathcal L}{\partial Y^{[\ell]}} * \Phi^{[\ell]}{}'(U^{[\ell]})}
\]
and finally \[
\boxed{\frac{\partial \mathcal L}{\partial b^{[\ell]} } =\frac{1}{m}\cdot \sum_{i=1}^m \frac{\partial \mathcal L}{\partial u^{[\ell](i)}} =\frac{1}{m}\cdot \text{np.sum}\brac{\frac{\partial \mathcal L}{\partial U^{[\ell]}},\text{axis = 1}}}
\]
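The boxed formulas can be sketched in NumPy as follows. This is only an illustration under several assumptions that are not in the post: every layer uses a sigmoid activation, the loss is $\mathcal L = \frac{1}{m}\sum_i \frac12\|y^{[L](i)} - t^{(i)}\|^2$, and the columns of `dU` hold the per-sample gradients, so that the factor $\frac1m$ appears only in the formulas for $W$ and $b$. All function names are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

def forward(Ws, bs, X):
    """Return the lists Us and Ys, where Ys[0] = X and samples are columns."""
    Ys, Us = [X], []
    for W, b in zip(Ws, bs):
        U = W @ Ys[-1] + b
        Us.append(U)
        Ys.append(sigmoid(U))
    return Us, Ys

def loss(Ws, bs, X, T):
    """Assumed loss: mean over samples of half the squared error."""
    _, Ys = forward(Ws, bs, X)
    return 0.5 * np.sum((Ys[-1] - T) ** 2) / X.shape[1]

def backward(Ws, bs, X, T):
    """Apply the boxed formulas layer by layer, from the last layer down."""
    m = X.shape[1]
    Us, Ys = forward(Ws, bs, X)
    dY = Ys[-1] - T  # per-sample gradient with respect to the output
    dWs, dbs = [None] * len(Ws), [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        dU = dY * sigmoid_prime(Us[l])                  # dL/dU = dL/dY * Phi'(U)
        dWs[l] = dU @ Ys[l].T / m                       # Ys[l] is the input to layer l
        dbs[l] = np.sum(dU, axis=1, keepdims=True) / m  # sum over the m samples
        dY = Ws[l].T @ dU                               # dL/dY of the previous layer
    return dWs, dbs

# Compare one entry with a central finite difference.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [rng.normal(size=(4, 1)), rng.normal(size=(2, 1))]
X, T = rng.normal(size=(3, 5)), rng.uniform(size=(2, 5))
dWs, dbs = backward(Ws, bs, X, T)

eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][1, 2] += eps
Wm = [W.copy() for W in Ws]; Wm[0][1, 2] -= eps
numeric = (loss(Wp, bs, X, T) - loss(Wm, bs, X, T)) / (2 * eps)
print(abs(numeric - dWs[0][1, 2]))  # should be close to zero
```

Passing `keepdims=True` to `np.sum` keeps the bias gradient as a column of shape `(n, 1)`, matching the shape of $b^{[\ell]}$ assumed here.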
For the derivation of these formulas, see another post of mine: https://checkerlee.blogspot.com/2019/11/important-formulas-in-backward.html#more