Wikipedia records the following formula without proof:
I accidentally found that the formulas in the previous post already allow us to derive the following:
Theorem. For every $\ell<L-1$, let $\Phi^{[\ell]}:\mathbb{R}\to\mathbb{R}$ denote the activation function in the hidden layer. Then
$$\frac{\partial L}{\partial W^{[\ell]}}
=\underbrace{\frac{1}{m}\,\Phi^{[\ell]\prime}(U^{[\ell]})\ast\left[\prod_{i=\ell+1}^{L-1}W^{[i]T}\cdot\Phi^{[i]\prime}(U^{[i]})\ast\right]\frac{\partial L}{\partial Y^{[L-1]}}}_{=:\;\delta_\ell}\cdot\,Y^{[\ell-1]T}
=\delta_\ell\cdot Y^{[\ell-1]T}.$$
Here $\ast$ denotes the entrywise (Hadamard) multiplication. Since $\frac{\partial L}{\partial W^{[L]}}=\frac{1}{m}\cdot\frac{\partial L}{\partial U^{[L]}}\cdot Y^{[L-1]T}$, we also define
$$\delta_L=\frac{1}{m}\cdot\frac{\partial L}{\partial U^{[L]}},$$
and since
$$\frac{\partial L}{\partial W^{[L-1]}}=\frac{1}{m}\,\Phi^{[L-1]\prime}(U^{[L-1]})\ast\Big(W^{[L]T}\cdot\frac{\partial L}{\partial U^{[L]}}\Big)\,Y^{[L-2]T}=\delta_{L-1}\,Y^{[L-2]T}$$
with $\delta_{L-1}:=\frac{1}{m}\,\Phi^{[L-1]\prime}(U^{[L-1]})\ast\Big(W^{[L]T}\cdot\frac{\partial L}{\partial U^{[L]}}\Big)$, by the definition of $\delta_\ell$ for $\ell<L-1$ above, we obtain for every $\ell\le L-1$,
$$\delta_\ell=\Phi^{[\ell]\prime}(U^{[\ell]})\ast\big[W^{[\ell+1]T}\cdot\delta_{\ell+1}\big]\qquad\text{with}\qquad\frac{\partial L}{\partial W^{[\ell]}}=\delta_\ell\,Y^{[\ell-1]T}.$$
And as a side consequence of our computation, since $\frac{1}{m}\cdot\frac{\partial L}{\partial U^{[\ell]}}=\delta_\ell$,
$$\frac{\partial L}{\partial b^{[\ell]}}=\texttt{np.sum}(\delta_\ell,\ \texttt{axis=1}).$$
The last two formulas are computationally very useful. Note that in the definition of $\delta_\ell$, the multiplications inside the product notation only make sense when the factors act on the rightmost matrix $\frac{\partial L}{\partial Y^{[L-1]}}$ in the correct order (starting from the biggest index $i$). To simplify notation we follow Andrew Ng's course and define $dW=\partial L/\partial W$, and similarly for other matrices.
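As a sanity check of the recursion $\delta_\ell=\Phi^{[\ell]\prime}(U^{[\ell]})\ast[W^{[\ell+1]T}\cdot\delta_{\ell+1}]$ and the two formulas above, here is a minimal NumPy sketch. The layer sizes, sigmoid hidden activations, identity output layer, and squared-error loss $L=\frac{1}{2m}\lVert Y^{[3]}-T\rVert_F^2$ are my own illustrative assumptions, not taken from the post; with that loss, $\delta_L=\frac1m(Y^{[3]}-T)$.

```python
import numpy as np

# Toy L = 3 network: sigmoid hidden layers, identity output,
# squared-error loss L = ||Y3 - T||^2 / (2m).  All sizes are made up.
rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                  # n0, n1, n2, n3
m = 7                                 # batch size; columns are samples
W = [None] + [rng.standard_normal((sizes[l], sizes[l-1])) for l in range(1, 4)]
b = [None] + [rng.standard_normal((sizes[l], 1)) for l in range(1, 4)]
X = rng.standard_normal((sizes[0], m))
T = rng.standard_normal((sizes[3], m))

sigma = lambda u: 1.0 / (1.0 + np.exp(-u))

def forward(W, b):
    Y, U = [X], [None]
    for l in (1, 2):                              # hidden layers: sigmoid
        U.append(W[l] @ Y[l-1] + b[l]); Y.append(sigma(U[l]))
    U.append(W[3] @ Y[2] + b[3]); Y.append(U[3])  # identity output layer
    loss = np.sum((Y[3] - T) ** 2) / (2 * m)
    return U, Y, loss

U, Y, _ = forward(W, b)

# Backward pass: delta_3 = (1/m) dL/dU_3, then
# delta_l = Phi_l'(U_l) * (W_{l+1}^T delta_{l+1}).
delta = {3: (Y[3] - T) / m}
for l in (2, 1):
    delta[l] = sigma(U[l]) * (1 - sigma(U[l])) * (W[l+1].T @ delta[l+1])

dW = {l: delta[l] @ Y[l-1].T for l in (1, 2, 3)}          # dL/dW_l
db = {l: np.sum(delta[l], axis=1, keepdims=True) for l in (1, 2, 3)}

# Finite-difference check of one entry of dW[1].
eps = 1e-6
Wp = [w.copy() if w is not None else None for w in W]
Wp[1][0, 0] += eps
num = (forward(Wp, b)[2] - forward(W, b)[2]) / eps
print(abs(num - dW[1][0, 0]) < 1e-5)   # True
```

Keeping the samples as columns is what makes $\delta_\ell Y^{[\ell-1]T}$ sum the per-sample contributions automatically; `keepdims=True` in the bias gradient is only a shape convenience on top of the `np.sum(δ_ℓ, axis=1)` formula.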
Proof. By repeated use of the formula $dY^{[\ell]}=W^{[\ell+1]T}\cdot\big[dY^{[\ell+1]}\ast\Phi^{[\ell+1]\prime}(U^{[\ell+1]})\big]$ we have
$$dW^{[\ell]}=\frac{1}{m}\,dU^{[\ell]}\,Y^{[\ell-1]T}
=\frac{1}{m}\big(dY^{[\ell]}\ast\Phi^{[\ell]\prime}(U^{[\ell]})\big)\,Y^{[\ell-1]T}
=\frac{1}{m}\left(\Phi^{[\ell]\prime}(U^{[\ell]})\ast\left[\prod_{i=\ell+1}^{L-1}W^{[i]T}\cdot\Phi^{[i]\prime}(U^{[i]})\ast\right]dY^{[L-1]}\right)Y^{[\ell-1]T}.$$
And recall that $dY^{[L-1]}=\frac{\partial L}{\partial Y^{[L-1]}}$. $\blacksquare$
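To see numerically that the closed-form product in the theorem agrees with the recursion, here is a small self-contained check (again with made-up sizes and sigmoid hidden activations; `G` stands in for an arbitrary $\partial L/\partial U^{[3]}$). It computes $\delta_1$ once by the recursion and once by the product formula, applying the factors from the biggest index onto the rightmost matrix:

```python
import numpy as np

# Toy L = 3 network; sizes and activations are illustrative assumptions.
rng = np.random.default_rng(1)
n0, n1, n2, n3, m = 4, 5, 3, 2, 6
W1, W2, W3 = (rng.standard_normal((a, b))
              for a, b in [(n1, n0), (n2, n1), (n3, n2)])
X = rng.standard_normal((n0, m))
G = rng.standard_normal((n3, m))        # stands in for dL/dU_3

sigma = lambda u: 1.0 / (1.0 + np.exp(-u))
dsigma = lambda u: sigma(u) * (1.0 - sigma(u))

U1 = W1 @ X; Y1 = sigma(U1)
U2 = W2 @ Y1

# Recursion: delta_3 = G/m, delta_l = Phi_l'(U_l) * (W_{l+1}^T delta_{l+1}).
d3 = G / m
d2 = dsigma(U2) * (W3.T @ d3)
d1 = dsigma(U1) * (W2.T @ d2)

# Closed form for l = 1: the product runs over i = 2 only, so
# delta_1 = (1/m) Phi_1'(U1) * [ W2^T ( Phi_2'(U2) * dL/dY2 ) ],
# with dL/dY2 = W3^T G, factors applied from the biggest index.
dY2 = W3.T @ G
d1_closed = (1 / m) * dsigma(U1) * (W2.T @ (dsigma(U2) * dY2))

print(np.allclose(d1, d1_closed))   # True
```

The two expressions agree because the scalar $\frac1m$ and the alternating "Hadamard, then left-multiply" steps commute exactly as in the proof's unrolling.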