When I studied Andrew Ng's course I got used to notations like $dZ^{[\ell]}$, $dA$, etc. Recently I revisited the topic and am now used to the notation $u_i^{[\ell]} = W_{i:}^{[\ell]} \cdot y^{[\ell-1]} + b^{[\ell]}$ and $y^{[\ell]} = \Phi^{[\ell]}(u^{[\ell]})$, so I want to record the corresponding formulas for the backward computation.
Since the notation $dW$ doesn't look any cleaner than $\frac{\partial L}{\partial W}$, in the sequel I write everything explicitly.
$$\frac{\partial L}{\partial W^{[\ell]}} = \frac{1}{m}\cdot \frac{\partial L}{\partial U^{[\ell]}}\cdot Y^{[\ell-1]\,T}$$
$$\frac{\partial L}{\partial U^{[\ell]}} = \left(W^{[\ell+1]\,T}\cdot \frac{\partial L}{\partial U^{[\ell+1]}}\right) * \Phi^{[\ell]\prime}\!\left(U^{[\ell]}\right)$$
where $*$ denotes the entrywise (Hadamard) product of matrices.
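In NumPy this recursion is one line: `@` for the matrix product and `*` for the entrywise product. A minimal sketch, where the shapes and the choice of $\tanh$ as activation are my own illustrative assumptions:

```python
import numpy as np

# Hypothetical sizes: n_l = 3 units at layer l, n_next = 4 units at
# layer l+1, m = 5 examples in the batch.
n_l, n_next, m = 3, 4, 5
rng = np.random.default_rng(0)

W_next = rng.standard_normal((n_next, n_l))   # W^{[l+1]}, shape (n_{l+1}, n_l)
dU_next = rng.standard_normal((n_next, m))    # dL/dU^{[l+1]}, shape (n_{l+1}, m)
U = rng.standard_normal((n_l, m))             # pre-activations U^{[l]}

# Assume Phi^{[l]} = tanh for the demo, so Phi' = 1 - tanh^2.
phi_prime = 1.0 - np.tanh(U) ** 2

# dL/dU^{[l]} = (W^{[l+1]T} . dL/dU^{[l+1]}) * Phi'(U^{[l]})
dU = (W_next.T @ dU_next) * phi_prime         # '*' is the entrywise product
print(dU.shape)  # (3, 5)
```

Note that the matrix product collapses the $n_{\ell+1}$ dimension, so $\frac{\partial L}{\partial U^{[\ell]}}$ ends up with the same shape $(n_\ell, m)$ as $U^{[\ell]}$, as it must.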
$$\frac{\partial L}{\partial Y^{[\ell-1]}} = W^{[\ell]\,T}\cdot \frac{\partial L}{\partial U^{[\ell]}}$$
Combining the last two formulas yields the following for a hidden layer $\ell < L$, where $\Phi^{[\ell]}:\mathbb{R}\to\mathbb{R}$ is the activation function at the $\ell$-th layer, applied entrywise:
$$\frac{\partial L}{\partial U^{[\ell]}} = \frac{\partial L}{\partial Y^{[\ell]}} * \Phi^{[\ell]\prime}\!\left(U^{[\ell]}\right)$$
and finally
$$\frac{\partial L}{\partial b^{[\ell]}} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial L}{\partial u^{[\ell](i)}} = \frac{1}{m}\cdot \texttt{np.sum}\!\left(\frac{\partial L}{\partial U^{[\ell]}},\ \texttt{axis}=1\right)$$
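Putting the three gradients for one layer together, a single backward step can be sketched as below. The function name `layer_backward` and its signature are my own; the body just transcribes the formulas above:

```python
import numpy as np

def layer_backward(dU, Y_prev, W):
    """One layer of the backward pass, transcribing the formulas above.

    dU     : dL/dU^{[l]},  shape (n_l, m)
    Y_prev : Y^{[l-1]},    shape (n_{l-1}, m)
    W      : W^{[l]},      shape (n_l, n_{l-1})
    """
    m = Y_prev.shape[1]
    # dL/dW^{[l]} = (1/m) . dL/dU^{[l]} . Y^{[l-1]T}
    dW = (dU @ Y_prev.T) / m
    # dL/db^{[l]} = (1/m) . np.sum(dL/dU^{[l]}, axis=1)
    db = np.sum(dU, axis=1, keepdims=True) / m
    # dL/dY^{[l-1]} = W^{[l]T} . dL/dU^{[l]}
    dY_prev = W.T @ dU
    return dW, db, dY_prev
```

`dY_prev` is then combined with $\Phi^{[\ell-1]\prime}(U^{[\ell-1]})$ via the entrywise product to get $\frac{\partial L}{\partial U^{[\ell-1]}}$, and the step repeats down to the first layer.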
For the derivation of these formulas, see another post of mine: https://checkerlee.blogspot.com/2019/11/important-formulas-in-backward.html#more