Given training examples $\{(X^{(i)}, y^{(i)}) : i = 1, 2, \dots, m\}$ with $y^{(i)} \in \{0,1\}$, we define
$$\mathbb{R}^m \ni A = \begin{bmatrix} a^{(1)} & \cdots & a^{(m)} \end{bmatrix}^T = \begin{bmatrix} \sigma\bigl(w^T X^{(1)} + b\bigr) & \cdots & \sigma\bigl(w^T X^{(m)} + b\bigr) \end{bmatrix}^T, \qquad (*)$$
the vector (stacked results) of estimated probabilities of a "positive result" for each training example, and $Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(m)} \end{bmatrix}^T$ the vector (stacked results) of the truth values from the training examples.
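Concretely, this stacking can be done in one vectorized line. The following is a minimal sketch, assuming $X$ is stored as an $n \times m$ numpy array with one training example per column, $w$ as an $(n,1)$ column vector, and $b$ as a scalar; the random data and shapes here are only illustrative, not part of the post.

import numpy as np

def sigmoid(z):
    # elementwise logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes (assumption): n features, m training examples.
n, m = 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))        # columns are X^(1), ..., X^(m)
w = np.zeros((n, 1))
b = 0.0

A = sigmoid(np.dot(w.T, X) + b).reshape(-1, 1)   # stacked a^(i), shape (m, 1)
Y = rng.integers(0, 2, size=(m, 1))              # stacked labels y^(i) in {0, 1}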
The cost function in logistic regression (given $m$ training examples) is
$$J = J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\bigl(a^{(i)}, y^{(i)}\bigr), \quad \text{where } \mathcal{L}\bigl(a^{(i)}, y^{(i)}\bigr) = -y^{(i)} \ln a^{(i)} - \bigl(1-y^{(i)}\bigr) \ln\bigl(1-a^{(i)}\bigr),$$
and the $a^{(i)}$'s are defined above in $(*)$. By minimizing this cost function we can solve for $w$ and $b$, and we obtain a simple "trained machine". To carry out the minimization we apply gradient descent:
$$w_i := w_i - \alpha \frac{\partial J}{\partial w_i}(w,b), \qquad b := b - \alpha \frac{\partial J}{\partial b}(w,b).$$
By direct computation we can easily stack up the partial derivatives to get
$$\left(\frac{\partial J}{\partial w}(w,b)\right)^T := \left(\begin{bmatrix} \frac{\partial J}{\partial w_1} & \cdots & \frac{\partial J}{\partial w_n} \end{bmatrix}\right)^T(w,b) = \sum_{i=1}^{n} e_i \underbrace{\frac{1}{m}\bigl(e_i^T X\bigr)(A-Y)}_{=\,\partial J/\partial w_i(w,b)} = \frac{1}{m}\, X(A-Y),$$
where the $e_i$'s denote the standard basis of $\mathbb{R}^n$ (recall that $n = n_x$ is the dimension of the feature $X^{(i)}$) and $X = \begin{bmatrix} X^{(1)} & \cdots & X^{(m)} \end{bmatrix}$ is the $n \times m$ matrix whose columns are the training features, which in numpy can be written as
import numpy as np

m = X.shape[1]                          # number of training examples
dJ_dw = 1 / m * np.dot(X.reshape(-1, m), A.reshape(-1, 1) - Y.reshape(-1, 1))   # (1/m) X (A - Y), shape (n, 1)
Similarly, $\frac{\partial J}{\partial b}(w,b) = \frac{1}{m}\sum_{i=1}^{m}\bigl(a^{(i)} - y^{(i)}\bigr)$, which is simply

dJ_db = 1 / m * np.sum(A - Y)           # (1/m) * sum_i (a^(i) - y^(i))
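Putting these pieces together, here is a sketch of a complete training loop under the same assumptions as above ($X$ of shape $(n, m)$ with one example per column, $Y$ of shape $(m, 1)$); the function name, learning rate, and iteration count are illustrative choices, not prescribed by the post.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.1, num_iters=1000):
    # X: (n, m) feature matrix, one training example per column (assumed layout)
    # Y: (m, 1) column vector of labels in {0, 1}
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iters):
        A = sigmoid(np.dot(w.T, X) + b).reshape(-1, 1)                 # forward pass, shape (m, 1)
        J = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # cost J(w, b), for monitoring
        dJ_dw = 1 / m * np.dot(X, A - Y)                               # (1/m) X (A - Y), shape (n, 1)
        dJ_db = 1 / m * np.sum(A - Y)                                  # (1/m) sum_i (a^(i) - y^(i))
        w = w - alpha * dJ_dw                                          # gradient-descent updates
        b = b - alpha * dJ_db
    return w, b, J

With $w$ initialized to zeros, the first iteration predicts $a^{(i)} = 1/2$ for every example, and each pass then follows exactly the updates $w_i := w_i - \alpha\,\partial J/\partial w_i$ and $b := b - \alpha\,\partial J/\partial b$ derived above.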