
Sunday, November 10, 2019

On Logistic Regression

Denote by $\sigma(z)$ the sigmoid function defined by $z \mapsto 1/(1+e^{-z})$. For a given feature $X \in \mathbb{R}^n$, the estimator of $P(y=1 \mid X)$ is given by $\hat{y} = a = \sigma(w^T X + b)$, where $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$.
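For concreteness, here is a minimal numpy sketch of this estimator for a single feature vector; the variable names and sample numbers are illustrative only, not taken from the text:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.0, 2.0])      # w in R^n (here n = 3)
b = 0.1                             # b in R
X = np.array([1.0, 0.0, -0.5])      # one feature vector X in R^n
y_hat = sigmoid(np.dot(w, X) + b)   # a = sigma(w^T X + b), the estimate of P(y=1|X)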

Given training examples $\{(X^{(i)}, y^{(i)}) : y^{(i)} \in \{0,1\},\ i=1,2,\dots,m\}$, we define $\mathbb{R}^m \ni A = [a^{(1)} \cdots a^{(m)}]^T = [\sigma(w^T X^{(1)}+b) \cdots \sigma(w^T X^{(m)}+b)]^T$, the vector (stacked results) of estimated probabilities of a "positive result" for each training example, and $Y = [y^{(1)} \cdots y^{(m)}]^T$, the vector (stacked results) of the truth values from the training examples.
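Assuming, as in the code further below, that the training features are stacked column-wise into a matrix $X$ of shape $(n, m)$ (so column $i$ is $X^{(i)}$) and the labels into $Y$ of shape $(m,)$, the whole vector $A$ can be computed in one vectorized step. A sketch, with numpy imported as np and $w$, $b$ as above:

# X: shape (n, m), column i is the training example X^(i)
# Y: shape (m,), with Y[i] = y^(i) in {0, 1}
Z = np.dot(w, X) + b            # shape (m,): w^T X^(i) + b for each example i
A = 1.0 / (1.0 + np.exp(-Z))    # shape (m,): a^(i) = sigma(w^T X^(i) + b)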

The cost function in logistic regression (given $m$ training examples) is given by $J = J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$, where $\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\ln a^{(i)} - (1-y^{(i)})\ln(1-a^{(i)})$ and the $a^{(i)}$'s are defined above. By minimizing this cost function we can solve for $w$ and $b$, and we obtain a simple "trained machine". To carry out the minimization we apply gradient descent: $w_i := w_i - \alpha \frac{\partial J}{\partial w_i}(w,b)$, $b := b - \alpha \frac{\partial J}{\partial b}(w,b)$. By direct computation we can easily stack up the above result to get $\frac{\partial J}{\partial w}(w,b) := \left[\frac{\partial J}{\partial w_1} \cdots \frac{\partial J}{\partial w_n}\right]^T(w,b) = \frac{1}{m}\sum_{i=1}^{n} e_i\,(e_i^T X)(A-Y) = \frac{1}{m}X(A-Y)$, where the $e_i$'s denote the standard basis of $\mathbb{R}^n$ (recall that $n = n_x$ is the dimension of the feature $X$), which in numpy can be written as
import numpy as np

m = X.shape[1]   # number of training examples
dJ_dw = 1/m * np.dot(X.reshape(-1, m), A.reshape(-1, 1) - Y.reshape(-1, 1))   # (n, 1) column
Note that the RHS above is an $n \times 1$ vector. Indeed, in this article we strictly follow the convention in linear algebra that any element of $\mathbb{R}^n$ is represented as a column vector. Similarly, for $\frac{\partial J}{\partial b}$ we get $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$, which can be computed in Python by
dJ_db = 1/m * np.sum(A - Y)   # scalar: (1/m) * sum_i (a^(i) - y^(i))
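Putting the pieces together, an end-to-end training loop under the same shape conventions might look like the sketch below; the learning rate alpha, the iteration count, and the zero initialization are illustrative choices, not prescribed above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.01, num_iters=1000):
    # X: (n, m) matrix of stacked features; Y: (m,) vector of labels in {0, 1}
    n, m = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        A = sigmoid(np.dot(w, X) + b)      # a^(i) for every training example
        dJ_dw = 1/m * np.dot(X, A - Y)     # dJ/dw = (1/m) X (A - Y), shape (n,)
        dJ_db = 1/m * np.sum(A - Y)        # dJ/db = (1/m) sum_i (a^(i) - y^(i))
        w = w - alpha * dJ_dw              # gradient descent updates
        b = b - alpha * dJ_db
    return w, b

Here $w$ is kept as a flat array of shape $(n,)$ rather than an explicit $n \times 1$ column, so the column convention from the text is only implicit; reshaping with w.reshape(-1, 1) would make it literal.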
