\sigma(w^TX+b),
\] where $w\in \R^n$ and $b\in \R$.
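For concreteness, this prediction can be computed in numpy as follows. This is only a minimal sketch: the helper sigmoid and the array names and shapes are assumptions for illustration.

import numpy as np

def sigmoid(z):
    # logistic function 1 / (1 + exp(-z)), applied elementwise
    return 1 / (1 + np.exp(-z))

# assumed shapes: X is an (n, 1) feature vector, w is (n, 1), b is a float
a = sigmoid(np.dot(w.T, X) + b)   # estimated probability that y = 1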
Given training examples $\{(X^{(i)}, y^{(i)}):i=1,2,\dots,m\}$, where $X^{(i)}\in\R^n$ and $y^{(i)}\in \{0,1\}$, we define \[
\R^m \ni A = \begin{bmatrix} a^{(1)}&\cdots & a^{(m)} \end{bmatrix}^T=\begin{bmatrix}\sigma(w^TX^{(1)}+b)&\cdots &\sigma(w^TX^{(m)}+b) \end{bmatrix}^T \tag*{($*$)}
\] the vector (stacked results) of estimated probabilities of a "positive" result for each training example, and \[
Y = \begin{bmatrix}y^{(1)}&\cdots &y^{(m)} \end{bmatrix}^T
\] the vector (stacked results) of the ground-truth labels from the training examples.
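In numpy, if the training features are stacked as columns of an $n\times m$ matrix X (the same convention used for the vectorized gradient below) and the labels are stored in an array y of length $m$ (an illustrative name), then A and Y can be formed as column vectors by

A = sigmoid(np.dot(w.T, X) + b).reshape(-1, 1)   # shape (m, 1), entries a^{(i)}
Y = y.reshape(-1, 1)                             # shape (m, 1), entries y^{(i)}

using the sigmoid helper sketched above.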
The cost function in logistic regression (given $m$ training examples) is given by \[
J = J(w,b)= \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}),
\] where \[
\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \ln a^{(i)} - (1-y^{(i)} ) \ln(1-a^{(i)}).
\] and the $a^{(i)}$'s are defined above in ($*$). By minimizing this cost function we solve for $w$ and $b$, which gives us a simple "trained machine".
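As a quick illustration, the cost $J$ can be computed in numpy directly from the vectors $A$ and $Y$ defined above. This is a sketch only; it assumes $A$ and $Y$ are arrays of shape $(m,1)$ and that no entry of $A$ is exactly $0$ or $1$.

losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # L(a^{(i)}, y^{(i)}) for each i
J = np.mean(losses)                                   # average over the m training examples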
To find the minimizing $w$ and $b$ we apply gradient descent (with learning rate $\alpha$): \[
\begin{align*}
w_i &:= w_i -\alpha \frac{\partial J}{\partial w_i}(w,b),\\
b &:= b-\alpha \frac{\partial J}{\partial b}(w,b),
\end{align*}
\] By direct computation we can stack the partial derivatives above into a single vector: writing, by a slight abuse of notation, $X = \begin{bmatrix} X^{(1)} & \cdots & X^{(m)} \end{bmatrix}$ for the $n\times m$ matrix whose columns are the training features, we get \[
\brac{\frac{\partial J}{\partial w} (w,b)}^T := \brac{\begin{bmatrix}
\dis\frac{\partial J}{\partial w_1}&\cdots &\dis \frac{\partial J}{\partial w_n}
\end{bmatrix}}^T(w,b) = \sum_{i=1}^m e_i \underbrace{\frac{1}{m}(e_i^T X)(A-Y)}_{=\,\partial J /\partial w_i (w,b)} = \frac{1}{m}X(A-Y),
\] where the $e_i$'s denote the standard basis of $\R^n$ (recall that $n=n_x$ is the dimension of the feature $X$). In numpy this can be written as
m = X.shape[1]   # number of training examples; X has shape (n, m)
dJ_dw = 1/m * np.dot(X.reshape(-1, m), A.reshape(-1, 1) - Y.reshape(-1, 1))   # shape (n, 1)
Note that the RHS above is an $n\times 1$ vector. Indeed, throughout this article we strictly follow the linear-algebra convention that any element of $\R^n$ is represented as a column vector. Similarly, for $\frac{\partial J}{\partial b}$ we also get \[
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^m (a^{(i)} - y^{(i)})
\] which can be computed in Python by
dJ_db = 1/m * np.sum(A-Y)
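Putting the pieces together, the full gradient descent loop takes only a few lines of numpy. The sketch below reuses the sigmoid helper from above; the learning rate alpha and the iteration count num_iters are illustrative names, not fixed by this article.

w = np.zeros((X.shape[0], 1))   # initialize the parameters
b = 0.0
for _ in range(num_iters):
    A = sigmoid(np.dot(w.T, X) + b).reshape(-1, 1)   # forward pass, shape (m, 1)
    dJ_dw = 1/m * np.dot(X, A - Y)                   # gradient w.r.t. w, shape (n, 1)
    dJ_db = 1/m * np.sum(A - Y)                      # gradient w.r.t. b, a scalar
    w = w - alpha * dJ_dw                            # gradient descent updates
    b = b - alpha * dJ_db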