\( \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb{F}} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end {bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix} #1\end{vmatrix}} \newcommand{\lims}{\mathop{\overline{\lim}}} \newcommand{\limi}{\mathop{\underline{\lim}}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\limsn}{\lims_{n\to\infty}} \newcommand{\limin}{\limi_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right \rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\toto}{\rightrightarrows} \newcommand{\upto}{\nearrow} \newcommand{\downto}{\searrow} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \newcommand{\bm}{\boldsymbol} \newcommand{\cupp}{\bigcup} \newcommand{\capp}{\bigcap} \newcommand{\sqcupp}{\bigsqcup} \newcommand{\re}{\mathop{\mathrm{Re}}} \newcommand{\im}{\mathop{\mathrm{Im}}} \newcommand{\comma}{\text{,}} \newcommand{\foot}{\text{。}} \)

Sunday, November 10, 2019

On Logistic Regression

Denote by $\sigma$ the sigmoid function defined by $\sigma(z)= 1/(1+e^{-z})$. For a given feature $X\in \R^n$ the estimator of $\mathbb{P}(y = 1 \mid X)$ is given by \[\hat y= a =
\sigma(w^TX+b),
\] where $w\in \R^n$ and $b\in \R$ are the parameters to be learned.
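
As a quick illustration, the prediction step looks as follows in numpy; this is only a minimal sketch, and the names sigmoid and a_hat as well as the particular numbers are hypothetical, chosen just to make it self-contained.

import numpy as np

def sigmoid(z):
    # elementwise 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical feature and parameters with n = 3
X = np.array([[0.5], [-1.2], [3.0]])   # column vector in R^3
w = np.array([[0.1], [0.4], [-0.3]])   # column vector in R^3
b = 0.2
a_hat = sigmoid(w.T @ X + b)           # estimated P(y = 1 | X), shape (1, 1)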

Given training examples $\{(X^{(i)}, y^{(i)}):i=1,2,\dots,m\}$ with $y^{(i)}\in \{0,1\}$, we define \[
\R^m \ni A = \begin{bmatrix} a^{(1)}&\cdots & a^{(m)} \end{bmatrix}^T=\begin{bmatrix}\sigma(w^TX^{(1)}+b)&\cdots &\sigma(w^TX^{(m)}+b) \end{bmatrix}^T \tag*{($*$)}
\] the vector (stacked results) of estimated probabilities of a "positive result" for each training example, and \[
Y = \begin{bmatrix}y^{(1)}&\cdots &y^{(m)} \end{bmatrix}^T
the vector (stacked results) of the true labels from the training examples.
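
These stacked quantities are convenient to compute in one vectorized step. Below is a minimal numpy sketch, assuming (as the code later in this post does) that the training features are stacked column-wise into a data matrix X of shape (n, m); the helper name forward is an illustrative choice, not fixed by the text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, w, b):
    # X: (n, m) matrix whose columns are the X^{(i)}'s (assumed layout)
    # w: (n, 1) column vector, b: scalar
    Z = w.T @ X + b                  # the z^{(i)} = w^T X^{(i)} + b, shape (1, m)
    A = sigmoid(Z).reshape(-1, 1)    # stacked a^{(i)}'s as a column, shape (m, 1)
    return A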

The cost function in logistic regression (given $m$ training examples) is given by \[
J = J(w,b)= \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}),
\] where \[
\mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \ln a^{(i)} - (1-y^{(i)} )  \ln(1-a^{(i)}),
\] and the $a^{(i)}$'s are defined above in ($*$). By minimizing this cost function we solve for $w$ and $b$, which gives us a simple "trained machine". To carry out the minimization we apply gradient descent (a complete numpy sketch of the resulting training loop is given at the end of this post), iterating the updates \[
\begin{align*}
w_i &:= w_i -\alpha \frac{\partial J}{\partial w_i}(w,b), \qquad i=1,\dots,n,\\
b &:=   b-\alpha \frac{\partial J}{\partial b}(w,b),
\end{align*}
\] where $\alpha>0$ is the learning rate. Writing $z^{(i)} = w^TX^{(i)}+b$, the chain rule together with $\sigma' = \sigma(1-\sigma)$ gives $\partial \mathcal{L}/\partial z^{(i)} = a^{(i)}-y^{(i)}$, so a direct computation lets us stack the partial derivatives into \[
\brac{\frac{\partial J}{\partial w} (w,b)}^T := \begin{bmatrix}
\dis\frac{\partial J}{\partial w_1}&\cdots &\dis \frac{\partial J}{\partial w_n}
\end{bmatrix}^T(w,b) = \frac{1}{m} \sum_{i=1}^n e_i \underbrace{(e_i^T X)(A-Y)}_{=\,m\,\partial J /\partial w_i (w,b)} =  \frac{1}{m}X(A-Y),
\] where the $e_i$'s denote the standard basis vectors of $\R^n$ (recall that $n=n_x$ is the dimension of each feature vector) and $X = \begin{bmatrix}X^{(1)}&\cdots &X^{(m)}\end{bmatrix}\in\R^{n\times m}$ now denotes the data matrix whose columns are the training features. In numpy this can be written as
import numpy as np
m = X.shape[1]  # number of training examples
dJ_dw = 1/m * np.dot(X.reshape(-1, m), A.reshape(-1, 1) - Y.reshape(-1, 1))
Note that the right-hand side above is an $n\times 1$ vector; throughout this article we strictly follow the linear-algebra convention that every element of $\R^n$ is represented as a column vector. Similarly, for $\frac{\partial J}{\partial b}$ we get \[
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^m (a^{(i)} - y^{(i)})
\] which can be computed in Python by
dJ_db = 1/m * np.sum(A - Y)
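
Putting everything together, here is a minimal, self-contained numpy sketch of the gradient-descent loop described above. The function name train_logistic_regression, the default learning rate and iteration count, and the small clipping constant inside the cost (to avoid $\ln 0$) are illustrative assumptions, not something prescribed by the derivation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.1, num_iters=1000):
    # X: (n, m) data matrix with training features as columns
    # Y: (m, 1) column vector of labels in {0, 1}
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iters):
        # forward pass: stacked estimated probabilities A, shape (m, 1)
        A = sigmoid(w.T @ X + b).reshape(-1, 1)
        # cost J (clipped to avoid log(0)); used only for monitoring
        eps = 1e-12
        J = -1/m * np.sum(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps))
        # gradients, exactly the stacked formulas derived above
        dJ_dw = 1/m * np.dot(X, A - Y)   # shape (n, 1)
        dJ_db = 1/m * np.sum(A - Y)      # scalar
        # gradient-descent updates
        w = w - alpha * dJ_dw
        b = b - alpha * dJ_db
    return w, b, J

For instance, with X of shape (n, m) and Y of shape (m, 1) built from the training set, calling w, b, J = train_logistic_regression(X, Y) returns the learned parameters together with the final cost.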
