http://ihome.ust.hk/~cclee/document/PRESENTSVD.pdf
The proof in the original text is too short and seemingly incomplete to me, so I modified the proof and filled in many details. Let me first describe terminology used in this notes, and let me for the moment describe everything in $ \mathbb R$ instead of $ \mathbb C$. Let $ \mathbb{R}^{m\times n}$ denote the collection of $ m\times n$ dimensional real matrices. For $ x\in \mathbb R^k$, write $ x = (x^1,x^2,\dots,x^k)$ (i.e., $ x^k$ is the $ k$th coordinate of $ x$), define $ \|x\|_2=\sqrt{\sum_{i=1}^k |x^i|^2}$.
We also define the $ (n-1)$-sphere to be $ S^{n-1}:=\{x\in \mathbb R^n:(x^1)^2+\cdots+(x^n)^2=1\}$. For $ A\in \mathbb R^{m\times n}$ we define the operator norm
\[ \|A\|_2= \sup_{x\in \mathbb R^n, \|x\|_2=1}\|Ax\|_2.\]
We assume, and what we are trying to prove in my draft, that $ AS^{n-1}=\{Ax:x\in S^{n-1}\}$ is a ``hyperellipse" (or just an ellipse when $ n \leq 3$). Suppose also that $ m\ge n$ and $ A$ is of full rank, then it is legitimate to assume there are unit vectors $ u_1,\dots,u_n$ which point in the direction of semi-axises of the hyperellipse.
We can describe a general construction for $ u_i$'s: first find a vector in $ AS^{n-1}$ that has largest length, call it $ a\in \mathbb R^{m}$, then find another vector $ b$ in $ AS^{n-1} \cap (\mathbb R a)^\perp$ that has largest length, then find another such vector in $ AS^{n-1} \cap (\mathbb R a)^\perp\cap (\mathbb R b)^\perp$ and continue the process until we find $ n$ such vectors. This is indeed how we find semi-axises of an ellipse in $ \mathbb R^3$ and $ \mathbb R^2$.
Now in the way above, the length of semi-axis in the direction of $ u_i$, $ \sigma_i$, is in nonincreasing order, i.e., $ \sigma_1\ge \sigma_2\ge\cdots\ge \sigma_n>0$. Take the unit vector $ v_i \in A^{-1}(\sigma_i u_i)$, one gets \[A\underbrace{\left[\begin{array}{c|c|c}v_1&\cdots&v_n\end{array}\right]}_{:=V}=\left[\begin{array}{c|c|c}u_1&\cdots&u_n\end{array}\right]\begin{bmatrix}\sigma_1&& \\ &\ddots & \\ & & \sigma_n\end{bmatrix}.\]
Let $ u_{n+1},\dots, u_m$ be orthonormal basis in $ (\mathrm{span}_{\mathbb R}\{u_1,\dots,u_n\})^\perp$, then RHS of the above equation becomes
\[\underbrace{\left[\begin{array}{c|c|c}u_1&\cdots&u_m\end{array}\right]}_{:= U}\underbrace{\left[\begin{array}{ccc}\sigma_1&& \\ &\ddots & \\ & & \sigma_n \\ \hline &\mathcal O& \end{array}\right]}_{:=\Sigma},\]
where $ \mathcal O$ denotes a matrix with only 0 entries. One can rewrite the above as: $ AV = U\Sigma$. It will be proved in my draft that indeed $ \{v_i\}$ is orthonormal, hence we can conclude both $ V,U$ are unitary, and we arrive to the expression $ A=U\Sigma V^*$. Owing to this decomposition those $ u_i$'s are called left-singular vector, $ v_i$'s are called right-singular vector and ``diagonal" elements (i.e., if $ \Sigma=(d_{ij})$, diagonal elements are $ d_{ii}$'s) are called singular values. These basically are all the motivation of the general result in my draft.
No comments:
Post a Comment