\( \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb{F}} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end {bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix} #1\end{vmatrix}} \newcommand{\lims}{\mathop{\overline{\lim}}} \newcommand{\limi}{\mathop{\underline{\lim}}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\limsn}{\lims_{n\to\infty}} \newcommand{\limin}{\limi_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right \rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\toto}{\rightrightarrows} \newcommand{\upto}{\nearrow} \newcommand{\downto}{\searrow} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \newcommand{\bm}{\boldsymbol} \newcommand{\cupp}{\bigcup} \newcommand{\capp}{\bigcap} \newcommand{\sqcupp}{\bigsqcup} \newcommand{\re}{\mathop{\mathrm{Re}}} \newcommand{\im}{\mathop{\mathrm{Im}}} \newcommand{\comma}{\text{,}} \newcommand{\foot}{\text{。}} \)

Tuesday, February 14, 2012

Presentation material for SVD

Prepared a draft for my presentation:

http://ihome.ust.hk/~cclee/document/PRESENTSVD.pdf

The proof in the original text is too short and seemingly incomplete to me, so I modified the proof and filled in many details. Let me first describe the terminology used in these notes, and let me for the moment describe everything in $\mathbb R$ instead of $\mathbb C$. Let $\mathbb{R}^{m\times n}$ denote the collection of $m\times n$ real matrices. For $x\in \mathbb R^k$, write $x = (x^1,x^2,\dots,x^k)$ (i.e., $x^i$ is the $i$th coordinate of $x$), and define $\|x\|_2=\sqrt{\sum_{i=1}^k |x^i|^2}$.

We also define the $(n-1)$-sphere to be $S^{n-1}:=\{x\in \mathbb R^n:(x^1)^2+\cdots+(x^n)^2=1\}$. For $A\in \mathbb R^{m\times n}$ we define the operator norm
\[ \|A\|_2= \sup_{x\in \mathbb R^n, \|x\|_2=1}\|Ax\|_2.\]
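As a quick numerical illustration (a sketch of my own, not from the draft; the matrix and the sample count are arbitrary choices), one can approximate this supremum by evaluating $\|Ax\|_2$ over many random unit vectors and compare against `numpy.linalg.norm(A, 2)`, which returns the largest singular value of $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # an arbitrary real 5x3 matrix

# Crude estimate of sup_{||x||_2 = 1} ||Ax||_2 over random unit vectors.
X = rng.standard_normal((3, 100_000))
X /= np.linalg.norm(X, axis=0)    # normalize columns onto the unit sphere
estimate = np.linalg.norm(A @ X, axis=0).max()

exact = np.linalg.norm(A, 2)      # operator 2-norm = largest singular value
print(estimate, exact)            # estimate <= exact, and close to it
```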
What we assume here, and what I actually prove in my draft, is that $AS^{n-1}=\{Ax:x\in S^{n-1}\}$ is a ``hyperellipse'' (an ellipse when $n=2$, an ellipsoid when $n=3$). Suppose also that $m\ge n$ and that $A$ has full rank; then it is legitimate to assume there are unit vectors $u_1,\dots,u_n$ pointing in the directions of the semi-axes of the hyperellipse.

We can describe a general construction for the $u_i$'s: first find a vector of largest length in $AS^{n-1}$, call it $a\in \mathbb R^{m}$; then find a vector $b$ of largest length in $AS^{n-1} \cap (\mathbb R a)^\perp$; then find another such vector in $AS^{n-1} \cap (\mathbb R a)^\perp\cap (\mathbb R b)^\perp$, and continue the process until $n$ such vectors are found. This is exactly how one finds the semi-axes of an ellipse in $\mathbb R^2$ or of an ellipsoid in $\mathbb R^3$.
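This greedy construction can also be mimicked numerically. Below is a minimal sketch (my own code, not from the draft; the helper name `greedy_axes` is made up): power iteration on $A^{\top}A$ locates the unit vector $v$ maximizing $\|Av\|_2$, and subtracting the rank-one piece $\sigma uv^{\top}$ (deflation) restricts the next search to the orthogonal complement, so the axes come out in decreasing order of length. It assumes the singular values are distinct.

```python
import numpy as np

def greedy_axes(A, iters=500, seed=0):
    """Greedy semi-axis construction: repeatedly find the longest vector
    in the image of the unit sphere, then deflate so that the next search
    happens in the orthogonal complement.  Assumes distinct singular values."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    B = A.astype(float).copy()
    sigmas, us, vs = [], [], []
    for _ in range(n):
        v = rng.standard_normal(n)
        for _ in range(iters):            # power iteration on B^T B
            v = B.T @ (B @ v)
            v /= np.linalg.norm(v)
        sigma = np.linalg.norm(B @ v)     # length of this semi-axis
        u = (B @ v) / sigma               # direction of this semi-axis
        sigmas.append(sigma)
        us.append(u)
        vs.append(v)
        B -= sigma * np.outer(u, v)       # deflation: remove the found axis
    return np.array(sigmas), np.column_stack(us), np.column_stack(vs)
```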

By this construction, the lengths $\sigma_i$ of the semi-axes in the directions $u_i$ come in nonincreasing order, i.e., $\sigma_1\ge \sigma_2\ge\cdots\ge \sigma_n>0$. Taking a unit vector $v_i\in A^{-1}(\sigma_i u_i)$ (i.e., $Av_i=\sigma_i u_i$), one gets \[A\underbrace{\left[\begin{array}{c|c|c}v_1&\cdots&v_n\end{array}\right]}_{:=V}=\left[\begin{array}{c|c|c}u_1&\cdots&u_n\end{array}\right]\begin{bmatrix}\sigma_1&& \\ &\ddots & \\ & & \sigma_n\end{bmatrix}.\]

Let $u_{n+1},\dots, u_m$ be an orthonormal basis of $(\mathrm{span}_{\mathbb R}\{u_1,\dots,u_n\})^\perp$; then the RHS of the above equation becomes
\[\underbrace{\left[\begin{array}{c|c|c}u_1&\cdots&u_m\end{array}\right]}_{:= U}\underbrace{\left[\begin{array}{ccc}\sigma_1&& \\ &\ddots & \\ & & \sigma_n \\ \hline &\mathcal O& \end{array}\right]}_{:=\Sigma},\]
where $\mathcal O$ denotes a matrix with only zero entries. One can rewrite the above as $AV = U\Sigma$. It will be proved in my draft that $\{v_i\}$ is indeed orthonormal; hence both $V$ and $U$ are orthogonal (unitary, over $\mathbb C$), and we arrive at the expression $A=U\Sigma V^*$. Owing to this decomposition, the $u_i$'s are called left-singular vectors, the $v_i$'s are called right-singular vectors, and the ``diagonal'' entries (i.e., if $\Sigma=(d_{ij})$, the entries $d_{ii}$) are called singular values. This is essentially all the motivation behind the general result in my draft.
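To close the loop numerically (again a sketch under my own naming, reusing the hypothetical `greedy_axes` helper from above): stack the $v_i$ into $V$, extend the $u_i$ to an orthonormal basis of $\mathbb R^m$ to form $U$, build $\Sigma$, and check $AV=U\Sigma$ and $A=U\Sigma V^{\top}$ against `numpy.linalg.svd`:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))       # m = 5 >= n = 3, full rank a.s.
m, n = A.shape

sigmas, U_n, V = greedy_axes(A)       # sigma_i, u_1..u_n, v_1..v_n

# Extend u_1,...,u_n to an orthonormal basis of R^m: the last m - n columns
# of a complete QR factorization span (span{u_1,...,u_n})^perp.
Q, _ = np.linalg.qr(U_n, mode='complete')
U = np.hstack([U_n, Q[:, n:]])

Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, sigmas)       # the O block below stays zero

print(np.allclose(V.T @ V, np.eye(n), atol=1e-6))            # {v_i} orthonormal
print(np.allclose(A @ V, U_n @ np.diag(sigmas), atol=1e-6))  # AV = U Sigma
print(np.allclose(A, U @ Sigma @ V.T, atol=1e-6))            # A = U Sigma V^T
print(np.allclose(sigmas, np.linalg.svd(A, compute_uv=False), atol=1e-6))
```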
