\( \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb{F}} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end {bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix} #1\end{vmatrix}} \newcommand{\lims}{\mathop{\overline{\lim}}} \newcommand{\limi}{\mathop{\underline{\lim}}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\limsn}{\lims_{n\to\infty}} \newcommand{\limin}{\limi_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right \rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\toto}{\rightrightarrows} \newcommand{\upto}{\nearrow} \newcommand{\downto}{\searrow} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \newcommand{\bm}{\boldsymbol} \newcommand{\cupp}{\bigcup} \newcommand{\capp}{\bigcap} \newcommand{\sqcupp}{\bigsqcup} \newcommand{\re}{\mathop{\mathrm{Re}}} \newcommand{\im}{\mathop{\mathrm{Im}}} \newcommand{\comma}{\text{,}} \newcommand{\foot}{\text{。}} \)

Friday, May 16, 2014

Polar Decomposition of Matrices with an Application in Deriving SVD (to be added in my linear algebra notes)

In this post I want to record a standard result in linear algebra that I hadn't paid attention to before---the polar decomposition. An immediate consequence is the SVD theorem, which I prove below as two consequences: one for square matrices and one for general matrices.

In what follows we say that a matrix $A$ is positive if $x\cdot Ax\ge 0$ for every $x$, and strictly positive if $(x,y)\mapsto x\cdot Ay$ defines an inner product.
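For instance, over $\R$ the matrix $\begin{bmatrix}1&1\\-1&1\end{bmatrix}$ satisfies $x\cdot Ax=x_1^2+x_2^2\ge 0$ for every $x$ without being symmetric, so positivity in this sense does not force symmetry over $\R$; the digression below explains why the situation differs over $\C$.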

Digression. When we say that an operator $T:H\to H$ on a complex Hilbert space is positive, i.e. $\inner{Tx,x}\ge 0$ for every $x$, it is necessarily (i.e., can be proved to be) self-adjoint! The key is that we can always recover the value $\inner{Tx,y}$ from the values $\inner{Tx,x}$, $x\in H$, while this may fail in real Hilbert spaces.
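To spell out the recovery step: with the inner product linear in its first slot (adjust conjugates for the other convention), the polarization identity gives, for every $x,y\in H$, \[
\inner{Tx,y}=\frac{1}{4}\sum_{k=0}^{3} i^k \innerr{T(x+i^k y),\, x+i^k y},
\] so the quadratic form $x\mapsto \inner{Tx,x}$ determines $T$ completely. In particular, if $\inner{Tx,x}\in\R$ for every $x$, then applying this identity to $T-T^*$ (whose quadratic form vanishes identically) shows $T=T^*$. Over $\R$ the identity only recovers the symmetric part of $T$, which is why positivity does not force symmetry there.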

Theorem (Polar Decomposition). Let $A$ be an $n\times n$ matrix over $\F=\R$ or $\C$. Then there exist a unitary (orthogonal when $\F=\R$) matrix $U$ and a positive matrix $P$ such that \[
A=UP,
\] where $P=\sqrt{A^*A}$, the unique positive square root of the positive matrix $A^*A$.


Proof. First we observe that \[
\|\sqrt{A^*A}x\|^2=\inner{\sqrt{A^*A}x, \sqrt{A^*A}x} = \inner{x,A^*Ax}=\inner{Ax,Ax}=\|Ax\|^2,
\] therefore $\|\sqrt{A^*A}x\|= \|Ax\|$. Now we define an isometry: \[
S_1:\range \sqrt{A^*A} \to \range A; \quad \sqrt{A^*A}x\mapsto Ax,
\] This map is well-defined precisely because of the norm identity above: if $\sqrt{A^*A}x=\sqrt{A^*A}y$, then \[\|A(x-y)\|=\|\sqrt{A^*A}(x-y)\|=0,\] so $Ax=Ay$; the same identity shows that $S_1$ preserves norms.

Now if we can extend $S_1$ to a unitary operator $S$ on $\F^n$, then we will have for every $x\in \F^n$, \[
S(\sqrt{A^*A}x)=S_1(\sqrt{A^*A}x)=Ax,
\] and we will be done. Note that since $S_1$ is injective, $\dim \range \sqrt{A^*A}=\dim \range S_1$, and hence the orthogonal complements $(\range \sqrt{A^*A})^\perp$ and $(\range S_1)^\perp$ also have the same dimension. Recall that extending an isometry $T:U\to V$ (between subspaces with $\dim U=\dim V$) to a unitary operator on the whole space amounts to choosing a unitary operator $T':U^\perp \to V^\perp$ and taking $T\oplus T'$, which will be easily done in the next paragraph.

We let $\{u_1,\dots,u_k\}$ be an orthonormal basis of $(\range \sqrt{A^*A})^\perp$ and $\{v_1,\dots,v_k\}$ be one of $(\range S_1)^\perp$ (the same $k$, by the dimension count above). We then define \[
S_2: (\range \sqrt{A^*A})^\perp\to (\range S_1)^\perp;\quad a_1u_1+\cdots+a_ku_k\mapsto a_1v_1+\cdots+a_kv_k.
\] Finally we define $S=S_1\oplus S_2$. It is a direct verification that $S$ preserves length: for $u\in \text{dom } S_1$ and $v\in \text{dom } S_2$, we have \[
\|S(u+v)\|^2=\|Su+Sv\|^2 = \|S_1u+S_2v\|^2 = \|S_1u\|^2+\|S_2v\|^2,
\] where the cross terms vanish because $S_1u\in \range S_1$ and $S_2v\in (\range S_1)^\perp$. Since $u=\sqrt{A^*A}x$ for some $x\in \F^n$ and $\|S_2v\|=\|v\|$, we have \begin{align*}
\|S(u+v)\|^2 &= \|S_1(\sqrt{A^*A}x)\|^2+\|v\|^2\\
&=\|Ax\|^2 + \|v\|^2 \\
&= \|\sqrt{A^*A}x\|^2 +\|v\|^2\\
&=\|u\|^2+\|v\|^2 = \|u+v\|^2,
\end{align*} as desired (the last equality uses $u\perp v$). Since preserving length is the same as preserving inner product, $S$ must be unitary.$\qed$
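As a purely numerical aside (not part of the proof, and in the opposite logical direction, since it computes $U$ and $P$ from a library SVD rather than the other way round), here is a minimal NumPy sketch; the helper name polar_decomposition is just my own choice for this illustration.

import numpy as np

def polar_decomposition(A):
    # SVD gives A = W diag(s) Vh; then U = W Vh is unitary and
    # P = Vh* diag(s) Vh is the positive square root of A*A.
    W, s, Vh = np.linalg.svd(A)
    U = W @ Vh
    P = Vh.conj().T @ np.diag(s) @ Vh
    return U, P

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, P = polar_decomposition(A)
print(np.allclose(A, U @ P))                   # A = U P
print(np.allclose(U.conj().T @ U, np.eye(4)))  # U is unitary
print(np.allclose(P, P.conj().T))              # P is self-adjoint (and positive)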

Remark. A similar statement holds for every bounded linear operator $T:H\to H$ on a Hilbert space $H$, with the unitary $U$ above replaced by a partial isometry.

Consequence 1. For every $n\times n$ matrix $A$, the matrix $\sqrt{A^*A}$ is often called the positive part of $A$. Since the decomposition $A=U\sqrt{A^*A}$ is analogous to the polar form $z=e^{i\theta}|z|$ of a complex number, it is also common to write $|A|=\sqrt{A^*A}$; therefore for every matrix $A$ there is a unitary matrix $U$ such that $A=U|A|$. With this notation, it can be easily checked that a matrix $A$ is normal if and only if $|A|=|A^*|$ (set $A=UP$, say).$\qed$
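Alternatively, one can avoid the polar decomposition altogether and use only the uniqueness of positive square roots: since $|A|^2=A^*A$ and $|A^*|^2=AA^*$ by definition, \[
A \text{ is normal} \iff A^*A=AA^* \iff |A|^2=|A^*|^2 \iff |A|=|A^*|.
\]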

Consequence 2 (Singular Value Decomposition---SVD---of Square Matrices). The baby version of the SVD can also be obtained from the Polar Decomposition. Let $A$ be any $n\times n$ matrix over $\F$, then $A=S|A|$ for some unitary (orthogonal when $\F=\R$) matrix $S$. Since $|A|$ is positive (in particular self-adjoint), by the Spectral Theorem there are a unitary (orthogonal when $\F=\R$) matrix $V$ and a diagonal matrix $\Sigma$ with nonnegative entries such that $|A|=V\Sigma V^*$, hence \[
A=S[V\Sigma V^*] = (SV) \Sigma V^* = U\Sigma V^*,
\] where $U=SV$ is unitary (orthogonal when $\F=\R$). If we further arrange the columns of $V$ so that $\Sigma=\text{diag}(\sigma_1,\dots,\sigma_n)$ with $\sigma_1\ge \sigma_2\ge\cdots \ge \sigma_n\ge 0$, then the decomposition above is exactly the baby version of the SVD; the general version, for matrices of arbitrary dimensions, is Consequence 3 below.$\qed$
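For concreteness, here is a small NumPy sketch that follows this construction literally (polar decomposition first, then the spectral theorem for $|A|$ via numpy.linalg.eigh); it reuses the illustrative polar_decomposition helper from the sketch above.

import numpy as np

def svd_from_polar(A):
    # A = S|A| (polar decomposition), |A| = V diag(w) V* (spectral theorem).
    S, P = polar_decomposition(A)
    w, V = np.linalg.eigh(P)          # eigenvalues of P, ascending and >= 0
    order = np.argsort(w)[::-1]       # reorder so that sigma_1 >= ... >= sigma_n
    w, V = w[order], V[:, order]
    return S @ V, w, V.conj().T       # A = (S V) diag(w) V*

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
U, sigma, Vh = svd_from_polar(A)
print(np.allclose(A, U @ np.diag(sigma) @ Vh))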

Consequence 3 (SVD of General Matrices). The work below will be very messy. In order not to insert "(orthogonal when $\F=\R$)" every time, let me use the term $\F$-unitary with the obvious meaning.

Now suppose that $A$ is an $m\times n$ matrix; w.l.o.g. let's assume $m>n$ (otherwise consider $A^T$). Extend $A$ to a square matrix by adjoining an $m\times (m-n)$ zero matrix $O$; then by Consequence 2 there are $m\times m$ $\F$-unitary matrices $U$ and $V$ such that \begin{equation}\label{mul here}
\begin{bmatrix} A&O\end{bmatrix}=U\Sigma V^*,
\end{equation} where $\Sigma=\text{diag}(\sigma_1,\dots,\sigma_m)$ with $\sigma_1\ge \sigma_2\ge \cdots \ge \sigma_m\ge 0$.

By our definition, for every $i>n$, \[
0 = U\Sigma V^* e_i=U\brac{\sum_{k=1}^m \sigma_k e_kv_k^*(e_i)},
\] since $U$ is invertible, we have \[
\sigma_k v_k^*(e_i)=0 \text{ for each }k=1,2,\dots, m\text{ and for each }i>n.
\]In particular, if $\rank A=r$, then $\sigma_{r+1}=0$: indeed $\rank \Sigma = \rank(U\Sigma V^*)=\rank \begin{bmatrix} A&O\end{bmatrix}=r$, so by the ordering $\sigma_i=0$ for every $i > r$. Since $\sigma_k\neq 0$ for $k\le r$, we then have $v_k^*(e_i)=0$ for every $k=1,2,\dots,r$ and every $i>n$. In other words, \begin{equation}\label{recall this}
v_1,\dots,v_r\in \F^n\times \{0_{m-n}\}.
\end{equation}
Now multiplying both sides of (\ref{mul here}) on the right by $\begin{bmatrix}I_{n\times n}\\ O\end{bmatrix}$, we have
\[
A = U\Sigma V^*_{m\times n},
\] where $V^*_{m\times n}$ is the left $m\times n$ part of the matrix $V^*$. Denote by $D=\text{diag}(\sigma_1,\dots,\sigma_n)$ the upper left $n\times n$ part of $\Sigma$; since $r=\rank A\le n$, the remaining diagonal entries $\sigma_{n+1},\dots,\sigma_m$ are all zero, so \[
A=U
\begin{bmatrix}
D&O_{n,m-n}\\
O_{m-n,n}&O_{m-n,m-n}
\end{bmatrix}  V^*_{m\times n}= U\begin{bmatrix}
DV_{n\times n}^*\\
O_{m-n,n}
\end{bmatrix}=U\begin{bmatrix}
D\\
O
\end{bmatrix} V_{n\times n}^*,
\] where $V_{n\times n}^*$ denotes the upper left $n\times n$ part of $V^*$.

We are almost there, except that $V_{n\times n}^*$ is not necessarily $\F$-unitary. But we can always modify $V$, if necessary, to achieve this. To see how, let's study the term $DV_{n\times n}^*$.

Since $\sigma_i=0$ for $i>r$, we have  \[
DV_{n\times n}^* = \begin{bmatrix}
\sigma_1 v_1'^*\\
\vdots \\
\sigma_rv_r'^*\\
0\cdot v_{r+1}'^*\\
\vdots \\
0\cdot v_{n}'^*
\end{bmatrix},\]
where $v'$ denotes the vector of the first $n$ coordinates of $v\in \F^m$. We may therefore replace $v_{r+1},\dots,v_n\in \F^m$ freely so that $v_1,\dots,v_n$ form an orthonormal basis of $\F^n\times \{0_{m-n}\}$ (recall (\ref{recall this})) without changing $DV_{n\times n}^*$, since the rows being altered are multiplied by $\sigma_i=0$. After this replacement the columns $v_1',\dots,v_n'$ of $V_{n\times n}$ form an orthonormal basis of $\F^n$, so we may assume $V_{n\times n}^*$ itself is $\F$-unitary, and we are done.$\qed$
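To close, a small numerical check of the identity $A=U\begin{bmatrix}D\\O\end{bmatrix}V_{n\times n}^*$ derived above (the check works even before the final column replacement, since that replacement does not change the product); it pads $A$ with zero columns and calls numpy.linalg.svd on the resulting square matrix.

import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Pad A to the m x m matrix [A  O] and take its square SVD.
A_padded = np.hstack([A, np.zeros((m, m - n))])
U, sigma, Vh = np.linalg.svd(A_padded)   # [A O] = U diag(sigma) Vh

D = np.diag(sigma[:n])                   # upper left n x n part of Sigma
Vh_nn = Vh[:n, :n]                       # upper left n x n part of V*

reconstruction = U @ np.vstack([D, np.zeros((m - n, n))]) @ Vh_nn
print(np.allclose(A, reconstruction))    # True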
