\( \newcommand{\N}{\mathbb{N}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\P}{\mathcal P} \newcommand{\B}{\mathcal B} \newcommand{\F}{\mathbb{F}} \newcommand{\E}{\mathcal E} \newcommand{\brac}[1]{\left(#1\right)} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\matrixx}[1]{\begin{bmatrix}#1\end {bmatrix}} \newcommand{\vmatrixx}[1]{\begin{vmatrix} #1\end{vmatrix}} \newcommand{\lims}{\mathop{\overline{\lim}}} \newcommand{\limi}{\mathop{\underline{\lim}}} \newcommand{\limn}{\lim_{n\to\infty}} \newcommand{\limsn}{\lims_{n\to\infty}} \newcommand{\limin}{\limi_{n\to\infty}} \newcommand{\nul}{\mathop{\mathrm{Nul}}} \newcommand{\col}{\mathop{\mathrm{Col}}} \newcommand{\rank}{\mathop{\mathrm{Rank}}} \newcommand{\dis}{\displaystyle} \newcommand{\spann}{\mathop{\mathrm{span}}} \newcommand{\range}{\mathop{\mathrm{range}}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerr}[1]{\left\langle #1 \right \rangle} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\toto}{\rightrightarrows} \newcommand{\upto}{\nearrow} \newcommand{\downto}{\searrow} \newcommand{\qed}{\quad \blacksquare} \newcommand{\tr}{\mathop{\mathrm{tr}}} \newcommand{\bm}{\boldsymbol} \newcommand{\cupp}{\bigcup} \newcommand{\capp}{\bigcap} \newcommand{\sqcupp}{\bigsqcup} \newcommand{\re}{\mathop{\mathrm{Re}}} \newcommand{\im}{\mathop{\mathrm{Im}}} \newcommand{\comma}{\text{,}} \newcommand{\foot}{\text{。}} \)

Saturday, November 16, 2013

The application of Lagrange multiplier's method with multiple constraints to linear algebra

In this post we record an application of a standard result in multivariable calculus (whose rigorous proof is still beyond the standard multivariable calculus course) to proving the following standard fact in linear algebra:
Theorem (Spectral). Let $A$ be an $n\times n$ real symmetric matrix, then $A$ is orthogonally diagonalizable. 
In other words, there is an orthonormal  basis $\{v_1,\dots,v_n\}$ such that each of $v_i$'s is an eigenvector. Througout the proof the following will be assumed:
Theorem (Lagrange's Multiplier). Let $n\ge m$, suppose that:
  • $G=(g_1,\dots,g_m):\R^n\to \R^m$ is $C^1$ on $\R^n$ and $G'(x_0)$ has full rank.
  •  $f$ is a real-valued function defined near $x_0$ and is differentiable at $x_0$.
  • $f$ has a  local extreme at $x_0$ under the constraint $G(x_0)=0$.
 Then there are $\lambda_1,\lambda_2,\dots,\lambda_m\in \R$ such that \begin{equation}\label{equality of lagrange}\nabla f(x_0)=\lambda_1\nabla g_1(x_0)+\lambda_2\nabla g_2(x_0)+\cdots +\lambda_m \nabla g_m(x_0).\end{equation}
A detailed proof of this multiplier method has been recorded in my lengthened version of Math3033 tutorial note 3.

Proof of Spectral Theorem. The maximum of \[
\psi (x) = x^TAx
\] w.r.t. the constraint $x^Tx=1$ can be attained since $S^{n-1}$ is compact. Now it is easy to show that \[\psi'(x)=2x^TA\quad \text{and}\quad \nabla(x^Tx)(x)=2x^T,
\] therefore the existence of the constrained extreme implies that there is $\lambda\in \R$ and a vector $v\in \R^n$ such that
  1. $v^Tv=1$.
  2. $\nabla \psi (v)=\lambda\nabla (x^Tx-1)(v)\iff 2v^TA=2\lambda v^T\iff Av=\lambda v$.
Therefore we have obtained our first eigenvalue and eigenvector.

Now in view of the statement of Spectral Theorem let's suppose that there are already $\lambda_1,\dots,\lambda_k\in \R$ and $v_1,\dots,v_k$ orthonormal such that $Av_i=\lambda_iv_i$ for each $i=1,\dots,k$. We try to find $\mu\in \R$ and $u\in S^{n-1}$ such that $Au=\mu u$ and $v_i\cdot u=0$ for all $1\leq i\leq k$. After that our proof is completed by induction. 

Following the same idea in the first paragraph of the proof we know that the constrained extreme problem with the following data can be attained: \[ x^TAx = \text{max},\quad x^Tx=1,\quad v_i^Tx=0,\forall i=1,\dots,k.
\] Therefore there are $\mu_1,\dots,\mu_k,\mu\in \R$ and a $u\in\R^n$ satisfying the constraints such that \[
\nabla  (x^TAx)(u)=  \mu \nabla(x^Tx-1)(u)+\sum_{i=1}^k \mu_i\nabla(v_i^Tx)(u),
\] on simplification we get \[
2u^TA=2 \mu u^T+\sum_{i=1}^k \mu_i v_i^T.
\] From our hypothesis in the last paragraph, by applying $v_i$'s on both sides above we have for $i=1,2,\dots,k$, \[
0=0+\mu_i \|v_i\|^2,
\] therefore $\mu_1=\mu_2=\cdots=\mu_k=0$, and we have \[
2u^TA=2\mu u^T\iff Au=\mu u.
\] Recall that $u$ satisfies the constraints $v_i\cdot u=0$ for $i=1,\dots,k$ and $u\cdot u=1$.$\qed$

No comments:

Post a Comment