{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# TOPIC 2 \n", "\n", "# Spectral and singular value decompositions\n", "\n", "## 6 Further observations\n", "\n", "***\n", "*Course:* [Math 535](http://www.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS) \n", "*Author:* [Sebastien Roch](http://www.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison \n", "*Updated:* Sep 21, 2020 \n", "*Copyright:* © 2020 Sebastien Roch\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We make further observations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 6.1 Condition numbers\n", "\n", "In this section we consider condition numbers, a measure of the sensitivity to perturbations of certain numerical problems. We look in particular at the conditioning of the least-squares problem." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### 6.1.1 Pseudoinverses\n", "\n", "We have seen that the least-squares problem provides a solution concept for overdetermined systems. This leads to the following generalization of the matrix inverse.\n", "\n", "**Definition (Pseudoinverse):** Let $A \\in \\mathbb{R}^{n \\times m}$ be a matrix with SVD $A = U \\Sigma V^T$ and singular values $\\sigma_1 \\geq \\cdots \\geq \\sigma_r > 0$. A pseudoinverse $A^+ \\in \\mathbb{R}^{m \\times n}$ is defined as\n", "\n", "$$\n", "A^+ = V \\Sigma^+ U^T\n", "$$\n", "\n", "where $\\Sigma^+$ is the diagonal matrix with diagonal entries $\\sigma_1^{-1}, \\ldots, \\sigma_r^{-1}$. $\\lhd$\n", "\n", "While it is not obvious from the definition (*Exercise:* Why? $\\lhd$), the pseudoinverse is in fact [unique](https://en.wikipedia.org/wiki/Proofs_involving_the_Moore–Penrose_inverse#Proof_of_uniqueness). 
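\n", "\n", "As a quick illustration, the following sketch (using an arbitrary $3 \times 2$ matrix for concreteness) builds $A^+$ directly from a compact SVD; for a full-rank matrix this construction matches Julia's built-in `pinv`:\n", "\n", "```julia\n", "using LinearAlgebra\n", "A = [1.5 1.3; 1.2 1.9; 2.1 0.8]\n", "F = svd(A)   # compact SVD: A = F.U * Diagonal(F.S) * F.Vt\n", "Ap = F.V * Diagonal(1 ./ F.S) * F.U'   # A^+ = V Sigma^+ U^T\n", "Ap ≈ pinv(A)   # true\n", "```\n", "\n", "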
Note further that\n", "\n", "$$\n", "A A^+ = U \Sigma V^T V \Sigma^+ U^T = U U^T\n", "$$\n", "\n", "and\n", "\n", "$$\n", "A^+ A = V \Sigma^+ U^T U \Sigma V^T = V V^T.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Three important cases:\n", "\n", "1. In the square nonsingular case, both products are the identity and we necessarily have $A^+ = A^{-1}$ by the Existence of an Inverse Lemma. \n", "\n", "2. If $A$ has full column rank $m \leq n$, then $r = m$ and $A^+ A = I_{m \times m}$.\n", "\n", "3. If $A$ has full row rank $n \leq m$, then $r = n$ and $A A^+ = I_{n \times n}$.\n", "\n", "In the second case above, we recover our solution to the least-squares problem in the overdetermined case." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "\n", "**Lemma (Pseudoinverse and least squares):** Let $A \in \mathbb{R}^{n \times m}$ be of full column rank $m$ with $m \leq n$. Then\n", "\n", "$$\n", "A^+ = (A^T A)^{-1} A^T.\n", "$$\n", "\n", "In particular, the solution to the least-squares problem\n", "\n", "$$\n", "\min_{\mathbf{x} \in \mathbb{R}^m} \|A \mathbf{x} - \mathbf{b}\|\n", "$$\n", "\n", "is $\mathbf{x}^* = A^+ \mathbf{b}$.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof idea:* We use the SVD definition and check that the two sides are the same.\n", "\n", "*Proof:* We first note that $A^T A$ is nonsingular. Indeed, if $A^T A \mathbf{x} = \mathbf{0}$, then $\mathbf{x}^T A^T A \mathbf{x} = \|A\mathbf{x}\|^2 = 0$ so that $A \mathbf{x} = \mathbf{0}$ by the point-separating property of the vector norm. In turn, since the columns of $A$ are linearly independent by assumption, this implies $\mathbf{x} = \mathbf{0}$. Hence the columns of $A^T A$ are also linearly independent. 
\n", "\n", "Now let $A = U \\Sigma V^T$ be an SVD of $A$ (with positive singular values). We then note that\n", "\n", "$$\n", "(A^T A)^{-1} A^T \n", "= (V \\Sigma U^T U \\Sigma V^T)^{-1} V \\Sigma U^T\n", "= V \\Sigma^{-2} V^T V \\Sigma U^T\n", "= A^+\n", "$$\n", "\n", "as claimed. \n", "\n", "The second claim follows from the normal equations. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "**NUMERICAL CORNER** In Julia, the pseudoinverse of a matrix can be computed using the function [pinv](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.pinv)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "#Julia version: 1.5.1\n", "using Plots, LinearAlgebra" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "3×2 Array{Float64,2}:\n", " 1.5 1.3\n", " 1.2 1.9\n", " 2.1 0.8" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = [1.5 1.3; 1.2 1.9; 2.1 0.8]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2×3 Array{Float64,2}:\n", " 0.0930539 -0.311014 0.587446\n", " 0.126271 0.629309 -0.449799" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Mp = pinv(M)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2×2 Array{Float64,2}:\n", " 1.0 -4.05256e-17\n", " 2.5981e-16 1.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Mp*M" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### 6.1.2 Conditioning of matrix-vector multiplication\n", "\n", 
"We define the condition number of a matrix and show that it captures some information about the sensitivity to perturbations of matrix-vector multiplications. First an exercise.\n", "\n", "*Exercise:* Let $A \\in \\mathbb{R}^{n \\times n}$ be nonsingular with SVD $A = U \\Sigma V^T$ where the singular values satisfy $\\sigma_1 \\geq \\cdots \\geq \\sigma_n > 0$. Show that\n", "\n", "$$\n", "\\min_{\\mathbf{x} \\neq \\mathbf{0}} \\frac{\\|A \\mathbf{x}\\|}{\\|\\mathbf{x}\\|}\n", "= \\min_{\\mathbf{y} \\neq \\mathbf{0}} \\frac{\\|\\mathbf{y}\\|}{\\|A^{-1}\\mathbf{y}\\|}\n", "= \\sigma_n\n", "= 1/\\|A^+\\|_2.\n", "$$\n", "\n", "$\\lhd$\n", "\n", "**Definition (Condition number of a matrix):** The condition number (in the induced $2$-norm) of a matrix $A \\in \\mathbb{R}^{n \\times m}$ is defined as\n", "\n", "$$\n", "\\kappa_2(A) = \\|A\\|_2 \\|A^+\\|_2.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In the square nonsingular case, this reduces to \n", "\n", "$$\n", "\\kappa_2(A) = \\|A\\|_2 \\|A^{-1}\\|_2 = \\frac{\\sigma_1}{\\sigma_n}\n", "$$\n", "\n", "where we used the exercise above. In words, $\\kappa_2(A)$ is the ratio of the largest to the smallest stretching under $A$.\n", "\n", "***\n", "\n", "**Theorem (Relative conditioning of matrix-vector multiplication):** Let $A \\in \\mathbb{R}^{n \\times n}$ be nonsingular. 
Then, for any $\\mathbf{x} \\in \\mathbb{R}^n$,\n", "\n", "$$\n", "\\lim_{\\delta \\to 0} \\sup_{0 < \\|\\mathbf{d}\\| \\leq \\delta}\n", "\\frac{\\|A(\\mathbf{x}+\\mathbf{d}) - A \\mathbf{x}\\|/\\|A\\mathbf{x}\\|}\n", "{\\|\\mathbf{d}\\|/\\|\\mathbf{x}\\|} \n", "=\n", "\\max_{\\mathbf{d} \\neq \\mathbf{0}}\n", "\\frac{\\|A(\\mathbf{x}+\\mathbf{d}) - A \\mathbf{x}\\|/\\|A\\mathbf{x}\\|}\n", "{\\|\\mathbf{d}\\|/\\|\\mathbf{x}\\|} \n", "\\leq \\kappa_2(A)\n", "$$\n", "\n", "and the inequality is tight.\n", "\n", "***\n", "\n", "The ratio above measures the worst rate of relative change in $A \\mathbf{x}$ under infinitesimal perturbations of $\\mathbf{x}$. The theorem says that when $\\kappa_2(A)$ is large, a case referred to as ill-conditioning, large relative changes in $A \\mathbf{x}$ can be obtained from relatively small perturbations to $\\mathbf{x}$. In words, a matrix-vector product is potentially sensitive to perturbations when the matrix is ill-conditioned." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* Write\n", "\n", "$$\n", "\\frac{\\|A(\\mathbf{x}+\\mathbf{d}) - A \\mathbf{x}\\|/\\|A\\mathbf{x}\\|}\n", "{\\|\\mathbf{d}\\|/\\|\\mathbf{x}\\|} \n", "= \\frac{\\|A \\mathbf{d}\\|/\\|A\\mathbf{x}\\|}\n", "{\\|\\mathbf{d}\\|/\\|\\mathbf{x}\\|} \n", "= \\frac{\\|A (\\mathbf{d}/\\|\\mathbf{d}\\|)\\|}{\\|A(\\mathbf{x}/\\|\\mathbf{x}\\|)\\|}\n", "\\leq \\frac{\\sigma_1}{\\sigma_n}\n", "$$\n", "\n", "where we used the exercise above.\n", "\n", "In particular, we see that the ratio can achieve its maximum by taking $\\mathbf{d}$ and $\\mathbf{x}$ to be the right singular vectors corresponding to $\\sigma_1$ and $\\sigma_n$ respectively.$\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we apply the theorem to the inverse instead, we get that the relative conditioning of the nonsingular linear system $A \\mathbf{x} = \\mathbf{b}$ to 
perturbations in $\\mathbf{b}$ is also $\\kappa_2(A)$. The latter can be large in particular when the columns of $A$ are close to linearly dependent." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "**NUMERICAL CORNER** In Julia, the condition number of a matrix can be computed using the function [cond](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.cond)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For example, orthogonal matrices have condition number $1$:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2×2 Array{Float64,2}:\n", " 0.707107 0.707107\n", " 0.707107 -0.707107" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q = [1/sqrt(2) 1/sqrt(2); 1/sqrt(2) -1/sqrt(2)]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1.0000000000000002" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(Q)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "And matrices with nearly linearly dependent columns have large condition numbers:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2×2 Array{Float64,2}:\n", " 0.707107 0.707107\n", " 0.707107 0.707108" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eps = 1e-6\n", "A = [1/sqrt(2) 1/sqrt(2); 1/sqrt(2) 1/sqrt(2)+eps]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2.8284291245366517e6" ] }, "execution_count": 
3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(A)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Let's look at the SVD of $A$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "SVD{Float64,Float64,Array{Float64,2}}\n", "U factor:\n", "2×2 Array{Float64,2}:\n", " -0.707107 -0.707107\n", " -0.707107 0.707107\n", "singular values:\n", "2-element Array{Float64,1}:\n", " 1.4142140623732717\n", " 4.999998232605338e-7\n", "Vt factor:\n", "2×2 Array{Float64,2}:\n", " -0.707107 -0.707107\n", " -0.707107 0.707107" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F = svd(A)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We compute the solution to $A \\mathbf{x} = \\mathbf{b}$ when $\\mathbf{b}$ is the right singular vector corresponding to the largest singular value." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2-element Array{Float64,1}:\n", " -0.7071065311865034\n", " -0.7071070311865033" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = F.V[:,1]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2-element Array{Float64,1}:\n", " -0.4999996465020582\n", " -0.49999999994448885" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = A\\b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We make a small perturbation in the direction of the second right singular vector." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2-element Array{Float64,1}:\n", " -0.7071072382935345\n", " -0.7071063240799721" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delta = 1e-6\n", "bp = b + delta*F.V[:,2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The relative change in the solution is:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2-element Array{Float64,1}:\n", " -1.9142142088291174\n", " 0.9142135623822168" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xp = A\\bp" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2.828429124665918e6" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(norm(x-xp)/norm(x))/(norm(b-bp)/norm(b))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Note that this is essentially the condition number of $A$." 
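, "\n", "\n", "As a sanity check of the theory, the amplification factor above equals the ratio $\sigma_1/\sigma_2$ of the singular values; a self-contained sketch (recomputing $A$ from scratch):\n", "\n", "```julia\n", "using LinearAlgebra\n", "eps = 1e-6\n", "A = [1/sqrt(2) 1/sqrt(2); 1/sqrt(2) 1/sqrt(2)+eps]\n", "F = svd(A)\n", "# for a square nonsingular matrix, kappa_2(A) = sigma_1/sigma_n\n", "F.S[1]/F.S[2] ≈ cond(A)   # true\n", "```"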
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### 6.1.3 Back to the least-squares problem [optional]\n", "\n", "We return to the least-squares problem\n", "\n", "$$\n", "\min_{\mathbf{x} \in \mathbb{R}^m} \|A \mathbf{x} - \mathbf{b}\|\n", "$$\n", "\n", "where\n", "\n", "$$\n", "A =\n", "\begin{pmatrix}\n", "| & & | \\\n", "\mathbf{a}_1 & \ldots & \mathbf{a}_m \\\n", "| & & | \n", "\end{pmatrix}\n", "\quad\n", "\text{and}\n", "\quad\n", "\mathbf{b}\n", "=\n", "\begin{pmatrix}\n", "b_1 \\\n", "\vdots \\\n", "b_n\n", "\end{pmatrix}.\n", "$$\n", "\n", "We showed that the solution satisfies the normal equations\n", "\n", "$$\n", "A^T A \mathbf{x} = A^T \mathbf{b}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As we show next, the condition number of $A^T A$ can be much larger than that of $A$ itself.\n", "\n", "***\n", "\n", "**Lemma (Condition number of $A^T A$):** Let $A \in \mathbb{R}^{n \times m}$ have full column rank. Then\n", "\n", "$$\n", "\kappa_2(A^T A) = \kappa_2(A)^2.\n", "$$\n", "\n", "***\n", "\n", "*Proof idea:* We use the SVD." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* Let $A = U \Sigma V^T$ be an SVD of $A$ with singular values\n", "$\sigma_1 \geq \cdots \geq \sigma_m > 0$. Then\n", "\n", "$$\n", "A^T A \n", "= V \Sigma U^T U \Sigma V^T\n", "= V \Sigma^2 V^T.\n", "$$\n", "\n", "In particular the latter expression is an SVD of $A^T A$, and hence the condition number of $A^T A$ is\n", "\n", "$$\n", "\kappa_2(A^T A) \n", "= \frac{\sigma_1^2}{\sigma_m^2}\n", "= \kappa_2(A)^2.\n", "$$\n", "\n", "$\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "**NUMERICAL CORNER** We give a quick example." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "5×2 Array{Float64,2}:\n", " 1.0 101.0\n", " 1.0 102.0\n", " 1.0 103.0\n", " 1.0 104.0\n", " 1.0 105.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [1. 101.; 1. 102.; 1. 103.; 1. 104.; 1. 105]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "7503.817028686117" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(A)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "5.630727000263108e7" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(A'*A)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This observation - and the resulting increased numerical instability - is one of the reasons we previously developed an alternative approach to the least-squares problem. Quoting [Sol, Section 5.1]:\n", "\n", "> Intuitively, a primary reason that $\\mathrm{cond}(A^T A)$ can be large is that columns of $A$ might\n", "look “similar” [...] If two columns $\\mathbf{a}_i$ and $\\mathbf{a}_j$ satisfy $\\mathbf{a}_i \\approx \\mathbf{a}_j$, then the least-squares residual length $\\|\\mathbf{b} − A \\mathbf{x}\\|_2$ will not suffer much if we replace multiples of $\\mathbf{a}_i$ with multiples of $\\mathbf{a}_j$ or vice versa. This wide range of nearly—but not completely—equivalent solutions yields poor conditioning. [...] To solve such poorly conditioned problems, we will employ an alternative technique with closer attention to the column space of $A$ rather than employing row operations as in Gaussian elimination. 
This strategy identifies and deals with such near-dependencies explicitly, bringing about greater numerical stability." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We quote without proof a theorem from [[Ste](https://epubs.siam.org/doi/book/10.1137/1.9781611971408), Theorem 4.2.7] which sheds further light on this issue.\n", "\n", "***\n", "\n", "**Theorem (Accuracy of computed least-squares solutions):** Let $\mathbf{x}^*$ be the solution of the least-squares problem $\n", "\min_{\mathbf{x} \in \mathbb{R}^m} \|A \mathbf{x} - \mathbf{b}\|$. Let $\mathbf{x}_{\mathrm{NE}}$ be the solution obtained by forming and solving the normal equations in [floating-point arithmetic](https://en.wikipedia.org/wiki/Floating-point_arithmetic) with rounding unit $\epsilon_M$. Then $\mathbf{x}_{\mathrm{NE}}$ satisfies\n", "\n", "$$\n", "\frac{\|\mathbf{x}_{\mathrm{NE}} - \mathbf{x}^*\|}{\|\mathbf{x}^*\|}\n", "\leq \gamma_{\mathrm{NE}} \kappa_2^2(A) \n", "\left(\n", "1 + \frac{\|\mathbf{b}\|}{\|A\|_2 \|\mathbf{x}^*\|}\n", "\right) \epsilon_M.\n", "$$\n", "\n", "Let $\mathbf{x}_{\mathrm{QR}}$ be the solution obtained from a QR factorization in the same arithmetic. Then\n", "\n", "$$\n", "\frac{\|\mathbf{x}_{\mathrm{QR}} - \mathbf{x}^*\|}{\|\mathbf{x}^*\|}\n", "\leq 2 \gamma_{\mathrm{QR}} \kappa_2(A) \epsilon_M\n", "+\n", "\gamma_{\mathrm{NE}} \kappa_2^2(A) \n", "\frac{\|\mathbf{r}^*\|}{\|A\|_2 \|\mathbf{x}^*\|}\n", "\epsilon_M\n", "$$\n", "\n", "where $\mathbf{r}^* = \mathbf{b} - A \mathbf{x}^*$ is the residual vector. 
\n", "The constants $\\gamma$ are slowly growing functions of the dimensions of the problem.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To explain, let's quote [[Ste](https://epubs.siam.org/doi/book/10.1137/1.9781611971408), Section 4.2.3] again:\n", "\n", "> The perturbation theory for the normal equations shows that $\\kappa_2^2(A)$ controls the size of the errors we can expect. The bound for the solution computed from the QR equation also has a term multiplied by $\\kappa_2^2(A)$, but this term is also multiplied by the scaled residual, which can diminish its effect. However, in many applications the vector $\\mathbf{b}$ is contaminated with error, and the residual can, in general, be no smaller than the size of that error." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "**NUMERICAL CORNER** Here is a numerical example taken from [[TB](https://books.google.com/books/about/Numerical_Linear_Algebra.html?id=JaPtxOytY7kC), Lecture 19]. We will approximate the following function with a polynomial." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = 100 \n", "t = (0:n-1)/(n-1)\n", "b = exp.(sin.(4*t))\n", "plot(t, b, legend=false, lw=2)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "m = 17\n", "A = [t[i].^(j-1) for i=1:n, j=1:m];" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The condition numbers of $A$ and $A^T A$ are both high in this case." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cond(A) = 7.558243605585787e11\n", "cond(A' * A) = 6.812930320935918e17\n" ] } ], "source": [ "@show cond(A)\n", "@show cond(A'*A);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We first use the normal equations and plot the residual vector." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(b - A * xNE) = 0.001744072670843444\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xNE = (A'*A)\\(A'*b)\n", "@show norm(b-A*xNE)\n", "plot(t, b-A*xNE, label=\"NE\", lw=2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We then use the [qr](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.qr) function of Julia to compute the QR solution instead. Note that [Matrix](https://docs.julialang.org/en/v1/base/arrays/#Base.Matrix) is used to transform the factors obtained from qr into regular arrays." 
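, "\n", "\n", "To recall why the QR route is better behaved: if $A = QR$ with $Q$ having orthonormal columns and $R$ nonsingular upper triangular, substituting into the normal equations gives\n", "\n", "$$\n", "A^T A \mathbf{x} = A^T \mathbf{b}\n", "\quad \Longleftrightarrow \quad\n", "R^T Q^T Q R \mathbf{x} = R^T Q^T \mathbf{b}\n", "\quad \Longleftrightarrow \quad\n", "R \mathbf{x} = Q^T \mathbf{b},\n", "$$\n", "\n", "using $Q^T Q = I$ and the invertibility of $R^T$. Solving this triangular system never forms $A^T A$, whose condition number is $\kappa_2(A)^2$."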
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(b - A * xQR) = 7.359747057852724e-6\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F = qr(A)\n", "Q, R = Matrix(F.Q), Matrix(F.R)\n", "xQR = R\\(Q'*b)\n", "@show norm(b-A*xQR) \n", "plot!(t, b-A*xQR, label=\"QR\", lw=2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### 6.2 Proof of Spectral theorem [optional]\n", "\n", "When $A$ is symmetric, that is, $a_{ij} = a_{ji}$ for all $i,j$, a remarkable result is that $A$ is similar to a diagonal matrix by an orthogonal transformation. Put differently, there exists an orthonormal basis of $\mathbb{R}^d$ made of eigenvectors of $A$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "***\n", "\n", "**Theorem (Spectral Theorem):** Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix, that is, $A^T = A$. Then $A$ has $d$ orthonormal eigenvectors $\mathbf{q}_1, \ldots, \mathbf{q}_d$ with corresponding (not necessarily distinct) real eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. In matrix form, this is written as the matrix factorization\n", "\n", "$$\n", "A \n", "= Q \Lambda Q^T\n", "= \sum_{i=1}^d \lambda_i \mathbf{q}_i \mathbf{q}_i^T\n", "$$\n", "\n", "where $Q$ has columns $\mathbf{q}_1, \ldots, \mathbf{q}_d$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. 
We refer to this factorization as a spectral decomposition of $A$.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "One might hope that the SVD of a symmetric matrix would produce identical left and right singular vectors, thereby providing the desired eigendecomposition. However that is not the case.\n", "\n", "*Exercise:* Compute the SVD of $A = [-1]$. $\\lhd$\n", "\n", "However, the same roadmap that led to the SVD can be made to produce an eigendecomposition.\n", "\n", "*Exercise:* Consider the block matrices\n", "\n", "$$\n", "\\begin{pmatrix}\n", "\\mathbf{y}\\\\\n", "\\mathbf{z}\n", "\\end{pmatrix},\n", "\\quad \n", "\\quad\n", "\\begin{pmatrix}\n", "X_{11} & X_{12}\\\\\n", "X_{21} & X_{22}\n", "\\end{pmatrix}\n", "\\quad\n", "\\text{and}\n", "\\quad\n", "\\begin{pmatrix}\n", "A & B\\\\\n", "C & D\n", "\\end{pmatrix}\n", "$$\n", "\n", "where $\\mathbf{y}\\in\\mathbb{R}^{d_1}$, $\\mathbf{z}\\in\\mathbb{R}^{d_2}$, $X_{11}, A \\in \\mathbb{R}^{d_1 \\times d_1}$, $X_{12}, B \\in \\mathbb{R}^{d_1 \\times d_2}$, $X_{21}, C \\in \\mathbb{R}^{d_2 \\times d_1}$, and $X_{22}, D \\in \\mathbb{R}^{d_2 \\times d_2}$. Show that\n", "\n", "$$\n", "\\begin{pmatrix}\n", "\\mathbf{y}\\\\\n", "\\mathbf{z}\n", "\\end{pmatrix}^T\n", "\\begin{pmatrix}\n", "A & B\\\\\n", "C & D\n", "\\end{pmatrix}\n", "\\begin{pmatrix}\n", "\\mathbf{y}\\\\\n", "\\mathbf{z}\n", "\\end{pmatrix}\n", "= \n", "\\mathbf{y}^T A \\mathbf{y}\n", "+ \\mathbf{y}^T B \\mathbf{z}\n", "+ \\mathbf{z}^T C \\mathbf{y}\n", "+ \\mathbf{z}^T D \\mathbf{z}.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Proof idea (Spectral Theorem):* Use a greedy sequence, as in the SVD derivation, this time maximizing the [quadratic form](https://en.wikipedia.org/wiki/Quadratic_form#Real_quadratic_forms) $\\langle \\mathbf{v}, A \\mathbf{v}\\rangle$. 
How is this quadratic form related to eigenvalues? Note that, for a unit eigenvector $\mathbf{v}$ with eigenvalue $\lambda$, we have $\langle \mathbf{v}, A \mathbf{v}\rangle = \langle \mathbf{v}, \lambda \mathbf{v}\rangle = \lambda$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Proof (Spectral Theorem):* We proceed by induction.\n", "\n", "***A first eigenvector:*** Let $A_1 = A$ and\n", "\n", "$$\n", "\mathbf{v}_1 \in \arg\max\{\langle \mathbf{v}, A_1 \mathbf{v}\rangle:\|\mathbf{v}\| = 1\}\n", "$$\n", "\n", "and\n", "\n", "$$\n", "\lambda_1 = \max\{\langle \mathbf{v}, A_1 \mathbf{v}\rangle:\|\mathbf{v}\| = 1\}.\n", "$$\n", "\n", "Complete $\mathbf{v}_1$ into an orthonormal basis of $\mathbb{R}^d$, $\mathbf{v}_1$, $\hat{\mathbf{v}}_2, \ldots, \hat{\mathbf{v}}_d$, and form the block matrix\n", "\n", "$$\n", "\hat{W}_1 \n", "=\n", "\begin{pmatrix}\n", "\mathbf{v}_1 & \hat{V}_1\n", "\end{pmatrix}\n", "$$\n", "\n", "where the columns of $\hat{V}_1$ are $\hat{\mathbf{v}}_2, \ldots, \hat{\mathbf{v}}_d$. Note that $\hat{W}_1$ is orthogonal by construction. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "***Getting one step closer to diagonalization:*** We show next that $\hat{W}_1$ gets us one step closer to a diagonal matrix by similarity transformation. Note first that\n", "\n", "$$\n", "\hat{W}_1 A_1 \hat{W}_1^T\n", "= \n", "\begin{pmatrix}\n", "\lambda_1 & \mathbf{w}_1^T \\\n", "\mathbf{w}_1 & A_2\n", "\end{pmatrix}\n", "$$\n", "\n", "where $\mathbf{w}_1 = \hat{V}_1^T A_1 \mathbf{v}_1$ and $A_2 = \hat{V}_1^T A_1 \hat{V}_1$. The key claim is that $\mathbf{w}_1 = \mathbf{0}$. This follows from a contradiction argument. Indeed, suppose $\mathbf{w}_1 \neq \mathbf{0}$ and consider the unit (Exercise: Why?) 
vector\n", "\n", "$$\n", "\\mathbf{z}\n", "=\n", "\\hat{W}_1^T \n", "\\times\n", "\\frac{1}{\\sqrt{1 + \\delta^2 \\|\\mathbf{w}_1\\|^2}} \\begin{pmatrix}\n", "1 \\\\\n", "\\delta \\mathbf{w}_1\n", "\\end{pmatrix}\n", "$$\n", "\n", "which, by the exercise above, achieves objective value\n", "\n", "$$\n", "\\mathbf{z}^T A_1 \\mathbf{z}\n", "=\n", "\\frac{1}{1 + \\delta^2 \\|\\mathbf{w}_1\\|^2}\n", "\\begin{pmatrix}\n", "1 \\\\\n", "\\delta \\mathbf{w}_1\n", "\\end{pmatrix}^T\n", "\\begin{pmatrix}\n", "\\lambda_1 & \\mathbf{w}_1^T \\\\\n", "\\mathbf{w}_1 & A_2\n", "\\end{pmatrix}\n", "\\begin{pmatrix}\n", "1 \\\\\n", "\\delta \\mathbf{w}_1\n", "\\end{pmatrix}\n", "= \n", "\\frac{1}{1 + \\delta^2 \\|\\mathbf{w}_1\\|^2}\n", "\\left(\n", "\\lambda_1\n", "+ 2 \\delta \\|\\mathbf{w}_1\\|^2\n", "+ \\delta^2 \\mathbf{w}_1^T A_2 \\mathbf{w}_1\n", "\\right).\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "In the proof of the *Left Singular Vectors are Orthogonal Lemma*, we showed that for $\\epsilon \\in (0,1)$\n", "\n", "$$\n", "\\frac{1}{\\sqrt{1 + \\epsilon^2}} \\geq 1 - \\epsilon^2/2.\n", "$$\n", "\n", "So for $\\delta$ small enough\n", "\n", "\n", "\\begin{align}\n", "\\mathbf{z}^T A_1 \\mathbf{z}\n", "&\\geq\n", "(\\lambda_1\n", "+ 2 \\delta \\|\\mathbf{w}_1\\|^2\n", "+ \\delta^2 \\mathbf{w}_1^T A_2 \\mathbf{w}_1)\n", "(1 - \\delta^2 \\|\\mathbf{w}_1\\|^2/2)\\\\\n", "&\\geq \\lambda_1 \n", "+ 2 \\delta \\|\\mathbf{w}_1\\|^2\n", "+ C \\delta^2\\\\ \n", "&> \\lambda_1\n", "\\end{align}\n", "\n", "\n", "where $C \\in \\mathbb{R}$ depends on $\\mathbf{w}_1$ and $A_2$. That gives the desired contradiction. 
So, letting $W_1 = \\hat{W}_1$,\n", "\n", "$$\n", "W_1 A_1 W_1^T\n", "= \n", "\\begin{pmatrix}\n", "\\lambda_1 & \\mathbf{0} \\\\\n", "\\mathbf{0} & A_2\n", "\\end{pmatrix}.\n", "$$\n", "\n", "Finally note that $A_2 = \\hat{V}_1^T A_1 \\hat{V}_1$ is symmetric\n", "\n", "$$\n", "A_2^T = (\\hat{V}_1^T A_1 \\hat{V}_1)^T = \\hat{V}_1^T A_1^T \\hat{V}_1 = \\hat{V}_1^T A_1 \\hat{V}_1 = A_2\n", "$$\n", "\n", "by the symmetry of $A_1$ itself." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "***Next step of the induction:*** Apply the same argument to the symmetric matrix $A_2 \\in \\mathbb{R}^{(d-1)\\times (d-1)}$, let $\\hat{W}_2 \\in \\mathbb{R}^{(d-1)\\times (d-1)}$ be the corresponding orthogonal matrix, and define $\\lambda_2$ and $A_3$ through the equation\n", "\n", "$$\n", "\\hat{W}_2 A_2 \\hat{W}_2^T\n", "= \n", "\\begin{pmatrix}\n", "\\lambda_2 & \\mathbf{0} \\\\\n", "\\mathbf{0} & A_3\n", "\\end{pmatrix}.\n", "$$\n", "\n", "Define the block matrix\n", "\n", "$$\n", "W_2\n", "=\n", "\\begin{pmatrix}\n", "1 & \\mathbf{0}\\\\\n", "\\mathbf{0} & \\hat{W}_2\n", "\\end{pmatrix}\n", "$$\n", "\n", "and observe that\n", "\n", "$$\n", "W_2 W_1 A_1 W_1^T W_2^T\n", "= W_2 \\begin{pmatrix}\n", "\\lambda_1 & \\mathbf{0} \\\\\n", "\\mathbf{0} & A_2\n", "\\end{pmatrix}\n", "W_2^T\n", "=\n", "\\begin{pmatrix}\n", "\\lambda_1 & \\mathbf{0}\\\\\n", "\\mathbf{0} & \\hat{W}_2 A_2 \\hat{W}_2^T\n", "\\end{pmatrix}\n", "=\\begin{pmatrix}\n", "\\lambda_1 & 0 & \\mathbf{0} \\\\\n", "0 & \\lambda_2 & \\mathbf{0} \\\\\n", "\\mathbf{0} & \\mathbf{0} & A_3\n", "\\end{pmatrix}.\n", "$$\n", "\n", "Proceeding similarly by induction gives the claim. $\\square$" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.1", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }