{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# TOPIC 1 \n", "\n", "# Least squares: Cholesky, QR and Householder\n", "\n", "## 2 A key concept: orthogonality\n", "\n", "***\n", "*Course:* [Math 535](http://www.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS) \n", "*Author:* [Sebastien Roch](http://www.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison \n", "*Updated:* Jun 14, 2021 \n", "*Copyright:* © 2021 Sebastien Roch\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Orthogonality plays a key role in linear algebra for data science thanks to its computational properties and its connection to the least-squares problem.\n", "\n", "**Definition (Orthogonality):** Vectors $\\mathbf{u}$ and $\\mathbf{v}$ in $V$ are orthogonal if their inner product satisfies $\\langle \\mathbf{u}, \\mathbf{v} \\rangle = 0$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![orthogonal vectors](https://miro.medium.com/max/936/0*XFgw4j6mfImcgCVB.png)\n", "\n", "[(Source)](https://towardsdatascience.com/from-norm-to-orthogonality-fundamental-mathematics-for-machine-learning-with-intuitive-examples-57bb898e69f2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Orthogonality has important implications. The following classical result will be useful below.\n", "\n", "***\n", "**Lemma (Pythagoras):** Let $\\mathbf{u}, \\mathbf{v} \\in V$ be orthogonal. 
Then\n", "$\\|\\mathbf{u} + \\mathbf{v}\\|^2 = \\|\\mathbf{u}\\|^2 + \\|\\mathbf{v}\\|^2$.\n", "***\n", "\n", "*Proof:* Using $\\|\\mathbf{w}\\|^2 = \\langle \\mathbf{w}, \\mathbf{w}\\rangle$, we get\n", "\n", "\n", "\\begin{align}\n", "\\|\\mathbf{u} + \\mathbf{v}\\|^2\n", "&= \\langle \\mathbf{u} + \\mathbf{v}, \\mathbf{u} + \\mathbf{v}\\rangle\\\\\n", "&= \\langle \\mathbf{u}, \\mathbf{u}\\rangle\n", "+ 2 \\,\\langle \\mathbf{u}, \\mathbf{v}\\rangle\n", "+ \\langle \\mathbf{v}, \\mathbf{v}\\rangle\\\\\n", "&= \\|\\mathbf{u}\\|^2 + \\|\\mathbf{v}\\|^2,\n", "\\end{align}\n", "\n", "\n", "where the cross term vanishes because $\\langle \\mathbf{u}, \\mathbf{v}\\rangle = 0$ by orthogonality. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Here is an application of *Pythagoras*.\n", "\n", "***\n", "**Lemma (Cauchy-Schwarz):** For any $\\mathbf{u}, \\mathbf{v} \\in V$,\n", "$|\\langle \\mathbf{u}, \\mathbf{v}\\rangle| \\leq \\|\\mathbf{u}\\| \\|\\mathbf{v}\\|$.\n", "***\n", "\n", "*Proof:* If $\\mathbf{v} = \\mathbf{0}$, both sides are zero and the claim holds trivially, so assume $\\mathbf{v} \\neq \\mathbf{0}$. Let $\\mathbf{q} = \\frac{\\mathbf{v}}{\\|\\mathbf{v}\\|}$ be the unit vector in the direction of $\\mathbf{v}$. It suffices to show $|\\langle \\mathbf{u}, \\mathbf{q}\\rangle| \\leq \\|\\mathbf{u}\\|$. 
Decompose $\\mathbf{u}$ into its projection onto $\\mathbf{q}$ and what's left: \n", "\n", "$$\n", "\\mathbf{u} \n", "= \\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle \\mathbf{q}\n", "+ \\left\\{\\mathbf{u} - \\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle \\mathbf{q}\\right\\}.\n", "$$\n", "\n", "The two terms on the right-hand side are orthogonal: indeed, $\\langle \\mathbf{u} - \\langle \\mathbf{u}, \\mathbf{q}\\rangle \\mathbf{q}, \\mathbf{q}\\rangle = \\langle \\mathbf{u}, \\mathbf{q}\\rangle - \\langle \\mathbf{u}, \\mathbf{q}\\rangle \\|\\mathbf{q}\\|^2 = 0$. Hence *Pythagoras* gives\n", "\n", "$$\n", "\\|\\mathbf{u}\\|^2\n", "= \\left\\|\\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle \\mathbf{q}\\right\\|^2\n", "+ \\left\\|\\mathbf{u} - \\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle \\mathbf{q}\\right\\|^2\n", "\\geq \\left\\|\\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle \\mathbf{q}\\right\\|^2\n", "= \\left\\langle \\mathbf{u}, \\mathbf{q}\\right\\rangle^2.\n", "$$\n", "\n", "Taking a square root gives the claim. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 2.1 Basis expansion\n", "\n", "To begin to see the power of orthogonality, consider the following. A list of vectors $\\{\\mathbf{u}_1,\\ldots,\\mathbf{u}_m\\}$ is orthonormal if the $\\mathbf{u}_i$'s are pairwise orthogonal and each has norm 1, that is, for all $i$ and all $j \\neq i$, $\\langle \\mathbf{u}_i, \\mathbf{u}_j \\rangle = 0$ and $\\|\\mathbf{u}_i\\| = 1$.\n", "\n", "***\n", "**Lemma (Properties of Orthonormal Lists):** Let $\\{\\mathbf{u}_1,\\ldots,\\mathbf{u}_m\\}$ be an orthonormal list of vectors. Then:\n", "1. $\\|\\sum_{j=1}^m \\alpha_j \\mathbf{u}_j\\|^2 = \\sum_{j=1}^m \\alpha_j^2$ for any $\\alpha_j \\in \\mathbb{R}$, $j \\in [m]$, and\n", "2. 
$\\{\\mathbf{u}_1,\\ldots,\\mathbf{u}_m\\}$ are linearly independent.\n", "***\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* For 1, using that $\\|\\mathbf{x}\\|^2 = \\langle \\mathbf{x}, \\mathbf{x} \\rangle$ and $\\langle \\beta \\,\\mathbf{x}_1 + \\mathbf{x}_2, \\mathbf{x}_3 \\rangle = \\beta \\,\\langle \\mathbf{x}_1,\\mathbf{x}_3\\rangle + \\langle \\mathbf{x}_2,\\mathbf{x}_3\\rangle$ (which follow directly from the definition of the inner product), we have\n", "\n", "\n", "\\begin{align}\n", "\\left\\|\\sum_{j=1}^m \\alpha_j \\mathbf{u}_j\\right\\|^2\n", "&= \\left\\langle\n", "\\sum_{i=1}^m \\alpha_i \\mathbf{u}_i, \\sum_{j=1}^m \\alpha_j \\mathbf{u}_j\n", "\\right\\rangle\\\\\n", "&= \\sum_{i=1}^m \\alpha_i \\left\\langle\n", " \\mathbf{u}_i, \\sum_{j=1}^m \\alpha_j \\mathbf{u}_j\n", "\\right\\rangle\\\\\n", "&= \\sum_{i=1}^m \\sum_{j=1}^m \\alpha_i \\alpha_j \\left\\langle\n", " \\mathbf{u}_i, \\mathbf{u}_j\n", "\\right\\rangle\\\\\n", "&= \\sum_{i=1}^m \\alpha_i^2,\n", "\\end{align}\n", "\n", "\n", "where we used orthonormality in the last equality, that is, $\\langle\n", " \\mathbf{u}_i, \\mathbf{u}_j \\rangle$ is $1$ if $i=j$ and $0$ otherwise.\n", " \n", "For 2, suppose $\\sum_{i=1}^m \\beta_i \\mathbf{u}_i = \\mathbf{0}$. Then we must have by 1 that $\\sum_{i=1}^m \\beta_i^2 = 0$. That implies $\\beta_i = 0$ for all $i$. Hence the $\\mathbf{u}_i$'s are linearly independent. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Given a basis $\\{\\mathbf{u}_1,\\ldots,\\mathbf{u}_m\\}$ of $\\mathcal{U}$, we know that any $\\mathbf{w} \\in \\mathcal{U}$ can be written as $\\mathbf{w} = \\sum_{i=1}^m \\alpha_i \\mathbf{u}_i$ for some $\\alpha_i$'s. It is not immediately obvious in general how to find the $\\alpha_i$'s. 
In the orthonormal case, however, it is straightforward.\n", "\n", "***\n", "**Theorem (Orthonormal Expansion):** Let $\\mathbf{q}_1,\\ldots,\\mathbf{q}_m$ be an orthonormal basis of $\\mathcal{U}$ and let $\\mathbf{u} \\in \\mathcal{U}$. Then\n", "\n", "$$\n", "\\mathbf{u}\n", "= \\sum_{j=1}^m \\langle \\mathbf{u}, \\mathbf{q}_j\\rangle \\,\\mathbf{q}_j.\n", "$$\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "*Proof:* Because $\\mathbf{u} \\in \\mathcal{U}$, $\\mathbf{u} = \\sum_{i=1}^m \\alpha_i \\mathbf{q}_i$ for some $\\alpha_i$'s. Take the inner product with $\\mathbf{q}_j$ and use orthonormality:\n", "\n", "$$\n", "\\langle \\mathbf{u}, \\mathbf{q}_j\\rangle \n", "= \\left\\langle \\sum_{i=1}^m \\alpha_i \\mathbf{q}_i, \\mathbf{q}_j\\right\\rangle \n", "= \\sum_{i=1}^m \\alpha_i \\langle \\mathbf{q}_i, \\mathbf{q}_j\\rangle \n", "= \\alpha_j.\n", "$$\n", "\n", "$\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "So we've shown that working with orthonormal bases is desirable. What if we don't have one? We review the [Gram-Schmidt algorithm](https://en.wikipedia.org/wiki/Gram–Schmidt_process) below, which will imply that every linear subspace has an orthonormal basis. \n", "\n", "But, first, we define the orthogonal projection. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 2.2 Orthogonal projection\n", "\n", "Let's consider the following problem. We have a linear subspace $\\mathcal{U} \\subseteq V$ and a vector $\\mathbf{v} \\notin \\mathcal{U}$. 
We want to find the vector $\\mathbf{v}^*$ in $\\mathcal{U}$ that is closest to $\\mathbf{v}$ in $2$-norm, that is, we want to solve\n", "\n", "$$\n", "\\min_{\\mathbf{v}^* \\in \\mathcal{U}} \\|\\mathbf{v}^* - \\mathbf{v}\\|.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Example:** Consider the two-dimensional case with a one-dimensional subspace, say $\\mathcal{U} = \\mathrm{span}(\\mathbf{u_1})$ with $\\|\\mathbf{u}_1\\|=1$. The geometrical intuition is in the following figure. The solution $\\mathbf{v}^*$ has the property that the difference $\\mathbf{v} - \\mathbf{v}^*$ makes a right angle with $\\mathbf{u}_1$, that is, it is orthogonal to it. \n", "\n", "![Orthogonal projection](https://upload.wikimedia.org/wikipedia/commons/1/17/Linalg_projection_4.png)\n", "([Source](https://commons.wikimedia.org/wiki/File:Linalg_projection_4.png))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Letting $\\mathbf{v}^* = \\alpha^* \\,\\mathbf{u}_1$, the geometrical condition above translates into \n", "\n", "$$\n", "0\n", "= \\langle \\mathbf{u}_1, \\mathbf{v} - \\mathbf{v}^* \\rangle \n", "= \\langle \\mathbf{u}_1, \\mathbf{v} - \\alpha^* \\,\\mathbf{u}_1 \\rangle \n", "= \\langle \\mathbf{u}_1, \\mathbf{v} \\rangle \n", "- \\alpha^* \\,\\langle \\mathbf{u}_1, \\mathbf{u}_1 \\rangle \n", "= \\langle \\mathbf{u}_1, \\mathbf{v} \\rangle - \\alpha^*\n", "$$\n", "\n", "so\n", "\n", "$$\n", "\\mathbf{v}^* = \\langle \\mathbf{u}_1, \\mathbf{v} \\rangle \\,\\mathbf{u}_1.\n", "$$\n", "\n", "By *Pythagoras*, we then have for any $\\alpha \\in \\mathbb{R}$\n", "\n", "\n", "\\begin{align}\n", "\\|\\mathbf{v}\n", "- \\alpha \\,\\mathbf{u}_1\\|^2 \n", "&= \\|\\mathbf{v}- \\mathbf{v}^*\n", "+ \\mathbf{v}^* - \\alpha \\,\\mathbf{u}_1\\|^2\\\\\n", "&= \\|\\mathbf{v}- \\mathbf{v}^*\n", "+ (\\alpha^* - \\alpha) \\,\\mathbf{u}_1\\|^2\\\\\n", "&= \\|\\mathbf{v}- 
\\mathbf{v}^*\\|^2\n", "+ \\| (\\alpha^* - \\alpha) \\,\\mathbf{u}_1\\|^2\\\\\n", "&\\geq \\|\\mathbf{v}- \\mathbf{v}^*\\|^2.\n", "\\end{align}\n", "\n", "\n", "That confirms the optimality of $\\mathbf{v}^*$. The argument in this example carries through in higher dimension, as we show next. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Orthogonal Projection on an Orthonormal List):** Let $\\mathbf{q}_1,\\ldots,\\mathbf{q}_m$ be an orthonormal list. The orthogonal projection of $\\mathbf{v} \\in V$ on $\\{\\mathbf{q}_i\\}_{i=1}^m$ is defined as\n", "\n", "$$\n", "(\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}\n", "= \\sum_{j=1}^m \\langle \\mathbf{v}, \\mathbf{q}_j \\rangle \\,\\mathbf{q}_j.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "**Definition and Theorem (Orthogonal Projection):** Let $\\mathcal{U} \\subseteq V$ be a linear subspace with orthonormal basis $\\mathbf{q}_1,\\ldots,\\mathbf{q}_m$ and let $\\mathbf{v} \\in V$. Then $(\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} \\in \\mathcal{U}$ and, for any $\\mathbf{u} \\in \\mathcal{U}$, \n", "\n", "$$\n", "(*) \\quad \\left\\langle \\mathbf{v} - (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}, \\mathbf{u}\\right\\rangle =0\n", "$$\n", "\n", "and\n", "\n", "$$\n", "(**) \\quad \\|\\mathbf{v} - (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}\\| \\leq \\|\\mathbf{v} - \\mathbf{u}\\|.\n", "$$\n", "\n", "Furthermore, if $\\mathbf{u} \\in \\mathcal{U}$ and the inequality above is an equality, then $\\mathbf{u} = (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m}) \\mathbf{v}$. 
Hence, for any orthonormal basis $\\mathbf{q}'_1,\\ldots,\\mathbf{q}'_m$ of $\\mathcal{U}$, it holds that\n", "\n", "$$\n", "(*\\!*\\!*) \\quad\n", "\\mathcal{P}_\\mathcal{U} \\mathbf{v}\n", "= (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} \n", "= (\\mathcal{P}_{\\{\\mathbf{q}'_i\\}_{i=1}^m})\\, \\mathbf{v},\n", "$$\n", "\n", "where the first equality is a definition. We refer to $\\mathcal{P}_\\mathcal{U} \\mathbf{v}$ as the orthogonal projection of $\\mathbf{v}$ on $\\mathcal{U}$.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![orthogonal projection 3D](https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Ortho_projection.svg/800px-Ortho_projection.svg.png)\n", "\n", "[(Source)](https://commons.wikimedia.org/wiki/File:Ortho_projection.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* By definition, it is immediate that $(\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} \\in \\mathrm{span}(\\{\\mathbf{q}_i\\}_{i=1}^m) = \\mathcal{U}$. We first prove $(*)$. We can write any $\\mathbf{u} \\in \\mathcal{U}$ as\n", "$\\sum_{j=1}^m \\alpha_j \\mathbf{q}_j$ for some $\\alpha_j$'s. Then\n", "\n", "$$\n", "\\left\\langle \\mathbf{v} - \\sum_{j=1}^m \\langle \\mathbf{v}, \\mathbf{q}_j \\rangle \\,\\mathbf{q}_j, \\sum_{j=1}^m \\alpha_j \\mathbf{q}_j \\right\\rangle\n", "= \\sum_{j=1}^m \\langle \\mathbf{v}, \\mathbf{q}_j \\rangle \\,\\alpha_j \n", "- \\sum_{j=1}^m \\alpha_j \\langle \\mathbf{v}, \\mathbf{q}_j \\rangle\n", "= 0,\n", "$$\n", "\n", "where we used the orthonormality of the $\\mathbf{q}_j$'s in the first equality. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To prove $(**)$, note that for any $\\mathbf{u} \\in \\mathcal{U}$ the vector $\\mathbf{u}' = (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} - \\mathbf{u}$ is also in $\\mathcal{U}$. Hence by $(*)$ and *Pythagoras*, \n", "\n", "\n", "\\begin{align}\n", "\\|\\mathbf{v} - \\mathbf{u}\\|^2\n", "&= \\|\\mathbf{v} - (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} + (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} - \\mathbf{u}\\|^2\\\\\n", "&= \\|\\mathbf{v} - (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}\\|^2 + \\|(\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} - \\mathbf{u}\\|^2\\\\\n", "&\\geq \\|\\mathbf{v} - (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}\\|^2.\n", "\\end{align}\n", "\n", "\n", "Furthermore, equality holds only if $\\|(\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v} - \\mathbf{u}\\|^2 = 0$, which holds only if $\\mathbf{u} = (\\mathcal{P}_{\\{\\mathbf{q}_i\\}_{i=1}^m})\\, \\mathbf{v}$ by the point-separating property of the $2$-norm. The rightmost equality in $(*\\!*\\!*)$ follows from swapping $\\{\\mathbf{q}'_i\\}_{i=1}^m$ for $\\{\\mathbf{q}_i\\}_{i=1}^m$. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The *Orthogonal Projection Theorem* implies that any $\\mathbf{v} \\in V$ can be decomposed into its orthogonal projection onto $\\mathcal{U}$ and a vector orthogonal to it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Orthogonal Complement):** Let $\\mathcal{U} \\subseteq V$ be a linear subspace. 
The orthogonal complement of $\\mathcal{U}$, denoted $\\mathcal{U}^\\perp$, is defined as\n", "\n", "$$\n", "\\mathcal{U}^\\perp\n", "=\n", "\\{\\mathbf{w} \\in V\\,:\\, \\langle \\mathbf{w}, \\mathbf{u}\\rangle = 0, \n", "\\forall \\mathbf{u} \\in \\mathcal{U}\\}.\n", "$$\n", "\n", "$\\lhd$\n", "\n", "*Exercise:* Establish that $\\mathcal{U}^\\perp$ is a linear subspace. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "***\n", "**Lemma (Orthogonal Decomposition):**\n", "Let $\\mathcal{U} \\subseteq V$ be a linear subspace and let $\\mathbf{v} \\in V$. Then $\\mathbf{v}$ can be decomposed as $(\\mathbf{v} - \\mathcal{P}_\\mathcal{U} \\mathbf{v}) + \\mathcal{P}_\\mathcal{U} \\mathbf{v}$ where $(\\mathbf{v} - \\mathcal{P}_\\mathcal{U} \\mathbf{v}) \\in \\mathcal{U}^\\perp$ and $\\mathcal{P}_\\mathcal{U} \\mathbf{v} \\in \\mathcal{U}$.\n", "***\n", "\n", "*Proof:* Immediate consequence of the previous theorem. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The map $\\mathcal{P}_\\mathcal{U}$ is linear, that is, $\\mathcal{P}_\\mathcal{U} (\\alpha \\,\\mathbf{x} + \\mathbf{y}) = \\alpha \\,\\mathcal{P}_\\mathcal{U} \\mathbf{x} + \\mathcal{P}_\\mathcal{U} \\mathbf{y}$ for all $\\alpha \\in \\mathbb{R}$ and $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$. Indeed,\n", "\n", "$$\n", "\\mathcal{P}_\\mathcal{U} (\\alpha \\,\\mathbf{x} + \\mathbf{y})\n", "= \\sum_{j=1}^m \\langle \\alpha \\,\\mathbf{x} + \\mathbf{y}, \\mathbf{q}_j \\rangle \\,\\mathbf{q}_j\n", "= \\sum_{j=1}^m \\left\\{\\alpha \\, \\langle \\mathbf{x}, \\mathbf{q}_j \\rangle \n", "+ \\langle \\mathbf{y}, \\mathbf{q}_j \\rangle\\right\\} \n", "\\mathbf{q}_j\n", "= \\alpha \\,\\mathcal{P}_\\mathcal{U} \\mathbf{x} + \\mathcal{P}_\\mathcal{U} \\mathbf{y}.\n", "$$\n", "\n", "Therefore it can be encoded as an $n \\times n$ matrix $P$. 
Let\n", "\n", "$$\n", "Q =\n", "\\begin{pmatrix}\n", "| & & | \\\\\n", "\\mathbf{q}_1 & \\ldots & \\mathbf{q}_m \\\\\n", "| & & | \n", "\\end{pmatrix}\n", "$$\n", "\n", "and note that\n", "\n", "$$\n", "Q^T \\mathbf{v}\n", "= \n", "\\begin{pmatrix}\n", "\\langle \\mathbf{v}, \\mathbf{q}_1 \\rangle \\\\\n", "\\vdots \\\\\n", "\\langle \\mathbf{v}, \\mathbf{q}_m \\rangle\n", "\\end{pmatrix}\n", "$$\n", "\n", "lists the coefficients in the expansion of $\\mathcal{P}_\\mathcal{U} \\mathbf{v}$ over the basis $\\mathbf{q}_1,\\ldots,\\mathbf{q}_m$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Hence we see that\n", "\n", "$$\n", "P\n", "= \n", "Q Q^T.\n", "$$\n", "\n", "This is not to be confused with\n", "\n", "$$\n", "Q^T Q\n", "=\n", "\\begin{pmatrix}\n", "\\langle \\mathbf{q}_1, \\mathbf{q}_1 \\rangle & \\cdots & \\langle \\mathbf{q}_1, \\mathbf{q}_m \\rangle \\\\\n", "\\langle \\mathbf{q}_2, \\mathbf{q}_1 \\rangle & \\cdots & \\langle \\mathbf{q}_2, \\mathbf{q}_m \\rangle \\\\\n", "\\vdots & \\ddots & \\vdots \\\\\n", "\\langle \\mathbf{q}_m, \\mathbf{q}_1 \\rangle & \\cdots & \\langle \\mathbf{q}_m, \\mathbf{q}_m \\rangle\n", "\\end{pmatrix}\n", "= I_{m \\times m}\n", "$$\n", "\n", "where $I_{m \\times m}$ denotes the $m \\times m$ identity matrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Exercise:* Let $\\mathcal{U} \\subseteq V$ be a linear subspace and let $\\mathbf{v} \\in \\mathcal{U}$. Show that $\\mathcal{P}_\\mathcal{U} \\mathbf{v} = \\mathbf{v}$. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 2.3 Gram-Schmidt\n", "\n", "We have some business left over: constructing orthonormal bases. Let $\\mathbf{a}_1,\\ldots,\\mathbf{a}_m$ be linearly independent. 
We use the Gram-Schmidt algorithm to obtain an orthonormal basis of $\\mathrm{span}(\\mathbf{a}_1,\\ldots,\\mathbf{a}_m)$. The process takes advantage of the properties of the orthogonal projection derived above. In essence, we add the vectors $\\mathbf{a}_i$ one by one, but only after taking out their orthogonal projection on the previously included vectors. The outcome spans the same subspace and the *Orthogonal Decomposition Lemma* ensures orthogonality." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "**Theorem (Gram-Schmidt):** Let $\\mathbf{a}_1,\\ldots,\\mathbf{a}_m$ be linearly independent. Then there exists an orthonormal basis $\\mathbf{q}_1,\\ldots,\\mathbf{q}_m$ of \n", "$\\mathrm{span}(\\mathbf{a}_1,\\ldots,\\mathbf{a}_m)$.\n", "***\n", "\n", "*Proof idea:* Suppose first that $m=1$. In that case, all that needs to be done is to divide $\\mathbf{a}_1$ by its norm to obtain a unit vector whose span is the same as that of $\\mathbf{a}_1$, that is, we set $\\mathbf{q}_1 = \\frac{\\mathbf{a}_1}{\\|\\mathbf{a}_1\\|}$.\n", "\n", "Suppose now that $m=2$. We first let $\\mathbf{q}_1 = \\frac{\\mathbf{a}_1}{\\|\\mathbf{a}_1\\|}$ as in the previous case. Then we subtract from $\\mathbf{a}_2$ its projection on $\\mathbf{q}_1$, that is, we set $\\mathbf{v}_2 = \\mathbf{a}_2 - \\langle \\mathbf{q}_1, \\mathbf{a}_2 \\rangle \\,\\mathbf{q}_1$. By the *Orthogonal Projection Theorem*, $\\mathbf{v}_2$ is orthogonal to $\\mathbf{q}_1$. Moreover, because $\\mathbf{a}_2$ is a linear combination of $\\mathbf{q}_1$ and $\\mathbf{v}_2$, we have $\\mathrm{span}(\\mathbf{q}_1,\\mathbf{v}_2)\n", "= \\mathrm{span}(\\mathbf{a}_1,\\mathbf{a}_2)$. It remains to divide by the norm of the resulting vector: $\\mathbf{q}_2 = \\frac{\\mathbf{v}_2}{\\|\\mathbf{v}_2\\|}$.\n", "\n", "For general $m$, we proceed similarly but project onto the subspace spanned by the previously added vectors at each step."
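] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The $m=2$ step above can be checked numerically. Here is a minimal sketch in Julia, assuming two linearly independent vectors in $\\mathbb{R}^3$ (the names `a1` and `a2` are illustrative):\n", "\n", "```julia\n", "using LinearAlgebra\n", "\n", "a1, a2 = [1., 0., 1.], [0., 1., 1.]\n", "\n", "q1 = a1/norm(a1) # normalize the first vector\n", "v2 = a2 - dot(q1,a2)*q1 # subtract the projection of a2 on q1\n", "q2 = v2/norm(v2) # normalize what is left\n", "\n", "dot(q1, q2) # approximately 0, i.e., q1 and q2 are orthogonal\n", "```"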
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![GS](https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/Gram–Schmidt_process.svg/640px-Gram–Schmidt_process.svg.png)\n", "([Source](https://en.wikipedia.org/wiki/Gram–Schmidt_process))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* The inductive step is the following. Assume that we have constructed orthonormal vectors $\\mathbf{q}_1,\\ldots,\\mathbf{q}_{j-1}$ such that\n", "\n", "$$\n", "U_{j-1} := \n", "\\mathrm{span}(\\mathbf{q}_1,\\ldots,\\mathbf{q}_{j-1}) = \n", "\\mathrm{span}(\\mathbf{a}_1,\\ldots,\\mathbf{a}_{j-1}).\n", "$$\n", "\n", "By the *Properties of Orthonormal Lists*, $\\{\\mathbf{q}_i\\}_{i=1}^{j-1}$ forms an orthonormal basis for $U_{j-1}$, so we can compute the orthogonal projection of $\\mathbf{a}_j$ on $U_{j-1}$ as\n", "\n", "$$\n", "\\mathcal{P}_{U_{j-1}}\\mathbf{a}_j\n", "= \\sum_{i=1}^{j-1} r_{ij} \\,\\mathbf{q}_i,\n", "$$\n", "\n", "where we defined $r_{ij} = \\langle \\mathbf{q}_i , \\mathbf{a}_j\\rangle$. We then set \n", "\n", "$$\n", "\\mathbf{v}_j = \\mathbf{a}_j - \\mathcal{P}_{U_{j-1}}\\mathbf{a}_j\n", "\\quad\n", "\\text{and}\n", "\\quad\n", "\\mathbf{q}_j \n", "= \\frac{\\mathbf{v}_j}{\\|\\mathbf{v}_j\\|}.\n", "$$\n", "\n", "Here we used that $\\|\\mathbf{v}_j\\| > 0$: indeed, otherwise $\\mathbf{a}_j$\n", "would be equal to its projection $\\mathcal{P}_{U_{j-1}}\\mathbf{a}_j \\in \\mathrm{span}(\\mathbf{a}_1,\\ldots,\\mathbf{a}_{j-1})$, which would contradict linear independence of the $\\mathbf{a}_k$'s."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "By the *Orthogonal Projection Theorem*, $\\mathbf{q}_j$ is orthogonal to $\\mathrm{span}(\\mathbf{q}_1,\\ldots,\\mathbf{q}_{j-1})$ and, unrolling the calculations above,\n", "$\\mathbf{a}_j$ can be re-written as the following linear combination of $\\mathbf{q}_1,\\ldots,\\mathbf{q}_j$\n", "\n", "\n", "\\begin{align}\n", "\\mathbf{a}_j\n", "&= \\mathcal{P}_{U_{j-1}}\\mathbf{a}_j + \\mathbf{v}_j\\\\\n", "&= \\mathcal{P}_{U_{j-1}}\\mathbf{a}_j + \\|\\mathbf{v}_j\\| \\mathbf{q}_j\\\\\n", "&= \\mathcal{P}_{U_{j-1}}\\mathbf{a}_j + \\|\\mathbf{a}_j - \\mathcal{P}_{U_{j-1}}\\mathbf{a}_j\\| \\mathbf{q}_j\\\\\n", "&= \\sum_{i=1}^{j-1} r_{ij} \\,\\mathbf{q}_i\n", "+ \\left\\|\\mathbf{a}_j - \\sum_{i=1}^{j-1} r_{ij}\\,\\mathbf{q}_i\\right\\| \\,\\mathbf{q}_j\\\\\n", "&= \\sum_{i=1}^{j-1} r_{ij} \\,\\mathbf{q}_i\n", "+ r_{jj} \\,\\mathbf{q}_j,\n", "\\end{align}\n", "\n", "\n", "where we defined $r_{jj} = \\left\\|\\mathbf{a}_j - \\sum_{i=1}^{j-1} r_{ij}\\,\\mathbf{q}_i\\right\\| = \\|\\mathbf{v}_j\\|$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Hence $\\mathbf{q}_1,\\ldots,\\mathbf{q}_j$ forms an orthonormal list with $\\mathrm{span}(\\mathbf{a}_1,\\ldots,\\mathbf{a}_{j}) \\subseteq \\mathrm{span}(\\mathbf{q}_1,\\ldots,\\mathbf{q}_{j})$. The opposite inclusion holds by construction. Moreover, because $\\mathbf{q}_1,\\ldots,\\mathbf{q}_j$ are orthonormal, they are linearly independent by the *Properties of Orthonormal Lists* so must form a basis of their span. So induction goes through. $\\square$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**NUMERICAL CORNER** We implement the Gram-Schmidt algorithm in Julia. For reasons that will become clear in a future notebook, we output the $\\mathbf{q}_j$'s and $r_{ij}$'s, each in matrix form." 
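] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "A useful sanity check for any such factorization: the output should satisfy $Q^T Q = I_{m \\times m}$ and $A = QR$. Below is a minimal sketch of this check in Julia, using the built-in `qr` as a stand-in factorization (the matrix `A` is illustrative):\n", "\n", "```julia\n", "using LinearAlgebra\n", "\n", "A = Float64.(hcat([1,0,1], [0,1,1]))\n", "F = qr(A) # built-in QR factorization, for comparison\n", "Q, R = Matrix(F.Q), F.R # thin Q is 3x2 here, R is 2x2\n", "\n", "norm(Q'*Q - I) # approximately 0: the columns of Q are orthonormal\n", "norm(Q*R - A) # approximately 0: A is recovered from the factors\n", "```"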
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Julia version: 1.5.1\n", "using LinearAlgebra" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "mmids_gramschmidt (generic function with 1 method)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function mmids_gramschmidt(A)\n", "    n,m = size(A)\n", "    Q = zeros(Float64,n,m) # columns will hold the q_j's\n", "    R = zeros(Float64,m,m) # upper triangular, with r_ij = <q_i,a_j>\n", "    for j = 1:m\n", "        v = A[:,j] # start from a_j\n", "        for i = 1:j-1\n", "            R[i,j] = dot(Q[:,i],A[:,j])\n", "            v -= R[i,j]*Q[:,i] # subtract the projection of a_j on q_i\n", "        end\n", "        R[j,j] = norm(v)\n", "        Q[:,j] = v/R[j,j] # normalize what is left\n", "    end\n", "    return Q,R\n", "end" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's try a simple example." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3×2 Array{Int64,2}:\n", " 1  0\n", " 0  1\n", " 1  1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w1, w2 = [1,0,1], [0,1,1]\n", "A = hcat(w1,w2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Q,R = mmids_gramschmidt(A);" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3×2 Array{Float64,2}:\n", " 0.707107  -0.408248\n", " 0.0        0.816497\n", " 0.707107   0.408248" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "2×2 Array{Float64,2}:\n", " 1.41421  0.707107\n", " 0.0      1.22474" ] }, 
"execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "R" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Exercise:* Let $B \\in \\mathbb{R}^{n \\times m}$ be a matrix. Show that there exist orthonormal bases of $\\mathrm{col}(B)$ and $\\mathrm{null}(B)$. $\\lhd$\n", "\n", "*Exercise:* Let $\\mathcal{W}$ be a linear subspace of $\\mathbb{R}^d$ and let $\\mathbf{w}_1,\\ldots,\\mathbf{w}_k$ be an orthonormal basis of $\\mathcal{W}$. Show that there exists an orthonormal basis of $\\mathbb{R}^d$ that includes the $\\mathbf{w}_i$'s. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "***\n", "\n", "**Lemma (Nullity):** Let $B \\in \\mathbb{R}^{n \\times m}$ be a matrix of rank $k$. Then \n", "$\\mathrm{dim}(\\mathrm{null}(B)) = m - k$. Put differently,\n", "\n", "$$\n", "\\mathrm{dim}(\\mathrm{col}(B))\n", "+\n", "\\mathrm{dim}(\\mathrm{null}(B))\n", "= m.\n", "$$\n", "\n", "***\n", "\n", "For a proof, see [Wikipedia](https://en.wikipedia.org/wiki/Rank%E2%80%93nullity_theorem#First_proof)." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.1", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }