*Course:* Math 535 - Mathematical Methods in Data Science (MMiDS)

*Author:* Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison

*Updated:* Sep 13, 2020

*Copyright:* © 2020 Sebastien Roch

In this section, we introduce some basic linear algebra concepts that will be needed throughout this topic. For more background, see [MML].

Throughout, we work over $V = \mathbb{R}^n$. We will use the notation $[m] = \{1,\ldots,m\}$.

We begin with the concept of a linear subspace.

**Definition (Linear Subspace):** A linear subspace of $V$ is a subset $U \subseteq V$ that is closed under vector addition and scalar multiplication. That is, for all $\mathbf{u}_1, \mathbf{u}_2 \in U$ and $\alpha \in \mathbb{R}$, it holds that

$\lhd$

*Exercise:* Show that $\mathbf{0}$ is an element of any (non-empty) linear subspace. $\lhd$

We consider a first key example:

**Definition (Span):** Let $\mathbf{w}_1, \ldots, \mathbf{w}_m \in V$. The span of $\{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$, denoted $\mathrm{span}(\mathbf{w}_1, \ldots, \mathbf{w}_m)$, is the set of all linear combinations of the $\mathbf{w}_j$'s. That is,

By convention, we declare that the span of the empty list is $\{\mathbf{0}\}$. $\lhd$

We check that a span is a linear subspace:

**Lemma:** Let $W=\mathrm{span}(\mathbf{w}_1, \ldots, \mathbf{w}_m)$. Then $W$ is a linear subspace.

*Proof:* Let $\mathbf{u}_1, \mathbf{u}_2 \in W$ and $\alpha \in \mathbb{R}$. Then for $i=1,2$

and

$$ \alpha \mathbf{u}_1 + \mathbf{u}_2 = \alpha \sum_{j=1}^m \beta_{1,j} \mathbf{w}_j + \sum_{j=1}^m \beta_{2,j} \mathbf{w}_j = \sum_{j=1}^m (\alpha \beta_{1,j} + \beta_{2,j}) \mathbf{w}_j. $$So $\alpha \,\mathbf{u}_1 + \mathbf{u}_2 \in W$.$\square$

In matrix form, we talk about the column space of a (not necessarily square) matrix.

**Definition (Column Space):** Let $A \in \mathbb{R}^{n\times m}$ be an $n\times m$ matrix with columns $\mathbf{a}_1,\ldots, \mathbf{a}_m \in \mathbb{R}^n$. The column space of $A$, denoted $\mathrm{col}(A)$, is the span of the columns of $A$, that is, $\mathrm{col}(A) = \mathrm{span}(\mathbf{a}_1,\ldots, \mathbf{a}_m)$. $\lhd$

We will need another important linear subspace defined in terms of a matrix.

**Defintion (Null Space):** Let $B \in \mathbb{R}^{n \times m}$. The null space of $B$ is the linear subspace

$\lhd$

*Exercise:* Check that $\mathrm{null}(B)$ is indeed a linear subspace. $\lhd$

*Exercise:* Let $\beta_j \neq 0$ for all $j \in [m]$. Show that $\mathrm{span}(\beta_1 \mathbf{w}_1, \ldots, \beta_m \mathbf{w}_m) = \mathrm{span}(\mathbf{w}_1, \ldots, \mathbf{w}_m).$ $\lhd$

**NUMERICAL CORNER** The plane $z = x+y$, which passes through the origin, is a linear subspace of $\mathbb{R}^3$. Indeed, for any $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ such that $z_1 = x_1 + y_1$ and $z_2 = x_2 + y_2$ and for any $\alpha \in \mathbb{R}$, we have

In this example, the plane $W$ is the collection of every vector of the form $(x, y, x+y)$. These can be written as $x \,\mathbf{w}_1 + y \,\mathbf{w}_2$ where $\mathbf{w}_1 = (1,0,1)$ and $\mathbf{w}_2 = (0,1,1)$, and vice versa. Hence $W = \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2)$. $\lhd$

In [1]:

```
# Julia version: 1.5.1
using Plots, LinearAlgebra
```

In [2]:

```
surface(LinRange(0,1,100), LinRange(0,1,100), (x,y)->x+y, legend=false)
```

Out[2]:

It is often desirable to avoid redundancy in the description of a linear subspace.

**Example:** Consider the linear subspace $\mathrm{span}(\mathbf{w}_1,\mathbf{w}_2,\mathbf{w}_3)$, where $\mathbf{w}_1 = (1,0,1)$, $\mathbf{w}_2 = (0,1,1)$, and $\mathbf{w}_3 = (1,-1,0)$. We claim that $\mathrm{span}(\mathbf{w}_1,\mathbf{w}_2,\mathbf{w}_3) = \mathrm{span}(\mathbf{w}_1,\mathbf{w_2})$. Recall that to prove an equality between sets, it suffices to prove inclusion in both directions.

First, it is immediate by definition of the span that

$$ \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2) \subseteq \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2,\mathbf{w}_3). $$To prove the other direction, let $\mathbf{u} \in \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2,\mathbf{w}_3)$ so that

$$ \mathbf{u} = \beta_1\,(1,0,1) + \beta_2\,(0,1,1) + \beta_3\,(1,-1,0). $$Now observe that $(1,-1,0) = (1,0,1) - (0,1,1)$. Put differently, $\mathbf{w}_3 \in \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2)$. Replacing above gives

$$ \mathbf{u} = \beta_1\,(1,0,1) + \beta_2\,(0,1,1) + \beta_3\,[(1,0,1) - (0,1,1)] = (\beta_1+\beta_3)\,(1,0,1) + (\beta_2-\beta_3)\,(0,1,1) $$which shows that $\mathbf{u} \in \mathrm{span}(\mathbf{w}_1,\mathbf{w_2})$. In words, $(1,-1,0)$ is redundant. Hence

$$ \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2) \supseteq \mathrm{span}(\mathbf{w}_1,\mathbf{w}_2,\mathbf{w}_3) $$and that concludes the proof.$\lhd$

**Definition (Linear Independence):** A list of vectors $\mathbf{u}_1,\ldots,\mathbf{u}_m$ is linearly independent if none of them can be written as a linear combination of the others, that is,

By convention, we declare the empty list to be linearly independent. A list of vectors is called linearly dependent if it is not linearly independent. $\lhd$

**Lemma (Equivalent Definition of Linear Independence):** Vectors $\mathbf{u}_1,\ldots,\mathbf{u}_m$ are linearly independent if and only if

Equivalently, $\mathbf{u}_1,\ldots,\mathbf{u}_m$ are linearly dependent if and only if there exist $\alpha_j$'s, not all zero, such that $\sum_{j=1}^m \alpha_j \mathbf{u}_j = \mathbf{0}$.

*Proof:* We prove the second statement.

- Assume $\mathbf{u}_1,\ldots,\mathbf{u}_m$ are linearly dependent. Then $\mathbf{u}_i = \sum_{j\neq i} \alpha_j \mathbf{u}_j$ for some $i$. Taking $\alpha_i = -1$ gives $\sum_{j=1}^m \alpha_j \mathbf{u}_j = \mathbf{0}$.
- Assume $\sum_{j=1}^m \alpha_j \mathbf{u}_j = \mathbf{0}$ with $\alpha_j$'s not all zero. In particular $\alpha_i \neq 0$ for some $i$. Then $\mathbf{u}_i = \frac{1}{\alpha_i} \sum_{j\neq i} \alpha_j \mathbf{u}_j$.$\square$

In matrix form: let $\mathbf{a}_1,\ldots,\mathbf{a}_m \in \mathbb{R}^n$ and form the matrix whose columns are the $\mathbf{a}_i$'s

$$ A = \begin{pmatrix} | & & | \\ \mathbf{a}_1 & \ldots & \mathbf{a}_m \\ | & & | \end{pmatrix}. $$Note that $A\mathbf{x}$ is the following linear combination of the columns of $A$: $\sum_{j=1}^m x_j \mathbf{a}_j$. Hence $\mathbf{a}_1,\ldots,\mathbf{a}_m$ are linearly independent if and only if $A \mathbf{x} = \mathbf{0} \implies \mathbf{x} = \mathbf{0}$. Equivalently, $\mathbf{a}_1,\ldots,\mathbf{a}_m$ are linearly dependent if and only if $\exists \mathbf{x}\neq \mathbf{0}$ such that $A \mathbf{x} = \mathbf{0}$.

Bases give a convenient representation of a subspace.

**Definition (Basis):** Let $U$ be a linear subspace of $V$. A basis of $U$ is a list of vectors $\mathbf{u}_1,\ldots,\mathbf{u}_m$ in $U$ that: (1) span $U$, that is, $U = \mathrm{span}(\mathbf{u}_1,\ldots,\mathbf{u}_m)$; and (2) are linearly independent. $\lhd$

A list of vectors that span a linear subspace $U$ is also referred to as a spanning set of $U$. We denote by $\mathbf{e}_1, \ldots, \mathbf{e}_n$ the standard basis of $\mathbb{R}^n$, where $\mathbf{e}_i$ has a one in coordinate $i$ and zeros in all other coordinates.

The first key property of a basis is that it provides a unique representation of the vectors in the subspace. Indeed, let $U$ be a linear subspace and $\mathbf{u}_1,\ldots,\mathbf{u}_m$ be a basis of $U$. Suppose that $\mathbf{w} \in U$ can be written as $\mathbf{w} = \sum_{j=1}^m \alpha_j \mathbf{u}_j$ and $\mathbf{w} = \sum_{j=1}^m \alpha_j' \mathbf{u}_j$. Then subtracting one equation from the other we arrive at $\sum_{j=1}^m (\alpha_j - \alpha_j') \,\mathbf{u}_j = \mathbf{0}$. By linear independence, we have $\alpha_j - \alpha_j' = 0$ for each $j$.

The second key property of a basis is that it always has the same number of elements, which is called the dimension of the subspace. When applied to a matrix $A$, the dimension of the column space of $A$ is called the column rank of $A$.

**Theorem and Definition (Dimension):** Let $U$ be a linear subspace of $V$. All bases of $U$ have the same length, that is, the same number of elements. We call this number the dimension of $U$ and denote it $\mathrm{dim}(U)$.

The proof requires a key lemma first. It states that, given a linearly dependent list of vectors, one of the vectors is in the span of the previous ones and we can remove it without changing the span.

**Lemma (Linear Dependence):** Let $\mathbf{u}_1,\ldots,\mathbf{u}_m$ be a linearly dependent list of vectors with $\mathbf{u}_1 \neq 0$. Then there is an $i$ such that:

- $\mathbf{u}_i \in \mathrm{span}(\mathbf{u}_1,\ldots,\mathbf{u}_{i-1})$
- $\mathrm{span}(\{\mathbf{u}_j:j \in [m]\}) = \mathrm{span}(\{\mathbf{u}_j:j \in [m],\ j\neq i\})$

*Proof idea:* By linear dependence, $\mathbf{0}$ can be written as a non-trivial linear combination of the $\mathbf{u}_j$'s. Then $i$ is the largest index with non-zero coefficient.

*Proof:* For 1, by linear dependence, $\sum_{j=1}^m \alpha_j \mathbf{u}_j = \mathbf{0}$ with not all $\alpha_j$'s zero. Further, because $\mathbf{u}_1 \neq \mathbf{0}$, not all $\alpha_2, \ldots, \alpha_m$ are zero. Take the largest index among the $\alpha_j$'s that are non-zero, say $i$. Then rearranging gives

For 2, we note that for any $\mathbf{w} \in \mathrm{span}(\{\mathbf{u}_j:j \in [m]\})$ we can write it as $\mathbf{w} = \sum_{j=1}^m \beta_j \mathbf{u}_j$ and we can replace $\mathbf{u}_i$ by the equation above, producing a representation of $\mathbf{w}$ in terms of $\{\mathbf{u}_j:j \in [m], j\neq i\}$.$\square$

We return to the proof of the theorem.

*Proof idea (Dimension):* It suffices to establish the following claim: the length of a spanning list is greater or equal than the length of a basis. Indeed applying this claim to two different bases implies they must have the same length. To prove the claim, we add the basis vectors one by one to the spanning list and remove one of the latter by the linear dependence lemma. This process preserves the length and the span. If we run out of the vectors from the spanning list before the process is over, then we contradict the linear independence of the basis.

*Proof (Dimension):* It suffices to show that the length of a spanning list $\{\mathbf{w}_i:i\in [n]\}$ of $U$ is greater or equal than the length of a basis $\{\mathbf{u}_j:j\in[m]\}$ of $U$. First, we consider the list $\{\mathbf{u}_1,\mathbf{w}_1,\ldots,\mathbf{w}_n\}$. Because the $\mathbf{w}_i$'s are spanning, adding $\mathbf{u_1} \neq \mathbf{0}$ to them necessarily produces a linearly dependent list. By the *Linear Dependence Lemma*, we can remove one of the $\mathbf{w}_i$'s without changing the span. The new list $B$ has length $n$ again. Then we add $\mathbf{u}_2$ to $B$ immediately after $\mathbf{u}_1$. By the *Linear Dependence Lemma*, one of the vectors in this list is in the span of the previous ones. It cannot be $\mathbf{u}_2$ as $\{\mathbf{u}_1, \mathbf{u}_2\}$ are linearly independent by assumption. So it must be one of the remaining $\mathbf{w}_i$'s. We remove that one, without changing the span by the *Linear Dependence Lemma* again. This process can be continued until we have added all the $\mathbf{u}_j$'s, as otherwise we would have a contradiction in the argument above. Hence, there must be at least as many $\mathbf{w}_i$'s as there are $\mathbf{u}_j$'s.$\square$

*Exercise:* Let $\mathcal{W}$ be a linear subspace of $\mathbb{R}^d$ and let $\mathbf{w}_1,\ldots,\mathbf{w}_k$ be a basis of $\mathcal{W}$. Show that there exists a basis of $\mathbb{R}^d$ that includes the $\mathbf{w}_i$'s. [Hint: Repeatedly apply the *Linear Dependence Lemma* to the list $\{\mathbf{w}_1,\ldots,\mathbf{w}_k, \mathbf{e}_1,\ldots,\mathbf{e}_d\}$.] $\lhd$

*Exercise:* Let $\mathcal{W}$ be a linear subspace of $\mathbb{R}^d$. Show that $\mathrm{dim}(\mathcal{W}) \leq d$. [Hint: Use the previous exercise.] $\lhd$

*Exercise:* Show that, in a linear subspace, the length of every linearly independent list of vectors is less than or equal to the length of every spanning list of vectors. [Hint: Suppose that $\{\mathbf{u}_1,\ldots,\mathbf{u}_m\}$ is linearly independent in subspace $\mathcal{W}$ and that $\{\mathbf{w}_1,\ldots,\mathbf{w}_n\}$ spans $\mathcal{W}$. Start with the list of $\mathbf{w}_i$'s, add the $\mathbf{u}_j$'s one by one to beginning of the list, and use the *Linear Dependence Lemma*.]
$\lhd$

*Exercise:* Let $\mathcal{W}$ be a linear subspace of $\mathbb{R}^d$. Show that $\mathcal{W}$ has a basis. [Hint: Repeatedly add vectors not in the span and use the previous exercise.] $\lhd$

The following dimension-based argument will prove useful.

*Exercise:* Let $\mathcal{U}, \mathcal{V} \subseteq \mathbb{R}^d$ be subspaces such that $\mathrm{dim}(\mathcal{U}) + \mathrm{dim}(\mathcal{V}) > d$. Show there exists a non-zero vector in the intersection $\mathcal{U} \cap \mathcal{V}$. [Hint: Take the union of a basis for $\mathcal{U}$ and a basis for $\mathcal{V}$, then use linear dependence.] $\lhd$

Recall that the column rank of $A$ is the dimension of its column space. Similarly the row rank of $A$ is the dimension of its row space.

**Definition (Row Space):** The row space of $A \in \mathbb{R}^{n \times m}$, denoted $\mathrm{row}(A)$, is the span of the rows of $A$ as vectors in $\mathbb{R}^m$. $\lhd$

As it turns out, these two notions of rank are the same. From now on, we refer to the row rank and column rank of $A$ simply as the rank, which we denote by $\mathrm{rk}(A)$.

**Lemma (Row Rank Equals Column Rank):** For any $A \in \mathbb{R}^{n \times m}$, the row rank of $A$ equals the column rank of $A$.

*Proof idea:* Write $A$ as a matrix decomposition $BC$ where the columns of $B$ form a basis of $\mathrm{col}(A)$. Then the rows of $C$ necessarily form a spanning set of $\mathrm{row}(A)$. So the row rank is less or equal than the column rank. Applying the same argument to the transpose gives the claim.

*Proof:* Assume that $A$ has column rank $r$. Then there exists a basis $\mathbf{b}_1,\ldots, \mathbf{b}_r \in \mathbb{R}^n$ of $\mathrm{col}(A)$. That is, for each $j$, letting $\mathbf{a}_{j}$ be the $j$-th column of $A$ we can write

for some $c_{i,j}$'s. Think of $\mathbf{c}_{i} = (c_{i,1},\ldots,c_{i,r})$ as a column vector and form the matrix $C$ with row $i$ equal to $\mathbf{c}_{i}^T$. Then the equation above can be re-written as the matrix decomposition $A = BC$. Let $\boldsymbol{\alpha}_{i}^T$ be the $i$-th row of $A$. This decomposition is equivalent to

$$ \boldsymbol{\alpha}_{i}^T = \sum_{j=1}^r b_{i,j} \mathbf{c}_j^T $$for $i=1,\ldots, n$. In particular, $\mathcal{C} = \{\mathbf{c}_{j}:j=1,\ldots,r\}$ is a spanning set of the row space of $A$: each row of $A$ can be written as a linear combination of $\mathcal{C}$. So the row rank of $A$ is at most $r$, the column rank of $A$. Applying the same argument to $A^T$, which switches the role of the columns and the rows, gives that the column rank of $A$ (i.e. the row rank of $A^T$) is at most the row rank of $A$ (i.e. the column rank of $A^T$). Hence the two notions of rank must be equal. $\square$

**NUMERICAL CORNER** In Julia, one can compute the rank of a matrix using the function `rank`

. We will see later in the course how to compute it using the singular value decomposition (which is how `rank`

does it).

Let's try the example above.

In [3]:

```
w1, w2, w3 = [1,0,1], [0,1,1], [1,-1,0];
```

In [4]:

```
A = hcat(w1,w2)
```

Out[4]:

In [5]:

```
rank(A)
```

Out[5]:

In [6]:

```
B = hcat(A,w3)
```

Out[6]:

In [7]:

```
rank(B)
```

Out[7]:

Recall the following basic definition.

**Definition (Nonsingular Matrix):** A square matrix $A \in \mathbb{R}^{n \times n}$ is nonsingular if it has rank $n$. $\lhd$

Equivalently:

**Lemma (Invertibility):** A square matrix $A \in \mathbb{R}^{n \times n}$ is nonsingular if and only if there exists a unique $A^{-1}$ such that

The matrix $A^{-1}$ is referred to as the inverse of $A$. We also say that $A$ is invertible.

*Proof idea:* We use the nonsingularity of $A$ to write the columns of the identity matrix as unique linear combinations of the columns of $A$.

*Proof:* Suppose first that $A$ has rank $n$. Then its columns are linearly independent and form a basis of $\mathbb{R}^n$. In particular, for any $i$ the standard basis vector $\mathbf{e}_i$ can be written as a unique linear combination of the columns of $A$, i.e., there is $\mathbf{b}_i$ such that $A \mathbf{b}_i =\mathbf{e}_i$. Let $B$ be the matrix with columns $\mathbf{b}_i$, $i=1,\ldots,n$. By construction, $A B = I_{n\times n}$. Applying the same idea to the rows of $A$ (which by the *Row Rank Equals Column Rank Lemma* also form a basis of $\mathbb{R}^n$), there is a unique $C$ such that $C A = I_{n\times n}$. Multiplying both sides by $B$, we get

So we take $A^{-1} = B = C$.

In the other direction, the equation $A A^{-1} = I_{n \times n}$ implies that the standard basis is in the column space of $A$. That proves the claim. $\square$

**Lemma:** Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular square matrix. Then for any $\mathbf{b} \in \mathbb{R}^n$, there exists a unique $\mathbf{x} \in \mathbb{R}^n$ such that $A \mathbf{x} = \mathbf{b}$. Moreover $\mathbf{x} = A^{-1} \mathbf{b}$.

*Proof:* The first claim follows immediately from the fact that the columns of $A$ form a basis of $\mathbb{R}^n$. For the second claim, note that

$\square$

*Exercise:* Prove the reverse direction of the previous lemma. That is, let $A \in \mathbb{R}^{n \times n}$ be a square matrix. Show that, if for any $\mathbf{b} \in \mathbb{R}^n$ there exists a unique $\mathbf{x} \in \mathbb{R}^n$ such that $A \mathbf{x} = \mathbf{b}$, then $A$ is nonsingular. [Hint: Consider $\mathbf{b} = \mathbf{0}$.] $\lhd$

For the following example, we will a useful property of matrix transposes.

*Exercise:* Show that, if $B \in \mathbb{R}^{n \times m}$ and $C \in \mathbb{R}^{m \times p}$, then $(BC)^T = C^T B^T$. [Hint: Check that the entries match.] $\lhd$

**Example:** Let $A \in \mathbb{R}^{n \times m}$ with $n \geq m$ be of full column rank. We will show that the square matrix $B = A^T A$ is then invertible.

First observe that $B$ is an $m \times m$ matrix. By definition, to show that it is nonsingular, we need to establish that it has rank $m$, or put differently that its columns are linear independent. By the *Equivalent Definition of Linear Independence*, it suffices to show that

where the $\mathbf{b}_j$'s are the columns of $B$.

The sum $\sum_{j=1}^m x_j \mathbf{b}_j$ can be expressed in matrix form as $B \mathbf{x}$, where $\mathbf{x} = (x_1, \ldots, x_m)^T$. Hence

$$ \sum_{j=1}^m x_j \mathbf{b}_j = \mathbf{0} \iff B \mathbf{x} = \mathbf{0} \iff A^T A \mathbf{x} = \mathbf{0}. $$Now comes the key idea: after multiplying from the left by $\mathbf{x}^T$, the rightmost equality implies

$$ \mathbf{x}^T (A^T A \mathbf{x}) = (A \mathbf{x})^T (A \mathbf{x}) = \|A \mathbf{x}\|^2 = \mathbf{x}^T \mathbf{0} = 0, $$where we used the exercise above. By the point-separating property of the Euclidean norm, that means that we must have $A \mathbf{x} = \mathbf{0}$. Because $A$ has full column rank itself, the *Equivalent Definition of Linear Independence* implies that $\mathbf{x} = \mathbf{0}$, which is what we needed to prove. $\lhd$