TOPIC 2

Spectral and singular value decompositions

1 Background: review of eigenvalues; graph-theoretic definitions


Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Sep 28, 2020
Copyright: © 2020 Sebastien Roch


We review spectral decompositions and elements of graph theory.

1.1 Spectral decomposition

We first review eigenvalues and eigenvectors.

1.1.1 Eigenvalues and eigenvectors

As before, we work on $\mathbb{R}^d$.

Definition (Eigenvalues and eigenvectors): Let $A \in \mathbb{R}^{d \times d}$ be a square matrix. Then $\lambda \in \mathbb{R}$ is an eigenvalue of $A$ if there exists a nonzero vector $\mathbf{x} \neq \mathbf{0}$ such that

$$ A \mathbf{x} = \lambda \mathbf{x}. $$

The vector $\mathbf{x}$ is referred to as an eigenvector. $\lhd$

As the next example shows, not every matrix has an eigenvalue.

Example (No Real Eigenvalues): Set $d = 2$ and let

$$ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. $$

For $\lambda$ to be an eigenvalue, there must be an nonzero eigenvector $\mathbf{x} = (x_1, x_2)^T$ such that

$$ A \mathbf{x} = \lambda \mathbf{x} $$

or put differently

$$ - x_2 = \lambda x_1 \quad\text{and}\quad x_1 = \lambda x_2. $$

Replacing these equations into each other, it must be that

$$ - x_2 = \lambda^2 x_2 \quad\text{and}\quad x_1 = - \lambda^2 x_1. $$

Because $x_1, x_2$ cannot both be $0$, $\lambda$ must satisfy the equation

$$ \lambda^2 = -1 $$

for which there is no real solution. $\lhd$

In general, $A \in \mathbb{R}^{d \times d}$ has at most $d$ distinct eigenvalues.


Lemma (Number of Eigenvalues): Let $A \in \mathbb{R}^{d \times d}$ and let $\lambda_1, \ldots, \lambda_m$ be distinct eigenvalues of $A$ with corresponding nonzero eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$. Then $\mathbf{x}_1, \ldots, \mathbf{x}_m$ are linearly independent. As a result, $m \leq d$.


Proof idea: We use the Linear Dependence Lemma and the distinctness of the eigenvalues to derive a contradiction.

Proof: Assume by contradiction that $\mathbf{x}_1, \ldots, \mathbf{x}_m$ are linearly dependent. By the Linear Dependence Lemma, there is $k \leq m$ such that $$ \mathbf{x}_k \in \mathrm{span}(\mathbf{x}_1, \ldots, \mathbf{x}_{k-1}) $$

where $\mathbf{x}_1, \ldots, \mathbf{x}_{k-1}$ are linearly independent. In particular, there are $a_1, \ldots, a_{k-1}$ such that

$$ \mathbf{x}_k = a_1 \mathbf{x}_1 + \cdots + a_{k-1} \mathbf{x}_{k-1}. $$

Transform the equation above in two ways: (1) multiply both sides by $\lambda_k$ and (2) apply $A$. Then subtract the resulting equations. That leads to

$$ \mathbf{0} = a_1 (\lambda_k - \lambda_1) \mathbf{x}_1 + a_{k-1} (\lambda_k - \lambda_{k-1}) \mathbf{x}_{k-1}. $$

Because the $\lambda_i$'s are distinct and $\mathbf{x}_1, \ldots, \mathbf{x}_{k-1}$ are linearly independent, we must have $a_1 = \cdots = a_{k-1} = 0$. But that implies that $\mathbf{x}_k = \mathbf{0}$, a contradiction.

For the second claim, if there were more than $d$ distinct eigenvalues, then there would be more than $d$ corresponding linearly independent eigenvectors by the first claim, a contradiction. $\square$

Example (Diagonal (and Similar) Matrices): We use the notation $\mathrm{diag}(\lambda_1,\ldots,\lambda_d)$ for the diagonal matrix with diagonal entries $\lambda_1,\ldots,\lambda_d$. The upper bound in the Number of Eigenvalues Lemma can be achieved, for instance, by diagonal matrices with distinct diagonal entries $A = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. Each standard basis vector $\mathbf{e}_i$ is then an eigenvector

$$ A \mathbf{e}_i = \lambda_i \mathbf{e}_i. $$

More generally, let $A$ be similar to a matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ with distinct diagonal entries, that is, there exists a nonsingular matrix $P$ such that

$$ A = P D P^{-1}. $$

Let $\mathbf{p}_1, \ldots, \mathbf{p}_d$ be the columns of $P$. Note that, because the columns of $P$ form a basis of $\mathbb{R}^d$, the entries of the vector $\mathbf{c} = P^{-1} \mathbf{x}$ are the coefficients of the unique linear combination of the $\mathbf{p}_i$'s equal to $\mathbf{x}$. Indeed, $P \mathbf{c} = \mathbf{x}$. Hence, $A \mathbf{x}$ is can be thought of as: (1) expressing $\mathbf{x}$ in the basis $\mathbf{p}_1, \ldots, \mathbf{p}_d$, (2) and scaling the $\mathbf{p}_i$'s by the corresponding $\lambda_i$'s. In particular, the $\mathbf{p}_i$'s are eigenvectors of $A$ since, by the above, $P^{-1} \mathbf{p}_i = \mathbf{e}_i$ and so

$$ A \mathbf{p}_i = P D P^{-1} \mathbf{p}_i = P D \mathbf{e}_i = P (\lambda_i \mathbf{e}_i) = \lambda_i \mathbf{p}_i. $$

$\lhd$

NUMERICAL CORNER In Julia, the eigenvalues and eigenvectors of a matrix can be computed using eigen.

In [1]:
#Julia version: 1.5.1
using LinearAlgebra, LightGraphs, GraphPlot
In [2]:
A = [2.5 -0.5; -0.5 2.5]
Out[2]:
2×2 Array{Float64,2}:
  2.5  -0.5
 -0.5   2.5
In [3]:
eigen(A)
Out[3]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
 2.0
 3.0
vectors:
2×2 Array{Float64,2}:
 -0.707107  -0.707107
 -0.707107   0.707107

1.1.2 Spectral theorem

When $A$ is symmetric, that is, $a_{ij} = a_{ji}$ for all $i,j$, a remarkable result is that $A$ is similar to a diagonal matrix by an orthogonal transformation. Put differently, there exists an orthonormal basis of $\mathbb{R}^d$ made of eigenvectors of $A$. We will prove this result later in this topic.


Theorem (Spectral Theorem): Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix, that is, $A^T = A$. Then $A$ has $d$ orthonormal eigenvectors $\mathbf{q}_1, \ldots, \mathbf{q}_d$ with corresponding (not necessarily distinct) real eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. In matrix form, this is written as the matrix factorization

$$ A = Q \Lambda Q^T = \sum_{i=1}^d \lambda_i \mathbf{q}_i \mathbf{q}_i^T $$

where $Q$ has columns $\mathbf{q}_1, \ldots, \mathbf{q}_d$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. We refer to this factorization as a spectral decomposition of $A$.


1.1.3 Positive semidefinite matrices

The eigenvalues of a symmetric matrix - while real - may be negative. There is however an important special case where the eigenvalues are nonnegative, positive semidefinite matrices.


Theorem (Characterization of Positive Semidefiniteness): Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix and let $A = Q \Lambda Q^T$ be a spectral decomposition of $A$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. Then $A \succeq 0$ if and only if its eigenvalues $\lambda_1, \ldots, \lambda_d$ are nonnegative.


Proof: Assume $A \succeq 0$. Let $\mathbf{q}_i$ be an eigenvector of $A$ with corresponding eigenvalue $\lambda_i$. Then

$$ \langle \mathbf{q}_i, A \mathbf{q}_i \rangle = \langle \mathbf{q}_i, \lambda_i \mathbf{q}_i \rangle = \lambda_i $$

which must be nonnegative by definition of a positive semidefinite matrix.

In the other direction, assume $\lambda_1, \ldots, \lambda_d \geq 0$. Note that the vector $Q^T \mathbf{x}$ has entries $\langle \mathbf{q}_i, \mathbf{x}\rangle$, $i=1,\ldots, d$. Thus

$$ \langle \mathbf{x}, A \mathbf{x} \rangle = \mathbf{x}^T Q \Lambda Q^T \mathbf{x} = \sum_{i=1}^d \lambda_i \langle \mathbf{q}_i, \mathbf{x}\rangle^2 $$

which is necessarily nonnegative. $\square$

1.2 Elements of graph theory

We will need some basic notions from graph theory.

1.2.1 Basic definitions

We start with undirected graphs.

Definition (Undirected Graph): An undirected graph (or graph for short) is a pair $G = (V,E)$ where $V$ is the set of vertices (or nodes) and

$$ E \subseteq \{\{u,v\}\,:\, u,v \in V\} $$

is the set of edges.$\lhd$

Example (Petersen Graph): The following example is called the Petersen graph. Here the circles are vertices (of which there are $10$) and the segments connecting them represent edges (of which there are $15$).

Petersen Graph

(Source)

$\lhd$

We occasionally write $V(G)$ and $E(G)$ for the vertices and edges of the graph $G$. In our case, the set of vertices $V$ is always finite.

Definition (Incidence and Adjacency): A vertex $v \in V$ is incident with an edge $e \in E$ if $v \in e$. The incident vertices of an edge are called its endvertices. Two vertices $u,v \in V$ are adjacent (or neighbors), which we denote by $u \sim v$, if $\{u,v\} \in E$. $\lhd$

Definition (Neighborhood and Degrees): The set of adjacent vertices of $v$, denoted by $N(v)$, is called the neighborhood of $v$ and its size, i.e., $\delta(v):=|N(v)|$, is the degree of $v$. $\lhd$

A vertex $v$ with $\delta(v) = 0$ is called isolated. A graph is called $d$-regular if all its degrees are $d$.

Example (Petersen, continued): Returning to the Petersen graph, all its vertices have degree $3$, that is, it is $3$-regular. In particular there is no isolated vertex. $\lhd$

Definition (Subgraph): A subgraph of $G = (V,E)$ is a graph $G' = (V',E')$ with $V' \subseteq V$ and $E' \subseteq E$. The subgraph $G'$ is said to be induced if

$$ E' = \{\{x,y\}\,:\, x,y \in V',\ \{x,y\}\in E\}, $$

that is, it contains all edges of $G$ between the vertices of $V'$. In that case the notation $G' := G[V']$ is used. $\lhd$

Implicit in this last definition is the fact that the edges in $E'$ are incident only to $V'$.

A subgraph is said to be spanning if $V' = V$. A subgraph containing all possible edges between its vertices is called a complete subgraph or clique.

Example (Petersen, continued): The Petersen graph contains no triangle (that is, complete subgraphs with $3$ vertices), induced or not. $\lhd$

Defintion (Paths): A path in $G$ is a sequence of (not necessarily distinct) vertices $x_0 \sim x_1 \sim \cdots \sim x_k$. The number of edges, $k$, is called the length of the path. If the endvertices $x_0$, $x_k$ coincide, that is, $x_0 = x_k$, we call the path a cycle. If the vertices are all distinct (except possibly for the endvertices), we say that the path (or cycle) is self-avoiding. $\lhd$

A self-avoiding path or cycle can be seen as a (not necessarily induced) subgraph of $G$. We write $u \leftrightarrow v$ if there is a path between $u$ and $v$. The relation $\leftrightarrow$ is in fact an equivalence relation. The equivalence classes, that is the maximal subsets $U \subseteq V$ such that $u \leftrightarrow v$ for all $u, v \in U$, are called connected components and form a partition of $V$. The length of the shortest self-avoiding path connecting two distinct vertices $u, v$ is called the graph distance between $u$ and $v$, denoted by $\rho(u,v)$.

Definition (Connectedness): A graph is connected if any two of its vertices are linked by a path, that is, if $u \leftrightarrow v$ for all $u, v \in V$. Or put differently, if there is only one connected component. $\lhd$

Example (Petersen, continued): The Petersen graph is connected. $\lhd$

Example: The following graph has three connected components:

Connected components

(Source)

$\lhd$

A forest is a graph with no self-avoiding cycle. A tree is a connected forest. Vertices of degree $1$ are called leaves. A spanning tree of $G$ is a subgraph which is a tree and is also spanning.

NUMERICAL CORNER In Julia, the LightGraphs.jl package provides many functionalities for defining, modifying and plotting graphs. For instance, many standard graphs can be defined conveniently. The smallgraph function is one way to do this. The following defines the Petersen graph.

In [7]:
g = smallgraph(:petersen)
Out[7]:
{10, 15} undirected simple Int64 graph

This graph can be plotted using the function gplot of the GraphPlot.jl package.

In [8]:
gplot(g)
Out[8]:

Other standard graphs can be generated with special functions, e.g. complete graphs using complete_graph. See here for a complete list.

In [9]:
g = complete_graph(3)
Out[9]:
{3, 3} undirected simple Int64 graph
In [10]:
gplot(g)
Out[10]:

See here and here for a list of functions to access various properties of a graph. Here are a few examples:

In [11]:
g = path_graph(10)
gplot(g)
Out[11]:
In [12]:
nv(g) # number of vertices
Out[12]:
10
In [13]:
ne(g) # number of edges
Out[13]:
9
In [14]:
has_vertex(g, 8) # checks whether the graph has a particular vertex
Out[14]:
true
In [15]:
has_vertex(g, 11)
Out[15]:
false
In [16]:
has_edge(g, 1, 2) # checks whether the graph has a particular vertex
Out[16]:
true
In [17]:
has_edge(g, 1, 3)
Out[17]:
false
In [18]:
neighbors(g, 3) # returns a list of neighbors of the specified vertex
Out[18]:
2-element Array{Int64,1}:
 2
 4
In [19]:
is_connected(g) # checks whether the graph is connected
Out[19]:
true
In [20]:
connected_components(g) # returns the connected components
Out[20]:
1-element Array{Array{Int64,1},1}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
In [21]:
for e in edges(g) # iterate over all edges of a graph
    @show src(e), dst(e) # prints source and destination vertices of the edge
end
(src(e), dst(e)) = (1, 2)
(src(e), dst(e)) = (2, 3)
(src(e), dst(e)) = (3, 4)
(src(e), dst(e)) = (4, 5)
(src(e), dst(e)) = (5, 6)
(src(e), dst(e)) = (6, 7)
(src(e), dst(e)) = (7, 8)
(src(e), dst(e)) = (8, 9)
(src(e), dst(e)) = (9, 10)

Another way of specifying a dgraph is to start with an empty graph with a given number of vertices and then add edges one by one. The following command creates a graph with $4$ vertices and no edge (see SimpleGraph).

In [22]:
g = SimpleGraph(4)
Out[22]:
{4, 0} undirected simple Int64 graph
In [23]:
add_edge!(g, 1, 2)
add_edge!(g, 3, 4)
add_edge!(g, 1, 4)
add_edge!(g, 4, 1);
In [24]:
label = 1:nv(g)
gplot(g, nodelabel=label)
Out[24]:
1 2 3 4

1.2.2 Directed graphs

We will also need directed graphs.

Definition (Directed Graph): A directed graph (or digraph for short) is a pair $G = (V,E)$ where $V$ is a set of vertices (or nodes) and $E \subseteq V^2$ is a set of directed edges (or arcs). $\lhd$

Note that, unlike the undirected case, in a digraph the edges are ordered pairs - which is taken to mean that they have an orientation. If $e = (i,j) \in E$ is an edge in a digraph $G = (V,E)$, then $i$ is called the source of $e$ and $j$ is the destination.

Example: Here is an example:

A drected graph

(Source)

Note that a pair of vertices can have arcs between them in both directions simultaneously. $\lhd$

The definitions discussed in the undirected case can be adapted to the directed case.

Definition (Directed Path): A directed path is a sequence of vertices $x_0, \ldots, x_k$ with $(x_{i-1},x_i) \in E$ for all $i=1,\ldots,k$. We write $u \to v$ if there is such a path with $x_0 = u$ and $x_k = v$. $\lhd$

Definition (Communication): We say that $u,v \in V$ communicate, which we denote by $u \leftrightarrow v$, if $u \to v$ and $v \to u$. The $\leftrightarrow$ relation is again an equivalence relation. The equivalence classes of $\leftrightarrow$ are called the strongly connected components of $G$. $\lhd$

NUMERICAL CORNER The package LightGraphs.jl also supports digraphs.

In [25]:
g = star_digraph(5)
Out[25]:
{5, 4} directed simple Int64 graph
In [26]:
gplot(g)
Out[26]:

Another way of specifying a digraph is to start with an empty graph with a given number of vertices and then add edges one by one (compare to the undirected case above). The following command creates a graph with $4$ vertices and no edge (see SimpleDiGraph).

In [27]:
g = SimpleDiGraph(4)
Out[27]:
{4, 0} directed simple Int64 graph
In [28]:
add_edge!(g, 1, 2)
add_edge!(g, 3, 4)
add_edge!(g, 1, 4)
add_edge!(g, 4, 1);
In [29]:
label = 1:nv(g)
gplot(g, nodelabel=label)
Out[29]:
1 2 3 4