# Spectral and singular value decompositions¶

## 5 Graph partitioning¶

Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Sep 21, 2020

### 5.1 Extremal characterization of eigenvalues¶

We first generalize the extremal characterization used in the previous section.

Definition (Rayleigh Quotient): Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix. The Rayleigh quotient is defined as

$$\mathcal{R}_A(\mathbf{u}) = \frac{\langle \mathbf{u}, A \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}$$

which is defined for any $\mathbf{u} \neq \mathbf{0}$ in $\mathbb{R}^{d}$. $\lhd$

Theorem (Courant-Fischer): Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix with spectral decomposition $A = \sum_{i=1}^d \lambda_i \mathbf{v}_i \mathbf{v}_i^T$ where $\lambda_1 \geq \cdots \geq \lambda_d$. For each $k = 1,\ldots,d$, define the subspace

$$\mathcal{V}_k = \mathrm{span}(\mathbf{v}_1, \ldots, \mathbf{v}_k) \quad\text{and}\quad \mathcal{W}_{d-k+1} = \mathrm{span}(\mathbf{v}_k, \ldots, \mathbf{v}_d).$$

Then, for all $k = 1,\ldots,d$,

$$\lambda_k = \min_{\mathbf{u} \in \mathcal{V}_k} \mathcal{R}_A(\mathbf{u}) = \max_{\mathbf{u} \in \mathcal{W}_{d-k+1}} \mathcal{R}_A(\mathbf{u}).$$

Furthermore we have the following min-max formulas, which do not depend on the choice of spectral decomposition, for all $k = 1,\ldots,d$

$$\lambda_k = \max_{\mathrm{dim}(\mathcal{V}) = k} \min_{\mathbf{u} \in \mathcal{V}} \mathcal{R}_A(\mathbf{u}) = \min_{\mathrm{dim}(\mathcal{W}) = d-k+1} \max_{\mathbf{u} \in \mathcal{W}} \mathcal{R}_A(\mathbf{u}).$$

Proof idea: For the local formula, we expand a vector in $\mathcal{V}_k$ into the basis $\mathbf{v}_1,\ldots,\mathbf{v}_k$ and use the fact that $\mathcal{R}_A(\mathbf{v}_i) = \lambda_i$ and that eigenvalues are in non-increasing order. The global formulas then follow from a dimension argument.

### 5.2 Returning to the Laplacian¶

We record one important special case of Courant-Fischer for the Laplacian.

Corollary (Extremal Characterization of $\mu_2$): Let $G = (V, E)$ be a graph with $n = |V|$ vertices. Assume the Laplacian $L$ of $G$ has spectral decomposition $L = \sum_{i=1}^n \mu_i \mathbf{y}_i \mathbf{y}_i^T$ with $0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n$ and $\mathbf{y}_1 = \frac{1}{\sqrt{n}}(1,\ldots,1)^T$. Then

$$\mu_2 = \min\left\{ \frac{\sum_{\{u, v\} \in E} (x_u - x_v)^2}{\sum_{u = 1}^n x_u^2} \,:\, \mathbf{x} = (x_1, \ldots, x_n)^T \neq \mathbf{0}, \sum_{u=1}^n x_u = 0 \right\}.$$

One application of this extremal characterization is the graph drawing heuristic we described previously. Consider the entries of the second Laplacian eigenvector $\mathbf{y}_2$ normalized to have unit norm. The entries are centered around $0$ by the condition $\sum_{u=1}^n x_{u} = 0$. Because it minimizes the quantity

$$\frac{\sum_{\{u, v\} \in E} (x_u - x_v)^2}{\sum_{u = 1}^n x_u^2}$$

over all centered unit vectors, $\mathbf{y}_2$ tends to assign similar coordinates to adjacent vertices. A similar reasoning applies to the third Laplacian eigenvector, which in addition is orthogonal to the second one.

In [2]:
n = 10
g = path_graph(n)

Out[2]:
{10, 9} undirected simple Int64 graph
In [3]:
gplot(g, nodelabel=1:n)

Out[3]:
In [4]:
L = Array(laplacian_matrix(g))
F = eigen(L)
v = F.vectors
plot(v[:,2], legend=false, marker=:diamond)

Out[4]:

### 5.3 How to cut a graph¶

Let $G=(V, E)$ be a graph. Imagine that we are interested in finding a good cut. That is, roughly speaking, we seek to divide it into two disjoint subsets of vertices to achieve two goals simultaneously:

1. the two sets have relatively few edges between them
2. neither set is too small.

We will show that the Laplacian eigenvectors provide useful information in order to perform this kind of graph cutting. First we formulate the problem formally.

Definition (Isoperimetric Number): Let $G=(V, E)$ be a graph. A cut is a bipartition $(S, S^c)$ of the vertices of $G$, where $S$ and $S^c$ are non-empty subsets of $V$. For short, we refer to $\emptyset \neq S \subset V$ as a cut. The size of the cut, $|\partial S|$, is the number of edges from $S$ to $S^c$, where the edge boundary is defined as

$$\partial S = \{ \{i,j\} \in E : i \in S, j \in S^c \}.$$

The cut ratio of $S$ is

$$\phi(S) = \frac{|\partial S|}{\min\{|S|, |S^c|\}}$$

and the isoperimetric number (or Cheeger constant) of $G$ is the minimum cut ratio over all of its cuts, that is,

$$\phi_G = \min_{\emptyset \neq S \subset V} \phi(S).$$

$\lhd$

Theorem (Cheeger Inequality): Let $G = (V, E)$ be a graph with $n = |V|$ vertices and maximum degree $\bar{\delta}$. Let $0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n$ be its Laplacian eigenvalues. Then

$$\frac{\phi_G^2}{2 \bar{\delta}} \leq \mu_2 \leq 2 \phi_G.$$

Proof idea: We find an appropriate test vector to plug into the extremal characterization of $\mu_2$ and link it to $\phi_G$.

Constructing a good test vector: We construct an $\mathbf{x}$ that provides a good upper bound. Let $\emptyset \neq S \subset V$ be a proper, nonempty subset of $V$ such that $0 < |S| \leq \frac{1}{2}|V|$. Consider the vector $\mathbf{x} \in \mathbb{R}^n$ with entries

$$x_i = \begin{cases} |S| & \text{if i \in S^c}\\ - |S^c| & \text{if i \in S}. \end{cases}$$

This choice ensures that

$$\sum_{i=1}^n x_i = \sum_{i\in S^c} |S| + \sum_{i\in S} (- |S^c|) = |S^c| |S| - |S| |S^c| = 0.$$

and

\begin{align} \sum_{i=1}^n x_i^2 &= \sum_{i\in S^c} |S|^2 + \sum_{i\in S} (- |S^c|)^2\\ &= |S^c| |S|^2 + |S| |S^c|^2\\ &= |S^c| |S| (|S| + |S^c|)\\ &= n |S^c| |S|. \end{align}

So

$$\sum_{\{i, j\} \in E} (x_i - x_j)^2 = \sum_{\{i, j\} \in E} (|S| + |S^c|)^2 = n^2 |\partial S|$$

where we used that, for each edge $\{i, j\} \in E$, one endvertex is in $S$ and one endvertex is in $S^c$. Hence $x_i - x_j$ is either $|S| + |S^c|$ or $-|S| - |S^c|$.

Using the definition of the isoperimetric number: So for this choice of $\mathbf{x}$ we have

$$\mu_2 \leq \frac{\sum_{\{i, j\} \in E} (x_i - x_j)^2}{\sum_{i=1}^n x_i^2} = \frac{n^2 |\partial S|}{n |S^c| |S|} = \frac{|\partial S|}{(|S^c|/n) |S|} \leq 2 \frac{|\partial S|}{|S|}$$

where we used that $|S^c| \geq n/2$. This inequality holds for any $S$ such that $0 < |S| \leq \frac{1}{2}|V|$. In particular, it holds for the $S$ producing the smallest value. Hence, by the definition of the isoperimetric number (see the exercise above), we get

$$\mu_2 \leq 2 \phi_G$$

as claimed. $\square$

An algorithm: Let $\mathbf{y}_2 \in \mathbb{R}^n$ be the unit-norm eigenvector of $L$ associated to $\mu_2$. There is one entry of $\mathbf{y}_2 = (y_{2,1}, \ldots, y_{2,n})^T$ for each vertex of $G$. We use these entries to embed the graph $G$ in $\mathbb{R}$: vertex $i$ is mapped to $y_{2,i}$. Now order the entries $y_{2,\pi(1)}, \ldots, y_{2,\pi(n)}$, where $\pi$ is a permutation, that is, a re-ordering of $1,\ldots,n$. Specifically, $\pi(1)$ is the vertex corresponding to the smallest entry of $\mathbf{y}_{2}$, $\pi(2)$ is the second smallest, and so on. We consider only cuts of the form

$$S_k = \{\pi(1), \ldots, \pi(k)\}$$

and we choose the $k \leq n/2$ that minimizes the cut ratio

$$\phi(S_k) = \frac{|\partial S_k|}{|S_k|}.$$

What can be proved rigorously is that there exists some $k$ such that

$$\sum_{\{u, v\} \in E} (y_{2,u} - y_{2,v})^2 \geq \frac{\phi(S_k)^2}{2 \bar{\delta}}$$

which implies the Cheeger Inequality.

In [5]:
function cut_ratio(A, order, k, n)
edge_boundary = 0 # initialize size of edge boundary
for i=1:k # for all vertices before cut
for j=k+1:n # for all vertices after cut
edge_boundary += A[order[i],order[j]] # add one if {i,j} in E
end
end
denominator = min(k, n-k)
return edge_boundary/denominator
end

Out[5]:
cut_ratio (generic function with 1 method)
In [6]:
function spectral_cut2(A)
n = size(A,1) # number of vertices
L = Diagonal(reshape(sum(A,dims=2), n)) - A # construct Laplacian
v = eigvecs(L) # eigenvectors of L
order = sortperm(v[:,2]) # index of entries in increasing order
phi = zeros(Float64, n-1) # initialize cut ratios
for k=1:n-1
phi[k] = cut_ratio(A, order, k, n)
end
(phimin, imin) = findmin(phi) # find best cut ratio
return order[1:imin], order[imin+1:end]
end

Out[6]:
spectral_cut2 (generic function with 1 method)
In [7]:
n = 10
g = path_graph(n)
gplot(g, nodelabel=1:n)

Out[7]:
In [8]:
A = Array(adjacency_matrix(g))
degrees = reshape(sum(A,dims=2), n)
D = Diagonal(degrees)
L = D - A

Out[8]:
10×10 Array{Int64,2}:
1  -1   0   0   0   0   0   0   0   0
-1   2  -1   0   0   0   0   0   0   0
0  -1   2  -1   0   0   0   0   0   0
0   0  -1   2  -1   0   0   0   0   0
0   0   0  -1   2  -1   0   0   0   0
0   0   0   0  -1   2  -1   0   0   0
0   0   0   0   0  -1   2  -1   0   0
0   0   0   0   0   0  -1   2  -1   0
0   0   0   0   0   0   0  -1   2  -1
0   0   0   0   0   0   0   0  -1   1
In [9]:
F = eigen(L)
lambda = F.values
plot(lambda, legend=false, marker=:diamond)

Out[9]:
In [10]:
v = F.vectors
scatter(v[:,2], zeros(n), legend=false, ylims=(-1,1))

Out[10]:
In [11]:
(s, sc) = spectral_cut2(A)

Out[11]:
([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
In [12]:
function viz_cut(g, s; layout=spectral_layout)
n = nv(g)
assign = ones(Int64, n)
assign[s] .= 2
colors = [colorant"lightseagreen", colorant"orange"]
nodecolor = colors[assign]
gplot(g, nodelabel=1:n, nodefillc=nodecolor, layout=layout)
end

Out[12]:
viz_cut (generic function with 1 method)
In [14]:
g = LightGraphs.SimpleGraphs.grid([4, 7])