\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{epsfig}
%I added a package --- hopefully that's fine. If not, just delete the \url command wherever it appears in the references section.
\PassOptionsToPackage{hyphens}{url}\usepackage{hyperref}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf MATH888: High-dimensional probability and statistics } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex
%\renewcommand{\baselinestretch}{1.25}
%Make life easier
\newcommand{\qin}{q_{\mathrm{in}}}
\newcommand{\qout}{q_{\mathrm{out}}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\RR}{\mathbb R}
\newcommand{\EE}{\mathbb E}
\newcommand{\PP}{\mathbb P}
\begin{document}
\lecture{40 --- December 10, 2021}{Fall 2021}{Sebastien Roch, UW-Madison}{Aidan Howells}
\section{Overview}
In this and the previous lecture we discussed the stochastic block model. This lecture was given from slides, available on the course website. The first half of the slides were review of last time, and the rest was new material based on Sections 3.1 and 3.4 in \cite{CCFM}. These lecture notes cover the new material, with notation changed to match last time.
\section{Notation}
Recall that we are concerned with detecting communities in the stochastic block model (SBM). Here $n$, the number of vertices, is a positive even integer. The vertices are partitioned uniformly at random into two communities of size $n/2$. Each pair of vertices in the same community has an edge between them with probability $\qin\in[0,1]$, and each pair in different communities has an edge with probability $\qout\in[0,\qin]$, all independently. Let $\Pi_2^n=\{\mathbf x\in\{\pm1\}^n:\mathbf x^\intercal\mathbf 1=0\}$ be the set of possible partitions (we think of $i$ as being in the $+$ community when $x_i=+1$, and in the $-$ community otherwise). Let $(X,A)$ be drawn from the SBM, so that $X\in\Pi_2^n$ is uniformly random and $A$ is the (random) adjacency matrix of the resulting graph.
\section{Spectral Clustering}
Last time we discussed estimating the partition into communities, given the adjacency matrix $A$, by an $\mathbf x\in\Pi_2^n$ which maximized $\mathbf x^\intercal A\mathbf x$. We suggested that the correct approach was to relax this optimization problem. The relaxed optimization problem that we wish to consider is the following:
\[
\mathop{\max_{\mathbf x\in\RR^n:\norm{\mathbf x}_2^2=n}}_{\mathbf x^\intercal\mathbf 1=0} \mathbf x^\intercal A\mathbf x.
\]
Consider the matrix
\[
M:=A-\frac{\qin+\qout}2 \mathbf 1\mathbf 1^\intercal+\qin I
\]
Let $M^*=\EE[M\mid X]$. (Alternatively, assume the vertices are ordered so that all members of the first community come before those of the second, and consider $\EE[M]$.) One can check that $M^*_{ij}=(\qin-\qout)/2$ if $i$ and $j$ are in the same community and $M^*_{ij}=-(\qin-\qout)/2$ otherwise; that is, $M^*=\frac{\qin-\qout}2 XX^\intercal$, a rank-one matrix. It follows that the leading eigenvalue of $M^*$ is
\begin{align*}
\lambda^*&=\frac{(\qin-\qout)n}2,
\end{align*}
with associated unit eigenvector $\mathbf u^*=(u^*_i)$ given by
\[
u^*_i=\begin{cases}
n^{-1/2} & X_i=+1\\
-n^{-1/2} & X_i=-1
\end{cases}
\]
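As a quick sanity check, one can verify the eigenvalue equation directly. For a vertex $i$ in the $+$ community, row $i$ of $M^*$ has $n/2$ entries equal to $(\qin-\qout)/2$ (the vertices in the same community, including $i$ itself) and $n/2$ entries equal to $-(\qin-\qout)/2$, so
\begin{align*}
(M^*\mathbf u^*)_i
&=\frac n2\cdot\frac{\qin-\qout}2\cdot n^{-1/2}+\frac n2\cdot\left(-\frac{\qin-\qout}2\right)\cdot\left(-n^{-1/2}\right)\\
&=\frac{(\qin-\qout)n}2\cdot n^{-1/2}=\lambda^* u^*_i,
\end{align*}
and symmetrically for $i$ in the $-$ community.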
If we knew $\mathbf u^*$, we would be done. So our strategy to estimate $X$ is the following spectral clustering algorithm:
\begin{enumerate}
\item Compute the leading eigenvector $\mathbf u=(u_i)$ of $M$.
\item Estimate $X$ via the vector $\hat X=(\operatorname{sgn}(u_i))_{i=1}^n$.
\end{enumerate}
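The two steps above can be sketched in code. The following is a minimal NumPy illustration; the sampler, the parameter values, and the fixed ordering of the communities are illustrative simplifications, not part of the model as stated.

```python
import numpy as np

def sample_sbm(n, qin, qout, rng):
    # For simplicity the first n/2 vertices form the "+" community; the model's
    # uniformly random partition only relabels vertices and does not change
    # the behavior of the algorithm.
    X = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    same = np.equal.outer(X, X)            # True iff i, j in the same community
    probs = np.where(same, qin, qout)      # edge probability for each pair
    upper = rng.random((n, n)) < probs     # independent coin flips
    A = np.triu(upper, k=1).astype(float)  # keep strict upper triangle
    return X, A + A.T                      # symmetric adjacency, zero diagonal

def spectral_cluster(A, qin, qout):
    # Step 1: leading eigenvector of M = A - (qin+qout)/2 * 11^T + qin * I.
    n = A.shape[0]
    M = A - (qin + qout) / 2 * np.ones((n, n)) + qin * np.eye(n)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    u = eigvecs[:, -1]                     # eigenvector of the top eigenvalue
    # Step 2: estimate the communities by the signs of the entries.
    return np.sign(u)

rng = np.random.default_rng(0)
X, A = sample_sbm(200, 0.5, 0.05, rng)
Xhat = spectral_cluster(A, 0.5, 0.05)
# Agreement up to a global sign flip s, since the labeling is only
# identifiable up to swapping the two communities.
agreement = max(np.mean(X == s * Xhat) for s in (1, -1))
```

With a gap this large between $\qin$ and $\qout$, the signal eigenvalue $(\qin-\qout)n/2$ dominates the noise and the agreement is close to $1$.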
The intuition is that w.h.p.\ $\mathbf u^*$ should be close to $\mathbf u$. Indeed, next week we intend to sketch the proof of the following theorem:
\begin{theorem}[{\cite[Thm 3.8]{CCFM}}]
Suppose that $\qin\gtrsim \log(n)/n$ and $\sqrt{\qin/n}=o(\qin-\qout)$. Then, with probability exceeding $1-O(n^{-8})$, the spectral method achieves almost exact recovery:
\[
\max_{s\in\{\pm1\}}\frac1n\sum_{i=1}^n\mathbf 1\{X_i=s\hat X_i\}=1-o(1).
\]
\end{theorem}
Note the following special cases of the assumption $\sqrt{\qin/n}=o(\qin-\qout)$:

\textbf{Special Case 1:} if $\qin\asymp \log(n)/n$, then $\sqrt{\qin/n}\asymp\sqrt{\log n}/n$, so the condition reads $\qin-\qout\gg\sqrt{\log n}/n$.

\textbf{Special Case 2:} if $\qin\asymp 1$, then $\sqrt{\qin/n}\asymp 1/\sqrt n$, so the condition reads $\qin-\qout\gg 1/\sqrt n$.
The proof of the theorem will use the following theorem, which we will not prove:
\begin{theorem}[{\cite[Thm 3.4]{CCFM}}]
Consider a symmetric random matrix $Y=[Y_{ij}]\in\RR^{n\times n}$, whose entries are independently generated, mean zero, and uniformly bounded by $B\in(0,\infty)$. Define
\[
v=\max_i\sum_j\EE[Y_{ij}^2].
\]
Then there exists a universal constant $c>0$ such that for any $t\ge0$,
\[
\PP\left\{\norm Y\ge 4\sqrt v+t\right\}\le n\exp\left(-\frac{t^2}{cB^2}\right).
\]
\end{theorem}
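To get a feel for the bound, here is a small numerical illustration (not a proof). For a symmetric matrix with independent Rademacher ($\pm1$) entries we have $B=1$ and $v=n$, and the spectral norm in fact concentrates around $2\sqrt n$, comfortably below the $4\sqrt v$ appearing in the theorem; the dimension and seed below are arbitrary choices.

```python
import numpy as np

# Illustrative setup: symmetric matrix with independent Rademacher entries,
# so B = 1 and v = max_i sum_j E[Y_ij^2] = n.
rng = np.random.default_rng(1)
n = 300
U = np.triu(rng.choice([-1.0, 1.0], size=(n, n)))  # upper triangle incl. diagonal
Y = U + U.T - np.diag(np.diag(U))                  # symmetric, entries in {-1, +1}
v = float(n)
norm_Y = np.linalg.norm(Y, 2)                      # spectral norm (top singular value)
# The norm is about 2*sqrt(n), well under 4*sqrt(v), consistent with the
# tail bound already at t = 0.
```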
In addition to this theorem, the proof will also use a version of the Davis--Kahan theorem. It is also true (though the Davis--Kahan theorem alone does not suffice to prove it) that the spectral method in fact achieves exact recovery:
\begin{theorem}[{\cite[Thm 4.6]{CCFM}}]
Fix $\varepsilon>0$. Suppose that $\qin=\alpha\log(n)/n$ and $\qout=\beta\log(n)/n$ for some sufficiently large constants $\alpha>\beta>0$. In addition, assume that
\[
(\sqrt{\qin}-\sqrt{\qout})^2\ge 2(1-\varepsilon)\frac{\log n}n.
\]
With probability $1-o(1)$, the spectral method yields $X=s\hat X$ for some $s\in\{\pm1\}$.
\end{theorem}
\begin{thebibliography}{77}
\bibitem{CCFM}
Yuxin Chen, Yuejie Chi, Jianqing Fan, and Cong Ma,
\emph{Spectral Methods for Data Science: A Statistical Perspective},
\url{https://arxiv.org/pdf/2012.08496.pdf}.
\end{thebibliography}
\end{document}