The goal of these notes is to give an introduction to
fundamental models and techniques in graduate-level
modern discrete probability.
Topics are taken mostly from probability on graphs:
Markov random fields,
random walks on graphs,
No attempt is made at covering these areas in depth.
Rather the emphasis is on developing and illustrating common and important techniques.
Various applications, in particular in the theoretical foundations of data science and machine learning, are
also discussed along the way.
The notes are aimed at graduate students
in mathematics, statistics, computer science, electrical
engineering, physics, economics, etc.
with previous exposure to basic probability theory
(ideally measure-theoretic probability theory;
at Wisconsin, Math 733;
my own course notes)
and stochastic processes
(at Wisconsin, Math 632).
These notes were developed for a one-semester course
(Fall 2014, Fall 2017, Fall 2020, and Fall 2023).
I also gave a version of this course at the TRIPODS
Summer School on Fundamentals of Data Analysis.
The slides are here.
Slides for background sections are also available below.
Much of the material covered can also be found in the following excellent texts
(conveniently available online):
Review: basics of graph theory and Markov chain theory.
random walks on networks.
In this chapter we describe a few discrete probability models to which we will come back repeatedly throughout. While there exists a vast array of well-studied random combinatorial structures (permutations, partitions, urn models, Boolean functions, polytopes, etc.), our focus is primarily on a limited number of graph-based processes, namely percolation, random graphs, the Ising model, and random walks on graphs. After a brief review of graph basics and Markov chains theory, we formally introduce our main models. We also formulate various key questions about these models that will be answered (at least partially) later on.
first moment principle,
second moment method,
sub-Gaussian and sub-exponential variables,
random k-SAT threshold,
percolation on trees and lattices (critical value),
Erdos-Renyi graph (containment, connectivity),
stochastic knapsack problem,
In this chapter we look at the moments of a random variable. Specifically we demonstrate that moments capture useful information about the tail of a random variable while often being simpler to compute or at least bound. Several well-known inequalities quantify this intuition. Although they are straightforward to derive, such inequalities are surprisingly powerful. Through a range of applications, we illustrate the utility of controlling the tail of a random variable, typically by allowing one to dismiss certain "bad events" as rare. Two applications in data science are also introduced: sparse recovery and empirical risk minimization.
concentration for martingales,
method of bounded differences,
Markov chains (hitting times, cover times, recurrence),
percolation on trees (critical regime),
Erdos-Renyi graphs (chromatic number),
preferential attachment graphs (degree sequence),
uniform spanning trees (Wilson's method),
In this chapter we turn to martingales, which play a central role in probability theory. We illustrate their use in a number of applications to the analysis of discrete stochastic processes. After some background on stopping times and a brief review of basic martingale properties and results, we develop two major directions. We show how martingales can be used to derive a substantial generalization of our previous concentration inequalities---from the sums of independent random variables we focused on previously to nonlinear functions with Lipschitz properties. In particular, we give several applications of the method of bounded differences to random graphs. We also discuss bandit problems in machine learning. In the second thread, we give an introduction to potential theory and electrical network theory for Markov chains.
harmonic functions on lattices and trees,
Markov chains (mixing time, Glauber dynamics),
Erdos-Renyi graph (degree sequence, Janson's inequality),
percolation on lattices (RSW theory, Harris' theorem),
and Ising model on lattices (extremal measures).
In this chapter we move on to coupling, another probabilistic technique with a wide range of applications (far beyond discrete stochastic processes). The idea behind the coupling method is deceptively simple: to compare two probability measures, it is sometimes useful to construct a joint probability space with the corresponding marginals. We begin by defining coupling formally and deriving its connection to the total variation distance through the coupling inequality. Then we introduce the concept of stochastic domination and some related correlation inequalities. Coupling of Markov chains is the next topic, where it serves as a powerful tool to derive mixing time bounds. Finally, we end with the Chen-Stein method for Poisson approximation, a technique that applies in particular in some natural settings with dependent variables.
Markov chains (mixing time, Glauber dynamics, Varopoulos-Carne),
In this chapter, we develop spectral techniques. We highlight some applications to Markov chain mixing and network analysis. The main tools are the spectral theorem and the variational characterization of eigenvalues, which we review together with some related results. We also give a brief introduction to spectral graph theory and detail an application to community recovery. Then we apply the spectral theorem to reversible Markov chains. In particular we define the spectral gap and establish its close relationship to the mixing time. We also show in that the spectral gap can be bounded using certain isoperimetric properties of the underlying network.
Random-walk representation, duality principle, comparison to branching processes.
percolation on trees (critical exponents) and on Galton-Watson trees,
Erdos-Renyi graph (phase transition),
random binary search trees,
the reconstruction problem.
Branching processes, which are the focus of this chapter, arise naturally in the study of stochastic processes on trees and locally tree-like graphs. Similarly to martingales, finding a hidden branching process within a probabilistic model can lead to useful bounds and insights into asymptotic behavior. After a review of extinction theory for branching processes and of a useful random-walk perspective, we give a couple examples of applications in discrete probability. In particular we analyze the height of a binary search tree, a standard data structure in computer science. We also give an introduction to phylogenetics, where a
"multitype" variant of the Galton-Watson branching process plays an important role. We end this chapter with a detailed look into the phase transition of the Erdos-Renyi graph model.