In Fall 2023, MATH/STAT 833 will cover selected topics from the notes below. All announcements for the course will be made on Piazza, which you can also access through Canvas.

*Note*: If you are currently on the waitlist, you can view the latest announcements
on the public Piazza course page.

The goal of these notes is to give an introduction to
the fundamental models and techniques of
modern discrete probability at the graduate level.
Topics are taken mostly from probability on graphs:
**percolation**,
**random graphs**,
**Markov random fields**,
**random walks on graphs**,
etc.
No attempt is made at covering these areas in depth.
Rather, the emphasis is on developing and illustrating **common and important techniques**.
Various **applications**, in particular in the theoretical foundations of data science and machine learning, are
also discussed along the way.

The notes are aimed at graduate students in mathematics, statistics, computer science, electrical engineering, physics, economics, etc., with previous exposure to basic probability theory (ideally measure-theoretic probability theory; at Wisconsin, Math 733; my own course notes) and stochastic processes (at Wisconsin, Math 632). These notes were developed for a one-semester course at UW-Madison (Fall 2014, Fall 2017, Fall 2020, and Fall 2023). I also gave a version of this course at the TRIPODS Summer School on Fundamentals of Data Analysis. The slides are here.

Slides for background sections are also available below.

Much of the material covered can also be found in the following excellent texts (conveniently available online):

- [vdH] Random Graphs and Complex Networks. Vol. I by van der Hofstad
- [LP] Probability on Trees and Networks by Lyons and Peres
- [LPW] Markov Chains and Mixing Times by Levin, Peres and Wilmer
- [St] A Mini Course on Percolation Theory by Steif
- [Gr] Probability on Graphs by Grimmett
- [Lu] Concentration-of-measure Inequalities by Lugosi
- [Ve] High-dimensional probability: An introduction with applications in data science by Vershynin
- [Du] Random Graph Dynamics by Durrett
- [vH] Probability in High Dimension by van Handel

Full notes in one file (12MB) (updated: mar 6, 2023)

Front matter (table of contents, preface, notation) (updated: mar 6, 2023)

*Chapter 1:* Introduction (updated: mar 6, 2023)

**Review:** basics of graph theory and Markov chain theory. **Models:** percolation, random graphs, Ising model, Glauber dynamics, random walks on networks. **Summary:** In this chapter we describe a few discrete probability models to which we will come back repeatedly throughout the notes. While there exists a vast array of well-studied random combinatorial structures (permutations, partitions, urn models, Boolean functions, polytopes, etc.), our focus is primarily on a limited number of graph-based processes, namely percolation, random graphs, the Ising model, and random walks on graphs. After a brief review of graph basics and Markov chain theory, we formally introduce our main models. We also formulate various key questions about these models that will be answered (at least partially) later on.
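As a quick taste of the models above (not part of the notes themselves), here is a minimal Python sketch of the Erdos-Renyi random graph G(n, p): each of the n(n-1)/2 possible edges is present independently with probability p. The function names and the adjacency-set representation are illustrative choices, not anything prescribed by the notes.

```python
import random
from collections import deque

def erdos_renyi(n, p, seed=0):
    """Sample G(n, p): each of the n*(n-1)/2 edges appears independently w.p. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def is_connected(adj):
    """Breadth-first search from vertex 0; the graph is connected
    iff every vertex is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(adj)
```

Whether G(n, p) is connected is one of the key questions revisited later (the connectivity threshold for the Erdos-Renyi graph appears in Chapter 2).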

*Chapter 2:* Moments and tails (updated: mar 6, 2023)

**Review:** Markov's inequality, Chebyshev's inequality, moment-generating function. **Techniques:** probabilistic method, first moment principle, second moment method, Chernoff-Cramer method, sub-Gaussian and sub-exponential variables, epsilon-nets, chaining. **Examples:** random k-SAT threshold, percolation on trees and lattices (critical value), Erdos-Renyi graph (containment, connectivity), stochastic knapsack problem, Johnson-Lindenstrauss, VC theory. **Summary:** In this chapter we look at the moments of a random variable. Specifically, we demonstrate that moments capture useful information about the tail of a random variable while often being simpler to compute, or at least bound. Several well-known inequalities quantify this intuition. Although they are straightforward to derive, such inequalities are surprisingly powerful. Through a range of applications, we illustrate the utility of controlling the tail of a random variable, typically by allowing one to dismiss certain "bad events" as rare. Two applications in data science are also introduced: sparse recovery and empirical risk minimization.
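To see the Chernoff-Cramer idea in action, the sketch below compares an exact binomial tail probability with the classical Hoeffding-type bound P(S_n - np >= nt) <= exp(-2nt^2) for a sum of n independent Bernoulli(p) variables. This is only a numerical illustration under those standard assumptions, not code from the notes.

```python
import math

def binom_tail(n, p, k):
    """Exact P(S_n >= k) for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def hoeffding_bound(n, t):
    """Chernoff-type bound for bounded increments: P(S_n - np >= nt) <= exp(-2 n t^2)."""
    return math.exp(-2 * n * t * t)

# Deviation of t = 0.1 above the mean for n = 100 fair coin flips.
n, p, t = 100, 0.5, 0.1
exact = binom_tail(n, p, math.ceil(n * (p + t)))
bound = hoeffding_bound(n, t)
```

The exponential decay in n is what makes such bounds so useful: a union bound over polynomially many "bad events" still goes to zero.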

*Chapter 3:* Martingales and potentials (updated: mar 6, 2023)

**Review:** stopping times, martingales. **Techniques:** concentration for martingales, method of bounded differences, Talagrand's inequality, slicing method, electrical networks. **Examples:** Markov chains (hitting times, cover times, recurrence), percolation on trees (critical regime), Erdos-Renyi graphs (chromatic number), preferential attachment graphs (degree sequence), uniform spanning trees (Wilson's method), bandit problems. **Summary:** In this chapter we turn to martingales, which play a central role in probability theory. We illustrate their use in a number of applications to the analysis of discrete stochastic processes. After some background on stopping times and a brief review of basic martingale properties and results, we develop two major threads. First, we show how martingales can be used to derive a substantial generalization of our previous concentration inequalities---from the sums of independent random variables we focused on previously to nonlinear functions with Lipschitz properties. In particular, we give several applications of the method of bounded differences to random graphs. We also discuss bandit problems in machine learning. In the second thread, we give an introduction to potential theory and electrical network theory for Markov chains.
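A simple numerical check of martingale concentration (again, an illustration rather than material from the notes): for simple random walk S_n with +-1 increments, the Azuma-Hoeffding inequality gives P(S_n >= a) <= exp(-a^2 / (2n)), and the exact tail can be computed from the binomial distribution.

```python
import math

def srw_tail(n, a):
    """Exact P(S_n >= a) for simple random walk S_n = sum of n independent +-1 steps.
    S_n >= a iff the number of +1 steps is at least (a + n) / 2."""
    k = math.ceil((a + n) / 2)
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2**n

def azuma_bound(n, a):
    """Azuma-Hoeffding for a martingale with increments bounded by 1:
    P(S_n >= a) <= exp(-a^2 / (2n))."""
    return math.exp(-a * a / (2 * n))
```

The same bound applies far beyond sums: any martingale with bounded increments, such as the Doob martingale behind the method of bounded differences, satisfies it.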

*Chapter 4:* Coupling (updated: mar 6, 2023)

**Review:** coupling inequality, maximal coupling, Markovian coupling. **Techniques:** stochastic domination, Strassen's theorem, correlation inequalities (FKG, Holley), path coupling. **Examples:** harmonic functions on lattices and trees, Markov chains (mixing time, Glauber dynamics), Erdos-Renyi graph (degree sequence, Janson's inequality), percolation on lattices (RSW theory, Harris' theorem), and Ising model on lattices (extremal measures). **Summary:** In this chapter we move on to coupling, another probabilistic technique with a wide range of applications (far beyond discrete stochastic processes). The idea behind the coupling method is deceptively simple: to compare two probability measures, it is sometimes useful to construct a joint probability space with the corresponding marginals. We begin by defining coupling formally and deriving its connection to the total variation distance through the coupling inequality. Then we introduce the concept of stochastic domination and some related correlation inequalities. Coupling of Markov chains is the next topic, where it serves as a powerful tool to derive mixing time bounds. Finally, we end with the Chen-Stein method for Poisson approximation, a technique that applies in particular in some natural settings with dependent variables.
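The coupling inequality states that any coupling (X, Y) of two distributions mu and nu satisfies P(X != Y) >= TV(mu, nu), with equality achieved by a maximal coupling. For finite distributions this is easy to verify numerically, as in the following sketch (dictionaries as probability mass functions are just a convenient illustrative choice):

```python
def tv_distance(p, q):
    """Total variation distance between two pmfs on a common finite set,
    given as dicts: TV(p, q) = (1/2) * sum_x |p(x) - q(x)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def max_agreement(p, q):
    """Largest possible P(X = Y) over all couplings of p and q:
    sum_x min(p(x), q(x)), which equals 1 - TV(p, q)."""
    support = set(p) | set(q)
    return sum(min(p.get(x, 0.0), q.get(x, 0.0)) for x in support)
```

For Markov chains this is exactly how coupling bounds the mixing time: if two copies of the chain can be made to meet quickly, the distance to stationarity must be small.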

*Chapter 5:* Spectral methods (updated: mar 6, 2023)

**Review:** spectral theorem, Courant-Fischer, perturbation, spectral graph theory. **Techniques:** spectral gap, canonical paths, bottleneck ratio, expander graphs. **Examples:** Markov chains (mixing time, Glauber dynamics, Varopoulos-Carne), expander graphs, community detection. **Summary:** In this chapter, we develop spectral techniques. We highlight some applications to Markov chain mixing and network analysis. The main tools are the spectral theorem and the variational characterization of eigenvalues, which we review together with some related results. We also give a brief introduction to spectral graph theory and detail an application to community recovery. Then we apply the spectral theorem to reversible Markov chains. In particular, we define the spectral gap and establish its close relationship to the mixing time. We also show that the spectral gap can be bounded using certain isoperimetric properties of the underlying network.
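A concrete instance of the spectral gap (again purely illustrative): the lazy simple random walk on the n-cycle, which stays put with probability 1/2 and moves to each neighbor with probability 1/4, has the explicit eigenvalues lambda_k = (1 + cos(2*pi*k/n))/2. Its spectral gap is therefore (1 - cos(2*pi/n))/2, of order 1/n^2, which matches the order-n^2 mixing time of the cycle via the relaxation time t_rel = 1/gap.

```python
import math

def lazy_cycle_eigenvalues(n):
    """Eigenvalues of the lazy simple random walk on the n-cycle:
    lambda_k = (1 + cos(2*pi*k/n)) / 2, for k = 0, ..., n-1."""
    return [(1 + math.cos(2 * math.pi * k / n)) / 2 for k in range(n)]

def relaxation_time(n):
    """t_rel = 1 / gap, where gap = 1 - lambda_2 is the spectral gap.
    All eigenvalues of the lazy walk are nonnegative, so lambda_2
    (the second largest) controls the rate of convergence."""
    eigs = sorted(lazy_cycle_eigenvalues(n), reverse=True)
    gap = 1 - eigs[1]
    return 1 / gap
```

Laziness is what guarantees nonnegative eigenvalues here, removing the periodicity issues of the non-lazy walk on an even cycle.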

*Chapter 6:* Branching processes (updated: mar 6, 2023)

**Review:** Galton-Watson processes, extinction, multitype branching processes. **Techniques:** random-walk representation, duality principle, comparison to branching processes. **Examples:** percolation on trees (critical exponents) and on Galton-Watson trees, Erdos-Renyi graph (phase transition), random binary search trees, the reconstruction problem. **Summary:** Branching processes, which are the focus of this chapter, arise naturally in the study of stochastic processes on trees and locally tree-like graphs. Similarly to martingales, finding a hidden branching process within a probabilistic model can lead to useful bounds and insights into asymptotic behavior. After a review of extinction theory for branching processes and of a useful random-walk perspective, we give a couple of examples of applications in discrete probability. In particular, we analyze the height of a binary search tree, a standard data structure in computer science. We also give an introduction to phylogenetics, where a "multitype" variant of the Galton-Watson branching process plays an important role. We end this chapter with a detailed look into the phase transition of the Erdos-Renyi graph model.
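A core fact from extinction theory: the extinction probability q of a Galton-Watson process is the smallest fixed point of the offspring probability generating function f, and it can be computed by iterating q_{k+1} = f(q_k) from q_0 = 0. The sketch below does this for Poisson offspring, the case relevant to the Erdos-Renyi phase transition (illustrative code, not from the notes).

```python
import math

def extinction_prob(pgf, tol=1e-12, max_iter=10_000):
    """Extinction probability of a Galton-Watson process: the smallest fixed
    point of the offspring pgf f in [0, 1], via the iteration q <- f(q) from 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            break
        q = q_next
    return q

# Poisson(mu) offspring has pgf f(s) = exp(mu * (s - 1));
# extinction is certain iff the mean mu is at most 1.
poisson_pgf = lambda s, mu=1.5: math.exp(mu * (s - 1))
q = extinction_prob(poisson_pgf)
```

For mu = 1.5 the process is supercritical, so q lies strictly between 0 and 1; for mu <= 1 the iteration converges to 1 instead.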

Back matter (appendices, bibliography, index) (updated: mar 6, 2023)

Slides covering mostly background material (and more) in each chapter:

- Slides for Chapter 1 (Part 1), Slides for Chapter 1 (Part 2) (last updated: sep 25, 2020)
- Slides for Chapter 3 (last updated: jan 2, 2015)
- Slides for Chapter 4 (last updated: nov 29, 2014)
- Slides 1/review 1 (spectral theorem and spectral graph theory), slides 2/review 2 (matrix perturbation results), slides 3 (stochastic blockmodel and community detection), slides 4 (spectral techniques for Markov chains), for Chapter 5 (last updated: nov 30, 2020)
- Slides for Chapter 6 (last updated: nov 15, 2014)

Last updated: aug 29, 2023.