Covered topics for Math 431
Disclaimer: although the plan is to have a fairly
detailed list of the covered topics, the list below might not cover
everything that we discussed in class.
Week 1
Lecture 1, 01/22. Introduction and definition of probability
space (based on section 1.1)
- Syllabus
- Introduction
-
Outcomes, Sample space.
-
Examples: coin flip, two coin flips, die rolls, waiting for the bus.
-
What is an event?
-
Probability space (sample space, collection of events, probability
measure).
-
Axioms of probability (properties of the probability measure).
Lecture 2, 01/24. Constructions of probability spaces (based
on sections 1.1, 1.2)
-
Direct product of sets.
-
Experiments with equally likely outcomes.
-
Uniformly chosen point from an interval or a 2D shape.
-
Sampling (with replacement, without replacement, group sampling).
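-
Illustration (not part of the course material): a short Python sketch of the sampling constructions above, drawing ordered samples with and without replacement; the population {1, ..., 10} and the sample size 3 are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    population = np.arange(1, 11)    # the numbers 1, ..., 10

    # Ordered sample of size 3 with replacement: 10^3 equally likely outcomes.
    with_repl = rng.choice(population, size=3, replace=True)

    # Ordered sample of size 3 without replacement: 10 * 9 * 8 equally likely outcomes.
    without_repl = rng.choice(population, size=3, replace=False)

    print(with_repl, without_repl)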
Week 2
Lecture 3, 01/27. Counting and geometric distribution (based
on sections 1.2, 1.3)
-
Sampling examples.
-
Discrete probability space with infinitely many outcomes.
-
The probability of eventually getting heads in a sequence of coin flips
is 1.
-
Review of geometric series.
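-
Illustration (not part of the course material): a numerical check of the geometric series computation above. The probability that the first heads of a fair coin appears on flip $k$ is $(1/2)^{k}$, and these probabilities sum to 1.

    # Partial sums of the geometric series sum_{k >= 1} (1/2)^k approach 1.
    total = sum((1 / 2) ** k for k in range(1, 61))
    print(total)        # very close to 1

    # Closed form: sum_{k >= 1} r^k = r / (1 - r) for |r| < 1.
    r = 1 / 2
    print(r / (1 - r))  # exactly 1.0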
Lecture 4, 01/29. Computing probability by decomposing (based
on section 1.4)
-
Decomposing an event as the disjoint union of simpler events: the
probability of getting heads in an even number of coin flips.
-
$\mathsf{\mathrm{P}}\left. \left( A^{\complement} \right) \right. = 1 - \mathsf{\mathrm{P}}\left. (A) \right.$.
-
If $B \subseteq A$ then
$\mathsf{\mathrm{P}}\left. (B) \right. \leq \mathsf{\mathrm{P}}\left. (A) \right.$.
-
How to prove
$P(\text{eventually we will get heads with a fair coin}) = 1$ using
the monotonicity property of the probability measure.
-
Inclusion-exclusion to compute the probability of union of two events:
$\mathsf{\mathrm{P}}\left. (A \cup B) \right. = \mathsf{\mathrm{P}}\left. (A) \right. + \mathsf{\mathrm{P}}\left. (B) \right. - \mathsf{\mathrm{P}}\left. (AB) \right.$.
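-
Illustration (not part of the course material; the example is made up): a numerical check of the two-event inclusion-exclusion formula for a fair die roll, with $A$ = "even outcome" and $B$ = "outcome at most 3".

    from fractions import Fraction

    outcomes = range(1, 7)                    # fair six-sided die
    A = {k for k in outcomes if k % 2 == 0}   # even outcome
    B = {k for k in outcomes if k <= 3}       # outcome at most 3

    P = lambda E: Fraction(len(E), 6)         # equally likely outcomes
    lhs = P(A | B)
    rhs = P(A) + P(B) - P(A & B)              # inclusion-exclusion
    print(lhs, rhs, lhs == rhs)               # 5/6 on both sides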
Lecture 5, 01/31. Inclusion--Exclusion and Random variables
(based on sections 1.4, 1.5)
-
Inclusion-exclusion formula for three and $n$ events.
-
Inclusion-exclusion examples.
-
Definition of a random variable.
-
Distribution of a random variable.
-
Discrete random variable, probability mass function.
-
Example of continuous random variable.
Week 3
Lecture 6, 02/03. Conditional Probability (based on section 2.1)
-
Conditional probability of $A$ given $B$ (with
$\mathsf{\mathrm{P}}\left.(B) \right. > 0$):
\[
\mathsf{\mathrm{P}}\left( A\,~|~\, B \right) = \frac{\mathsf{\mathrm{P}}\left. (AB) \right.}{\mathsf{\mathrm{P}}\left. (B) \right.}.\]
- $\mathsf{\mathrm{P}}\left. \left( \, \cdot |\, B \right) \right.$ is
a probability measure for any fixed $B$ with
$\mathsf{\mathrm{P}}(B) > 0$.
-
The multiplication rule for conditional probabilities
\[\begin{aligned}
\mathsf{\mathrm{P}}(AB) & = \mathsf{\mathrm{P}}\left( A \mid B \right)\mathsf{\mathrm{P}}(B), \\
\mathsf{\mathrm{P}}\left( A_{1}A_{2}\ldots A_{n} \right) & = \mathsf{\mathrm{P}}\left( A_{1} \right)\mathsf{\mathrm{P}}\left( A_{2} \mid A_{1} \right)\mathsf{\mathrm{P}}\left( A_{3} \mid A_{1}A_{2} \right)\cdots\mathsf{\mathrm{P}}\left( A_{n} \mid A_{1}A_{2}\ldots A_{n - 1} \right).
\end{aligned}\]
-
Law of total probability for two sets:
$\mathsf{\mathrm{P}}(A) = \mathsf{\mathrm{P}}(B)\mathsf{\mathrm{P}}\left( A \mid B \right) + \mathsf{\mathrm{P}}\left( B^{\complement} \right)\mathsf{\mathrm{P}}\left( A \mid B^{\complement} \right).$
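-
Illustration (not part of the course material; the example is made up): a short sketch checking the definition of conditional probability and the two-set law of total probability when the outcome is a uniformly chosen integer from 1 to 20.

    from fractions import Fraction

    omega = range(1, 21)                      # uniform on {1, ..., 20}
    P = lambda E: Fraction(len(E), 20)

    A = {k for k in omega if k % 3 == 0}      # divisible by 3
    B = {k for k in omega if k <= 10}         # at most 10
    Bc = set(omega) - B

    P_A_given_B = P(A & B) / P(B)             # definition of conditional probability
    P_A_given_Bc = P(A & Bc) / P(Bc)

    # Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
    print(P(A), P_A_given_B * P(B) + P_A_given_Bc * P(Bc))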
Lecture 7, 02/05. Bayes' formula (based on section 2.2)
-
Definition of a partition.
-
If $B_{1}$, $\ldots$, $B_{n}$ is a partition with
$\mathsf{\mathrm{P}}\left. \left( B_{i} \right) \right. > 0$ then we
have the law of total probability for partitions
\[\mathsf{\mathrm{P}}\left. (A) \right. = \sum_{i = 1}^{n}\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B_{i} \right) \right.\mathsf{\mathrm{P}}\left. \left( B_{i} \right) \right.\]
-
Bayes' formula:
\[\mathsf{\mathrm{P}}\left. \left( B|A \right) \right. = \frac{\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B \right) \right.\mathsf{\mathrm{P}}\left. (B) \right.}{\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B \right) \right.\mathsf{\mathrm{P}}\left. (B) \right. + \mathsf{\mathrm{P}}\left. \left( A\,~|~\, B^{\complement} \right) \right.\mathsf{\mathrm{P}}\left. \left( B^{\complement} \right) \right.},\]
and similarly with a partition $B_{1}$, $\ldots$, $B_{n}$.
-
Applications of Bayes' formula.
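-
Illustration (not part of the course material; all numbers below are invented for the example): Bayes' formula in code for a hypothetical test that detects a condition with probability 0.95, gives a false positive with probability 0.02, where the condition has prevalence 0.01.

    # Hypothetical numbers, chosen only to illustrate Bayes' formula.
    p_B = 0.01              # P(B): prior probability of the condition
    p_A_given_B = 0.95      # P(A | B): positive test given the condition
    p_A_given_Bc = 0.02     # P(A | B^c): false positive probability

    # The denominator is the law of total probability for P(A).
    p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)
    p_B_given_A = p_A_given_B * p_B / p_A
    print(p_B_given_A)      # roughly 0.32: a positive test is far from conclusive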
Lecture 8, 02/07. Independence (based on section 2.3)
-
Independence of two events $A$ and $B$:
$\mathsf{\mathrm{P}}\left. (AB) \right. = \mathsf{\mathrm{P}}\left. (A) \right.\mathsf{\mathrm{P}}\left. (B) \right.$.
-
If $A$ and $B$ are independent then the same is true for (i) $A$
and $B^{\complement}$ (ii) $A^{\complement}$ and $B$ (iii)
$A^{\complement}$ and $B^{\complement}$.
-
Independence of $n$ events $A_{1}$, $\ldots$, $A_{n}$:
\[\mathsf{\mathrm{P}}\left( A_{i_{1}}\ldots A_{i_{k}} \right) = \mathsf{\mathrm{P}}\left( A_{i_{1}} \right) \cdot \ldots \cdot \mathsf{\mathrm{P}}\left( A_{i_{k}} \right)\]
for all indices $1 \leq i_{1} < i_{2} < \ldots < i_{k} \leq n$.
-
How to compute probabilities of combinations of independent events.
-
Example: sampling with and without replacement.
-
Independence of events constructed from independent events e.g.: if
$A$, $B$ and $C$ are independent then $A \cup B$ and
$C^{\complement}$ are independent.
Week 4
Lecture 9, 02/10. Repeated Independent Trials
(based on 2.4)
-
Independence of the random variables $X_{1}$, $\ldots$, $X_{n}$:
for any collection of sets $B_{1},\ldots,B_{n} \subset \mathbb{R}$:
\[\mathsf{\mathrm{P}}\left. \left( X_{1} \in B_{1},\ldots,X_{n} \in B_{n} \right) \right. = \mathsf{\mathrm{P}}\left. \left( X_{1} \in B_{1} \right) \right. \cdot \ldots \cdot \mathsf{\mathrm{P}}\left. \left( X_{n} \in B_{n} \right) \right..\]
-
Equivalent definition for the independence of discrete random
variables $X_{1}$, $\ldots$, $X_{n}$:
\[\mathsf{\mathrm{P}}\left( X_{1} = k_{1},\ldots,X_{n} = k_{n} \right) = \mathsf{\mathrm{P}}\left( X_{1} = k_{1} \right) \cdot \ldots \cdot \mathsf{\mathrm{P}}\left( X_{n} = k_{n} \right)\]
for all possible values $k_{1}$, $\ldots$, $k_{n}$.
-
The probability space of repeated independent trials with the same
success probability. Named distributions constructed from a sequence
of independent trials with the same success probability:
-
Bernoulli with parameter $p$ (the outcome of a single trial).
-
Binomial with parameters $n$ and $p$ (the number of successes
out of $n$ trials).
-
Geometric distribution with parameter $p$ (the number of trials
needed for the first success).
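-
Illustration (not part of the course material; the parameters are arbitrary): a simulation sketch showing how the Bernoulli, binomial and geometric distributions arise from repeated independent trials with the same success probability.

    import numpy as np

    rng = np.random.default_rng(1)
    p, n, reps = 0.3, 10, 100_000

    trials = rng.random((reps, n)) < p    # each row: n independent Bernoulli(p) trials
    successes = trials.sum(axis=1)        # Binomial(n, p) count for each row
    print(successes.mean(), n * p)        # empirical mean vs. n*p

    # Geometric(p): the number of trials needed for the first success.
    geom = rng.geometric(p, size=reps)
    print(geom.mean(), 1 / p)             # empirical mean vs. 1/p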
Lecture 10, 02/12. Hypergeometric distribution and
constructions from independent random variables (based on section 2.5)
-
Hypergeometric with parameters $N$, $N_{A}$, $n$ (the number of
type A items in a sample of size $n$ drawn without replacement).
-
Independence of random variables which are functions of independent
variables.
-
Examples with different distributions.
Lecture 11, 02/14. Conditional Independence (based on section
2.5)
-
Conditional independence of events.
-
Conditional independence vs. independence.
-
The birthday problem: exact formula, and how to estimate it using the
fact that $e^{- x} \approx 1 - x$ for small $x$.
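-
Illustration (not part of the course material): the birthday problem in code, comparing the exact probability that $k$ people all have distinct birthdays with the estimate $\exp\left( - \binom{k}{2}/365 \right)$ obtained from $1 - x \approx e^{- x}$.

    import math

    def p_all_distinct(k, days=365):
        # Exact: product of (1 - i/days) for i = 0, ..., k-1.
        prob = 1.0
        for i in range(k):
            prob *= (days - i) / days
        return prob

    for k in (10, 23, 40):
        exact = p_all_distinct(k)
        approx = math.exp(-k * (k - 1) / (2 * 365))   # uses 1 - x ~ e^{-x}
        print(k, round(exact, 4), round(approx, 4))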
Week 5
Lecture 12, 02/17. Continuous distributions and probability
density function (based on section 3.1)
-
Continuous distributions: the definition of the probability density
function.
-
Basic properties of the probability density function, how to identify
a p.d.f.
-
How to compute probabilities using a p.d.f.
-
The uniform distribution on $\lbrack a,b\rbrack$.
-
How the value of the probability density function can be connected to
the probability of the random variable being in a small interval.
-
Geometric example (sample point from 2D shape).
Lecture 13, 02/19. Cumulative distribution function (based on
section 3.2)
-
The cumulative distribution function of a random variable, definition,
basic properties.
-
What the CDF of a discrete and of a continuous random variable looks
like.
-
How to identify the probability mass function from the CDF and vice
versa.
-
How to identify the CDF of a continuous random variable, and how to
compute its p.d.f.~from the CDF.
Lecture 14, 02/21. Expectation I (based on section 3.3)
-
The expected value of a random variable as the weighted average of
possible values.
-
Computing the expected value for discrete and continuous random
variables.
-
Computing the expected value for uniform and Bernoulli random
variables.
-
Computing the expected value for a geometric random variable.
-
Random variables with infinite or undefined expectation
-
Expectation of a function of a random variable: computing
$\operatorname{E}\left. \left\lbrack g\left. (X) \right. \right\rbrack \right.$
for discrete and continuous r.v. $X$.
-
The $n^{\text{th}}$ moment of a random variable.
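-
Illustration (not part of the course material; the parameters are arbitrary): a numerical check of the expectation formulas, computing $\operatorname{E}\lbrack X\rbrack$ from the p.m.f. of a geometric random variable and from the p.d.f. of a uniform random variable.

    import numpy as np

    # Discrete case: E[X] = sum_k k * P(X = k) for X ~ Geom(p), truncating the series.
    p = 0.25
    ks = np.arange(1, 2001)
    pmf = p * (1 - p) ** (ks - 1)
    print((ks * pmf).sum(), 1 / p)            # both approximately 4

    # Continuous case: E[X] = integral of x * f(x) dx for X ~ Unif[a, b] (midpoint sum).
    a, b = 2.0, 5.0
    m = 10_000
    dx = (b - a) / m
    xs = a + dx * (np.arange(m) + 0.5)
    print((xs * (1 / (b - a)) * dx).sum(), (a + b) / 2)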
Week 6
Lecture 15, 02/24. Expectation II (based on 3.3, 8.1, 8.2)
-
How the expectation changes under a linear function:
$\operatorname{E}\lbrack a\rbrack = a$,
$\operatorname{E}\lbrack aX\rbrack = a\operatorname{E}\lbrack X\rbrack$,
$\operatorname{E}\left. \lbrack aX + b\rbrack \right. = a\operatorname{E}\left. \lbrack X\rbrack \right. + b$.
-
Linearity of expectation: if $X_{1}$, $\ldots$, $X_{n}$ are
random variables then
$\operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. + \ldots + g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right. = \operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \right\rbrack \right. + \ldots + \operatorname{E}\left. \left\lbrack g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right.$.
(If all expectations are well-defined.) In particular
$\operatorname{E}\left. \left\lbrack X_{1} + \ldots + X_{n} \right\rbrack \right. = \operatorname{E}\left. \left\lbrack X_{1} \right\rbrack \right. + \ldots + E\left. \left\lbrack X_{n} \right\rbrack \right.$.
-
Expected value of an indicator random variable.
-
Computing the expected value of a binomial random variable.
-
The indicator method: if the random variable $X$ is nonnegative
integer valued then often it can be represented as the sum of
indicator random variables (i.e. random variables that are 0 or 1).
Then the expected value of $X$ is just the sum of the expectations
of the indicators. Since the expectation of an indicator is just the
probability that it is equal to 1, this method can lead to simpler
computations than going through the original definition of the
expectation (using the p.m.f.).
-
Examples for the indicator method.
-
Expectation and independence: if $X_{1}$, $\ldots$, $X_{n}$ are
independent then
$\operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \cdot \ldots \cdot g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right. = \operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \right\rbrack \right. \cdot \ldots \cdot \operatorname{E}\left. \left\lbrack g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right.$.
(If all expectations are well-defined.) In particular,
$\operatorname{E}\left. \left\lbrack X_{1}X_{2} \cdot \ldots \cdot X_{n} \right\rbrack \right. = \operatorname{E}\left. \left\lbrack X_{1} \right\rbrack \right. \cdot \ldots \cdot \operatorname{E}\left. \left\lbrack X_{n} \right\rbrack \right.$.
(Independence is crucial here; the statement does not hold in general
without it!)
-
Median
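-
Illustration (not part of the course material; the parameters are arbitrary): a sketch of the indicator method described above. Writing a $\operatorname{Bin}(n,p)$ count as a sum of indicators of success on each trial, linearity of expectation gives $\operatorname{E}\left\lbrack S_{n} \right\rbrack = np$; the simulation checks this.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, reps = 12, 0.4, 200_000

    trials = rng.random((reps, n)) < p    # column j is the indicator of success on trial j
    # Each indicator has expectation P(success) = p, so the sum has expectation n*p.
    print(trials.mean(axis=0))            # each entry is close to p
    print(trials.sum(axis=1).mean(), n * p)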
Lecture 16, 02/26. Variance (based on 3.4, 8.2)
-
The variance of a random variable:
$\operatorname{Var}(X) = \operatorname{E}\left\lbrack \left( X - \operatorname{E}X \right)^{2} \right\rbrack$.
-
Another way to compute the variance:
$\operatorname{Var}(X) = \operatorname{E}\left\lbrack X^{2} \right\rbrack{- \left( \operatorname{E}\lbrack X\rbrack \right)}^{2}$.
-
Variance of a Bernoulli random variable.
-
Variance of a uniform random variable.
-
How the expectation and the variance change under a linear function:
$\operatorname{E}\lbrack aX + b\rbrack = a\operatorname{E}\lbrack X\rbrack + b$,
$\operatorname{Var}(aX + b) = a^{2}\operatorname{Var}(X)$.
-
Variance of a binomial random variable.
-
If $X_{1}$, $\ldots$, $X_{n}$ are independent with finite
variances then
$\operatorname{Var}\left. \left( X_{1} + \ldots + X_{n} \right) \right. = \operatorname{Var}\left. \left( X_{1} \right) \right. + \ldots + \operatorname{Var}\left. \left( X_{n} \right) \right.$.
Lecture 17, 02/28. Normal Distribution (based on 3.5)
-
The standard normal (or Gaussian) distribution (the p.d.f. $\varphi$
and CDF $\Phi$, expectation and variance).
-
The symmetry of the standard normal:
$\Phi\left. ( - x) \right. = 1 - \Phi\left. (x) \right.$.
-
How to compute probabilities involving a standard normal random
variable using the table in the Appendix.
-
The normal distribution with parameters $\mu$ and
$\sigma^{2} > 0$.
-
If $Z \sim \mathsf{\mathrm{N}}\left. (0,1) \right.$ then
$\sigma Z + \mu \sim \mathsf{\mathrm{N}}\left. \left( \mu,\sigma^{2} \right) \right.$.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu,\sigma^{2} \right) \right.$
then
$\frac{X - \mu}{\sigma} \sim \mathsf{\mathrm{N}}\left. (0,1) \right.$.
-
Expressing probabilities involving a general normal random variable in
terms of $\Phi$.
-
Why the normal distribution appears everywhere.
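-
Illustration (not part of the course material; the parameters are arbitrary): computing a probability for a general normal random variable by standardizing, $\mathsf{\mathrm{P}}(a \leq X \leq b) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right)$, with scipy's norm.cdf playing the role of the table of $\Phi$.

    from scipy.stats import norm

    mu, sigma = 10.0, 2.0     # X ~ N(mu, sigma^2); arbitrary example values
    a, b = 9.0, 13.0

    # Standardize: (X - mu) / sigma ~ N(0, 1).
    prob = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
    print(prob)
    # The same probability using the location-scale parameters directly.
    print(norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma))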
Week 7
Lecture 18, 03/03. CLT for binomial random variable (based on
section 4.1)
-
The Central Limit Theorem for binomial random variables: if
$0 < p < 1$ is fixed and
$S_{n} \sim \operatorname{Bin}\left. (n,p) \right.$ then for any
$a < b$ we have
\[\lim\limits_{n \rightarrow \infty}\mathsf{\mathrm{P}}\left. \left( a \leq \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. = \Phi\left. (b) \right. - \Phi\left. (a) \right..\]
-
In words: if we center a binomial random variable at its mean and
divide it by its standard deviation, then the scaled random variable
gets closer and closer to a standard normal distribution as
$n \rightarrow \infty$.
-
Practical use: if $np\left. (1 - p) \right. > 10$ and
$S_{n} \sim \operatorname{Bin}(n,p)$ then any probability involving
$\frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}}$ can be
approximated with the same probability involving a standard normal.
E.g.
\[\mathsf{\mathrm{P}}\left. \left( a \leq \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. \approx \Phi\left. (b) \right. - \Phi\left. (a) \right.,\quad\quad\mathsf{\mathrm{P}}\left. \left( \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. \approx \Phi\left. (b) \right..\]
-
Continuity correction: if $np\left. (1 - p) \right. > 10$ and
$S_{n} \sim \operatorname{Bin}(n,p)$ and we want to estimate a
probability of the form
$\mathsf{\mathrm{P}}\left. \left( k_{1} \leq S_{n} \leq k_{2} \right) \right.$
with $k_{1}$, $k_{2}$ integers, then it often helps to rewrite the
probability as
\[\mathsf{\mathrm{P}}\left. \left( k_{1} \leq S_{n} \leq k_{2} \right) \right. = \mathsf{\mathrm{P}}\left. \left( k_{1} - \frac{1}{2} \leq S_{n} \leq k_{2} + \frac{1}{2} \right) \right.\]
and use the normal approximation for the modified endpoints.
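-
Illustration (not part of the course material; the parameters are arbitrary): comparing the exact binomial probability with the normal approximation, with and without the continuity correction.

    import math
    from scipy.stats import binom, norm

    n, p = 100, 0.3
    k1, k2 = 25, 35
    mu, sd = n * p, math.sqrt(n * p * (1 - p))

    exact = binom.cdf(k2, n, p) - binom.cdf(k1 - 1, n, p)   # P(k1 <= S_n <= k2)
    plain = norm.cdf((k2 - mu) / sd) - norm.cdf((k1 - mu) / sd)
    corrected = norm.cdf((k2 + 0.5 - mu) / sd) - norm.cdf((k1 - 0.5 - mu) / sd)
    print(exact, plain, corrected)   # the corrected value is typically closer to exact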
Lecture 19, 03/05. Law of large numbers and applications of
CLT (based on sections 4.2, 4.3)
-
The weak law of large numbers for binomial random variables. Fix
$0 < p < 1$ and $\varepsilon > 0$. If
$S_{n} \sim \operatorname{Bin}(n,p)$ then
\[\lim\limits_{n \rightarrow \infty}\mathsf{\mathrm{P}}\left. \left( \left| \frac{S_{n}}{n} - p \right| < \varepsilon \right) \right. = 1.\]
-
Let $S_{n} \sim \operatorname{Bin}\left. (n,p) \right.$ denote the
number of successes in a sequence of independent trials with an
unknown success probability $p$. Then a natural estimate for $p$
is ${\hat{p}}_{n} = \frac{S_{n}}{n}$ (the frequency of successes)
and we can estimate the error using the normal approximation as
\[\begin{aligned}
\mathsf{\mathrm{P}}\left. \left( \left| {\hat{p}}_{n} - p \right| \leq \varepsilon \right) \right. & \geq 2\Phi\left. \left( 2\varepsilon\sqrt{n} \right) \right. - 1.
\end{aligned}\]
-
The definition of a confidence interval corresponding to a certain
percentage.
-
Application to polling: for a large population, sampling without
replacement (which would give a hypergeometric distribution) is close
to sampling with replacement (which gives a binomial distribution),
and thus we can apply the normal approximation.
Lecture 20, 03/07. Poisson approximation (based on section
4.4)
-
The Poisson$\left. (\lambda) \right.$ distribution: probability mass
function.
-
If $\lambda > 0$ is fixed then the p.m.f.~of
$\operatorname{Bin}\left. (n,\lambda/n) \right.$ converges to the
p.m.f.~of a $\operatorname{Poisson}\left. (\lambda) \right.$
distribution.
-
Mean and variance of the Poisson distribution.
-
The Poisson approximation of the binomial distribution: if
$np^{2} < 0.2$, then the $\operatorname{Bin}(n,p)$
distribution can be approximated by the
$\operatorname{Poisson}(np)$ distribution.
-
The Poisson distribution as a model for counting rare events.
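-
Illustration (not part of the course material; the parameters are arbitrary): comparing the $\operatorname{Bin}(n,p)$ p.m.f. with the $\operatorname{Poisson}(np)$ p.m.f. for large $n$ and small $p$.

    from scipy.stats import binom, poisson

    n, p = 500, 0.01        # n*p^2 = 0.05 < 0.2, so the approximation should be good
    lam = n * p

    for k in range(11):
        print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))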
Week 8
Lecture 21, 03/10. Overview of Poisson process (based on
section 4.6)
-
A quick overview of the Poisson process. (Not needed for the exams!)
-
The gamma distribution. (Not needed for the exams!)
-
The exponential distribution with parameter $\lambda > 0$: CDF,
p.d.f.
Lecture 22, 03/12. Exponential distribution and Moment
generating function (based on sections 4.5, 5.1)
-
The expected value and variance of the exponential.
-
$T \sim \operatorname{Exp}(\lambda)$ and $a > 0$ $\Longrightarrow$ $aT \sim \operatorname{Exp}(\lambda/a)$.
-
The memoryless property of the exponential.
-
The memoryless property of the geometric distribution.
-
The exponential distribution with parameter $\lambda > 0$ as the
limit of $\frac{T_{n}}{n}$ where
$T_{n} \sim \operatorname{Geom}(\lambda/n)$ (waiting for a rare
event in continuous time).
-
The moment generating function of the random variable $X$:
$M_{X}\left. (t) \right. = \operatorname{E}\left. \left( e^{tX} \right) \right.$.
-
Identifying the p.m.f. of a discrete random variable from the moment
generating function.
-
Computing the moment generating function of various random variables.
Lecture 23, 03/14. Moment generating function and
distributions (based on section 5.1, 8.3)
-
Computing the moment generating function of various random variables.
-
Computing the moments of a random variable using the moment generating
function:
\[\operatorname{E}\left. \left( X^{n} \right) \right. = \left. \frac{d^{n}}{dt^{n}}M(t) \right|_{t = 0}\]
if the moment generating function is finite in a neighborhood of 0.
-
High moments of exponential and Gaussian distributions.
-
The moment generating function identifies the distribution of the
random variable (if it is finite in a neighborhood of 0).
-
If $X$, $Y$ are independent then the moment generating function of
$X + Y$ is the product of the moment generating functions of $X$
and $Y$:
$M_{X + Y}\left. (t) \right. = M_{X}\left. (t) \right.M_{Y}\left. (t) \right.$.
-
Sum of two independent Poisson random variables is Poisson.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1},\sigma_{1}^{2} \right) \right.$
and
$Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{2},\sigma_{2}^{2} \right) \right.$
are independent then
$X + Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1} + \mu_{2},\sigma_{1}^{2} + \sigma_{2}^{2} \right) \right.$.
Week 9
Lecture 24, 03/17. Function of random variable (based on
section 5.2)
-
Computing the p.m.f. of $g\left. (X) \right.$ if $X$ is discrete.
-
Computing the p.d.f. of $Y = g\left. (X) \right.$ using the CDF
method, assuming $X$ has a density function and $g$ has a nonzero
derivative apart from possibly finitely many points.
-
Outline:
-
Write down the p.d.f. and CDF of $X$ (if possible),
-
Identify the support of $Y$ (if possible),
-
Compute the CDF of $Y$ by rewriting
$\mathsf{\mathrm{P}}\left. (Y \leq y) \right. = \mathsf{\mathrm{P}}\left. \left( g\left. (X) \right. \leq y \right) \right.$
in terms of $X$ and the CDF of $X$. (You will need to solve the
inequality $g\left. (X) \right. \leq y$ for $X$.)
-
Differentiate the CDF of $Y$ to get the p.d.f. (Be careful with
piecewise defined functions!)
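-
Illustration (not part of the course material; the example is made up): the CDF method in a concrete case. If $X \sim \operatorname{Unif}\lbrack 0,1\rbrack$ and $Y = X^{2}$, the method gives $F_{Y}(y) = \mathsf{\mathrm{P}}\left( X^{2} \leq y \right) = \sqrt{y}$ and $f_{Y}(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$; the simulation below checks the CDF.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.random(200_000)    # X ~ Unif[0, 1]
    y = x ** 2                 # Y = g(X) = X^2, supported on (0, 1)

    for t in (0.1, 0.25, 0.5, 0.9):
        empirical = (y <= t).mean()               # Monte Carlo estimate of F_Y(t)
        print(t, round(empirical, 4), round(np.sqrt(t), 4))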
Lecture 25, 03/19. Joint distribution of discrete random
variables (based on section 6.1)
-
Random vectors.
-
Joint distribution of random variables.
-
The joint probability mass function of random variables.
-
How to use the joint pmf to compute various probabilities about random
variables.
-
How to compute the marginal pmf from the joint pmf.
-
How to check for independence of discrete random variables from the
joint pmf.
-
How to compute the expectation of a function of several discrete
random variables.
-
The multinomial distribution.
Lecture 26, 03/21. Joint continuous distribution of random
variables (based on section 6.2)
-
Jointly continuous random variables, the joint probability density
function.
-
The joint cumulative distribution function. Connection to the joint
pdf for jointly continuous random variables.
-
How to compute a probability involving jointly continuous random
variables using the joint pdf.
-
How to compute the expectation of a function of several jointly
continuous random variables.
-
How to compute the marginal density function from the joint pdf.
-
The uniform distribution on 2 or 3 dimensional regions.
Week 10
Lecture 27, 03/31. Joint distribution and independence (based on 6.3)
-
How to check for independence of discrete random variables from the
joint pmf.
-
How to check for independence of jointly continuous random variables.
-
Finding the minimum of two independent exponential (or geometric)
random variables.
-
Zero probability events for jointly continuous random variables.
-
Example for two continuous random variables that are not jointly
continuous.
Lecture 28, 04/02. Sum of independent random variables I
(based on 7.1)
-
Sum of two independent discrete or continuous random variables.
-
If $X$, $Y$ are independent and discrete then
\[\mathsf{\mathrm{P}}\left. (X + Y = n) \right. = \sum_{k}\mathsf{\mathrm{P}}\left. (X = k) \right.\mathsf{\mathrm{P}}\left. (Y = n - k) \right..\]
-
Sum of independent geometric random variables with the same success
probability: the negative binomial distribution.
-
Sum of independent Poisson random variables is Poisson.
-
Sum of independent Bernoulli with the same success probability is
binomial.
-
Sum of independent binomials with the same success probability is also
binomial.
-
If $X$, $Y$ are independent and continuous then the p.d.f. of
$X + Y$ is given by
\[f_{X + Y}\left. (z) \right. = \int_{- \infty}^{\infty}f_{X}\left. (x) \right.f_{Y}\left. (z - x) \right.dx.\]
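-
Illustration (not part of the course material; the example is made up): for two independent $\operatorname{Unif}\lbrack 0,1\rbrack$ random variables the convolution formula gives the triangular density $f_{X + Y}(z) = z$ for $0 \leq z \leq 1$ and $2 - z$ for $1 \leq z \leq 2$; below the convolution integral is evaluated numerically.

    import numpy as np

    def f_unif(x):
        # Density of Unif[0, 1].
        return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

    def f_sum(z, m=20_000):
        # Midpoint-rule approximation of the integral of f_X(x) * f_Y(z - x) dx over [0, 1].
        dx = 1.0 / m
        xs = dx * (np.arange(m) + 0.5)
        return (f_unif(xs) * f_unif(z - xs) * dx).sum()

    for z in (0.25, 0.5, 1.0, 1.5, 1.75):
        exact = z if z <= 1 else 2 - z            # triangular density
        print(z, round(f_sum(z), 4), exact)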
Lecture 29, 04/04. Sum of independent random variables II
(based on 7.1)
- Sum of two independent discrete or continuous random variables.
-
The sum of two independent uniform random variables.
-
The sum of two independent exponentials with the same parameter.
-
The gamma distribution.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1},\sigma_{1}^{2} \right) \right.$
and
$Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{2},\sigma_{2}^{2} \right) \right.$
are independent then
$X + Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1} + \mu_{2},\sigma_{1}^{2} + \sigma_{2}^{2} \right) \right.$.
-
A linear combination of independent normals is also normal.
Week 11
Lecture 30, 04/07. Exchangeability (based on 7.2)
- Exchangeable random variables: definition, properties
-
If $X_{1}$, $\ldots$, $X_{k}$ are independent and identically
distributed (i.i.d.) then they are exchangeable.
-
If $X_{1}$, $\ldots$, $X_{k}$ is a sample without replacement
from the set $\left\{ 1,2,\ldots,n \right\}$ then the random
variables are exchangeable.
-
Applications of exchangeability.
-
Indicator random variable.
-
The indicator method: if the random variable $X$ is nonnegative
integer valued then often it can be represented as the sum of
indicator random variables (i.e. random variables that are 0 or 1).
Then the expected value of $X$ is just the sum of the expectations
of the indicators. Since the expectation of an indicator is just the
probability that it is equal to 1, this method can lead to simpler
computations than going through the original definition of the
expectation (using the p.m.f.).
-
The expected value of the number of aces in a randomly chosen hand of
5 cards.
-
The expected value of a hypergeometric.
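-
Illustration (not part of the course material): a simulation sketch of the indicator computation above. By exchangeability each of the 5 positions in the hand holds an ace with probability 4/52, so the expected number of aces is $5 \cdot \frac{4}{52} \approx 0.385$.

    import numpy as np

    rng = np.random.default_rng(4)
    reps, hand_size = 50_000, 5

    deck = np.zeros(52, dtype=int)
    deck[:4] = 1                      # mark the 4 aces with a 1

    counts = np.empty(reps)
    for i in range(reps):
        hand = rng.choice(deck, size=hand_size, replace=False)
        counts[i] = hand.sum()        # number of aces in the hand

    print(counts.mean(), 5 * 4 / 52)  # both close to about 0.385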
Lecture 31, 04/09. The Indicator method (based on 8.1)
-
Examples for the indicator method.
-
The expected value of a negative hypergeometric.
-
The coupon collector's problem.
Lecture 32, 04/11. Covariance and correlation I (based on
8.4)
-
The definition of covariance
$\operatorname{Cov}(X,Y) = \operatorname{E}\left. \left\lbrack \left( X - \operatorname{E}\lbrack X\rbrack \right)\left( Y - \operatorname{E}\lbrack Y\rbrack \right) \right\rbrack \right. = \operatorname{E}\lbrack XY\rbrack - \operatorname{E}\lbrack X\rbrack\operatorname{E}\lbrack Y\rbrack$.
-
$\operatorname{Cov}(X,X) = \operatorname{Var}(X)$,
$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$
-
The definition of uncorrelated, negatively/positively correlated.
-
The covariance of indicator random variables.
-
The variance of $X_{1} + \ldots + X_{n}$ in the general case:
\[\operatorname{Var}(X_{1} + \ldots + X_{n}) = \sum_{i = 1}^{n}\operatorname{Var}(X_{i}) + 2\sum_{1 \leq i < j \leq n}\operatorname{Cov}(X_{i},X_{j})\]
-
Example: using the indicator method to compute the variance of the
hypergeometric distribution.
Week 12
Lecture 33, 04/14. Covariance and correlation II (based on
8.4)
-
$\operatorname{Cov}(aX + b,Y) = a\operatorname{Cov}(X,Y)$,
\[\operatorname{Cov}(\sum_{i = 1}^{n}a_{i}X_{i},\sum_{j = 1}^{m}b_{j}Y_{j}) = \sum_{i = 1}^{n}\sum_{j = 1}^{m}a_{i}b_{j}\operatorname{Cov}(X_{i},Y_{j}).\]
-
The definition of the correlation coefficient:
\[\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}}.\]
-
Properties of the correlation coefficient:
$- 1 \leq \operatorname{Corr}(X,Y) \leq 1$, if
$\operatorname{Corr}(X,Y) = \pm 1$ then there is a linear relation
between $X$ and $Y$
-
Covariance and correlation of multinomial random variables
Lecture 34, 04/16. Tail Probabilities and The Law of Large
Numbers (based on 9.1, 9.2)
- Estimating probabilities of the form
$\mathsf{\mathrm{P}}\left. (X \geq c) \right.$ using Markov's
inequality.
- Using Chebyshev's inequality to estimate
$\mathsf{\mathrm{P}}\left. (X \geq c) \right.$, improved estimate in
case of a symmetric random variable
-
The expected value and variance of the sample mean: if $X_{1}$,
$\ldots$, $X_{n}$ are iid with expectation $\mu$ and variance
$\sigma^{2}$ then
\[\operatorname{E}\left. \left\lbrack \frac{X_{1} + \ldots + X_{n}}{n} \right\rbrack \right. = \mu,\quad\quad\operatorname{Var}\left. \left( \frac{X_{1} + \ldots + X_{n}}{n} \right) \right. = \frac{\sigma^{2}}{n}.\]
-
The weak law of large numbers for i.i.d.~random variables with a
finite variance.
-
The strong law of large numbers
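-
Illustration (not part of the course material; the exponential example is an arbitrary choice): Markov's and Chebyshev's inequalities checked against the exact tail of $X \sim \operatorname{Exp}(1)$, for which $\operatorname{E}\lbrack X\rbrack = \operatorname{Var}(X) = 1$ and $\mathsf{\mathrm{P}}(X \geq c) = e^{-c}$.

    import math

    for c in (1.0, 2.0, 4.0):
        exact_tail = math.exp(-c)
        markov = 1.0 / c                  # Markov: P(X >= c) <= E[X] / c
        print("Markov   ", c, round(exact_tail, 4), "<=", round(markov, 4))

    for c in (1.0, 2.0, 4.0):
        # For c >= 1: P(|X - 1| >= c) = P(X >= 1 + c) = e^{-(1+c)} since X >= 0.
        exact = math.exp(-(1 + c))
        chebyshev = 1.0 / c ** 2          # Chebyshev: P(|X - E[X]| >= c) <= Var(X) / c^2
        print("Chebyshev", c, round(exact, 4), "<=", round(chebyshev, 4))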
Lecture 35, 04/18. Central Limit Theorem (based on 9.2, 9.3)
-
The Central Limit Theorem for i.i.d.~random variables with a finite
mean and variance.
-
Examples of usage of LLN and CLT.
-
Random walk and necessity of conditional expectation.
Week 13
Lecture 36, 04/21. Conditional distribution for discrete
random variable I (based on 10.1)
- Conditional probability mass function and conditional expectation of a
discrete random variable with respect to an event $B$ with
$\mathsf{\mathrm{P}}\left. (B) \right. > 0$.
-
The averaging principle: how to get the unconditional pmf or
expectation from the conditional ones if we have a partition.
-
How to compute the expected value of a geometric random variable using
conditioning
-
Conditioning a discrete random variable with respect to the outcome of
another discrete random variable.
Lecture 37, 04/23. Conditional distribution for discrete
random variable II (based on 10.1)
-
Discrete random variables $X$ and $Y$ are independent if
$p_{X|Y}\left. \left( x|y \right) \right. = p_{X}\left. (x) \right.$
for all $x$, $y$ with $p_{Y}\left. (y) \right. > 0$.
-
Example of using conditional distribution: conditional binomial.
-
Example of using conditional distribution: first step analysis.
-
Example of using conditional distribution: Poisson process.
Lecture 38, 04/25. Conditional distribution for continuous
random variable (based on 10.2)
-
Conditional distribution of jointly continuous random variables. The
conditional probability density function of $X$ given $Y = y$.
- Computing conditional probabilities and conditional expectations using
the conditional probability density function.
-
Jointly continuous random variables $X$ and $Y$ are independent if
$f_{X|Y}\left. \left( x|y \right) \right. = f_{X}\left. (x) \right.$
for all $x$ and all $y$ with $f_{Y}\left. (y) \right. > 0$.
Week 14
Lecture 39, 04/28. Conditional Expectation (based on 10.3)
-
The conditional expectation of $X$ given $Y$: the random variable
$\operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right.$.
-
Basic properties of
$\operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right.$.
-
$\operatorname{E}\left. \left\lbrack \operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right. \right\rbrack \right. = \operatorname{E}\left. \lbrack X\rbrack \right.$.
-
Further conditional expectation examples.
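-
Illustration (not part of the course material; the joint p.m.f. below is made up): verifying $\operatorname{E}\left\lbrack \operatorname{E}\left\lbrack X|Y \right\rbrack \right\rbrack = \operatorname{E}\lbrack X\rbrack$ on a small finite joint distribution.

    from fractions import Fraction as F

    # A made-up joint p.m.f. of (X, Y); the four probabilities sum to 1.
    joint = {(0, 0): F(1, 8), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(1, 2)}

    # Marginal p.m.f. of Y.
    p_Y = {}
    for (x, y), p in joint.items():
        p_Y[y] = p_Y.get(y, F(0)) + p

    # E[X | Y = y] = sum_x x * p_{X|Y}(x | y).
    cond_exp = {y: sum(x * p / p_Y[y] for (x, yy), p in joint.items() if yy == y)
                for y in p_Y}

    lhs = sum(cond_exp[y] * p_Y[y] for y in p_Y)      # E[ E[X | Y] ]
    rhs = sum(x * p for (x, y), p in joint.items())   # E[X]
    print(lhs, rhs)                                   # both 5/8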
Lecture 40, 04/30. Conditional Expectation (based on 10.3)
-
Further conditional expectation examples.
-
Examples from statistics.
-
Examples from financial mathematics.
-
Examples from Poisson processes.
Lecture 41, 05/02. Final Review