Covered topics for Math 431
Disclaimer: although the plan is to have a fairly
detailed list of the covered topics, the list below might not cover
everything that we discussed in class.
Week 1
Lecture 1, 01/22. Introduction and definition of probability
space (based on section 1.1)
- Syllabus
- Introduction
-
Outcomes, Sample space.
-
Examples: coin flip, two coin flips, die rolls, waiting for the bus.
-
What is an event?
-
Probability space (sample space, collection of events, probability
measure).
-
Axioms of probability (properties of the probability measure).
Lecture 2, 01/24. Constructions of probability spaces (based
on sections 1.1, 1.2)
-
Direct product of sets.
-
Experiments with equally likely outcomes.
-
Uniformly chosen point from an interval or a 2D shape.
-
Sampling (with replacement, without replacement, group sampling).
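-
Illustration (not part of the course material): a short Python sketch of the sampling constructions above, drawing ordered samples with and without replacement; the population {1, ..., 10} and the sample size 3 are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    population = np.arange(1, 11)    # the numbers 1, ..., 10

    # Ordered sample of size 3 with replacement: 10^3 equally likely outcomes.
    with_repl = rng.choice(population, size=3, replace=True)

    # Ordered sample of size 3 without replacement: 10 * 9 * 8 equally likely outcomes.
    without_repl = rng.choice(population, size=3, replace=False)

    print(with_repl, without_repl)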
Week 2
Lecture 3, 01/27. Counting and geometric distribution (based
on sections 1.2, 1.3)
-
Sampling examples.
-
Discrete probability space with infinitely many outcomes.
-
The probability of eventually getting heads in a sequence of coin flips
is 1.
-
Review of geometric series.
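-
Illustration (not part of the course material): a numerical check of the geometric series computation above. The probability that the first heads of a fair coin appears on flip $k$ is $(1/2)^{k}$, and these probabilities sum to 1.

    # Partial sums of the geometric series sum_{k >= 1} (1/2)^k approach 1.
    total = sum((1 / 2) ** k for k in range(1, 61))
    print(total)        # very close to 1

    # Closed form: sum_{k >= 1} r^k = r / (1 - r) for |r| < 1.
    r = 1 / 2
    print(r / (1 - r))  # exactly 1.0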
Lecture 4, 01/29. Computing probability by decomposing (based
on section 1.4)
-
Decomposing an event as the disjoint union of simpler events: the
probability of getting heads in an even number of coin flips.
-
$\mathsf{\mathrm{P}}\left. \left( A^{\complement} \right) \right. = 1 - \mathsf{\mathrm{P}}\left. (A) \right.$.
-
If $B \subseteq A$ then
$\mathsf{\mathrm{P}}\left. (B) \right. \leq \mathsf{\mathrm{P}}\left. (A) \right.$.
-
How to prove
$P(\text{eventually we will get heads with a fair coin}) = 1$ using
the monotonicity property of the probability measure.
-
Inclusion-exclusion to compute the probability of union of two events:
$\mathsf{\mathrm{P}}\left. (A \cup B) \right. = \mathsf{\mathrm{P}}\left. (A) \right. + \mathsf{\mathrm{P}}\left. (B) \right. - \mathsf{\mathrm{P}}\left. (AB) \right.$.
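-
Illustration (not part of the course material; the example is made up): a numerical check of the two-event inclusion-exclusion formula for a fair die roll, with $A$ = "even outcome" and $B$ = "outcome at most 3".

    from fractions import Fraction

    outcomes = range(1, 7)                    # fair six-sided die
    A = {k for k in outcomes if k % 2 == 0}   # even outcome
    B = {k for k in outcomes if k <= 3}       # outcome at most 3

    P = lambda E: Fraction(len(E), 6)         # equally likely outcomes
    lhs = P(A | B)
    rhs = P(A) + P(B) - P(A & B)              # inclusion-exclusion
    print(lhs, rhs, lhs == rhs)               # 5/6 on both sides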
Lecture 5, 01/31. Inclusion--Exclusion and Random variables
(based on sections 1.4, 1.5)
-
Inclusion-exclusion formula for three and $n$ events.
-
Inclusion-exclusion examples.
-
Definition of a random variable.
-
Distribution of a random variable.
-
Discrete random variable, probability mass function.
-
Example of continuous random variable.
Week 3
Lecture 6, 02/03. Conditional Probability (based on section 2.1)
-
Conditional probability of $A$ given $B$ (with
$\mathsf{\mathrm{P}}\left.(B) \right. > 0$):
\[
\mathsf{\mathrm{P}}\left( A\,~|~\, B \right) = \frac{\mathsf{\mathrm{P}}\left. (AB) \right.}{\mathsf{\mathrm{P}}\left. (B) \right.}.\]
- $\mathsf{\mathrm{P}}\left. \left( \, \cdot |\, B \right) \right.$ is
a probability measure for any fixed $B$ with
$\mathsf{\mathrm{P}}(B) > 0$.
-
The multiplication rule for conditional probabilities
\[\begin{aligned}
\mathsf{\mathrm{P}}(AB) & = \mathsf{\mathrm{P}}\left( A \mid B \right)\mathsf{\mathrm{P}}(B), \\
\mathsf{\mathrm{P}}\left( A_{1}A_{2}\ldots A_{n} \right) & = \mathsf{\mathrm{P}}\left( A_{1} \right)\mathsf{\mathrm{P}}\left( A_{2} \mid A_{1} \right)\mathsf{\mathrm{P}}\left( A_{3} \mid A_{1}A_{2} \right)\cdots\mathsf{\mathrm{P}}\left( A_{n} \mid A_{1}A_{2}\ldots A_{n - 1} \right).
\end{aligned}\]
-
Law of total probability for two sets:
$\mathsf{\mathrm{P}}(A) = \mathsf{\mathrm{P}}(B)\mathsf{\mathrm{P}}\left( A \mid B \right) + \mathsf{\mathrm{P}}\left( B^{\complement} \right)\mathsf{\mathrm{P}}\left( A \mid B^{\complement} \right).$
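-
Illustration (not part of the course material; the example is made up): a short sketch checking the definition of conditional probability and the two-set law of total probability when the outcome is a uniformly chosen integer from 1 to 20.

    from fractions import Fraction

    omega = range(1, 21)                      # uniform on {1, ..., 20}
    P = lambda E: Fraction(len(E), 20)

    A = {k for k in omega if k % 3 == 0}      # divisible by 3
    B = {k for k in omega if k <= 10}         # at most 10
    Bc = set(omega) - B

    P_A_given_B = P(A & B) / P(B)             # definition of conditional probability
    P_A_given_Bc = P(A & Bc) / P(Bc)

    # Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
    print(P(A), P_A_given_B * P(B) + P_A_given_Bc * P(Bc))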
Lecture 7, 02/05. Bayes' formula (based on section 2.2)
-
Definition of a partition.
-
If $B_{1}$, $\ldots$, $B_{n}$ is a partition with
$\mathsf{\mathrm{P}}\left. \left( B_{i} \right) \right. > 0$ then we
have the law of total probability for partitions
\[\mathsf{\mathrm{P}}\left. (A) \right. = \sum_{i = 1}^{n}\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B_{i} \right) \right.\mathsf{\mathrm{P}}\left. \left( B_{i} \right) \right.\]
-
Bayes' formula:
\[\mathsf{\mathrm{P}}\left. \left( B|A \right) \right. = \frac{\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B \right) \right.\mathsf{\mathrm{P}}\left. (B) \right.}{\mathsf{\mathrm{P}}\left. \left( A\,~|~\, B \right) \right.\mathsf{\mathrm{P}}\left. (B) \right. + \mathsf{\mathrm{P}}\left. \left( A\,~|~\, B^{\complement} \right) \right.\mathsf{\mathrm{P}}\left. \left( B^{\complement} \right) \right.},\]
and similarly with a partition $B_{1}$, $\ldots$, $B_{n}$.
-
Applications of Bayes' formula.
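-
Illustration (not part of the course material; all numbers below are invented for the example): Bayes' formula in code for a hypothetical test that detects a condition with probability 0.95, gives a false positive with probability 0.02, where the condition has prevalence 0.01.

    # Hypothetical numbers, chosen only to illustrate Bayes' formula.
    p_B = 0.01              # P(B): prior probability of the condition
    p_A_given_B = 0.95      # P(A | B): positive test given the condition
    p_A_given_Bc = 0.02     # P(A | B^c): false positive probability

    # The denominator is the law of total probability for P(A).
    p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)
    p_B_given_A = p_A_given_B * p_B / p_A
    print(p_B_given_A)      # roughly 0.32: a positive test is far from conclusive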
Lecture 8, 02/07. Independence (based on section 2.3)
-
Independence of two events $A$ and $B$:
$\mathsf{\mathrm{P}}\left. (AB) \right. = \mathsf{\mathrm{P}}\left. (A) \right.\mathsf{\mathrm{P}}\left. (B) \right.$.
-
If $A$ and $B$ are independent then the same is true for (i) $A$
and $B^{\complement}$ (ii) $A^{\complement}$ and $B$ (iii)
$A^{\complement}$ and $B^{\complement}$.
-
Independence of $n$ events $A_{1}$, $\ldots$, $A_{n}$:
\[\mathsf{\mathrm{P}}\left( A_{i_{1}}\ldots A_{i_{k}} \right) = \mathsf{\mathrm{P}}\left( A_{i_{1}} \right) \cdot \ldots \cdot \mathsf{\mathrm{P}}\left( A_{i_{k}} \right)\]
for all indices $1 \leq i_{1} < i_{2} < \ldots < i_{k} \leq n$.
-
How to compute probabilities of combinations of independent events.
-
Example: sampling with and without replacement.
-
Independence of events constructed from independent events e.g.: if
$A$, $B$ and $C$ are independent then $A \cup B$ and
$C^{\complement}$ are independent.
Week 4
Lecture 9, 02/10. Repeated Independent Trials
(based on 2.4)
-
Independence of the random variables $X_{1}$, $\ldots$, $X_{n}$:
for any collection of sets $B_{1},\ldots,B_{n} \subset \mathbb{R}$:
\[\mathsf{\mathrm{P}}\left. \left( X_{1} \in B_{1},\ldots,X_{n} \in B_{n} \right) \right. = \mathsf{\mathrm{P}}\left. \left( X_{1} \in B_{1} \right) \right. \cdot \ldots \cdot \mathsf{\mathrm{P}}\left. \left( X_{n} \in B_{n} \right) \right..\]
-
Equivalent definition for the independence of discrete random
variables $X_{1}$, $\ldots$, $X_{n}$:
\[\mathsf{\mathrm{P}}\left( X_{1} = k_{1},\ldots,X_{n} = k_{n} \right) = \mathsf{\mathrm{P}}\left( X_{1} = k_{1} \right) \cdot \ldots \cdot \mathsf{\mathrm{P}}\left( X_{n} = k_{n} \right)\]
for all possible values $k_{1}$, $\ldots$, $k_{n}$.
-
The probability space of repeated independent trials with the same
success probability. Named distributions constructed from a sequence
of independent trials with the same success probability:
-
Bernoulli with parameter $p$ (the outcome of a single trial).
-
Binomial with parameters $n$ and $p$ (the number of successes
out of $n$ trials).
-
Geometric distribution with parameter $p$ (the number of trials
needed for the first success).
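-
Illustration (not part of the course material; the parameters are arbitrary): a simulation sketch showing how the Bernoulli, binomial and geometric distributions arise from repeated independent trials with the same success probability.

    import numpy as np

    rng = np.random.default_rng(1)
    p, n, reps = 0.3, 10, 100_000

    trials = rng.random((reps, n)) < p    # each row: n independent Bernoulli(p) trials
    successes = trials.sum(axis=1)        # Binomial(n, p) count for each row
    print(successes.mean(), n * p)        # empirical mean vs. n*p

    # Geometric(p): the number of trials needed for the first success.
    geom = rng.geometric(p, size=reps)
    print(geom.mean(), 1 / p)             # empirical mean vs. 1/p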
Lecture 10, 02/12. Hypergeometric distribution and
constructions from independent random variables (based on section 2.5)
-
Hypergeometric with parameters $N$, $N_{A}$, $n$ (the number of
type A items in a sample of size $n$ drawn without replacement).
-
Independence of random variables which are functions of independent
variables.
-
Examples with different distributions.
Lecture 11, 02/14. Conditional Independence (based on section
2.5)
-
Conditional independence of events.
-
Conditional independence vs. independence.
-
The birthday problem: exact formula, and how to estimate it using the
fact that $e^{- x} \approx 1 - x$ for small $x$.
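-
Illustration (not part of the course material): the birthday problem in code, comparing the exact probability that $k$ people all have distinct birthdays with the estimate $\exp\left( - \binom{k}{2}/365 \right)$ obtained from $1 - x \approx e^{- x}$.

    import math

    def p_all_distinct(k, days=365):
        # Exact: product of (1 - i/days) for i = 0, ..., k-1.
        prob = 1.0
        for i in range(k):
            prob *= (days - i) / days
        return prob

    for k in (10, 23, 40):
        exact = p_all_distinct(k)
        approx = math.exp(-k * (k - 1) / (2 * 365))   # uses 1 - x ~ e^{-x}
        print(k, round(exact, 4), round(approx, 4))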
Week 5
Lecture 12, 02/17. Continuous distributions and probability
density function (based on section 3.1)
-
Continuous distributions: the definition of the probability density
function.
-
Basic properties of the probability density function, how to identify
a p.d.f.
-
How to compute probabilities using a p.d.f.
-
The uniform distribution on $\lbrack a,b\rbrack$.
-
How the value of the probability density function can be connected to
the probability of the random variable being in a small interval.
-
Geometric example (sample point from 2D shape).
Lecture 13, 02/19. Cumulative distribution function (based on
section 3.2)
-
The cumulative distribution function of a random variable, definition,
basic properties.
-
What the CDF of a discrete and of a continuous random variable looks
like.
-
How to identify the probability mass function from the CDF and vice
versa.
-
How to identify the CDF of a continuous random variable, and how to
compute its p.d.f.~from the CDF.
Lecture 14, 02/21. Expectation I (based on section 3.3)
-
The expected value of a random variable as the weighted average of
possible values.
-
Computing the expected value for discrete and continuous random
variables.
-
Computing the expected value for uniform and Bernoulli random
variables.
-
Computing the expected value for a geometric random variable.
-
Random variables with infinite or undefined expectation
-
Expectation of a function of a random variable: computing
$\operatorname{E}\left. \left\lbrack g\left. (X) \right. \right\rbrack \right.$
for discrete and continuous r.v. $X$.
-
The $n^{\text{th}}$ moment of a random variable.
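-
Illustration (not part of the course material; the parameters are arbitrary): a numerical check of the expectation formulas, computing $\operatorname{E}\lbrack X\rbrack$ from the p.m.f. of a geometric random variable and from the p.d.f. of a uniform random variable.

    import numpy as np

    # Discrete case: E[X] = sum_k k * P(X = k) for X ~ Geom(p), truncating the series.
    p = 0.25
    ks = np.arange(1, 2001)
    pmf = p * (1 - p) ** (ks - 1)
    print((ks * pmf).sum(), 1 / p)            # both approximately 4

    # Continuous case: E[X] = integral of x * f(x) dx for X ~ Unif[a, b] (midpoint sum).
    a, b = 2.0, 5.0
    m = 10_000
    dx = (b - a) / m
    xs = a + dx * (np.arange(m) + 0.5)
    print((xs * (1 / (b - a)) * dx).sum(), (a + b) / 2)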
Week 6
Lecture 15, 02/24. Expectation II (based on 3.3, 8.1, 8.2)
-
How the expectation changes under a linear function:
$\operatorname{E}\lbrack a\rbrack = a$,
$\operatorname{E}\lbrack aX\rbrack = a\operatorname{E}\lbrack X\rbrack$,
$\operatorname{E}\left. \lbrack aX + b\rbrack \right. = a\operatorname{E}\left. \lbrack X\rbrack \right. + b$.
-
Linearity of expectation: if $X_{1}$, $\ldots$, $X_{n}$ are
random variables then
$\operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. + \ldots + g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right. = \operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \right\rbrack \right. + \ldots + \operatorname{E}\left. \left\lbrack g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right.$.
(If all expectations are well-defined.) In particular
$\operatorname{E}\left. \left\lbrack X_{1} + \ldots + X_{n} \right\rbrack \right. = \operatorname{E}\left. \left\lbrack X_{1} \right\rbrack \right. + \ldots + E\left. \left\lbrack X_{n} \right\rbrack \right.$.
-
Expected value of an indicator random variable.
-
Computing the expected value of a binomial random variable.
-
The indicator method: if the random variable $X$ is nonnegative
integer valued then often it can be represented as the sum of
indicator random variables (i.e. random variables that are 0 or 1).
Then the expected value of $X$ is just the sum of the expectations
of the indicators. Since the expectation of an indicator is just the
probability that it is equal to 1, this method can lead to simpler
computations than going through the original definition of the
expectation (using the p.m.f.).
-
Examples for the indicator method.
-
Expectation and independence: if $X_{1}$, $\ldots$, $X_{n}$ are
independent then
$\operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \cdot \ldots \cdot g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right. = \operatorname{E}\left. \left\lbrack g_{1}\left. \left( X_{1} \right) \right. \right\rbrack \right. \cdot \ldots \cdot \operatorname{E}\left. \left\lbrack g_{n}\left. \left( X_{n} \right) \right. \right\rbrack \right.$.
(If all expectations are well-defined.) In particular,
$\operatorname{E}\left. \left\lbrack X_{1}X_{2} \cdot \ldots \cdot X_{n} \right\rbrack \right. = \operatorname{E}\left. \left\lbrack X_{1} \right\rbrack \right. \cdot \ldots \cdot \operatorname{E}\left. \left\lbrack X_{n} \right\rbrack \right.$.
(Independence is crucial here; the statement does not hold in general
without it!)
-
Median
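-
Illustration (not part of the course material; the parameters are arbitrary): a sketch of the indicator method described above. Writing a $\operatorname{Bin}(n,p)$ count as a sum of indicators of success on each trial, linearity of expectation gives $\operatorname{E}\left\lbrack S_{n} \right\rbrack = np$; the simulation checks this.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, reps = 12, 0.4, 200_000

    trials = rng.random((reps, n)) < p    # column j is the indicator of success on trial j
    # Each indicator has expectation P(success) = p, so the sum has expectation n*p.
    print(trials.mean(axis=0))            # each entry is close to p
    print(trials.sum(axis=1).mean(), n * p)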
Lecture 16, 02/26. Variance (based on 3.4, 8.2)
-
The variance of a random variable:
$\operatorname{Var}(X) = \operatorname{E}\left\lbrack \left( X - \operatorname{E}X \right)^{2} \right\rbrack$.
-
Another way to compute the variance:
$\operatorname{Var}(X) = \operatorname{E}\left\lbrack X^{2} \right\rbrack{- \left( \operatorname{E}\lbrack X\rbrack \right)}^{2}$.
-
Variance of a Bernoulli random variable.
-
Variance of a uniform random variable.
-
How the expectation and the variance change under a linear function:
$\operatorname{E}\lbrack aX + b\rbrack = a\operatorname{E}\lbrack X\rbrack + b$,
$\operatorname{Var}(aX + b) = a^{2}\operatorname{Var}(X)$.
-
Variance of a binomial random variable.
-
If $X_{1}$, $\ldots$, $X_{n}$ are independent with finite
variances then
$\operatorname{Var}\left. \left( X_{1} + \ldots + X_{n} \right) \right. = \operatorname{Var}\left. \left( X_{1} \right) \right. + \ldots + \operatorname{Var}\left. \left( X_{n} \right) \right.$.
Lecture 17, 02/28. Normal Distribution (based on 3.5)
-
The standard normal (or Gaussian) distribution (the p.d.f. $\varphi$
and CDF $\Phi$, expectation and variance).
-
The symmetry of the standard normal:
$\Phi\left. ( - x) \right. = 1 - \Phi\left. (x) \right.$.
-
How to compute probabilities involving a standard normal random
variable using the table in the Appendix.
-
The normal distribution with parameters $\mu$ and
$\sigma^{2} > 0$.
-
If $Z \sim \mathsf{\mathrm{N}}\left. (0,1) \right.$ then
$\sigma Z + \mu \sim \mathsf{\mathrm{N}}\left. \left( \mu,\sigma^{2} \right) \right.$.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu,\sigma^{2} \right) \right.$
then
$\frac{X - \mu}{\sigma} \sim \mathsf{\mathrm{N}}\left. (0,1) \right.$.
-
Expressing probabilities involving a general normal random variable in
terms of $\Phi$.
-
Why the normal distribution appears everywhere.
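-
Illustration (not part of the course material; the parameters are arbitrary): computing a probability for a general normal random variable by standardizing, $\mathsf{\mathrm{P}}(a \leq X \leq b) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right)$, with scipy's norm.cdf playing the role of the table of $\Phi$.

    from scipy.stats import norm

    mu, sigma = 10.0, 2.0     # X ~ N(mu, sigma^2); arbitrary example values
    a, b = 9.0, 13.0

    # Standardize: (X - mu) / sigma ~ N(0, 1).
    prob = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
    print(prob)
    # The same probability using the location-scale parameters directly.
    print(norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma))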
Week 7
Lecture 18, 03/03. CLT for binomial random variable (based on
section 4.1)
-
The Central Limit Theorem for binomial random variables: if
$0 < p < 1$ is fixed and
$S_{n} \sim \operatorname{Bin}\left. (n,p) \right.$ then for any
$a < b$ we have
\[\lim\limits_{n \rightarrow \infty}\mathsf{\mathrm{P}}\left. \left( a \leq \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. = \Phi\left. (b) \right. - \Phi\left. (a) \right..\]
-
In words: if we center a binomial random variable at its mean and
divide it by its standard deviation, then the scaled random variable
gets closer and closer to a standard normal distribution as
$n \rightarrow \infty$.
-
Practical use: if $np\left. (1 - p) \right. > 10$ and
$S_{n} \sim \operatorname{Bin}(n,p)$ then any probability involving
$\frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}}$ can be
approximated with the same probability involving a standard normal.
E.g.
\[\mathsf{\mathrm{P}}\left. \left( a \leq \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. \approx \Phi\left. (b) \right. - \Phi\left. (a) \right.,\quad\quad\mathsf{\mathrm{P}}\left. \left( \frac{S_{n} - np}{\sqrt{np\left. (1 - p) \right.}} \leq b \right) \right. \approx \Phi\left. (b) \right..\]
-
Continuity correction: if $np\left. (1 - p) \right. > 10$ and
$S_{n} \sim \operatorname{Bin}(n,p)$ and we want to estimate a
probability of the form
$\mathsf{\mathrm{P}}\left. \left( k_{1} \leq S_{n} \leq k_{2} \right) \right.$
with $k_{1}$, $k_{2}$ integers, then it often helps to rewrite the
probability as
\[\mathsf{\mathrm{P}}\left. \left( k_{1} \leq S_{n} \leq k_{2} \right) \right. = \mathsf{\mathrm{P}}\left. \left( k_{1} - \frac{1}{2} \leq S_{n} \leq k_{2} + \frac{1}{2} \right) \right.\]
and use the normal approximation for the modified endpoints.
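-
Illustration (not part of the course material; the parameters are arbitrary): comparing the exact binomial probability with the normal approximation, with and without the continuity correction.

    import math
    from scipy.stats import binom, norm

    n, p = 100, 0.3
    k1, k2 = 25, 35
    mu, sd = n * p, math.sqrt(n * p * (1 - p))

    exact = binom.cdf(k2, n, p) - binom.cdf(k1 - 1, n, p)   # P(k1 <= S_n <= k2)
    plain = norm.cdf((k2 - mu) / sd) - norm.cdf((k1 - mu) / sd)
    corrected = norm.cdf((k2 + 0.5 - mu) / sd) - norm.cdf((k1 - 0.5 - mu) / sd)
    print(exact, plain, corrected)   # the corrected value is typically closer to exact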
Lecture 19, 03/05. Law of large numbers and applications of
CLT (based on sections 4.2, 4.3)
-
The weak law of large numbers for binomial random variables. Fix
$0 < p < 1$ and $\varepsilon > 0$. If
$S_{n} \sim \operatorname{Bin}(n,p)$ then
\[\lim\limits_{n \rightarrow \infty}\mathsf{\mathrm{P}}\left. \left( \left| \frac{S_{n}}{n} - p \right| < \varepsilon \right) \right. = 1.\]
-
Let $S_{n} \sim \operatorname{Bin}\left. (n,p) \right.$ denote the
number of successes in a sequence of independent trials with an
unknown success probability $p$. Then a natural estimate for $p$
is ${\hat{p}}_{n} = \frac{S_{n}}{n}$ (the frequency of successes)
and we can estimate the error using the normal approximation as
\[\begin{aligned}
\mathsf{\mathrm{P}}\left. \left( \left| {\hat{p}}_{n} - p \right| \leq \varepsilon \right) \right. & \geq 2\Phi\left. \left( 2\varepsilon\sqrt{n} \right) \right. - 1.
\end{aligned}\]
-
The definition of a confidence interval corresponding to a certain
percentage.
-
Application to polling: for a large population, sampling without
replacement (which would give a hypergeometric distribution) is close
to sampling with replacement (which gives a binomial distribution),
and thus we can apply the normal approximation.
Lecture 20, 03/07. Poisson approximation (based on section
4.4)
-
The Poisson$\left. (\lambda) \right.$ distribution: probability mass
function.
-
If $\lambda > 0$ is fixed then the p.m.f.~of
$\operatorname{Bin}\left. (n,\lambda/n) \right.$ converges to the
p.m.f.~of a $\operatorname{Poisson}\left. (\lambda) \right.$
distribution.
-
Mean and variance of the Poisson distribution.
-
The Poisson approximation of the binomial distribution: if
$np^{2} < 0.2$, then the $\operatorname{Bin}(n,p)$
distribution can be approximated by the
$\operatorname{Poisson}(np)$ distribution.
-
The Poisson distribution as a model for counting rare events.
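-
Illustration (not part of the course material; the parameters are arbitrary): comparing the $\operatorname{Bin}(n,p)$ p.m.f. with the $\operatorname{Poisson}(np)$ p.m.f. for large $n$ and small $p$.

    from scipy.stats import binom, poisson

    n, p = 500, 0.01        # n*p^2 = 0.05 < 0.2, so the approximation should be good
    lam = n * p

    for k in range(11):
        print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))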
Week 8
Lecture 21, 03/10. Overview of Poisson process (based on
section 4.6)
-
A quick overview of the Poisson process. (Not needed for the exams!)
-
The gamma distribution. (Not needed for the exams!)
-
The exponential distribution with parameter $\lambda > 0$: CDF,
p.d.f.
Lecture 22, 03/12. Exponential distribution and Moment
generating function (based on sections 4.5, 5.1)
-
The expected value and variance of the exponential.
-
$T \sim \operatorname{Exp}(\lambda)$ and $a > 0$ $\Longrightarrow$ $aT \sim \operatorname{Exp}(\lambda/a)$.
-
The memoryless property of the exponential.
-
The memoryless property of the geometric distribution.
-
The exponential distribution with parameter $\lambda > 0$ as the
limit of $\frac{T_{n}}{n}$ where
$T_{n} \sim \operatorname{Geom}(\lambda/n)$ (waiting for a rare
event in continuous time).
-
The moment generating function of the random variable $X$:
$M_{X}\left. (t) \right. = \operatorname{E}\left. \left( e^{tX} \right) \right.$.
-
Identifying the p.m.f. of a discrete random variable from the moment
generating function.
-
Computing the moment generating function of various random variables.
Lecture 23, 03/14. Moment generating function and
distributions (based on section 5.1, 8.3)
-
Computing the moment generating function of various random variables.
-
Computing the moments of a random variable using the moment generating
function:
\[\operatorname{E}\left. \left( X^{n} \right) \right. = \left. \frac{d^{n}}{dt^{n}}M(t) \right|_{t = 0}\]
if the moment generating function is finite in a neighborhood of 0.
-
High moments of exponential and Gaussian distributions.
-
The moment generating function identifies the distribution of the
random variable (if it is finite in a neighborhood of 0).
-
If $X$, $Y$ are independent then the moment generating function of
$X + Y$ is the product of the moment generating functions of $X$
and $Y$:
$M_{X + Y}\left. (t) \right. = M_{X}\left. (t) \right.M_{Y}\left. (t) \right.$.
-
Sum of two independent Poisson random variables is Poisson.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1},\sigma_{1}^{2} \right) \right.$
and
$Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{2},\sigma_{2}^{2} \right) \right.$
are independent then
$X + Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1} + \mu_{2},\sigma_{1}^{2} + \sigma_{2}^{2} \right) \right.$.
Week 9
Lecture 24, 03/17. Function of random variable (based on
section 5.2)
-
Computing the p.m.f. of $g\left. (X) \right.$ if $X$ is discrete.
-
Computing the p.d.f. of $Y = g\left. (X) \right.$ using the CDF
method, assuming $X$ has a density function and $g$ has a nonzero
derivative apart from possibly finitely many points.
-
Outline:
-
Write down the p.d.f. and CDF of $X$ (if possible),
-
Identify the support of $Y$ (if possible),
-
Compute the CDF of $Y$ by rewriting
$\mathsf{\mathrm{P}}\left. (Y \leq y) \right. = \mathsf{\mathrm{P}}\left. \left( g\left. (X) \right. \leq y \right) \right.$
in terms of $X$ and the CDF of $X$. (You will need to solve the
inequality $g\left. (X) \right. \leq y$ for $X$.)
-
Differentiate the CDF of $Y$ to get the p.d.f. (Be careful with
piecewise defined functions!)
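-
Illustration (not part of the course material; the example is made up): the CDF method in a concrete case. If $X \sim \operatorname{Unif}\lbrack 0,1\rbrack$ and $Y = X^{2}$, the method gives $F_{Y}(y) = \mathsf{\mathrm{P}}\left( X^{2} \leq y \right) = \sqrt{y}$ and $f_{Y}(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$; the simulation below checks the CDF.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.random(200_000)    # X ~ Unif[0, 1]
    y = x ** 2                 # Y = g(X) = X^2, supported on (0, 1)

    for t in (0.1, 0.25, 0.5, 0.9):
        empirical = (y <= t).mean()               # Monte Carlo estimate of F_Y(t)
        print(t, round(empirical, 4), round(np.sqrt(t), 4))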
Lecture 25, 03/19. Joint distribution of discrete random
variables (based on section 6.1)
-
Random vectors.
-
Joint distribution of random variables.
-
The joint probability mass function of random variables.
-
How to use the joint pmf to compute various probabilities about random
variables.
-
How to compute the marginal pmf from the joint pmf.
-
How to check for independence of discrete random variables from the
joint pmf.
-
How to compute the expectation of a function of several discrete
random variables.
-
The multinomial distribution.
Lecture 26, 03/21. Joint continuous distribution of random
variables (based on section 6.2)
-
Jointly continuous random variables, the joint probability density
function.
-
The joint cumulative distribution function. Connection to the joint
pdf for jointly continuous random variables.
-
How to compute a probability involving jointly continuous random
variables using the joint pdf.
-
How to compute the expectation of a function of several jointly
continuous random variables.
-
How to compute the marginal density function from the joint pdf.
-
The uniform distribution on 2 or 3 dimensional regions.
Week 10
Lecture 27, 03/31. Joint distribution and independence (based on 6.3)
-
How to check for independence of discrete random variables from the
joint pmf.
-
How to check for independence of jointly continuous random variables.
-
Finding the minimum of two independent exponential (or geometric)
random variables.
-
Zero probability events for jointly continuous random variables.
-
Example for two continuous random variables that are not jointly
continuous.
Lecture 28, 04/02. Sum of independent random variables I
(based on 7.1)
-
Sum of two independent discrete or continuous random variables.
-
If $X$, $Y$ are independent and discrete then
\[\mathsf{\mathrm{P}}\left. (X + Y = n) \right. = \sum_{k}\mathsf{\mathrm{P}}\left. (X = k) \right.\mathsf{\mathrm{P}}\left. (Y = n - k) \right..\]
-
Sum of independent geometric random variables with the same success
probability: the negative binomial distribution.
-
Sum of independent Poisson random variables is Poisson.
-
Sum of independent Bernoulli with the same success probability is
binomial.
-
Sum of independent binomials with the same success probability is also
binomial.
-
If $X$, $Y$ are independent and continuous then the p.d.f. of
$X + Y$ is given by
\[f_{X + Y}\left. (z) \right. = \int_{- \infty}^{\infty}f_{X}\left. (x) \right.f_{Y}\left. (z - x) \right.dx.\]
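-
Illustration (not part of the course material; the example is made up): for two independent $\operatorname{Unif}\lbrack 0,1\rbrack$ random variables the convolution formula gives the triangular density $f_{X + Y}(z) = z$ for $0 \leq z \leq 1$ and $2 - z$ for $1 \leq z \leq 2$; below the convolution integral is evaluated numerically.

    import numpy as np

    def f_unif(x):
        # Density of Unif[0, 1].
        return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

    def f_sum(z, m=20_000):
        # Midpoint-rule approximation of the integral of f_X(x) * f_Y(z - x) dx over [0, 1].
        dx = 1.0 / m
        xs = dx * (np.arange(m) + 0.5)
        return (f_unif(xs) * f_unif(z - xs) * dx).sum()

    for z in (0.25, 0.5, 1.0, 1.5, 1.75):
        exact = z if z <= 1 else 2 - z            # triangular density
        print(z, round(f_sum(z), 4), exact)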
Lecture 29, 04/04. Sum of independent random variables II
(based on 7.1)
- Sum of two independent discrete or continuous random variables.
-
The sum of two independent uniform random variables.
-
The sum of two independent exponentials with the same parameter.
-
The gamma distribution.
-
If
$X \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1},\sigma_{1}^{2} \right) \right.$
and
$Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{2},\sigma_{2}^{2} \right) \right.$
are independent then
$X + Y \sim \mathsf{\mathrm{N}}\left. \left( \mu_{1} + \mu_{2},\sigma_{1}^{2} + \sigma_{2}^{2} \right) \right.$.
-
A linear combination of independent normals is also normal.
Week 11
Lecture 30, 04/07. Exchangeability (based on 7.2)
- Exchangeable random variables: definition, properties
-
If $X_{1}$, $\ldots$, $X_{k}$ are independent and identically
distributed (i.i.d.) then they are exchangeable.
-
If $X_{1}$, $\ldots$, $X_{k}$ is a sample without replacement
from the set $\left\{ 1,2,\ldots,n \right\}$ then the random
variables are exchangeable.
-
Applications of exchangeability.
-
Indicator random variable.
-
The indicator method: if the random variable $X$ is nonnegative
integer valued then often it can be represented as the sum of
indicator random variables (i.e. random variables that are 0 or 1).
Then the expected value of $X$ is just the sum of the expectations
of the indicators. Since the expectation of an indicator is just the
probability that it is equal to 1, this method can lead to simpler
computations than going through the original definition of the
expectation (using the p.m.f.).
-
The expected value of the number of aces in a randomly chosen hand of
5 cards.
-
The expected value of a hypergeometric.
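-
Illustration (not part of the course material): a simulation sketch of the indicator computation above. By exchangeability each of the 5 positions in the hand holds an ace with probability 4/52, so the expected number of aces is $5 \cdot \frac{4}{52} \approx 0.385$.

    import numpy as np

    rng = np.random.default_rng(4)
    reps, hand_size = 50_000, 5

    deck = np.zeros(52, dtype=int)
    deck[:4] = 1                      # mark the 4 aces with a 1

    counts = np.empty(reps)
    for i in range(reps):
        hand = rng.choice(deck, size=hand_size, replace=False)
        counts[i] = hand.sum()        # number of aces in the hand

    print(counts.mean(), 5 * 4 / 52)  # both close to about 0.385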
Lecture 31, 04/09. The Indicator method (based on 8.1)
-
Examples for the indicator method.
-
The expected value of a negative hypergeometric.
-
The coupon collector's problem.
Lecture 32, 04/11. Covariance and correlation I (based on
8.4)
-
The definition of covariance
$\operatorname{Cov}(X,Y) = \operatorname{E}\left. \left\lbrack \left( X - \operatorname{E}\lbrack X\rbrack \right)\left( Y - \operatorname{E}\lbrack Y\rbrack \right) \right\rbrack \right. = \operatorname{E}\lbrack XY\rbrack - \operatorname{E}\lbrack X\rbrack\operatorname{E}\lbrack Y\rbrack$.
-
$\operatorname{Cov}(X,X) = \operatorname{Var}(X)$,
$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$
-
The definition of uncorrelated, negatively/positively correlated.
-
The covariance of indicator random variables.
-
The variance of $X_{1} + \ldots + X_{n}$ in the general case:
\[\operatorname{Var}(X_{1} + \ldots + X_{n}) = \sum_{i = 1}^{n}\operatorname{Var}(X_{i}) + 2\sum_{1 \leq i < j \leq n}\operatorname{Cov}(X_{i},X_{j})\]
-
Example: using the indicator method to compute the variance of the
hypergeometric distribution.
Week 12
Lecture 33, 04/14. Covariance and correlation II (based on
8.4)
-
$\operatorname{Cov}(aX + b,Y) = a\operatorname{Cov}(X,Y)$,
\[\operatorname{Cov}(\sum_{i = 1}^{n}a_{i}X_{i},\sum_{j = 1}^{m}b_{j}Y_{j}) = \sum_{i = 1}^{n}\sum_{j = 1}^{m}a_{i}b_{j}\operatorname{Cov}(X_{i},Y_{j}).\]
-
The definition of the correlation coefficient:
\[\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}}.\]
-
Properties of the correlation coefficient:
$- 1 \leq \operatorname{Corr}(X,Y) \leq 1$, if
$\operatorname{Corr}(X,Y) = \pm 1$ then there is a linear relation
between $X$ and $Y$
-
Covariance and correlation of multinomial random variables
Lecture 34, 04/16. Tail Probabilities and The Law of Large
Numbers (based on 9.1, 9.2)
- Estimating probabilities of the form
$\mathsf{\mathrm{P}}\left. (X \geq c) \right.$ using Markov's
inequality.
- Using Chebyshev's inequality to estimate
$\mathsf{\mathrm{P}}\left. (X \geq c) \right.$, improved estimate in
case of a symmetric random variable
-
The expected value and variance of the sample mean: if $X_{1}$,
$\ldots$, $X_{n}$ are iid with expectation $\mu$ and variance
$\sigma^{2}$ then
\[\operatorname{E}\left. \left\lbrack \frac{X_{1} + \ldots + X_{n}}{n} \right\rbrack \right. = \mu,\quad\quad\operatorname{Var}\left. \left( \frac{X_{1} + \ldots + X_{n}}{n} \right) \right. = \frac{\sigma^{2}}{n}.\]
-
The weak law of large numbers for i.i.d.~random variables with a
finite variance.
-
The strong law of large numbers
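-
Illustration (not part of the course material; the exponential example is an arbitrary choice): Markov's and Chebyshev's inequalities checked against the exact tail of $X \sim \operatorname{Exp}(1)$, for which $\operatorname{E}\lbrack X\rbrack = \operatorname{Var}(X) = 1$ and $\mathsf{\mathrm{P}}(X \geq c) = e^{-c}$.

    import math

    for c in (1.0, 2.0, 4.0):
        exact_tail = math.exp(-c)
        markov = 1.0 / c                  # Markov: P(X >= c) <= E[X] / c
        print("Markov   ", c, round(exact_tail, 4), "<=", round(markov, 4))

    for c in (1.0, 2.0, 4.0):
        # For c >= 1: P(|X - 1| >= c) = P(X >= 1 + c) = e^{-(1+c)} since X >= 0.
        exact = math.exp(-(1 + c))
        chebyshev = 1.0 / c ** 2          # Chebyshev: P(|X - E[X]| >= c) <= Var(X) / c^2
        print("Chebyshev", c, round(exact, 4), "<=", round(chebyshev, 4))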
Lecture 35, 04/18. Central Limit Theorem (based on 9.2, 9.3)
-
The Central Limit Theorem for i.i.d.~random variables with a finite
mean and variance.
-
Examples of usage of LLN and CLT.
-
Random walk and necessity of conditional expectation.
Week 13
Lecture 36, 04/21. Conditional distribution for discrete
random variable I (based on 10.1)
- Conditional probability mass function and conditional expectation of a
discrete random variable with respect to an event $B$ with
$\mathsf{\mathrm{P}}\left. (B) \right. > 0$.
-
The averaging principle: how to get the unconditional pmf or
expectation from the conditional ones if we have a partition.
-
How to compute the expected value of a geometric random variable using
conditioning
-
Conditioning a discrete random variable with respect to the outcome of
another discrete random variable.
Lecture 37, 04/23. Conditional distribution for discrete
random variable II (based on 10.1)
-
Discrete random variables $X$ and $Y$ are independent if
$p_{X|Y}\left. \left( x|y \right) \right. = p_{X}\left. (x) \right.$
for all $x$, $y$ with $p_{Y}\left. (y) \right. > 0$.
-
Example of using conditional distribution: conditional binomial.
-
Example of using conditional distribution: first step analysis.
-
Example of using conditional distribution: Poisson process.
Lecture 38, 04/25. Conditional distribution for continuous
random variable (based on 10.2)
-
Conditional distribution of jointly continuous random variables. The
conditional probability density function of $X$ given $Y = y$.
- Computing conditional probabilities and conditional expectations using
the conditional probability density function.
-
Jointly continuous random variables $X$ and $Y$ are independent if
$f_{X|Y}\left. \left( x|y \right) \right. = f_{X}\left. (x) \right.$
for all $x$ and all $y$ with $f_{Y}\left. (y) \right. > 0$.
Week 14
Lecture 39, 04/28. Conditional Expectation (based on 10.3)
-
The conditional expectation of $X$ given $Y$: the random variable
$\operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right.$.
-
Basic properties of
$\operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right.$.
-
$\operatorname{E}\left. \left\lbrack \operatorname{E}\left. \left\lbrack X|Y \right\rbrack \right. \right\rbrack \right. = \operatorname{E}\left. \lbrack X\rbrack \right.$.
-
Further conditional expectation examples.
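-
Illustration (not part of the course material; the joint p.m.f. below is made up): verifying $\operatorname{E}\left\lbrack \operatorname{E}\left\lbrack X|Y \right\rbrack \right\rbrack = \operatorname{E}\lbrack X\rbrack$ on a small finite joint distribution.

    from fractions import Fraction as F

    # A made-up joint p.m.f. of (X, Y); the four probabilities sum to 1.
    joint = {(0, 0): F(1, 8), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(1, 2)}

    # Marginal p.m.f. of Y.
    p_Y = {}
    for (x, y), p in joint.items():
        p_Y[y] = p_Y.get(y, F(0)) + p

    # E[X | Y = y] = sum_x x * p_{X|Y}(x | y).
    cond_exp = {y: sum(x * p / p_Y[y] for (x, yy), p in joint.items() if yy == y)
                for y in p_Y}

    lhs = sum(cond_exp[y] * p_Y[y] for y in p_Y)      # E[ E[X | Y] ]
    rhs = sum(x * p for (x, y), p in joint.items())   # E[X]
    print(lhs, rhs)                                   # both 5/8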
Lecture 40, 04/30. Conditional Expectation (based on 10.3)
-
Further conditional expectation examples.
-
Examples from statistics.
-
Examples from financial mathematics.
-
Examples from Poisson processes.
Lecture 41, 05/02. Final Review