{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# TOPIC 0 \n", "\n", "# Introduction\n", "\n", "## 1 Review\n", "\n", "***\n", "*Course:* [Math 535](http://www.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS) \n", "*Author:* [Sebastien Roch](http://www.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison \n", "*Updated:* Sep 21, 2020 \n", "*Copyright:* © 2020 Sebastien Roch\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We first review a few basic concepts." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 1.1 Vectors and norms" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For a vector \n", "\n", "$$\n", "\\mathbf{x} \n", "= \n", "\\begin{bmatrix}\n", "x_1 \\\\\n", "x_2 \\\\\n", "\\vdots \\\\\n", "x_d\n", "\\end{bmatrix}\n", "\\in \\mathbb{R}^d\n", "$$\n", "\n", "the Euclidean norm of $\\mathbf{x}$ is defined as\n", "\n", "$$\n", "\\|\\mathbf{x}\\|_2 \n", "= \n", "\\sqrt{\n", "\\sum_{i=1}^d x_i^2\n", "} \n", "= \n", "\\sqrt{\\mathbf{x}^T \\mathbf{x}}\n", "= \n", "\\sqrt{\\langle \\mathbf{x}, \\mathbf{x}\\rangle}\n", "$$\n", "\n", "where $\\mathbf{x}^T$ denotes the transpose of $\\mathbf{x}$ (seen as a single-column matrix) and \n", "\n", "$$\n", "\\langle \\mathbf{u}, \\mathbf{v} \\rangle = \\sum_{i=1}^d u_i v_i\n", "$$ \n", "\n", "is the [inner product](https://en.wikipedia.org/wiki/Inner_product_space) of $\\mathbf{u}$ and $\\mathbf{v}$.\n", "This is also known as the $2$-norm. 
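\n", "\n", "As a quick worked example, take $\\mathbf{x} = (3, 4)^T \\in \\mathbb{R}^2$:\n", "\n", "$$\n", "\\|\\mathbf{x}\\|_2 = \\sqrt{3^2 + 4^2} = \\sqrt{25} = 5.\n", "$$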
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "More generally, for $p \\geq 1$, the [$p$-norm](https://en.wikipedia.org/wiki/Lp_space#The_p-norm_in_countably_infinite_dimensions_and_ℓ_p_spaces) of $\\mathbf{x}$ is given by\n", "\n", "$$\n", "\\|\\mathbf{x}\\|_p \n", "= \n", "\\left(\n", "\\sum_{i=1}^d |x_i|^p\n", "\\right)^{1/p}.\n", "$$\n", "\n", "[Here](https://commons.wikimedia.org/wiki/File:Lp_space_animation.gif#/media/File:Lp_space_animation.gif) is a nice visualization of the unit ball, that is, the set $\\{\\mathbf{x} : \\|\\mathbf{x}\\|_p \\leq 1\\}$, under varying $p$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "There exist many more norms. Formally: \n", "\n", "**Definition (Norm):** A norm is a function $\\ell$ from $\\mathbb{R}^d$ to $\\mathbb{R}_+$ that satisfies for all $a \\in \\mathbb{R}$,\n", "$\\mathbf{u}, \\mathbf{v} \\in \\mathbb{R}^d$\n", "\n", "- *(Homogeneity):* $\\ell(a \\mathbf{u}) = |a| \\ell(\\mathbf{u})$\n", "- *(Triangle inequality):* $\\ell(\\mathbf{u}+\\mathbf{v}) \\leq \\ell(\\mathbf{u}) + \\ell(\\mathbf{v})$\n", "- *(Point-separating):* $\\ell(\\mathbf{u}) = 0$ implies $\\mathbf{u} = \\mathbf{0}$.\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The triangle inequality for the $2$-norm [follows](https://en.wikipedia.org/wiki/Cauchy–Schwarz_inequality#Analysis) from the [Cauchy–Schwarz inequality](https://en.wikipedia.org/wiki/Cauchy–Schwarz_inequality).\n", "\n", "***\n", "**Theorem (Cauchy–Schwarz):** For all $\\mathbf{u}, \\mathbf{v} \\in \\mathbb{R}^d$\n", "\n", "$$\n", "|\\langle \\mathbf{u}, \\mathbf{v} \\rangle| \n", "\\leq \\|\\mathbf{u}\\|_2 \\|\\mathbf{v}\\|_2.\n", "$$\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between two 
vectors $\\mathbf{u}$ and $\\mathbf{v}$ in $\\mathbb{R}^d$ is the $2$-norm of their difference\n", "\n", "$$\n", "d(\\mathbf{u},\\mathbf{v})\n", "= \\|\\mathbf{u} - \\mathbf{v}\\|_2.\n", "$$\n", "\n", "Throughout we use the notation $\\|\\mathbf{x}\\| = \\|\\mathbf{x}\\|_2$ to indicate the $2$-norm of $\\mathbf{x}$ unless specified otherwise." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We will often work with collections of $n$ vectors $\\mathbf{x}_1, \\ldots, \\mathbf{x}_n$ in $\\mathbb{R}^d$ and it will be convenient to stack them up into a matrix\n", "\n", "$$\n", "X =\n", "\\begin{bmatrix}\n", "\\mathbf{x}_1^T \\\\\n", "\\mathbf{x}_2^T \\\\\n", "\\vdots \\\\\n", "\\mathbf{x}_n^T \\\\\n", "\\end{bmatrix}\n", "=\n", "\\begin{bmatrix}\n", "x_{11} & x_{12} & \\cdots & x_{1d} \\\\\n", "x_{21} & x_{22} & \\cdots & x_{2d} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "x_{n1} & x_{n2} & \\cdots & x_{nd} \\\\\n", "\\end{bmatrix},\n", "$$\n", "\n", "where $^T$ indicates the [transpose](https://en.wikipedia.org/wiki/Transpose):\n", "\n", "![transpose](https://upload.wikimedia.org/wikipedia/commons/e/e4/Matrix_transpose.gif)\n", "\n", "([Source](https://commons.wikimedia.org/wiki/File:Matrix_transpose.gif))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**NUMERICAL CORNER** In Julia, a vector can be obtained in different ways. The following method gives a row vector as a two-dimensional array." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Julia version: 1.5.1\n", "using LinearAlgebra, Statistics, Plots" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1×3 Array{Float64,2}:\n", " 1.0 3.0 5.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = [1. 3. 5.]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "To turn it into a one-dimensional array, use [vec](https://docs.julialang.org/en/v1/base/arrays/#Base.vec)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 1.0\n", " 3.0\n", " 5.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vec(t)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To construct a one-dimensional array directly, use commas to separate the entries." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "4-element Array{Float64,1}:\n", " 1.0\n", " 3.0\n", " 5.0\n", " 7.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u = [1., 3., 5., 7.]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To obtain the norm of a vector, we can use the function [norm](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.norm) (which requires the LinearAlgebra package):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "9.16515138991168" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm(u)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "which we can check \"by hand\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "9.16515138991168" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sqrt(sum(u.^2))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The . above is called [broadcasting](https://docs.julialang.org/en/v1.2/manual/mathematical-operations/#man-dot-operators-1). It applies the operator following it (in this case taking a square) element-wise." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Exercise:* Compute the inner product of $u = (1,2,3,4)$ and $v = (5, 4, 3, 2)$ without using the function [dot](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.dot). Hint: The product of two real numbers $a$ and $b$ is a * b." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Try it!\n", "u = [1., 2., 3., 4.];\n", "# EDIT THIS LINE: define v\n", "# EDIT THIS LINE: compute the inner product between u and v" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To create a matrix out of two vectors, we use the function [hcat](https://docs.julialang.org/en/v1.2/base/arrays/#Base.hcat) and transpose." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2×4 Adjoint{Float64,Array{Float64,2}}:\n", " 1.0 3.0 5.0 7.0\n", " 2.0 4.0 6.0 8.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u = [1., 3., 5., 7.];\n", "v = [2., 4., 6., 8.];\n", "X = hcat(u,v)'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With more than two vectors, we can use the [reduce](https://docs.julialang.org/en/v1/base/collections/#Base.reduce-Tuple{Any,Any}) function." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3×4 Adjoint{Float64,Array{Float64,2}}:\n", " 1.0 3.0 5.0 7.0\n", " 2.0 4.0 6.0 8.0\n", " 9.0 8.0 7.0 6.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u = [1., 3., 5., 7.];\n", "v = [2., 4., 6., 8.];\n", "w = [9., 8., 7., 6.];\n", "X = reduce(hcat, [u, v, w])'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 1.2 Multivariable calculus" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "#### 1.2.1 Limits and continuity\n", "\n", "Throughout this section, we use the Euclidean norm $\\|\\mathbf{x}\\| = \\sqrt{\\sum_{i=1}^d x_i^2}$ for $\\mathbf{x} = (x_1,\\ldots, x_d)^T \\in \\mathbb{R}^d$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The open $r$-ball around $\\mathbf{x} \\in \\mathbb{R}^d$ is the set of points within Euclidean distance $r$ of $\\mathbf{x}$, that is,\n", "\n", "$$\n", "B_r(\\mathbf{x}) = \\{\\mathbf{y} \\in \\mathbb{R}^d \\,:\\, \\|\\mathbf{y} - \\mathbf{x}\\| < r\\}.\n", "$$\n", "\n", "A point $\\mathbf{x} \\in \\mathbb{R}^d$ is a limit point (or accumulation point) of a set $A \\subseteq \\mathbb{R}^d$ if every open ball around $\\mathbf{x}$ contains an element $\\mathbf{a}$ of $A$ such that $\\mathbf{a} \\neq \\mathbf{x}$. A set $A$ is closed if every limit point of $A$ belongs to $A$." 
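\n", "\n", "For example, in $\\mathbb{R}$ the half-open interval $(0,1]$ is not closed: $0$ is a limit point of $(0,1]$, since every open ball around $0$ contains points of $(0,1]$, yet $0 \\notin (0,1]$. The closed interval $[0,1]$, by contrast, contains all of its limit points.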
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![limit point](https://1.bp.blogspot.com/-hc5TCEO6Sdo/WkBJtvWoTvI/AAAAAAAAALg/noT83CnNSxQ30_W4xDivMOIMS1Y9k6YHACLcBGAs/s1600/topological_space_accum%2B%25281%2529.png)\n", "\n", "([Source](https://www.math212.com/2017/12/limit-point-closure.html))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A point $\\mathbf{x} \\in \\mathbb{R}^d$ is an interior point of a set $A \\subseteq \\mathbb{R}^d$ if there exists an $r > 0$ such that $B_r(\\mathbf{x}) \\subseteq A$.\n", "A set $A$ is open if it consists entirely of interior points." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![Open set](https://upload.wikimedia.org/wikipedia/commons/thumb/d/de/Open_set_-_example.png/320px-Open_set_-_example.png)\n", "\n", "([Source](https://commons.wikimedia.org/wiki/File:Open_set_-_example.png))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A set $A \\subseteq \\mathbb{R}^d$ is bounded if there exists an $r > 0$ such that $A \\subseteq B_r(\\mathbf{0})$, where $\\mathbf{0} = (0,\\ldots,0)^T$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Limits of a Function):** Let $f: D \\to \\mathbb{R}$ be a real-valued function on $D \\subseteq \\mathbb{R}^d$. Then $f$ is said to have a limit $L \\in \\mathbb{R}$ as $\\mathbf{x}$ approaches $\\mathbf{a}$ if: for any $\\epsilon > 0$, there exists a $\\delta > 0$ such that $|f(\\mathbf{x}) - L| < \\epsilon$ for all $\\mathbf{x} \\in D \\cap B_\\delta(\\mathbf{a})\\setminus \\{\\mathbf{a}\\}$. 
This is written as \n", "\n", "$$\n", "\\lim_{\\mathbf{x} \\to \\mathbf{a}} f(\\mathbf{x}) = L.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Note that we explicitly exclude $\\mathbf{a}$ itself from having to satisfy the condition $|f(\\mathbf{x}) - L| < \\epsilon$. In particular, we may have $f(\\mathbf{a}) \\neq L$. We also do not restrict $\\mathbf{a}$ to be in $D$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![Limit of a function](http://tutorial.math.lamar.edu/Classes/CalcI/TheLimit_Files/image002.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Continuous Function):** Let $f: D \\to \\mathbb{R}$ be a real-valued function on $D \\subseteq \\mathbb{R}^d$. Then $f$ is said to be continuous at $\\mathbf{a} \\in D$ if \n", "\n", "$$\n", "\\lim_{\\mathbf{x} \\to \\mathbf{a}} f(\\mathbf{x}) = f(\\mathbf{a}).\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![Continuous function](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Example_of_continuous_function.svg/232px-Example_of_continuous_function.svg.png)\n", "\n", "([Source](https://commons.wikimedia.org/wiki/File:Example_of_continuous_function.svg))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We will not prove the following fundamental analysis result, which will be useful below. See e.g. [Wikipedia](https://en.wikipedia.org/wiki/Extreme_value_theorem). Suppose $f : D \\to \\mathbb{R}$ is defined on a set $D \\subseteq \\mathbb{R}^d$. We say that $f$ attains a maximum value $M$ at $\\mathbf{z}^*$ if $f(\\mathbf{z}^*) = M$ and $M \\geq f(\\mathbf{x})$ for all $\\mathbf{x} \\in D$. 
Similarly, we say $f$ attains a minimum value $m$ at $\\mathbf{z}_*$ if $f(\\mathbf{z}_*) = m$ and $m \\leq f(\\mathbf{x})$ for all $\\mathbf{x} \\in D$.\n", "\n", "***\n", "\n", "**Theorem (Extreme Value):** Let $f : D \\to \\mathbb{R}$ be a real-valued, continuous function on a nonempty, closed, bounded set $D\\subseteq \\mathbb{R}^d$. Then $f$ attains a maximum and a minimum on $D$. \n", "\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### 1.2.2 Derivatives\n", "\n", "We begin by reviewing the single-variable case. Recall that the derivative of a function of a real variable is the rate of change of the function with respect to the change in the variable. Formally:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Definition (Derivative):** Let $f : D \\to \\mathbb{R}$ where $D \\subseteq \\mathbb{R}$ and let $x_0 \\in D$ be an interior point of $D$. The derivative of $f$ at $x_0$ is \n", "\n", "$$\n", "f'(x_0) \n", "= \\frac{\\mathrm{d} f (x_0)}{\\mathrm{d} x}\n", "= \\lim_{h \\to 0} \\frac{f(x_0 + h) - f(x_0)}{h}\n", "$$\n", "\n", "provided the limit exists. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![Derivative](https://upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Tangent_to_a_curve.svg/320px-Tangent_to_a_curve.svg.png)\n", "\n", "([Source](https://commons.wikimedia.org/wiki/File:Tangent_to_a_curve.svg))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Exercise:* Let $f$ and $g$ have derivatives at $x$ and let $\\alpha$ and $\\beta$ be constants. 
Show that \n", "\n", "$$\n", "[\\alpha f(x) + \\beta g(x)]' = \\alpha f'(x) + \\beta g'(x).\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following lemma encapsulates a key insight about the derivative of $f$ at $x_0$: it tells us where to find smaller values." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "***\n", "\n", "**Lemma (Descent Direction):** Let $f : D \\to \\mathbb{R}$ with $D \\subseteq \\mathbb{R}$ and let $x_0 \\in D$ be an interior point of $D$ where $f'(x_0)$ exists. If $f'(x_0) > 0$, then there is an open ball $B_\\delta(x_0) \\subseteq D$ around $x_0$\n", "such that for each $x$ in $B_\\delta(x_0)$:\n", "\n", "(a) $f(x) > f(x_0)$ if $x > x_0$,\n", "(b) $f(x) < f(x_0)$ if $x < x_0$.\n", "\n", "If instead $f'(x_0) < 0$, the opposite holds.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "*Proof idea:* Follows from the definition of the derivative by taking $\\epsilon$ small enough that $f'(x_0) - \\epsilon > 0$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* Take $\\epsilon = f'(x_0)/2$. By definition of the derivative, there is $\\delta > 0$ such that \n", "\n", "$$\n", "f'(x_0)\n", "-\n", "\\frac{f(x_0 + h) - f(x_0)}{h}\n", "< \\epsilon\n", "$$\n", "\n", "for all $0 < h < \\delta$. Rearranging gives\n", "\n", "$$\n", "f(x_0 + h) \n", "> f(x_0) + [f'(x_0) - \\epsilon] h\n", "> f(x_0)\n", "$$\n", "\n", "by our choice of $\\epsilon$. The other direction is similar. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For functions of several variables, we have the following generalization. As before, we let $\\mathbf{e}_i \\in \\mathbb{R}^d$ be the $i$-th standard basis vector." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Definition (Partial Derivative):** Let $f : D \\to \\mathbb{R}$ where $D \\subseteq \\mathbb{R}^d$ and let $\\mathbf{x}_0 \\in D$ be an interior point of $D$. The partial derivative of $f$ at $\\mathbf{x}_0$ with respect to $x_i$ is \n", "\n", "$$\n", "\\frac{\\partial f (\\mathbf{x}_0)}{\\partial x_i} \n", "= \\lim_{h \\to 0} \\frac{f(\\mathbf{x}_0 + h \\mathbf{e}_i) - f(\\mathbf{x}_0)}{h}\n", "$$\n", "\n", "provided the limit exists. If $\\frac{\\partial f (\\mathbf{x}_0)}{\\partial x_i}$ exists and is continuous in an open ball around $\\mathbf{x}_0$ for all $i$, then we say that $f$ is continuously differentiable at $\\mathbf{x}_0$. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Jacobian):** Let $\\mathbf{f} = (f_1, \\ldots, f_m) : D \\to \\mathbb{R}^m$ where $D \\subseteq \\mathbb{R}^d$ and let $\\mathbf{x}_0 \\in D$ be an interior point of $D$ where $\\frac{\\partial f_j (\\mathbf{x}_0)}{\\partial x_i}$ exists for all $i, j$. The Jacobian of $\\mathbf{f}$ at $\\mathbf{x}_0$ is the $m \\times d$ matrix\n", "\n", "$$\n", "\\mathbf{J}_{\\mathbf{f}}(\\mathbf{x}_0)\n", "= \\begin{pmatrix}\n", "\\frac{\\partial f_1 (\\mathbf{x}_0)}{\\partial x_1}\n", "& \\ldots & \\frac{\\partial f_1 (\\mathbf{x}_0)}{\\partial x_d}\\\\\n", "\\vdots & \\ddots & \\vdots\\\\\n", "\\frac{\\partial f_m (\\mathbf{x}_0)}{\\partial x_1}\n", "& \\ldots & \\frac{\\partial f_m (\\mathbf{x}_0)}{\\partial x_d}\n", "\\end{pmatrix}.\n", "$$\n", "\n", "For a real-valued function $f : D \\to \\mathbb{R}$, the Jacobian reduces to the row vector\n", "\n", "$$\n", "\\mathbf{J}_{f}(\\mathbf{x}_0)\n", "= \\nabla f(\\mathbf{x}_0)^T\n", "$$\n", "\n", "where the vector\n", "\n", "$$\n", "\\nabla f(\\mathbf{x}_0) \n", "= \\left(\\frac{\\partial f (\\mathbf{x}_0)}{\\partial x_1}, \\ldots, \\frac{\\partial f (\\mathbf{x}_0)}{\\partial x_d}\\right)^T\n", "$$\n", "\n", "is the gradient of $f$ at $\\mathbf{x}_0$. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Example:** Consider the affine function \n", "\n", "$$\n", "f(\\mathbf{x})\n", "= \\mathbf{q}^T \\mathbf{x} + r\n", "$$\n", "\n", "where $\\mathbf{x} = (x_1, \\ldots, x_d)^T, \\mathbf{q} = (q_1, \\ldots, q_d)^T \\in \\mathbb{R}^d$. 
The partial derivatives of the linear term are given by\n", "\n", "$$\n", "\\frac{\\partial}{\\partial x_i}\n", "[\\mathbf{q}^T \\mathbf{x}]\n", "= \\frac{\\partial}{\\partial x_i}\n", "\\left[\\sum_{j=1}^d q_j x_j\n", "\\right]\n", "= q_i.\n", "$$\n", "\n", "So the gradient of $f$ is\n", "\n", "$$\n", "\\nabla f(\\mathbf{x})\n", "= \\mathbf{q}.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Example:** Consider the quadratic function \n", "\n", "$$\n", "f(\\mathbf{x})\n", "= \\frac{1}{2} \\mathbf{x}^T P \\mathbf{x} + \\mathbf{q}^T \\mathbf{x} + r.\n", "$$\n", "\n", "where $\\mathbf{x} = (x_1, \\ldots, x_d)^T, \\mathbf{q} = (q_1, \\ldots, q_d)^T \\in \\mathbb{R}^d$ and $P \\in \\mathbb{R}^{d \\times d}$. The partial derivatives of the quadratic term are given by\n", "\n", "\n", "\\begin{align*}\n", "\\frac{\\partial}{\\partial x_i}\n", "[\\mathbf{x}^T P \\mathbf{x}]\n", "&= \\frac{\\partial}{\\partial x_i}\n", "\\left[\\sum_{j, k=1}^d P_{jk} x_j x_k\n", "\\right]\\\\\n", "&= \\frac{\\partial}{\\partial x_i}\n", "\\left[P_{ii} x_i^2 \n", "+ \\sum_{j=1, j\\neq i}^d P_{ji} x_j x_i\n", "+ \\sum_{k=1, k\\neq i}^d P_{ik} x_i x_k\n", "\\right]\\\\\n", "&= 2 P_{ii} x_i\n", "+ \\sum_{j=1, j\\neq i}^d P_{ji} x_j\n", "+ \\sum_{k=1, k\\neq i}^d P_{ik} x_k\\\\\n", "&= \\sum_{j=1}^d [P^T]_{ij} x_j + \\sum_{k=1}^d [P]_{ik} x_k.\n", "\\end{align*}\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "So the gradient of $f$ is\n", "\n", "$$\n", "\\nabla f(\\mathbf{x})\n", "= \\frac{1}{2}[P + P^T] \\,\\mathbf{x}\n", "+ \\mathbf{q}.\n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### 1.2.3 Optimization" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Optimization problems play an important role in data science. 
Here we look at unconstrained optimization problems of the form:\n", "\n", "$$\n", "\\min_{\\mathbf{x} \\in \\mathbb{R}^d} f(\\mathbf{x})\n", "$$\n", "\n", "where $f : \\mathbb{R}^d \\to \\mathbb{R}$. Ideally, we would like to find a global minimizer to the optimization problem above." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Global Minimizer):** Let $f : \\mathbb{R}^d \\to \\mathbb{R}$. The point $\\mathbf{x}^* \\in \\mathbb{R}^d$ is a global minimizer of $f$ over $\\mathbb{R}^d$ if \n", "\n", "$$\n", "f(\\mathbf{x}) \n", "\\geq f(\\mathbf{x}^*), \\quad \\forall \\mathbf{x} \\in \\mathbb{R}^d. \n", "$$\n", "\n", "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Example:** The function $f(x) = x^2$ over $\\mathbb{R}$ has a global minimizer at $x^* = 0$. Indeed, we clearly have $f(x) \\geq 0$ for all $x$ while $f(0) = 0$." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = x^2\n", "x = LinRange(-2,2,100)\n", "y = f.(x)\n", "plot(x, y, lw=2, legend=false)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The function $f(x) = e^x$ over $\\mathbb{R}$ does not have a global minimizer. Indeed, $f(x) > 0$ but no $x$ achieves $0$. And, for any $m > 0$, there is $x$ small enough such that $f(x) < m$. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = exp(x)\n", "x = LinRange(-2,2, 100)\n", "y = f.(x)\n", "plot(x, y, lw=2, legend=false, ylim = (0,5))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The function $f(x) = (x+1)^2 (x-1)^2$ over $\\mathbb{R}$ has two global minimizers at $x^* = -1$ and $x^{**} = 1$. Indeed, $f(x) \\geq 0$ and $f(x) = 0$ if and only if $x = x^*$ or $x = x^{**}$." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = (x+1)^2*(x-1)^2\n", "x = LinRange(-2,2, 100)\n", "y = f.(x)\n", "plot(x, y, lw=2, legend=false, ylim = (0,5))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "$\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In general, finding a global minimizer and certifying that one has been found can be difficult unless some special structure is present. Therefore weaker notions of solution have been introduced." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Definition (Local Minimizer):** Let $f : \\mathbb{R}^d \\to \\mathbb{R}$. The point $\\mathbf{x}^* \\in \\mathbb{R}^d$ is a local minimizer of $f$ over $\\mathbb{R}^d$ if there is $\\delta > 0$ such that \n", "\n", "$$\n", "f(\\mathbf{x}) \n", "\\geq f(\\mathbf{x}^*), \\quad \\forall \\mathbf{x} \\in B_{\\delta}(\\mathbf{x}^*) \\setminus \\{\\mathbf{x}^*\\}. \n", "$$\n", "\n", "If the inequality is strict, we say that $\\mathbf{x}^*$ is a strict local minimizer. $\\lhd$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In words, $\\mathbf{x}^*$ is a local minimizer if there is an open ball around $\\mathbf{x}^*$ on which it attains the minimum value. The difference between global and local minimizers is illustrated in the next figure.\n", "\n", "![Local minimizer](https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Extrema_example_original.svg/300px-Extrema_example_original.svg.png)\n", "\n", "([Source](https://commons.wikimedia.org/wiki/File:Extrema_example_original.svg))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Local minimizers can be characterized in terms of the gradient, at least via a necessary condition. We will prove this result later in the course." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "***\n", "\n", "**Theorem (First-Order Necessary Condition):** Let $f : \\mathbb{R}^d \\to \\mathbb{R}$ be continuously differentiable on $\\mathbb{R}^d$. 
If $\\mathbf{x}_0$ is a local minimizer, then $\\nabla f(\\mathbf{x}_0) = 0$.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 1.3 Probability" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "#### 1.3.1 Expectation, variance and Chebyshev's inequality" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Recall that the [expectation](https://en.wikipedia.org/wiki/Expected_value) (or mean) of a function $h$ of a discrete random variable $X$ taking values in $\\mathcal{X}$ is given by\n", "\n", "$$\n", "\\mathbb{E}[h(X)]\n", "= \\sum_{x \\in \\mathcal{X}} h(x)\\,p_X(x)\n", "$$\n", "\n", "where $p_X(x) = \\mathbb{P}[X = x]$ is the [probability mass function](https://en.wikipedia.org/wiki/Probability_mass_function) (PMF) of $X$.\n", "In the continuous case, we have\n", "\n", "$$\n", "\\mathbb{E}[h(X)]\n", "= \\int h(x) f_X(x)\\,\\mathrm{d}x\n", "$$\n", "\n", "if $f_X$ is the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function) (PDF) of $X$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Two key properties of the expectation: \n", "\n", "- *linearity*, that is,\n", "\n", "$$\n", "\\mathbb{E}[\\alpha h(X) + \\beta] = \\alpha \\,\\mathbb{E}[h(X)] + \\beta\n", "$$\n", "\n", "- *monotonicity*, that is, if $h_1(x) \\leq h_2(x)$ for all $x$ then\n", "\n", "$$\n", "\\mathbb{E}[h_1(X)] \\leq \\mathbb{E}[h_2(X)].\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The [variance](https://en.wikipedia.org/wiki/Variance) of a real-valued random variable $X$ is\n", "\n", "$$\n", "\\mathrm{Var}[X] = \\mathbb{E}[(X - \\mathbb{E}[X])^2]\n", "$$\n", "\n", "and its standard deviation is $\\sigma_X = \\sqrt{\\mathrm{Var}[X]}$. 
The variance does not satisfy linearity, but we have the following property\n", "\n", "$$\n", "\\mathrm{Var}[\\alpha X + \\beta] = \\alpha^2 \\,\\mathrm{Var}[X].\n", "$$\n", "\n", "The variance is a measure of the typical deviation of $X$ around its mean.\n", "A quantified version of this statement is given by Chebyshev's inequality." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "**Lemma (Chebyshev)** For a random variable $X$ with finite variance, we have for any $\\alpha > 0$\n", "\n", "$$\n", "\\mathbb{P}[|X - \\mathbb{E}[X]| \\geq \\alpha] \n", "\\leq \\frac{\\mathrm{Var}[X]}{\\alpha^2}.\n", "$$\n", "***\n", "\n", "The intuition is the following: if the expected squared deviation from the mean is small, then the deviation from the mean is unlikely to be large." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To formalize this we prove a more general inequality, Markov's inequality. In words, if a non-negative random variable has a small expectation then it is unlikely to be large.\n", "\n", "***\n", "**Lemma (Markov)** Let $Z$ be a non-negative random variable with finite expectation. Then, for any $\\beta > 0$,\n", "\n", "$$\n", "\\mathbb{P}[Z \\geq \\beta] \\leq \\frac{\\mathbb{E}[Z]}{\\beta}.\n", "$$\n", "***\n", "\n", "*Proof idea:* The quantity $\\beta \\,\\mathbb{P}[Z \\geq \\beta]$ is a lower bound on the expectation of $Z$ restricted to the range $\\{Z\\geq \\beta\\}$, which by non-negativity is itself at most $\\mathbb{E}[Z]$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* Formally, let $\\mathbf{1}_A$ be the indicator of the event $A$, that is, it is the random variable that is $1$ when $A$ occurs and $0$ otherwise. By definition, the expectation of $\\mathbf{1}_A$ is \n", "\n", "$$\n", "\\mathbb{E}[\\mathbf{1}_A] = 0\\,\\mathbb{P}[\\mathbf{1}_A = 0] + 1\\,\\mathbb{P}[\\mathbf{1}_A = 1] = \\mathbb{P}[A].\n", "$$\n", "\n", "Hence, by linearity and monotonicity,\n", "\n", "$$\n", "\\beta \\,\\mathbb{P}[Z \\geq \\beta] \n", "= \\beta \\,\\mathbb{E}[\\mathbf{1}_{Z \\geq \\beta}]\n", "= \\mathbb{E}[\\beta \\mathbf{1}_{Z \\geq \\beta}]\n", "\\leq \\mathbb{E}[Z].\n", "$$\n", "\n", "Rearranging gives the claim. $\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Finally we return to the proof of *Chebyshev*.\n", "\n", "*Proof idea (Chebyshev):* Simply apply Markov to the squared deviation of $X$ from its mean. \n", "\n", "*Proof (Chebyshev):* Let $Z = (X - \\mathbb{E}[X])^2$, which is non-negative by definition. Hence, by *Markov*, with $\\beta = \\alpha^2 > 0$\n", "\n", "\n", "\\begin{align}\n", "\\mathbb{P}[|X - \\mathbb{E}[X]| \\geq \\alpha]\n", "&= \\mathbb{P}[(X - \\mathbb{E}[X])^2 \\geq \\alpha^2]\\\\\n", "&= \\mathbb{P}[Z \\geq \\beta]\\\\\n", "&\\leq \\frac{\\mathbb{E}[Z]}{\\beta}\\\\\n", "&= \\frac{\\mathrm{Var}[X]}{\\alpha^2}\n", "\\end{align}\n", "\n", "where we used the definition of the variance in the last equality. $\\square$\n", "\n", "Chebyshev's inequality is particularly useful when combined with independence." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "#### 1.3.2 Independence and limit theorems" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Recall that discrete random variables $X$ and $Y$ are independent if their joint PMF factorizes, that is\n", "\n", "$$\n", "p_{X,Y}(x,y) = p_X(x) \\,p_Y(y), \\qquad \\forall x, y\n", "$$\n", "\n", "where $p_{X,Y}(x,y) = \\mathbb{P}[X=x, Y=y]$. 
Similarly, continuous random variables $X$ and $Y$ are independent if their joint PDF factorizes. One consequence is that expectations of products of single-variable functions factorize as well, that is, for functions $g$ and $h$ we have\n", "\n", "$$\n", "\\mathbb{E}[g(X) h(Y)]\n", "= \\mathbb{E}[g(X)] \\,\\mathbb{E}[h(Y)].\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The latter has the following important implication for the variance. If $X_1, \\ldots, X_n$ are independent, real-valued random variables, then\n", "\n", "$$\n", "\\mathrm{Var}[X_1 + \\cdots + X_n]\n", "= \\mathrm{Var}[X_1] + \\cdots + \\mathrm{Var}[X_n].\n", "$$\n", "\n", "Notice that, unlike the case of the expectation, this equation for the variance requires independence in general." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Applied to the sample mean of $n$ independent, identically distributed (i.i.d.) 
random variables $X_1,\\ldots,X_n$, we obtain\n", "\n", "\n", "\\begin{align}\n", "\\mathrm{Var}\n", "\\left[\\frac{1}{n} \\sum_{i=1}^n X_i\\right]\n", "&= \\frac{1}{n^2} \\sum_{i=1}^n \\mathrm{Var}[X_i]\\\\\n", "&= \\frac{1}{n^2} n \\,\\mathrm{Var}[X_1]\\\\\n", "&= \\frac{\\mathrm{Var}[X_1]}{n}.\n", "\\end{align}\n", "\n", "\n", "So the variance of the sample mean decreases as $n$ gets large, while its expectation remains the same by linearity:\n", "\n", "\n", "\\begin{align}\n", "\\mathbb{E}\n", "\\left[\\frac{1}{n} \\sum_{i=1}^n X_i\\right]\n", "&= \\frac{1}{n} \\sum_{i=1}^n \\mathbb{E}[X_i]\\\\\n", "&= \\frac{1}{n} n \\,\\mathbb{E}[X_1]\\\\\n", "&= \\mathbb{E}[X_1].\n", "\\end{align}\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Together with Chebyshev's inequality, we immediately get that the sample mean approaches its expectation in the following probabilistic sense.\n", "\n", "***\n", "**Theorem (Law of Large Numbers):** Let $X_1, \\ldots, X_n$ be i.i.d. random variables with finite variance. For any $\\varepsilon > 0$, as $n \\to +\\infty$,\n", "\n", "$$\n", "\\mathbb{P}\\left[\\left|\\frac{1}{n} \\sum_{i=1}^n X_i \n", "- \\mathbb{E}[X_1]\\right| \\geq \\varepsilon\\right]\n", "\\to 0.\n", "$$\n", "***" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof:* By *Chebyshev* and the formulas above,\n", "\n", "\n", "\\begin{align}\n", "\\mathbb{P}\\left[\\left|\\frac{1}{n} \\sum_{i=1}^n X_i \n", "- \\mathbb{E}[X_1]\\right| \\geq \\varepsilon\\right]\n", "&= \\mathbb{P}\\left[\\left|\\frac{1}{n} \\sum_{i=1}^n X_i \n", "- \\mathbb{E} \\left[\\frac{1}{n} \\sum_{i=1}^n X_i\\right]\\right| \\geq \\varepsilon\\right]\\\\\n", "&\\leq \\frac{\\mathrm{Var}\\left[\\frac{1}{n} \\sum_{i=1}^n X_i\\right]}{\\varepsilon^2}\\\\\n", "&= \\frac{\\mathrm{Var}[X_1]}{n \\varepsilon^2}\\\\\n", "&\\to 0\n", "\\end{align}\n", "\n", "\n", "as $n \\to +\\infty$. 
$\\square$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**NUMERICAL CORNER** We can use simulations to confirm the *Law of Large Numbers*. Recall that a uniform random variable over the interval $[a,b]$ has density\n", "\n", "$$\n", "f_{X}(x)\n", "= \\begin{cases}\n", "\\frac{1}{b-a} & x \\in [a,b] \\\\\n", "0 & \\text{o.w.}\n", "\\end{cases}\n", "$$\n", "\n", "We write $X \\sim \\mathrm{U}[a,b]$. We can obtain a sample from $\\mathrm{U}[0,1]$ by using the function [rand](https://docs.julialang.org/en/v1.2/stdlib/Random/#Base.rand) in Julia. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1-element Array{Float64,1}:\n", " 0.6833180625581647" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rand(1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now we take $n$ samples from $\\mathrm{U}[0,1]$ and compute their sample mean. We repeat $k$ times and display the empirical distribution of the sample means using a [histogram](https://en.wikipedia.org/wiki/Histogram)."
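,
"\n",
"For $X_1 \\sim \\mathrm{U}[0,1]$ we have $\\mathrm{Var}[X_1] = \\frac{1}{12}$, so by the formula above the sample mean has standard deviation $\\sqrt{1/(12n)}$. As a rough guide (a sketch; the helper function below is an illustration, not part of the original notebook), this predicts how concentrated the histograms should be:\n",
"\n",
"```julia\n",
"# theoretical standard deviation of the sample mean of n i.i.d. U[0,1] variables\n",
"sample_mean_std(n) = sqrt(1 / (12n))\n",
"sample_mean_std(10)   # ≈ 0.091\n",
"sample_mean_std(100)  # ≈ 0.029\n",
"```\n",
"\n",
"so the spread should shrink by a factor of about $\\sqrt{10}$ when $n$ goes from $10$ to $100$."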
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "lln_unif (generic function with 1 method)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function lln_unif(n, k) \n", " sample_mean = [mean(rand(n)) for i=1:k]\n", " histogram(sample_mean, \n", " legend=false, title=\"n=$n\", xlims=(0,1), nbin=15) # \"$n\" is string with value n\n", "end " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lln_unif(10, 1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Taking $n$ much larger leads to more concentration around the mean." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lln_unif(100, 1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Exercise:* Recall that the cumulative distribution function (CDF) of a random variable $X$ is defined as \n", "\n", "$$\n", "F_X(z) = \\mathbb{P}[X \\leq z], \\qquad \\forall z \\in \\mathbb{R}.\n", "$$\n", "\n", "(a) Let $\\mathcal{Z}$ be the interval where $F_X(z) \\in (0,1)$ and assume that $F_X$ is strictly increasing on $\\mathcal{Z}$. Let $U \\sim \\mathrm{U}[0,1]$. Show that \n", "\n", "$$\n", "\\mathbb{P}[F_X^{-1}(U) \\leq z] = F_X(z).\n", "$$\n", "\n", "(b) Generate a sample from $\\mathrm{U}[a,b]$ for arbitrary $a$, $b$ using rand and the observation in (a). This is called the inverse transform sampling method." 
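,
"\n",
"To illustrate the method on a different target (a sketch, so as not to give away the exercise): if $X$ is exponential with rate $\\lambda$, then $F_X(z) = 1 - e^{-\\lambda z}$ on $\\mathcal{Z} = (0,+\\infty)$, so $F_X^{-1}(u) = -\\frac{1}{\\lambda} \\log(1-u)$ and part (a) justifies sampling via\n",
"\n",
"```julia\n",
"lambda = 2.0              # rate parameter (arbitrary choice)\n",
"U = rand()                # U ~ U[0,1]\n",
"X = -log(1 - U) / lambda  # X ~ Exponential(lambda) by inverse transform\n",
"```\n",
"\n",
"The same recipe applies to $\\mathrm{U}[a,b]$ once its CDF is written down."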
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Try it!\n", "a, b = -1, 1;\n", "X = rand(1);\n", "# EDIT THIS LINE: transform X to obtain a random variable Y ~ U[a,b]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "$\\lhd$" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.1", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }