# Optimality, convexity, and gradient descent¶

## 3 Convexity¶

Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Oct 10, 2020

### 3.1 Convex sets¶

Definition (Convex Set): A set $D \subseteq \mathbb{R}^d$ is convex if for all $\mathbf{x}, \mathbf{y} \in D$ and all $\alpha \in [0,1]$

$$(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in D.$$

$\lhd$

(Figure: an example of a convex set, where every segment between two points of the set stays in the set, and an example of a non-convex set, where some segment leaves the set.)

Example: Any open ball in $\mathbb{R}^d$ is convex. Indeed, let $\delta > 0$ and $\mathbf{x}_0 \in \mathbb{R}^d$. For any $\mathbf{x}, \mathbf{y} \in B_{\delta}(\mathbf{x}_0)$ and any $\alpha \in [0,1]$, we have

\begin{align*} \|[(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] - \mathbf{x}_0\| &= \|(1-\alpha) (\mathbf{x} - \mathbf{x}_0) + \alpha (\mathbf{y} - \mathbf{x}_0)\|\\ &\leq \|(1-\alpha) (\mathbf{x} - \mathbf{x}_0)\| + \|\alpha (\mathbf{y} - \mathbf{x}_0)\|\\ &= (1-\alpha) \|\mathbf{x} - \mathbf{x}_0\| + \alpha \|\mathbf{y} - \mathbf{x}_0\|\\ &\leq (1-\alpha) \delta + \alpha \delta\\ &= \delta \end{align*}

where we used the triangle inequality on the second line and the absolute homogeneity of the norm on the third. Hence we have established that $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in B_{\delta}(\mathbf{x}_0)$. $\lhd$
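As a quick numerical sanity check (a sketch, not part of the notes), we can sample two points in an open ball and verify that their convex combinations stay inside, mirroring the computation above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta = 3, 1.0
x0 = rng.standard_normal(d)

# Sample a point uniformly in the open ball B_delta(x0):
# random direction, radius scaled by u^(1/d) for uniformity.
def sample_in_ball(rng, x0, delta, d):
    v = rng.standard_normal(d)
    return x0 + (delta * rng.uniform(0, 1) ** (1 / d)) * v / np.linalg.norm(v)

x = sample_in_ball(rng, x0, delta, d)
y = sample_in_ball(rng, x0, delta, d)

# Every convex combination (1-a) x + a y should remain in the ball.
alphas = np.linspace(0, 1, 11)
combos = [(1 - a) * x + a * y for a in alphas]
assert all(np.linalg.norm(z - x0) < delta for z in combos)
```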

A number of operations preserve convexity.

Lemma (Operations that Preserve Convexity): Let $S_1$ and $S_2$ be convex sets in $\mathbb{R}^d$. Let $\beta \in \mathbb{R}$ and $\mathbf{b} \in \mathbb{R}^d$. The following sets are also convex:

(a) $\beta S_1 = \{\beta \mathbf{x}\,:\, \mathbf{x} \in S_1\}$

(b) $S_1 + \mathbf{b} = \{\mathbf{x} + \mathbf{b}\,:\, \mathbf{x} \in S_1\}$

(c) $S_1 + S_2 = \{\mathbf{x}_1 + \mathbf{x}_2\,:\, \mathbf{x}_1 \in S_1 \text{ and } \mathbf{x}_2 \in S_2\}$

(d) $S_1 \times S_2 = \{(\mathbf{x}_1, \mathbf{x}_2) \,:\, \mathbf{x}_1 \in S_1 \text{ and } \mathbf{x}_2 \in S_2\}$

(e) the projection $T = \{\mathbf{x}_1 \in \mathbb{R}^d \,:\, (\mathbf{x}_1, \mathbf{x}_2) \in S \text{ for some } \mathbf{x}_2 \in \mathbb{R}^d\}$ of a convex set $S \subseteq \mathbb{R}^{2d}$

(f) $S_1 \cap S_2 = \{\mathbf{x}\,:\, \mathbf{x} \in S_1 \text{ and } \mathbf{x} \in S_2\}$

Proof: We only prove (f). The other statements are left as an exercise. Suppose $\mathbf{x}, \mathbf{y} \in S_1 \cap S_2$ and $\alpha \in [0,1]$. Then, by the convexity of $S_1$, $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_1$ and, by the convexity of $S_2$, $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_2$. Hence

$$(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_1 \cap S_2.$$

$\square$
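Part (f) can also be checked numerically. The sketch below (an illustration, with arbitrarily chosen balls) samples two points in the intersection of two overlapping balls, each convex by the earlier example, and verifies that convex combinations stay in both:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two overlapping open balls in R^2; each is convex, so their
# intersection should be convex by part (f).
c1, c2, delta = np.array([0.0, 0.0]), np.array([1.0, 0.0]), 1.0
in_both = lambda z: (np.linalg.norm(z - c1) < delta
                     and np.linalg.norm(z - c2) < delta)

# Rejection-sample two points in the intersection.
pts = []
while len(pts) < 2:
    z = rng.uniform(-1, 2, size=2)
    if in_both(z):
        pts.append(z)
x, y = pts

# Convex combinations remain in the intersection.
for a in np.linspace(0, 1, 11):
    assert in_both((1 - a) * x + a * y)
```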

### 3.2 Convex functions¶

Our main interest is in convex functions.

Definition (Convex Function): A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if, for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and all $\alpha \in [0,1]$

$$f((1-\alpha) \mathbf{x} + \alpha \mathbf{y}) \leq (1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}).$$

More generally, a function $f : D \to \mathbb{R}$ over a convex domain $D \subseteq \mathbb{R}^d$ is convex if the definition above holds for all $\mathbf{x}, \mathbf{y} \in D$. $\lhd$


Lemma (Affine Functions are Convex): Let $\mathbf{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The function $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$ is convex.

Proof: For any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and $\alpha \in [0,1]$,

$$f((1-\alpha) \mathbf{x} + \alpha \mathbf{y}) = \mathbf{w}^T [(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] + b = (1-\alpha)[\mathbf{w}^T \mathbf{x} + b] + \alpha [\mathbf{w}^T \mathbf{y} + b]$$

which proves the claim. $\square$
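The proof shows that an affine function in fact satisfies the defining inequality with equality. A small numerical check of this (an illustrative sketch with randomly chosen $\mathbf{w}$, $b$):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
w, b = rng.standard_normal(d), 0.5
f = lambda x: w @ x + b  # affine function f(x) = w^T x + b

x, y = rng.standard_normal(d), rng.standard_normal(d)
for a in np.linspace(0, 1, 6):
    lhs = f((1 - a) * x + a * y)
    rhs = (1 - a) * f(x) + a * f(y)
    # Equality holds: affine functions are both convex and concave.
    assert np.isclose(lhs, rhs)
```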

Lemma (First-Order Convexity Condition): Let $f : \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable. Then $f$ is convex if and only if for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$

$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{y}-\mathbf{x}).$$

Geometrically, the condition says that the graph of $f$ lies above each of its tangent hyperplanes.

Proof (First-Order Convexity Condition): Suppose first that $f(\mathbf{z}_2) \geq f(\mathbf{z}_1) + \nabla f(\mathbf{z}_1)^T (\mathbf{z}_2-\mathbf{z}_1)$ for all $\mathbf{z}_1, \mathbf{z}_2$. For any $\mathbf{x}, \mathbf{y}$ and $\alpha \in [0,1]$, let $\mathbf{w} = (1-\alpha) \mathbf{x} + \alpha \mathbf{y}$. Then taking $\mathbf{z}_1 = \mathbf{w}$ and $\mathbf{z}_2 = \mathbf{x}$ gives

$$f(\mathbf{x}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T (\mathbf{x}-\mathbf{w})$$

and taking $\mathbf{z}_1 = \mathbf{w}$ and $\mathbf{z}_2 = \mathbf{y}$ gives

$$f(\mathbf{y}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T (\mathbf{y}-\mathbf{w}).$$

Multiplying the first inequality by $(1-\alpha)$ and the second one by $\alpha$, and adding them up gives

$$(1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T ([(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] - \mathbf{w}) = f(\mathbf{w})$$

proving convexity.

For the other direction, assume that $f$ is convex. For any $\mathbf{x}, \mathbf{y}$ and $\alpha \in (0,1)$, by the Multivariate Mean Value Theorem, for some $\xi \in (0,1)$ it holds that

$$f(\mathbf{w}) = f(\mathbf{x} + \alpha (\mathbf{y} - \mathbf{x})) = f(\mathbf{x}) + \alpha (\mathbf{y} - \mathbf{x})^T \nabla f (\mathbf{x} + \xi \alpha (\mathbf{y} - \mathbf{x}))$$

while convexity implies

$$f(\mathbf{w}) \leq (1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}).$$

Combining, rearranging and dividing by $\alpha$ gives

$$(\mathbf{y} - \mathbf{x})^T \nabla f (\mathbf{x} + \xi \alpha (\mathbf{y} - \mathbf{x})) \leq f(\mathbf{y}) - f(\mathbf{x}).$$

Taking $\alpha \to 0$ gives the claim. $\square$
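The first-order condition is easy to test numerically for a concrete convex function. Here is a sketch using $f(\mathbf{x}) = \|\mathbf{x}\|^2$ (with gradient $\nabla f(\mathbf{x}) = 2\mathbf{x}$), checking the inequality at random point pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
f = lambda x: x @ x      # f(x) = ||x||^2, a convex function
grad = lambda x: 2 * x   # its gradient

# First-order condition: f(y) >= f(x) + grad f(x)^T (y - x).
for _ in range(100):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
```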

Lemma (Second-Order Convexity Condition): Let $f : \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable. Then $f$ is convex if and only if, for all $\mathbf{x} \in \mathbb{R}^d$, $\mathbf{H}_f(\mathbf{x})$ is positive semidefinite.

Example: Let $P \in \mathbb{R}^{d \times d}$, $\mathbf{q} \in \mathbb{R}^d$, and $r \in \mathbb{R}$, and consider the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r.$$

We showed previously that the Hessian is

$$\mathbf{H}_f(\mathbf{x}) = \frac{1}{2}[P + P^T].$$

So $f$ is convex if and only if the matrix $\frac{1}{2}[P + P^T]$ has only nonnegative eigenvalues. $\lhd$
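This eigenvalue criterion can be checked directly with `numpy`. Below is a sketch with a randomly generated $P = AA^T$, which is positive semidefinite by construction, so the corresponding quadratic $f$ is convex:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
P = A @ A.T  # positive semidefinite by construction

# Hessian of f(x) = x^T P x / 2 + q^T x + r.
H = (P + P.T) / 2

# f is convex iff all eigenvalues of H are nonnegative.
eigs = np.linalg.eigvalsh(H)
assert np.all(eigs >= -1e-10)
```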

### 3.3 Convexity and unconstrained optimization¶

A key property of convex functions is the following.

Theorem (Global Minimizers of Convex Functions): Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function. Then any local minimizer of $f$ is also a global minimizer.

Proof: By contradiction, suppose $\mathbf{x}_0$ is a local minimizer, but not a global minimizer. Then there is $\mathbf{y}$ such that

$$f(\mathbf{y}) < f(\mathbf{x}_0).$$

By convexity, for any $\alpha \in (0,1)$

$$f(\mathbf{x}_0 + \alpha (\mathbf{y} - \mathbf{x}_0)) \leq (1-\alpha) f(\mathbf{x}_0) + \alpha f(\mathbf{y}) < f(\mathbf{x}_0).$$

But that implies that every open ball around $\mathbf{x}_0$ contains a point taking a smaller value than $f(\mathbf{x}_0)$, a contradiction. $\square$

Theorem (First-Order Sufficient Condition for Convex Functions): Let $f : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable, convex function. If $\nabla f(\mathbf{x}_0) = \mathbf{0}$ then $\mathbf{x}_0$ is a global minimizer.

Proof: Assume $\nabla f(\mathbf{x}_0) = \mathbf{0}$. By the First-Order Convexity Condition, for any $\mathbf{y}$

$$f(\mathbf{y}) - f(\mathbf{x}_0) \geq \nabla f(\mathbf{x}_0)^T (\mathbf{y} - \mathbf{x}_0) = 0.$$

That proves the claim. $\square$
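Since the chapter concerns gradient descent, here is a minimal sketch connecting to this theorem (assumptions mine: a fixed step size and the specific convex function $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|^2$): the iterates approach the unique stationary point, which by the theorem is the global minimizer.

```python
import numpy as np

# Gradient descent on the convex function f(x) = ||x - c||^2,
# whose only stationary point x* = c is the global minimizer.
c = np.array([1.0, -2.0, 3.0])
grad = lambda x: 2 * (x - c)

x = np.zeros(3)
step = 0.1
for _ in range(200):
    x = x - step * grad(x)  # each step contracts x - c by a factor 0.8

assert np.allclose(x, c, atol=1e-6)  # iterates reach the global minimizer
```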

Example: Consider again the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r$$

where $P$ is symmetric and positive definite. The Hessian is then

$$\mathbf{H}_f(\mathbf{x}) = \frac{1}{2}[P + P^T] = P$$

for any $\mathbf{x}$. So $f$ is convex. Further the gradient is

$$\nabla f(\mathbf{x}) = P\mathbf{x} + \mathbf{q}$$

for all $\mathbf{x}$.

Any $\mathbf{x}$ satisfying

$$P\mathbf{x} + \mathbf{q} = \mathbf{0}$$

is a global minimizer. If $P = Q \Lambda Q^T$ is a spectral decomposition of $P$, then the diagonal entries of $\Lambda$ are strictly positive since $P$ is positive definite, and $P^{-1} = Q \Lambda^{-1} Q^T$, where the diagonal entries of $\Lambda^{-1}$ are the inverses of those of $\Lambda$. That can be seen by checking that

$$Q \Lambda Q^T Q \Lambda^{-1} Q^T = Q \Lambda \Lambda^{-1} Q^T = Q I_{d\times d} Q^T = I_{d \times d},$$

where we used $Q^T Q = Q Q^T = I_{d \times d}$.

So the following is a global minimizer

$$\mathbf{x}^* = -P^{-1} \mathbf{q} = - Q \Lambda^{-1} Q^T \mathbf{q}.$$

$\lhd$
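The closed-form minimizer can be computed exactly as in the example. Below is a sketch (with an arbitrary positive definite $P$ built for illustration) that forms $\mathbf{x}^* = -Q\Lambda^{-1}Q^T\mathbf{q}$ from the spectral decomposition and checks it against a direct linear solve:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
A = rng.standard_normal((d, d))
P = A @ A.T + d * np.eye(d)  # symmetric positive definite
q = rng.standard_normal(d)

# Spectral decomposition P = Q Lambda Q^T; eigenvalues strictly positive.
lam, Q = np.linalg.eigh(P)
x_star = -Q @ ((Q.T @ q) / lam)  # x* = -Q Lambda^{-1} Q^T q

assert np.allclose(P @ x_star + q, 0)               # stationary point
assert np.allclose(x_star, -np.linalg.solve(P, q))  # matches direct solve
```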