# Optimality, convexity, and gradient descent¶

## 3 Convexity¶

Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Oct 10, 2020

### 3.1 Convex sets¶

Definition (Convex Set): A set $D \subseteq \mathbb{R}^d$ is convex if for all $\mathbf{x}, \mathbf{y} \in D$ and all $\alpha \in [0,1]$

$$(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in D.$$

$\lhd$

(Figure: an example of a convex set, where every segment between two points of the set stays in the set, and an example of a non-convex set, where some segment leaves the set.)

Example: Any open ball in $\mathbb{R}^d$ is convex. Indeed, let $\delta > 0$ and $\mathbf{x}_0 \in \mathbb{R}^d$. For any $\mathbf{x}, \mathbf{y} \in B_{\delta}(\mathbf{x}_0)$ and any $\alpha \in [0,1]$, we have

\begin{align*} \|[(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] - \mathbf{x}_0\| &= \|(1-\alpha) (\mathbf{x} - \mathbf{x}_0) + \alpha (\mathbf{y} - \mathbf{x}_0)\|\\ &\leq \|(1-\alpha) (\mathbf{x} - \mathbf{x}_0)\| + \|\alpha (\mathbf{y} - \mathbf{x}_0)\|\\ &= (1-\alpha) \|\mathbf{x} - \mathbf{x}_0\| + \alpha \|\mathbf{y} - \mathbf{x}_0\|\\ &\leq (1-\alpha) \delta + \alpha \delta\\ &= \delta \end{align*}

where we used the triangle inequality on the second line and the absolute homogeneity of the norm on the third. Hence we have established that $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in B_{\delta}(\mathbf{x}_0)$. $\lhd$
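As a quick numerical sanity check (a sketch, not part of the notes), we can sample two points in an open ball and verify that their convex combinations stay inside, mirroring the computation above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta = 3, 1.0
x0 = rng.standard_normal(d)

# Sample a point uniformly in the open ball B_delta(x0):
# random direction, radius scaled by u^(1/d) for uniformity.
def sample_in_ball(rng, x0, delta, d):
    v = rng.standard_normal(d)
    return x0 + (delta * rng.uniform(0, 1) ** (1 / d)) * v / np.linalg.norm(v)

x = sample_in_ball(rng, x0, delta, d)
y = sample_in_ball(rng, x0, delta, d)

# Every convex combination (1-a) x + a y should remain in the ball.
alphas = np.linspace(0, 1, 11)
combos = [(1 - a) * x + a * y for a in alphas]
assert all(np.linalg.norm(z - x0) < delta for z in combos)
```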

A number of operations preserve convexity.

Lemma (Operations that Preserve Convexity): Let $S_1$ and $S_2$ be convex sets in $\mathbb{R}^d$. Let $\beta \in \mathbb{R}$ and $\mathbf{b} \in \mathbb{R}^d$. The following sets are also convex:

(a) $\beta S_1 = \{\beta \mathbf{x}\,:\, \mathbf{x} \in S_1\}$

(b) $S_1 + \mathbf{b} = \{\mathbf{x} + \mathbf{b}\,:\, \mathbf{x} \in S_1\}$

(c) $S_1 + S_2 = \{\mathbf{x}_1 + \mathbf{x}_2\,:\, \mathbf{x}_1 \in S_1 \text{ and } \mathbf{x}_2 \in S_2\}$

(d) $S_1 \times S_2 = \{(\mathbf{x}_1, \mathbf{x}_2) \,:\, \mathbf{x}_1 \in S_1 \text{ and } \mathbf{x}_2 \in S_2\}$

(e) the projection $T = \{\mathbf{x}_1 \in \mathbb{R}^d \,:\, (\mathbf{x}_1, \mathbf{x}_2) \in S \text{ for some } \mathbf{x}_2 \in \mathbb{R}^d\}$ of a convex set $S \subseteq \mathbb{R}^{2d}$

(f) $S_1 \cap S_2 = \{\mathbf{x}\,:\, \mathbf{x} \in S_1 \text{ and } \mathbf{x} \in S_2\}$

Proof: We only prove (f). The other statements are left as an exercise. Suppose $\mathbf{x}, \mathbf{y} \in S_1 \cap S_2$ and $\alpha \in [0,1]$. Then, by the convexity of $S_1$, $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_1$ and, by the convexity of $S_2$, $(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_2$. Hence

$$(1-\alpha) \mathbf{x} + \alpha \mathbf{y} \in S_1 \cap S_2.$$

$\square$
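Part (f) can also be checked numerically. The sketch below (an illustration, with arbitrarily chosen balls) samples two points in the intersection of two overlapping balls, each convex by the earlier example, and verifies that convex combinations stay in both:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two overlapping open balls in R^2; each is convex, so their
# intersection should be convex by part (f).
c1, c2, delta = np.array([0.0, 0.0]), np.array([1.0, 0.0]), 1.0
in_both = lambda z: (np.linalg.norm(z - c1) < delta
                     and np.linalg.norm(z - c2) < delta)

# Rejection-sample two points in the intersection.
pts = []
while len(pts) < 2:
    z = rng.uniform(-1, 2, size=2)
    if in_both(z):
        pts.append(z)
x, y = pts

# Convex combinations remain in the intersection.
for a in np.linspace(0, 1, 11):
    assert in_both((1 - a) * x + a * y)
```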

### 3.2 Convex functions¶

Our main interest is in convex functions.

Definition (Convex Function): A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if, for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and all $\alpha \in [0,1]$

$$f((1-\alpha) \mathbf{x} + \alpha \mathbf{y}) \leq (1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}).$$

More generally, a function $f : D \to \mathbb{R}$ over a convex domain $D \subseteq \mathbb{R}^d$ is convex if the definition above holds for all $\mathbf{x}, \mathbf{y} \in D$. $\lhd$


Lemma (Affine Functions are Convex): Let $\mathbf{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The function $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$ is convex.

Proof: For any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and $\alpha \in [0,1]$,

$$f((1-\alpha) \mathbf{x} + \alpha \mathbf{y}) = \mathbf{w}^T [(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] + b = (1-\alpha)[\mathbf{w}^T \mathbf{x} + b] + \alpha [\mathbf{w}^T \mathbf{y} + b]$$

which proves the claim. $\square$
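The proof shows that an affine function in fact satisfies the defining inequality with equality. A small numerical check of this (an illustrative sketch with randomly chosen $\mathbf{w}$, $b$):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
w, b = rng.standard_normal(d), 0.5
f = lambda x: w @ x + b  # affine function f(x) = w^T x + b

x, y = rng.standard_normal(d), rng.standard_normal(d)
for a in np.linspace(0, 1, 6):
    lhs = f((1 - a) * x + a * y)
    rhs = (1 - a) * f(x) + a * f(y)
    # Equality holds: affine functions are both convex and concave.
    assert np.isclose(lhs, rhs)
```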

Lemma (First-Order Convexity Condition): Let $f : \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable. Then $f$ is convex if and only if for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$

$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{y}-\mathbf{x}).$$

Geometrically, the condition says that the graph of $f$ lies above each of its tangent hyperplanes.

Proof (First-Order Convexity Condition): Suppose first that $f(\mathbf{z}_2) \geq f(\mathbf{z}_1) + \nabla f(\mathbf{z}_1)^T (\mathbf{z}_2-\mathbf{z}_1)$ for all $\mathbf{z}_1, \mathbf{z}_2$. For any $\mathbf{x}, \mathbf{y}$ and $\alpha \in [0,1]$, let $\mathbf{w} = (1-\alpha) \mathbf{x} + \alpha \mathbf{y}$. Then taking $\mathbf{z}_1 = \mathbf{w}$ and $\mathbf{z}_2 = \mathbf{x}$ gives

$$f(\mathbf{x}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T (\mathbf{x}-\mathbf{w})$$

and taking $\mathbf{z}_1 = \mathbf{w}$ and $\mathbf{z}_2 = \mathbf{y}$ gives

$$f(\mathbf{y}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T (\mathbf{y}-\mathbf{w}).$$

Multiplying the first inequality by $(1-\alpha)$ and the second one by $\alpha$, and adding them up gives

$$(1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}) \geq f(\mathbf{w}) + \nabla f(\mathbf{w})^T ([(1-\alpha) \mathbf{x} + \alpha \mathbf{y}] - \mathbf{w}) = f(\mathbf{w})$$

proving convexity.

For the other direction, assume that $f$ is convex. For any $\mathbf{x}, \mathbf{y}$ and $\alpha \in (0,1)$, by the Multivariate Mean Value Theorem, for some $\xi \in (0,1)$ it holds that

$$f(\mathbf{w}) = f(\mathbf{x} + \alpha (\mathbf{y} - \mathbf{x})) = f(\mathbf{x}) + \alpha (\mathbf{y} - \mathbf{x})^T \nabla f (\mathbf{x} + \xi \alpha (\mathbf{y} - \mathbf{x}))$$

while convexity implies

$$f(\mathbf{w}) \leq (1-\alpha) f(\mathbf{x}) + \alpha f(\mathbf{y}).$$

Combining, rearranging and dividing by $\alpha$ gives

$$(\mathbf{y} - \mathbf{x})^T \nabla f (\mathbf{x} + \xi \alpha (\mathbf{y} - \mathbf{x})) \leq f(\mathbf{y}) - f(\mathbf{x}).$$

Taking $\alpha \to 0$ gives the claim. $\square$
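The first-order condition is easy to test numerically for a concrete convex function. Here is a sketch using $f(\mathbf{x}) = \|\mathbf{x}\|^2$ (with gradient $\nabla f(\mathbf{x}) = 2\mathbf{x}$), checking the inequality at random point pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
f = lambda x: x @ x      # f(x) = ||x||^2, a convex function
grad = lambda x: 2 * x   # its gradient

# First-order condition: f(y) >= f(x) + grad f(x)^T (y - x).
for _ in range(100):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
```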

Lemma (Second-Order Convexity Condition): Let $f : \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable. Then $f$ is convex if and only if, for all $\mathbf{x} \in \mathbb{R}^d$, $\mathbf{H}_f(\mathbf{x})$ is positive semidefinite.

Example: Let $P \in \mathbb{R}^{d \times d}$, $\mathbf{q} \in \mathbb{R}^d$, and $r \in \mathbb{R}$, and consider the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r.$$

We showed previously that the Hessian is

$$\mathbf{H}_f(\mathbf{x}) = \frac{1}{2}[P + P^T].$$

So $f$ is convex if and only if the matrix $\frac{1}{2}[P + P^T]$ has only nonnegative eigenvalues. $\lhd$
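This eigenvalue criterion can be checked directly with `numpy`. Below is a sketch with a randomly generated $P = AA^T$, which is positive semidefinite by construction, so the corresponding quadratic $f$ is convex:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
P = A @ A.T  # positive semidefinite by construction

# Hessian of f(x) = x^T P x / 2 + q^T x + r.
H = (P + P.T) / 2

# f is convex iff all eigenvalues of H are nonnegative.
eigs = np.linalg.eigvalsh(H)
assert np.all(eigs >= -1e-10)
```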

### 3.3 Convexity and unconstrained optimization¶

A key property of convex functions is the following.

Theorem (Global Minimizers of Convex Functions): Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function. Then any local minimizer of $f$ is also a global minimizer.

Proof: By contradiction, suppose $\mathbf{x}_0$ is a local minimizer, but not a global minimizer. Then there is $\mathbf{y}$ such that

$$f(\mathbf{y}) < f(\mathbf{x}_0).$$

By convexity, for any $\alpha \in (0,1)$

$$f(\mathbf{x}_0 + \alpha (\mathbf{y} - \mathbf{x}_0)) \leq (1-\alpha) f(\mathbf{x}_0) + \alpha f(\mathbf{y}) < f(\mathbf{x}_0).$$

But that implies that every open ball around $\mathbf{x}_0$ contains a point taking a smaller value than $f(\mathbf{x}_0)$, a contradiction. $\square$

Theorem (First-Order Sufficient Condition for Convex Functions): Let $f : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable, convex function. If $\nabla f(\mathbf{x}_0) = \mathbf{0}$ then $\mathbf{x}_0$ is a global minimizer.

Proof: Assume $\nabla f(\mathbf{x}_0) = \mathbf{0}$. By the First-Order Convexity Condition, for any $\mathbf{y}$

$$f(\mathbf{y}) - f(\mathbf{x}_0) \geq \nabla f(\mathbf{x}_0)^T (\mathbf{y} - \mathbf{x}_0) = 0.$$

That proves the claim. $\square$
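Since the chapter concerns gradient descent, here is a minimal sketch connecting to this theorem (assumptions mine: a fixed step size and the specific convex function $f(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|^2$): the iterates approach the unique stationary point, which by the theorem is the global minimizer.

```python
import numpy as np

# Gradient descent on the convex function f(x) = ||x - c||^2,
# whose only stationary point x* = c is the global minimizer.
c = np.array([1.0, -2.0, 3.0])
grad = lambda x: 2 * (x - c)

x = np.zeros(3)
step = 0.1
for _ in range(200):
    x = x - step * grad(x)  # each step contracts x - c by a factor 0.8

assert np.allclose(x, c, atol=1e-6)  # iterates reach the global minimizer
```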

Example: Consider again the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r$$

where $P$ is symmetric and positive definite. The Hessian is then

$$\mathbf{H}_f(\mathbf{x}) = \frac{1}{2}[P + P^T] = P$$

for any $\mathbf{x}$. So $f$ is convex. Further the gradient is

$$\nabla f(\mathbf{x}) = P\mathbf{x} + \mathbf{q}$$

for all $\mathbf{x}$.

Any $\mathbf{x}$ satisfying

$$P\mathbf{x} + \mathbf{q} = \mathbf{0}$$

is a global minimizer. If $P = Q \Lambda Q^T$ is a spectral decomposition of $P$, then the diagonal entries of $\Lambda$ are strictly positive since $P$ is positive definite, and $P^{-1} = Q \Lambda^{-1} Q^T$, where the diagonal entries of $\Lambda^{-1}$ are the inverses of those of $\Lambda$. That can be seen by checking that

$$Q \Lambda Q^T Q \Lambda^{-1} Q^T = Q \Lambda \Lambda^{-1} Q^T = Q I_{d\times d} Q^T = I_{d \times d},$$

where we used $Q^T Q = Q Q^T = I_{d \times d}$.

So the following is a global minimizer

$$\mathbf{x}^* = -P^{-1} \mathbf{q} = - Q \Lambda^{-1} Q^T \mathbf{q}.$$

$\lhd$
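The closed-form minimizer can be computed exactly as in the example. Below is a sketch (with an arbitrary positive definite $P$ built for illustration) that forms $\mathbf{x}^* = -Q\Lambda^{-1}Q^T\mathbf{q}$ from the spectral decomposition and checks it against a direct linear solve:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
A = rng.standard_normal((d, d))
P = A @ A.T + d * np.eye(d)  # symmetric positive definite
q = rng.standard_normal(d)

# Spectral decomposition P = Q Lambda Q^T; eigenvalues strictly positive.
lam, Q = np.linalg.eigh(P)
x_star = -Q @ ((Q.T @ q) / lam)  # x* = -Q Lambda^{-1} Q^T q

assert np.allclose(P @ x_star + q, 0)               # stationary point
assert np.allclose(x_star, -np.linalg.solve(P, q))  # matches direct solve
```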