# Problem 1 [30 points] Maximum and minimum eigenvalues

Consider the maximum eigenvalue $\lambda_{\max}$ and the minimum eigenvalue $\lambda_{\min}$ of any $X\in\mathbb{S}^{n}$.

The purpose of this exercise is to investigate whether $\lambda_{\max}(X)$ and $\lambda_{\min}(X)$ are convex functions of the matrix variable $X\in\mathbb{S}^{n}$.

## (a) [10 points] An inequality

For any nonzero $x\in\mathbb{R}^{n}$ and any $X\in\mathbb{S}^{n}$, **derive** the inequality:

$$\lambda_{\min}(X) \leq \dfrac{x^{\top}Xx}{x^{\top}x} \leq \lambda_{\max}(X).$$

(**Hint:** start from the spectral theorem applied to a symmetric matrix.)
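A quick numerical sanity check of this inequality (not a derivation) can be sketched in Python with NumPy, assuming random symmetric test matrices; the helper name `rayleigh_quotient` is hypothetical. `numpy.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

def rayleigh_quotient(X, x):
    """Return x^T X x / x^T x for a nonzero vector x (hypothetical helper)."""
    return float(x @ X @ x) / float(x @ x)

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
X = (A + A.T) / 2                      # symmetrize so that X lies in S^n
eigs = np.linalg.eigvalsh(X)           # eigenvalues in ascending order
lam_min, lam_max = eigs[0], eigs[-1]

# Evaluate the quotient on many random (almost surely nonzero) vectors.
quotients = [rayleigh_quotient(X, rng.standard_normal(n)) for _ in range(1000)]
tol = 1e-12                            # slack for floating-point rounding
print(all(lam_min - tol <= q <= lam_max + tol for q in quotients))  # expected: True
```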

## (b) [10 points] Variational characterization of $\lambda_{\max}$ and $\lambda_{\min}$

Use part (a) to **prove** the following two formulas:

$$\lambda_{\max}(X) = \underset{\|x\|_{2}=1}{\sup} x^{\top}X x, \qquad \lambda_{\min}(X) = \underset{\|x\|_{2}=1}{\inf} x^{\top}X x, \qquad x\in\mathbb{R}^{n}, \quad X\in\mathbb{S}^{n}.$$

(**Hint:** think about whether, and when, equality is achieved in the weak inequalities derived in part (a).)
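As a numerical illustration (again, not a proof), the sketch below assumes a random symmetric $X$ and compares $x^{\top}Xx$ at the unit eigenvectors returned by `numpy.linalg.eigh` (eigenvalues ascending, eigenvectors as orthonormal columns) with its values at many random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
X = (A + A.T) / 2                      # a random element of S^n
w, V = np.linalg.eigh(X)               # w ascending; columns of V are unit eigenvectors
v_min, v_max = V[:, 0], V[:, -1]

# The extreme eigenvectors give x^T X x equal to the extreme eigenvalues.
print(np.isclose(v_max @ X @ v_max, w[-1]), np.isclose(v_min @ X @ v_min, w[0]))  # expected: True True

# Random unit vectors stay within [lambda_min, lambda_max].
U = rng.standard_normal((2000, n))
U /= np.linalg.norm(U, axis=1, keepdims=True)
vals = np.einsum("ij,jk,ik->i", U, X, U)   # x^T X x for each row x of U
print(vals.max() <= w[-1] + 1e-12, vals.min() >= w[0] - 1e-12)  # expected: True True
```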

## (c) [10 points] Function convexity

Use part (b) to argue which of the two functions $f(X) := \lambda_{\max}(X)$ and $g(X) := \lambda_{\min}(X)$ is/are convex over $\textbf{dom}(f) = \textbf{dom}(g) = \mathbb{S}^{n}$. Explain.

(**Hint:** think about operations preserving function convexity/concavity from Lec. 8.)

# Problem 2 [30 points] Support function revisited

Consider the support function $h(y)$ of a compact convex set, as introduced in HW3, Prob. 1(b).

## (a) [1 + 9 = 10 points] Function convexity

Is the support function convex or concave or neither? Give reasons to explain your answer.

(**Hint:** again, think about operations preserving function convexity/concavity from Lec. 8.)

## (b) [(1 + 7) + 2 = 10 points] Convex conjugate

(i) What is the function $f(x)$ whose convex conjugate equals the support function $h(y)$? Give reasons to explain your answer.

(ii) What is the geometric interpretation of your result in 2(b)(i)?

## (c) [(1 + 4) + (1 + 4) = 10 points] Epigraph

(i) Is the epigraph of the support function, $\text{epi}(h)$, a cone? Why/why not?

(ii) Is $\text{epi}(h)$ a convex cone? Why/why not? 

# Problem 3 [40 points] Bregman divergence

Given a **strictly convex differentiable function** $\psi$ over a closed convex set $\mathcal{X}$, the associated Bregman divergence $D_{\psi}:\mathcal{X}\times\mathcal{X}\to\mathbb{R}_{\geq 0}$ is defined as

$$D_{\psi}(x,y) := \psi(x) - \psi(y) - \langle\nabla\psi(y), x-y\rangle, \qquad\forall x,y\in\mathcal{X}.$$

The above formula has a **clean geometric meaning**: $D_{\psi}(x,y)$ is the amount by which the strictly convex function $\psi$, evaluated at $x$, lies above its linear approximation at $y$ (the tangent hyperplane at $y$).
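The definition translates directly into code. A minimal sketch in Python with NumPy, assuming the caller supplies $\psi$ and its gradient as callables (`bregman`, `psi`, `grad_psi` are hypothetical names). The example uses the illustrative choice $\psi(x)=\sum_i e^{x_i}$, which is strictly convex on $\mathbb{R}^n$, so the gap above the tangent hyperplane should be positive whenever $x\neq y$.

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

# Illustrative choice (not from the problem): psi(x) = sum_i exp(x_i).
psi = lambda x: np.sum(np.exp(x))
grad_psi = lambda x: np.exp(x)

rng = np.random.default_rng(2)
gaps = [bregman(psi, grad_psi, rng.standard_normal(3), rng.standard_normal(3))
        for _ in range(1000)]
print(min(gaps) > 0.0)  # psi lies strictly above its tangent hyperplane; expected: True
```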

For example, if $\psi(x) = \|x\|_{2}^{2}$, then for any closed convex $\mathcal{X}\subseteq\mathbb{R}^{n}$ we have $D_{\psi}(x,y) = \|x-y\|_{2}^{2}$, the squared Euclidean distance. (It is a good idea to verify this yourself!)
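A numerical spot check of this identity (not a substitute for the algebra) might look as follows, assuming random points in $\mathbb{R}^4$; the names are hypothetical and $\nabla\psi(x) = 2x$.

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

psi = lambda x: np.dot(x, x)           # ||x||_2^2
grad_psi = lambda x: 2.0 * x           # its gradient

rng = np.random.default_rng(3)
x, y = rng.standard_normal(4), rng.standard_normal(4)
print(np.isclose(bregman(psi, grad_psi, x, y), np.sum((x - y) ** 2)))  # expected: True
```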

As another example, when $\psi(x) = \sum_{i=1}^{n}x_{i}\log x_{i}$ and $\mathcal{X}\equiv \{x\in\mathbb{R}^{n}_{\geq 0}\mid \sum_{i=1}^{n}x_{i}=1\}$ (the standard simplex, see Lec. 5, p. 15), we have $D_{\psi}(x,y) = \sum_{i=1}^{n}x_{i}\log(x_{i}/y_{i})$, the Kullback-Leibler divergence (relative entropy). (Again, it is a good idea to verify this yourself!)
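The same kind of spot check works here, assuming points with strictly positive entries (so that $\log$ is finite; the convention $0\log 0 = 0$ is not handled) and using $\nabla\psi(x)_i = \log x_i + 1$. The names are hypothetical.

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

psi = lambda x: np.sum(x * np.log(x))  # negative entropy, entries assumed > 0
grad_psi = lambda x: np.log(x) + 1.0

rng = np.random.default_rng(4)
x = rng.random(5) + 0.1
x /= x.sum()                           # a point in the relative interior of the simplex
y = rng.random(5) + 0.1
y /= y.sum()
kl = np.sum(x * np.log(x / y))         # Kullback-Leibler divergence
print(np.isclose(bregman(psi, grad_psi, x, y), kl))  # expected: True
```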

Even though $D_{\psi}(x,y)\neq D_{\psi}(y,x)$ in general, the Bregman divergence has [many nice properties](https://en.wikipedia.org/wiki/Bregman_divergence#Properties), and it appears frequently in machine learning applications.

## (a) [(1 + 4) + (2 + 3) + 5 = 15 points] Function convexity

(i) Is $D_{\psi}(x,y)$ **strictly** convex in $x$? Why/why not?

(ii) If $\psi$ is **$m$-strongly convex**, prove that $D_{\psi}(x,y) \geq \frac{m}{2}\|x-y\|_{2}^{2}$ **and** that $D_{\psi}(x,y)$ is $m$-strongly convex in $x$.
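As a numerical illustration of the lower bound (not a proof), take the hypothetical choice $\psi(x) = \frac{m}{2}\|x\|_2^2 + \sum_i e^{x_i}$, which is $m$-strongly convex because it adds a convex function to $\frac{m}{2}\|x\|_2^2$. The sketch checks the bound on random pairs, with a small tolerance for rounding.

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

m = 0.5                                # strong-convexity modulus (illustrative value)
psi = lambda x: 0.5 * m * np.dot(x, x) + np.sum(np.exp(x))
grad_psi = lambda x: m * x + np.exp(x)

rng = np.random.default_rng(5)
pairs = [(rng.standard_normal(3), rng.standard_normal(3)) for _ in range(1000)]
ok = all(bregman(psi, grad_psi, x, y) >= 0.5 * m * np.sum((x - y) ** 2) - 1e-12
         for x, y in pairs)
print(ok)  # expected: True
```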

(iii) Construct a counterexample to demonstrate that $D_{\psi}(x,y)$ **need not be convex in its second argument $y$**.

## (b) [5 + 5 + 15 = 25 points] Distance-like function

The Bregman divergence $D_{\psi}(x,y)$ behaves like **the square of** a generalized distance between elements in $\mathcal{X}$.

(i) **Prove** that 
$$\nabla_{x}D_{\psi}(x,y) = \nabla_{x}\psi(x) - \nabla_{y}\psi(y),$$ 

where the subscript on the gradient operator $\nabla$ indicates the vector with respect to which the gradient is taken.
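A central finite-difference check can confirm this gradient formula numerically for a specific $\psi$. The sketch assumes the illustrative choice $\psi(x)=\sum_i e^{x_i}$ and a step size `h = 1e-6`; `numerical_grad_x` is a hypothetical helper.

```python
import numpy as np

psi = lambda x: np.sum(np.exp(x))      # illustrative strictly convex psi (assumption)
grad_psi = lambda x: np.exp(x)

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

def numerical_grad_x(x, y, h=1e-6):
    """Central finite differences of D_psi(., y) evaluated at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (bregman(psi, grad_psi, x + e, y)
                - bregman(psi, grad_psi, x - e, y)) / (2 * h)
    return g

rng = np.random.default_rng(6)
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.allclose(numerical_grad_x(x, y), grad_psi(x) - grad_psi(y), atol=1e-6))  # expected: True
```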

(ii) **Prove** the generalized [law of cosines](https://en.wikipedia.org/wiki/Law_of_cosines):

$$D_{\psi}(x,y) = D_{\psi}(x,z) + D_{\psi}(z,y) - \langle x-z, \nabla\psi(y) - \nabla\psi(z)\rangle, \qquad \forall x,y,z\in\mathcal{X}.$$

**Remark:** Notice that for $\psi(x) = \|x\|_{2}^2$, this reduces to $\|x-y\|_{2}^{2} = \|x-z\|_{2}^{2} + \|z-y\|_{2}^{2} - 2\underbrace{\langle x-z, y-z\rangle}_{\|x-z\|_{2}\|y-z\|_{2}\cos\angle yzx}$.
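The identity also holds for non-quadratic $\psi$. A numerical spot check, assuming the illustrative choice $\psi(x)=\sum_i e^{x_i}$ and three random points (all names hypothetical), might look like this:

```python
import numpy as np

psi = lambda x: np.sum(np.exp(x))      # non-quadratic, strictly convex (assumption)
grad_psi = lambda x: np.exp(x)

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - np.dot(grad_psi(y), x - y)

rng = np.random.default_rng(7)
x, y, z = (rng.standard_normal(3) for _ in range(3))
lhs = bregman(psi, grad_psi, x, y)
rhs = (bregman(psi, grad_psi, x, z) + bregman(psi, grad_psi, z, y)
       - np.dot(x - z, grad_psi(y) - grad_psi(z)))
print(np.isclose(lhs, rhs))  # expected: True
```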

(iii) Recall that $\mathcal{X}\equiv\textbf{dom}(\psi)$. Consider a closed convex set $\mathcal{C}$ such that $\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)\neq\emptyset$. For notational convenience, introduce $\mathcal{A}:=\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)$, which is a convex set.

Define the **Bregman projection** of $y$ onto $\mathcal{A}$ with respect to $\psi$ as 

$${\rm{proj}}_{\mathcal{A}}\left(y\right) := \underset{a\in\mathcal{A}}{\arg\min} \,D_{\psi}\left(a, y\right).$$
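For one concrete case the projection has a closed form. Assume, purely for illustration, $\psi(x)=\sum_i x_i\log x_i$ on $\mathcal{X}=\mathbb{R}^n_{\geq 0}$, whose Bregman divergence is the generalized KL divergence $\sum_i\left(x_i\log(x_i/y_i) - x_i + y_i\right)$, and take $\mathcal{C}=\{a\mid\sum_i a_i = 1\}$, so that $\mathcal{A}$ is the set of simplex points with positive entries. A Lagrange-multiplier calculation then gives ${\rm proj}_{\mathcal{A}}(y) = y/\sum_i y_i$ for $y$ with positive entries. The sketch (hypothetical names) compares this against random points of $\mathcal{A}$.

```python
import numpy as np

def gen_kl(x, y):
    """Bregman divergence of psi(x) = sum_i x_i log x_i on the positive orthant."""
    return np.sum(x * np.log(x / y) - x + y)

def kl_projection_onto_simplex(y):
    """Closed-form Bregman projection of a positive y onto {a > 0, sum(a) = 1}."""
    return y / y.sum()

rng = np.random.default_rng(8)
y = rng.random(4) + 0.5                # positive, generally not on the simplex
p = kl_projection_onto_simplex(y)

# Random points of A (positive entries, summing to 1) should not do better than p.
candidates = rng.random((2000, 4)) + 1e-3
candidates /= candidates.sum(axis=1, keepdims=True)
print(all(gen_kl(a, y) >= gen_kl(p, y) - 1e-12 for a in candidates))  # expected: True
```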

Using parts (i) and (ii), **prove** the generalized [Pythagorean theorem](https://en.wikipedia.org/wiki/Pythagorean_theorem):
$$D_{\psi}(x,y) \geq D_{\psi}(x, {\rm{proj}}_{\mathcal{A}}\left(y\right)) \: + \: D_{\psi}( {\rm{proj}}_{\mathcal{A}}\left(y\right),y), \qquad \forall x\in\mathcal{A}.$$

**Hint:** If $a^{\text{opt}}$ is a local minimizer of a differentiable (possibly nonconvex) function $f(a)$ over the convex set $\mathcal{A}$, then the directional derivative of $f$ at $a^{\text{opt}}$ along every feasible direction $a - a^{\text{opt}}$ is nonnegative:
$$\bigg\langle a - a^{\text{opt}}, \nabla_{a} f(a)\big\vert_{a=a^{\text{opt}}}\bigg\rangle \geq 0, \qquad \forall a\in\mathcal{A}.$$
If $f$ is also convex over $\mathcal{A}$, then this inequality is sufficient as well: it guarantees that $a^{\text{opt}}$ is a global minimizer.

**Remark:** In the generalized Pythagorean theorem, equality holds when the set $\mathcal{A}$ is affine.
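The equality case can be illustrated numerically. Assume, as an illustrative choice, $\psi(x)=\sum_i x_i\log x_i$ on $\mathbb{R}^n_{\geq 0}$ (generalized KL divergence) and $\mathcal{C}=\{a\mid\sum_i a_i=1\}$, an affine set. For a positive $y$, a Lagrange-multiplier calculation gives the projection $y/\sum_i y_i$. The sketch (hypothetical names) checks that the Pythagorean inequality is tight for a random $x\in\mathcal{A}$.

```python
import numpy as np

def gen_kl(x, y):
    """Bregman divergence of psi(x) = sum_i x_i log x_i on the positive orthant."""
    return np.sum(x * np.log(x / y) - x + y)

rng = np.random.default_rng(9)
y = rng.random(5) + 0.5                # positive point, generally off the simplex
p = y / y.sum()                        # closed-form projection for this particular psi and C
x = rng.random(5) + 0.1
x /= x.sum()                           # an arbitrary point of A
print(np.isclose(gen_kl(x, y), gen_kl(x, p) + gen_kl(p, y)))  # expected: True
```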