

Duality and Optimization

Titus Pinta <2025-12-17 Wed>


We aim to solve a standard optimization problem
\[ \inf_{x \in C \subseteq \mathbb{R}^n} f(x), \]
for some, usually convex, set \(C\) and \(f : C \to \overline{\mathbb{R}}\). If we use a numerical solver, how do we know how far the approximate solution is from the true solution? We need something called a "certificate of optimality". The general framework is to consider an auxiliary function \(g : D \subseteq \mathbb{R}^m \to \overline{\mathbb{R}}\) such that

\[ \forall x \in C \text{ and } \forall y \in D, \quad f(x) \ge g(y). \tag{1} \]
This implies that \(\inf_{x \in C} f(x) \ge \sup_{y \in D} g(y)\), suggesting the dual problem \(\sup_{y \in D} g(y)\).

This approach is useful because for \(x^\star\) a global solution of the primal, \(y^\star\) a global solution of the dual, and \(x_n\) and \(y_n\) approximations of the two produced by some numerical scheme, we have
\[ f(x_n) \ge f(x^\star) \ge g(y^\star) \ge g(y_n), \]
and therefore
\[ f(x_n) - f(x^\star) \le f(x_n) - g(y_n). \]

Not all primal-dual pairs \((f, g)\) are created equal. The difference in the values is called the duality gap,
\[ \operatorname{gap}(f, g) = \inf_{x \in C} f(x) - \sup_{y \in D} g(y). \]
In the lucky case that \(\operatorname{gap}(f, g) = 0\) we say that strong duality holds.
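
To make the certificate idea concrete, here is a minimal numeric sketch (my addition, not part of the original argument). It assumes the toy pair \(f(x) = x^2\) and \(g(y) = -y^2/4\), which satisfies (1) since \(f \ge 0 \ge g\), and imagines \(x_n\) and \(y_n\) as outputs of some numerical scheme.

#+begin_src python
# Toy primal-dual pair: f(x) = x^2 >= 0 >= -y^2/4 = g(y), so (1) holds.
f = lambda x: x**2
g = lambda y: -y**2 / 4

x_n, y_n = 0.1, -0.05    # hypothetical outputs of primal and dual solvers
f_opt = 0.0              # true optimal value, inf f = f(0)

certificate = f(x_n) - g(y_n)   # computable bound on the suboptimality
true_error = f(x_n) - f_opt     # unknown to the solver in practice

assert true_error <= certificate
print(certificate, true_error)  # 0.010625 >= 0.01
#+end_src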

The only thing left to do is to figure out how to construct a good g from a given f .

First Attempt

The most naive approach is to use the Fenchel-Young inequality. Recall the definition of the convex conjugate
\[ f^*(y) = \sup_{x \in \mathbb{R}^n} \langle x, y \rangle - f(x). \]
From this definition we can immediately see that
\[ \forall x, y \in \mathbb{R}^n, \quad f^*(y) \ge \langle x, y \rangle - f(x). \]
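
Since the conjugate is defined by a supremum, it can be approximated on a grid. The following sketch (an illustration I added, assuming the example \(f(x) = |x|\), whose conjugate is the indicator of \([-1, 1]\)) spot-checks the Fenchel-Young inequality numerically.

#+begin_src python
import numpy as np

# Grid approximation of f*(y) = sup_x <x, y> - f(x), in one dimension.
xs = np.linspace(-10, 10, 2001)
f = lambda x: np.abs(x)            # f(x) = |x|; exactly, f* = indicator of [-1, 1]

def conj(y):
    return np.max(xs * y - f(xs))  # supremum restricted to the grid

# Spot-check Fenchel-Young: f*(y) >= <x, y> - f(x) for random x, y.
rng = np.random.default_rng(0)
for x, y in rng.uniform(-1, 1, size=(5, 2)):
    assert conj(y) >= x * y - f(x) - 1e-9
#+end_src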

This is not exactly the same form as the duality principle from (1), but this is also true for \(x^\star\) the global minimum of \(f\), so
\[ \forall y \in \mathbb{R}^n, \quad f(x^\star) \ge \langle x^\star, y \rangle - f^*(y), \]
and clearly \(f(x) \ge f(x^\star)\), yielding
\[ \forall x, y \in \mathbb{R}^n, \quad f(x) \ge \langle x^\star, y \rangle - f^*(y). \]

We now have a function \(g\) in the correct form. Unfortunately,
\[ g(y) = \langle x^\star, y \rangle - f^*(y), \tag{2} \]
so the expression for \(g\) involves \(x^\star\) and is uncomputable in general. But this attempt was a step in the right direction; we now need to investigate how to control the inner product part.

Let \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) be convex and lower semicontinuous with a unique minimizer at \(x^\star\), and consider \(g : \mathbb{R}^n \to \overline{\mathbb{R}}\) defined by (2). Compute \(\operatorname{gap}(f, g)\), i.e.
\[ \operatorname{gap}(f, g) = \inf_{x \in \mathbb{R}^n} f(x) - \sup_{y \in \mathbb{R}^n} g(y). \]

Since \(x^\star\) is the minimum of \(f\) we know that \(0 \in \partial f(x^\star)\), and the Fenchel-Young theorem guarantees the equality
\[ f(x^\star) = \langle x^\star, 0 \rangle - f^*(0), \]
therefore \(f(x^\star) = g(0)\) and \(\operatorname{gap}(f, g) = 0\).
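
This computation can be sanity-checked numerically. Here is a small sketch I added, assuming \(f(x) = (x - 1)^2\), for which \(f^*(y) = y + y^2/4\) and hence \(g(y) = \langle x^\star, y \rangle - f^*(y) = -y^2/4\).

#+begin_src python
import numpy as np

# f(x) = (x - 1)^2 has unique minimizer x* = 1 and f*(y) = y + y^2/4,
# so g(y) = <x*, y> - f*(y) = -y^2/4.
f = lambda x: (x - 1)**2
g = lambda y: -y**2 / 4

grid = np.linspace(-5, 5, 100001)
primal = f(grid).min()   # approximates inf f = 0
dual = g(grid).max()     # approximates sup g = g(0) = 0

print(primal, dual)      # both ~0, so gap(f, g) = 0 as computed above
#+end_src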

Second Attempt

In our previous attempt we were left with a hanging \(\langle x^\star, y \rangle\) term. In order to avoid it, let's apply Fenchel-Young to a slightly perturbed function \(\varphi : \mathbb{R}^n \times \mathbb{R}^n \to \overline{\mathbb{R}}\),
\[ \varphi(x, u) = f(x) + \langle x, u \rangle. \]
The hope is that adding this inner product term might give us something to cancel \(\langle x^\star, y \rangle\).

Fenchel-Young for \(\varphi\) reads
\[ \varphi(x, u) \ge \langle x, v \rangle + \langle u, y \rangle - \varphi^*(v, y). \]
It is worth pointing out that \(\varphi(x, 0) = f(x)\), so maybe a strategic evaluation of \(\varphi\) and \(\varphi^*\) can yield a useful form of duality. Setting \(u = 0\) and \(v = 0\) gives the promising
\[ \varphi(x, 0) \ge -\varphi^*(0, y), \]
since the inner product part is \(\langle x, 0 \rangle + \langle 0, y \rangle = 0\).

This is exactly the form we want, with \(g(y) = -\varphi^*(0, y)\). We hope that \(\varphi^*\) might be easier to compute. Looking back at the development in this section, we have never used the exact definition of \(\varphi\). In the next section we will see how choosing \(\varphi\) more cleverly can pay off.

General Perturbations

In this section, we consider a general function \(\varphi : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\) with the property that \(\varphi(x, 0) = f(x)\). We will call such a function \(\varphi\) a perturbation function. The same Fenchel-Young argument gives
\[ \varphi(x, u) \ge \langle x, v \rangle + \langle u, y \rangle - \varphi^*(v, y), \]
and after the same clever evaluation,
\[ \varphi(x, 0) \ge -\varphi^*(0, y). \]

This again has the form of a duality theory, where
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) \ge \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y). \tag{3} \]
For a general convex optimization problem we can advance no further. The next step is to assume some special structure of the problem and to construct a perturbation function such that \(\varphi^*(0, y)\) is actually computable.

Fenchel Duality

The particular structured problem we will consider in this section is
\[ \inf_{x \in \mathbb{R}^n} f_1(x) + f_2(x), \]
with \(f_1\) and \(f_2\) convex. With some experience, one can see that \(\varphi(x, u) = f_1(x) + f_2(x - u)\) is a promising candidate for a perturbation function.

Let's investigate:
\begin{align*}
-\varphi^*(0, y) &= -\sup_{x, u \in \mathbb{R}^n} \langle x, 0 \rangle + \langle u, y \rangle - \varphi(x, u) \\
&= -\sup_{x, u \in \mathbb{R}^n} \langle u, y \rangle - f_1(x) - f_2(x - u) \\
&= -\sup_{x, u \in \mathbb{R}^n} \langle u - x, y \rangle + \langle x, y \rangle - f_1(x) - f_2(x - u) \\
&= -\sup_{x, u \in \mathbb{R}^n} -\langle x - u, y \rangle + \langle x, y \rangle - f_1(x) - f_2(x - u) \\
&\overset{w = x - u}{=} -\sup_{x, w \in \mathbb{R}^n} -\langle w, y \rangle + \langle x, y \rangle - f_1(x) - f_2(w) \\
&= -\sup_{x \in \mathbb{R}^n} \big( \langle x, y \rangle - f_1(x) \big) - \sup_{w \in \mathbb{R}^n} \big( \langle w, -y \rangle - f_2(w) \big) \\
&= -f_1^*(y) - f_2^*(-y).
\end{align*}

Substituting in (3) gives
\[ \inf_{x \in \mathbb{R}^n} f_1(x) + f_2(x) \ge \sup_{y \in \mathbb{R}^n} -f_1^*(y) - f_2^*(-y). \tag{4} \]
This inequality is called Fenchel's duality formula. In the last section we will see a theorem that allows us to compute \(\operatorname{gap}(f_1 + f_2, -f_1^* - f_2^*(-\cdot))\).
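
As a quick check of (4), here is a numeric sketch I added for the assumed pair \(f_1(x) = \tfrac{1}{2}(x - a)^2\) and \(f_2(x) = \lambda |x|\), whose conjugates are \(f_1^*(y) = \tfrac{1}{2} y^2 + a y\) and \(f_2^* = \iota_{[-\lambda, \lambda]}\).

#+begin_src python
import numpy as np

a, lam = 2.0, 0.5

# Primal: inf f1 + f2 with f1(x) = (x - a)^2 / 2 and f2(x) = lam * |x|.
xs = np.linspace(-10, 10, 200001)
primal = (0.5 * (xs - a)**2 + lam * np.abs(xs)).min()

# Dual: sup -f1*(y) - f2*(-y); f2*(-y) = 0 exactly when |y| <= lam.
ys = np.linspace(-lam, lam, 200001)
dual = (-(0.5 * ys**2 + a * ys)).max()

print(primal, dual)   # both ~0.875: the duality gap vanishes here
#+end_src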

For an arbitrary convex \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) it is clear that \(f = f_1 + f_2\) where \(f_1 = f\) and \(f_2 = 0\). Use this in (4).

After the direct substitution, and using an abuse of notation to denote the constant zero function by \(0\), we get
\[ \inf_{x \in \mathbb{R}^n} f(x) \ge \sup_{y \in \mathbb{R}^n} -f^*(y) - 0^*(-y). \]
This seems very promising, but let's compute \(0^*\):
\[ 0^*(y) = \sup_{x \in \mathbb{R}^n} \langle x, y \rangle = \begin{cases} 0 & \text{if } y = 0 \\ \infty & \text{else.} \end{cases} \]
This guarantees that the supremum will be attained at \(y = 0\), because otherwise the right-hand side is \(-\infty\), so
\[ \inf_{x \in \mathbb{R}^n} f(x) \ge -f^*(0). \]
Does this resemble something? It is the exact same formula as in our first attempt.

Lagrangian Duality

Another well-understood case is that of constrained optimization. Here we consider the problem
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) \tag{5} \]
for some \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) and \(G : \mathbb{R}^n \to \mathbb{R}^m\).

The perturbation function we will consider is
\[ \varphi(x, u) = \begin{cases} f(x) & \text{if } G(x) = u \\ \infty & \text{else.} \end{cases} \]
Let's compute
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle u, y \rangle - \varphi(x, u). \]
Since \(G(x) \ne u\) would give \(\varphi(x, u) = \infty\), we can restrict the supremum to \(G(x) = u\), so that
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n} \langle G(x), y \rangle - f(x). \tag{6} \]

The argument of the supremum in (6) appears (up to its sign) often enough in optimization that it has received a name: the Lagrangian. More concretely, the function \(\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\) defined by
\[ \mathcal{L}(x, y) = f(x) - \langle y, G(x) \rangle \]
is called the Lagrangian of (5). This object plays a vital role in the first-order theory of constrained optimization.

Armed with the Lagrangian we can compute
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n} -\mathcal{L}(x, y) = \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y), \]
and with this we obtain the duality formula
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) \ge \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y). \]

Let's investigate one more thing. Observe that
\[ \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y) = \begin{cases} f(x) & \text{if } G(x) = 0 \\ \infty & \text{else.} \end{cases} \]
Therefore
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) = \inf_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y), \]
and the final formulation of Lagrangian duality is
\[ \inf_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y) \ge \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y). \]
In other words, duality comes from commuting \(\inf\) and \(\sup\) of the Lagrangian, and strong duality holds exactly for those problems whose Lagrangians allow \(\inf\) and \(\sup\) to commute.
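
To see the commutation on a concrete instance, here is a small sketch I added, assuming the problem \(\inf \tfrac{1}{2} x^2\) subject to \(G(x) = x - 1 = 0\); then \(\mathcal{L}(x, y) = \tfrac{1}{2} x^2 - y (x - 1)\) and \(\inf_x \mathcal{L}(x, y) = -\tfrac{1}{2} y^2 + y\), attained at \(x = y\).

#+begin_src python
import numpy as np

# min x^2/2 s.t. x - 1 = 0: the only feasible point is x = 1.
primal = 0.5 * 1.0**2

# inf_x L(x, y) = -y^2/2 + y (attained at x = y); maximize over y.
ys = np.linspace(-5, 5, 100001)
dual = (-0.5 * ys**2 + ys).max()   # ~0.5, attained near y = 1

print(primal, dual)   # equal: inf and sup commute for this Lagrangian
#+end_src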

The problem from (5) can be equivalently written as
\[ \inf_{x \in \mathbb{R}^n} f(x) + \iota_0(G(x)), \quad \text{where } \iota_0(x) = \begin{cases} 0 & \text{if } x = 0 \\ \infty & \text{else.} \end{cases} \]
This form of the problem is suitable for the ideas from Fenchel duality. Use the perturbation function \(\varphi(x, u) = f(x) + \iota_0(G(x) - u)\) to derive a duality formula. What do you observe?

We simply compute
\begin{align*}
-\varphi^*(0, y) &= -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle x, 0 \rangle + \langle u, y \rangle - \varphi(x, u) \\
&= -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle u, y \rangle - f(x) - \iota_0(G(x) - u) \\
&\overset{w = G(x) - u}{=} -\sup_{x \in \mathbb{R}^n,\, w \in \mathbb{R}^m} -\langle w, y \rangle + \langle G(x), y \rangle - f(x) - \iota_0(w) \\
&= -\sup_{x \in \mathbb{R}^n} \big( \langle G(x), y \rangle - f(x) \big) - \sup_{w \in \mathbb{R}^m} \big( \langle w, -y \rangle - \iota_0(w) \big) \\
&= -\sup_{x \in \mathbb{R}^n} -\mathcal{L}(x, y) - \iota_0^*(-y) \\
&= \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) - \iota_0^*(-y).
\end{align*}
Investigating \(\iota_0^*\), we see that
\[ \iota_0^*(y) = \sup_{x \in \mathbb{R}^m} \langle x, y \rangle - \iota_0(x) = 0, \]
so \(\iota_0^*\) is the constant \(0\) function. With all these properties, we get the duality
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) \ge \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y) = \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y), \]
and this is the same result as for Lagrangian duality.

Consider \(A \in \mathbb{R}^{m \times n}\), \(c \in \mathbb{R}^n\) and \(b \in \mathbb{R}^m\), and the problem
\[ \begin{cases} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle \\ \text{s.t. } Ax = b. \end{cases} \]
What is the dual of this problem?

The Lagrangian is
\[ \mathcal{L}(x, y) = \langle c, x \rangle - \langle Ax - b, y \rangle = \langle c, x \rangle - \langle Ax, y \rangle + \langle b, y \rangle, \]
and the dual problem is
\begin{align*}
\sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) &= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle - \langle Ax, y \rangle + \langle b, y \rangle \\
&= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle - \langle x, A^T y \rangle + \langle b, y \rangle \\
&= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle x, c - A^T y \rangle + \langle b, y \rangle.
\end{align*}
If \(A^T y - c \ne 0\) we get \(\inf_{x \in \mathbb{R}^n} \langle x, c - A^T y \rangle = -\infty\), so the dual can be rewritten as
\[ \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) = \sup_{y \in \{ z \in \mathbb{R}^m \mid A^T z = c \}} \langle b, y \rangle, \]
and the dual optimization problem is
\[ \begin{cases} \sup_{y \in \mathbb{R}^m} \langle b, y \rangle \\ \text{s.t. } A^T y = c. \end{cases} \]
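
This conclusion can be verified with an off-the-shelf LP solver. The sketch below (my addition) uses scipy.optimize.linprog with made-up data \(A\), \(b\), \(c\) chosen so that \(c \in \operatorname{range}(A^T)\), i.e. so that the dual is feasible and the primal is bounded.

#+begin_src python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0]])
b = np.array([4.0])
c = np.array([1.0, 2.0])       # c = A^T y with y = 1, so the dual is feasible

free = [(None, None)]          # override linprog's default bounds x >= 0

# Primal: min <c, x> s.t. Ax = b, x free.
primal = linprog(c, A_eq=A, b_eq=b, bounds=free * 2)
# Dual: max <b, y> s.t. A^T y = c, i.e. min <-b, y>.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=free)

print(primal.fun, -dual.fun)   # both 4.0: strong duality for this LP
#+end_src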

Strong Duality*

In this section we will see some sufficient conditions on the perturbation function for strong duality to hold. This section is marked with * because the result is rather technical, so feel free to skip it if this is your first foray into duality theory for optimization.

Consider a perturbation function \(\varphi : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\). If \(\varphi\) is proper and convex, and there is \(x \in \mathbb{R}^n\) such that \((x, 0) \in \operatorname{dom} \varphi\) and \(\varphi\) is continuous at \((x, 0)\), then
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) = \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y). \]

A more detailed proof can be found in Zălinescu, Convex Analysis in General Vector Spaces, World Scientific, 2002.

Give conditions for strong duality in Fenchel form.

In Fenchel form, the perturbation function looks like \(\varphi(x, u) = f_1(x) + f_2(x - u)\), so \((x, 0) \in \operatorname{dom} \varphi\) if \(x \in \operatorname{dom} f_1 \cap \operatorname{dom} f_2\). For continuity of \(\varphi\) at \((x, 0)\), it is enough that both \(f_1\) and \(f_2\) are continuous at \(x\).
