
numerical optimization and non-smooth analysis, from a geometric point of view
We aim to solve a standard optimization problem
$$ \min_{x \in H} f(x) $$
for some, usually convex, $f \colon H \to \mathbb{R} \cup \{+\infty\}$ and a Hilbert space $H$. If we use a numerical solver, how do we know how far the approximate solution is from the true solution? We need something called a "certificate of optimality". The general framework is to consider an auxiliary function $\psi \colon H \to \mathbb{R} \cup \{-\infty\}$ such that
$$ \psi(y) \le f(x) \quad \text{for all } x, y. \tag{1} $$
This implies that $\sup_y \psi(y) \le \inf_x f(x)$, suggesting the dual problem
$$ \max_{y} \psi(y). $$
This approach is useful because for $x^\star$ a global solution for the primal, $y^\star$ a global solution for the dual, and $\hat{x}$ and $\hat{y}$ approximations of the two produced by some numerical scheme, we have
$$ \psi(\hat{y}) \le \psi(y^\star) \le f(x^\star) \le f(\hat{x}) $$
and therefore
$$ 0 \le f(\hat{x}) - f(x^\star) \le f(\hat{x}) - \psi(\hat{y}). $$
The right-hand side is computable even though $f(x^\star)$ is not, so it certifies how suboptimal $\hat{x}$ can be.
Not all primal dual pairs are created equal. The difference in the values
$$ \inf_x f(x) - \sup_y \psi(y) \ge 0 $$
is called the duality gap. In the lucky case that
$$ \inf_x f(x) = \sup_y \psi(y), $$
we say that strong duality holds.
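To make the certificate idea concrete, here is a minimal numeric sketch in Python with a hand-picked pair of my own (not from the text): $f(x) = \frac{1}{2}(x-1)^2$ and $\psi(y) = -\frac{1}{2}y^2$, which satisfies (1) since $\psi \le 0 \le f$. The approximate solutions are made up for illustration.

```python
# Hand-picked pair satisfying (1): psi(y) <= 0 <= f(x) for all x, y.
f = lambda x: 0.5 * (x - 1.0) ** 2   # minimized at x* = 1 with f(x*) = 0
psi = lambda y: -0.5 * y ** 2        # maximized at y = 0 with psi(0) = 0

# Pretend a numerical scheme returned these approximate solutions.
x_hat, y_hat = 1.01, -0.02

gap = f(x_hat) - psi(y_hat)     # computable certificate of optimality
true_error = f(x_hat) - f(1.0)  # true suboptimality, unknown in practice
print(true_error <= gap)        # True: the gap bounds the true error
```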
The only thing left to do is to figure out how to construct a good $\psi$ from a given $f$.
The most naive approach is to use the Fenchel-Young inequality. Recall the definition of the convex conjugate
$$ f^*(y) = \sup_x \; \langle y, x \rangle - f(x). $$
From this definition we can immediately see that
$$ f(x) \ge \langle y, x \rangle - f^*(y) \quad \text{for all } x, y. $$
This is not exactly the same form as the duality principle from (1), but this is also true for the global minimum $x^\star$ of $f$, so $f(x^\star) \ge \langle y, x^\star \rangle - f^*(y)$, and clearly $f(x) \ge f(x^\star)$, yielding
$$ f(x) \ge \langle y, x^\star \rangle - f^*(y) \quad \text{for all } x, y. $$
We now have a function in the correct form. Unfortunately
$$ \psi(y) = \langle y, x^\star \rangle - f^*(y), \tag{2} $$
so the expression for $\psi$ involves the unknown $x^\star$ and is uncomputable in general. But this attempt was a step in the right direction; we now need to investigate how to control the inner product part.
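Since $f^*$ will do a lot of work below, here is a hedged sketch of how one could explore conjugates numerically (a crude grid approximation of the sup, for illustration only; the function and grid are my own choices):

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 2001)   # grid standing in for "all x"
f = lambda x: 0.5 * (x - 1.0) ** 2

def f_star(y):
    # f*(y) = sup_x <y, x> - f(x), approximated by a max over the grid
    return np.max(y * xs - f(xs))

# Sanity check of the Fenchel-Young inequality f(x) >= <y, x> - f*(y)
rng = np.random.default_rng(0)
for x, y in rng.normal(size=(5, 2)):
    assert f(x) >= x * y - f_star(y) - 1e-4   # tolerance absorbs grid error
```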
Exercise. Let $f$ be convex lower semicontinuous with a unique minimizer at $x^\star$ and consider $\psi$ defined by (2). Compute $\sup_y \psi(y)$, i.e.
$$ \sup_y \; \langle y, x^\star \rangle - f^*(y). $$
Solution. Since $x^\star$ is the minimum of $f$ we know that $0 \in \partial f(x^\star)$, and the Fenchel-Young theorem guarantees the equality
$$ f(x^\star) + f^*(0) = \langle 0, x^\star \rangle = 0, $$
therefore $\psi(0) = -f^*(0) = f(x^\star)$ and, since (1) gives $\psi(y) \le f(x^\star)$ for every $y$, $\sup_y \psi(y) = f(x^\star)$.
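To double check this solution on a concrete family (my own example, not from the text): for $f(x) = \frac{1}{2} x^\top Q x - \langle b, x \rangle$ with $Q$ positive definite, we have $x^\star = Q^{-1} b$ and $f^*(y) = \frac{1}{2} (y + b)^\top Q^{-1} (y + b)$, so (2) can be evaluated exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
Q = M @ M.T + np.eye(3)      # random positive definite matrix
b = rng.normal(size=3)
Q_inv = np.linalg.inv(Q)

f = lambda x: 0.5 * x @ Q @ x - b @ x
f_star = lambda y: 0.5 * (y + b) @ Q_inv @ (y + b)   # conjugate of f
x_star = Q_inv @ b                                   # unique minimizer

psi = lambda y: y @ x_star - f_star(y)               # the dual from (2)

print(np.isclose(psi(np.zeros(3)), f(x_star)))  # True: sup attained at y = 0
print(psi(rng.normal(size=3)) <= f(x_star))     # True: weak duality for any y
```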
In our previous attempt we were left with a hanging $\langle y, x^\star \rangle$ term. In order to avoid it, let's apply Fenchel-Young to a slightly perturbed function of two variables,
$$ F(x, u) = f(x) + \langle u, x \rangle. $$
The hope is that adding this inner product term might give us something to cancel $\langle y, x^\star \rangle$.
Fenchel-Young for $F$ reads
$$ F(x, u) + F^*(v, y) \ge \langle v, x \rangle + \langle y, u \rangle. $$
It is worth pointing out that $F(x, 0) = f(x)$, so maybe a strategic evaluation of $u$ and $v$ can yield a useful form of duality. Setting $u = 0$ and $v = 0$ gives the promising
$$ f(x) \ge -F^*(0, y), $$
since the inner product part is $\langle 0, x \rangle + \langle y, 0 \rangle = 0$.
This is exactly the form we want, with $\psi(y) = -F^*(0, y)$. We hope that $F^*$ might be easier to compute than $x^\star$. Looking back at the development in this section, we have never used the exact definition of $F$, only the fact that $F(x, 0) = f(x)$. In the next section we will see that choosing $F$ more cleverly can do better.
In this section, we consider a general function $F \colon H_1 \times H_2 \to \mathbb{R} \cup \{+\infty\}$ with the property that
$$ F(x, 0) = f(x) \quad \text{for all } x. $$
We will call such a function a perturbation function. The same Fenchel-Young argument gives
$$ F(x, u) + F^*(v, y) \ge \langle v, x \rangle + \langle y, u \rangle, $$
and after the same clever evaluation $u = 0$, $v = 0$,
$$ f(x) \ge -F^*(0, y). \tag{3} $$
This again has the form of a duality theory, where
$$ \psi(y) = -F^*(0, y). $$
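To see that the choice of perturbation really matters, here is a small worked example of my own (not from the text): take $F(x, u) = f(x + u) + \frac{1}{2}\|u\|^2$, which satisfies $F(x, 0) = f(x)$. Substituting $w = x + u$,
$$ F^*(0, y) = \sup_{x, u} \; \langle y, u \rangle - f(x + u) - \tfrac{1}{2}\|u\|^2 = \sup_u \; \Big( \langle y, u \rangle - \tfrac{1}{2}\|u\|^2 \Big) - \inf_w f(w) = \tfrac{1}{2}\|y\|^2 - \inf_w f(w), $$
so $\psi(y) = \inf_w f(w) - \frac{1}{2}\|y\|^2$. Strong duality trivially holds ($\sup_y \psi(y) = \inf_w f(w)$, attained at $y = 0$), but this dual is exactly as hard to evaluate as the primal: a valid but useless perturbation.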
For a general convex optimization problem we can advance no further. The next step is to assume some special structure of the problem and to construct a perturbation function $F$ such that $F^*(0, \cdot)$ is actually computable.
The particular structured problem we will consider in this section is
$$ \min_x \; f(x) + g(Ax) $$
with $A \colon H_1 \to H_2$ linear and $f, g$ convex. With experience, I can see that
$$ F(x, u) = f(x) + g(Ax + u) $$
is a promising candidate for a perturbation function.
Let's investigate
$$ \begin{aligned} F^*(0, y) &= \sup_{x, u} \; \langle y, u \rangle - f(x) - g(Ax + u) \\ &= \sup_{x, w} \; \langle y, w - Ax \rangle - f(x) - g(w) \\ &= \sup_x \; \big( \langle -A^* y, x \rangle - f(x) \big) + \sup_w \; \big( \langle y, w \rangle - g(w) \big) \\ &= f^*(-A^* y) + g^*(y), \end{aligned} $$
where we substituted $w = Ax + u$. Substituting in (3) gives
$$ f(x) + g(Ax) \ge -f^*(-A^* y) - g^*(y). \tag{4} $$
This inequality is called Fenchel's duality formula. In the last section we will see a theorem that tells us when equality holds in (4).
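As a hedged numeric check of (4) (my own quadratic instance): with $f(x) = \frac{1}{2}\|x\|^2$ and $g(z) = \frac{1}{2}\|z - b\|^2$ we have $f^*(v) = \frac{1}{2}\|v\|^2$ and $g^*(y) = \frac{1}{2}\|y\|^2 + \langle b, y \rangle$, and both sides of (4) can be optimized in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))
b = rng.normal(size=4)

# Primal: min_x 0.5*||x||^2 + 0.5*||A x - b||^2
x_star = np.linalg.solve(np.eye(3) + A.T @ A, A.T @ b)
primal = 0.5 * x_star @ x_star + 0.5 * np.sum((A @ x_star - b) ** 2)

# Dual, right-hand side of (4): sup_y -f*(-A^T y) - g*(y)
#   = sup_y -0.5*||A^T y||^2 - 0.5*||y||^2 - <b, y>
y_star = -np.linalg.solve(A @ A.T + np.eye(4), b)
dual = -0.5 * np.sum((A.T @ y_star) ** 2) - 0.5 * y_star @ y_star - b @ y_star

print(np.isclose(primal, dual))  # True: no duality gap for this instance
```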
Exercise. For an arbitrary convex $f$ it is clear that
$$ \min_x f(x) = \min_x \; f(x) + g(Ax), $$
where $A = \operatorname{Id}$ and $g = 0$. Use this in (4).
Solution. After the direct substitution and using an abuse of notation to denote the constant zero function by $0$, we get
$$ f(x) \ge -f^*(-y) - 0^*(y). $$
This seems very promising, but let's compute
$$ 0^*(y) = \sup_x \; \langle y, x \rangle = \begin{cases} 0 & \text{if } y = 0, \\ +\infty & \text{otherwise.} \end{cases} $$
This guarantees that the supremum will be attained for $y = 0$, because otherwise the right hand side is $-\infty$, so
$$ \sup_y \; -f^*(-y) - 0^*(y) = -f^*(0). $$
Does this resemble something? It is the exact same formula as in the first section.
Another well understood case is that of constrained optimization. Here we consider the problem
$$ \min_x \; f(x) \quad \text{subject to } g(x) \le 0 \tag{5} $$
for some $f \colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $g \colon \mathbb{R}^n \to \mathbb{R}^m$, the inequality being understood componentwise.
The perturbation function we will consider is
$$ F(x, u) = \begin{cases} f(x) & \text{if } g(x) + u \le 0, \\ +\infty & \text{otherwise.} \end{cases} $$
Let's compute
$$ F^*(0, y) = \sup_{x, \; u \le -g(x)} \; \langle y, u \rangle - f(x). $$
Since if some $y_i < 0$ we would have $\langle y, u \rangle \to +\infty$ along the feasible direction $u_i \to -\infty$, we can conclude that the conjugate is finite only for $y \ge 0$ and that
$$ F^*(0, y) = \begin{cases} \sup_x \; -\langle y, g(x) \rangle - f(x) & \text{if } y \ge 0, \\ +\infty & \text{otherwise.} \end{cases} \tag{6} $$
The argument of the supremum in (6) appears (up to a sign) often enough in optimization that it received a name: the Lagrangian. More concretely: the function $L \colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ defined by
$$ L(x, y) = f(x) + \langle y, g(x) \rangle $$
is called the Lagrangian of (5). This object plays a vital role in the first-order theory of constrained optimization.
Armed with the Lagrangian we can compute
$$ -F^*(0, y) = \begin{cases} \inf_x \; L(x, y) & \text{if } y \ge 0, \\ -\infty & \text{otherwise,} \end{cases} $$
and with this we obtain the duality formula
$$ \inf_{x : \, g(x) \le 0} f(x) \ge \sup_{y \ge 0} \; \inf_x \; L(x, y). $$
Let's investigate one more thing. Observe that
$$ \sup_{y \ge 0} \; L(x, y) = \begin{cases} f(x) & \text{if } g(x) \le 0, \\ +\infty & \text{otherwise.} \end{cases} $$
Therefore
$$ \inf_x \; \sup_{y \ge 0} \; L(x, y) = \inf_{x : \, g(x) \le 0} f(x), $$
and the final formulation of the Lagrangian duality is
$$ \inf_x \; \sup_{y \ge 0} \; L(x, y) \ge \sup_{y \ge 0} \; \inf_x \; L(x, y). $$
In other words, duality comes from commuting $\inf$ and $\sup$ for the Lagrangian function, and strong duality holds exactly for those problems with Lagrangians for which $\inf$ and $\sup$ commute.
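A toy instance to see this minimax picture in action (my own example): minimize $x^2$ subject to $1 - x \le 0$, so $f(x) = x^2$, $g(x) = 1 - x$, $x^\star = 1$ and $f(x^\star) = 1$. The dual function has the closed form $\inf_x L(x, y) = y - y^2/4$, maximized at $y^\star = 2$:

```python
import numpy as np

# min x^2  s.t.  1 - x <= 0; Lagrangian L(x, y) = x^2 + y*(1 - x)
L = lambda x, y: x**2 + y * (1.0 - x)

ys = np.linspace(0.0, 5.0, 501)   # dual grid, y >= 0
xs = np.linspace(-3.0, 3.0, 601)  # primal grid (contains x* = 1)

q = ys - ys**2 / 4.0              # inf_x L(x, y), attained at x = y/2
print(ys[np.argmax(q)], q.max())  # y* = 2.0, dual value 1.0

# inf_x sup_{y>=0} L(x, y) on the grids: also ~1.0, i.e. f(x*)
Lvals = L(xs[:, None], ys[None, :])
print(Lvals.max(axis=1).min())
```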
Exercise. The problem from (5) can be equivalently written as
$$ \min_x \; f(x) + h(g(x)), $$
where $h(u) = 0$ if $u \le 0$ and $h(u) = +\infty$ otherwise. This form of the problem is suitable for the ideas from Fenchel duality. Use the perturbation function
$$ F(x, u) = f(x) + h(g(x) + u) $$
to derive a duality formula. What do you observe?
Solution. We simply compute
$$ F^*(0, y) = \sup_{x, u} \; \langle y, u \rangle - f(x) - h(g(x) + u) = h^*(y) + \sup_x \; -\langle y, g(x) \rangle - f(x), $$
where we substituted $w = g(x) + u$. Investigating $h^*$, we see that
$$ h^*(y) = \sup_{w \le 0} \; \langle y, w \rangle = \begin{cases} 0 & \text{if } y \ge 0, \\ +\infty & \text{otherwise,} \end{cases} $$
so on its domain $h^*$ is the constant $0$ function. With all these properties, we get the duality
$$ \inf_{x : \, g(x) \le 0} f(x) \ge \sup_{y \ge 0} \; \inf_x \; L(x, y), $$
and this is the same result as for Lagrangian duality.
Exercise. Consider $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ and the problem
$$ \min_{x : \, Ax \le b} \; \tfrac{1}{2} \|x\|^2. $$
What is the dual of this problem?
Solution. The Lagrangian is
$$ L(x, y) = \tfrac{1}{2} \|x\|^2 + \langle y, Ax - b \rangle $$
and the dual problem is
$$ \sup_{y \ge 0} \; \inf_x \; L(x, y). $$
If we set the gradient of $L(\cdot, y)$ to zero we get $x = -A^\top y$, so the dual can be rewritten as
$$ \inf_x \; L(x, y) = -\tfrac{1}{2} \|A^\top y\|^2 - \langle b, y \rangle, $$
and the optimization dual problem is
$$ \max_{y \ge 0} \; -\tfrac{1}{2} \|A^\top y\|^2 - \langle b, y \rangle. $$
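Since the problem statement above is partly my reconstruction, here is a self-contained check for a single active constraint (assuming $m = 1$, a row $a \in \mathbb{R}^n$ and $b < 0$): the primal solution is the projection of the origin onto the halfspace $\{ x : \langle a, x \rangle \le b \}$, and both optimal values equal $b^2 / (2\|a\|^2)$.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = -3.0                  # b < 0 makes the constraint active at the optimum

# Primal: min 0.5*||x||^2 s.t. <a, x> <= b; the solution is the projection
x_star = (b / (a @ a)) * a
primal = 0.5 * x_star @ x_star

# Dual: max_{y >= 0} -0.5*y^2*||a||^2 - b*y, maximized at y* = -b/||a||^2 >= 0
y_star = -b / (a @ a)
dual = -0.5 * y_star**2 * (a @ a) - b * y_star

print(np.isclose(primal, dual))  # True: both equal b^2 / (2*||a||^2) = 0.9
```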
In this section we will see some sufficient conditions on the perturbation function for strong duality to hold. This section is marked with * because this result is rather technical, so feel free to skip it if this is your first foray into duality theory for optimization.
Theorem. Consider a perturbation function $F \colon H_1 \times H_2 \to \mathbb{R} \cup \{+\infty\}$. If $F$ is proper and convex and there is $x_0$ such that $F(x_0, 0) < +\infty$ and $u \mapsto F(x_0, u)$ is continuous at $0$, then
$$ \inf_x \; F(x, 0) = \sup_y \; -F^*(0, y). $$
A more detailed proof can be found in [Zal02Conv] (C. Zălinescu, Convex Analysis in General Vector Spaces, 2002).
Exercise. Give conditions for strong duality in Fenchel form.
Solution. In Fenchel form, the perturbation function looks like $F(x, u) = f(x) + g(Ax + u)$, so $F(x_0, u) = f(x_0) + g(Ax_0 + u)$. For continuity of $F(x_0, \cdot)$ at $0$, we need both the constant term $f(x_0)$ to be finite and $u \mapsto g(Ax_0 + u)$ to be continuous at $0$, i.e. $g$ continuous at $Ax_0$. In other words, strong duality holds in (4) if there is $x_0 \in \operatorname{dom} f$ such that $g$ is continuous at $Ax_0$.