

Duality and Optimization

Titus Pinta <2025-12-17 Wed>


We aim to solve a standard optimization problem
\[ \inf_{x \in C \subseteq \mathbb{R}^n} f(x), \]
for some, usually convex, set \(C\) and \(f : C \to \overline{\mathbb{R}}\). If we use a numerical solver, how do we know how far the approximate solution is from the true solution? We need something called a "certificate of optimality". The general framework is to consider an auxiliary function \(g : D \subseteq \mathbb{R}^m \to \overline{\mathbb{R}}\) such that

\[ \forall x \in C \text{ and } \forall y \in D, \quad f(x) \ge g(y). \tag{1} \]
This implies that \(\inf_{x \in C} f(x) \ge \sup_{y \in D} g(y)\), suggesting the dual problem \(\sup_{y \in D} g(y)\).

This approach is useful because for \(x^\star\) a global solution of the primal, \(y^\star\) a global solution of the dual, and \(x_n\) and \(y_n\) approximations of the two produced by some numerical scheme, we have
\[ f(x_n) \ge f(x^\star) \ge g(y^\star) \ge g(y_n), \]
and therefore
\[ f(x_n) - f(x^\star) \le f(x_n) - g(y_n). \]

Not all primal-dual pairs \((f, g)\) are created equal. The difference in the values is called the duality gap,
\[ \operatorname{gap}(f, g) = \inf_{x \in C} f(x) - \sup_{y \in D} g(y). \]
In the lucky case that \(\operatorname{gap}(f, g) = 0\) we say that strong duality holds.
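
To make the certificate idea concrete, here is a minimal numeric sketch (my addition, not part of the original argument). It assumes the toy pair \(f(x) = x^2\) and \(g(y) = -y^2/4\), which satisfies (1) since \(f \ge 0 \ge g\), and imagines \(x_n\) and \(y_n\) as outputs of some numerical scheme.

#+begin_src python
# Toy primal-dual pair: f(x) = x^2 >= 0 >= -y^2/4 = g(y), so (1) holds.
f = lambda x: x**2
g = lambda y: -y**2 / 4

x_n, y_n = 0.1, -0.05    # hypothetical outputs of primal and dual solvers
f_opt = 0.0              # true optimal value, inf f = f(0)

certificate = f(x_n) - g(y_n)   # computable bound on the suboptimality
true_error = f(x_n) - f_opt     # unknown to the solver in practice

assert true_error <= certificate
print(certificate, true_error)  # 0.010625 >= 0.01
#+end_src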

The only thing left to do is to figure out how to construct a good g from a given f .

First Attempt

The most naive approach is to use the Fenchel-Young inequality. Recall the definition of the convex conjugate
\[ f^*(y) = \sup_{x \in \mathbb{R}^n} \langle x, y \rangle - f(x). \]
From this definition we can immediately see that
\[ \forall x, y \in \mathbb{R}^n, \quad f^*(y) \ge \langle x, y \rangle - f(x). \]
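
Since the conjugate is defined by a supremum, it can be approximated on a grid. The following sketch (an illustration I added, assuming the example \(f(x) = |x|\), whose conjugate is the indicator of \([-1, 1]\)) spot-checks the Fenchel-Young inequality numerically.

#+begin_src python
import numpy as np

# Grid approximation of f*(y) = sup_x <x, y> - f(x), in one dimension.
xs = np.linspace(-10, 10, 2001)
f = lambda x: np.abs(x)            # f(x) = |x|; exactly, f* = indicator of [-1, 1]

def conj(y):
    return np.max(xs * y - f(xs))  # supremum restricted to the grid

# Spot-check Fenchel-Young: f*(y) >= <x, y> - f(x) for random x, y.
rng = np.random.default_rng(0)
for x, y in rng.uniform(-1, 1, size=(5, 2)):
    assert conj(y) >= x * y - f(x) - 1e-9
#+end_src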

This is not exactly the same form as the duality principle from (1), but this is also true for \(x^\star\) the global minimum of \(f\), so
\[ \forall y \in \mathbb{R}^n, \quad f(x^\star) \ge \langle x^\star, y \rangle - f^*(y), \]
and clearly \(f(x) \ge f(x^\star)\), yielding
\[ \forall x, y \in \mathbb{R}^n, \quad f(x) \ge \langle x^\star, y \rangle - f^*(y). \]

We now have a function \(g\) in the correct form. Unfortunately,
\[ g(y) = \langle x^\star, y \rangle - f^*(y), \tag{2} \]
so the expression for \(g\) involves \(x^\star\) and is uncomputable in general. But this attempt was a step in the right direction; we now need to investigate how to control the inner product part.

Let \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) be convex and lower semicontinuous with a unique minimizer at \(x^\star\), and consider \(g : \mathbb{R}^n \to \overline{\mathbb{R}}\) defined by (2). Compute \(\operatorname{gap}(f, g)\), i.e.
\[ \operatorname{gap}(f, g) = \inf_{x \in \mathbb{R}^n} f(x) - \sup_{y \in \mathbb{R}^n} g(y). \]

Since \(x^\star\) is the minimum of \(f\) we know that \(0 \in \partial f(x^\star)\), and the Fenchel-Young theorem guarantees the equality
\[ f(x^\star) = \langle x^\star, 0 \rangle - f^*(0), \]
therefore \(f(x^\star) = g(0)\) and \(\operatorname{gap}(f, g) = 0\).
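
This computation can be sanity-checked numerically. Here is a small sketch I added, assuming \(f(x) = (x - 1)^2\), for which \(f^*(y) = y + y^2/4\) and hence \(g(y) = \langle x^\star, y \rangle - f^*(y) = -y^2/4\).

#+begin_src python
import numpy as np

# f(x) = (x - 1)^2 has unique minimizer x* = 1 and f*(y) = y + y^2/4,
# so g(y) = <x*, y> - f*(y) = -y^2/4.
f = lambda x: (x - 1)**2
g = lambda y: -y**2 / 4

grid = np.linspace(-5, 5, 100001)
primal = f(grid).min()   # approximates inf f = 0
dual = g(grid).max()     # approximates sup g = g(0) = 0

print(primal, dual)      # both ~0, so gap(f, g) = 0 as computed above
#+end_src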

Second Attempt

In our previous attempt we were left with a hanging \(\langle x^\star, y \rangle\) term. In order to avoid it, let's apply Fenchel-Young to a slightly perturbed function \(\varphi : \mathbb{R}^n \times \mathbb{R}^n \to \overline{\mathbb{R}}\),
\[ \varphi(x, u) = f(x) + \langle x, u \rangle. \]
The hope is that adding this inner product term might give us something to cancel \(\langle x^\star, y \rangle\).

Fenchel-Young for \(\varphi\) reads
\[ \varphi(x, u) \ge \langle x, v \rangle + \langle u, y \rangle - \varphi^*(v, y). \]
It is worth pointing out that \(\varphi(x, 0) = f(x)\), so maybe a strategic evaluation of \(\varphi\) and \(\varphi^*\) can yield a useful form of duality. Setting \(u = 0\) and \(v = 0\) gives the promising
\[ \varphi(x, 0) \ge -\varphi^*(0, y), \]
since the inner product part is \(\langle x, 0 \rangle + \langle 0, y \rangle = 0\).

This is exactly the form we want, with \(g(y) = -\varphi^*(0, y)\). We hope that \(\varphi^*\) might be easier to compute. Looking back at the development in this section, we have never used the exact definition of \(\varphi\). In the next section we will see how choosing \(\varphi\) more cleverly can pay off.

General Perturbations

In this section, we consider a general function \(\varphi : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\) with the property that \(\varphi(x, 0) = f(x)\). We will call such a function \(\varphi\) a perturbation function. The same Fenchel-Young argument gives
\[ \varphi(x, u) \ge \langle x, v \rangle + \langle u, y \rangle - \varphi^*(v, y), \]
and after the same clever evaluation,
\[ \varphi(x, 0) \ge -\varphi^*(0, y). \]

This again has the form of a duality theory, where
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) \ge \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y). \tag{3} \]
For a general convex optimization problem we can advance no further. The next step is to assume some special structure of the problem and to construct a perturbation function such that \(\varphi^*(0, y)\) is actually computable.

Fenchel Duality

The particular structured problem we will consider in this section is
\[ \inf_{x \in \mathbb{R}^n} f_1(x) + f_2(x), \]
with \(f_1\) and \(f_2\) convex. With some experience, one can see that \(\varphi(x, u) = f_1(x) + f_2(x - u)\) is a promising candidate for a perturbation function.

Let's investigate:
\begin{align*}
-\varphi^*(0, y) &= -\sup_{x, u \in \mathbb{R}^n} \langle x, 0 \rangle + \langle u, y \rangle - \varphi(x, u) \\
&= -\sup_{x, u \in \mathbb{R}^n} \langle u, y \rangle - f_1(x) - f_2(x - u) \\
&= -\sup_{x, u \in \mathbb{R}^n} \langle u - x, y \rangle + \langle x, y \rangle - f_1(x) - f_2(x - u) \\
&= -\sup_{x, u \in \mathbb{R}^n} -\langle x - u, y \rangle + \langle x, y \rangle - f_1(x) - f_2(x - u) \\
&\overset{w = x - u}{=} -\sup_{x, w \in \mathbb{R}^n} -\langle w, y \rangle + \langle x, y \rangle - f_1(x) - f_2(w) \\
&= -\sup_{x \in \mathbb{R}^n} \big( \langle x, y \rangle - f_1(x) \big) - \sup_{w \in \mathbb{R}^n} \big( \langle w, -y \rangle - f_2(w) \big) \\
&= -f_1^*(y) - f_2^*(-y).
\end{align*}

Substituting in (3) gives
\[ \inf_{x \in \mathbb{R}^n} f_1(x) + f_2(x) \ge \sup_{y \in \mathbb{R}^n} -f_1^*(y) - f_2^*(-y). \tag{4} \]
This inequality is called Fenchel's duality formula. In the last section we will see a theorem that allows us to compute \(\operatorname{gap}(f_1 + f_2, -f_1^* - f_2^*(-\cdot))\).
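
As a quick check of (4), here is a numeric sketch I added for the assumed pair \(f_1(x) = \tfrac{1}{2}(x - a)^2\) and \(f_2(x) = \lambda |x|\), whose conjugates are \(f_1^*(y) = \tfrac{1}{2} y^2 + a y\) and \(f_2^* = \iota_{[-\lambda, \lambda]}\).

#+begin_src python
import numpy as np

a, lam = 2.0, 0.5

# Primal: inf f1 + f2 with f1(x) = (x - a)^2 / 2 and f2(x) = lam * |x|.
xs = np.linspace(-10, 10, 200001)
primal = (0.5 * (xs - a)**2 + lam * np.abs(xs)).min()

# Dual: sup -f1*(y) - f2*(-y); f2*(-y) = 0 exactly when |y| <= lam.
ys = np.linspace(-lam, lam, 200001)
dual = (-(0.5 * ys**2 + a * ys)).max()

print(primal, dual)   # both ~0.875: the duality gap vanishes here
#+end_src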

For an arbitrary convex \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) it is clear that \(f = f_1 + f_2\) where \(f_1 = f\) and \(f_2 = 0\). Use this in (4).

After the direct substitution, and using an abuse of notation to denote the constant zero function by \(0\), we get
\[ \inf_{x \in \mathbb{R}^n} f(x) \ge \sup_{y \in \mathbb{R}^n} -f^*(y) - 0^*(-y). \]
This seems very promising, but let's compute \(0^*\):
\[ 0^*(y) = \sup_{x \in \mathbb{R}^n} \langle x, y \rangle = \begin{cases} 0 & \text{if } y = 0 \\ \infty & \text{else.} \end{cases} \]
This guarantees that the supremum will be attained at \(y = 0\), because otherwise the right-hand side is \(-\infty\), so
\[ \inf_{x \in \mathbb{R}^n} f(x) \ge -f^*(0). \]
Does this resemble something? It is the exact same formula as in our first attempt.

Lagrangian Duality

Another well-understood case is that of constrained optimization. Here we consider the problem
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) \tag{5} \]
for some \(f : \mathbb{R}^n \to \overline{\mathbb{R}}\) and \(G : \mathbb{R}^n \to \mathbb{R}^m\).

The perturbation function we will consider is
\[ \varphi(x, u) = \begin{cases} f(x) & \text{if } G(x) = u \\ \infty & \text{else.} \end{cases} \]
Let's compute
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle u, y \rangle - \varphi(x, u). \]
Since \(G(x) \ne u\) would give \(\varphi(x, u) = \infty\), we can restrict the supremum to \(G(x) = u\), so that
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n} \langle G(x), y \rangle - f(x). \tag{6} \]

The argument of the supremum in (6) appears (up to its sign) often enough in optimization that it has received a name: the Lagrangian. More concretely, the function \(\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\) defined by
\[ \mathcal{L}(x, y) = f(x) - \langle y, G(x) \rangle \]
is called the Lagrangian of (5). This object plays a vital role in the first-order theory of constrained optimization.

Armed with the Lagrangian we can compute
\[ -\varphi^*(0, y) = -\sup_{x \in \mathbb{R}^n} -\mathcal{L}(x, y) = \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y), \]
and with this we obtain the duality formula
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) \ge \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y). \]

Let's investigate one more thing. Observe that
\[ \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y) = \begin{cases} f(x) & \text{if } G(x) = 0 \\ \infty & \text{else.} \end{cases} \]
Therefore
\[ \inf_{x \in \{ z \mid G(z) = 0 \}} f(x) = \inf_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y), \]
and the final formulation of Lagrangian duality is
\[ \inf_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^m} \mathcal{L}(x, y) \ge \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y). \]
In other words, duality comes from commuting \(\inf\) and \(\sup\) of the Lagrangian, and strong duality holds exactly for those problems whose Lagrangians allow \(\inf\) and \(\sup\) to commute.
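
To see the commutation on a concrete instance, here is a small sketch I added, assuming the problem \(\inf \tfrac{1}{2} x^2\) subject to \(G(x) = x - 1 = 0\); then \(\mathcal{L}(x, y) = \tfrac{1}{2} x^2 - y (x - 1)\) and \(\inf_x \mathcal{L}(x, y) = -\tfrac{1}{2} y^2 + y\), attained at \(x = y\).

#+begin_src python
import numpy as np

# min x^2/2 s.t. x - 1 = 0: the only feasible point is x = 1.
primal = 0.5 * 1.0**2

# inf_x L(x, y) = -y^2/2 + y (attained at x = y); maximize over y.
ys = np.linspace(-5, 5, 100001)
dual = (-0.5 * ys**2 + ys).max()   # ~0.5, attained near y = 1

print(primal, dual)   # equal: inf and sup commute for this Lagrangian
#+end_src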

The problem from (5) can be equivalently written as
\[ \inf_{x \in \mathbb{R}^n} f(x) + \iota_0(G(x)), \quad \text{where } \iota_0(x) = \begin{cases} 0 & \text{if } x = 0 \\ \infty & \text{else.} \end{cases} \]
This form of the problem is suitable for the ideas from Fenchel duality. Use the perturbation function \(\varphi(x, u) = f(x) + \iota_0(G(x) - u)\) to derive a duality formula. What do you observe?

We simply compute
\begin{align*}
-\varphi^*(0, y) &= -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle x, 0 \rangle + \langle u, y \rangle - \varphi(x, u) \\
&= -\sup_{x \in \mathbb{R}^n,\, u \in \mathbb{R}^m} \langle u, y \rangle - f(x) - \iota_0(G(x) - u) \\
&\overset{w = G(x) - u}{=} -\sup_{x \in \mathbb{R}^n,\, w \in \mathbb{R}^m} -\langle w, y \rangle + \langle G(x), y \rangle - f(x) - \iota_0(w) \\
&= -\sup_{x \in \mathbb{R}^n} \big( \langle G(x), y \rangle - f(x) \big) - \sup_{w \in \mathbb{R}^m} \big( \langle w, -y \rangle - \iota_0(w) \big) \\
&= -\sup_{x \in \mathbb{R}^n} -\mathcal{L}(x, y) - \iota_0^*(-y) \\
&= \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) - \iota_0^*(-y).
\end{align*}
Investigating \(\iota_0^*\), we see that
\[ \iota_0^*(y) = \sup_{x \in \mathbb{R}^m} \langle x, y \rangle - \iota_0(x) = 0, \]
so \(\iota_0^*\) is the constant \(0\) function. With all these properties, we get the duality
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) \ge \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y) = \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y), \]
and this is the same result as for Lagrangian duality.

Consider \(A \in \mathbb{R}^{m \times n}\), \(c \in \mathbb{R}^n\) and \(b \in \mathbb{R}^m\), and the problem
\[ \begin{cases} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle \\ \text{s.t. } Ax = b. \end{cases} \]
What is the dual of this problem?

The Lagrangian is
\[ \mathcal{L}(x, y) = \langle c, x \rangle - \langle Ax - b, y \rangle = \langle c, x \rangle - \langle Ax, y \rangle + \langle b, y \rangle, \]
and the dual problem is
\begin{align*}
\sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) &= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle - \langle Ax, y \rangle + \langle b, y \rangle \\
&= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle c, x \rangle - \langle x, A^T y \rangle + \langle b, y \rangle \\
&= \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \langle x, c - A^T y \rangle + \langle b, y \rangle.
\end{align*}
If \(A^T y - c \ne 0\) we get \(\inf_{x \in \mathbb{R}^n} \langle x, c - A^T y \rangle = -\infty\), so the dual can be rewritten as
\[ \sup_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, y) = \sup_{y \in \{ z \in \mathbb{R}^m \mid A^T z = c \}} \langle b, y \rangle, \]
and the dual optimization problem is
\[ \begin{cases} \sup_{y \in \mathbb{R}^m} \langle b, y \rangle \\ \text{s.t. } A^T y = c. \end{cases} \]
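
This conclusion can be verified with an off-the-shelf LP solver. The sketch below (my addition) uses scipy.optimize.linprog with made-up data \(A\), \(b\), \(c\) chosen so that \(c \in \operatorname{range}(A^T)\), i.e. so that the dual is feasible and the primal is bounded.

#+begin_src python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0]])
b = np.array([4.0])
c = np.array([1.0, 2.0])       # c = A^T y with y = 1, so the dual is feasible

free = [(None, None)]          # override linprog's default bounds x >= 0

# Primal: min <c, x> s.t. Ax = b, x free.
primal = linprog(c, A_eq=A, b_eq=b, bounds=free * 2)
# Dual: max <b, y> s.t. A^T y = c, i.e. min <-b, y>.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=free)

print(primal.fun, -dual.fun)   # both 4.0: strong duality for this LP
#+end_src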

Strong Duality*

In this section we will see some sufficient conditions on the perturbation function for strong duality to hold. This section is marked with * because the result is rather technical, so feel free to skip it if this is your first foray into duality theory for optimization.

Consider a perturbation function \(\varphi : \mathbb{R}^n \times \mathbb{R}^m \to \overline{\mathbb{R}}\). If \(\varphi\) is proper and convex, and there is \(x \in \mathbb{R}^n\) such that \((x, 0) \in \operatorname{dom} \varphi\) and \(\varphi\) is continuous at \((x, 0)\), then
\[ \inf_{x \in \mathbb{R}^n} \varphi(x, 0) = \sup_{y \in \mathbb{R}^m} -\varphi^*(0, y). \]

A more detailed proof can be found in Zălinescu, Convex Analysis in General Vector Spaces, World Scientific, 2002.

Give conditions for strong duality in Fenchel form.

In Fenchel form, the perturbation function looks like \(\varphi(x, u) = f_1(x) + f_2(x - u)\), so \((x, 0) \in \operatorname{dom} \varphi\) if \(x \in \operatorname{dom} f_1 \cap \operatorname{dom} f_2\). For continuity of \(\varphi\) at \((x, 0)\), it is enough that both \(f_1\) and \(f_2\) are continuous at \(x\).
