A Newton-type Method for Constrained Optimization

Titus Pința

Newton differentiability subsumes all aspects of Newton's method

  • First Order Methods
  • Stochastic Optimization
  • Quasi-Newton Methods
  • Equations on Metric Spaces
  • Constrained Optimization
  • \(\gdef\bbRn{\mathbb{R}^n}\mathcal{N}_{\mathcal{H}F}:\bbRn \rightrightarrows \bbRn\), \[ \mathcal{N}_{\mathcal{H}F} x = \{x + d~|~H \in \mathcal{H}F(x), Hd = -F(x)\}, \] is the Newton operator
  • \(\gdef\bbN{\mathbb{N}}{\{x^k\}}_{k \in \bbN}\), \(x^{k+1} \in \mathcal{N}_{\mathcal{H}F}x^k\) is the Newton-type method
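As a toy illustration (a hypothetical one-dimensional example, not from the slides): the piecewise-linear $F(x) = \max(x, x/2) - 1$ is Newton differentiable with $\mathcal{H}F(x) = \{1\}$ for $x > 0$ and $\{1/2\}$ for $x < 0$, and the Newton-type method reaches the root $\bar{x} = 1$ in two steps.

```julia
# Newton-type method: pick any H ∈ HF(x) and solve H d = -F(x).
F(x) = max(x, x/2) - 1        # root at x̄ = 1
HF(x) = x >= 0 ? 1.0 : 0.5    # one selection from the Newton derivative set

function newton_type(x; iters = 10)
    for _ in 1:iters
        d = -F(x) / HF(x)     # solve H d = -F(x)
        x = x + d
    end
    return x
end

newton_type(-4.0)             # -4 → 2 → 1, then stays at 1.0
```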
  • $\gdef\xbar{\bar{x}}F(\xbar) = 0$
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$

Then any Newton-type method converges superlinearly for any $x^0$ close enough to $\xbar$

  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  1. $F$ is $\mathcal{C^{1}}$, $\mathcal{H}F = \nabla F$
  2. $F$ is semismooth with $\mathcal{H}F = F'$
  3. $F$ is tame (definable in an $o$-minimal structure) with $\mathcal{H}F = F'$
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  1. \[ F(x, y) = \begin{bmatrix} 4x(x^2 - y) + 2(x - 1) \\ -2(x^2 - y) \end{bmatrix} \quad \mathcal{H}F(x, y) = \left\{2 \begin{bmatrix} 1 + 4 x^2 & -2 x \\ -2 x & 1 \end{bmatrix}\right\} \]
     R. Bergmann, O. P. Ferreira, E. M. Santos, and J. C. O. Souza. "The difference of convex algorithm on Hadamard manifolds". In: J. Optim. Theory Appl. 201 (2021), pp. 221–251
  2. \[\gdef\Id{\operatorname{Id}} F(x) = \left \{ \begin{array}{rl} x & \quad x \in \mathbb{Q}, \\ -x & \quad x \in \bbR \setminus \mathbb{Q} \end{array} \right. \quad \mathcal{H}F(x) = \left \{ \begin{array}{rl} 1 & \quad x \in \mathbb{Q}, \\ -1 & \quad x \in \bbR \setminus \mathbb{Q} \end{array} \right. \]
  3. $F$ is Lipschitz, with $\mathcal{H}F = \alpha\Id$ (then $F$ is weakly Newton differentiable)
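Example 1 can be checked numerically; a minimal Julia sketch, taking $F = \nabla f$ for the Rosenbrock function $f(x, y) = (x^2 - y)^2 + (1 - x)^2$ and the displayed Gauss–Newton element of $\mathcal{H}F$, which differs from $\nabla^2 f$ only by a term vanishing at the root $(1, 1)$:

```julia
using LinearAlgebra

# F = ∇f for f(x, y) = (x² - y)² + (1 - x)²; the root is (1, 1).
F(v) = [4v[1] * (v[1]^2 - v[2]) + 2(v[1] - 1), -2(v[1]^2 - v[2])]
# Gauss–Newton choice of H ∈ HF(v): agrees with ∇²f up to a term vanishing on x² = y.
HF(v) = (x = v[1]; 2 * [(1 + 4x^2) (-2x); (-2x) 1.0])

function newton(v; iters = 10)
    for _ in 1:iters
        v = v + HF(v) \ (-F(v))   # Newton step: solve H d = -F(v)
    end
    return v
end

newton([0.9, 0.8])                # converges to [1.0, 1.0]
```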
$F:\bbRn \to \bbRn$ is weakly Newton differentiable at $\xbar$: $\exists\gdef\setto{\rightrightarrows}\gdef\bbRnxn{\mathbb{R}^{n \times n}}\mathcal{H}F:\bbRn \setto \bbRnxn$ \[ \lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\frac{\|F(x) - F(\xbar) - H(x - \xbar)\|}{\|x - \xbar\|} < \infty \]
  • $F(\xbar) = 0$
  • $F:\bbRn \to \bbRn$ is weakly Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$
  • $\Omega\lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\frac{\|F(x) - F(\xbar) - H(x - \xbar)\|}{\|x - \xbar\|} < 1$

Then any Newton-type method converges linearly for any $x^0$ close enough to $\xbar$

  1. $F$ is Lipschitz, with $\mathcal{H}F = \alpha\Id$

For $F = \nabla f$, the Newton operator $\mathcal{N}_{\alpha \Id}x = x - \alpha^{-1} \nabla f(x)$ is gradient descent with step size $\alpha^{-1}$
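A one-line check (a made-up quadratic; $\alpha$ is any constant at least the Lipschitz modulus of $\nabla f$):

```julia
# N_{αId} x = x - α⁻¹ ∇f(x) is gradient descent with step size 1/α.
gradf(x) = 2x                    # ∇f for f(x) = x², Lipschitz with constant 2

function descend(x; α = 4.0, iters = 20)
    for _ in 1:iters
        x = x - gradf(x) / α     # linear convergence: contraction factor |1 - 2/α| = 1/2
    end
    return x
end

descend(1.0)                     # 0.5^20 ≈ 9.54e-7
```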

If either

  • $\nabla f$ Lipschitz, hypo-monotone and metric subregular
  • $f$ has the KŁ property and ${\{x^k\}}_{k \in \bbN}$ has sufficient decrease and a gradient lower bound

then $\nabla f$ is weakly Newton differentiable and Newton's method converges linearly

  • If $F$ is Fréchet differentiable on a set, with $\nabla F$ Lipschitz continuous, \[\|\mathcal{H}F(x) - \mathcal{H}F(y)\| \le L\|x - y\| \]
  • $B := \|{\mathcal{H}F(x^0)}^{-1}\|$ and $\eta := \|{\mathcal{H}F(x^0)}^{-1}F(x^0)\|$,

    \[LB\eta \le \frac{1}{2} \]

then $\exists \xbar$ such that $F(\xbar) = 0$

  • If $F$ is Newton differentiable on a set, with $\gdef\tbar{\bar{t}}\mathcal{H}F$ Hölder continuous, \[\|\mathcal{H}F(x) - \mathcal{H}F(y)\| \le L\|x - y\|^{\gamma-1}\]
  • $B := \|{\mathcal{H}F(x^0)}^{-1}\|$ and $\eta := \|{\mathcal{H}F(x^0)}^{-1}F(x^0)\|$,

    \[ \begin{aligned} &\frac{LB}{2}{\left(\frac{f(t)}{f'(t)}\right)}^\gamma \le f\left(t - \frac{f(t)}{f'(t)}\right), & f(0) = \eta,\quad f(t) > 0,\quad f(\tbar) = 0, \\ &f'(t) < 0,\quad -f'(t) \ge {(1 + L t)}^{-1}, & f''(t) > 0 \end{aligned} \]

then $\exists \xbar$ such that $F(\xbar) = 0$

Modified Newton Methods

$F$ is Newton differentiable with $\mathcal{H}F$:

  1. is Newton differentiable with $\mathcal{G}F(x) = \mathcal{H}F(x) + B_{\varepsilon(x)}[0]$
  2. is Newton differentiable with $\mathcal{G}F(x)=\mathcal{H} F(\phi(x))$ for $\lim_{x \to \xbar}\phi(x) = \xbar$
  3. is Newton differentiable with any $\mathcal{G}F$ with \[\gdef\Hbar{\bar{H}} \lim_{x \to \xbar} \sup_{H \in \mathcal{H}F(x), G \in \mathcal{G}F(x)} \frac{\|H(x - \xbar) - G(x - \xbar)\|}{\|x - \xbar\|} = 0 \]
  4. is Newton differentiable for $H$ produced by a quasi-Newton update rule $\mathcal{U}:\bbRn \times \bbRnxn \to \bbRnxn$ with \[ \lim_{x, H \to \xbar, \Hbar} \frac{\|\mathcal{U}(x, H) - \Hbar\|_F}{\|H - \Hbar\|_F} = 0 \]
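For case 4, the one-dimensional secant method is the simplest instance (a generic illustration, not the talk's specific update rule): $\mathcal{U}$ builds $H$ from two function values, and convergence to the root stays superlinear.

```julia
# Secant method on F(x) = x² - 2: the update H = (F(x₁) - F(x₀)) / (x₁ - x₀)
# plays the role of a quasi-Newton rule U; the root is √2.
F(x) = x^2 - 2

function secant(x0, x1; tol = 1e-12, iters = 12)
    for _ in 1:iters
        abs(F(x1)) < tol && break              # stop once the residual is negligible
        H = (F(x1) - F(x0)) / (x1 - x0)        # quasi-Newton update from samples
        x0, x1 = x1, x1 - F(x1) / H            # Newton-type step with H
    end
    return x1
end

secant(1.0, 2.0)                               # ≈ 1.41421356…
```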

Stochastic Newton

In the setting of optimization $H$ is sampled randomly and we employ a backtracking strategy for sufficient decrease, i.e. \[f(x^k) - f(y) \ge c_k\|y - x^k\|\]

Assume for a given $x$, $\mathbb{P}(H \in \mathcal{H}F(x)) > 1 - \delta$ and let $K$ be the number of iterations until $\gdef\bbE{\mathbb{E}}\|\nabla f(x^K)\| \le \varepsilon$.

Then \[\bbE(K) \le \frac{1}{1 - \delta}\left( \frac{f(x^0) - f(\xbar)}{\rho_{\min}{(d\varepsilon)}^2} + \frac{\log \frac{\rho_{\min}}{\rho^0}}{\log \alpha } \right) \]

In particular, for additive Gaussian noise \[\bbE(K) \le \frac{1}{1 - n^{-1}}\left(\frac{f(x^0) - f(\xbar)}{0.5(1-c){(\omega^{-1}\varepsilon)}^2} + \frac{\log \frac{0.5(1-c)}{\rho^0}}{\log \alpha } \right) \]

Quasi-Metric Spaces

$\gdef\bbM{\mathbb{M}}H:\bbM \times \bbM \to \bbRn$ is called pseudo-linear if $H(x, x) = 0$

$H^{-}:\bbM \times \bbRn \to \bbM$ is a quasi-inverse if $H^{-}(x, 0) = x$

\[\gdef\dist{\operatorname{dist}} \begin{aligned} \exists \|H^{-}\| &\in [0, \infty) \\ &\dist(H^{-}(x, v), H^{-}(y, w)) \le \|H^{-}\|\|v - w - H(x, y)\| \end{aligned} \]

$F:\bbM \to \bbRn$ is Newton differentiable at $\xbar$ if there is $\mathcal{H}F$ with \[ \lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\frac{\|F(x) - F(\xbar) - H(x, \xbar)\|}{\dist(x, \xbar)} = 0 \]

  1. Convergence Theorem
  2. Kantorovich Theorem

Constrained Optimization

Warm Up

\[ \gdef\minim{\operatorname{minimize}} \gdef\st{\operatorname{subject\ to}} \gdef\minimize#1#2#3{{\begin{array}{ll}% \displaystyle{\underset{#1}{\minim}}\quad& #2 \\% \st & #3% \end{array}}} \gdef\minprob#1#2{\begin{array}{ll}% \displaystyle{\minim_{#1}}\quad& #2 \\% \end{array}} \minimize{x \in \bbRn}{f(x)}{Ax = b}\]

Consider $\tilde{f}: \ker A \cong \bbR^{n - m} \to \bbR$, $\tilde{f} = f|_{\{Ax = b\}}$

It remains to relate $\nabla \tilde{f}$ and $\nabla^2 \tilde{f}$ to $f$

\[\gdef\proj{\operatorname{P}} \nabla \tilde{f} = \proj_{\ker A} \nabla f \qquad \nabla^2 \tilde{f} = \proj_{\ker A} \circ \nabla^2 f \circ \proj_{\ker A} \]

\[ \mathcal{N}_{f}x := x - {(\proj_{\ker A} \circ \nabla^2 f(x) \circ \proj_{\ker A})}^{-1} \proj_{\ker A} \nabla f(x) \]
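For a quadratic $f$ this operator reaches the constrained stationary point in one step; a dependency-free Julia sketch with made-up problem data:

```julia
using LinearAlgebra

# minimize ½xᵀQx - cᵀx  subject to  Ax = b, using the projected Newton operator.
Q = Diagonal([2.0, 3.0, 4.0])
c = [1.0, 1.0, 1.0]
A = [1.0 1.0 1.0]
b = [1.0]

gradf(x) = Q * x - c

P  = I - pinv(A) * A                        # orthogonal projection onto ker A
x0 = [1.0, 0.0, 0.0]                        # feasible: A*x0 == b
x1 = x0 - pinv(P * Q * P) * (P * gradf(x0)) # one projected Newton step

# x1 stays feasible (the step lies in ker A) and P*gradf(x1) ≈ 0
```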

Problem setup:
Find $\xbar$ such that $F(\xbar) \perp \ker A$

Let $\mathcal{A} := \{x ~|~ Ax = b\}$

  • $\mathcal{N}_{\mathcal{H}F}:\bbRn \setto \bbRn$, \[ \mathcal{N}_{\mathcal{H}F}x = \left \{ \begin{aligned} \proj_{\mathcal{A}}x + d~|~H \in \mathcal{H}F(\proj_{\mathcal{A}}x),& \proj_{\ker A}H\proj_{\ker A}d = \\ &-\proj_{\ker A}F(\proj_{\mathcal{A}}x) \end{aligned}\right\}\] is the Newton operator
  • ${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F}x^k$ is the Newton-type method
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\proj_{\ker A} F(\xbar) = 0$
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|{(\proj_{\ker A}H\proj_{\ker A})}^{-1}\| \le \Omega$

Then any Newton-type method converges superlinearly for any $x^0$ close enough to $\xbar$

Another Warm Up

\[\minimize{x \in \bbRn}{0}{G(x) = 0}\]

The set $\mathcal{A}(x) = \{y~|~G(x) + \nabla G(x)(y - x) = 0\}$ is not a singleton

We can project \[ x^{k + 1} = \proj_{\mathcal{A}(x^k)}x^k \]

When $\nabla G(x)$ has full rank, \[ x^{k + 1} = x^k - {\nabla G(x^k)}^{+}G(x^k) \]
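On the unit circle $G(x, y) = x^2 + y^2 - 1$ (a concrete instance chosen for illustration), this projection iteration is ordinary Gauss–Newton for feasibility:

```julia
using LinearAlgebra

G(v) = [v[1]^2 + v[2]^2 - 1]     # the unit circle as a zero set
J(v) = [2v[1] 2v[2]]             # derivative of G, a 1×2 matrix

function project_onto_zero_set(v; iters = 10)
    for _ in 1:iters
        v = v - pinv(J(v)) * G(v)   # x⁺ = x - ∇G(x)⁺ G(x)
    end
    return v
end

project_onto_zero_set([2.0, 0.0])   # approaches [1.0, 0.0]
```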

  • $\mathcal{N}_{\mathcal{H}G}:\bbRn \setto \bbRn$, \[ \mathcal{N}_{\mathcal{H}G}x = \left \{ \begin{aligned} \proj_{\mathcal{A}(x, H)}x~|~&H \in \mathcal{H}G(x), \mathcal{A}(x, H) \\ &= \{y~|~G(x) + H(y - x) = 0\} \end{aligned}\right\}\] is the Newton operator
  • ${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}G}x^k$ is the Newton-type method
  • $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
  • $\gdef\bbRm{\mathbb{R}^m}G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
  • $\mathcal{H}G$ is linearly geometrically compatible

Then the Newton-type method approaches $\mathcal{Z}$ superlinearly for any $x^0$ close enough to $\mathcal{Z}$

\[ \minimize{x \in \bbRn}{f(x)}{G(x) = 0} \]

First order optimality:

\[ \proj_{\ker \nabla G(\xbar)}\nabla f(\xbar) = 0 \]

Problem:

Given $F$, $\mathcal{H}F$ and $G$, $\mathcal{H}G$, find $\xbar$ with \[ \sup_{H_G \in \mathcal{H}G(\xbar)}\|\proj_{\ker H_G}F(\xbar)\| = 0 \]

The algorithm for linear equality constraints

\[ \mathcal{N}_{\mathcal{H}F}x = \{\proj_{\mathcal{A}}x + d~|~H_F \in \mathcal{H}F(\proj_{\mathcal{A}}x), \proj_{\ker A}H_F\proj_{\ker A}d = -\proj_{\ker A}F(\proj_{\mathcal{A}}x)\} \]
  • $\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}:\bbRn \setto \bbRn$, \[\begin{aligned} \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x = \{\proj_{\mathcal{A}(x, H_G)}x + d~|~H_G \in \mathcal{H}G(x),H_F \in \mathcal{H}F(\proj_{\mathcal{A}(x, H_G)}x), \\ \proj_{\ker H_G}H_F\proj_{\ker H_G}d = -\proj_{\ker H_G}F(\proj_{\mathcal{A}(x, H_G)}x)\} \end{aligned}\] is the Newton operator
  • ${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x^k$ is the Newton-type method

using LinearAlgebra
using Krylov

function step(x, F, H, G, J)
    Jₓ = J(x)                    # Newton derivative of the constraint map G at x
    J⁺ = pinv(Jₓ)                # Moore–Penrose pseudo-inverse
    xᴾ = x - J⁺ * G(x)           # Gauss–Newton projection onto the linearized constraint set
    P = I - J⁺ * Jₓ              # orthogonal projection onto ker Jₓ

    b = P * F(xᴾ)                # projected residual
    A = P * H(xᴾ) * P            # projected Newton derivative of F

    d, _ = Krylov.minres(A, -b)  # tangential Newton step: solve A d = -b

    return xᴾ + d
end
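A quick sanity check of `step` on a circle-constrained problem (with `Krylov.minres` swapped for a dense `pinv` solve so the sketch needs only the standard library; the problem data is made up): the returned point satisfies the linearized constraint $G(x) + \nabla G(x)(x^{+} - x) = 0$.

```julia
using LinearAlgebra

# Dense variant of `step`: pinv in place of Krylov.minres.
function step_dense(x, F, H, G, J)
    Jₓ = J(x)
    J⁺ = pinv(Jₓ)
    xᴾ = x - J⁺ * G(x)          # project onto the linearized constraint set
    P = I - J⁺ * Jₓ             # projection onto ker Jₓ
    d = pinv(P * H(xᴾ) * P) * (-P * F(xᴾ))   # minimum-norm tangential Newton step
    return xᴾ + d
end

# minimize (x - 2)² + (y - 2)²  subject to  x² + y² - 1 = 0
F(v) = [2(v[1] - 2), 2(v[2] - 2)]   # ∇f
H(v) = [2.0 0.0; 0.0 2.0]           # ∇²f
G(v) = [v[1]^2 + v[2]^2 - 1]
J(v) = [2v[1] 2v[2]]

x  = [1.0, 0.5]
x1 = step_dense(x, F, H, G, J)
# G(x) + J(x) * (x1 - x) ≈ [0.0]
```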
  
  • $G(\xbar) = 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\ker \Hbar} F(\xbar) = 0$
  • $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
  • $G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
  • $\mathcal{H}G$ is linearly geometrically compatible
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)} \|{(\proj_{\ker H_G}H_F\proj_{\ker H_G})}^{-1}\| \le \Omega$
  • $\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\ker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$

Then any Newton-type method converges superlinearly for any $x^0$ close
enough to $\bar{x}$

\[\minimize{x,y \in \bbR}{{(x - 2)}^2 + {(y - 2)}^2}{x^2 + y^2 - 1 = 0}\]

\[\minimize{x,y \in \bbR}{{(y - x^2)}^2 + {(1 - x)}^2}{x^2 + y^2 - 1 = 0} \]

\[\gdef\sker{\operatorname{sker}} \minimize{x \in \bbRn}{f(x)}{G(x) \preceq 0} \]

Mutatis mutandis:
\[\sker A = \{x~|~ Ax \preceq 0\}, \quad \mathcal{B}(x, H) := \{y~|~G(x) + H(y - x) \preceq 0\}\]
First order optimality:

\[ \proj_{\sker \nabla G(\xbar)}\nabla f(\xbar) = 0 \]

Solution:

$\xbar$ such that $G(\xbar) = 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\sker \Hbar} F(\xbar) = 0$

  • $\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}:\bbRn \setto \bbRn$, \[\begin{aligned} \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x = \{\proj_{\mathcal{B}(x, H_G)}x + d~|~H_G \in \mathcal{H}G(x),H_F \in \mathcal{H}F(\proj_{\mathcal{B}(x, H_G)}x), \\ \proj_{\sker H_G}H_F\proj_{\sker H_G}d = -\proj_{\sker H_G}F(\proj_{\mathcal{B}(x, H_G)}x)\} \end{aligned}\] is the Newton operator
  • ${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x^k$ is the Newton-type method
  • $G(\xbar) \preceq 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\sker \Hbar} F(\xbar) = 0$
  • $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
  • $G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
  • $\mathcal{H}G$ is linearly geometrically compatible
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)} \|{(\proj_{\sker H_G}H_F\proj_{\sker H_G})}^{-}\| \le \Omega$
  • $\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\sker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$
  • ${\lim_{x \to \xbar} \frac{\| \proj_{\sker H_G}F(\proj_{\mathcal{B}(x, H_G)}x) - \proj_{\sker H_G}F(\xbar) - (\proj_{\sker H_G} H \proj_{\sker H_G})(x - \xbar) \|}{\|x - \xbar\|} = 0}$

Then any Newton-type method converges superlinearly for any $x^0$ close
enough to $\bar{x}$

\[\minimize{x,y \in \bbR}{{(x - 1)}^2 + {y}^2}{(x-1)^2 + (y-1)^2 - 1 \le 0} \]

\[\minimize{x,y \in \bbR}{{(y - x^2)}^2 + {(1 - x)}^2}{(x-1)^2 + (y-1)^2 - 1 \le 0} \]

Regularity: Newton Differentiability

$+$

Operator Splitting

$\Rightarrow$

Constrained Optimization

  • $G(\xbar) = 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\ker \Hbar} F(\xbar) = 0$
  • $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
  • $G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
  • $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
  • $\mathcal{H}G$ is linearly geometrically compatible
  • $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
  • $\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)} \|{(\proj_{\ker H_G}H_F\proj_{\ker H_G})}^{-1}\| \le \Omega$
  • $\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\ker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$

Then any Newton-type method converges superlinearly for any $x^0$ close
enough to $\bar{x}$