Newton differentiability and non-smooth Newton-type methods
Titus Pința
-
\(\gdef\bbRn{\mathbb{R}^n}\mathcal{N}_{\mathcal{H}F}:\bbRn \rightrightarrows \bbRn\),
\[
\mathcal{N}_{\mathcal{H}F} x = \{x + d~|~H \in \mathcal{H}F(x), Hd = -F(x)\},
\]
is the Newton operator
-
\(\gdef\bbN{\mathbb{N}}{\{x^k\}}_{k \in \bbN}\), \(x^{k+1} \in \mathcal{N}_{\mathcal{H}F}x^k\) is the Newton-type method
- $\gdef\xbar{\bar{x}}F(\xbar) = 0$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\xbar$
- $\gdef\xbar{\bar{x}}F(\xbar) = 0$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\xbar$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
$F:\bbRn \to \bbRn$ is $\mathcal{C}^1$ on a convex set $\Rightarrow$ $F$ Newton differentiable with $\mathcal{H}F = \nabla F$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
- $\forall {\{t_k\}}_{k \in \bbN} > 0$, ${\{d^k\}}_{k \in \bbN}$ convergent
- ${\{H^k\}}_{k \in \bbN} \in \partial^c F(\xbar + t_k d^k)$, $\lim_{k \to \infty} H^k(\lim_{n \to \infty}d_n)$ exists
-
\[\lim_{x \to \xbar}\frac{\|F'(x; x - \xbar) - F'(\xbar; x - \xbar)\|}{\|x - \xbar\|} = 0\]
$\Rightarrow$ $F$ Newton differentiable with $\mathcal{H}F = \partial^C F$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
\(\gdef\graph{\operatorname{Graph}}\)$\graph F \in \mathcal{O}^{2n}$,
- $\forall n$, $\mathcal{O}^n \subseteq \bbRn$ is a boolean algebra
- $S \in \mathcal{O}^n$ implies \(\gdef\bbR{\mathbb{R}}\)$S \times \bbR \in \mathcal{O}^{n+1}$ and
$\{(x_1, \dots, x_{n-1}~|~\exists y,~(x_1, \dots, x_{n-1}, y) \in S)\} \in \mathcal{O}^{n-1}$
- $\mathcal{O}^1$ is the set of semi-algebraic sets and $\{(x_1, x_2)~|~x_1, \le x_2\} \in \mathcal{O}^2$
$\Rightarrow$ $F$ Newton differentiable with $\mathcal{H}F = \partial^C F$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
\[
F(x, y) =
\begin{bmatrix}
4(x^2 - y) + 2(x - 1) \\ -2(x^2 - y)
\end{bmatrix},
\]
\[
\mathcal{H}F(x, y) = \left\{2
\begin{bmatrix}
1 + 4 x^2 & -2 x \\ -2 x & 1
\end{bmatrix}\right\}
\]
$\Rightarrow$ $F$ Newton differentiable with $\mathcal{H}F$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
\[
\gdef\Id{\operatorname{Id}}
F(x) = \left \{
\begin{array}{rl}
x & \quad x \in \mathbb{Q}, \\
-x & \quad x \in \bbR \setminus \mathbb{Q}
\end{array}
\right.
\]
\[
\mathcal{H}F(x) = \left \{
\begin{array}{rl}
1 & \quad x \in \mathbb{Q}, \\
-1 & \quad x \in \bbR \setminus \mathbb{Q}
\end{array}
\right.
\]
$\Rightarrow$ $F$ Newton differentiable with $\mathcal{H}F$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
$F:\bbRn \to \bbRn$ is $L$-Lipschitz on a convex set
$\Rightarrow$ $F$ weakly Newton differentiable with $\mathcal{H}F = L\Id$
$\Rightarrow$ $F$ weakly Newton differentiable with $\mathcal{H}F = L\Id$
$F$ is weakly Newton differentiable at $\xbar$:
$\exists\gdef\setto{\rightrightarrows}\gdef\bbRnxn{\mathbb{R}^{n \times n}}\mathcal{H}:\bbRn \setto \bbRnxn$
\[ \lim_{x \to \xbar}\sup_{H \in \mathcal{H}(x)}\frac{\|F(x) - F(\xbar) - H(x - \xbar)\|}{\|x - \xbar\|} < \infty \]
$\Rightarrow$ $F$ weakly Newton differentiable with $\mathcal{H}F = L\Id$
- $F(\xbar) = 0$
- $F:\bbRn \to \bbRn$ is weakly Newton differentiable at $\xbar$ with $\mathcal{H}F$
- $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$
- $\Omega\lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\|F(x) - F(\xbar) - H(x - \xbar)\| \le \|x - \xbar\|$
Then Newton-type method converges
linearly
for any $x^0$ close enough to $\xbar$
- $\Omega\lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\|F(x) - F(\xbar) - H(x - \xbar)\| \le \|x - \xbar\|$
- $\lambda^{-1}\lim_{x \to \xbar}\|\nabla f(x) - \nabla f(\xbar) - \lambda (x - \xbar)\| \le \|x - \xbar\|$
Holds if
- $\nabla f$ is $L$-Lipschitz
- $\nabla f$ is strongly-monotone
\[ \exists \alpha > 0, \forall x, y \in U, \quad \langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \alpha \|x - y\|^2 \]
- $2\lambda\alpha \ge L^2$
- $\lambda^{-1}\lim_{x \to \xbar}\|\nabla f(x) - \nabla f(\xbar) - \lambda (x - \xbar)\| \le \|x - \xbar\|$
Holds if
- $F$ is $L$-Lipschitz
- $F$ is hypo-monotone
\[ \exists \alpha > 0, \forall x, y \in U, \quad \langle F(x) - F(y), x - y \rangle \ge -\alpha \|x - y\|^2 \]
- $F$ is metric subregular
\[\exists \kappa > 0, \forall x \in U,\quad \|x - \xbar\| \le \kappa\inf \{\|H^{-1}F(x)\|~|~H \in \mathcal{H}(x)\}\]
- $\lambda^2 + 2M\lambda + \lambda^{-2}L^2 - \kappa\lambda(1-\lambda) < \lambda$
- $\lambda^{-1}\lim_{x \to \xbar}\|\nabla f(x) - \nabla f(\xbar) - \lambda (x - \xbar)\| \le \|x - \xbar\|$
Holds if
- $f$ has the KŁ property
i.e.
$\exists\phi:[0, \infty) \to [0, \infty)$ in $\mathcal{C}^{1}$ with $\phi(0) = 0$ and $\phi$ increasing
\[\phi'(f(x) - f(\xbar))\|\nabla f(x)\| \ge 1 \]
'
- ${\{x^k\}}_{k \in \bbN}$ has sufficient decrease
i.e. \[\exists \rho > 0, \forall k \in \bbN,\quad \rho\|x^{k+1} - x^k\|^2 \le f(x^{k}) - f(x^{k+1}) \]
- ${\{x^k\}}_{k \in \bbN}$ has a gradient lower bound
i.e. \[\exists \rho > 0, \forall k \in \bbN,\quad \|\nabla f(x^k)\| \ge \rho\|x^{k + 1} - x^k\| \]
- $F(\xbar) = 0$
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{-1}\| \le \Omega$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\xbar$
-
If $F$ is
Fréchet
differentiable on a set, with $\nabla F$
Lipschitz
continuous,
\[\|\nabla F(x) - \nabla F(y)\| \le L\|x - y\|\vphantom{\|^{\gamma-1}} \]
-
$B := \|{\nabla F(x^0)}^{-1}\|$ and $\eta := \|{\nabla F(x^0)}^{-1}F(x^0)\|$,
\[LB\eta \le \frac{1}{2} \]
then $\exists \xbar$ such that $F(\xbar) = 0$
-
If $F$ is
Newton
differentiable on a set, with $\gdef\tbar{\bar{t}}\mathcal{H}F$
Hölder
continuous,
\[\|\mathcal{H}F(x) - \mathcal{H}F(y)\| \le L\|x - y\|^{\gamma-1}\]
-
$B := \|{\mathcal{H}F(x^0)}^{-1}\|$ and $\eta := \|{\mathcal{H}F(x^0)}^{-1}F(x^0)\|$,
\[\begin{aligned}
&\frac{LB}{2}{\left(\frac{f(t)}{f'(t)}\right)}^\gamma \le
f\left(t - \frac{f(t)}{f'(t)}\right),
& f(0) = \eta,\quad f(t) > 0,\quad f(\tbar) = 0, \\
&f'(t) < 0,\quad f'(t) \ge {(1 + L t)}^{-1},
&f''(t) > 0 \end{aligned} \]'
then $\exists \xbar$ such that
$F(\xbar) = 0$
Modified Newton Methods
$F$ is Newton differentiable with $\mathcal{H}F$:
- is Newton differentiable with $\mathcal{G}F(x) = \mathcal{H}F(x) + B_{\varepsilon(x)}[0]$
- is Newton differentiable with $\mathcal{G}F(x)=\mathcal{H} F(\phi(x))$ for $\lim_{x \to \xbar}\phi(x) \to \xbar$
- is Newton differentiable with any $\mathcal{G}F$ with
\[\gdef\Hbar{\bar{H}}
\lim_{x \to \xbar} \sup_{H \in \mathcal{H}F(x), G \in \mathcal{G}F(x)}
\frac{\|H(x - \xbar) - G(x - \xbar)\|}{\|x - \xbar\|} = 0
\]
- is Newton differentiable for $H$ produced by a quasi-Newton update rule
$\mathcal{U}:\bbRn \times \bbRnxn \to \bbRnxn$ with
\[\lim_{x, H \to \xbar, \Hbar} \frac{\|\mathcal{U}(x, H) - \Hbar\|_F}{\|H - \Hbar\|_F} = 0 \]
Newton Differentiability is Necessary
- $F:\bbRn \to \bbRn$, $\mathcal{H}:\bbRn \setto \mathbb{R}^{n \times n}$
- $\forall x$, $\forall H \in \mathcal{H}(x)$, $\|H^{-1}\| \le \Omega$
- Any sequence, $x^0$ near $\xbar$
\[x^{k+1} = x^k - H^{-1}F(x^k),\quad H \in \mathcal{H}(x^k)\]
converges to $\xbar$
- linearly $\Rightarrow$ $F$ is weakly Newton differentiable at $\xbar$
- superlinearly $\Rightarrow$ $F$ is Newton differentiable at $\xbar$
-
$\mathcal{N}_{\mathcal{H}G}:\bbRn \setto \bbRn$,
\[\mathcal{N}_{\mathcal{H}G}x = \left \{
\begin{aligned}
\proj_{\mathcal{A}(x, H)}x~|~&H \in \mathcal{H}G(x), \mathcal{A}(x, H) \\
&= \{y~|~G(x) + H(y - x) = 0\}
\end{aligned}\right\}\]
is the Newton operator
-
${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}G}x^k$ is the
Newton-type method
$F$ is uniformly weakly Newton differentiable on $V$:
$\exists\gdef\setto{\rightrightarrows}\gdef\bbRnxn{\mathbb{R}^{n \times n}}\mathcal{H}:\bbRn \setto \bbRnxn$,
$\exists M$,
\[\forall \epsilon, \exists \delta, \forall \xbar \in V,\quad\|\xbar - x\| \le \delta\Rightarrow
\sup_{H \in \mathcal{H}(x)}\frac{\|F(x) - F(\xbar) - H(x - \xbar)\|}{\|x - \xbar\|} < M + \epsilon \]
$F$ is uniformly Newton differentiable on $V$:
$\exists\gdef\setto{\rightrightarrows}\gdef\bbRnxn{\mathbb{R}^{n \times n}}\mathcal{H}:\bbRn \setto \bbRnxn$,
\[\forall \epsilon, \exists \delta, \forall \xbar \in V,\quad\|\xbar - x\| \le \delta\Rightarrow
\sup_{H \in \mathcal{H}(x)}\frac{\|F(x) - F(\xbar) - H(x - \xbar)\|}{\|x - \xbar\|} < \epsilon \]
- $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$\gdef\bbRm{\mathbb{R}^m}G:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}G$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\mathcal{H}G$ is linearly geometrically compatible
Then the Newton-type method
approaches $\mathcal{Z}$ superlinearly
for any $x^0$ close
enough to $\mathcal{Z}$
-
$\mathcal{H}G$ is linearly geometrically compatible
If $\mathcal{H}G$ is single valued and uniformly continuous
$\Rightarrow\mathcal{H}G$ is linearly geometrically compatible
- $\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$G:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}G$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\mathcal{H}G$ is linearly geometrically compatible
Then the Newton-type method
approaches $\mathcal{Z}$ superlinearly
for any $x^0$ close
enough to $\mathcal{Z}$
-
$G:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}G$
$F:\bbRn \to \bbRn$ is $\mathcal{C}^1$ on a convex set and $\nabla F$ uniformly continuous on $V$ $\Rightarrow$ $F$ uniformly Newton differentiable on $V$ with $\mathcal{H}F = \nabla F$
-
$G:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}G$
- $\forall {\{t_k\}}_{k \in \bbN} > 0$, ${\{d^k\}}_{k \in \bbN}$ convergent
- ${\{H^k\}}_{k \in \bbN} \in \partial^c F(\xbar + t_k d^k)$, $\lim_{k \to \infty} H^k(\lim_{n \to \infty}d_n)$ exists
-
\[\forall \epsilon, \exists \delta, \forall \xbar \in V,\quad\|\xbar - x\| \le \delta\Rightarrow
\frac{\|F'(x; x - \xbar) - F'(\xbar; x - \xbar)\|}{\|x - \xbar\|} \le \epsilon \]
$\Rightarrow$ $F$ uniformly Newton differentiable on $V$ with $\mathcal{H}F = \partial^C F$
Multi-Objective Optimization
Setup
Given $F:\bbRn \to \bbRm$, find $\xbar$ such that
\[\xbar \in \argmin_{x \in \bbRm} F(x)\]
For us, first order optimality $\exists \bar{\sigma} \in \Delta^m$
\[\bar{\sigma}^T \nabla F(\xbar) = 0\]
Introduce the function
$G:\bbRn \times \bbRm \to \bbRm$ \[G(x, \sigma) = \sigma^T \nabla F(x)\]
Problem reformulation
\[\{(x, \sigma)~|~G(x, \sigma) = 0\}\cap \bbRn \times \Delta^m\]
Relating to under-determined systems
$\nabla F$ is uniformly Newton differentiable on $V$ with Newton differential $\mathcal{H}F$
$\Rightarrow$ $G$ is uniformly Newton differentiable on $V$ with Newton differential $\mathcal{H}G(x, \sigma) = \sigma^T\mathcal{H}F(x) \times \{\Id\}$
-
$\mathcal{N}_{\mathcal{H}F}:\bbRn \times \bbRm \setto \bbRn \times \bbRm$,
\[\mathcal{N}_{\mathcal{H}F}(x, \sigma) = \left \{
\begin{aligned}
\proj_{\bbRn \times \Delta^m}&\proj_{\mathcal{A}(x, \sigma^T H)}(x, \sigma)~|~H \in \mathcal{H}F(x), \mathcal{A}(x, \sigma, H) \\
&= \{y~|~\sigma^T \nabla F(x) + \sigma^T H(y - x) = 0\}
\end{aligned}\right\}\]
is the Newton operator
-
${\{(x^k, \sigma^k)\}}_{k \in \bbN}$, $(x^{k+1}, \sigma^{k+1}) \in \mathcal{N}_{\mathcal{H}F}(x^k, \sigma^k)$ is the
Newton-type method
- $\mathcal{Z} := \{(x, \sigma)~|~\sigma^T \nabla F(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$\nabla F:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}F$
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\sigma^T\mathcal{H}F \times I$ is linearly geometrically compatible
Then the Newton-type method
approaches $\mathcal{Z}$ superlinearly
for any $x^0$ close
enough to $\mathcal{Z}$
\[F(x) = [x_1^2 - x_2, x_2], \quad \mathcal{Z} = \{0\} \times \bbR \times \{.5\} \times \{.5\}\]
Setup
Given $f:\bbRn \to \bbR$, find $\xbar \in \argmin_{x} f(x)$
$f$ is Newton differentiable at $\xbar$ with Newton differential $\mathcal{H}f$
$\mathcal{H}f$ is unknown
Only a stochastic approximation with distribution $\mathbb{H}(x)$ is known
\[\forall x\quad\mathbb{P}(\mathbb{H}(x)^{-1} \in \mathcal{H}f(x)) \ge 1 - \delta\]
Stochastic Newton for Optimization
- $f:U \subseteq \bbRn \to \mathbb{R}$ Lipschitz continuous
- $\partial_1 f := \proj_{\partial^C f}0$ uniformly weakly Newton differentiable on $U$ with Newton differential $\mathcal{H}f$
- $\forall x \in U$,$\forall H \in \mathcal{H}f(x)$, $H$ is positive-definite and $\|H^{-1}\| \le \Omega$
- $\forall x \in U$,$\forall H \in \mathcal{H}f(x)$, $\|H\| \le \omega$
Using the assumptions $\Rightarrow$ sufficient decrease
i.e.
\[\exists \rho, \forall x, \forall x^{+} \in \{x - H^{-1}\partial_1 f(x)~|~H \in \mathcal{H}f(x)\},\quad f(x) - f(x^+) \ge \rho \|x^{+} - x\|^2\]
Using the assumptions $\Rightarrow$ sub-differential lower bound
i.e.
\[\exists \tau, \forall x, \forall x^{+} \in \{x - H^{-1}\partial_1 f(x)~|~H \in \mathcal{H}f(x)\},\quad \|\partial_1 f(x)\| \le \tau\|x^+ - x\|\]
#input data: f, ∂₁f, x⁰, c₀ ∈ [0, ∞), α ∈ (0, 1), ε ∈ (0, 1)
#input function: sample
using LinearAlgebra
k = 0
x = x⁰
c = c₀
while norm(∂₁f(x)) >= ε
B = sample(x) # B∼H⁻¹, H ∈ ℋf(x)
y = x - B∂₁f(x)
if f(x)-f(y) >= c*norm(y-x) && norm(∂₁f(x)) <= c*norm(y-x)
x = y
c = c
else
x = x
c = α*c
end
return x
end
Using assumption, given $\varepsilon > 0$ and $\exists \delta$
\[\forall x\quad\mathbb{P}(\mathbb{H}(x)^{-1} \in \mathcal{H}f(x)) \ge 1 - \delta\]
$K$ is the random number of iterations until the algorithm terminates
$\Rightarrow$ $\exists \rho$, $\exists \tau$
\[\gdef\bbE{\mathbb{E}}\bbE(K) \le \frac{1}{1 - \delta}\left(
\frac{f(x^0) - f(\xbar)}{\rho{(\tau\varepsilon)}^2}
+ \frac{\log \frac{\rho}{c^0}}{\log \alpha }
\right)
\]
$\forall \gamma \in (0, 1)$
\[\gdef\bbP{\mathbb{P}} \bbP\left(K \ge \frac{2 + \gamma(1 - \gamma^2)}{1 - \gamma} \bbE(K)\right)
\le e^{-\gamma^2\bbE(K)}
\]
Using assumption, given $\varepsilon > 0$ and $\exists \delta$
\[\forall x\quad\mathbb{P}(\mathbb{H}(x)^{-1} \in \mathcal{H}f(x)) \ge 1 - \delta\]
The algorithm run for $K$ steps
$\Rightarrow$ $\exists \rho$, $\exists \tau$
\[\bbP\left(\min_{n \le K}
\|\partial_1 f(x^n)\| \ge \sqrt{\frac{f(x^0) - f(\xbar)}{\rho\tau^2
\left((1 - \gamma)K - \frac{\log \frac{\rho}{c_0}}{\log \alpha } \right)}} \right)
\le e^{-K(1 - \delta)\frac{\gamma^2}{2}}
\]
Example: Gaussian Noise
Using assumption, $\Sigma \in \mathbb{R}^{n^2 \times n^2}$ a covariance matrix
$\mathcal{G}f(x) = \mathcal{H}f(x) + N$, where $N \sim \mathcal{N}(0, \Sigma)$
\[\gdef\tr{\operatorname{Tr}}c + n^{3/2}\sqrt{\tr \Sigma} < \Omega^{-1}\]
\[\bbE(K) \le \frac{1}{1 - n^{-1}}\left(\frac{f(x^0) - f(\xbar)}{0.5(1-c){(\omega^{-1}\varepsilon)}^2} + \frac{\log \frac{0.5(1-c)}{\rho^0}}{\log \alpha } \right)\]
$M=10$ inner loops
$M=100$ inner loops
Gaussian
Uniform Permutation
Quasi-Metric Spaces
$\gdef\bbM{\mathbb{M}}H:\bbM \times \bbM \to \bbRn$ is called
pseudo-linear if $H(x, x) = 0$
$H^{-}:\bbM \times \bbRn \to \bbM$ is a quasi-inverse if $H^{-}(x, 0) = 0$
\[\gdef\dist{\operatorname{dist}}
\begin{aligned}
\exists \|H^{-}\| &\in [0, \infty) \\ &\dist(H^{-}(x, v), H^{-}(y, w))
\le \|H^{-}\|\|v - w - H(x, y)\|
\end{aligned}
\]
$F:\bbM \to \bbRn$ is Newton differentiable at $\xbar$ if there is
$\mathcal{H}F$ with
\[
\lim_{x \to \xbar}\sup_{H \in \mathcal{H}F(x)}\frac{\|F(x) - F(\xbar) - H(x, \xbar)\|}{\dist(x, \xbar)} = 0
\]
- Convergence Theorem
- Kantorovich Theorem
Warm Up
\[\gdef\minim{\operatorname{minimize}}
\gdef\st{\operatorname{subject\ to}}
\gdef\minimize#1#2#3{{\begin{array}{ll}%
\displaystyle{\underset{#1}{\minim}}\quad& #2 \\%
\st & #3%
\end{array}}}
\gdef\minprob#1#2{\begin{array}{ll}%
\displaystyle{\minim_{#1}}\quad& #2 \\%
\end{array}}
\minimize{x \in \bbRn}{f(x)}{Ax = b}\]
Consider $\tilde{f}:\bbR^{n - m} := \ker A \to \bbR$, $\tilde{f} = f|_{Ax = b}$
Remains to relate $\nabla \tilde{f}$ and $\nabla^2 \tilde{f}$ to $f$
\[\gdef\proj{\operatorname{P}}
\nabla \tilde{f} = \proj_{\ker A} \nabla f
\qquad
\nabla^2 \tilde{f} = \proj_{\ker A} \circ \nabla^2 f \circ \proj_{\ker A}
\]
\[
\mathcal{N}_{f}x := x - {(\proj_{\ker A} \circ \nabla^2 f(x) \circ \proj_{\ker A})}^{-1} \proj_{\ker A} \nabla f(x)
\]
Problem setup:
Find $\xbar$ such that $F(\xbar) \perp \ker A$
Let $\mathcal{A} := \{x ~|~ Ax = b\}$
-
$\mathcal{N}_{\mathcal{H}F}:\bbRn \setto \bbRn$,
\[
\mathcal{N}_{\mathcal{H}F}x = \left \{
\begin{aligned}
\proj_{\mathcal{A}}x + d~|~H \in \mathcal{H}F(\proj_{\mathcal{A}}x),&
\proj_{\ker A}H\proj_{\ker A}d = \\
&-\proj_{\ker A}F(\proj_{\mathcal{A}}x)
\end{aligned}\right\}\]
is the Newton operator
-
${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F}x^k$ is the
Newton-type method
Problem setup:
Find $\xbar$ such that $F(\xbar) \perp \ker A$
Let $\mathcal{A} := \{x ~|~ Ax = b\}$
- $F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
- $\proj_{\ker A} F(\xbar) = 0$
- $\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}F(x)}\|{\proj_{\ker A}H\proj_{\ker A}}^{-1}\| \le \Omega$
Then any Newton-type method converges superlinearly for any $x^0$ close
enough to $\xbar$
\[
\minimize{x \in \bbRn}{f(x)}{G(x) = 0}
\]
First order optimality:
\[
\proj_{\ker \nabla G(\xbar)}\nabla f(\xbar) = 0
\]
Problem:
Given $F$, $\mathcal{H}F$ and $G$, $\mathcal{H}G$, find $\xbar$ with
\[
{\forall H_G \in \mathcal{H}G(\xbar)}\quad\proj_{\ker H_G}F(\xbar) = 0
\]
The algorithm for linear equality constraints
\[
\mathcal{N}_{\mathcal{H}F}x = \{\proj_{\mathcal{A}}x + d~|~H_F \in \mathcal{H}F(\proj_{\mathcal{A}}x),
\proj_{\ker A}H_F\proj_{\ker A}d = -\proj_{\ker A}F(\proj_{\mathcal{A}}x)\}
\]
-
$\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}:\bbRn \setto \bbRn$,
\[\begin{aligned}
\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x = \{\proj_{\mathcal{A}(x, H_G)}x + d~|~H_G \in \mathcal{H}G(x),H_F \in \mathcal{H}F(\proj_{\mathcal{A}(x, H_G)}x),
\\
\proj_{\ker H_G}H_F\proj_{\ker H_G}d = -\proj_{\ker H_G}F(\proj_{\mathcal{A}(x, H_G)}x)\}
\end{aligned}\]
is the
Newton operator
-
${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x^k$ is the
Newton-type method
using LinearAlgebra
using Krylov
function step(x, F, H, G, J)
Jₓ = J(x)
J⁺ = pinv(Jₓ)
xᴾ = x - J⁺ * G(x)
P = I - J⁺ * Jₓ
b = P * F(xᴾ)
A = P * H(xᴾ) * P
d = Krylov.minres(A, -b)
return xᴾ + d
end
-
$G(\xbar) = 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\ker \Hbar} F(\xbar) = 0$
-
$\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$G:\bbRn \to \bbRm$ is uniformly Newton differentiable on $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\mathcal{H}G$ is linearly geometrically compatible
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)}
\|{(\proj_{\ker H_G}H_F\proj_{\ker H_G})}^{-1}\| \le \Omega$
-
$\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\ker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\bar{x}$
\[\minimize{x,y \in \bbR}{{(x - 2)}^2 + {(y - 2)}^2}{x^2 + y^2 - 1 = 0}\]
\[\minimize{x,y \in \bbR}{{(y - x^2)}^2 + {(1 - x)}^2}{x^2 + y^2 - 1 = 0} \]
\[\gdef\sker{\operatorname{sker}}
\minimize{x \in \bbRn}{f(x)}{G(x) \preceq 0}
\]
Mutatis mutandis:
\[\sker A = \{x~|~ Ax \preceq 0\}, \quad
\mathcal{B}(x, H) := \{y~|~G(x) + H(y - x) \preceq 0\}\]
First order optimality:
\[
\proj_{\sker \nabla G(\xbar)}\nabla f(\xbar) = 0
\]
Equation formulation:
$\xbar$ such that $G(\xbar) = 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\sker \Hbar} F(\xbar) = 0$
-
$\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}:\bbRn \setto \bbRn$,
\[\begin{aligned}
\mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x = \{\proj_{\mathcal{B}(x, H_G)}x + d~|~H_G \in \mathcal{H}G(x),H_F \in \mathcal{H}F(\proj_{\mathcal{B}(x, H_G)}x), \\
\proj_{\sker H_G}H_F\proj_{\sker H_G}d = -\proj_{\sker H_G}F(\proj_{\mathcal{B}(x, H_G)}x)\}
\end{aligned}\]
is the Newton operator
-
${\{x^k\}}_{k \in \bbN}$, $x^{k+1} \in \mathcal{N}_{\mathcal{H}F, \mathcal{H}G}x^k$ is the Newton-type method
-
$G(\xbar) \preceq 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\sker \Hbar} F(\xbar) = 0$
-
$\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\mathcal{H}G$ is linearly geometrically compatible
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)}
\|{(\proj_{\sker H_G}H_F\proj_{\sker H_G})}^{-1}\| \le \Omega$
-
$\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\sker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$
- ${\lim_{x \to \xbar}
\frac{\| \proj_{\sker H_G}F(\proj_{\mathcal{B}(x, H_G)}x) - \proj_{\sker H_G}F(\xbar)
- (\proj_{\sker H_G} H \proj_{\sker H_G})(x, \xbar) \|}{\|x - \xbar\|} = 0}$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\bar{x}$
-
$G(\xbar) \preceq 0 $ and $\forall \Hbar \in \mathcal{H}G(\xbar), \proj_{\sker \Hbar} F(\xbar) = 0$
-
$\mathcal{Z} := \{x~|~G(x) = 0\} \ne \emptyset$, $\proj_{\mathcal{Z}}$ is Lipschitz
-
$G:\bbRn \to \bbRm$ is Newton differentiable near $\mathcal{Z}$ with $\mathcal{H}G$ and Lipschitz
-
$\sup_{x \in U \in \mathcal{V}(\xbar)} \sup_{H \in \mathcal{H}G(x)}\|H^{+}\| \le \Omega$ and $H$ full rank
-
$\mathcal{H}G$ is linearly geometrically compatible
-
$F:\bbRn \to \bbRn$ is Newton differentiable at $\xbar$ with $\mathcal{H}F$
-
$\sup_{x \in U} \sup_{H_F \in \mathcal{H}F(x), H_G \in \mathcal{H}G(x)}
\|{(\proj_{\sker H_G}H_F\proj_{\sker H_G})}^{-}\| \le \Omega$
-
$\displaystyle{\lim_{x \to \xbar}\sup_{H_G \in \mathcal{H}G(x)}\frac{\|\proj_{\sker H_G}F(\xbar)\|}{\|x - \xbar\|}} = 0$
- ${\lim_{x \to \xbar}
\frac{\| \proj_{\sker H_G}F(\proj_{\mathcal{B}(x, H_G)}x) - \proj_{\sker H_G}F(\xbar)
- (\proj_{\sker H_G} H \proj_{\sker H_G})(x, \xbar) \|}{\|x - \xbar\|} = 0}$
Then any Newton-type method converges
superlinearly
for any $x^0$ close enough to $\bar{x}$
\[\minimize{x,y \in \bbR}{{(x - 1)}^2 + {y}^2}{(x-1)^2 + (y-1)^2 - 1 \le 0} \]
\[\minimize{x,y \in \bbR}{{(y - x^2)}^2 + {(1 - x)}^2}{(x-1)^2 + (y-1)^2 - 1 \le 0} \]