A Spectral Projected Gradient-Newton Two Phase Method for Constrained Nonlinear Equations

Journal of Applied Mathematics and Physics
Vol. 7, No. 1 (2019), Article ID: 89915, 7 pages
10.4236/jamp.2019.71009

Yuezhe Zhang

College of Science, University of Shanghai for Science and Technology, Shanghai, China

Received: December 29, 2018; Accepted: January 13, 2019; Published: January 16, 2019

ABSTRACT

In this paper, we propose a spectral projected gradient-Newton two-phase method for constrained semismooth equations. In the first phase, a spectral projected gradient method is used to obtain global convergence of the algorithm; the final point of the first phase then serves as the initial point for a projected semismooth asymptotically Newton method, which yields fast local convergence.

Keywords:

Constrained Semismooth Equations, Spectral Projected Gradient Method, Newton Method, Two-Phase

1. Introduction

In this paper, we consider the constrained nonlinear semismooth equations problem: finding a vector ${x}_{\ast }\in \Omega$ such that

$\begin{array}{l}H\left(x\right)=0,\\ x\in \Omega :=\left\{x\in {ℝ}^{n}|l\le x\le u\right\},\end{array}$ (1)

where

$\Omega :=\left\{x\in {ℝ}^{n}|l\le x\le u\right\},\text{ }{l}_{i}\in ℝ\cup \left\{-\infty \right\},\text{ }{u}_{i}\in ℝ\cup \left\{+\infty \right\},\text{ }{l}_{i}<{u}_{i},\text{ }i=1,\cdots ,n$

$H:{ℝ}^{n}\to {ℝ}^{n}$ is a semismooth mapping. The notion of semismoothness was introduced for functionals by Mifflin [1] and extended to vector-valued functions by Qi and Sun [2] .

Systems of constrained semismooth equations arise in various applications, for instance complementarity problems, box-constrained variational inequality problems, the KKT systems of variational inequality problems, and so on. Solving the nonlinear Equations (1) can be transformed into solving the following constrained optimization problem:

$\begin{array}{l}\mathrm{min}f\left(x\right)=\frac{1}{2}{‖H\left(x\right)‖}^{2}\\ \text{s}\text{.t}\text{.}x\in \Omega \end{array}$ (2)

where $f:{R}^{n}\to R$ is continuously differentiable with gradient denoted by $\nabla f\left(x\right)$ . Many researchers have studied constrained optimization problems such as (2) and proposed effective algorithms; for example, a new class of adaptive nonmonotone spectral gradient methods is given in [3] , and an active-set projected trust-region algorithm in [4] . Optimization methods divide into first-order methods and second-order methods. Classical first-order algorithms include the gradient method, the subgradient method, the conjugate gradient method, etc. The main advantage of first-order methods is their small storage requirement, which makes them particularly suitable for large-scale problems; their disadvantage is that the convergence rate is at most linear, which cannot meet high-precision requirements. Second-order methods have the advantage of fast convergence: under certain conditions they achieve superlinear or even quadratic convergence. Their disadvantage is that they need a good initial point, sometimes one close to a local optimal point.
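To make the reformulation (2) concrete, the following minimal Python sketch uses a hypothetical smooth map H (not from the paper) to build the merit function $f(x)=\frac{1}{2}{‖H(x)‖}^{2}$ and checks the identity $\nabla f(x)={H}^{\prime }{(x)}^{\text{T}}H(x)$ by finite differences:

```python
import numpy as np

# A hypothetical smooth 2-D map H and its Jacobian, for illustration only.
def H(x):
    return np.array([x[0] ** 2 - 1.0, x[0] * x[1]])

def H_jac(x):
    return np.array([[2 * x[0], 0.0], [x[1], x[0]]])

def f(x):
    """Merit function f(x) = 0.5 * ||H(x)||^2 from (2)."""
    Hx = H(x)
    return 0.5 * np.dot(Hx, Hx)

def grad_f(x):
    """Gradient of the merit function: H'(x)^T H(x)."""
    return H_jac(x).T @ H(x)

# Finite-difference check of the gradient at a sample point.
x = np.array([0.7, -0.3])
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(fd, grad_f(x), atol=1e-5)
```

Note that any root of H is a global minimizer of f with value zero, which is what the two-phase method exploits.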

Motivated by this, in this paper we combine the advantages of first-order and second-order methods in a two-phase algorithm for the optimization problem. First, a first-order method is used to obtain global convergence; the final point it produces then serves as the initial point for a second-order method, which provides fast local convergence. Throughout, projection techniques are used to handle the constraints.

2. Preliminaries

In this section, we present some definitions and theorems that are useful to our main result.

Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function. By Rademacher's theorem, H is differentiable almost everywhere. Denote the set of points at which H is differentiable by ${D}_{H}$ . We write ${H}^{\prime }\left(x\right)$ for the usual $n×n$ Jacobian matrix of partial derivatives whenever x is a point at which the necessary partial derivatives exist. Let $\partial H\left(x\right)$ be the generalized Jacobian defined by Clarke in [5] . Then

$\partial H\left(x\right)={C}_{0}\left({\partial }_{B}H\left(x\right)\right)$ , (3)

where ${C}_{0}$ denotes the convex hull of a set and ${\partial }_{B}H\left(x\right)=\left\{\underset{\begin{array}{l}\text{\hspace{0.17em}}{x}_{j}\to x\\ {x}_{j}\in {D}_{H}\end{array}}{\mathrm{lim}}{H}^{\prime }\left({x}_{j}\right)\right\}$ .

Definition 2.1 [2] : Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function. We say that H is semismooth at x if

$\underset{\begin{array}{l}V\in \partial H\left(x+t{h}^{\prime }\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}{h}^{\prime }\to h,t↓0\end{array}}{\mathrm{lim}}\left\{V{h}^{\prime }\right\}$ (4)

exists for any $h\in {R}^{n}$ .

Lemma 2.2 [2] : Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function, the following statements are equivalent:

1) H is semismooth at x;

2) For any $V\in \partial H\left(x+h\right),h\to 0$ ,

$Vh-{H}^{\prime }\left(x;h\right)=ο\left(‖h‖\right)$ , (5)

$H\left(x+h\right)-H\left(x\right)-Vh=ο\left(‖h‖\right)$ . (6)

Lemma 2.3 [2] : Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function. Then H is semismooth at x if and only if each component of H is semismooth at x.

Definition 2.4 [2] : Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function. If for any $V\in \partial H\left(x+h\right),h\to 0$ ,

$Vh-{H}^{\prime }\left(x;h\right)=Ο\left({‖h‖}^{1+p}\right)$ , (7)

where $0<p\le 1$ , then we say H is p-order semismooth at x.

Definition 2.5 [2] : Suppose $H:{R}^{n}\to {R}^{n}$ is a locally Lipschitzian function. We say H is strongly BD-regular at x if all $V\in {\partial }_{B}H\left(x\right)$ are nonsingular.

Lemma 2.6 [6] : Suppose that $H:{ℝ}^{n}\to {ℝ}^{n}$ is locally Lipschitz continuous and H is BD-regular at $x\in {ℝ}^{n}$ . Then there exist a neighborhood $ℕ\left(x\right)$ of x and a constant K such that for any $y\in ℕ\left(x\right)$ and $V\in {\partial }_{B}H\left(y\right)$ , V is nonsingular and $‖{V}^{-1}‖\le K$ .

Lemma 2.7 [6] : Suppose that $H:{ℝ}^{n}\to {ℝ}^{n}$ is locally Lipschitz continuous and H is BD-regular at a solution ${x}_{\ast }$ of $H\left(x\right)=0$ . If H is semismooth at ${x}_{\ast }$ , then there exist a neighborhood $ℕ\left({x}_{\ast }\right)$ of ${x}_{\ast }$ and a constant $k>0$ such that for any $x\in ℕ\left({x}_{\ast }\right)$ ,

$‖H\left(x\right)‖\ge k‖x-{x}_{\ast }‖$ . (8)

Lemma 2.8 [7] : The projection operator ${\Pi }_{X}\left(\cdot \right)$ satisfies:

1) For any $x\in X$ , ${\left[{\Pi }_{X}\left(z\right)-z\right]}^{\text{T}}\left[{\Pi }_{X}\left(z\right)-x\right]\le 0$ for all $z\in {ℝ}^{n}$ .

2) $‖{\Pi }_{X}\left(y\right)-{\Pi }_{X}\left(z\right)‖\le ‖y-z‖$ for all $y,z\in {ℝ}^{n}$ .
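For the box constraint set $\Omega$ in (1), the projection reduces to a componentwise clip. The following sketch (with arbitrary illustrative bounds and test points, not data from the paper) checks the nonexpansiveness property 2) of Lemma 2.8 numerically:

```python
import numpy as np

def proj_box(z, l, u):
    """Orthogonal projection onto the box {x : l <= x <= u}: clip each component."""
    return np.minimum(np.maximum(z, l), u)

# Numerical illustration of Lemma 2.8(2): the projection is nonexpansive.
l, u = np.array([0.0, -1.0]), np.array([1.0, 1.0])
y = np.array([2.5, -3.0])
z = np.array([-1.0, 0.4])
lhs = np.linalg.norm(proj_box(y, l, u) - proj_box(z, l, u))
rhs = np.linalg.norm(y - z)
assert lhs <= rhs + 1e-12
```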

Lemma 2.9 [8] : Given $x\in {ℝ}^{n}$ and $d\in {ℝ}^{n}$ , the function $\xi$ defined by

$\xi \left(\lambda \right)=‖{\prod }_{X}\left(x+\lambda d\right)-x‖/\lambda ,\text{\hspace{0.17em}}\lambda \ge 0$ (9)

is nonincreasing.
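The monotonicity in Lemma 2.9 can be observed numerically for the box projection; the sketch below (with an arbitrary point and direction, a sanity check rather than a proof) evaluates $\xi \left(\lambda \right)$ on a grid of steplengths:

```python
import numpy as np

# Numerical illustration of Lemma 2.9: xi(lam) = ||P(x + lam*d) - x|| / lam
# is nonincreasing in lam, for the box projection.
def proj(z, l, u):
    return np.minimum(np.maximum(z, l), u)

l, u = np.zeros(3), np.ones(3)
x = np.array([0.2, 0.9, 0.5])       # a feasible point
d = np.array([1.0, 1.0, -2.0])      # an arbitrary direction
lams = [0.1, 0.5, 1.0, 2.0, 5.0]
xi = [np.linalg.norm(proj(x + lam * d, l, u) - x) / lam for lam in lams]
assert all(xi[i] >= xi[i + 1] - 1e-12 for i in range(len(xi) - 1))
```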

Lemma 2.9 actually implies that if $x\in X$ is a stationary point of (2), then for any direction of the form ${d}_{G}=-\gamma \nabla f\left(x\right)$ with $\gamma >0$ ,

${\stackrel{¯}{d}}_{G}\left(\lambda \right)={\Pi }_{X}\left[x+\lambda {d}_{G}\right]-x=0,\text{}\forall \lambda \ge \text{0}$ (10)

3. Algorithm

In order to obtain global convergence, in the first phase we adopt a first-order nonmonotone spectral projected gradient method. Algorithm 3.1 will be called SPG1 and Algorithm 3.2 will be called SPG2 in the rest of the paper; they differ only in their one-dimensional (backtracking) search procedures.

Given $z\in {ℝ}^{n}$ , define $P\left(z\right)$ as the orthogonal projection of z onto $\Omega$ , and denote $g\left(x\right)=\nabla f\left(x\right)$ . The algorithms take an initial point ${x}_{0}\in \Omega$ , an integer $M\ge 1$ , a small parameter ${\alpha }_{\mathrm{min}}>0$ , a large parameter ${\alpha }_{\mathrm{max}}>{\alpha }_{\mathrm{min}}$ , a sufficient decrease parameter $\gamma \in \left(0,1\right)$ , safeguards $0<{\sigma }_{1}<{\sigma }_{2}<1$ , and an initial steplength ${\alpha }_{0}\in \left[{\alpha }_{\mathrm{min}},{\alpha }_{\mathrm{max}}\right]$ .

Algorithm 3.1 [9] (SPG1)

Step 1. If $‖P\left({x}_{k}-g\left({x}_{k}\right)\right)-{x}_{k}‖<{\epsilon }_{1}$ , stop and output ${x}_{k}$ .

Step 2. (Backtracking)

Step 2.1 Set $\lambda ={\alpha }_{k}$ .

Step 2.2 Set ${x}_{+}=P\left({x}_{k}-\lambda g\left({x}_{k}\right)\right)$ .

Step 2.3 If

$f\left({x}_{+}\right)\le \underset{0\le j\le \mathrm{min}\left\{k,M-1\right\}}{\mathrm{max}}f\left({x}_{k-j}\right)+\gamma 〈{x}_{+}-{x}_{k},g\left({x}_{k}\right)〉,$ (11)

Then define ${\lambda }_{k}=\lambda$ , ${x}_{k+1}={x}_{+}$ , ${s}_{k}={x}_{k+1}-{x}_{k}$ , ${y}_{k}=g\left({x}_{k+1}\right)-g\left({x}_{k}\right)$ , and go to step 3.

If (11) does not hold, define ${\lambda }_{new}\in \left[{\sigma }_{1}\lambda ,{\sigma }_{2}\lambda \right]$ . Set $\lambda ={\lambda }_{new}$ , and go to step 2.2.

Step 3. Compute ${b}_{k}=〈{s}_{k},{y}_{k}〉$ . If ${b}_{k}\le 0$ , set ${\alpha }_{k+1}={\alpha }_{\mathrm{max}}$ ; else compute ${a}_{k}=〈{s}_{k},{s}_{k}〉$ and ${\alpha }_{k+1}=\mathrm{min}\left\{{\alpha }_{\mathrm{max}},\mathrm{max}\left\{{\alpha }_{\mathrm{min}},{a}_{k}/{b}_{k}\right\}\right\}$ . Go to step 1.
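The steps of SPG1 can be sketched in Python as follows. This is a hedged sketch, not a reference implementation: the function and parameter names are illustrative, and ${\lambda }_{new}$ is taken as the midpoint of $\left[{\sigma }_{1}\lambda ,{\sigma }_{2}\lambda \right]$ , which is one admissible choice.

```python
import numpy as np

def spg1(f, grad, proj, x0, M=10, alpha_min=1e-10, alpha_max=1e10,
         gamma=1e-4, sigma1=0.1, sigma2=0.9, eps=1e-6, max_iter=1000):
    """Sketch of Algorithm 3.1 (SPG1): nonmonotone spectral projected
    gradient with backtracking along the projected arc."""
    x = proj(np.asarray(x0, dtype=float))
    alpha = 1.0                      # alpha_0 in [alpha_min, alpha_max]
    f_hist = [f(x)]                  # function values for the nonmonotone rule
    for _ in range(max_iter):
        g = grad(x)
        # Step 1: stop when the projected-gradient residual is small
        if np.linalg.norm(proj(x - g) - x) < eps:
            break
        # Step 2: backtracking -- each trial point is reprojected
        lam = alpha
        while True:
            x_plus = proj(x - lam * g)
            if f(x_plus) <= max(f_hist[-M:]) + gamma * np.dot(x_plus - x, g):
                break
            lam *= 0.5 * (sigma1 + sigma2)   # a lam_new in [sigma1*lam, sigma2*lam]
        # Step 3: spectral (Barzilai-Borwein) steplength update
        s, y = x_plus - x, grad(x_plus) - g
        b = np.dot(s, y)
        alpha = alpha_max if b <= 0 else min(alpha_max,
                                             max(alpha_min, np.dot(s, s) / b))
        x = x_plus
        f_hist.append(f(x))
    return x

# Usage: minimize 0.5*||x - c||^2 over the box [0,1]^2; the solution is c clipped.
c = np.array([2.0, -1.0])
sol = spg1(lambda x: 0.5 * np.sum((x - c) ** 2),
           lambda x: x - c,
           lambda z: np.clip(z, 0.0, 1.0),
           np.zeros(2))
```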

Algorithm 3.2 [9] (SPG2)

Step 1. If $‖P\left({x}_{k}-g\left({x}_{k}\right)\right)-{x}_{k}‖<{\epsilon }_{1}$ , stop and output ${x}_{k}$ .

Step 2. (Backtracking)

Step 2.1. Compute ${d}_{k}=P\left({x}_{k}-{\alpha }_{k}g\left({x}_{k}\right)\right)-{x}_{k}$ and set $\lambda =1$ .

Step 2.2. Set ${x}_{+}={x}_{k}+\lambda {d}_{k}$ .

Step 2.3. If

$f\left({x}_{+}\right)\le \underset{0\le j\le \mathrm{min}\left\{k,M-1\right\}}{\mathrm{max}}f\left({x}_{k-j}\right)+\gamma \lambda 〈{d}_{k},g\left({x}_{k}\right)〉,$ (12)

Then define ${\lambda }_{k}=\lambda ,{x}_{k+1}={x}_{+},{s}_{k}={x}_{k+1}-{x}_{k},{y}_{k}=g\left({x}_{k+1}\right)-g\left({x}_{k}\right)$ , and go to step 3.

If (12) does not hold, define ${\lambda }_{new}\in \left[{\sigma }_{1}\lambda ,{\sigma }_{2}\lambda \right]$ . Set $\lambda ={\lambda }_{new}$ , and go to step 2.2.

Step 3 is the same as step 3 of Algorithm 3.1.

The output point of the first stage is used as the initial point of the next stage.

Algorithm 3.3 [10] (A projected semismooth asymptotically Newton method)

Step 0. Choose constants $\rho ,\sigma ,\eta \in \left(0,1\right)$ , ${p}_{1}>0$ , ${p}_{2}>2$ . Let ${x}_{0}={x}_{N}\in \Omega$ , where ${x}_{N}$ is the output point of the first phase, and set $k:=0$ .

Step 1. Choose ${V}_{k}\in {\partial }_{B}H\left({x}_{k}\right)$ , compute $\nabla f\left({x}_{k}\right)={V}_{k}^{\text{T}}H\left({x}_{k}\right)$ .

Step 2. If ${x}_{k}$ is a stationary point, stop. Otherwise let

${d}_{G}^{k}=-{\gamma }_{k}\nabla f\left({x}_{k}\right)$ , (13)

where

${\gamma }_{k}=\mathrm{min}\left\{1,\eta f\left({x}_{k}\right)/{‖\nabla f\left({x}_{k}\right)‖}^{2}\right\}$ , (14)

and go to step 3.

Step 3. If the linear system

$H\left({x}_{k}\right)+{V}_{k}d=0$ (15)

has a solution ${d}_{N}^{k}$ , and

$-\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{N}^{k}\ge {p}_{1}{‖{d}_{N}^{k}‖}^{{p}_{2}}$ , (16)

then use the direction ${d}_{N}^{k}$ . Otherwise, set ${d}_{N}^{k}={d}_{G}^{k}$ .

Step 4. Let ${m}_{k}$ be the smallest nonnegative integer m satisfying

$f\left({x}_{k}+{\stackrel{¯}{d}}^{k}\left({\rho }^{m}\right)\right)\le f\left({x}_{k}\right)+\sigma \nabla f{\left({x}_{k}\right)}^{\text{T}}{\stackrel{¯}{d}}_{G}^{k}\left({\rho }^{m}\right)$ , (17)

where for any $\lambda \in \left[0,1\right]$ ,

${\stackrel{¯}{d}}^{k}\left(\lambda \right)={t}_{k}^{\ast }\left(\lambda \right){\stackrel{¯}{d}}_{G}^{k}\left(\lambda \right)+\left[1-{t}_{k}^{\ast }\left(\lambda \right)\right]{\stackrel{¯}{d}}_{N}^{k}\left(\lambda \right)$ , (18)

${\stackrel{¯}{d}}_{G}^{k}\left(\lambda \right)={\prod }_{X}\left[{x}_{k}+\lambda {d}_{G}^{k}\right]-{x}_{k},{\stackrel{¯}{d}}_{N}^{k}\left(\lambda \right)={\prod }_{X}\left[{x}_{k}+\lambda {d}_{N}^{k}\right]-{x}_{k}$ (19)

and ${t}_{k}^{\ast }\left(\lambda \right)$ is an optimal solution to

$\underset{t\in \left[0,1\right]}{\mathrm{min}}\frac{1}{2}{‖H\left({x}_{k}\right)+{V}_{k}\left[t{\stackrel{¯}{d}}_{G}^{k}\left(\lambda \right)+\left(1-t\right){\stackrel{¯}{d}}_{N}^{k}\left(\lambda \right)\right]‖}^{2}$ , (20)

whose optimal solution is

${t}_{k}^{\ast }\left(\lambda \right)=\mathrm{max}\left\{0,\mathrm{min}\left\{1,{t}_{k}\left(\lambda \right)\right\}\right\}$ , (21)

where ${t}_{k}\left(\lambda \right)$ denotes the unconstrained minimizer of the quadratic in (20),

let ${\lambda }_{k}={\rho }^{{m}_{k}}$ , ${x}_{k+1}={x}_{k}+{\stackrel{¯}{d}}^{k}\left({\lambda }_{k}\right)$ .

Step 5. Let $k:=k+1$ , and go to step 1.
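The direction choice in Steps 2-3 of Algorithm 3.3 can be sketched as follows. This is a hedged Python sketch assuming $\nabla f\left({x}_{k}\right)\ne 0$ ; the parameter values for $\eta ,{p}_{1},{p}_{2}$ are illustrative.

```python
import numpy as np

def newton_or_gradient_direction(H_val, V, grad_f, eta=0.9, p1=1e-8, p2=2.1):
    """Steps 2-3 of Algorithm 3.3 (sketch): try the semismooth Newton
    direction solving H(x_k) + V_k d = 0; fall back to the scaled
    steepest-descent direction d_G if the solve fails or the
    sufficient-descent test (16) fails. Assumes grad_f != 0."""
    f_val = 0.5 * np.dot(H_val, H_val)
    gamma = min(1.0, eta * f_val / np.dot(grad_f, grad_f))   # (14)
    d_G = -gamma * grad_f                                    # (13)
    try:
        d_N = np.linalg.solve(V, -H_val)                     # (15)
    except np.linalg.LinAlgError:
        return d_G
    # sufficient-descent safeguard (16)
    if -np.dot(grad_f, d_N) >= p1 * np.linalg.norm(d_N) ** p2:
        return d_N
    return d_G

# Usage: for H(x) = x at x = (1, 2), V = I, the Newton direction -x is accepted.
d = newton_or_gradient_direction(np.array([1.0, 2.0]), np.eye(2),
                                 np.array([1.0, 2.0]))
```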

4. Convergence Analysis

Theorem 4.1 [9] : Algorithm SPG1 is well defined, and any accumulation point of the sequence $\left\{{x}_{k}\right\}$ that it generates is a constrained stationary point.

Theorem 4.2 [9] : Algorithm SPG2 is well defined, and any accumulation point of the sequence $\left\{{x}_{k}\right\}$ that it generates is a constrained stationary point.

Theorem 4.3 [10] : Let $\left\{{x}_{k}\right\}\subset X$ be a sequence generated by Algorithm 3.3. Then any accumulation point of $\left\{{x}_{k}\right\}$ is a stationary point of (2).

5. Application

Many practical problems can be solved by transforming them into constrained semismooth equations, for example the mixed complementarity problem (MCP):

let $F:{R}^{n}\to {R}^{n}$ be a continuously differentiable function; find a vector $x\in X$ that satisfies

$F{\left(x\right)}^{\text{T}}\left(y-x\right)\ge 0,\text{}\forall y\in X$ , (22)

The function ${\psi }_{\alpha }:{ℝ}^{2}\to ℝ$ with $\alpha \in \left[0,1\right]$ is defined by

${\psi }_{\alpha }\left(a,b\right):={\left({\left[{\varphi }_{\alpha }\left(a,b\right)\right]}_{+}\right)}^{2}+{\left({\left[-a\right]}_{+}\right)}^{2}$ , (23)

where ${\left[a\right]}_{+}:=\mathrm{max}\left\{0,a\right\}$ for any $a\in ℝ$ and ${\varphi }_{\alpha }:{ℝ}^{2}\to ℝ$ is the penalized Fischer-Burmeister function introduced by Chen et al. [11] and has the form:

${\varphi }_{\alpha }\left(a,b\right):=\alpha {\varphi }_{FB}\left(a,b\right)+\left(1-\alpha \right){a}_{+}{b}_{+}$ , (24)

Here, ${\varphi }_{FB}:{ℝ}^{2}\to ℝ$ is the Fischer-Burmeister NCP function, which is given by

${\varphi }_{FB}\left(a,b\right):=\left(a+b\right)-\sqrt{{a}^{2}+{b}^{2}}$ , (25)

The mixed complementarity problem can be transformed into a semismooth system of equations by means of the above functions.

Let $N=\left\{1,\cdots ,n\right\}$

$\begin{array}{l}{I}_{f}:=\left\{i|{l}_{i}=-\infty ,{u}_{i}=\infty ,i\in N\right\},\text{ }{I}_{l}:=\left\{i|{l}_{i}>-\infty ,{u}_{i}=\infty ,i\in N\right\},\\ {I}_{u}:=\left\{i|{l}_{i}=-\infty ,{u}_{i}<\infty ,i\in N\right\},\text{ }{I}_{lu}:=N\setminus \left({I}_{l}\cup {I}_{u}\cup {I}_{f}\right)\end{array}$ (26)

MCP can be reformulated as $H\left(x\right)=0$ with

${H}_{i}\left(x\right):=\left\{\begin{array}{ll}|{F}_{i}\left(x\right)|& \text{if}\text{ }i\in {I}_{f}\\ |{\varphi }_{\alpha }\left({x}_{i}-{l}_{i},{F}_{i}\left(x\right)\right)|& \text{if}\text{ }i\in {I}_{l}\\ |{\varphi }_{\alpha }\left({u}_{i}-{x}_{i},-{F}_{i}\left(x\right)\right)|& \text{if}\text{ }i\in {I}_{u}\\ \sqrt{{\psi }_{\alpha }\left({x}_{i}-{l}_{i},{F}_{i}\left(x\right)\right)+{\psi }_{\alpha }\left({u}_{i}-{x}_{i},-{F}_{i}\left(x\right)\right)}& \text{if}\text{ }i\in {I}_{lu}\end{array},\text{ }i=1,\cdots ,n$ (27)

Then we can use the two phase method to solve this problem.
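The reformulation (23)-(27) can be sketched in Python as below. The names and the default $\alpha =0.95$ are illustrative assumptions; at a solution of the MCP the residual vector H vanishes.

```python
import numpy as np

def phi_fb(a, b):
    """Fischer-Burmeister NCP function (25)."""
    return a + b - np.sqrt(a * a + b * b)

def phi_alpha(a, b, alpha=0.95):
    """Penalized Fischer-Burmeister function (24)."""
    return alpha * phi_fb(a, b) + (1 - alpha) * max(a, 0.0) * max(b, 0.0)

def psi_alpha(a, b, alpha=0.95):
    """The function psi_alpha of (23)."""
    return max(phi_alpha(a, b, alpha), 0.0) ** 2 + max(-a, 0.0) ** 2

def mcp_residual(x, F, l, u, alpha=0.95):
    """Semismooth residual H(x) of (27) for the MCP with bounds l, u."""
    Fx = F(x)
    H = np.empty_like(x)
    for i in range(len(x)):
        if np.isinf(l[i]) and np.isinf(u[i]):        # i in I_f: free variable
            H[i] = abs(Fx[i])
        elif np.isfinite(l[i]) and np.isinf(u[i]):   # i in I_l: lower bound only
            H[i] = abs(phi_alpha(x[i] - l[i], Fx[i], alpha))
        elif np.isinf(l[i]) and np.isfinite(u[i]):   # i in I_u: upper bound only
            H[i] = abs(phi_alpha(u[i] - x[i], -Fx[i], alpha))
        else:                                        # i in I_lu: both bounds
            H[i] = np.sqrt(psi_alpha(x[i] - l[i], Fx[i], alpha)
                           + psi_alpha(u[i] - x[i], -Fx[i], alpha))
    return H

# Usage: the NCP with F(x) = x - 1, l = 0, u = +inf has solution x = 1,
# where the residual is zero.
r = mcp_residual(np.array([1.0]), lambda x: x - 1.0,
                 np.array([0.0]), np.array([np.inf]))
```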

6. Conclusion

In this paper, we have proposed a two-phase method for constrained semismooth equations; other first-order and second-order methods could be combined in the same way. The iteration-complexity analysis of the first-order phase is meaningful future work that we intend to pursue.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

Cite this paper

Zhang, Y.Z. (2019) A Spectral Projected Gradient-Newton Two Phase Method for Constrained Nonlinear Equations. Journal of Applied Mathematics and Physics, 7, 104-110. https://doi.org/10.4236/jamp.2019.71009

References

1. Mifflin, R. (1977) Semismooth and Semiconvex Functions in Constrained Optimization. SIAM Journal on Control and Optimization, 15, 959-972. https://doi.org/10.1137/0315061

2. Qi, L. and Sun, J. (1993) A Nonsmooth Version of Newton’s Method. Mathematical Programming, 58, 353-368. https://doi.org/10.1007/BF01581275

3. Ji, L. and Yu, Z.S. (2009) New Class of Adaptive Nonmonotone Spectral Projected Gradient Method. Journal of University of Shanghai for Science & Technology.

4. Qi, L.Q., Tong, X.J. and Li, D.H. (2004) Active-Set Projected Trust-Region Algorithm for Box-Constrained Nonsmooth Equations. Journal of Optimization Theory and Applications, 120, 601-625. https://doi.org/10.1023/B:JOTA.0000025712.43243.eb

5. Clarke, F.H. (1983) Optimization and Nonsmooth Analysis. John Wiley & Sons, New York.

6. Qi, L. (1993) Convergence Analysis of Some Algorithms for Solving Nonsmooth Equations. Mathematics of Operations Research, 18, 227-244. https://doi.org/10.1287/moor.18.1.227

7. Zarantonello, E.H. (1971) Projections on Convex Sets in Hilbert Space and Spectral Theory: Part I. Projections on Convex Sets: Part II. Spectral Theory. Revista de la Unión Matemática Argentina, 26, 237-424.

8. Powell, M.J.D. (1983) Variable Metric Methods for Constrained Optimization. In: Mathematical Programming: The State of the Art, Springer, Berlin Heidelberg. https://doi.org/10.1007/978-3-642-68874-4_12

9. Birgin, E.G., Martínez, J.M. and Raydan, M. (2000) Nonmonotone Spectral Projected Gradient Methods on Convex Sets. SIAM Journal on Optimization, 10, 1196-1211. https://doi.org/10.1137/S1052623497330963

10. Sun, D., Womersley, R.S. and Qi, H. (2002) A Feasible Semismooth Asymptotically Newton Method for Mixed Complementarity Problems. Mathematical Programming, 94, 167-187. https://doi.org/10.1007/s10107-002-0305-2

11. Chen, B., Chen, X. and Kanzow, C. (2000) A Penalized Fischer-Burmeister NCP-Function. Mathematical Programming, 88, 211-216. https://doi.org/10.1007/PL00011375