Journal of Applied Mathematics and Physics
Vol.06 No.01(2018), Article ID:81762,8 pages
10.4236/jamp.2018.61013

Global Convergence of an Extended Descent Algorithm without Line Search for Unconstrained Optimization

Cuiling Chen*, Liling Luo, Caihong Han, Yu Chen

College of Mathematics and Statistics, Guangxi Normal University, Guilin, China

Copyright © 2018 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: December 16, 2017; Accepted: January 13, 2018; Published: January 16, 2018

ABSTRACT

In this paper, we extend a descent algorithm without line search for solving unconstrained optimization problems. Under mild conditions, its global convergence is established. Further, we generalize the search direction to a more general form and obtain the global convergence of the corresponding algorithm. Numerical results illustrate that the new algorithm is effective.

Keywords:

Unconstrained Optimization, Descent Method, Line Search, Global Convergence

1. Introduction

Consider an unconstrained optimization problem (UP)

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. In general, iterative algorithms for solving (UP) take the form

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$

where $x_k$, $\alpha_k$ and $d_k$ are the current iterate, a positive step length and a search direction, respectively. For simplicity, we denote $\nabla f(x_k)$ by $g_k$ and $f(x_k)$ by $f_k$.

The main task in the iterative formula (2) is to choose the search direction $d_k$ and to determine the step length $\alpha_k$ along that direction. There are many classic methods for choosing the search direction $d_k$, such as steepest descent methods, Newton-type methods, variable metric methods (see [1]), and conjugate gradient methods

$$d_k = \begin{cases} -g_k, & \text{if } k = 1, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 2, \end{cases} \qquad (3)$$

where $\beta_k$ is a parameter (see [2] [3] [4]). The step length $\alpha_k$ is usually determined by a line search procedure, such as the exact line search, the Wolfe line search, the Armijo line search, and so on. However, these line search procedures may require extensive evaluations of the objective function and its gradient, which often becomes a significant burden for large-scale problems. Evidently, it is a good idea to avoid the line search procedure in algorithm design in order to reduce the evaluations of objective functions and gradients.
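To make (3) concrete, the following minimal Python sketch (ours, not from the paper) computes the conjugate gradient direction with the classical Fletcher-Reeves choice $\beta_k = \|g_k\|^2/\|g_{k-1}\|^2$; the function and argument names are hypothetical.

import numpy as np

def cg_direction(g_k, g_prev=None, d_prev=None):
    """Conjugate gradient direction (3): -g_k at the first iteration,
    -g_k + beta_k * d_{k-1} afterwards (Fletcher-Reeves beta_k)."""
    if g_prev is None or d_prev is None:
        return -g_k                               # d_1 = -g_1
    beta_k = (g_k @ g_k) / (g_prev @ g_prev)      # Fletcher-Reeves parameter
    return -g_k + beta_k * d_prev                 # d_k = -g_k + beta_k * d_{k-1}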

Based on the above consideration, some authors have started to study algorithms without line search. Recently, several conjugate gradient algorithms without line search were investigated. In [5], Sun and Zhang studied some well-known conjugate gradient methods without line search, for instance, the Fletcher-Reeves, Hestenes-Stiefel, Dai-Yuan, Polak-Ribière and Conjugate Descent methods. In [6], Chen and Sun studied a two-parameter family of conjugate gradient methods without line search. In [7] [8], Wang and Zhu put forward conjugate gradient path methods without line search. Shi and Shen [9] and Zhou [10] proposed descent methods without line search. Further, Zhou and Feng presented the steepest descent algorithm without line search in [11].

Inspired by the above literature, in this paper we extend the descent algorithm without line search of [10] to a more general case and discuss its global convergence. The rest of this paper is organized as follows. In Section 2, we describe the extended descent algorithm without line search. In Section 3, we analyze its global convergence. Further, we generalize the search direction to a more general form and obtain the global convergence of the corresponding algorithm. Finally, numerical results are reported in Section 4.

2. Extended Descent Algorithm

To proceed, we first make the following assumptions (see [2]).

(H1) The function $f$ is bounded below on the level set $\mathcal{L} = \{x \in \mathbb{R}^n \mid f(x) \le f(x_1)\}$, where $x_1$ is the given starting point.

(H2) The gradient $g = \nabla f$ is Lipschitz continuous on an open convex set $B$ that contains $\mathcal{L}$, i.e., there exists $L > 0$ such that

$$\|g(x) - g(y)\| \le L\|x - y\|, \quad \forall x, y \in B. \qquad (4)$$

Now we give the extended algorithm.

Algorithm 2.1. Given a starting point $x_1$, a positive constant $\epsilon$, and three parameters $\mu_1$, $\mu_2$ and $\rho$ such that $0 < \mu_1 < \frac{1}{2} < \mu_2 < 1$ and $\frac{1}{2} \le \rho < 1$. Let $k := 1$.

Step 1. If $\|g_k\| < \epsilon$, then stop; otherwise go to Step 2.

Step 2. Compute

$$s_k = \begin{cases} \rho, & k = 1, \\ \dfrac{\rho\|g_k\|^2}{\rho\|g_k\|^2 + (1-\rho)\left|g_k^T d_{k-1}\right|}, & k \ge 2. \end{cases} \qquad (5)$$

Step 3. Set search direction

$$d_k = \begin{cases} -s_k g_k, & k = 1, \\ -\left[\rho\left(1 - \dfrac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right) g_k + (1-\rho)\dfrac{\alpha_{k-1} s_k}{1+\alpha_{k-1}} d_{k-1}\right], & k \ge 2. \end{cases} \qquad (6)$$

Step 4. Compute the step length by the following rule. When $k = 1$, $\alpha_k$ is determined by the Wolfe line search, i.e., it satisfies

$$f(x_k + \alpha_k d_k) - f_k \le \mu_1 \alpha_k g_k^T d_k, \qquad (7)$$

$$g(x_k + \alpha_k d_k)^T d_k \ge \mu_2 g_k^T d_k. \qquad (8)$$

When $k \ge 2$,

$$\alpha_k = \frac{-g_k^T d_k}{L_k \|d_k\|^2}, \qquad (9)$$

where $L_k$ satisfies $\rho L \le L_k \le m_k L$ and $\{m_k, k = 1, 2, \ldots\}$ is a positive sequence with a sufficiently large upper bound.

Step 5. Set the next iterate

$$x_{k+1} = x_k + \alpha_k d_k. \qquad (10)$$

Step 6. Set $k := k + 1$ and go to Step 1.

Remark 2.1. Note that the formulas for $s_k$ and $d_k$ in Algorithm 2.1 are generalized forms of those in [10].
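For concreteness, the following Python sketch of Algorithm 2.1 is given under stated assumptions: the Wolfe step at $k = 1$ is delegated to scipy.optimize.line_search (whose default constants $10^{-4}$ and $0.9$ satisfy $0 < \mu_1 < \frac{1}{2} < \mu_2 < 1$), $L_k$ is frozen at a user-supplied constant as in the experiments of Section 4, and all function and parameter names are ours, not the authors'.

import numpy as np
from scipy.optimize import line_search

def extended_descent(f, grad, x1, rho=0.6, Lk=100.0, eps=1e-5, max_iter=10000):
    """Sketch of Algorithm 2.1 (assumes a fixed L_k and a scipy Wolfe search at k = 1)."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d_prev, alpha_prev = None, None
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g) < eps:                        # Step 1: stopping test
            break
        if k == 1:
            s = rho                                        # Step 2, formula (5)
            d = -s * g                                     # Step 3, formula (6)
            alpha = line_search(f, grad, x, d, gfk=g)[0]   # Step 4: Wolfe conditions (7)-(8)
            if alpha is None:                              # crude fallback if the search fails
                alpha = 1e-4
        else:
            s = rho * (g @ g) / (rho * (g @ g) + (1 - rho) * abs(g @ d_prev))   # (5)
            t = alpha_prev * s / (1.0 + alpha_prev)
            d = -(rho * (1.0 - t) * g + (1 - rho) * t * d_prev)                 # (6)
            alpha = -(g @ d) / (Lk * (d @ d))                                   # Step 4, formula (9)
        x = x + alpha * d                                  # Step 5, formula (10)
        d_prev, alpha_prev = d, alpha
        g = grad(x)                                        # prepare the next iteration (Step 6)
    return x, k

For instance, extended_descent(lambda x: x @ x, lambda x: 2.0 * x, np.ones(10)) minimizes a simple quadratic; how large the frozen $L_k$ must be in practice depends on the Lipschitz constant of the gradient of the problem at hand.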

3. Global Convergence

Lemma 3.1. If Algorithm 2.1 generates an infinite sequence $\{x_k, k = 1, 2, \ldots\}$, then all search directions $d_k$ are descent directions, and for all $k \ge 2$ it holds that

$$-g_k^T d_k \ge \frac{\rho\|g_k\|^2}{1+\alpha_{k-1}}. \qquad (11)$$

Proof. If $k = 1$, it is obvious that $-g_1^T d_1 = \rho\|g_1\|^2 > 0$. If $k \ge 2$, by (5) and (6), we have

$$\begin{aligned} -g_k^T d_k &= \rho\left(1 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right)\|g_k\|^2 + (1-\rho)\frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}} g_k^T d_{k-1} \\ &= \rho\|g_k\|^2 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\left[\rho\|g_k\|^2 - (1-\rho) g_k^T d_{k-1}\right] \\ &\ge \rho\|g_k\|^2 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\left[\rho\|g_k\|^2 + (1-\rho)\left|g_k^T d_{k-1}\right|\right] \\ &= \frac{\rho\|g_k\|^2}{1+\alpha_{k-1}}, \end{aligned} \qquad (12)$$

where the last equality uses $s_k\left[\rho\|g_k\|^2 + (1-\rho)\left|g_k^T d_{k-1}\right|\right] = \rho\|g_k\|^2$, which follows from (5).

This completes the proof.
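Since inequality (11) depends only on formulas (5) and (6) and the positivity of $\alpha_{k-1}$, it can be spot-checked numerically on random data. The short Python script below is our own illustration, not part of the original analysis.

import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                                  # any value with 1/2 <= rho < 1
for _ in range(1000):
    g = rng.standard_normal(5)             # plays the role of g_k
    d_prev = rng.standard_normal(5)        # plays the role of d_{k-1}
    a_prev = rng.uniform(0.01, 10.0)       # alpha_{k-1} > 0
    s = rho * (g @ g) / (rho * (g @ g) + (1 - rho) * abs(g @ d_prev))   # (5)
    t = a_prev * s / (1.0 + a_prev)
    d = -(rho * (1.0 - t) * g + (1 - rho) * t * d_prev)                 # (6)
    assert -(g @ d) >= rho * (g @ g) / (1.0 + a_prev) - 1e-12           # (11)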

Lemma 3.2 (Mean value theorem, see [1]). Suppose that the objective function $f(x)$ is continuously differentiable on an open convex set $B$. Then

$$f(x_k + \alpha d_k) - f_k = \alpha \int_0^1 g(x_k + t\alpha d_k)^T d_k \, dt, \qquad (13)$$

where $x_k, x_k + \alpha d_k \in B$ and $d_k \in \mathbb{R}^n$. If $f(x)$ is twice continuously differentiable on $B$, then

$$g(x_k + \alpha d_k) - g_k = \alpha \int_0^1 \nabla^2 f(x_k + t\alpha d_k) d_k \, dt, \qquad (14)$$

and

$$f(x_k + \alpha d_k) - f_k = \alpha g_k^T d_k + \alpha^2 \int_0^1 (1-t) d_k^T \nabla^2 f(x_k + t\alpha d_k) d_k \, dt. \qquad (15)$$

Lemma 3.3. For all $k \ge 2$,

$$\|d_k\|^2 \le 3\rho^2 \sum_{1 \le i \le k} \|g_i\|^2. \qquad (16)$$

Proof. When $k \ge 2$, it holds that $(1-\rho) s_k \left|g_k^T d_{k-1}\right| = \rho(1 - s_k)\|g_k\|^2$ by (5). Then for all $k \ge 2$, we have

$$\begin{aligned} \|d_k\|^2 &= \left\| \rho\left(1 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right) g_k + (1-\rho)\frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}} d_{k-1} \right\|^2 \\ &= \rho^2\left(1 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right)^2 \|g_k\|^2 + 2\rho\left(1 - \frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right)(1-\rho)\frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}} g_k^T d_{k-1} + (1-\rho)^2\left(\frac{\alpha_{k-1} s_k}{1+\alpha_{k-1}}\right)^2 \|d_{k-1}\|^2 \\ &\le \rho^2\|g_k\|^2 + 2\rho(1-\rho) s_k \left|g_k^T d_{k-1}\right| + \|d_{k-1}\|^2 \\ &= \rho^2\|g_k\|^2 + 2\rho^2(1 - s_k)\|g_k\|^2 + \|d_{k-1}\|^2 \\ &\le 3\rho^2\|g_k\|^2 + \|d_{k-1}\|^2. \end{aligned}$$

Using the induction principle and noting that $\|d_1\|^2 = \rho^2\|g_1\|^2$, we obtain

$$\|d_k\|^2 \le 3\rho^2\|g_k\|^2 + 3\rho^2\|g_{k-1}\|^2 + 3\rho^2\|g_{k-2}\|^2 + \cdots + \rho^2\|g_1\|^2.$$

Therefore (16) holds. The proof is completed.
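Like (11), the bound (16) uses only (5), (6) and the positivity of the step lengths, so it can be checked along a randomly generated sequence of gradients and steps; the sketch below is ours and purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
rho, n, K = 0.7, 5, 200
g = rng.standard_normal(n)
d = -rho * g                               # d_1 = -s_1 g_1 with s_1 = rho
sum_g2 = g @ g                             # running value of sum_{1<=i<=k} ||g_i||^2
for k in range(2, K + 1):
    a_prev = rng.uniform(0.01, 10.0)       # any positive alpha_{k-1}
    g = rng.standard_normal(n)
    s = rho * (g @ g) / (rho * (g @ g) + (1 - rho) * abs(g @ d))    # (5); d still holds d_{k-1}
    t = a_prev * s / (1.0 + a_prev)
    d = -(rho * (1.0 - t) * g + (1 - rho) * t * d)                  # (6)
    sum_g2 += g @ g
    assert d @ d <= 3.0 * rho**2 * sum_g2 + 1e-10                   # bound (16)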

Theorem 3.1. If (H1) and (H2) hold and Algorithm 2.1 generates an infinite sequence $\{x_k, k = 1, 2, \ldots\}$, then

$$\sum_{k=2}^{+\infty} \frac{\|g_k\|^4}{(1+\alpha_{k-1})^2 \sum_{1 \le i \le k} \|g_i\|^2} < +\infty; \qquad (17)$$

and

$$\sum_{k=2}^{+\infty} \frac{\alpha_k}{1+\alpha_{k-1}} \|g_k\|^2 < +\infty. \qquad (18)$$

Proof. When $k \ge 2$, from (13), (4), Lemma 3.1, Lemma 3.3 and $\rho L \le L_k \le m_k L$, it follows that

$$\begin{aligned} f_k - f_{k+1} &= -\alpha_k \int_0^1 g(x_k + t\alpha_k d_k)^T d_k \, dt \\ &= -\alpha_k g_k^T d_k - \alpha_k \int_0^1 \left[ g(x_k + t\alpha_k d_k) - g_k \right]^T d_k \, dt \\ &\ge -\alpha_k g_k^T d_k - \alpha_k \int_0^1 \left\| g(x_k + t\alpha_k d_k) - g_k \right\| \|d_k\| \, dt \\ &\ge -\alpha_k g_k^T d_k - \alpha_k^2 L \int_0^1 t \|d_k\|^2 \, dt = -\alpha_k g_k^T d_k - \frac{1}{2}\alpha_k^2 L \|d_k\|^2 \\ &= \left( \frac{1}{L_k} - \frac{L}{2L_k^2} \right) \frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \frac{(2\rho - 1)(g_k^T d_k)^2}{2 L m_k^2 \|d_k\|^2} \\ &\ge \frac{(2\rho - 1)\rho^2 \|g_k\|^4}{2 L m_k^2 (1+\alpha_{k-1})^2 \cdot 3\rho^2 \sum_{1 \le i \le k} \|g_i\|^2} = \frac{(2\rho - 1)\|g_k\|^4}{6 L m_k^2 (1+\alpha_{k-1})^2 \sum_{1 \le i \le k} \|g_i\|^2}, \end{aligned} \qquad (19)$$

which implies that $\{f_k, k = 1, 2, \ldots\}$ is a decreasing sequence. Hence the sequence $\{x_k, k = 1, 2, \ldots\}$ generated by Algorithm 2.1 is contained in $\mathcal{L} \subset B$, and by (H1) there exists a constant $f^*$ such that $\lim_{k \to \infty} f_k = f^*$. Therefore

$$\sum_{k=2}^{+\infty} (f_k - f_{k+1}) = \lim_{N \to +\infty} \sum_{k=2}^{N} (f_k - f_{k+1}) = \lim_{N \to +\infty} (f_2 - f_{N+1}) = f_2 - f^*.$$

Thus

$$\sum_{k=2}^{+\infty} (f_k - f_{k+1}) < +\infty,$$

which, combined with (19), yields

$$\sum_{k=2}^{+\infty} \frac{\|g_k\|^4}{m_k^2 (1+\alpha_{k-1})^2 \sum_{1 \le i \le k} \|g_i\|^2} < +\infty. \qquad (20)$$

Since $\{m_k, k = 1, 2, \ldots\}$ has an upper bound, (17) holds.

On the other hand, by (9) and Lemma 3.1, we have

$$\begin{aligned} f_k - f_{k+1} &\ge -\alpha_k g_k^T d_k - \frac{1}{2}\alpha_k^2 L \|d_k\|^2 = -\alpha_k g_k^T d_k + \frac{L \alpha_k g_k^T d_k}{2 L_k} \\ &= \frac{(2L_k - L)\left(-\alpha_k g_k^T d_k\right)}{2 L_k} \ge \frac{(2\rho - 1)\left(-\alpha_k g_k^T d_k\right)}{2\rho} \ge \frac{(2\rho - 1)\alpha_k \|g_k\|^2}{2(1+\alpha_{k-1})}. \end{aligned} \qquad (21)$$

By the same analysis as in the proof of (17), (18) holds. The proof is completed.

Lemma 3.4 (see [12]). If the conditions in Theorem 3.1 hold and $\sup_{k \ge 1}\{\alpha_k\} < +\infty$, then both the sequences $\{g_k, k = 1, 2, \ldots\}$ and $\{d_k, k = 1, 2, \ldots\}$ are bounded.

Theorem 3.2. If the conditions in Theorem 3.1 hold, then

$$\liminf_{k \to +\infty} \|g_k\| = 0. \qquad (22)$$

Proof. Suppose that $\liminf_{k \to +\infty} \|g_k\| \ne 0$. Then there exists a positive constant $\gamma$ such that

$$\|g_k\| \ge \gamma, \quad \forall k \ge 1. \qquad (23)$$

In the following, we carry out the proof in two cases.

Case 1. Suppose that $\sup_{k \ge 1}\{\alpha_k\} < +\infty$. We derive a contradiction. By (17), we have

$$\sum_{k=2}^{+\infty} \frac{\|g_k\|^4}{\sum_{1 \le i \le k} \|g_i\|^2} < +\infty. \qquad (24)$$

From Lemma 3.4, we know that there exists $M > 0$ such that $\|g_k\| \le M$, $\forall k \ge 1$. Combining this with (23), we have

$$\frac{\|g_k\|^4}{\sum_{1 \le i \le k} \|g_i\|^2} \ge \frac{\gamma^4}{k M^2}.$$

It is known that

$$\sum_{k=2}^{+\infty} \frac{\gamma^4}{k M^2} = \frac{\gamma^4}{M^2} \sum_{k=2}^{+\infty} \frac{1}{k} = +\infty,$$

so

$$\sum_{k=2}^{+\infty} \frac{\|g_k\|^4}{\sum_{1 \le i \le k} \|g_i\|^2} = +\infty, \qquad (25)$$

which contradicts (24). Therefore (22) holds in this case.

Case 2. When $\sup_{k \ge 1}\{\alpha_k\} = +\infty$, the proof is the same as that in [10] and is omitted here.

It follows from the proofs of Case 1 and Case 2 that (22) holds. This completes the proof.

Remark 3.1. The search direction of Algorithm 2.1 can be extended to a more general form as follows:

$$d_k = \begin{cases} -s_k g_k, & k = 1, \\ -\left[\rho\left(1 - \varphi(\alpha_{k-1}) s_k\right) g_k \pm (1-\rho)\varphi(\alpha_{k-1}) s_k d_{k-1}\right], & k \ge 2, \end{cases} \qquad (26)$$

where the function $\varphi(\alpha)$ satisfies the following conditions (see [10]):

a) It is continuous and strictly monotonically increasing on $[0, +\infty)$;

b) $\lim_{\alpha \to 0^+} \varphi(\alpha) = \varphi(0) = 0$ and $\lim_{\alpha \to +\infty} \varphi(\alpha) = 1$;

c) $\alpha(1 - \varphi(\alpha))$ is continuous and strictly monotonically increasing on $[0, +\infty)$, and $\lim_{\alpha \to +\infty} \alpha(1 - \varphi(\alpha)) = 1$.

Evidently, there are many functions satisfying conditions (a)-(c), for example, $\frac{\alpha}{1+\alpha}$, $\frac{\alpha^2}{1+\alpha+\alpha^2}$, $\frac{\alpha^3}{1+\alpha^2+\alpha^3}$, etc. (see [10]). We denote Algorithm 2.1 in which $d_k$ is determined by (26) as Algorithm 3.1. By using the proof technique of Theorem 3.2 above, it is easy to obtain its convergence theorem.
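As a rough numerical illustration (our own sketch, not taken from [10]), conditions (a)-(c) can be spot-checked for the three example functions on a finite grid, with a single large argument standing in for the limit $\alpha \to +\infty$.

import numpy as np

candidates = {
    "a/(1+a)":         lambda a: a / (1.0 + a),
    "a^2/(1+a+a^2)":   lambda a: a**2 / (1.0 + a + a**2),
    "a^3/(1+a^2+a^3)": lambda a: a**3 / (1.0 + a**2 + a**3),
}
grid = np.linspace(0.0, 50.0, 5001)                     # finite grid on [0, 50]
big = 1.0e8                                             # stand-in for alpha -> +infinity
for name, phi in candidates.items():
    vals = phi(grid)
    assert abs(phi(0.0)) < 1e-12                        # (b): phi(0) = 0
    assert np.all(np.diff(vals) > 0)                    # (a): strictly increasing on the grid
    assert abs(phi(big) - 1.0) < 1e-6                   # (b): phi(alpha) -> 1
    h = grid * (1.0 - vals)                             # (c): alpha * (1 - phi(alpha))
    assert np.all(np.diff(h) > 0)                       # (c): strictly increasing on the grid
    assert abs(big * (1.0 - phi(big)) - 1.0) < 1e-6     # (c): the limit equals 1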

4. Numerical Results

In this section, we report some preliminary numerical experiments. The test problems and their initial values are drawn from [13] .

In the numerical experiments, we take the parameter $L_k = 100$ and stop the iteration when the inequality $\|g_k\| \le 10^{-5}$ is satisfied. The detailed numerical results

Table 1. Numerical results.

are reported in Table 1, in which NI, NF and NG denote the total number of iterations, the total number of function evaluations and the total number of gradient evaluations, respectively. From Table 1, we can see that the extended algorithm performs well numerically.
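For readers reproducing such experiments, the counters NF and NG are simply the numbers of calls made to the objective and gradient routines; a small wrapper of the following kind (a hypothetical helper of ours, not the authors' test harness) is enough to collect them, while NI is the loop count of the algorithm itself.

def with_counter(func):
    """Wrap a callable and count how many times it is evaluated (for NF / NG)."""
    def wrapped(x):
        wrapped.calls += 1
        return func(x)
    wrapped.calls = 0
    return wrapped

# Usage sketch: f = with_counter(objective); g = with_counter(gradient);
# after a run, f.calls gives NF and g.calls gives NG.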

5. Conclusion

In this paper, we extended the descent algorithm without line search of [10] to a more general case and established its global convergence. Compared with [10], the extended algorithm shows better numerical performance. In the future, we will continue to study descent algorithms without line search and try to develop new ones that not only converge globally but also have good numerical performance.

Acknowledgements

We gratefully acknowledge the scholarship fund of the Education Department of Guangxi Zhuang Autonomous Region, the Guangxi basic ability improvement project fund for young and middle-aged teachers of colleges and universities (2017KY0068, KY2016YB069), the Guangxi higher education undergraduate course teaching reform project fund (2017JGB147), the NNSF of China (11761014), and the Guangxi Natural Science Foundation (2017GXNSFAA198243).

Cite this paper

Chen, C.L., Luo, L.L., Han, C.H. and Chen, Y. (2018) Global Convergence of an Extended Descent Algorithm without Line Search for Unconstrained Optimization. Journal of Applied Mathematics and Physics, 6, 130-137. https://doi.org/10.4236/jamp.2018.61013

References

1. Nocedal, J. and Wright, S.J. (1999) Numerical Optimization. Springer-Verlag, New York. https://doi.org/10.1007/b98874

2. Dai, Y.H. and Yuan, Y.X. (2000) Nonlinear Conjugate Gradient Methods. Shanghai Science and Technology Press, Shanghai. (In Chinese)

3. Gilbert, J.C. and Nocedal, J. (1992) Global Convergence Properties of Conjugate Gradient Methods for Optimization. SIAM Journal on Optimization, 2, 21-42. https://doi.org/10.1137/0802003

4. Grippo, L. and Lucidi, S. (1997) A Globally Convergent Version of the Polak-Ribière Conjugate Gradient Method. Mathematical Programming, 78, 375-391. https://doi.org/10.1007/BF02614362

5. Sun, J. and Zhang, J.P. (2001) Global Convergence of Conjugate Gradient Methods without Line Search. Annals of Operations Research, 103, 161-173. https://doi.org/10.1023/A:1012903105391

6. Chen, X.D. and Sun, J. (2002) Global Convergence of a Two-Parameter Family of Conjugate Gradient Methods without Line Search. Journal of Computational and Applied Mathematics, 146, 37-45. https://doi.org/10.1016/S0377-0427(02)00416-8

7. Wang, J.Y. and Zhu, D.T. (2016) Conjugate Gradient Path Method without Line Search Technique for Derivative-Free Unconstrained Optimization. Numerical Algorithms, 73, 957-983. https://doi.org/10.1007/s11075-016-0124-9

8. Wang, J.Y. and Zhu, D.T. (2017) Derivative-Free Restrictively Preconditioned Conjugate Gradient Path Method without Line Search Technique for Solving Linear Equality Constrained Optimization. Computers and Mathematics with Applications, 73, 277-293. https://doi.org/10.1016/j.camwa.2016.11.025

9. Shi, Z.J. and Shen, J. (2005) Convergence of Descent Method without Line Search. Applied Mathematics and Computation, 167, 94-107. https://doi.org/10.1016/j.amc.2004.06.097

10. Zhou, G.M. (2009) A Descent Algorithm without Line Search for Unconstrained Optimization. Applied Mathematics and Computation, 215, 2528-2533. https://doi.org/10.1016/j.amc.2009.08.058

11. Zhou, G.M. and Feng, C.S. (2013) The Steepest Descent Algorithm without Line Search for p-Laplacian. Applied Mathematics and Computation, 224, 36-45. https://doi.org/10.1016/j.amc.2013.07.096

12. Shi, Z.J. and Shen, J. (2005) A New Descent Algorithm with Curve Search Rule. Applied Mathematics and Computation, 161, 753-768. https://doi.org/10.1016/j.amc.2003.12.058

13. Moré, J.J., Garbow, B.S. and Hillstrom, K.E. (1981) Testing Unconstrained Optimization Software. ACM Transactions on Mathematical Software, 7, 17-41. https://doi.org/10.1145/355934.355936