J. Software Engineering & Applications, 2010, 3: 503-509
doi:10.4236/jsea.2010.35057 Published Online May 2010 (http://www.SciRP.org/journal/jsea)

A Line Search Algorithm for Unconstrained Optimization*

Gonglin Yuan^1, Sha Lu^2, Zengxin Wei^1

^1College of Mathematics and Information Science, Guangxi University, Nanning, China; ^2School of Mathematical Science, Guangxi Teachers Education University, Nanning, China.
Email: glyuan@gxu.edu.cn

Received February 6th, 2010; revised March 30th, 2010; accepted March 31st, 2010.

*This work is supported by China NSF grant 10761001, the Scientific Research Foundation of Guangxi University (Grant No. X081082), and Guangxi SF grant 0991028.

ABSTRACT

It is well known that line search methods play a very important role in optimization. In this paper a new line search method is proposed for solving unconstrained optimization problems. Under weak conditions, the method possesses global convergence for nonconvex functions and R-linear convergence for convex functions. Moreover, the search direction satisfies a sufficient descent property and lies in a trust region without carrying out any line search rule. Numerical results show that the new method is effective.

Keywords: Line Search, Unconstrained Optimization, Global Convergence, R-Linear Convergence

1. Introduction

Consider the unconstrained optimization problem

$$\min_{x \in R^n} f(x), \qquad (1)$$

where $f: R^n \to R$ is continuously differentiable. A line search algorithm for (1) generates a sequence of iterates $\{x_k\}$ by letting

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \qquad (2)$$

where $x_k$ is the current iterate, $d_k$ is a search direction, and $\alpha_k > 0$ is a steplength. Different choices of $d_k$ and $\alpha_k$ determine different line search methods [1-3]. Each iteration consists of two stages: 1) choose a descent search direction $d_k$; 2) choose a step size $\alpha_k$ along $d_k$. Throughout this paper, we denote $f(x_k)$ by $f_k$, $\nabla f(x_k)$ by $g_k$, and $\nabla f(x_{k+1})$ by $g_{k+1}$; $\|\cdot\|$ denotes the Euclidean norm.

One simple line search method is the steepest descent method, obtained by taking $d_k = -g_k$ at every iteration; it is widely applied to large-scale minimization problems [4]. However, the steepest descent method often zigzags on practical problems, which makes it converge to an optimal solution very slowly, or even fail to converge [5,6], so it is not the fastest line search method. If $d_k = -H_k g_k$ is the search direction at each iteration, where $H_k$ is an $n \times n$ matrix approximating $[\nabla^2 f(x_k)]^{-1}$, then the corresponding line search method is called a Newton-like method [4-6], such as the Newton method, quasi-Newton methods, and variable metric methods; many papers [7-10] develop such methods for optimization problems. One drawback of Newton-like methods is that the matrix $H_k$ must be stored and computed at each iteration, which adds storage and computational cost. Accordingly, these methods are often unsuitable for large-scale optimization problems.

The conjugate gradient method is a powerful line search method for solving large-scale optimization problems because of its simplicity and its very low memory requirement. Its search direction often takes the form

$$d_k = \begin{cases} -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \\ -g_k, & \text{if } k = 0, \end{cases} \qquad (3)$$

where $\beta_k \in R$ is a scalar that determines the particular conjugate gradient method [11-13].
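As a rough illustration of the framework (2)-(3), the following Python sketch (a minimal sketch assuming NumPy arrays and caller-supplied rules; the function names are illustrative and not from the paper) iterates $x_{k+1} = x_k + \alpha_k d_k$ with a conjugate-gradient-type direction, leaving the choices of $\beta_k$ and $\alpha_k$ to the caller.

```python
import numpy as np

def line_search_cg(f, grad, x0, beta_rule, step_rule, eps=1e-5, max_iter=1000):
    """Generic line-search iteration (2) with a direction of the form (3).
    `beta_rule(g_new, g_old, d_old)` returns the scalar beta_k and
    `step_rule(f, grad, x, d)` returns the steplength alpha_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # k = 0: steepest-descent direction
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:          # stopping test ||g_k|| <= eps
            break
        alpha = step_rule(f, grad, x, d)      # choose the steplength alpha_k
        x_new = x + alpha * d                 # iteration (2)
        g_new = grad(x_new)
        d = -g_new + beta_rule(g_new, g, d) * d   # direction update (3), k >= 1
        x, g = x_new, g_new
    return x
```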
The convergence behavior of different conjugate gradient methods under various line search conditions [14] has been studied by many authors for many years (see [4,15]). At present, one of the most efficient formulas for $\beta_k$ from the computational point of view is the PRP formula

$$\beta_k^{PRP} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{\|g_k\|^2}. \qquad (4)$$

If $x_{k+1} \to x_k$, it is easy to see that $\beta_k^{PRP} \to 0$, which implies that the direction of the PRP method tends to the steepest descent direction, acting as an automatic restart when the next iterate is close to the current one. This property is very important for the efficiency of the PRP conjugate gradient method (see [4,15]). Concerning its convergence, Polak and Ribière [16] proved that the PRP method with the exact line search is globally convergent when the objective function is convex, while Powell [17] gave a counterexample showing that there exist nonconvex functions on which the PRP method does not converge globally even when the exact line search is used.

The sufficient descent condition

$$g_k^T d_k \le -c\|g_k\|^2, \quad \text{for all } k \ge 0 \text{ and some constant } c > 0, \qquad (5)$$

is very important for ensuring the global convergence of nonlinear conjugate gradient methods, and it may be crucial for conjugate gradient methods [14]. It has been shown that the PRP method with the strong Wolfe-Powell (SWP) line search rule, which finds a step size $\alpha_k$ satisfying

$$f(x_k + \alpha_k d_k) \le f_k + \delta_1 \alpha_k g_k^T d_k \qquad (6)$$

and

$$|g(x_k + \alpha_k d_k)^T d_k| \le -\delta_2\, g_k^T d_k, \qquad (7)$$

where $0 < \delta_1 < \frac{1}{2}$ and $\delta_1 < \delta_2 < 1$, does not ensure condition (5) at each iteration. Grippo and Lucidi [18] then presented a new line search rule which ensures the sufficient descent condition and established the convergence of the PRP method with their line search technique. Powell [17] suggested that $\beta_k$ should not be allowed to become negative. Based on this idea, Gilbert and Nocedal [14] proved that the modified PRP method with $\beta_k = \max\{0, \beta_k^{PRP}\}$ is globally convergent under the sufficient descent assumption and the weak Wolfe-Powell (WWP) line search technique: find a steplength $\alpha_k$ satisfying (6) and

$$g(x_k + \alpha_k d_k)^T d_k \ge \delta_2\, g_k^T d_k. \qquad (8)$$

Over the past few years, much effort has been devoted to finding new formulas for conjugate gradient methods which possess not only a global convergence property for general functions but also good numerical performance (see [14,15]). In recent years, further good results on nonlinear conjugate gradient methods have been given in [19-25].

These observations motivate us to propose a new method which possesses not only simplicity and low memory requirements but also desirable theoretical features. In this paper, we design a new line search method whose search direction possesses not only the sufficient descent property but also the property

$$\|d_k\| \le c_1\|g_k\|, \quad \text{for all } k \ge 0 \text{ and some constant } c_1 > 0, \qquad (9)$$

whatever line search rule is used; property (9) implies that the search direction $d_k$ automatically lies in a trust region.

This paper is organized as follows. In the next section, the algorithms and further line search rules are stated. The global convergence and the R-linear convergence of the new method are established in Section 3. Numerical results and a conclusion are presented in Sections 4 and 5, respectively.
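For concreteness, the scalar (4), its nonnegative PRP+ modification of [14], and the Wolfe-Powell tests (6)-(8) can be sketched in a few lines of Python. This is a minimal sketch with illustrative function names; the defaults $\delta_1 = 0.1$ and $\delta_2 = 0.9$ anticipate the parameter settings reported in Section 4.

```python
import numpy as np

def beta_prp(g_new, g_old):
    """PRP scalar (4): g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2."""
    return float(g_new @ (g_new - g_old)) / float(g_old @ g_old)

def beta_prp_plus(g_new, g_old):
    """PRP+ modification of Gilbert and Nocedal [14]: max{0, beta^PRP}."""
    return max(0.0, beta_prp(g_new, g_old))

def wolfe_powell_ok(f, grad, x, d, alpha, delta1=0.1, delta2=0.9, strong=False):
    """Check a trial step alpha against (6) and a curvature condition:
    (8)  g(x + alpha d)^T d >= delta2 * g^T d        (weak, WWP), or
    (7)  |g(x + alpha d)^T d| <= -delta2 * g^T d     (strong, SWP)."""
    gtd = float(grad(x) @ d)
    sufficient_decrease = f(x + alpha * d) <= f(x) + delta1 * alpha * gtd   # (6)
    new_gtd = float(grad(x + alpha * d) @ d)
    curvature = abs(new_gtd) <= -delta2 * gtd if strong else new_gtd >= delta2 * gtd
    return sufficient_decrease and curvature
```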
2. The Algorithms

Besides the inexact line search techniques WWP and SWP, there exist other line search rules which are often used to analyze the convergence of line search methods:

1) The exact minimization rule. The step size $\alpha_k$ is chosen such that

$$f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k + \alpha d_k). \qquad (10)$$

2) Goldstein rule. The step size $\alpha_k$ is chosen to satisfy (6) and

$$f(x_k + \alpha_k d_k) \ge f_k + \delta_2 \alpha_k g_k^T d_k. \qquad (11)$$

Now we give our algorithm as follows.

1) Algorithm 1 (New Algorithm)

Step 0: Choose an initial point $x_0 \in R^n$ and constants $0 < \epsilon < 1$, $0 < \delta_1 < \frac{1}{2}$, $\delta_1 < \delta_2 < 1$. Set $d_0 = -g_0 = -\nabla f(x_0)$ and $k := 0$.

Step 1: If $\|g_k\| \le \epsilon$, then stop; otherwise go to Step 2.

Step 2: Compute the steplength $\alpha_k$ by one of the line search techniques, and let $x_{k+1} = x_k + \alpha_k d_k$.

Step 3: If $\|g_{k+1}\| \le \epsilon$, then stop; otherwise go to Step 4.

Step 4: Calculate the search direction $d_{k+1}$ by (3), where $\beta_k$ is defined by (4).

Step 5: Let

$$d_{k+1}^{new} = \bar{d}_{k+1} + \min\left\{0,\; -\frac{g_{k+1}^T \bar{d}_{k+1}}{\|g_{k+1}\|^2}\right\} g_{k+1} - g_{k+1},$$

where

$$\bar{d}_{k+1} = \frac{\|y_k^*\|\,\|g_{k+1}\|}{\|s_k\|\,\|d_{k+1}\|}\, d_{k+1}, \quad s_k = x_{k+1} - x_k, \quad \|y_k^*\| = \max\{\|s_k\|, \|y_k\|\}, \quad y_k = g_{k+1} - g_k.$$

Step 6: Let $d_{k+1} := d_{k+1}^{new}$, $k := k + 1$, and go to Step 2.

Remark. In Step 5 of Algorithm 1 we have

$$\frac{\|y_k^*\|}{\|s_k\|} = \frac{\max\{\|s_k\|, \|y_k\|\}}{\|s_k\|} \ge 1,$$

which can increase the convergence speed of the algorithm from the computational point of view.

Here we give the normal PRP conjugate gradient algorithm and one modified PRP conjugate gradient algorithm [14] as follows.

2) Algorithm 2 (PRP Algorithm)

Step 0: Choose an initial point $x_0 \in R^n$ and constants $0 < \epsilon < 1$, $0 < \delta_1 < \frac{1}{2}$, $\delta_1 < \delta_2 < 1$. Set $d_0 = -g_0 = -\nabla f(x_0)$ and $k := 0$.

Step 1: If $\|g_k\| \le \epsilon$, then stop; otherwise go to Step 2.

Step 2: Compute the steplength $\alpha_k$ by one of the line search techniques, and let $x_{k+1} = x_k + \alpha_k d_k$.

Step 3: If $\|g_{k+1}\| \le \epsilon$, then stop; otherwise go to Step 4.

Step 4: Calculate the search direction $d_{k+1}$ by (3), where $\beta_k$ is defined by (4).

Step 5: Let $k := k + 1$ and go to Step 2.

3) Algorithm 3 (PRP+ Algorithm, see [14])

Step 0: Choose an initial point $x_0 \in R^n$ and constants $0 < \epsilon < 1$, $0 < \delta_1 < \frac{1}{2}$, $\delta_1 < \delta_2 < 1$. Set $d_0 = -g_0 = -\nabla f(x_0)$ and $k := 0$.

Step 1: If $\|g_k\| \le \epsilon$, then stop; otherwise go to Step 2.

Step 2: Compute the steplength $\alpha_k$ by one of the line search techniques, and let $x_{k+1} = x_k + \alpha_k d_k$.

Step 3: If $\|g_{k+1}\| \le \epsilon$, then stop; otherwise go to Step 4.

Step 4: Calculate the search direction $d_{k+1}$ by (3), where $\beta_k = \max\{0, \beta_k^{PRP}\}$.

Step 5: Let $k := k + 1$ and go to Step 2.

We will concentrate on the convergence results of Algorithm 1 in the following section.
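Before turning to the analysis, the direction computation in Steps 4 and 5 of Algorithm 1, as written above, can be put into a short Python sketch. It is only a sketch of one reading of those steps: the helper name, the NumPy layout, and the eps safeguard against division by zero are assumptions, not part of the paper.

```python
import numpy as np

def algorithm1_direction(g_new, g_old, d_old, x_new, x_old, eps=1e-12):
    """Steps 4-5 of Algorithm 1: build the PRP direction (3)-(4), rescale it by
    ||y*_k|| ||g_{k+1}|| / (||s_k|| ||d_{k+1}||), then correct it with a multiple
    of g_{k+1} so that g_{k+1}^T d_{k+1}^{new} <= -||g_{k+1}||^2 (Lemma 3.1)."""
    beta = float(g_new @ (g_new - g_old)) / max(float(g_old @ g_old), eps)
    d = -g_new + beta * d_old                        # Step 4: PRP direction
    s = x_new - x_old                                # s_k = x_{k+1} - x_k
    y = g_new - g_old                                # y_k = g_{k+1} - g_k
    y_star = max(np.linalg.norm(s), np.linalg.norm(y))   # ||y*_k||
    scale = y_star * np.linalg.norm(g_new) / max(np.linalg.norm(s) * np.linalg.norm(d), eps)
    d_bar = scale * d                                # Step 5: rescaled direction
    ratio = float(g_new @ d_bar) / max(float(g_new @ g_new), eps)
    return d_bar + min(0.0, -ratio) * g_new - g_new  # corrected direction d_{k+1}^{new}
```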
3. Convergence Analysis

The following assumptions are often needed to analyze the convergence of line search methods (see [15,26]).

Assumption A (i): $f$ is bounded below on the level set $\Omega = \{x \in R^n : f(x) \le f(x_0)\}$.

Assumption A (ii): In some neighborhood $\Omega_0$ of $\Omega$, $f$ is differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L\|x - y\|, \quad \text{for all } x, y \in \Omega_0.$$

In the following, let $g_k \ne 0$ for all $k$, for otherwise a stationary point has been found.

Lemma 3.1 Consider Algorithm 1 and let Assumption A (ii) hold. Then (5) and (9) hold.

Proof. If $k = 0$, (5) and (9) hold obviously. For $k \ge 1$, by Assumption A (ii) and Step 5 of Algorithm 1, we have

$$\|d_{k+1}^{new}\| \le \|\bar{d}_{k+1}\| + \frac{\|y_k^*\|}{\|s_k\|}\|g_{k+1}\| + \|g_{k+1}\| \le (2\max\{1, L\} + 1)\|g_{k+1}\|,$$

since $\|\bar{d}_{k+1}\| = \frac{\|y_k^*\|}{\|s_k\|}\|g_{k+1}\|$ and $\frac{\|y_k^*\|}{\|s_k\|} = \frac{\max\{\|s_k\|, \|y_k\|\}}{\|s_k\|} \le \max\{1, L\}$.

Now we consider the product $g_{k+1}^T d_{k+1}^{new}$ in the following two cases.

Case 1. If $g_{k+1}^T \bar{d}_{k+1} \le 0$, the minimum term in Step 5 vanishes, and we get

$$g_{k+1}^T d_{k+1}^{new} = g_{k+1}^T \bar{d}_{k+1} - \|g_{k+1}\|^2 \le -\|g_{k+1}\|^2.$$

Case 2. If $g_{k+1}^T \bar{d}_{k+1} > 0$, we obtain

$$g_{k+1}^T d_{k+1}^{new} = g_{k+1}^T \bar{d}_{k+1} - \frac{g_{k+1}^T \bar{d}_{k+1}}{\|g_{k+1}\|^2}\|g_{k+1}\|^2 - \|g_{k+1}\|^2 = -\|g_{k+1}\|^2.$$

Let $c \in (0, 1)$ and $c_1 = 2\max\{1, L\} + 1$; by Step 6 of Algorithm 1, (5) and (9) hold, respectively. The proof is completed.

The above lemma shows that the search direction $d_k$ satisfies the sufficient descent condition (5) and the condition (9) without any line search rule. Based on Lemma 3.1 and Assumptions A (i) and (ii), we now give the global convergence theorem of Algorithm 1.

Theorem 3.1 Let $\{\alpha_k, d_k, x_{k+1}, g_{k+1}\}$ be generated by Algorithm 1 with the exact minimization rule, the Goldstein line search rule, the SWP line search rule, or the WWP line search rule, and let Assumptions A (i) and (ii) hold. Then

$$\lim_{k \to \infty} \|g_k\| = 0 \qquad (12)$$

holds.

Proof. We prove the result for the exact minimization rule, the Goldstein rule, the SWP rule, and the WWP rule in turn.

1) The exact minimization rule. Let the step size $\alpha_k$ be the solution of (10). By the mean value theorem, $g_k^T d_k < 0$, and Assumption A (ii), for any

$$\alpha \in \left[\frac{-g_k^T d_k}{5L\|d_k\|^2},\; \frac{-2\, g_k^T d_k}{5L\|d_k\|^2}\right]$$

we have

$$\begin{aligned} f(x_k + \alpha d_k) - f(x_k) &= \int_0^1 g(x_k + t\alpha d_k)^T \alpha d_k \, dt = \alpha g_k^T d_k + \int_0^1 [g(x_k + t\alpha d_k) - g_k]^T \alpha d_k \, dt \\ &\le \alpha g_k^T d_k + \frac{1}{2} L \alpha^2 \|d_k\|^2 \le -\frac{1}{5}\frac{(g_k^T d_k)^2}{L\|d_k\|^2} + \frac{1}{2}\cdot\frac{4}{25}\frac{(g_k^T d_k)^2}{L\|d_k\|^2} = -\frac{3}{25}\frac{(g_k^T d_k)^2}{L\|d_k\|^2}, \end{aligned} \qquad (13)$$

and since $\alpha_k$ minimizes $f$ along $d_k$, the difference $f_{k+1} - f_k$ is bounded above by the same quantity. Together with Assumption A (i), this yields

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty, \qquad (14)$$

which implies that

$$\lim_{k \to \infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} = 0 \qquad (15)$$

holds. By Lemma 3.1, we get (12).

2) The Goldstein rule. Let the step size $\alpha_k$ be the solution of (6) and (11). By (11) and the mean value theorem, there exists $\theta_k \in (0, 1)$ such that

$$\alpha_k g(x_k + \theta_k \alpha_k d_k)^T d_k = f(x_k + \alpha_k d_k) - f_k \ge \delta_2 \alpha_k g_k^T d_k,$$

thus $g(x_k + \theta_k \alpha_k d_k)^T d_k \ge \delta_2 g_k^T d_k$. Using Assumption A (ii) again, we get

$$(1 - \delta_2)(-g_k^T d_k) \le [g(x_k + \theta_k \alpha_k d_k) - g_k]^T d_k \le L \alpha_k \|d_k\|^2,$$

which, combined with (6) and Assumption A (i), gives (14) and (15), respectively. By Lemma 3.1, (12) holds.

3) The strong Wolfe-Powell rule. Let the step size $\alpha_k$ be the solution of (6) and (7). By (7), we have

$$g(x_k + \alpha_k d_k)^T d_k \ge -|g(x_k + \alpha_k d_k)^T d_k| \ge \delta_2 g_k^T d_k.$$

Similar to the proof of the above case, we obtain (12) immediately.

4) The weak Wolfe-Powell rule. Let the step size $\alpha_k$ be the solution of (6) and (8). Similar to the proof of case 3), we also get (12). This completes the proof.

By Lemma 3.1, there exists a constant $\tau_0 > 0$ such that

$$-\frac{g_k^T d_k}{\|g_k\|\,\|d_k\|} \ge \tau_0, \quad \text{for all } k. \qquad (16)$$

From the proof process above, we can deduce that there exists a positive constant $\tau_1$ satisfying

$$f_k - f_{k+1} \ge \tau_1 \frac{(g_k^T d_k)^2}{\|d_k\|^2}, \quad \text{for all } k. \qquad (17)$$

Similar to the proof of Theorem 4.1 in [27], it is not difficult to prove the linear convergence rate of Algorithm 1. We state the theorem as follows but omit the proof.

Theorem 3.2 (see [27]) Suppose that (16) and (17) hold, and that the function $f$ is twice continuously differentiable and uniformly convex on $R^n$. Let $\{\alpha_k, d_k, x_{k+1}, g_{k+1}\}$ be generated by Algorithm 1 with the exact minimization rule, the Goldstein line search rule, the SWP line search rule, or the WWP line search rule. Then $\{x_k\}$ converges to $x^*$ at least linearly, where $x^*$ is the unique minimizer of $f(x)$.
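All four step-size rules appearing in Theorem 3.1 admit simple implementations. As one illustration, the Python sketch below searches for a step satisfying the Goldstein conditions (6) and (11) by bisection; it is a minimal sketch, and the initial step, the expansion factor, and the forty-trial cap (echoing the safeguard used in the experiments of Section 4) are assumptions rather than prescriptions from the paper.

```python
import numpy as np

def goldstein_step(f, grad, x, d, delta1=0.1, delta2=0.9,
                   alpha0=1.0, max_trials=40):
    """Bisection-style search for a step satisfying (6) and (11)."""
    gtd = float(grad(x) @ d)                       # g_k^T d_k, assumed negative
    fx = f(x)
    lo, hi = 0.0, np.inf
    alpha = alpha0
    for _ in range(max_trials):
        fa = f(x + alpha * d)
        if fa > fx + delta1 * alpha * gtd:         # (6) fails: step too long
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif fa < fx + delta2 * alpha * gtd:       # (11) fails: step too short
            lo = alpha
            alpha = 2.0 * alpha if np.isinf(hi) else 0.5 * (lo + hi)
        else:                                      # both Goldstein conditions hold
            return alpha
    return alpha                                   # fall back to the last trial
```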
4. Numerical Results

In this section, we report some numerical experiments with Algorithm 1, Algorithm 2, and Algorithm 3. We test these algorithms on problems from [28], implemented in MATLAB with the given initial points. The parameters common to all methods were set identically: $\delta_1 = 0.1$, $\delta_2 = 0.9$, $\epsilon = 10^{-6}$. In the experiments, the following Himmelblau stopping rule is used: if $|f(x_k)| > e_1$, let

$$stop1 = \frac{|f(x_k) - f(x_{k+1})|}{|f(x_k)|};$$

otherwise, let $stop1 = |f(x_k) - f(x_{k+1})|$, where $e_1 = 10^{-5}$. The program stops if $\|g_k\| < \epsilon$ or $stop1 < e_2$ is satisfied, where $e_2 = 10^{-5}$. We also stop the program if the number of iterations exceeds one thousand. Since the line search cannot always ensure the descent condition $d_k^T g_k < 0$, an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail; to avoid this, the stepsize $\alpha_k$ is accepted once the number of line search trials exceeds forty. The detailed numerical results are listed on the web site http://210.36.18.9:8018/publication.asp?id=34402.

Dolan and Moré [29] gave a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there are $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$ they define $t_{p,s}$ as the computing time (the number of function evaluations or another measure) required to solve problem $p$ by solver $s$. Requiring a baseline for comparison, they compare the performance of solver $s$ on problem $p$ with the best performance by any solver on this problem, using the performance ratio

$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}.$$

A parameter $r_M \ge r_{p,s}$ for all $p$, $s$ is chosen, and $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$. The performance of solver $s$ on any single problem may be of interest, but an overall assessment of the solver is obtained by defining

$$\rho_s(t) = \frac{1}{n_p}\, \mathrm{size}\{p \in P : r_{p,s} \le t\},$$

so that $\rho_s(t)$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $t \in R$ of the best possible ratio. The function $\rho_s$ is the (cumulative) distribution function for the performance ratio. The performance profile $\rho_s: R \to [0, 1]$ for a solver is a nondecreasing, piecewise constant function, continuous from the right at each breakpoint, and the value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers. According to these rules, a solver whose performance profile plot lies on the top right wins over the rest of the solvers.

In Figures 1-3, NA denotes Algorithm 1, PRP denotes Algorithm 2, and PRP+ denotes Algorithm 3. Figures 1-3 show the performance of these methods measured by $NT = NF + m \cdot NG$, where NF and NG denote the number of function evaluations and gradient evaluations, respectively, and $m$ is an integer. According to the results on automatic differentiation [30], the value of $m$ can be set to $m = 5$; that is, one gradient evaluation is equivalent to $m$ function evaluations if automatic differentiation is used.

Figure 1. Performance profiles (NT) of the methods with the Goldstein rule.

Figure 2. Performance profiles (NT) of the methods with the strong Wolfe-Powell rule.

Figure 3. Performance profiles (NT) of the methods with the weak Wolfe-Powell rule.

From these three figures it is clear that the given method has the most wins, that is, the highest probability of being the optimal solver.
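Profile curves of the kind shown in Figures 1-3 can be computed from a table of NT counts with a few lines of Python. This is a minimal sketch: the array layout and the use of np.inf to mark failures (in place of the parameter $r_M$) are assumptions, and the NT values in the example are made up for illustration only.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles rho_s(t).

    T    : (n_p, n_s) array of costs t_{p,s}, e.g. NT = NF + 5 * NG;
           np.inf marks a failure of solver s on problem p.
    taus : 1-D array of factors t at which rho_s(t) is evaluated.
    Returns an array of shape (len(taus), n_s); column s is rho_s."""
    n_p, n_s = T.shape
    best = np.min(T, axis=1, keepdims=True)        # best cost on each problem
    ratios = T / best                              # performance ratios r_{p,s}
    rho = np.empty((len(taus), n_s))
    for i, t in enumerate(taus):
        # fraction of problems solved within a factor t of the best solver
        rho[i] = np.sum(ratios <= t, axis=0) / n_p
    return rho

# illustrative call with made-up NT counts for three solvers (NA, PRP, PRP+)
if __name__ == "__main__":
    T = np.array([[120.0, 150.0, 140.0],
                  [ 80.0, np.inf,  95.0],
                  [200.0, 210.0, 205.0]])
    print(performance_profile(T, np.linspace(1.0, 3.0, 5)))
```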
In summary, the presented numerical results reveal that the new method has potential advantages compared with the normal PRP method and the modified PRP method [14].

5. Conclusions

This paper gives a new line search method for unconstrained optimization. Global and R-linear convergence are established under weak assumptions on the search direction $d_k$. In particular, the direction $d_k$ satisfies the sufficient descent condition (5) and the condition (9) without carrying out any line search technique, whereas these two conditions are often obtained by assumption in other papers [14,27,30]. The comparison of the numerical results shows that the search direction of the new algorithm is a good search direction at every iteration.

REFERENCES

[1] G. Yuan and X. Lu, "A New Line Search Method with Trust Region for Unconstrained Optimization," Communications on Applied Nonlinear Analysis, Vol. 15, No. 1, 2008, pp. 35-49.

[2] G. Yuan, X. Lu and Z. Wei, "New Two-Point Stepsize Gradient Methods for Solving Unconstrained Optimization Problems," Natural Science Journal of Xiangtan University, Vol. 29, No. 1, 2007, pp. 13-15.

[3] G. Yuan and Z. Wei, "New Line Search Methods for Unconstrained Optimization," Journal of the Korean Statistical Society, Vol. 38, No. 1, 2009, pp. 29-39.

[4] Y. Yuan and W. Sun, "Theory and Methods of Optimization," Science Press of China, Beijing, 1999.

[5] D. G. Luenberger, "Linear and Nonlinear Programming," 2nd Edition, Addison-Wesley, Reading, MA, 1989.

[6] J. Nocedal and S. J. Wright, "Numerical Optimization," Springer, Berlin, Heidelberg, New York, 1999.

[7] Z. Wei, G. Li and L. Qi, "New Quasi-Newton Methods for Unconstrained Optimization Problems," Applied Mathematics and Computation, Vol. 175, No. 1, 2006, pp. 1156-1188.

[8] Z. Wei, G. Yu, G. Yuan and Z. Lian, "The Superlinear Convergence of a Modified BFGS-Type Method for Unconstrained Optimization," Computational Optimization and Applications, Vol. 29, No. 3, 2004, pp. 315-332.

[9] G. Yuan and Z. Wei, "The Superlinear Convergence Analysis of a Nonmonotone BFGS Algorithm on Convex Objective Functions," Acta Mathematica Sinica, English Series, Vol. 24, No. 1, 2008, pp. 35-42.

[10] G. Yuan and Z. Wei, "Convergence Analysis of a Modified BFGS Method on Convex Minimizations," Computational Optimization and Applications, 2008.

[11] Y. Dai and Y. Yuan, "A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property," SIAM Journal on Optimization, Vol. 10, No. 1, 2000, pp. 177-182.

[12] Z. Wei, G. Li and L. Qi, "New Nonlinear Conjugate Gradient Formulas for Large-Scale Unconstrained Optimization Problems," Applied Mathematics and Computation, Vol. 179, No. 2, 2006, pp. 407-430.

[13] G. Yuan and X. Lu, "A Modified PRP Conjugate Gradient Method," Annals of Operations Research, Vol. 166, No. 1, 2009, pp. 73-90.

[14] J. C. Gilbert and J. Nocedal, "Global Convergence Properties of Conjugate Gradient Methods for Optimization," SIAM Journal on Optimization, Vol. 2, No. 1, 1992, pp. 21-42.

[15] Y. Dai and Y. Yuan, "Nonlinear Conjugate Gradient Methods," Shanghai Science and Technology Press, Shanghai, 2000.

[16] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, 3e Année, Vol. 16, 1969, pp. 35-43.

[17] M. J. D. Powell, "Nonconvex Minimization Calculations and the Conjugate Gradient Method," Lecture Notes in Mathematics, Vol. 1066, Springer-Verlag, Berlin, 1984, pp. 122-141.

[18] L. Grippo and S. Lucidi, "A Globally Convergent Version of the Polak-Ribière Conjugate Gradient Method," Mathematical Programming, Vol. 78, No. 3, 1997, pp. 375-391.
[19] W. W. Hager and H. Zhang, "A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search," SIAM Journal on Optimization, Vol. 16, No. 1, 2005, pp. 170-192.

[20] Z. Wei, S. Yao and L. Liu, "The Convergence Properties of Some New Conjugate Gradient Methods," Applied Mathematics and Computation, Vol. 183, No. 2, 2006, pp. 1341-1350.

[21] G. H. Yu, "Nonlinear Self-Scaling Conjugate Gradient Methods for Large-Scale Optimization Problems," Doctoral Dissertation, Sun Yat-Sen University, 2007.

[22] G. Yuan, "Modified Nonlinear Conjugate Gradient Methods with Sufficient Descent Property for Large-Scale Optimization Problems," Optimization Letters, Vol. 3, No. 1, 2009, pp. 11-21.

[23] G. Yuan, "A Conjugate Gradient Method for Unconstrained Optimization Problems," International Journal of Mathematics and Mathematical Sciences, Vol. 2009, 2009, pp. 1-14.

[24] G. Yuan, X. Lu and Z. Wei, "A Conjugate Gradient Method with Descent Direction for Unconstrained Optimization," Journal of Computational and Applied Mathematics, Vol. 233, No. 2, 2009, pp. 519-530.

[25] L. Zhang, W. Zhou and D. Li, "A Descent Modified Polak-Ribière-Polyak Conjugate Gradient Method and its Global Convergence," IMA Journal of Numerical Analysis, Vol. 26, No. 4, 2006, pp. 629-649.

[26] Y. Liu and C. Storey, "Efficient Generalized Conjugate Gradient Algorithms, Part 1: Theory," Journal of Optimization Theory and Applications, Vol. 69, No. 1, 1991, pp. 129-137.

[27] Z. J. Shi, "Convergence of Line Search Methods for Unconstrained Optimization," Applied Mathematics and Computation, Vol. 157, No. 2, 2004, pp. 393-405.

[28] J. J. Moré, B. S. Garbow and K. E. Hillstrom, "Testing Unconstrained Optimization Software," ACM Transactions on Mathematical Software, Vol. 7, No. 1, 1981, pp. 17-41.

[29] E. D. Dolan and J. J. Moré, "Benchmarking Optimization Software with Performance Profiles," Mathematical Programming, Vol. 91, No. 2, 2002, pp. 201-213.

[30] Y. Dai and Q. Ni, "Testing Different Conjugate Gradient Methods for Large-Scale Unconstrained Optimization," Journal of Computational Mathematics, Vol. 21, No. 3, 2003, pp. 311-320.