American Journal of Operations Research, 2011, 1, 73-83
doi:10.4236/ajor.2011.13011 Published Online September 2011 (http://www.SciRP.org/journal/ajor)
Copyright © 2011 SciRes. AJOR
Higher Order Iteration Schemes for Unconstrained
Optimization
Yangyang Shi1, Pingqi Pan2
1Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology,
Delft, The Netherlands
2Department of Mathematics, Southeast University, Nanjing, China
E-mail: shiyang1983@gmail.com, panpq@seu.edu.cn
Received August 3, 2011; revised August 20, 2011; accepted September 19, 2011
Abstract
Using a predictor-corrector tactic, this paper derives new iteration schemes for unconstrained optimization. It
yields a point (predictor) by some line search from the current point; then with the two points it constructs a
quadratic interpolation curve to approximate some ODE trajectory; it finally determines a new point (correc-
tor) by searching along the quadratic curve. In particular, this paper gives a global convergence analysis for
schemes associated with the quasi-Newton updates. In our computational experiments, the new schemes us-
ing DFP and BFGS updates outperformed their conventional counterparts on a set of standard test problems.
Keywords: Unconstrained Optimization, Iteration Scheme, ODE Method, Quasi-Newton Update,
Convergence Analysis
1. Introduction
Consider the unconstrained optimization problem

$$\min f(x), \quad x \in R^n, \qquad (1)$$

where $f: R^n \to R$ is twice continuously differentiable. Let $x_k$ be the $k$-th iteration point. We will denote the values of $f(x)$ and its gradient $\nabla f(x)$ at $x_k$ by $f_k$ and $\nabla f_k$, respectively.
Optimization problems are usually solved by iteration methods. The line search widely used in unconstrained optimization is a kind of iteration scheme for updating iterates. Such a scheme, by which one obtains the next iterate $x_{k+1}$ from the current iterate $x_k$, is of the following form:

$$x_{k+1} = x_k + \alpha_k p_k, \qquad (2)$$

where $p_k$ and $\alpha_k$ are termed the search direction and stepsize, respectively. $p_k$ is usually determined as a descent direction with respect to the objective $f(x)$, and $\alpha_k$ by an exact or inexact line search, so that the objective value decreases after the iteration.
For instance, the famous Newton method uses the scheme with search direction

$$p_k = -\left(\nabla^2 f_k\right)^{-1} \nabla f_k,$$

where $\nabla^2 f_k$ is the Hessian matrix of $f(x)$ at $x_k$, and stepsize $\alpha_k = 1$.
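For illustration only (this example is not in the original paper), a minimal NumPy sketch of one Newton iteration with unit stepsize, applied to an assumed quadratic test function:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton iteration x+ = x - (hess(x))^{-1} grad(x) with unit stepsize,
    as described above; illustrative sketch only."""
    p = -np.linalg.solve(hess(x), grad(x))   # search direction p_k
    return x + 1.0 * p                        # stepsize alpha_k = 1

# Toy usage on f(x) = 0.5 x^T A x - b^T x (assumed example, not from the paper)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
x = newton_step(grad, hess, np.zeros(2))      # reaches the minimizer in one step
```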
The quasi-Newton methods are reliable and efficient for solving unconstrained optimization problems. By avoiding explicit calculation of the second order derivatives and the solution of a system of linear equations, quasi-Newton methods have achieved a great degree of popularity since the first paper of Davidon [1,2]. He used

$$p_k = -H_k \nabla f_k,$$

where $H_k$ is some approximation to the inverse Hessian matrix $\left(\nabla^2 f_k\right)^{-1}$. The next approximate inverse Hessian matrix $H_{k+1}$ is obtained by updating $H_k$ with a rank-one or rank-two matrix. To this end, all quasi-Newton updates require $H_{k+1}$ to satisfy the so-called quasi-Newton equation:

$$H_{k+1} y_k = s_k, \qquad (3)$$

where $y_k = \nabla f_{k+1} - \nabla f_k$ and $s_k = x_{k+1} - x_k$.
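Neither the DFP nor the BFGS formula is written out at this point (the BFGS inverse update appears later as (14)). As a reference sketch only, here is the standard textbook DFP inverse-Hessian update in NumPy, together with a numerical check of the quasi-Newton equation (3); the function name and test data are illustrative assumptions, not the authors' code:

```python
import numpy as np

def dfp_update(H, s, y):
    """Standard DFP update of an inverse-Hessian approximation H,
    using s = x_{k+1} - x_k and y = grad_{k+1} - grad_k."""
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

# Quick check that the updated matrix satisfies H_{k+1} y_k = s_k, equation (3)
rng = np.random.default_rng(0)
H = np.eye(3)
s, y = rng.standard_normal(3), rng.standard_normal(3)
if s @ y > 0:                       # curvature condition
    H_new = dfp_update(H, s, y)
    assert np.allclose(H_new @ y, s)
```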
Various quasi-Newton updates have been proposed in the past. The important modification of Davidon's work by Fletcher and Powell [3] (the DFP algorithm) was the first and most successful one. It was later surpassed by the BFGS update (now accepted as the best quasi-Newton method) [4-8], proposed independently by Broyden, Fletcher, Goldfarb and Shanno. These updates theoretically guarantee all $H_k$ to be positive definite; therefore, the associated $p_k$ is a descent direction, and the objective decreases if $\alpha_k$ is determined by some line search.
There are other iteration schemes that look different from the conventional ones. The so-called ODE methods use the following initial value problem:

$$\frac{\mathrm{d}x}{\mathrm{d}t} = p(x), \quad x(0) = x_0. \qquad (4)$$

Assume that $p(x)$ satisfies certain conditions, so that the preceding problem defines a trajectory. Arrow, Hurwicz and Uzawa [9] used $p(x) = -\nabla f(x)$ and $p(x) = -\left(\nabla^2 f(x)\right)^{-1} \nabla f(x)$. The associated trajectories might be called the steepest descent curve and the Newton curve, respectively [10]. In this way, in fact, one could obtain many curves corresponding to existing unconstrained optimization methods.
Pan [11-13] generalized the steepest descent curve and the Newton curve by setting $p(x) = -\lambda(x) A(x) \nabla f(x)$, where $\lambda(x)$ is called the ratio factor and $A(x)$ the direction matrix. He suggested some concrete ratio factors and direction matrices, and showed that, under certain conditions, the objective value decreases strictly along the associated trajectory, whose limit point is an optimum.
ODE methods treat the optimization problem from the viewpoint of trajectories. They use numerical methods to approximately follow the associated trajectory, and finally approach its limit point. When Euler's method is applied in the ODE method, the standard iteration schemes are obtained (in fact, the standard iteration schemes were originally derived by decreasing the objective function value rather than by following a trajectory). Euler's method is only of first order accuracy, so it is possible to apply a higher order approach to mimic the trajectory and obtain a higher order iteration scheme than the standard one.
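To make the connection concrete (an illustration added here, not taken from the paper), one forward Euler step of (4) with the assumed choice $p(x) = -\nabla f(x)$ is exactly a steepest descent iteration:

```python
import numpy as np

def euler_step(p, x, h):
    """One forward Euler step for dx/dt = p(x): x_{k+1} = x_k + h * p(x_k).
    With p(x) = -grad f(x) this is steepest descent with stepsize h."""
    return x + h * p(x)

# Assumed toy objective f(x) = ||x||^2 / 2, so p(x) = -x (steepest descent field)
p = lambda x: -x
x = np.array([1.0, -2.0])
for _ in range(50):
    x = euler_step(p, x, h=0.1)   # follows the steepest descent trajectory toward 0
```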
In this paper, we derive new iteration schemes along
this line. In view of the importance of DFP and BFGS
methods, we will focus on iteration schemes with respect
to these methods.
The paper is organized as follows. Section 2 derives
new iteration schemes. Section 3 offers the convergence
analysis. Section 4 reports encouraging computational
results with a set of problems.
2. Higher Order Iteration Scheme
Assume that $x_k$ is the current iterate. The next iterate $x_{k+1}$ will be determined by approximately following the trajectory defined by (4). Let $\bar{x}_{k+1}$ be a predictor. Introduce the notation

$$p_k = p\left(x_k\right), \quad \bar{p}_{k+1} = p\left(\bar{x}_{k+1}\right).$$
We construct a quadratic interpolation curve, locally approximating the trajectory, as follows:

$$x_k(t) = a_k t^2 + b_k t + c_k, \qquad (5)$$

where $a_k, b_k, c_k$ satisfy the following conditions:

$$a_k t_k^2 + b_k t_k + c_k = x_k, \qquad (6a)$$

$$2 a_k t_k + b_k = p_k, \qquad (6b)$$

$$a_k \bar{t}_{k+1}^2 + b_k \bar{t}_{k+1} + c_k = \bar{x}_{k+1}, \qquad (6c)$$

$$2 a_k \bar{t}_{k+1} + b_k = \bar{p}_{k+1}. \qquad (6d)$$

Set $\bar{t}_{k+1} = 0$, so that $x_k(0) = \bar{x}_{k+1}$ and $b_k = \bar{p}_{k+1}$. From (6a)-(6d) it is easy to derive that $c_k = \bar{x}_{k+1}$, and

$$\frac{p_k + \bar{p}_{k+1}}{2}\, t_k = x_k - \bar{x}_{k+1}, \qquad (7a)$$

$$a_k t_k = \frac{p_k - \bar{p}_{k+1}}{2}. \qquad (7b)$$

Pre-multiplying both sides of (7a) by $\left(x_k - \bar{x}_{k+1}\right)^T$, we obtain an approximate $t_k$ and, furthermore, an approximate solution of (6a)-(6d):

$$a_k = \frac{\left(p_k - \bar{p}_{k+1}\right)\left(x_k - \bar{x}_{k+1}\right)^T \left(p_k + \bar{p}_{k+1}\right)}{4 \left\| x_k - \bar{x}_{k+1} \right\|^2}, \qquad (8)$$

$$b_k = \bar{p}_{k+1}, \quad c_k = \bar{x}_{k+1}, \qquad (9)$$

where $\left\| x_k - \bar{x}_{k+1} \right\|$ denotes the 2-norm of the vector $x_k - \bar{x}_{k+1}$.
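A minimal NumPy sketch of formulas (8) and (9) (illustrative only; the helper names are assumptions, not the authors' code):

```python
import numpy as np

def quadratic_curve_coeffs(x_k, x_bar, p_k, p_bar):
    """Coefficients a_k, b_k, c_k of the interpolation curve (5),
    computed from (8) and (9): the curve passes through the predictor
    x_bar at t = 0 with tangent p_bar, and (approximately) through x_k."""
    d = x_k - x_bar                                            # x_k - x_bar_{k+1}
    a = (p_k - p_bar) * (d @ (p_k + p_bar)) / (4.0 * (d @ d))  # formula (8)
    b, c = p_bar, x_bar                                        # formula (9)
    return a, b, c

def curve(a, b, c, t):
    """Evaluate x_k(t) = a t^2 + b t + c."""
    return a * t**2 + b * t + c
```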
The unconstrained optimization problem (1) can then be solved approximately by solving the following one-dimensional minimization problem:

$$\min f\left(a_k t^2 + b_k t + c_k\right), \quad t \geq 0. \qquad (10)$$

To solve such a problem, we apply an inexact line search rule. Furthermore, we modify the sufficient descent condition

$$f\left(x_k + t p_k\right) \leq f\left(x_k\right) + \rho t\, p_k^T \nabla f_k \qquad (11)$$

in this way:

$$f\left(a_k t^2 + b_k t + c_k\right) \leq f\left(c_k\right) + \rho t\, b_k^T \nabla f\left(c_k\right), \qquad (12)$$

where $\rho \in (0, 1)$.
2.1. Modified Inexact Line Search Algorithm
The conventional backtracking inexact line search [14] operates in this way. At the beginning we set $t = \hat{t}$. The algorithm stops if $t$ satisfies the sufficient descent condition; otherwise, it continues with $t \leftarrow \beta t$, where $\beta \in (0, 1)$.

A modified backtracking inexact line search algorithm is obtained by using expression (12) as the sufficient descent condition in the backtracking line search algorithm.

Subalgorithm 2.1 (modified backtracking inexact line search)

Step 0. Given $\rho, \beta \in (0, 1)$ and $\hat{t} > 0$; set $t = \hat{t}$.

Step 1. Set $a_k = 0$ if $a_k^T \nabla f\left(c_k\right) > -c_1\, b_k^T \nabla f\left(c_k\right)$, where the constant $c_1$ satisfies $c_1 \hat{t} \leq 1$.

Step 2. If

$$f\left(a_k t^2 + b_k t + c_k\right) \leq f\left(c_k\right) + \rho t\, b_k^T \nabla f\left(c_k\right), \qquad (13)$$

go to Step 4; otherwise, go to Step 3.

Step 3. Set $t \leftarrow \beta t$ and go to Step 2.

Step 4. Terminate with $t_k = t$.
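The following Python sketch of Subalgorithm 2.1 is illustrative and relies on the reconstruction above; the function signature and parameter defaults are assumptions:

```python
import numpy as np

def modified_backtracking(f, grad, a, b, c, t_hat=1.0, rho=1e-4, beta=0.5, c1=1.0):
    """Backtracking search along the quadratic curve x(t) = a t^2 + b t + c.
    Step 1: drop the quadratic term if it may spoil descent near t = 0.
    Steps 2-4: shrink t until the modified sufficient descent condition (13) holds."""
    g_c = grad(c)                                  # gradient at c_k = predictor
    if a @ g_c > -c1 * (b @ g_c):                  # Step 1 safeguard
        a = np.zeros_like(a)
    f_c = f(c)
    t = t_hat
    while f(a * t**2 + b * t + c) > f_c + rho * t * (b @ g_c):   # condition (13)
        t *= beta                                  # Step 3: backtrack
    return t, a                                    # stepsize and (possibly zeroed) a_k
```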
2.2. Higher Order Iteration Schemes
The higher order iteration scheme first obtains the predictor $\bar{x}_{k+1}$ from the current point $x_k$ by an inexact line search along the direction $p_k$. It then constructs the quadratic interpolation curve from the information at $x_k$ and $\bar{x}_{k+1}$, and computes $x_{k+1}$ satisfying the modified inexact line search rule (12). The overall steps of the higher order iteration scheme are organized as follows.

Algorithm 2.1 (higher order iteration scheme)

Step 0. Given an initial point $x_0$, $\varepsilon > 0$, and $\rho, \beta \in (0, 1)$; set $k := 0$.

Step 1. If $\left\| \nabla f_k \right\| < \varepsilon$, stop.

Step 2. Compute the predictor. Call the backtracking inexact line search algorithm to obtain $\bar{t}_k$, and compute $\bar{x}_{k+1} = x_k + \bar{t}_k p_k$.

Step 3. If $\left\| \nabla f\left(\bar{x}_{k+1}\right) \right\| < \varepsilon$, stop.

Step 4. Compute a new iteration point. Compute $a_k, b_k, c_k$ by (8) and (9). Call Subalgorithm 2.1 to get $t_{k+1}$, and compute $x_{k+1} = a_k t_{k+1}^2 + b_k t_{k+1} + c_k$.

Step 5. Set $k := k + 1$ and go to Step 1.
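A compact illustrative sketch of Algorithm 2.1 in Python, reusing the hypothetical helpers `quadratic_curve_coeffs` and `modified_backtracking` sketched above, plus an assumed standard backtracking routine for condition (11); it is not the authors' implementation:

```python
import numpy as np

def backtracking(f, g, x, p, t_hat=1.0, rho=1e-4, beta=0.5):
    """Standard backtracking line search for condition (11); assumed helper."""
    t = t_hat
    while f(x + t * p) > f(x) + rho * t * (p @ g):
        t *= beta
    return t

def higher_order_scheme(f, grad, p_fun, x0, eps=1e-6, max_iter=1000):
    """Algorithm 2.1: predictor by a line search along p_k = p_fun(x),
    corrector by a search along the quadratic interpolation curve (5)."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:                      # Step 1
            break
        p = p_fun(x)
        t_bar = backtracking(f, g, x, p)                 # Step 2: predictor stepsize
        x_bar = x + t_bar * p
        if np.linalg.norm(grad(x_bar)) < eps:            # Step 3
            x = x_bar
            break
        a, b, c = quadratic_curve_coeffs(x, x_bar, p, p_fun(x_bar))   # Step 4, (8)-(9)
        t_new, a = modified_backtracking(f, grad, a, b, c)
        x = a * t_new**2 + b * t_new + c                 # corrector
    return x
```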
2.3. An Extension of the Higher Order
Iteration Scheme
The higher order iteration scheme varies with different $p(x)$. In this paper we extend the higher order iteration scheme to the BFGS method, setting $p_k = -H_k \nabla f_k$. We get the predictor $\bar{x}_{k+1}$ from $x_k$, satisfying the backtracking inexact line search rule, in the direction $p_k$. From $x_k$ and $\bar{x}_{k+1}$, we compute $\bar{p}_{k+1} = -\bar{H}_{k+1} \nabla \bar{f}_{k+1}$. Then, by searching along the curve, we obtain the new point $x_{k+1}$. The overall steps of this variant of the iteration scheme are organized as follows.
Algorithm 2.2 (higher order iteration scheme using the BFGS update)

Step 0. Given an initial point $x_0$; set $k := 0$, $\rho, \beta \in (0, 1)$, the restart period $N$, $H_0 = I$, and tolerances $\varepsilon > 0$, $\delta > 0$.

Step 1. Compute the predictor. Call the backtracking inexact line search algorithm to obtain $\bar{t}_k$; compute $\bar{x}_{k+1} = x_k - \bar{t}_k H_k \nabla f_k$ and $\nabla \bar{f}_{k+1} = \nabla f\left(\bar{x}_{k+1}\right)$.

Step 2. If $\left\| \nabla \bar{f}_{k+1} \right\| < \varepsilon$, stop.

Step 3. Compute $\bar{H}_{k+1}$. If $\bar{y}_k^T \bar{s}_k \leq \delta$, then $\bar{H}_{k+1} = I$; otherwise

$$\bar{H}_{k+1} = H_k + \left(1 + \frac{\bar{y}_k^T H_k \bar{y}_k}{\bar{s}_k^T \bar{y}_k}\right) \frac{\bar{s}_k \bar{s}_k^T}{\bar{s}_k^T \bar{y}_k} - \frac{\bar{s}_k \bar{y}_k^T H_k + H_k \bar{y}_k \bar{s}_k^T}{\bar{s}_k^T \bar{y}_k}, \qquad (14)$$

where $\bar{y}_k = \nabla \bar{f}_{k+1} - \nabla f_k$ and $\bar{s}_k = -\bar{t}_k H_k \nabla f_k$.

Step 4. Compute the new iteration point. Compute $a_k, b_k, c_k$ by (8) and (9); call Subalgorithm 2.1 to get $t_{k+1}$; compute $x_{k+1} = a_k t_{k+1}^2 + b_k t_{k+1} + c_k$.

Step 5. If $\left\| \nabla f_{k+1} \right\| < \varepsilon$, stop.

Step 6. Update $H_{k+1}$. If $(k + 1) \bmod N = 0$ or $y_k^T s_k \leq \delta$, then $H_{k+1} = I$; otherwise

$$H_{k+1} = H_k + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right) \frac{s_k s_k^T}{s_k^T y_k} - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k}, \qquad (15)$$

where $y_k = \nabla f_{k+1} - \nabla f_k$ and $s_k = x_{k+1} - x_k$. If

$$\frac{\nabla f_{k+1}^T H_{k+1} \nabla f_{k+1}}{\left\| \nabla f_{k+1} \right\| \left\| H_{k+1} \nabla f_{k+1} \right\|} < \delta \quad \text{or} \quad \frac{\left\| H_{k+1} \nabla f_{k+1} \right\|}{\left\| \nabla f_{k+1} \right\|} < \delta,$$

then set $H_{k+1} = I$.

Step 7. Set $k := k + 1$ and go to Step 1.
Note:

1) In this paper, $I$ denotes the identity matrix.

2) In Step 3, we adopt a strategy to ensure that the curvature condition $\bar{y}_k^T \bar{s}_k > \delta$ holds, so that $\bar{H}_{k+1}$ is positive definite and $\bar{p}_{k+1}$ is a descent direction.

3) In Step 4, we call Subalgorithm 2.1, in which we set the parameter $c_1 = 1/\hat{t}$.

4) In Step 6, we adopt a restart technique: if $k + 1$ is an integral multiple of $N$ or $y_k^T s_k \leq \delta$, we restart with $H_{k+1} = I$. The BFGS method is a kind of conjugate direction method, so the restart technique can reduce the accumulation of roundoff errors.

5) We only report the variant using the BFGS update. A variant of the higher order iteration scheme using the DFP updating formula instead of the BFGS updating formula in Step 3 and Step 6 is derived in the same way.
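An illustrative NumPy sketch of the update and restart logic of Steps 3 and 6 (the threshold name `delta` and the helper names follow the reconstruction above and are assumptions):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update (14)/(15) of the inverse-Hessian approximation H."""
    sy = s @ y
    Hy = H @ y
    return (H
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy
            - (np.outer(s, Hy) + np.outer(Hy, s)) / sy)

def safeguarded_update(H, s, y, g_new, k, N=15, delta=1e-12):
    """Step 6: restart with H = I every N iterations, on loss of curvature,
    or when the new direction -H g would be a poor descent direction."""
    n = len(s)
    if (k + 1) % N == 0 or s @ y <= delta:
        return np.eye(n)
    H_new = bfgs_inverse_update(H, s, y)
    Hg = H_new @ g_new
    if (g_new @ Hg) / (np.linalg.norm(g_new) * np.linalg.norm(Hg)) < delta \
            or np.linalg.norm(Hg) / np.linalg.norm(g_new) < delta:
        return np.eye(n)
    return H_new
```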
3. The Global Convergence of the Higher
Order Iteration Schemes
Definition 3.1 [11] Let $x(t)$, $t \in (0, \infty)$, be a curve such that $x(t)$ is contained in the domain of $f(x)$ and $f(x(t))$ is strictly monotone decreasing on $(0, \infty)$. Then $x(t)$ is called a descent trajectory of $f(x)$ at $x(0) = x_0$. If, moreover, $\lim_{t \to \infty} x(t)$ exists and equals the minimum point of $f(x)$, then the curve $x(t)$ is called a normal descent trajectory of $f(x)$ at $x_0$.
Pan proved the global convergence of the ODE methods with ratio factor and direction matrix [11]. In this paper we only consider the situation in which the ratio factor is 1 and the direction matrix is the identity matrix, so we state the theorem as follows.

Theorem 3.1 [11] Given $x_0 \in R^n$, assume that the level set

$$\Omega = \left\{ x \in R^n : f(x) \leq f\left(x_0\right) \right\}$$

is a bounded closed set, that $f(x)$ is twice continuously differentiable on this set, and that $\nabla f\left(x_0\right) \neq 0$. Then the right segment of the trajectory of the ordinary differential equation (4) is a descent curve of $f(x)$ at $x_0$, and the limit point of the trajectory is a stationary point of $f(x)$. If $f(x)$ is convex, then the right segment of the trajectory is a normal descent curve of $f(x)$ at $x_0$.
We use the quadratic interpolation curve (5) to approximate the trajectory. However, when $a_k^T \nabla f\left(c_k\right) > 0$, the iteration scheme may fail to be descent in a neighborhood of the predictor $\bar{x}_{k+1}$. We therefore apply the strategy of Step 1 in Subalgorithm 2.1 to keep the iteration descent.
Theorem 3.2 Given a constant $\hat{t} > 0$ and $c_1$ with $c_1 \hat{t} \leq 1$. In Subalgorithm 2.1, if $b_k$ is a descent direction satisfying

$$b_k^T \nabla f\left(c_k\right) \leq 0,$$

then for any $0 < t \leq 1/c_1$ the condition

$$\left(a_k t + b_k\right)^T \nabla f\left(c_k\right) \leq 0$$

holds.

Proof. If $a_k^T \nabla f\left(c_k\right) \leq 0$, the conclusion clearly holds. Otherwise $a_k^T \nabla f\left(c_k\right) > 0$. From Step 1 of Subalgorithm 2.1, if $a_k^T \nabla f\left(c_k\right) > -c_1\, b_k^T \nabla f\left(c_k\right)$, then $a_k$ is set to $0$, so the conclusion holds. If $a_k^T \nabla f\left(c_k\right) \leq -c_1\, b_k^T \nabla f\left(c_k\right)$, then

$$\left(a_k t + b_k\right)^T \nabla f\left(c_k\right) \leq \left(1 - t c_1\right) b_k^T \nabla f\left(c_k\right) \leq 0. \qquad \Box$$
Theorem 3.3 Consider Algorithm 2.1, where $p_k$ and $\bar{p}_{k+1}$ are descent directions, and $p_k$ satisfies the conditions

$$\frac{p_k^T \nabla f_k}{\left\| p_k \right\| \left\| \nabla f_k \right\|} \leq -c_1 \qquad (16)$$

and

$$\left\| p_k \right\| \geq c_2 \left\| \nabla f_k \right\|, \qquad (17)$$

where $c_1 > 0$ and $c_2 > 0$ are constants. Suppose that $f(x)$ is bounded below and continuously differentiable in the level set

$$\Omega = \left\{ x \in R^n : f(x) \leq f\left(x_0\right) \right\}, \qquad (18)$$

where $x_0$ is the starting point, and that the gradient $\nabla f$ is Lipschitz continuous on $\Omega$; namely, there exists $L > 0$ such that

$$\left\| \nabla f(x) - \nabla f(y) \right\| \leq L \left\| x - y \right\|. \qquad (19)$$

Then either $\nabla f_k = 0$ holds for some $k$, or

$$\lim_{k \to \infty} \left\| \nabla f_k \right\| = 0. \qquad (20)$$
Proof. Consider the situation that $\nabla f_k \neq 0$ for all $k$. Then from Algorithm 2.1 we have

$$\bar{f}_{k+1} \leq f_k + \rho \bar{t}_k\, p_k^T \nabla f_k \qquad (21)$$

and

$$f_{k+1} \leq \bar{f}_{k+1} + \rho t_{k+1}\, \bar{p}_{k+1}^T \nabla \bar{f}_{k+1} \leq \bar{f}_{k+1}. \qquad (22)$$

By (21) and (22), we have

$$\bar{f}_{k+1} - f_k \leq \rho \bar{t}_k\, p_k^T \nabla f_k \qquad (23)$$

and

$$f_{k+1} - \bar{f}_{k+1} \leq 0. \qquad (24)$$

With (23) and (24), we obtain

$$f_{k+1} - f_k \leq \rho \bar{t}_k\, p_k^T \nabla f_k. \qquad (25)$$

By summing this expression over all indices less than or equal to $k$, we obtain

$$f_{k+1} - f_0 \leq \rho \sum_{j=0}^{k} \bar{t}_j\, p_j^T \nabla f_j. \qquad (26)$$

Since $f$ is bounded below, $f_0 - f_{k+1}$ is less than some positive constant for all $k$. Hence, by taking limits in (26), we obtain

$$\sum_{j=0}^{\infty} \left( -\bar{t}_j\, p_j^T \nabla f_j \right) < \infty. \qquad (27)$$

From the standard inexact line search algorithm, we know that if the initial trial $\hat{t}$ does not satisfy condition (11), then $\bar{t}_k / \beta$ violates that condition, so

$$f\left(x_k + \frac{\bar{t}_k}{\beta} p_k\right) - f_k > \rho \frac{\bar{t}_k}{\beta}\, p_k^T \nabla f_k. \qquad (28)$$

By the Lipschitz condition (19), we have

$$
\begin{aligned}
f\left(x_k + \frac{\bar{t}_k}{\beta} p_k\right) - f_k
&= \frac{\bar{t}_k}{\beta} \int_0^1 p_k^T \nabla f\left(x_k + s \frac{\bar{t}_k}{\beta} p_k\right) \mathrm{d}s \\
&= \frac{\bar{t}_k}{\beta}\, p_k^T \nabla f_k + \frac{\bar{t}_k}{\beta} \int_0^1 p_k^T \left[ \nabla f\left(x_k + s \frac{\bar{t}_k}{\beta} p_k\right) - \nabla f_k \right] \mathrm{d}s \\
&\leq \frac{\bar{t}_k}{\beta}\, p_k^T \nabla f_k + \frac{1}{2} L \left(\frac{\bar{t}_k}{\beta}\right)^2 \left\| p_k \right\|^2. \qquad (29)
\end{aligned}
$$

It follows from (28) and (29) that

$$\bar{t}_k > \frac{2 \beta (\rho - 1)\, p_k^T \nabla f_k}{L \left\| p_k \right\|^2}. \qquad (30)$$

If the initial trial $\hat{t}$ satisfies condition (11), then $\bar{t}_k = \hat{t}$. Combining the two cases, we have

$$\bar{t}_k \geq \min \left\{ \hat{t},\ \frac{2 \beta (\rho - 1)\, p_k^T \nabla f_k}{L \left\| p_k \right\|^2} \right\} > 0. \qquad (31)$$

It follows from (27) and (31) that

$$\sum_{j=0}^{\infty} \min \left\{ \hat{t},\ \frac{2 \beta (1 - \rho) \left| p_j^T \nabla f_j \right|}{L \left\| p_j \right\|^2} \right\} \left| p_j^T \nabla f_j \right| < \infty. \qquad (32)$$

It then follows from (16) and (17) that

$$\sum_{j=0}^{\infty} \min \left\{ \hat{t}\, c_1 c_2,\ \frac{2 \beta (1 - \rho) c_1^2}{L} \right\} \left\| \nabla f_j \right\|^2 < \infty. \qquad (33)$$

By (33), we obtain

$$\sum_{j=0}^{\infty} \left\| \nabla f_j \right\|^2 < \infty. \qquad (34)$$

This implies that

$$\lim_{k \to \infty} \left\| \nabla f_k \right\| = 0. \qquad (35) \qquad \Box$$
Theorem 3.3 analyzes the global convergence of the iteration scheme based on the ODE; similarly, we obtain the global convergence of the variant iteration scheme using the BFGS update formula.

Theorem 3.4 Consider Algorithm 2.2. Suppose that $f$ is bounded below in $R^n$ and continuously differentiable in the level set

$$\Omega = \left\{ x : f(x) \leq f\left(x_0\right) \right\}, \qquad (36)$$

where $x_0$ is the starting point, and that the gradient $\nabla f$ is Lipschitz continuous on $\Omega$; namely, there exists $L > 0$ such that

$$\left\| \nabla f(x) - \nabla f(y) \right\| \leq L \left\| x - y \right\|. \qquad (37)$$

Then either $\nabla f_k = 0$ holds for some $k$, or

$$\lim_{k \to \infty} \left\| \nabla f_k \right\| = 0. \qquad (38)$$
Proof. Consider the situation that $\nabla f_k \neq 0$ for all $k$. Then from Algorithm 2.2 we have

$$f_{k+1} \leq \bar{f}_{k+1} - \rho t_{k+1}\, \nabla \bar{f}_{k+1}^T \bar{H}_{k+1} \nabla \bar{f}_{k+1} \qquad (39)$$

and

$$\bar{f}_{k+1} \leq f_k - \rho \bar{t}_k\, \nabla f_k^T H_k \nabla f_k. \qquad (40)$$

Step 3 implies that

$$\nabla \bar{f}_{k+1}^T \bar{H}_{k+1} \nabla \bar{f}_{k+1} > 0. \qquad (41)$$

By combining (39), (40) and (41), we have

$$f_{k+1} \leq f_k - \rho \bar{t}_k\, \nabla f_k^T H_k \nabla f_k. \qquad (42)$$

From Step 6 of Algorithm 2.2, we have

$$\frac{\nabla f_{k+1}^T H_{k+1} \nabla f_{k+1}}{\left\| \nabla f_{k+1} \right\| \left\| H_{k+1} \nabla f_{k+1} \right\|} \geq \delta \qquad (43)$$

and

$$\left\| H_{k+1} \nabla f_{k+1} \right\| \geq \delta \left\| \nabla f_{k+1} \right\|, \qquad (44)$$

so the search directions $p_k = -H_k \nabla f_k$ satisfy conditions (16) and (17). From Theorem 3.3, we conclude that

$$\lim_{k \to \infty} \left\| \nabla f_k \right\| = 0 \qquad (45)$$

holds. $\Box$
4. Computational Results
In this section, we report computational results showing that the variant iteration schemes using the BFGS and DFP update formulas outperformed the BFGS and DFP methods on two sets of test functions. The first set of 20 functions was taken from [15], and the second from [16], which can be obtained from http://www.ici.ro/camo/neculai/ansoft.htm.
4.1. Test Codes

In this section, the following four codes are tested:

DFP: the DFP method.
BFGS: the BFGS method.
HDFP: the higher order iteration scheme using the DFP update.
HBFGS: the higher order iteration scheme using the BFGS update.

To make the comparisons fair and easy, all the codes were implemented with the same parameters: $\rho = 10^{-4}$, $\beta = 0.5$, $N = 15$, $H_0 = I$, $\varepsilon = 10^{-6}$, $\delta = 10^{-12}$ and $\hat{t} = 1$. Detailed results are shown below for BFGS and HBFGS; the performance of DFP and HDFP is reported only in the overall results table (Table 4).

Compiled using Matlab 7.0.4, the four codes were run under Windows XP Home Edition Version 2002 on an Asus PC with a Genuine Intel(R) Centrino Duo T2300 processor (1.66 GHz) and 1.00 GB of memory.

4.2. Results for 20 Small Scale Functions

The first set of test problems consists of 20 problems. Numerical results are listed in Table 1, where the numbers of function value computations and gradient computations are given in the columns labeled "f" and "∇f", respectively, and the CPU time (in seconds) required for solving each problem is given in the column labeled "Time". "-" denotes that the algorithm did not reach a correct solution within the upper bound on the number of iterations.

Table 1 serves as a comparison between BFGS and HBFGS. It shows that HBFGS requires fewer function value and gradient computations than BFGS. However, HBFGS costs 0.11 seconds more than BFGS, because HBFGS has to compute $a_k$. Although computing $a_k$ is much cheaper than a function evaluation, it affects the CPU time, especially for small scale problems. So HBFGS is competitive with BFGS on the 20 small scale problems.
Table 1. Statistics of first 20 functions.
Problem — BFGS (Time, f, ∇f) — HBFGS (Time, f, ∇f)
Rosenbrock 0.02 62 36 0.06 133 44
Freudenstein and Roth 0.00 29 14 0.03 48 16
Powell badly scaled 0.17 796 303 0.02 257 33
Brown badly scaled - - - - - -
Beale 0.03 23 16 0.03 35 18
Jennrich and Sampson 0.00 7 2 0.02 7 2
Helical valley 0.05 77 35 0.06 166 44
Bard 0.05 430 34 0.03 65 22
Gaussian 0.00 6 5 0.02 20 10
Meyer - - - - - -
Gulf research and develop 0.05 1 2 0.00 1 2
Box three-dimensional 0.04 51 41 0.05 86 44
Powell singular 0.05 65 36 0.06 157 58
Wood 0.05 88 32 0.06 194 44
Kowalik and Osborne 0.05 45 42 0.05 71 38
Brown and Dennis 0.34 2546 263 0.20 1225 104
Biggs EXP6 0.03 21 19 0.05 31 20
Watson 0.39 79 42 0.47 163 54
Extended Rosenbrock 0.13 244 74 0.25 832 130
Broyden banded 0.09 160 45 0.19 647 112
total 1.53 4730 1041 1.64 4138 795
4.3. Results for 50 Middle Scale Functions

The second test set of 50 problems consists of 43 functions with 100 variables, 3 functions with 200 variables and 4 functions with 300 variables. The problems marked with "*" have 300 independent variables, and those marked with "**" have 200 independent variables.

Table 2 shows that, compared with BFGS, the numbers of function value computations and gradient computations and the CPU time of HBFGS decrease by 52.65%, 52.08% and 36.01%, respectively. In summary, the HBFGS method is faster and requires less computation than the BFGS method.
4.4. Results for 50 Large Scale Functions

The third test set of 50 problems consists of 46 functions with 500 variables and 4 functions with 300 variables. The problems marked with "*" have 300 independent variables.

Table 3 shows that the CPU time, the number of function value computations and the number of gradient computations of HBFGS are smaller than those of BFGS by 949.10 seconds, 38808 and 3957, respectively.
4.5. Statistics of the Ratio

Table 4 gives an overall comparison of HDFP and HBFGS with DFP and BFGS. In Table 4, "Time" denotes the ratio of run times, "f" the ratio of the numbers of function value computations, and "∇f" the ratio of the numbers of gradient computations.

Table 4 shows that HDFP outperforms DFP with an average CPU time ratio of 1.58, function computation ratio of 1.67 and gradient computation ratio of 1.71, and that HBFGS beats BFGS with an average CPU time ratio of 1.23, function computation ratio of 1.57 and gradient computation ratio of 1.56.
4.6. Summary of the Tests
As the tests show, although the higher order iteration schemes add the computation of $a_k$, they require fewer function value and gradient computations. For large scale problems, the cost of computing $a_k$ is much smaller than that of a function evaluation.
Table 2. Statistics of middle scale 50 functions.
Problem — BFGS (Time, f, ∇f) — HBFGS (Time, f, ∇f)
Strictly Convex1 0.05 7 8 0.06 21 12
Strictly Convex2 0.30 202 97 0.19 147 60
Extended Freudenstein and Roth 0.08 21 12 0.13 45 18
Extended Trigonometric 0.17 237 57 0.39 388 124
Extended White and Holst 0.19 75 36 0.22 118 38
Extended Beale 0.08 20 15 0.09 45 20
Extended Penalty 0.09 85 26 0.13 141 38
Perturbed Quadratic 0.80 1729 279 0.53 1095 170
Diagonal2 0.28 112 113 0.27 97 96
Diagonal1 5.58 10756 1998 1.03 1701 342
Diagonal3 5.77 10380 2001 1.03 1758 326
Hager 0.16 138 63 0.13 107 40
Generalized Tridiagonal-1 0.47 460 104 0.44 439 90
Extended Tridiagonal-1 0.19 30 28 0.17 29 24
Extended Three Exponential Terms 0.06 12 9 0.05 33 12
Generalized Tridiagonal-2 0.67 1261 189 0.77 1340 192
Diagonal4 0.03 9 4 0.03 28 6
Diagonal5 0.03 5 6 0.03 12 8
Extended Himmelblau 0.06 19 10 0.06 56 18
Generalized PSC1 0.73 369 313 0.28 137 104
Extended PSC1 0.06 22 15 0.09 41 22
Extended Powell** 1.45 148 78 0.98 111 50
Extended Block Diagonal BD1 0.05 14 13 0.06 30 16
Extended Maratos 0.19 146 67 0.25 264 90
Extended Cliff 0.17 71 44 0.09 65 22
Quadratic Diagonal Perturbed 0.11 57 40 0.13 65 42
Extended Wood** 0.34 81 25 0.64 134 44
Scaled Quadratic SQ1** 6.61 3163 460 5.70 3205 360
Quadratic Function QF1 0.52 1015 193 0.42 784 140
Extended Quadratic Penalty QP1 0.06 41 21 0.06 45 16
Extended Quadratic Penalty QP2 0.08 49 26 0.11 82 34
A Quadratic Function QF2 0.92 2352 324 0.58 1377 186
Extended EP1 0.03 12 5 0.05 31 8
Extended Tridiagonal-2 0.11 75 40 0.11 83 38
BDQRTIC 1.70 5044 560 0.69 1619 206
TRIDIA 1.75 5288 570 0.88 2336 258
ARWHEAD 1.69 749 121 1.81 748 114
NONDIA (Shanno-78) 0.06 88 19 0.19 378 58
NONDQUAR 5.25 2170 1999 2.86 1207 1040
DQDRTIC 0.05 49 13 0.14 234 42
Extended Rosenbrock 0.13 74 38 0.16 153 52
EG2 0.05 21 12 0.08 57 24
DIXMAANA* 0.48 14 11 0.75 27 16
DIXMAANB* 0.69 19 16 0.94 31 20
Almost Perturbed Quadratic 0.80 1744 280 0.47 963 148
Tridiagonal Perturbed Quadratic 1.25 1686 267 0.83 1088 168
DIXMAANC* 0.67 21 16 0.98 40 20
DIXMAANE* 8.31 176 174 4.93 109 102
Partial Perturbed Quadratic 0.97 1618 260 0.64 1019 158
Broyden Tridiagonal 0.64 1030 174 0.66 1015 160
total 50.97 52964 11251 32.31 25078 5392
Table 3. Statistics of large scale 50 functions on BFGS and HBFGS.
Problem — BFGS (Time, f, ∇f) — HBFGS (Time, f, ∇f)
Strictly Convex1 1.58 7 8 2.13 22 12
Strictly Convex2 383.11 8094 2001 346.83 6694 1714
Extended Freudenstein and Roth 3.14 21 12 5.83 48 20
Extended Trigonometric 12.84 411 66 48.61 706 242
Extended White and Holst 9.36 75 36 11.09 123 40
Extended Beale 3.39 20 15 4.83 45 20
Extended Penalty 116.31 6041 580 32.44 309 17
Perturbed Quadratic 81.64 3428 398 86.69 3508 400
Diagonal2 67.25 353 354 38.95 205 194
Diagonal1 423.53 15033 2001 315.34 10818 1448
Diagonal3 418.64 16311 2001 323.20 11385 1432
Hager 33.78 626 176 58.16 1511 284
Generalized Tridiagonal-1 19.14 414 95 19.84 441 94
Extended Tridiagonal-1 7.98 35 32 6.41 30 24
Extended Three Exponential Terms 2.02 13 10 2.48 32 12
Generalized Tridiagonal-2 28.84 977 145 41.16 1365 196
Diagonal4 0.42 9 4 1.28 29 8
Diagonal5 0.91 5 6 1.27 12 8
Extended Himmelblau 1.70 19 10 4.02 62 20
Generalized PSC1 110.98 672 584 23.94 166 120
Extended PSC1 3.14 25 16 4.75 41 22
Extended Powell** 1.56 148 78 1.02 111 50
Extended Block Diagonal BD1 2.66 15 14 2.91 30 16
Extended Maratos 12.27 141 65 17.20 241 86
Extended Cliff 9.16 71 44 4.72 65 22
Quadratic Diagonal Perturbed 24.92 343 131 19.22 258 96
Extended Wood 0.36 81 25 0.66 134 44
Scaled Quadratic SQ1 6.59 3163 460 5.67 3205 360
Quadratic Function QF1 67.39 2626 344 78.47 2949 378
Extended Quadratic Penalty QP1 3.36 40 19 8.72 374 42
Extended Quadratic Penalty QP2 6.42 76 35 9.84 133 50
A Quadratic Function QF2 110.25 5324 556 103.45 4822 492
Extended EP1 0.69 12 5 1.39 31 8
Extended Tridiagonal-2 7.16 72 39 8.30 83 42
BDQRTIC 401.72 21393 2001 216.27 10251 1010
TRIDIA 403.97 23427 2001 160.38 8716 750
ARWHEAD 1.70 749 121 1.77 748 114
NONDIA (Shanno-78) 16.36 310 86 44.33 913 218
NONDQUAR 423.55 2203 2001 600.22 3070 2555
DQDRTIC 2.53 51 14 7.42 199 38
Extended Rosenbrock 7.20 74 38 10.33 153 52
EG2 11.64 486 59 5.80 280 34
DIXMAANA* 0.48 14 11 0.73 27 16
DIXMAANB* 0.67 19 16 0.95 31 20
Almost Perturbed Quadratic 71.27 3120 361 75.47 3187 360
Tridiagonal Perturbed Quadratic 89.06 3623 420 87.52 3468 392
DIXMAANC* 0.66 21 16 0.94 40 20
DIXMAANE* 8.11 176 174 4.98 109 102
Partial Perturbed Quadratic 82.22 3446 392 93.06 3728 420
Broyden Tridiagonal 40.69 1206 203 44.28 1303 208
total 3544.33 125019 18279 2995.23 86211 14322
Table 4. Statistics of the ratio.
Problem set — DFP/HDFP (Time, f, ∇f ratio) — BFGS/HBFGS (Time, f, ∇f ratio)
small 20 functions 2.49 2.66 2.63 0.93 1.14 1.31
middle 50 functions 1.03 1.29 1.23 1.58 2.11 2.09
large scale 50 functions 1.23 1.07 1.27 1.18 1.45 1.28
average 1.58 1.67 1.71 1.23 1.57 1.56
5. Concluding Remarks

We have given a new iteration scheme based on ODEs and proved the global convergence of this scheme and of its variants using the DFP and BFGS update formulas. In particular, the iteration scheme admits a class of variants using different directions as the right-hand side vector of (4). From our experiments, we can safely conclude that the new iteration scheme improves on the BFGS and DFP methods on the test data sets.
6. References
[1] W. C. Davidon, “Variable Metric Method for Minimization,” Technical Report ANL-5990 (Revised), Argonne National Laboratory, Argonne, 1959.
[2] W. C. Davidon, “Variable Metric Method for Minimiza-
tion,” SIAM Journal on Optimization, Vol. 1, No. 1, 1991,
pp. 1-17. doi:10.1137/0801001
[3] R. Fletcher and M. J. D. Powell, “A Rapidly Convergent
Descent Method for Minimization,” The Computer Jour-
nal, Vol. 6, No. 2, 1963, pp. 163-168.
[4] C. G. Broyden, “The Convergence of a Class of Double
Rank Minimization Algorithms 2. The New Algorithms,”
IMA Journal of Applied Mathematics, Vol. 6, No. 3, 1970,
pp. 222-231. doi:10.1093/imamat/6.3.222
[5] R. Fletcher, “A New Approach to Variable Metric Algorithms,” The Computer Journal, Vol. 13, No. 3, 1970, pp. 317-322. doi:10.1093/comjnl/13.3.317
[6] C. G. Broyden, “The Convergence of a Class of Double
Rank Minimization Algorithms 1. General Considera-
tion,” IMA Journal of Applied Mathematics, Vol. 6, No. 1,
1970, pp. 76-90. doi:10.1093/imamat/6.1.76
[7] D. Goldfarb, “A Family of Variable Metric Methods De-
rived by Variational Means,” Mathematics of Computa-
tion, Vol. 24, No. 109, 1970, pp. 23-26.
doi:10.1090/S0025-5718-1970-0258249-6
[8] D. F. Shanno, “Conditioning of Quasi-Newton Methods for Function Minimization,” Mathematics of Computation, Vol. 24, No. 111, 1970, pp. 647-656. doi:10.1090/S0025-5718-1970-0274029-X
[9] K. J. Arrow, L. Hurwicz and H. Uzawa, “Studies in Lin-
ear and Nonlinear Programming,” Stanford University
Press, Palo Alto, 1958.
[10] F. H. Branin and S. K. Hoo, “A Method for Finding Multiple Extrema of a Function of n Variables,” In: F. A. Lootsma, Ed., Numerical Methods for Nonlinear Optimization, Academic Press, Cambridge, 1972.
[11] P. Q. Pan, “Differential Equation Methods for Uncon-
strained Optimization,” Nanjing University Journal of
Computational Mathematics, in Chinese, Vol. 4, 1982, pp.
338-349.
[12] P. Q. Pan, “New ODE Methods of Equality Constrained
Optimization (1): Equations,” Journal of Computational
Mathematics, Vol. 10, No. 1, 1992, pp. 77-92.
[13] P. Q. Pan, “New ODE Methods for Equality Constrained
Optimization (2): Algorithm,” Journal of Computational
Mathematics, Vol. 10, No. 2, 1992, pp. 129-146.
[14] J. Nocedal and S. J. Wright, “Numerical Optimization,”
Science Press, Beijing, 2006.
[15] J. J. Moré, B. S. Garbow and K. E. Hillstrom, “Testing
Unconstrained Optimization Software,” ACM Transac-
tions on Mathematical Software, Vol. 7, No. 1, 1981, pp.
17-41. doi:10.1145/355934.355936
[16] N. Andrei, “Unconstrained Optimization Test Functions,”
Advanced Modeling and Optimization, Vol. 10, No. 1,
2008, pp. 147-161.