Global optimization of protein-peptide docking by a filling
function method
Francesco Lampariello, Giampaolo Liuzzi
Istituto di Analisi dei Sistemi ed Informatica (IASI)
CNR
Rome, Italy
giampaolo.liuzzi@iasi.cnr.it
Abstract: Molecular docking programs play a crucial role in drug design and development. In recent years, much attention has been devoted to the protein-peptide docking problem, in which the docking of a flexible peptide with a known protein is sought. In this work we present a new docking algorithm based on a filling function method for continuous constrained global optimization. The protein-peptide docking position is sought by minimizing the conformational potential energy subject to constraints that preserve the primary sequence of the given peptide. The resulting global optimization problem is difficult for two main reasons: first, it is a large-scale constrained global optimization problem; second, the energy function is multivariate and non-convex, so that it has many local minima. The method is based on modifying the original objective function, once a local minimum has been attained, by adding to it a filling term. This allows the overall algorithm to escape from local minima and, ultimately, to explore large regions of the peptide conformational space. We present numerical results on a set of benchmark docking pairs and a comparison with the well-known molecular docking software package PatchDock.
Keywords: Protein-peptide docking; potential reduction; continuous global optimization
1. Introduction
In this paper we address the problem of docking small
peptide molecules, for example, drug candidates, onto a
given protein model. Molecular docking programs play a
crucial role in drug design and development. In recent years,
much attention has been devoted to this problem where
docking of a flexible peptide with a known protein is sought.
We consider a docking algorithm based on the use of a filling function method for continuous unconstrained global optimization [1].
The correct protein-peptide docking position is obtained
by minimizing the function representing the total potential
energy according to a specific mathematical model. In order
to preserve the primary sequence of the given peptide it is
necessary to take into account some constraints on the
problem variables, and then we construct the Lagrangian of
the original problem. The resulting optimization problem has two main features: it is a large-scale constrained global optimization problem, and the total potential energy function has many local minima. Once a local minimum has been
found, the method modifies the original objective function
by adding to it a filling term. This allows the algorithm to
escape from the local minimum so that it may explore large
regions in the search space.
The approaches proposed in the literature are based on
the minimization of successive approximations of some
potential energy function, obtained by introducing some
suitable parameters (see, e.g. [2], [3]) in order to simplify
the single minimization step. Moreover, these methods determine the docking between the peptide and a prefixed small part of the protein, namely the receptor or binding site. Most of these methods allow for receptor flexibility [2], [3], [4]. The minimization process is based
on a multi-start strategy that uses the steepest descent or
conjugate gradient methods as local minimization tools.
2. The Energy Model
As regards the mathematical model employed, we
assume that the protein tertiary structure and the peptide
primary residue sequence are known and we denote by N
and M the number of peptide and protein residues,
respectively.
In order to find the docking pocket and the position of the peptide on the protein, we consider the peptide to be flexible, so that each of its atoms has three degrees of freedom. Moreover, the given peptide primary residue sequence is preserved while computing the docking position by means of suitable simple nonlinear constraints.
Let
$$E^{LJ}_{ij} = 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right], \qquad E^{C}_{ij} = \frac{q_i q_j}{4\pi\epsilon_0\, r_{ij}}$$
be the Lennard-Jones and Coulomb potentials, respectively.
They represent the interaction between protein atom i and
peptide atom j, and $r_{ij}$ denotes their distance. Hence, we consider the following total potential energy function [3],
$$E(r) = \sum_{i,j}\left(E^{LJ}_{ij} + E^{C}_{ij}\right) \tag{1}$$
where the summation is calculated over all ($N \times M$) pairs of atoms.
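As a concrete illustration of (1), the following Python sketch evaluates the total interaction energy over all protein-peptide atom pairs; the array layout, the pair parameters eps and sigma, the partial charges and the Coulomb prefactor ke are illustrative assumptions rather than details taken from the authors' Fortran90 code.

```python
import numpy as np

def total_energy(peptide_xyz, protein_xyz, eps, sigma, q_pep, q_prot, ke=332.06):
    """Total interaction energy (1): Lennard-Jones plus Coulomb terms summed
    over all protein-peptide atom pairs (illustrative layout and units).
    peptide_xyz : (N_pep, 3) peptide atom coordinates
    protein_xyz : (M_prot, 3) protein atom coordinates
    eps, sigma  : (M_prot, N_pep) pair parameters epsilon_ij, sigma_ij
    q_pep, q_prot : atomic partial charges
    ke          : Coulomb prefactor (assumed kcal mol^-1 A e^-2)
    """
    # pairwise distances r_ij between protein atom i and peptide atom j
    diff = protein_xyz[:, None, :] - peptide_xyz[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    sr6 = (sigma / r) ** 6
    e_lj = 4.0 * eps * (sr6 ** 2 - sr6)        # Lennard-Jones term
    e_c = ke * np.outer(q_prot, q_pep) / r     # Coulomb term
    return float(np.sum(e_lj + e_c))
```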
3. Global Optimization Approach
Here we consider the problem of finding the docking
position of a given peptide by taking into account all the
given protein atoms, so that the binding site is not prefixed,
but directly determined by the algorithm. In order to prevent the peptide primary sequence from being modified, we impose the following constraints between the carbon alpha atoms of the peptide residues:
$$r^{c}_{i,i+1} = 3.8, \qquad i = 1, \dots, N-1, \tag{2}$$
and
$$r^{c}_{i,k} \geq 2, \qquad i = 1, \dots, N-2, \ k = i+2, \dots, N. \tag{3}$$
Therefore, our algorithm computes the final docking
position by searching for a global minimum of E(r) subject
to (2) and (3).
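A minimal sketch, under an assumed coordinate layout, of how constraints (2) and (3) can be evaluated on the peptide alpha-carbon coordinates; in the authors' implementation these constraints are presumably supplied to the local solver rather than coded as below.

```python
import numpy as np

def backbone_constraints(ca_xyz):
    """Constraints (2)-(3) on the peptide alpha-carbon coordinates.
    ca_xyz: (N, 3) coordinates of the N C-alpha atoms (illustrative layout).
    Returns (eq, ineq): eq should be driven to zero, ineq kept >= 0."""
    # (2): consecutive C-alpha distances fixed at 3.8 Angstrom
    d_consec = np.linalg.norm(ca_xyz[1:] - ca_xyz[:-1], axis=1)
    eq = d_consec - 3.8
    # (3): non-consecutive C-alpha pairs kept at least 2 Angstrom apart
    n = len(ca_xyz)
    ineq = [np.linalg.norm(ca_xyz[i] - ca_xyz[k]) - 2.0
            for i in range(n - 2) for k in range(i + 2, n)]
    return eq, np.asarray(ineq)
```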
For the sake of clarity, let us consider the general constrained nonlinear global optimization problem
$$\operatorname{glob\,min}_{x}\; f(x) \quad \text{s.t.} \quad x \in F, \tag{4}$$
where $x \in \mathbb{R}^n$, $F = \{x \in \mathbb{R}^n : g(x) \geq 0\}$, $f: \mathbb{R}^n \to \mathbb{R}$, and $g: \mathbb{R}^n \to \mathbb{R}^m$.
Let us assume, as usual, that f and g are continuously
differentiable functions and that f (x) is radially unbounded
on the feasible set. Now, let
$$L(x, \lambda) = f(x) - \lambda^T g(x),$$
with $\lambda \in \mathbb{R}^m$, be the Lagrangian function for Problem (4).
A pair $(x^*, \lambda^*)$ is a Karush-Kuhn-Tucker (KKT) pair if [5]
$$g(x^*) \geq 0, \quad g(x^*)^T \lambda^* = 0, \quad \lambda^* \geq 0, \quad \nabla_x L(x^*, \lambda^*) = 0. \tag{5}$$
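For reference, the residuals of system (5) can be checked as in the following sketch, which assumes the sign conventions used above ($g(x) \geq 0$ and $L(x,\lambda) = f(x) - \lambda^T g(x)$); the function and argument names are purely illustrative.

```python
import numpy as np

def kkt_residuals(grad_f, g_val, jac_g, lam):
    """Residuals of the KKT system (5) for problem (4).
    grad_f : gradient of f at x   (n,)
    g_val  : constraint values g(x)  (m,)
    jac_g  : Jacobian of g at x   (m, n)
    lam    : multiplier estimate  (m,)
    """
    stationarity = grad_f - jac_g.T @ lam      # grad_x L(x, lam)
    feasibility = np.minimum(g_val, 0.0)       # violation of g(x) >= 0
    complementarity = g_val * lam              # g_i(x) * lam_i = 0
    dual_feasibility = np.minimum(lam, 0.0)    # violation of lam >= 0
    return stationarity, feasibility, complementarity, dual_feasibility
```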
Under some mild regularity assumptions on the constraint functions, if $x^*$ is a local minimum point of Problem (4), we have [5]:
Proposition 1: Let $x^*$ be a local minimum of Problem (4). Then, a vector $\lambda^* \in \mathbb{R}^m$ of KKT multipliers exists such that $(x^*, \lambda^*)$ is a KKT pair. Furthermore, we assume that the Hessian of the Lagrangian function $\nabla^2_x L(x^*, \lambda^*)$ is positive definite.
Finally, we assume that a local optimization routine is
available, namely, a routine that, given an arbitrary starting
point x, provides a local minimum of Problem (4). In particular, as local optimization engine we use the augmented Lagrangian algorithm ALGENCAN [6], [7].
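ALGENCAN itself is a Fortran package; the fragment below is only a minimal sketch of a PHR-type augmented Lagrangian iteration of the general kind described in [6], [7], with a fixed penalty parameter mu and BFGS as inner solver (both simplifying assumptions, not the ALGENCAN code).

```python
import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(f, g, x_start, m, mu=10.0, outer_iters=20):
    """Toy PHR augmented-Lagrangian loop for min f(x) s.t. g(x) >= 0."""
    lam = np.zeros(m)
    x = np.asarray(x_start, dtype=float)
    for _ in range(outer_iters):
        def L_aug(x):
            # PHR term for inequality constraints g(x) >= 0 with multipliers lam >= 0
            t = np.maximum(0.0, lam - mu * g(x))
            return f(x) + (np.sum(t ** 2) - np.sum(lam ** 2)) / (2.0 * mu)
        x = minimize(L_aug, x, method="BFGS").x     # inner unconstrained step
        lam = np.maximum(0.0, lam - mu * g(x))      # first-order multiplier update
    return x, lam
```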
A. The filling function technique
Starting from a point randomly chosen in $\mathbb{R}^n$, let $x_0$ be the local minimum reached by the local optimization routine and $\lambda_0 \in \mathbb{R}^m$ be the vector of corresponding KKT multipliers.
In order to search over the various minima and find the one with the lowest f value, we have to escape the basin of attraction of the current minimum $x_0$. To do this, we consider the quadratic model of Problem (4) at $x_0$, that is,
$$q(x; x_0) = \frac{1}{2}(x - x_0)^T H_0 (x - x_0), \tag{6}$$
where $H_0$ is the Hessian of the Lagrangian function for Problem (4) computed at $x_0$. Then, we construct the following Gaussian-based function
$$\varphi(x; x_0) = \frac{\beta}{1 - \exp\!\left(-\alpha\, q(x; x_0)\right)} - \beta, \tag{7}$$
where $\alpha$ and $\beta$ are positive scalars, and we consider the new function obtained by adding $\varphi(x; x_0)$ to the original function $f(x)$,
$$\tilde{f}(x; x_0) = f(x) + \varphi(x; x_0). \tag{8}$$
Since $q(x_0; x_0) = 0$, we have $\varphi(x_0; x_0) = +\infty$, so that $\tilde{f}(x_0; x_0) = +\infty$. Moreover, since $\varphi(x; x_0) > 0$ for all $x$, $\tilde{f}(x; x_0) > f(x)$ for all $x$.
Then, $\varphi(x; x_0) \to 0$ and $\tilde{f}(x; x_0) \to f(x)$ as $\|x - x_0\| \to +\infty$, that is, far from $x_0$ in any direction.
Therefore, function (8) is essentially the original function modified only locally, i.e., obtained by "filling" f within a neighborhood of $x_0$, whose amplitude can be varied by taking different values of the parameter $\alpha$.
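As a small illustration, the filled objective (8) built from (6) and (7) can be evaluated as in the following Python sketch; the callable f, the Hessian H0 and the scalars alpha, beta are supplied by the caller (names are illustrative, and the authors' actual implementation is in Fortran90).

```python
import numpy as np

def filled_function(f, x, x0, H0, alpha, beta):
    """Filled objective (8): f(x) + phi(x; x0), with phi from (7) built on the
    quadratic model (6). A minimal sketch with caller-supplied ingredients."""
    dx = x - x0
    q = 0.5 * dx @ H0 @ dx                               # quadratic model (6)
    if q == 0.0:                                         # phi blows up at x0 itself
        return np.inf
    phi = beta / (1.0 - np.exp(-alpha * q)) - beta       # Gaussian-based term (7)
    return f(x) + phi
```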
Thus, by applying the local search routine to $\tilde{f}(x; x_0)$ starting from a point near $x_0$, we reach a minimum point $\bar{x}$, which cannot be $x_0$ and which is not a minimum of f.
Now, by reapplying the local search to the original function f(x) starting from $\bar{x}$, either the same point $x_0$ is reached, or a new minimum point is found.
In the first case, the value of $\alpha$ is not sufficiently low to escape the basin of attraction of $x_0$, and a lower $\alpha$ value is needed.
B. The global optimization procedure
The global optimization algorithm can be summarized in
the following scheme.
Global Optimization Algorithm (GOAL)
Data. $s_0 > 0$, $\Delta s > 0$, $\epsilon > 0$, integers $p \geq 1$ and $R \geq 1$.

Step 0. Set $r = 1$ and $\lambda = 1$ (here $\lambda$ denotes an integer restart counter, not the multiplier vector).

Step 1. Generate at random a point $\tilde{x} \in \mathbb{R}^n$, compute $f(\tilde{x})$ and apply the local search routine for minimizing $f(x)$. Let $x_0$ be the minimum reached; set $x^{(r)} = x_0$.

Step 2. If $r = 1$, set $f_{\min} = f(x_0)$ and $x_{\min} = x_0$; else, if $\|x_0 - x^{(k)}\| < \epsilon$ for an index $k = 1, \dots, r-1$, set $\lambda = \lambda + 1$, and if $\lambda \leq r$ go to Step 1; otherwise STOP.

Step 3. Set $i = 1$ and $f_l = f(\tilde{x})$.

Step 4. Compute the parameters $\alpha$ and $\beta$ (see (9) below), where $s = s_0 + (i-1)\,\Delta s$.

Step 5. Starting from $x_0 + \Delta x$, a point near $x_0$, minimize $\tilde{f}(x; x_0)$. Let $\bar{x}_i$ be the minimum reached; minimize $f(x)$ starting from $\bar{x}_i$, and let $x_i$ be the minimum obtained.

Step 6. If $\|x_i - x^{(k)}\| > \epsilon$ for all $k = 1, \dots, r$, and $f(x_i) < f_l$, set $f_l = f(x_i)$ and $x_l = x_i$. Set $i = i + 1$, and if $i \leq p$ go to Step 4.

Step 7. If $f_l < f_{\min}$, set $f_{\min} = f_l$ and $x_{\min} = x_l$. If $r < R$, set $r = r + 1$; otherwise STOP. If $f_l < f(\tilde{x})$, then set $x^{(r)} = x_l$, $x_0 = x_l$, and go to Step 3; otherwise, set $\lambda = 1$ and go to Step 1.
The value of $\epsilon$, which is the smallest distance above which two minimum points are considered distinct, should be chosen taking into account the stopping criterion of the local search routine.
Note that, differently from the classical multistart algorithm, the procedure terminates after the prefixed maximum number R of restarts has been performed, regardless of the number of initial points chosen at random. Thus, it may even happen that a new restarting minimum point is always found, i.e., the algorithm never returns from Step 6 to Step 1, so that only one initial random point is employed.
As regards the parameters $\alpha$ and $\beta$ in function (7), in order to establish the amplitude of the neighborhood of a minimum $x_0$ where the function f is filled, we take
$$\alpha = \frac{9}{s^2}, \qquad \beta = \sqrt{3}\, s, \tag{9}$$
where s is the prefixed distance from $x_0$. Thus, the prefixed distance s corresponds to three standard deviations ($3\sigma$) of the Gaussian term in (7). The procedure employs p increasing values of the prefixed distance s, and correspondingly the function f is filled within larger and larger neighborhoods of $x_0$.
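To make the overall scheme concrete, the following simplified Python sketch reproduces the control flow of GOAL for an unconstrained objective. It omits the constraint handling via ALGENCAN, the $\lambda$ restart counter and the restart-from-$x_l$ branch of Step 7, uses $H_0 = I$ in the quadratic model (6), and adopts the reading of (9) given above; it is an illustration of the procedure, not the authors' Fortran90 implementation.

```python
import numpy as np
from scipy.optimize import minimize

def goal_sketch(f, n, s0=0.1, ds=0.1, eps=1e-3, p=25, R=50, box=10.0, seed=None):
    """Simplified GOAL loop (unconstrained, H0 = I); see the caveats above."""
    rng = np.random.default_rng(seed)

    def local(fun, x):
        # local search stand-in (the paper uses ALGENCAN as local engine)
        return minimize(fun, x, method="BFGS").x

    def filled(x, x0, alpha, beta):
        q = 0.5 * np.dot(x - x0, x - x0)         # quadratic model (6) with H0 = I
        if q == 0.0:
            return np.inf                         # phi(x0; x0) = +inf
        return f(x) + beta / (1.0 - np.exp(-alpha * q)) - beta   # (7)-(8)

    f_min, x_min, restarts = np.inf, None, []
    for _ in range(R):
        x_rand = rng.uniform(-box, box, n)        # Step 1: random initial point
        x0 = local(f, x_rand)
        restarts.append(x0)                       # plays the role of x^(k)
        if f(x0) < f_min:
            f_min, x_min = f(x0), x0
        f_l, x_l = f(x_rand), None                # Step 3
        for i in range(p):                        # Steps 4-6
            s = s0 + i * ds
            alpha, beta = 9.0 / s ** 2, np.sqrt(3.0) * s   # parameters (9)
            x_bar = local(lambda x: filled(x, x0, alpha, beta),
                          x0 + 1e-2 * rng.standard_normal(n))
            x_i = local(f, x_bar)                 # re-minimize the true objective
            if (all(np.linalg.norm(x_i - xk) > eps for xk in restarts)
                    and f(x_i) < f_l):
                f_l, x_l = f(x_i), x_i
        if x_l is not None and f_l < f_min:       # Step 7 (record improvement only)
            f_min, x_min = f_l, x_l
    return x_min, f_min
```

For example, `goal_sketch(lambda x: float(np.sum(x**2) + np.sum(np.cos(3*x))), n=3)` returns an approximate global minimizer of a simple multimodal test function.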
4. Preliminary Results
In this section we present some preliminary numerical results and a comparison, in terms of computed protein-peptide dockings for 23 protein-peptide pairs, with the well-known molecular docking software package PatchDock [8], [9]. We implemented our method in double precision Fortran90 and ran the code on an Intel Core 2 Duo processor with 4 GB of RAM under the Linux operating system, using $\Delta s = s_0 = 0.1$, $\epsilon = 10^{-3}$, $p = 25$ and $R = 50$.
We applied our method to proteins in complex with
specific ligands which are taken from the PDB [10]. The
obtained results are summarized in Table I where we report,
for each protein-peptide pair: (a) the name of the PDB entry;
(b) the number of residues, NPE and MPR, composing, respectively, the peptide and the protein; (c) the average RMSD (root mean square distance), in Angstrom (Å), between the computed and the known peptide docking position, obtained by running our code (GOAL) and PatchDock from ten randomly chosen starting peptide positions.
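For completeness, the RMSD of column (c) can be computed as in the sketch below; whether all peptide atoms or only the alpha carbons enter the average is an assumption here, and no structural superposition is applied since the protein frame is fixed.

```python
import numpy as np

def peptide_rmsd(computed_xyz, known_xyz):
    """Root mean square distance (Angstrom) between computed and known peptide
    atom positions, both given as (K, 3) arrays in the fixed protein frame."""
    d2 = np.sum((computed_xyz - known_xyz) ** 2, axis=1)
    return float(np.sqrt(np.mean(d2)))
```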
In the table, the lower of the two RMSD values for each pair marks a success. As can be seen, GOAL outperforms PatchDock on 19 out of 23 pairs.
TABLE I. DISTANCES BETWEEN COMPUTED AND KNOWN POSITIONS

PDB name   NPE   MPR   GOAL RMSD (Å)   PatchDock RMSD (Å)
1A30(A)      3    99       9.7270784            15.969474
1A30(B)      3    99      10.265117             15.969474
1AWQ         6    58       8.6928129            24.954113
1I31         6    97       7.7272010            22.708910
1G3F         9    69       6.9038916             6.327445
1VWG         8    47       8.6765423             7.602935
1AB9        10    51       7.9983487            11.081827
1BE9         5    35       6.5084529            32.219646
1GUX         9   142       9.7264299            32.202145
2FIB         4    77       6.9589953             9.243341
1BXL        16    95       5.6981535             3.931751
1DUZ         9   250       4.5037251            25.406809
1F95         9    32       7.3566127             3.985806
1YCQ        11    33       5.9804459            24.322502
1EG4        13    33       7.1259413            18.821020
1IO6        10    59       2.7675009             5.870293
1CKA         9    66       5.8080659            59.454960
1ELW         8    28       4.2639108            23.193996
2SEB        12   219      10.671972             31.363981
1CE1         8    85      19.134851             28.343372
1PAU         4    52      17.125254             52.701431
1EVH         5    47      11.544429             15.931898
1BC5         5    98      18.764696             37.939034
As the table shows, for 19 of the 23 pairs the method locates the peptide docking position with an r.m.s.d. of at most 10.7 Å (attained for 2SEB), meaning that the region of correct binding has been located by the method. It is therefore reasonable to expect that the computed position can be further refined towards the exact position, by allowing for receptor flexibility and applying, for instance, a previously proposed tool (e.g., DynaDock [3], AutoDock [11], FDS [2]); this is the subject of ongoing work.
5. Conclusions
In this paper we have presented a global optimization method, based on a Gaussian filling function, applied to the protein-peptide docking problem. The problem is defined as that of minimizing the potential energy function (1) subject to constraints necessary to preserve the peptide primary sequence. The preliminary numerical results obtained, in terms of r.m.s.d. values, show the viability of the proposed approach.
We are currently trying to improve the approach by
considering more accurate potential energy functions.
Moreover, we are attempting to exploit to a greater extent
the chemical-physical properties of the atoms composing
the peptide and the given protein, and to use some
refinement tools for improving the computed final docking
position.
REFERENCES
[1] F. Lampariello, "A filling function method for continuous unconstrained global optimization: application to Morse clusters," IASI-CNR, Tech. Rep. R.615, 2004.
[2] R. D. Taylor, P. J. Jewsbury, and J. W. Essex, "FDS: Flexible ligand and receptor docking with a continuum solvent model and soft-core energy function," Journal of Computational Chemistry, vol. 24, pp. 1637–1656, 2003.
[3] I. Antes, "DynaDock: A new molecular dynamics-based algorithm for protein-peptide docking including receptor flexibility," Proteins: Structure, Function and Bioinformatics, vol. 78, pp. 1084–1104, 2010.
[4] J. Apostolakis, A. Pluckthun, and A. Caflisch, “Docking
small ligands in flexible binding sites,” Journal of
Computational Chemistry, vol. 19, pp. 21–37, 1998.
[5] D. Bertsekas, Nonlinear Programming. Massachusetts,
USA: Athena Scientific, 1999.
[6] R. Andreani, E. Birgin, J. Martinez, and M. Schuverdt, "On augmented Lagrangian methods with general lower-level constraints," SIAM Journal on Optimization, vol. 18, pp. 1286–1309, 2007.
[7] ——, "Augmented Lagrangian methods under the constant positive linear dependence constraint qualification," Mathematical Programming, vol. 111, pp. 5–32, 2008.
[8] D. Duhovny, R. Nussinov, and H. Wolfson, "Efficient unbound docking of rigid molecules," in Proceedings of the 2nd Workshop on Algorithms in Bioinformatics (WABI), Rome, Italy, ser. Lecture Notes in Computer Science 2452, Gusfield et al., Ed. Springer Verlag, 2002, pp. 185–200.
[9] D. Schneidman-Duhovny, Y. Inbar, R. Nussinov, and H. Wolfson, "PatchDock and SymmDock: servers for rigid and symmetric docking," Nucleic Acids Research, vol. 33, pp. 363–367, 2005.
[10] H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig, I. Shindyalov, and P. Bourne, "The Protein Data Bank," Nucleic Acids Research, vol. 28, pp. 235–242, 2000.
[11] G. Morris, D. Goodsell, R. Halliday, R. Huey, W. Hart, R. Belew, and A. Olson, "Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function," Journal of Computational Chemistry, vol. 19, pp. 1639–1662, 1998.