J. Biomedical Science and Engineering, 2008, 1, 133-140
Published Online August 2008 in SciRes (http://www.srpublishing.org/journal/jbise).
Visualization of protein structure relationships using constrained twin kernel embedding
Yi Guo1, Jun-Bin Gao2, Paul Wing Hing Kwan1 & Kevin Xinsheng Hou3
1School of Science and Technology, University of New England, Armidale, NSW 2351, Australia. 2School of Computer Science, Charles Sturt University, Bathurst, NSW 2795, Australia. 3School of Biological, Biomedical and Molecular Sciences, University of New England, Armidale, NSW 2351, Australia.
Correspondence should be addressed to Yi Guo (yguo4@turing.une.edu.au), Jun-Bin Gao (jbgao@csu.edu.au), Paul W. Kwan (kwan@turing.une.edu.au), and Kevin X. Hou (xhou@une.edu.au).
ABSTRACT
In this paper, a recently proposed dimensional-
ity reduction method called Twin Kernel Em-
bedding (TKE) [10] is applied in 2-dimensional
visualization of protein structure relationships.
By matching the similarity measures of the in-
put and the embedding spaces expressed by
their respective kernels, TKE ensures that both
local and global proximity information are pre-
served simultaneously. Experiments conducted
on a subset of the Structural Classification of Proteins (SCOP) database confirmed the effec-
tiveness of TKE in preserving the original rela-
tionships among protein structures in the lower
dimensional embedding according to their simi-
larities. This result is expected to benefit sub-
sequent analyses of protein structures and their
functions.
1. INTRODUCTION
Recent years have seen phenomenal advances in dimen-
sionality reduction (DR) methods that are widely applied
in bioinformatics [26, 14, 6, 16], biometrics [19, 4, 15,
17], robotics [11], etc. The rapid growth of DR methods
stems from the need to reduce the complexity of the
problem at hand. The target of these methods is mainly
to find the corresponding counterparts of the input data
in a much lower dimensional space without incurring
significant information loss. The low dimensional repre-
sentation can be used in subsequent procedures such as
classification, pattern recognition, and so on. For exam-
ple, a medium sized protein structure typically has a few thousand degrees of freedom and therefore naturally resides in a very high dimensional space. This causes the so-called "curse of dimensionality" problem, which not only drastically increases the computational complexity of the learning algorithms but also requires large storage space, in turn leading to very slow indexing and searching in a large scale database. Through DR, most of the redundant dimensions can be removed, and the remaining degrees of freedom are then input to subsequent discriminant and classification tasks with considerable simplification.
Another advantage of using DR is that the 2- or 3-dimensional mappings of the original data can be visualized in a Euclidean space, which helps researchers interpret the relationships among the data. Normally, the relationships among a set of protein structures are represented in the form of trees derived by hierarchical clustering. However, this representation only provides some hints about the evolutionary distances between protein structures. This limitation motivates our application of DR methods to visualizing the similarity relationships among protein structures.
In this paper, we will propose a new DR algorithm
called Constrained Twin Kernel Embedding (CTKE)
based on Twin Kernel Embedding (TKE) [10] to achieve
the above target. To provide the necessary background
knowledge, we will first give a brief review of DR meth-
ods on protein structures in the next section, followed by
an introduction of TKE and related topics involved in
this new algorithm. Then the CTKE algorithm will be
introduced, which integrates the similarity matching by
TKE with the objective function used in LS-SVM [23].
Experiments on real protein structure data will be pre-
sented and finally we summarize this paper with a con-
clusion.
2. RELATED WORK
DR methods can be categorized into Linear DR methods
(LDR) such as Principal Component Analysis (PCA)
[13], Linear Discriminant Analysis (LDA) [7] and
NonLinear DR methods (NLDR) such as ISOMAP [25],
Laplacian Eigenmaps (LE) [3], Locally Linear Embed-
ding (LLE) [20], etc. LDR methods have been widely
used in bioinformatics due to their simplicity. For exam-
ple, Teodoro et al. [27] applied PCA to transform the
original high dimensional protein motion data into a
lower dimensional representation that captures the domi-
nant modes of motions of the protein. However, the line-
arity assumption on which linear methods are con-
structed does not hold in most cases. In [5], Das et al.
successfully identified the transition-state ensemble of a protein folding reaction by projecting the folding free energy onto a few relevant coordinates using a typical nonlinear method, ISOMAP, exploiting the fact that the empirical reaction coordinates routinely used in protein folding studies cannot be reduced to a linear combination of the Cartesian coordinates.
Because of the power of the NLDR methods, they
have also been applied to visualizing the relationships
between protein structures. Hanke and Reich [12] em-
ployed the Kohonen self-organizing map, a special form
of neural network, as a visualization tool for the analy-
ses of protein structure similarity by converting the se-
quences into a characteristic signal matrix. In [1], by
using the pairwise similarity index between two se-
quences, Sammon maps projected the sequences onto a
display plane in such a way that the Euclidean distances
between the images approximate as closely as possible
the corresponding values in the original sequence space.
The metric used to measure the similarity between two
protein structures was based on the individual residue
similarities derived from a series of amino acid exchange
matrices. Further, a modified nonlinear Sammon projec-
tion was developed in [2] to display the relationships
among protein structures based on their amino acid com-
position.
Recently, a new method called Stochastic Proximity
Embedding (SPE) was introduced into this field by Far-
num et al. in [6]. SPE preserves only the local relationships among closely related sequences, thereby avoiding a drawback of global methods such as MDS, which underestimate the proximity of sequences because all pairwise distances enter the algorithm, leading to erroneous results. To emphasize the proximity, SPE first
applies a neighborhood filtering procedure to the similar-
ity matrix (the similarity metric used in SPE is identical
to that used in [1]). Then the filtered similarity matrix is
input into the Sammon’s nonlinear mapping to obtain the
final result.
From the discussion above, we can clearly see that
there are three important components in DR:
dis/similarity metric for the input data (protein structures
in this paper), the objective function (the core of the al-
gorithm) and the dis/similarity metric for images (the
corresponding low dimensional representation of the
protein structures) which is usually the Euclidean dis-
tance. These DR methods try to preserve the simi-
larity metric of the protein structures as much as possible
and reproduce it in a human interpretable space. The
dis/similarity metrics used in those methods mentioned
above are based on proteins and their structure-related
evaluators while the objective functions are from
Sammon’s mapping.
In computational biology, similarity metrics known as kernel functions, which are both powerful and promising, are gaining much attention. An important advantage of a kernel function is that the input data do not have to be in vectorial form: any structured data, such as protein structures, can be properly processed by specially designed kernel functions, which avoids the information loss incurred during vectorization. TKE is constructed on the basis of this kind of similarity metric. Its objective function is totally different from Sammon's mapping and, furthermore, the dis/similarity metric for the images is not limited to the simple Euclidean distance but can be a kernel function that captures the nonlinearity.
3. TWIN KERNEL EMBEDDING
Without loss of generality, the following notation will be adopted. The data in the input space are denoted by y_i (i = 1, ..., N), while x_i (i = 1, ..., N) are their embeddings in a low dimensional space, the so-called latent space. Note that here the y_i will be the protein structures. The term "embeddings" comes from the manifold learning literature and means the images of the input data, or equivalently the embedded data. In addition, Y and X will be used to denote respectively the set of input objects and the set of embedded objects. If the objects were vectorial, Y (and X) would denote a matrix consisting of rows of vectors. Furthermore, \langle a, b \rangle denotes the inner product of two vectors a and b.
Twin Kernel Embedding (TKE) preserves the similarity structure of the input data in the latent space by matching the similarity relations represented by two kernel Gram matrices, one for the input data and the other for their embeddings, by simply minimizing the objective function

L = -\mathrm{Vec}(K_y)^{T}\,\mathrm{Vec}(K_x),   (1)

where Vec is the vec operator on a matrix (stacking all the columns of the matrix into one long vector), and K_y and K_x are the kernel Gram matrices derived from valid Mercer kernel functions k_y(·,·) and k_x(·,·) [21] defined on the input data and the embeddings respectively. The idea is to preserve the similarities among the input data and reproduce them in the lower dimensional latent space, expressed again as similarities among the embeddings. To make this point clearer, we can simply regard \mathrm{Vec}(K_y)^{T}\mathrm{Vec}(K_x) as a linear kernel (a linear kernel is defined as k(a, b) = \langle a, b \rangle), which is a measure of the similarity of the variables involved in the kernel function: the larger the value of the kernel, the more similar the two variables are. As a result, we minimize (1) to make K_x and K_y as similar as possible.
To avoid any trivial solutions to (1), two regularization terms on the kernel and the embeddings are introduced, and the objective function in (1) becomes

L = -\mathrm{tr}(K_y K_x) + \lambda_k\,\mathrm{tr}(K_x K_x) + \lambda_x\,\mathrm{tr}(X X^{T}),   (2)

where we use the fact that \mathrm{Vec}(K_y)^{T}\mathrm{Vec}(K_x) = \mathrm{tr}(K_y K_x). The second term is a ridge regularizer on the kernel which makes sure that the norm of the kernel is controlled; this avoids solutions that simply let the elements in K_x go to infinity. The third term imposes a heavy penalty on too large a norm of the embeddings, which ensures that their coordinates are relatively small. \lambda_k and \lambda_x are tunable parameters that control the strength of the regularization and are assumed to be positive.
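As a concrete illustration of (2), the regularized objective can be evaluated directly from the two Gram matrices and the embedding coordinates. Below is a minimal NumPy sketch; the function name and argument layout are our own illustration, not part of the original formulation.

```python
import numpy as np

def tke_objective(K_y, K_x, X, lam_k, lam_x):
    """Regularized TKE objective, eq. (2):
    L = -tr(K_y K_x) + lam_k * tr(K_x K_x) + lam_x * tr(X X^T)."""
    return (-np.trace(K_y @ K_x)
            + lam_k * np.trace(K_x @ K_x)
            + lam_x * np.trace(X @ X.T))
```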
In order to capture the nonlinear structure, k_x(·,·) should be chosen to be nonlinear. Normally, we use the RBF kernel

k_x(x_i, x_j) = \gamma \exp\left(-\sigma \|x_i - x_j\|^{2}\right),   (3)

where \gamma and \sigma are positive hyperparameters. Nonetheless, other kernels can also be applied here. The RBF kernel is selected for its connection to the Euclidean distance, which gives a geometric understanding of the embeddings. There is no closed form solution for X, and hence an optimization procedure such as a gradient descent based algorithm should be employed, provided that k_x(·,·) is differentiable. An initialization of X is also required to start the optimization process; KLE [9], KPCA [22] and other methods that can work with kernels can be utilized for initializing X. The dimension of the embeddings, which is normally 2, is assigned according to the application. A by-product of this optimization process is that we also obtain the optimal hyperparameters of the kernel function k_x(·,·) (such as \gamma and \sigma if the RBF kernel is used), which ensures that the kernel we pick is well adjusted.
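A sketch of the latent-space RBF kernel (3) and of a KPCA-style initialization of X from a precomputed input Gram matrix K_y follows. The helper names, the default hyperparameter values and the centring step are our assumptions for illustration.

```python
import numpy as np

def rbf_gram(X, gamma=1.0, sigma=1.0):
    """Latent-space RBF Gram matrix, eq. (3): gamma * exp(-sigma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return gamma * np.exp(-sigma * d2)

def kpca_init(K_y, dim=2):
    """Initialize X by kernel PCA on the centred input Gram matrix K_y."""
    N = K_y.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N           # centring matrix
    vals, vecs = np.linalg.eigh(H @ K_y @ H)      # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:dim]            # leading components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 1e-12))
```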
TKE is designed to preserve locality and non-locality at the same time. This is done by filtering the entries in K_y: not all entries keep their values in the optimization process, but only those that convey most of the similarity information of the input data. This filtering process is fulfilled by performing a k-nearest neighbor selection procedure on K_y. Given an object y_i in the input space, only those objects whose similarities (in the sense of kernel values) to y_i are among its k nearest neighbors retain their original values, while all others are set to 0. The variable k (> 1) in the k-nearest neighboring controls the neighborhood that the algorithm will preserve. It can be interpreted as constructing a weighted adjacency graph in feature space whose edge weights are evaluated by a kernel, as in KLE. A similar procedure is involved in SPE, which uses a neighborhood radius to ensure the proximity preserving property. Because TKE tries to match K_x to K_y, and the RBF kernel (3) cannot give a value of 0 unless two points in the latent space are very far apart, TKE seeks a solution that keeps the points in the same neighborhood close while pushing points not in the same neighborhood far apart. However, TKE also works without filtering, in which case it preserves all pairwise similarities and simply becomes a global approach.
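The neighborhood filtering of K_y described above can be sketched as follows: in each row, only the k largest kernel values (the k nearest neighbors in the kernel sense, typically including the object itself) are retained and all other entries are zeroed. The final symmetrization step is our assumption about how asymmetric neighborhoods are reconciled.

```python
import numpy as np

def knn_filter(K_y, k=10):
    """Keep, for each object, only the similarities to its k nearest neighbors
    (largest kernel values); all other entries of K_y are set to 0."""
    F = np.zeros_like(K_y)
    for i in range(K_y.shape[0]):
        nbrs = np.argsort(K_y[i])[::-1][:k]       # indices of the k most similar objects
        F[i, nbrs] = K_y[i, nbrs]
    return np.maximum(F, F.T)                     # keep the filtered matrix symmetric
```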
In addition to the fact that TKE outperforms other methods such as KPCA and KLE, an elegant feature of TKE is that it uses only the pairwise similarities, since its objective function requires only the kernel Gram matrix of the input data. Through TKE, any kind of data can be visualized in a lower dimensional space as long as an appropriate kernel is defined for it. As a result, a kernel Gram matrix K_y on the input data and an initialization for X are adequate for TKE to find the optimal embeddings.
4. CONSTRAINED TWIN KERNEL EMBEDDING
It is clear from observing (2) that, apart from similarity preservation, there is no explicit connection between the input data and their corresponding embeddings in TKE. As such, information hidden in the input data is neglected. For example, if the input data actually stay on or near a smooth manifold embedded in the ambient space, the location of the input data can be exploited to predict the coordinates of the embeddings on the manifold. Furthermore, as its objective function shows, TKE can only find the optimal embeddings for the currently presented data. To address these problems, we introduce constraints reflecting the relationship between the input data and the embeddings into TKE by the following steps. We first define a mapping function f: Y → X and then incorporate it into the objective functional of TKE as regularization terms. Finally, the optimal embeddings X and the mapping f are searched for via a conjugate gradient algorithm.
We can start from the kernel feature mapping directly and incorporate it, as the core part of LS-SVM (the dual form of the objective function of LS-SVM), into TKE, as has been done in [24]. We minimize the following objective function, similar to that of LS-SVM with equality constraints,

J = \sum_{j=1}^{d} \left( \frac{\upsilon}{2} w_j^{T} w_j + \frac{\eta}{2} \sum_{i} e_{ij}^{2} \right)   (4)

\text{s.t. } x_{ij} = w_j^{T} \varphi_j(y_i) + e_{ij},   (5)

corresponding to the maximum margin (the first term) and the least square errors (the second term) in (4), where \upsilon and \eta are adjustable parameters. The \varphi_j's are the feature mappings which map the input y_i into a Hilbert space where the inner product is defined, and w_j is a column vector with the same dimension as the Hilbert space. We can regard x_i in (5) as the projection of \varphi(y_i) onto a subspace parameterized by the w_j's. Because the constraints are different, this formulation does not have the same geometrical interpretation as LS-SVM: the original constraints are inequalities reflecting the correct classification hyperplanes, while here they are just feature mappings which build the relation between the input datum y_i and its counterpart x_i in the low dimensional space. In (4), we are essentially minimizing the error e_{ij} in the reconstruction of x_i, but this should be done with the w_j properly constrained; this happens to have the form of the LS-SVM objective function. To solve (4) with the equality constraints (5), we can use Lagrange multipliers:

L = \sum_{j=1}^{d} \frac{\upsilon}{2} w_j^{T} w_j + \frac{\eta}{2} \sum_{ij} e_{ij}^{2} + \sum_{ij} \alpha_{ij} \left( x_{ij} - w_j^{T} \varphi_j(y_i) - e_{ij} \right).   (6)
From the saddle point conditions \partial L / \partial w_j = 0 and \partial L / \partial e_{ij} = 0 we have

\frac{\partial L}{\partial w_j} = \upsilon w_j - \sum_{i} \alpha_{ij} \varphi_j(y_i) = 0 \;\Rightarrow\; w_j = \frac{1}{\upsilon} \sum_{i} \alpha_{ij} \varphi_j(y_i),

\frac{\partial L}{\partial e_{ij}} = \eta e_{ij} - \alpha_{ij} = 0 \;\Rightarrow\; e_{ij} = \frac{1}{\eta} \alpha_{ij}.
Substituting these back into (6) and eliminating w_j and e_{ij}, we obtain the dual problem to be maximized according to the min-max duality:

L = \sum_{j=1}^{d} \left[ \frac{\upsilon}{2} \Big\| \frac{1}{\upsilon} \sum_{i} \alpha_{ij} \varphi_j(y_i) \Big\|^{2} + \frac{\eta}{2} \sum_{i} \Big( \frac{1}{\eta} \alpha_{ij} \Big)^{2} + \sum_{i} \alpha_{ij} \Big( x_{ij} - \frac{1}{\upsilon} \sum_{m} \alpha_{mj} \varphi_j(y_m)^{T} \varphi_j(y_i) - \frac{1}{\eta} \alpha_{ij} \Big) \right]
  = -\sum_{j} \frac{1}{2\upsilon} \alpha_j^{T} K_j \alpha_j - \frac{1}{2\eta} \sum_{ij} \alpha_{ij}^{2} + \sum_{ij} \alpha_{ij} x_{ij},   (7)

where \alpha_j = (\alpha_{1j}, \alpha_{2j}, \ldots)^{T} and \varphi_j(y_m)^{T} \varphi_j(y_i) = k_j(y_m, y_i), to which the "kernel trick" applies. k_j(·,·) is the kernel associated with the feature mapping \varphi_j, and K_j is the Gram matrix derived from k_j(·,·) accordingly. We then maximize the above dual problem with respect to \alpha_{ij} instead of w_j and e_{ij}. From the discussion above we see that the x_{ij}'s are free variables. To limit the choice of them, we combine (7) with the objective functional of TKE to incorporate the similarity preserving as
L = \sum_{i,j} \left[ -k_y(y_i, y_j)\, k_x(x_i, x_j) + \lambda_k\, k_x(x_i, x_j)^{2} \right] + \lambda_x \sum_{i} x_i^{T} x_i + \sum_{j} \frac{1}{2\upsilon} \alpha_j^{T} K_j \alpha_j + \frac{1}{2\eta} \sum_{ij} \alpha_{ij}^{2} - \sum_{ij} \alpha_{ij} x_{ij},   (8)

where we turn the maximization of (7) into a minimization aligned with the TKE objective functional. Here we see that the terms related to y_i are expressed by the kernel k_y(·,·); therefore this revised objective function is still applicable to non-vectorial data. Again, we express (8) in matrix form to facilitate the differentiation and let k_y(·,·) = k_j(·,·) for simplicity:
L = -\mathrm{tr}[K_x K_y] + \lambda_k\, \mathrm{tr}[K_x K_x] + \lambda_x\, \mathrm{tr}[X X^{T}] + \frac{1}{2\upsilon} \mathrm{tr}[A^{T} K_y A] + \frac{1}{2\eta} \mathrm{tr}[A^{T} A] - \mathrm{tr}[A^{T} X],   (9)

where

A = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1d} \\ \vdots & & \vdots \\ \alpha_{N1} & \cdots & \alpha_{Nd} \end{pmatrix}.
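In matrix form the objective (9) is straightforward to evaluate. A minimal NumPy sketch follows (the function and argument names are ours):

```python
import numpy as np

def ctke_objective(K_y, K_x, X, A, lam_k, lam_x, upsilon, eta):
    """CTKE objective in matrix form, eq. (9)."""
    return (-np.trace(K_x @ K_y)
            + lam_k * np.trace(K_x @ K_x)
            + lam_x * np.trace(X @ X.T)
            + np.trace(A.T @ K_y @ A) / (2.0 * upsilon)
            + np.trace(A.T @ A) / (2.0 * eta)
            - np.trace(A.T @ X))
```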
Hence we minimize L in (8) with respect to A, X and the kernel hyperparameters of k_x(·,·). If we substitute the saddle point solution back into the equality constraints (5), the following mapping function is handy for predicting new input samples:

x_{ij} = \frac{1}{\upsilon} \sum_{m} \alpha_{mj}\, k_y(y_m, y_i) + \frac{1}{\eta} \alpha_{ij}.   (10)

The error terms \frac{1}{\eta}\alpha_{ij} can be neglected since their values are very small compared with X after optimization.
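Dropping that small error term, eq. (10) lets the embedding of a previously unseen object be computed from its kernel values against the training objects alone. A hedged sketch (our naming and matrix layout):

```python
import numpy as np

def predict_embedding(K_new, A, upsilon):
    """Out-of-sample prediction from eq. (10), neglecting the (1/eta)*alpha term.
    K_new: (M, N) matrix with K_new[p, m] = k_y(y_new_p, y_m) for training objects y_m.
    A:     (N, d) matrix of Lagrange multipliers alpha_{mj}."""
    return (K_new @ A) / upsilon
```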
To apply the conjugate gradient algorithm, the derivatives of L with respect to X and A are given by

\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K_x} \frac{\partial K_x}{\partial X} - A, \qquad \frac{\partial L}{\partial K_x} = 2\lambda_k K_x - K_y,   (11)

\frac{\partial L}{\partial A} = \frac{1}{\upsilon} K_y A + \frac{1}{\eta} A - X.   (12)
X is still initialized by KPCA or KLE to retain pure non-vectorial data applicability. The initial A is obtained from the solution of \partial L / \partial A = 0 once X is known, which gives

A = \left( \frac{1}{\upsilon} K_y + \frac{1}{\eta} I \right)^{-1} X.

This implies that we could alternately update X and A in the optimization; however, we still use the conjugate gradient algorithm to update them at the same time. Because the mapping function defined between the input space and the latent space acts as a set of constraints, we call this algorithm Constrained TKE (CTKE). It is noticeable that in CTKE we use K_y to construct the mapping function, so we do not have to filter K_y before commencing the algorithm.
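The A-related quantities above are available in closed form. A minimal NumPy sketch of the gradient (12) and of the initialization obtained from \partial L / \partial A = 0 is given below; the function names are our own.

```python
import numpy as np

def grad_A(K_y, A, X, upsilon, eta):
    """Gradient of the CTKE objective with respect to A, eq. (12)."""
    return K_y @ A / upsilon + A / eta - X

def init_A(K_y, X, upsilon, eta):
    """Initial A obtained by setting dL/dA = 0 with X fixed:
    A = ((1/upsilon) K_y + (1/eta) I)^(-1) X."""
    N = K_y.shape[0]
    return np.linalg.solve(K_y / upsilon + np.eye(N) / eta, X)
```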
5. EXPERIMENTAL RESULTS
Experiments were conducted on the SCOP (Structural
Classification Of Protein). This database is available at
http://scop.mrc-lmb.cam.ac.uk/scop/. The database
comes with a detailed and comprehensive description of
the structural and evolutionary relationships of the pro-
teins of known structure. 292 protein structures from
different protein superfamilies and families are extracted
for the test. The kernels we used are from the family of
the so-called alignment kernels whose thorough analyses
can be found in [18]. The corresponding kernel Gram
matrices are available on the website of the paper as
supplements and were used directly in our experiments.
5.1 Parameters Setting
Both CTKE and TKE have parameters to be determined
beforehand. Through empirical analyses (performed a set
Yi Guo et al. / J. Biomedical Science and Engineering 1 (2008) 133-140 137
SciRes Copyright © 2008 JBiSE
of experiments on the same data set varying only the
parameters), we found these algorithms are not sensitive
to the choice of the parameters, so long as the conjugate
gradient optimization can be carried out without prema-
ture early stop. So we use the following parameters in
the 10k= in k nearest neighboring; for CTKE,
0.05
k
λ
=, experiments. For TKE, 0.005
k
λ
=
,
0.001
x
λ
=
and 0.01
x
λ
=
,0.5
υ
η
== . The minimize
tion will stop after 1000 iterations or when consecutive
update of the objective function is less than 7
10-.(,)
x
k
is the RBF kernel and initialization is done by KPCA for
both CTKE and TKE.
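For illustration, a joint optimization of X and A under these settings could be driven by SciPy's conjugate gradient routine with the stated iteration cap and tolerance. The sketch below reuses the helper functions assumed in the earlier sketches (rbf_gram, kpca_init, init_A, ctke_objective) and relies on finite-difference gradients for brevity, which is only practical at toy scale; it is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def run_ctke(K_y, dim=2, lam_k=0.05, lam_x=0.01, upsilon=0.5, eta=0.5):
    """Toy-scale CTKE driver using the parameter values reported in Section 5.1."""
    N = K_y.shape[0]
    X0 = kpca_init(K_y, dim)                    # initialize X by KPCA
    A0 = init_A(K_y, X0, upsilon, eta)          # initialize A from dL/dA = 0

    def unpack(z):
        return z[:N * dim].reshape(N, dim), z[N * dim:].reshape(N, dim)

    def objective(z):
        X, A = unpack(z)
        K_x = rbf_gram(X)                       # latent-space RBF kernel, eq. (3)
        return ctke_objective(K_y, K_x, X, A, lam_k, lam_x, upsilon, eta)

    z0 = np.concatenate([X0.ravel(), A0.ravel()])
    res = minimize(objective, z0, method='CG',  # conjugate gradient
                   tol=1e-7, options={'maxiter': 1000})
    return unpack(res.x)
```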
Figure 1. Visualization results of protein structures: (a) CTKE with the MAMMOTH kernel; (b) TKE with the MAMMOTH kernel.
5.2 Proteins on 2-D Plane
The proteins from the same families are expected to be
close in the 2-dimensional space with overlappings indi-
cating similar proteins but actually from different fami-
lies. The results are plotted in Figure 1. The results of
other methods (kernel applicable methods) are also presented for comparison in Figure 2. Each point (denoted as a shape in the figures) represents a protein. The same shapes with the same colors are proteins from the same families, while the same shapes with different colors represent proteins from different families but of the same
superfamilies. All the figures in this paper share the
same legends as those in Figure 1 (b).
Both TKE and CTKE reveal the fact that proteins
from the same families congregate together as clusters
while KPCA and KLE fail to reveal it. For example, in
Figure 1 (a) and (b), almost all of the proteins from the
globin family gathered at the bottom left corner indicat-
ing similar structures. Interestingly, CTKE and TKE also reveal that proteins from the same superfamily but different families are similar in structure, which is reflected on the 2-dimensional plane by the corresponding groups (families) being close if they belong to the same superfamily. For instance, the proteins from the ferritin and ribonucleotide reductase-like families (blue diamonds and violet diamonds respectively in the figure) are close at the group level. SPE has com-
parable performance visually. But the overlapping in the
middle shows that it cannot distinguish some families
clearly.
In order to further quantify the results, we use the "purity" [8] to evaluate some of these methods. For each point, purity takes the fraction of samples in a neighborhood of size n that belong to the same class as that point, and then averages this fraction over all points. The higher the purity, the better the quality of the clusters. Intuitively, this standard provides an objective judgement from the classification point of view, and the method with higher purity has the potential to achieve a better classification rate.

Figure 2. Results of other methods with the MAMMOTH kernel: (a) KPCA; (b) KLE (k = 50); (c) SPE (global); (d) SPE (local, k = 100). Like TKE, SPE can work globally or locally depending on whether the kernel Gram matrix is first filtered by k-nearest neighboring; SPE (global) worked with the whole kernel Gram matrix, while SPE (local) filtered K_y with 100-nearest neighboring in this experiment.

Figure 3. Comparison of different methods using purity.

It is noticeable that for SPE and KLE with the parameter k, we did multiple experiments to choose the optimal
value of k corresponding to the largest purity when the
size of neighborhood is 1. We found that for SPE the
larger the k, the better the result. Specifically, when k
equals the number of the data, i.e. no filtering at all, SPE
turns out to be a nonlinear MDS. As we can see from
Figure 3, CTKE and TKE have higher purity than others.
The average purity of CTKE is 0.5946 and TKE 0.6135.
It shows that CTKE has very close performance to TKE.
However, the advantage of the CTKE is its ability to
predict novel samples because of the mapping function
defined explicitly between input space and latent space.
This mechanism broadens the applicability of this algo-
rithm greatly to classification, identification etc. It also
provides a tool to explore the manifold formed by data.
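For reference, the purity measure described above can be sketched as follows: for each point, take the fraction of its n nearest neighbors in the embedding (in the Euclidean sense) that share its class label, then average over all points. The exclusion of the point itself and the tie handling are our assumptions.

```python
import numpy as np

def purity(X, labels, n=1):
    """Average, over all points, of the fraction of each point's n nearest
    neighbors (excluding itself) that belong to the same class as that point."""
    labels = np.asarray(labels)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # exclude the point itself
    scores = [np.mean(labels[np.argsort(d2[i])[:n]] == labels[i])
              for i in range(X.shape[0])]
    return float(np.mean(scores))
```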
6. CONCLUSIONS
In this paper, we visualized the similarity relationships
among protein structures using constrained Twin Kernel
Embedding (CTKE) which is constructed on the original
TKE [10]. It has comparable performance to that of TKE
but possesses the ability to predict the embeddings for
novel samples. Because CTKE implements a mapping
function from input space to latent space, the information
among the input data is further exploited. CTKE also has the similarity preserving property of TKE and is applicable to purely non-vectorial data, since the mapping function comes from the feature mapping and is eventually expressed as a kernel function. Moreover, there is no k-nearest neighbor filtering in CTKE, which avoids choosing another parameter that is common across other DR algorithms. From the experiments on proteins, we have
seen that CTKE preserves the similarity relationships
among the protein structures and reproduces them in a
much lower dimensional space, allowing easy interpreta-
tion by researchers. This algorithm is promising as it can
be further applied to the study of the evolution of protein
structure and the prediction of protein functions.
ACKNOWLEDGEMENTS
This work is supported by the National Science Foundation of China
(NSFC 60373090), the ARC DP Development Grant from Charles Sturt
University and the University Research Grant from the University of
New England.
REFERENCES
[1] Dimitris K. Agrafiotis. (1997) A new method for analyzing protein
sequence relationships based on Sammon maps. Protein Science,
6(2):287–293.
[2] Izydor Apostol and Wojciech Szpankowski. (1999) Indexing and
mapping of proteins using a modified nonlinear Sammon projection.
Journal of Computational Chemistry, 20(10):1049–1059.
[3] Mikhail Belkin and Partha Niyogi. (2003) Laplacian eigenmaps for
dimensionality reduction and data representation. Neural Computa-
tion, 15(6):1373–1396.
[4] Samarasena Buchala, Neil Davey, Ray J. Frank, and Tim M. Gale.
(2004) Dimensionality reduction of face images for gender classifi-
cation. In Proceedings of 2nd International IEEE Conference on In-
telligent Systems, volume 1, pages 88–93.
[5] Payel Das, Mark Moll, Hernán Stamati, Lydia E. Kavraki, and Cecilia Clementi. (2006) Low-dimensional free energy landscapes of protein folding reactions by nonlinear dimensionality reduction. In Proceedings of the National Academy of Sciences, volume 103, pages 9885–9890, USA.
[6] Michael A. Farnum, Huafeng Xu, and Dimitris K. Agrafiotis. (2003) Exploring the nonlinear geometry of protein homology. Protein Science, 12:1604–1612.
[7] Ronald A. Fisher. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188.
[8] Amir Globerson, Gal Chechik, Fernando Pereira, and Naftali Tishby. (2005) Euclidean embedding of co-occurrence data. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 497–504. MIT Press, Cambridge, MA.
[9] Yi Guo, Junbin Gao, and Paul W. Kwan. (2006) Kernel Laplacian eigenmaps for visualization of non-vectorial data. In Lecture Notes in Artificial Intelligence, volume 4304, pages 1179–1183.
[10] Yi Guo, Junbin Gao, and Paul W. Kwan. (2008) Twin kernel embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence, submitted.
[11] Jihun Ham, Yuanqing Lin, and Daniel D. Lee. (2005) Learning nonlinear appearance manifolds for robot localization. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2971–2976.
[12] Jens Hanke and Jens G. Reich. (1996) Kohonen map as a visuali-
zation tool for the analysis of protein sequences: multiple align-
ments, domains and segments of secondary structures. Bioinfor-
matics, 12(6):447–454.
[13] I. T. Jolliffe. (1986) Principal Component Analysis. Springer-Verlag, New York.
[14] Philip M. Kim and Bruce Tidor. (2003) Subsystem identification
through dimensionality reduction of large-scale gene expression
data. Genome Research, 13:1706–1718.
[15] Nathan Mekuz, Christian Bauckhage, and John K. Tsotsos. (2005)
Face recognition with weighted locally linear embedding. In Pro-
ceedings of the 2nd Canadian Conference on Computer and Robot
Vision, pages 290–296.
[16] Oleg Okun, Helen Priisalu, and Alexessander Alves. (2005) Fast
non-negative dimensionality reduction for protein fold recognition.
In ECML, pages 665–672.
[17] Zhengjun Pan, Rod Adams, and Hamid Bolouri. (2000) Dimensionality reduction of face images using discrete cosine transforms for recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[18] Jian Qiu, Martial Hue, Asa Ben-Hur, Jean-Philippe Vert, and William Stafford Noble. (2007) An alignment kernel for protein structures. Bioinformatics.
[19] Bisser Raytchev, Ikushi Yoda, and Katsuhiko Sakaue. (2006) Multi-view face recognition by nonlinear dimensionality reduction and generalized linear models. In FGR '06: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 625–630, Washington, DC, USA.
[20] Sam T. Roweis and Lawrence K. Saul. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science, 290(22):2323–2326.
[21] B. Schölkopf and A. J. Smola. (2002) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, MA.
[22] Bernhard Schölkopf, Alexander J. Smola, and Klaus-Robert Müller. (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319.
[23] J. A. K. Suykens and J. Vandewalle. (1999) Least squares support vector machine classifiers. Neural Processing Letters, 9:293–300.
[24] Johan A. K. Suykens. (2007) Data visualization and dimensionality reduction using kernel maps with a reference point. Technical Report 07-22, K.U. Leuven, ESAT-SCD/SISTA.
[25] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. (2000) A global geometric framework for nonlinear dimensionality reduction. Science, 290(22):2319–2323.
[26] Miguel L. Teodoro, George N. Phillips Jr, and Lydia E. Kavraki. (2002) A dimensionality reduction approach to modeling protein flexibility. In International Conference on Computational Molecular Biology (RECOMB), pages 299–308.
[27] Miguel L. Teodoro, George N. Phillips Jr, and Lydia E. Kavraki. (2003) Understanding protein flexibility through dimensionality reduction. Journal of Computational Biology, 10(3-4):617–634.