Engineering, 2013, 5, 530-533
http://dx.doi.org/10.4236/eng.2013.510B109 Published Online October 2013 (http://www.scirp.org/journal/eng)
Copyright © 2013 SciRes. ENG
Fuzzy Cluster Analysis of Alzheimer’s Disease-Related
Gene Sequences*
Jing Yang1#, Jiarui Si2, Xiaoxuan Gu1, Ouyan Shi2#
1School of Public Health, Tianjin Medical University, Tianjin, China
2School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
Email: #yangjing@tijmu.edu.cn, #shiouy@tijmu.edu.cn
Received 2013
ABSTRACT
The objective of this paper is to analyze the relationships among gene sequences related to Alzheimer’s disease (AD). The paper also studies the genetic factors in the occurrence of Alzheimer’s disease, so as to provide more information for its prevention, clinical diagnosis and gene therapy. The Alzheimer’s disease-related gene sequences in the National Center for Biotechnology Information (NCBI) database were aligned pairwise, and the relationships measured among these sequences were identified and analyzed by fuzzy cluster analysis. The results of the fuzzy cluster analysis indicate that gene sequences within one group are consistently more closely related to each other than to sequences in another group.
Keywords: Alzheimer’s Disease; Gene; mRNA; Sequence; Alignment; Fuzzy Cluster
1. Introduction
Alzheimer’s disease is a degenerative neurological disorder characterized by neural loss and brain lesions associated with plaques containing large amounts of the beta/A4 amyloid peptide. It has been identified in an emerging multigene family. As the number of people aged above 65 in China grows, so does the number of elderly people with Alzheimer’s disease. The cause of this disease is still not fully clear.
Mutations in presenilins are responsible for approximately 40% of all early-onset familial Alzheimer disease (FAD) cases in which a genetic cause has been identified [1]. Missense mutations in the genes encoding amyloid precursor protein (APP), presenilin-1 and presenilin-2 have been found to cause some forms of autosomal dominant early-onset Alzheimer disease. Autosomal dominant point mutations in the APP gene are associated with beta-amyloid peptide-related cerebral amyloid angiopathy and Alzheimer’s disease [2]. In general, a growing number of results are drawing researchers’ attention to the genetic factors in the occurrence of Alzheimer’s disease.
This paper studies further the genetic factors in the occurrence of Alzheimer’s disease and provides information about prevention, diagnosis and treatment at the genetic level. The method is fuzzy cluster analysis, which divides the Alzheimer’s disease-related gene sequences into groups such that similar data objects belong to the same cluster and dissimilar data objects belong to different clusters. The fuzzy clustering method is based on a measure of distance. Here the measure is the score of the pairwise alignment of the Alzheimer’s disease-related gene sequences in the NCBI database. Fuzzy cluster analysis can yield useful information on the intrinsic characteristics or properties of the data. The relationships measured among these sequences were identified and analyzed by fuzzy cluster analysis.
In this paper, we aim to provide the clusters (or groups) of Alzheimer’s disease-associated gene sequences obtained with different α-cuts (thresholds). From the point of view of fuzzy clustering, the gene sequences in one group are considered to have a close relationship and similar functions and characteristics. The results of the fuzzy cluster analysis may therefore serve as reference information in the clinical diagnosis and treatment of Alzheimer’s disease.
2. Materials and Methods
2.1. The Dataset
A search for the keyword “Alzheimers” in the Nucleotide database of the National Center for Biotechnology Information (NCBI) found 4 sequences in March 2007 and 8 sequences in May 2011. These 12 mRNA sequences, all related to Alzheimer’s disease, were selected for sequence alignment analysis. Their identifiers (accession numbers in the Nucleotide database) form the set M, i.e. M = {nm_005166; nm_001642; nm_001024807; nm_002704; nm_023959; nm_080478; nm_017522; nm_033300; nm_001018054; nm_004631; nm_001013018; nm_178003}.
*This work was supported by grant (30870791) from the National Natural Science Foundation of China and grant (2011KZ87) from the Scientific Research Foundation of Tianjin Bureau of Public Health.
#Corresponding authors.
2.2. The Data Score Matrix
Let the data set M be denoted by X. The set X consists of 12 data points: X = {x1, x2, ···, x12}, where x1 = nm_005166; x2 = nm_001642; ···; x12 = nm_178003.
We applied the Smith-Waterman algorithm, a member of the class of algorithms that can calculate the best alignment score, to the sequence set X, using the “EMBOSS Pairwise Alignment Algorithms” software (from: http://www.ebi.ac.uk/emboss/align/index.html/).
To find the best region of similarity between two sequences, we used the “Water” program. “Water” uses the Smith-Waterman algorithm (modified for speed enhancements) to calculate the local alignment. It finds an alignment with the maximum possible score, where the score of an alignment is equal to the sum of the matches taken from the scoring matrix.
The score matrix X of the pairwise sequence alignments is shown in Figure 1.
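The idea of a local alignment score can be illustrated with a minimal Smith-Waterman sketch. This is not the EMBOSS “Water” implementation: it assumes a simple match/mismatch scheme with a linear gap penalty (the values 2, -1 and -2 are illustrative assumptions), the function names smith_waterman_score and score_matrix are hypothetical, and the input strings are toy data rather than the mRNA sequences of set X.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def score_matrix(seqs):
    """Symmetric matrix of pairwise local alignment scores."""
    n = len(seqs)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = scores[j][i] = smith_waterman_score(seqs[i], seqs[j])
    return scores


# Toy sequences (assumed for illustration only).
toy = ["ACGTACGT", "TTACGTTT", "GGGGCCCC"]
for row in score_matrix(toy):
    print(row)
```

A symmetric matrix of this kind plays the role of the score matrix X in the next section; real scores would come from the alignment software and its own scoring parameters.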
2.3. The Fuzzy Clustering
In fuzzy clustering, the degree of belonging to a cluster is quantified by means of a membership function r(x, y). The value of r(x, y) lies in the interval [0, 1].
The fuzzy relation matrix R (shown in Figure 2) was obtained from the score matrix X above by the “Maximum and Minimum” method [3], as in (1).
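$$R = (r_{ij}), \quad r_{ij} = r(x_i, x_j) = \frac{\sum_{k=1}^{n} (x_{ik} \wedge x_{jk})}{\sum_{k=1}^{n} (x_{ik} \vee x_{jk})}, \quad (i, j = 1, 2, \ldots, n). \tag{1}$$
The symbol “∧” is defined by (2) and the symbol “∨” is defined by (3).
$$(x_i \wedge x_j) = \min(x_i, x_j) \tag{2}$$
$$(x_i \vee x_j) = \max(x_i, x_j) \tag{3}$$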
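As a rough illustration of (1), the sketch below builds a relation matrix from a small score matrix by dividing the sum of element-wise minima of two rows by the sum of their element-wise maxima. The function name fuzzy_relation and the 3 × 3 scores are assumptions made for the example; they are not the values of Figure 1.

```python
def fuzzy_relation(scores):
    """Max-min relation matrix: r_ij = sum_k min(x_ik, x_jk) / sum_k max(x_ik, x_jk)."""
    n = len(scores)
    rel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            num = sum(min(a, b) for a, b in zip(scores[i], scores[j]))
            den = sum(max(a, b) for a, b in zip(scores[i], scores[j]))
            rel[i][j] = num / den  # assumes positive scores, so den > 0
    return rel


# Toy symmetric score matrix (illustrative values only).
toy_scores = [
    [10, 4, 1],
    [4, 8, 2],
    [1, 2, 6],
]
for row in fuzzy_relation(toy_scores):
    print([round(v, 2) for v in row])
```

Because the minimum of two non-negative numbers never exceeds their maximum, every entry falls in [0, 1], and each diagonal entry equals 1.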
From the fuzzy relation matrix R (Figure 2), one can
see that:
The relation matrix R is a reflexive matrix.
The relation matrix R is a symmetric matrix.
The relation matrix R is not a max-min transitive matrix.
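The three properties listed above can be checked mechanically. The following sketch does so for a small relation matrix; the function names and the 3 × 3 matrix are hypothetical and chosen only so that the matrix is reflexive and symmetric but not max-min transitive, as is stated for R.

```python
def is_reflexive(r, tol=1e-9):
    """Every element is fully related to itself: r_ii = 1."""
    return all(abs(r[i][i] - 1.0) <= tol for i in range(len(r)))


def is_symmetric(r, tol=1e-9):
    """r_ij = r_ji for all i, j."""
    n = len(r)
    return all(abs(r[i][j] - r[j][i]) <= tol for i in range(n) for j in range(n))


def is_maxmin_transitive(r, tol=1e-9):
    """r_ij >= max over k of min(r_ik, r_kj) for all i, j."""
    n = len(r)
    return all(
        r[i][j] + tol >= max(min(r[i][k], r[k][j]) for k in range(n))
        for i in range(n)
        for j in range(n)
    )


# Toy relation matrix (illustrative values only).
toy_r = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
print(is_reflexive(toy_r), is_symmetric(toy_r), is_maxmin_transitive(toy_r))
```

In the toy matrix the pair (0, 2) has membership 0.3, yet the chain through element 1 gives min(0.8, 0.5) = 0.5, which is why the transitivity check fails.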
A fuzzy binary relation matrix that is reflexive, symmetric and transitive is known as a fuzzy equivalence relation matrix. A binary relation matrix that is reflexive and symmetric is called a compatibility relation matrix. The fuzzy relation matrix R (Figure 2) is a fuzzy compatibility relation matrix and not an equivalence relation matrix, so a transitive closure of the relation matrix R is necessary for fuzzy clustering. The transitive closure of the relation matrix R can be obtained by (4) and (5).
Fuzzy cluster analysis can be carried out on a fuzzy equivalent matrix. For the fuzzy relation matrix R (Figure 2), steps (4) and (5) above give the fuzzy equivalent matrix R6 (shown in Figure 3), which can be used in fuzzy clustering.
Figure 1. The score matrix X of the sequences’ pairwise alignment.
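$$R^{2} = R \circ R = \Big[\bigvee_{k=1}^{n} (x_{ik} \wedge x_{jk})\Big] \tag{4}$$
where x_{ik} and x_{jk} here denote entries of the matrix being composed.
$$R \to R^{2} \to R^{4} \to \cdots \to R^{2^{k}}, \quad \text{till } R^{2^{k}} \circ R^{2^{k}} = R^{2^{k}} \tag{5}$$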
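The squaring procedure of (4) and (5) can be sketched as follows: the relation is composed with itself under the max-min rule until the result stops changing. The names maxmin_compose and transitive_closure are hypothetical, and the 3 × 3 matrix is a toy example, not the matrix R of Figure 2.

```python
def maxmin_compose(p, q):
    """Max-min composition: entry (i, j) is max over k of min(p_ik, q_kj)."""
    n = len(p)
    return [
        [max(min(p[i][k], q[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square the relation repeatedly until composing it with itself changes nothing."""
    while True:
        squared = maxmin_compose(r, r)
        # min and max only select existing entries, so exact comparison is safe.
        if squared == r:
            return r
        r = squared


# Toy reflexive, symmetric relation (illustrative values only).
toy_r = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
for row in transitive_closure(toy_r):
    print(row)
```

For a reflexive n × n relation the loop needs at most about log2(n) squarings, since each squaring doubles the length of the chains taken into account.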
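$$
R = (r_{ij})_{12 \times 12} =
\begin{array}{c|cccccccccccc}
 & x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 & x_9 & x_{10} & x_{11} & x_{12} \\ \hline
x_1 & 1 \\
x_2 & 0.43 & 1 \\
x_3 & 0.78 & 0.46 & 1 \\
x_4 & 0.25 & 0.46 & 0.32 & 1 \\
x_5 & 0.42 & 0.47 & 0.47 & 0.39 & 1 \\
x_6 & 0.48 & 0.43 & 0.52 & 0.51 & 0.47 & 1 \\
x_7 & 0.17 & 0.20 & 0.17 & 0.09 & 0.21 & 0.16 & 1 \\
x_8 & 0.17 & 0.20 & 0.16 & 0.09 & 0.20 & 0.16 & 0.97 & 1 \\
x_9 & 0.16 & 0.19 & 0.16 & 0.08 & 0.20 & 0.16 & 0.95 & 0.96 & 1 \\
x_{10} & 0.16 & 0.19 & 0.16 & 0.11 & 0.20 & 0.16 & 0.94 & 0.96 & 0.99 & 1 \\
x_{11} & 0.47 & 0.51 & 0.54 & 0.32 & 0.47 & 0.53 & 0.17 & 0.17 & 0.16 & 0.16 & 1 \\
x_{12} & 0.15 & 0.15 & 0.15 & 0.08 & 0.16 & 0.16 & 0.10 & 0.10 & 0.09 & 0.09 & 0.16 & 1
\end{array}
$$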
Figure 2. The fuzzy relation matrix R.
Figure 3. The fuzzy equivalent matrix R6.
3. Conclusions
Fuzzy cluster analysis of the fuzzy equivalent matrix R6 (Figure 3) gives the fuzzy cluster graph (shown in Figure 4) for different alpha-cuts (thresholds).
In the fuzzy cluster analysis, the sequences fall into groups, and the sequences in one group are considered to have a close relationship and related functions. The fuzzy cluster graph can be used to choose an appropriate alpha-cut. For instance, for alpha = 0.50, the sequence set X = {x1, x2, ···, x12} is divided into 5 groups: {x1, x3, x6, x11}; {x2, x4}; {x7, x8, x9, x10}; {x5}; {x12}.
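How an alpha-cut turns a fuzzy equivalent matrix into groups can be sketched as below: two elements fall into the same group when their membership value is at least alpha. The function name alpha_cut_clusters and the 3 × 3 matrix are assumptions for illustration; the matrix is a toy equivalence matrix, not R6, and indices are 0-based.

```python
def alpha_cut_clusters(equiv, alpha):
    """Groups of indices whose pairwise membership in equiv is >= alpha.

    Assumes equiv is a fuzzy equivalence matrix (reflexive, symmetric,
    max-min transitive), so one row is enough to read off a whole group.
    """
    n = len(equiv)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if equiv[i][j] >= alpha]
        for j in group:
            assigned[j] = True
        clusters.append(group)
    return clusters


# Toy fuzzy equivalence matrix (illustrative values only).
toy_equiv = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
for alpha in (0.9, 0.6, 0.5):
    print(alpha, alpha_cut_clusters(toy_equiv, alpha))
```

Lowering alpha merges groups and never splits them, which is what a cluster graph over a range of alpha-cuts displays.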
In the group {x1, x3, x6, x11}, x1 = nm_005166 is a sequence connected with the APLP1 gene. Amyloid-precursor-like protein 1 (APLP1) is a membrane-associated glycoprotein whose gene is homologous to the APP gene, which has been shown to be involved in the pathogenesis of Alzheimer’s disease [4]. It has thus been implicated as a genetic factor in the occurrence of Alzheimer’s disease. The other correlated sequences in the same group may have the same function or characteristic.
Figure 4. The fuzzy cluster graph by different number of α-cuts.
The fuzzy equivalent matrix R6 of Figure 3:
$$
R^{6} =
\begin{array}{c|cccccccccccc}
 & x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 & x_9 & x_{10} & x_{11} & x_{12} \\ \hline
x_1 & 1 \\
x_2 & 0.51 & 1 \\
x_3 & 0.78 & 0.51 & 1 \\
x_4 & 0.51 & 0.51 & 0.51 & 1 \\
x_5 & 0.47 & 0.47 & 0.47 & 0.47 & 1 \\
x_6 & 0.53 & 0.51 & 0.53 & 0.51 & 0.47 & 1 \\
x_7 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 1 \\
x_8 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.97 & 1 \\
x_9 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.96 & 0.96 & 1 \\
x_{10} & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.21 & 0.96 & 0.96 & 0.99 & 1 \\
x_{11} & 0.54 & 0.51 & 0.54 & 0.51 & 0.47 & 0.53 & 0.21 & 0.21 & 0.21 & 0.21 & 1 \\
x_{12} & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 & 1
\end{array}
$$
From the point of view of fuzzy clustering, the gene sequences within one group may be considered to have a closer relationship and similar functions and characteristics. The results should be applicable as reference information in the study of Alzheimer’s disease. Results of fuzzy cluster analysis of Alzheimer’s disease gene sequences should be confirmed medically before they are used in Alzheimer’s disease prevention, clinical diagnosis and treatment.
A large number of gene sequences have already been collected and deposited in public databases. These are important resources, not only as markers to identify disease-related genes but also as useful information for understanding disease mechanisms at the genome level. The study of Alzheimer’s disease-related gene sequences by fuzzy clustering is one topic in gene research. By exploring the disease-related gene sequences, we hope our account will prove of some help in Alzheimer’s disease prevention, clinical diagnosis and treatment.
REFERENCES
[1] A. Rovelet-Lecrux, T. Frebourg, H. Tuominen, K. Majamaa, D. Campion and A. M. Remes, “APP Locus Duplication in a Finnish Family with Dementia and Intracerebral Haemorrhage,” Journal of Neurology, Neurosurgery & Psychiatry, Vol. 78, 2007, p. 1158.
[2] O. Nelson, H. Tu, T. Lei, M. Bentahir, B. Strooper and I. Bezprozvanny, “Familial Alzheimer Disease-Linked Mutations Specifically Disrupt Ca2+ Leak Function of Presenilin 1,” The Journal of Clinical Investigation, Vol. 117, 2007, pp. 1230-1239.
[3] B. Q. Hu, “The Elements Theory of Fuzzy Mathematics,” Wuhan University Press, 2004, pp. 153-158.
[4] B. Weber, C. Schaper, J. Scholz, B. Bein, C. Rodde and P. H. Tonner, “Interaction of the Amyloid Precursor like Protein 1 with the Alpha2a-Adrenergic Receptor Increases Agonist-Mediated Inhibition of Adenylate Cyclase,” Cell Signal, Vol. 18, 2006, pp. 1748-1757.