J. Biomedical Science and Engineering, 2010, 3, 719-726 JBiSE
doi:10.4236/jbise.2010.37096 Published Online July 2010 (http://www.SciRP.org/journal/jbise/).
Published Online July 2010 in SciRes. http://www.scirp.org/journal/jbise
A novel voting system for the identification of eukaryotic
genome promoters
Lin Lei3, Kaiyan Feng4, Zhisong He5, Yudong Cai1,2*
1Institute of System Biology, Shanghai University, Shanghai, China;
2Centre for Computational Systems Biology, Fudan University, Shanghai, China;
3School of Computer Engineering, Nanyang Technological University, Singapore;
4Division of Imaging Science & Biomedical Engineering, The University of Manchester, Manchester, UK;
5Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China.
Email: cai_yud@yahoo.com.cn
Received 5 April 2010; revised 6 May 2010; accepted 9 May 2010.
ABSTRACT
Motivation: Accurate identification and delineation of
promoters/TSSs (transcription start sites) is important for
improving genome annotation and devising experiments to study
and understand transcriptional regulation. Many promoter
identifiers have been developed, but each has its own focus and
limitations. We therefore introduce an integration scheme that
combines several identifiers to gain better prediction
performance. Result: In this contribution, 8 promoter
identifiers (Proscan, TSSG, TSSW, FirstEF, eponine, ProSOM, EP3,
FPROM) are chosen for the investigation of integration. A
feature selection method, mRMR (Minimum Redundancy Maximum
Relevance), is transferred to promoter identifier selection in
order to choose a group of robust and complementary identifiers.
For comparison, four integration methods (SMV, WMV, SMV_IS,
WMV_IS), from simple to complex, are developed to process a
training dataset with 1400 sequences and a testing dataset with
378 sequences. As a result, 5 identifiers (FPROM, FirstEF, TSSG,
eponine, TSSW) are chosen by mRMR, and their integration
achieves correct prediction rates of 70.08% and 67.83% on the
training and testing datasets respectively. This is better than
any single identifier: the best single one achieves only 59.32%
and 61.78% on the training and testing datasets respectively.
Keywords: MRMR (Minimum Redundancy Maximum
Relevance); Transcription Start Sites (TSS); Promoter
Identification; Promoter Identifier Integration
1. INTRODUCTION
A promoter is a short DNA sequence that serves as the binding
site of RNA polymerase and determines the transcription start
site (TSS). After RNA polymerase binds to a promoter, the
promoter initiates transcription and indicates where
transcription should start. To be recognized by RNA polymerase,
promoters have a rather stable structure; e.g. in eukaryotic
genomes many promoters contain a TATA box, so searching for TATA
sequences can help locate promoters. Besides the TATA box,
functional motifs, oligonucleotide composition and compositional
features are also used for promoter identification [1-8].
However, each promoter identifier has its own focus. Even when
different identifiers apply the same identification strategy,
they differ in detail. Since promoter identifiers built on
different principles may complement each other, integrating them
can enhance promoter identification performance [9,10]. This
paper investigates a novel way to combine several promoter
identifiers to improve the identification rate.
Voting has long been recognized as a useful integra-
tion tool to improve the robustness of a decision system.
Nearly all investigations find that if a decision gains the
majority votes, that decision is more likely to be the
right decision. Such investigations are found in many
research areas, including pattern recognition [11-13],
character and handwriting recognition [14-17], image analysis
[18-20], credit card slip processing [21] and speaker
identification [9]. Voting has also been applied to identify
promoters/TSSs [10,22]. In [10], 6 promoter identifiers were
investigated, and 5 of them were integrated to enhance the
recognition rate, excluding one online promoter identifier that
was not user-friendly and performed poorly. In a recent work
[22], Won et al.
investigated 8 promoter/TSS (transcription start site)
identifiers and tried to find out which combinations were best
for identification. They introduced a cut-off value to exclude
any promoter identifier whose identification rate was lower than
the cut-off. However, the work in [22] did not take into account
the order in which the promoter identifiers are added to the
integration, although this order also affects the identification
performance, as explained later in this paragraph. In this
study, 8 promoter identifiers (Proscan, TSSG, TSSW, FirstEF,
eponine, ProSOM, EP3, FPROM) are investigated. For
the eight promoter identifiers, two criteria should be
considered for the integration: first, the better an identifier
performs, the more preferable it is; second, dissimilar or
less-correlated identifiers complement each other better and are
also preferable. The first criterion is straightforward. The
second criterion is applied because similar identifiers may
reinforce each other and dominate the decisions, e.g. if one
identifier is used twice, the decision will be biased towards
this identifier. Similarly, if too many identifiers are similar
to each other, the decision will be biased towards that type of
identifier. Since the two criteria can conflict with each other,
optimization is needed to balance them. In this paper,
identifiers are selected one after another. The order in which
the identifiers are considered is important, as the following
example illustrates. Suppose 4 identifiers $i_1$, $i_2$, $i_3$
and $i_4$ are under examination and the combination of $i_1$ and
$i_2$ produces the best results. If identifiers are added
according to the list [$i_1$, $i_4$, $i_3$, $i_2$], the optimal
combination can never be found, because $i_2$ can only be
included together with $i_4$ and $i_3$. Thus following the two
criteria to make the list is important. mRMR (minimum
Redundancy Maximum Relevance) method [23] was originally
developed for feature selection, and is transferred here to the
selection of promoter identifiers to satisfy the two criteria.
mRMR tries to maximize the relevance between variables and the
target, which accords with criterion 1, and at the same time to
minimize the redundancy between variables, which accords with
criterion 2. mRMR is introduced in detail in Section 2.3.
However, voting cannot solve the intrinsic problems of
individual identifiers, and the right decision of one identifier
will be ignored if most identifiers vote for the wrong decision.
Therefore, further research is still needed.
For comparison, four integration methods SMV (sim-
ple majority voting), WMV (weighted majority voting),
SMV_IS (simple majority voting plus identifier selection)
and WMV_IS (weighted majority voting plus identifier
selection), from simple to complex, are developed to
process a training dataset with 1400 sequences and a
testing dataset with 378 sequences. As a result, WMV_IS
achieves the best TSS-based recognition rates, with correct
recognition rates of 70.08% and 67.83% for the training dataset
and testing dataset respectively.
2. MATERIAL AND METHODS
2.1. Datasets
The EPD (The Eukaryotic Promoter Database Current
Release 95, http://www.epd.isb-sib.ch/) [24], a promoter
database of the EMBL Data Library, is an annotated
non-redundant collection of experimentally determined
eukaryotic polymerase II promoters. Since promoters are
defined and confirmed by experimentally determined
TSSs, the underlying promoter definition is given by the
position of TSS in EPD database.
First, 1871 human gene sequences were downloaded from the EPD
website, and 1778 DNA sequences were retained after excluding
any sequence containing missing base pairs. Each sequence is
1.5 kb long, and the true TSS is located at a random position on
the sequence. These 1778 sequences are then randomly divided
into a training dataset with 1400 sequences and a testing
dataset with 378 sequences. The training dataset is evaluated by
5-fold cross-validation to obtain the recognition rate of each
promoter identifier. These recognition rates are later fed back
to weight the identifiers in voting for both the training and
testing datasets. Since the recognition rates are obtained from
the training dataset and then fed back to the same dataset for
weighting, the recognition might be biased. However, since the
identification accuracy is rather stable, especially with a
large dataset, the bias is negligible. As a check, a testing
dataset is used independently, with the promoter recognition
rates taken from the training dataset. Please refer to
supplemental material 1 and supplemental material 2 for the
training dataset and the testing dataset respectively.
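The random division and the 5-fold partition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_and_fold`, the fixed seed and the interleaved fold assignment are hypothetical choices, not details given in the paper.

```python
import random

def split_and_fold(n_total=1778, n_train=1400, k=5, seed=0):
    """Randomly split sequence indices into train/test, then cut train into k folds."""
    indices = list(range(n_total))
    # Assumption: a seeded shuffle stands in for the paper's unspecified random split.
    random.Random(seed).shuffle(indices)
    train, test = indices[:n_train], indices[n_train:]
    # Assumption: folds are formed by taking every k-th training index.
    folds = [train[i::k] for i in range(k)]
    return train, test, folds

train, test, folds = split_and_fold()
print(len(train), len(test), [len(f) for f in folds])
```

With the sizes reported in the text, each of the five folds holds 280 training sequences.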
2.2. Promoter Identifiers
Many TSS predictors are available on the internet. Eight
identifiers are chosen as they have been actively main-
tained and widely used. These identifiers are Proscan,
TSSG, TSSW, FirstEF, eponine, ProSOM, EP3, FPROM.
Details of these identifiers can be found in Supplemental
Material 3. Their different recognition mechanisms and
mathematical architectures may enable them to comple-
ment each other during voting.
2.3. MRMR (Minimum Redundancy Maximum Relevance)
Minimum Redundancy Maximum Relevance (mRMR) [23] was first
developed by Peng et al. In mRMR analysis, a good feature is
characterized by its relevance to the target variable and its
correlation with other features: it is more likely to be chosen
if it is more relevant to
the target class and less correlated with other features.
Both relevance and correlation can be estimated by mutual
information (MI), which indicates how much one vector is related
to another. MI is defined as follows:

$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy \qquad (1)$$

where $x$ and $y$ are two vectors, $p(x, y)$ is the joint
probability density, and $p(x)$ and $p(y)$ are the marginal
probability densities.
Let $\Omega$ denote the whole vector set. The already-selected
vector set with $m$ vectors is denoted by $\Omega_s$, and the
to-be-selected vector set with $n$ vectors is denoted by
$\Omega_t$. The relevance $D$ of a feature $f$ in $\Omega_t$ with
a target variable $c$ can be computed by Eq.2:

$$D = I(f, c) \qquad (2)$$

The redundancy $R$ of a feature $f$ in $\Omega_t$ with all the
features in $\Omega_s$ can be computed by Eq.3:

$$R = \frac{1}{m} \sum_{f_i \in \Omega_s} I(f, f_i) \qquad (3)$$

To maximize relevance and minimize redundancy, the mRMR function
is obtained by integrating Eq.2 and Eq.3:

$$\max_{f_j \in \Omega_t} \left[ I(f_j, c) - \frac{1}{m} \sum_{f_i \in \Omega_s} I(f_j, f_i) \right] \quad (j = 1, 2, \ldots, n) \qquad (4)$$

Let the initial $\Omega_s = \{f_i\}$, where $f_i$ is the vector
produced by the best-performing promoter identifier, and
$\Omega_t = \{f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_n\}$,
obtained by excluding only $f_i$. Eq.4 is used to select one
vector after another in a total of $n - 1$ rounds, resulting in a
vector list with the selection order
$S = \{f'_0, f'_1, \ldots, f'_h, \ldots, f'_{N-1}\}$, where $h$
denotes the round at which the feature is selected.
In this research, mRMR method is used to rank the 8
promoter identifiers. The predicted/identified results are
coded by integer numbers as is described in Subsection
2.6. The real coded promoters are the target vector, and
the predicted ones are treated as the input features for the
mRMR method.
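As a rough illustration of the greedy ranking in Eq.4, the sketch below estimates mutual information for integer-coded vectors from co-occurrence counts (the discrete counterpart of Eq.1) and then selects identifiers one after another. It is not the mRMR program used in this work; the function names `mutual_information` and `mrmr_rank` and the toy vectors are assumptions made for the example.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Discrete mutual information (in nats) estimated from co-occurrence counts."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr_rank(features, target, first):
    """Greedy ranking: relevance to target minus mean redundancy with selected set."""
    selected = [first]  # seeded with the best-performing identifier
    remaining = [name for name in features if name != first]
    while remaining:
        def score(name):
            relevance = mutual_information(features[name], target)
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): B duplicates A, C is equally relevant but independent of A.
target = [1, 1, 1, 1, 0, 0, 0, 0]
toy = {
    "A": [1, 1, 1, 0, 0, 0, 0, 1],
    "B": [1, 1, 1, 0, 0, 0, 0, 1],
    "C": [1, 1, 0, 1, 0, 0, 1, 0],
}
print(mrmr_rank(toy, target, first="A"))
```

In the toy data the duplicate vector B is ranked last, because its redundancy with the already-selected A outweighs its relevance.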
2.4. Voting Systems
Four voting systems are developed for the promoter re-
cognition. They are Simple Majority Voting (SMV),
Weighted Majority Voting (WMV), Simple Majority
Voting plus Identifier Selection (SMV_IS), Weighted
Majority Voting plus Identifier Selection (WMV_IS).
2.4.1. Simple Majority Voting (SMV)
Each promoter identifier will give a decision. The ma-
jority decisions over a TSS are taken as the predicted
TSS. This is the simplest voting system that does not
require any additional complex computation.
2.4.2. Weighted Majority Voting (WMV)
In SMV, all identifiers are treated equally regardless of their
identification capability, while in WMV the vote of an
identifier is weighted by its recognition rate. For example,
when integrating 4 predictors, assume the recognition rates of
the four promoter identifiers are 0.4, 0.45, 0.5 and 0.55. The
to-be-predicted sample is judged to be positive by the first two
identifiers, while the other two give negative results. With
SMV, the score of this sample is 2 because two identifiers agree
that the sample is in a promoter region. With WMV, the score
becomes (0.4 + 0.45)/(0.4 + 0.45 + 0.5 + 0.55) = 0.447. Here, if
the score is no more than 0.5, the sample is predicted as
negative, i.e. not located in a promoter, so the output for this
sample is negative. Because some identifiers perform much better
than others, the better-performing identifiers should be
weighted more heavily. The recognition rate is obtained by
evaluating the training dataset using cross-validation.
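The worked example above can be reproduced with a few lines of code. This is a minimal sketch: the function names `smv_score`, `wmv_score` and `wmv_predict` are hypothetical, and the 0.5 threshold is the one stated in the text.

```python
def smv_score(votes):
    """Simple majority voting: count the identifiers that vote positive."""
    return sum(votes)

def wmv_score(votes, weights):
    """Weighted majority voting: weight of positive votes over total weight."""
    return sum(w for v, w in zip(votes, weights) if v) / sum(weights)

def wmv_predict(votes, weights, threshold=0.5):
    """Positive only if the weighted score exceeds the threshold."""
    return wmv_score(votes, weights) > threshold

votes = [1, 1, 0, 0]              # first two identifiers vote positive
weights = [0.4, 0.45, 0.5, 0.55]  # recognition rates from the example
print(smv_score(votes), round(wmv_score(votes, weights), 3), wmv_predict(votes, weights))
```

Although two of the four identifiers vote positive, the two negative voters carry more weight, so the weighted score stays below 0.5.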
2.4.3. Simple Majority Voting Plus Identifier Selection (SMV_IS)
The 8 investigated algorithms are first ranked by the mRMR
method. Identifiers in the higher ranks are regarded as less
redundant with each other and more relevant to TSS recognition.
Next, we need to find out how many identifiers should be chosen
from the mRMR ranking list $S = \{f'_0, f'_1, \ldots, f'_7\}$.
Identifiers are added one after another from the list as
candidate identifiers, starting from the first identifier
$f'_0$. Each time an identifier is added, SMV is applied among
the selected identifiers. The integrated identifier through SMV
with the highest correct recognition rate, evaluated by a
cross-validation test, is regarded as the optimized
identifier/predictor of SMV_IS.
2.4.4. Weighted Majority Voting Plus Identifier Selection (WMV_IS)
The only difference between SMV_IS and WMV_IS is
that, towards the integration of the candidate identifiers,
WMV is applied instead of SMV.
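The identifier selection step shared by SMV_IS and WMV_IS can be sketched as a search over prefixes of the mRMR ranking list. In this sketch the function `best_prefix` and the `evaluate` callback are hypothetical; to keep the example self-contained, `evaluate` simply looks up the correct prediction rates reported in Tables 3 and 4 rather than running any voting.

```python
def best_prefix(ranked, evaluate):
    """Add identifiers in ranked order and keep the prefix with the highest rate."""
    best_k, best_rate = 1, evaluate(ranked[:1])
    for k in range(2, len(ranked) + 1):
        rate = evaluate(ranked[:k])
        if rate > best_rate:
            best_k, best_rate = k, rate
    return ranked[:best_k], best_rate

# Order in which identifiers are added in Tables 3 and 4.
ranked = ["FPROM", "FirstEF", "TSSG", "eponine", "TSSW", "Proscan", "ProSOM", "EP3"]
# Correct prediction rates (%) for each prefix, copied from Tables 3 and 4.
smv_is_rates = [59.32, 57.31, 56.04, 60.99, 63.35, 62.71, 64.12, 64.01]
wmv_is_rates = [59.32, 60.37, 64.58, 68.62, 70.08, 68.87, 69.01, 69.03]

print(best_prefix(ranked, lambda subset: smv_is_rates[len(subset) - 1]))
print(best_prefix(ranked, lambda subset: wmv_is_rates[len(subset) - 1]))
```

With the tabulated rates, the search stops at 7 identifiers for SMV_IS and at 5 identifiers for WMV_IS, matching the combinations reported in Table 5.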
2.5. Detection and Prediction Rate
If an identifier outputs a predicted promoter instead of
an explicit TSS, the prediction is regarded to be correct
if the predicted promoter is within the range from 200bp
upstream to 100bp downstream of the experimentally
determined TSS. The Detected TSS Rate is defined as
the number of recognized TSSs divided by the total
number of experimentally determined TSSs, and Non-
detected TSS Rate is calculated as 1 minus the detected
rate. The Correct Prediction Rate is defined as the num-
ber of correctly recognized TSSs divided by the total
number of the predicted TSSs, and the False Prediction
Rate is calculated as 1 minus the correct prediction rate.
The prediction rates we use are defined as follows:

$$\text{Detected TSS Rate} = \frac{\text{number of recognized TSSs}}{\text{total number of experimentally determined TSSs}}$$

$$\text{Non-detected TSS Rate} = 1 - \text{Detected TSS Rate}$$

$$\text{Correct Prediction Rate} = \frac{\text{number of correctly recognized TSSs}}{\text{total number of predicted TSSs}}$$

$$\text{False Prediction Rate} = 1 - \text{Correct Prediction Rate}$$
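One way to compute these rates is sketched below. It rests on several assumptions that the text does not spell out: positions are on the forward strand so that upstream means smaller coordinates, each sequence has exactly one true TSS, and every prediction falling inside the window counts as one correctly recognized TSS. The function names are hypothetical.

```python
def in_window(predicted, true_tss, upstream=200, downstream=100):
    """True if a prediction lies from 200bp upstream to 100bp downstream of the TSS."""
    return true_tss - upstream <= predicted <= true_tss + downstream

def prediction_rates(true_tss, predictions):
    """true_tss: {seq_id: TSS position}; predictions: {seq_id: [predicted positions]}."""
    detected = correct = total_predicted = 0
    for seq_id, tss in true_tss.items():
        preds = predictions.get(seq_id, [])
        hits = sum(in_window(p, tss) for p in preds)
        total_predicted += len(preds)
        correct += hits                  # assumption: each hit counts once
        detected += 1 if hits else 0     # a TSS is detected if any prediction hits
    detected_rate = detected / len(true_tss) if true_tss else 0.0
    correct_rate = correct / total_predicted if total_predicted else 0.0
    return detected_rate, correct_rate

# Toy example (hypothetical positions): one of two TSSs is detected,
# and one of three predictions is correct.
rates = prediction_rates({"s1": 500, "s2": 800}, {"s1": [350, 900], "s2": [1200]})
print(rates)
```

The non-detected and false prediction rates follow as one minus the two returned values.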
2.6. Generate Input Matrix for MRMR Algorithm
First the prediction results need to be organized in a form
that can be input into mRMR. Each nucleotide position is
predicted by N predictors (N is the number of predictors used),
and its true identity is determined by experiment. Thus each
position can be coded by N + 1 digits. The experimentally
determined TSSs are regarded as the target variable of mRMR,
while the N predicted TSSs are regarded as the features of mRMR.
The final matrix is an integer-coded matrix of 2100000 × (N + 1)
(2100000 = 1400 × 1500), including the true TSS region and N
prediction results. Each sequence consists of 1500 nucleotides,
and the matrix contains all the 1400 sequences. The input matrix
is shown in Figure 1. In the first column, the sites from 200bp
upstream to 100bp downstream of the TSS are set to 1, and the
others are set to 2. In the other N columns, the predicted TSSs
are set to 1, and the others are set to 0.
Then mRMR is applied to filter the features and get
the rank.
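The coding scheme for one sequence can be sketched as follows. This is an illustration only: the function name `build_rows` is hypothetical, positions are assumed to be 0-based on the forward strand, and each identifier's output is assumed to be available as a set of predicted positions.

```python
def build_rows(seq_len, true_tss, predicted_sets, upstream=200, downstream=100):
    """Return one row per position: [class label, identifier 1, ..., identifier N]."""
    rows = []
    for pos in range(seq_len):
        # First column: 1 inside the TSS window, 2 elsewhere.
        label = 1 if true_tss - upstream <= pos <= true_tss + downstream else 2
        # Other columns: 1 where the identifier predicts a TSS, 0 elsewhere.
        rows.append([label] + [1 if pos in preds else 0 for preds in predicted_sets])
    return rows

# Toy sequence (hypothetical): true TSS at 700, three identifiers.
rows = build_rows(1500, 700, [{650}, {1000}, set()])
print(len(rows), rows[650], rows[1000])
```

Stacking the rows of all 1400 training sequences gives the 2100000 × (N + 1) matrix described above.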
All supplemental materials mentioned above are avail-
able upon request.
3. RESULTS
3.1. Training Sets
3.1.1. The Prediction Results of the Eight Predictors
The training dataset (1400 sequences) was input into the 8
promoter predictors (refer to Subsection 2.2) and the prediction
results were produced. The prediction rates (defined in
Subsection 2.5) were calculated to rate the performance of the
predictors, and are shown in Table 1. The correct prediction
rate is taken as the main standard for evaluating prediction
performance. If two predictors have similar correct prediction
rates, the one achieving a significantly better detected rate is
regarded as performing better. The predictor FPROM is considered
to have the best prediction performance since it achieves the
best correct prediction rate, 59.32%, among all 8 predictors,
and at the same time has a reasonable detected rate (64.57%).
Correct prediction rates, obtained on the training dataset, are
used to weight the corresponding predictors when they are
integrated by WMV and WMV_IS.
3.1.2. Voting Method (SMV and WMV)
Each 1500-bp sequence is divided into 15 regions of 100 bp
each. Votes are counted per region, i.e., if a predicted TSS
falls in a region, that region gets a vote. The prediction rates
of SMV and WMV are shown in Table 2. For the SMV, though its
correct prediction rate is a little lower (1.52%) than that
of the best predictor FPROM, the detected rate is much
higher (8.14%) than FPROM. For WMV, both its de-
tected rate and correct prediction rate are higher than
FPROM, indicating that WMV performs better than any
individual predictor and also the SMV.
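The region-based vote counting can be sketched as below. The details are assumptions: positions are 0-based, each identifier contributes at most one vote per region, and scores are normalized by the total weight so that equal weights give the fraction of identifiers voting for a region. The function name `region_scores` is hypothetical.

```python
def region_scores(predictions, weights=None, seq_len=1500, region_size=100):
    """predictions: one list of predicted TSS positions per identifier."""
    n_regions = seq_len // region_size
    if weights is None:
        weights = [1.0] * len(predictions)   # equal weights, as in SMV
    scores = [0.0] * n_regions
    for preds, w in zip(predictions, weights):
        # Assumption: several predictions in one region still count as one vote.
        voted = {p // region_size for p in preds if 0 <= p < seq_len}
        for r in voted:
            scores[r] += w
    total = sum(weights)
    return [s / total for s in scores]

# Toy predictions (hypothetical) from four identifiers.
preds = [[520], [580, 1210], [1250], [90]]
equal = region_scores(preds)
weighted = region_scores(preds, weights=[0.4, 0.45, 0.5, 0.55])
print(equal[5], round(weighted[5], 3), round(weighted[12], 3))
```

With equal weights, regions 5 and 12 each receive half of the votes; with the weights from the example in Subsection 2.4.2, region 5 drops to about 0.447 while region 12 stays at 0.5.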
3.1.3. Output of MRMR Program
The mRMR program used in this contribution was downloaded from
the website http://research.janelia.org/peng/proj/mRMR/. As all
of the input vectors are integer vectors, we specify the
parameter t = 0 in the mRMR program to suit the integer
calculation. The matrix built from the promoter identifier
outputs (Subsection 2.6) is submitted to the mRMR program to
obtain the ranks of the identifiers. The list provided by mRMR
is shown in Figure 2.
Table 1. The performance of 8 promoter predictors.
Software Detected Rate Correct Prediction Rate
Proscan 35.79% 47.35%
TSSG 59.36% 47.35%
TSSW 66.21% 49.47%
FirstEF 31.29% 55.17%
eponine 42.86% 50.81%
ProSOM 50.43% 44.17%
EP3 29.50% 49.17%
FPROM 64.57% 59.32%
Table 2. The performance of SMV and WMV.
Software-Integration Detected Correct
SMV(8 identifiers together) 73.36% 63.01%
WMV(8 identifiers together) 68.57% 69.03%
Figure 1. The input matrix of mRMR, suppose N = 8. The first row shows the titles of columns. The first column shows the
category of each sample, representing positive ones with 1 while negative ones with 2. The other eight columns show the
outputs of the eight individual predictors. If the sample is predicted as in promoter region, the corresponding value of the
sample will be assigned to 1, otherwise, 0.
Figure 2. The output rank of mRMR. The first two rows show the parameters of mRMR program, while entropy score was
calculated based on the probabilities of each feature obtained from the training dataset. The order of features was calculated
based on Eq.4 in methods.
3.1.4. Software-Integration (SMV_IS and WMV_IS)
According to the mRMR ranking, we add the identifiers to be
integrated through voting one by one. The integration results of
SMV_IS and WMV_IS are shown in Table 3 and Table 4. The best
correct prediction rate is achieved by integrating 5 identifiers
(FPROM, FirstEF, TSSG, eponine and TSSW) using WMV_IS, as shown
in Table 4. The correct prediction rate of this integration is
70.08%, a slight improvement over WMV. At the same time, the
detected rate has also improved, confirming that WMV_IS performs
better than WMV. For SMV_IS, the integration of 7 identifiers is
also slightly better than including all 8 identifiers, as shown
in Table 3.
The best results obtained from different prediction
methods are shown in Table 5. When sorted by the
correct prediction rates, the order of these prediction
methods is WMV_IS>WMV>SMV_IS>SMV>FPROM.
3.2. Testing Sets
3.2.1. Results of the Eight Individual Identifiers
The testing dataset with 378 sequences was input into the
eight promoter predictors (refer to Subsection 2.2, two
predictors are deleted). The prediction rates (defined in
Subsection 2.5) were calculated to rate the performance of
the identifiers. These values are shown in Table 6.
3.2.2. Identifier-Integration
We use the same voting methods, SMV and WMV, as in
Subsection 3.1.2, and methods WMV_IS and SMV_IS,
as in Subsection 3.1.4. Because testing dataset is only
used for testing, the list and the number of the identifiers
are adopted from the training dataset. The purpose of the
testing dataset is to validate the results from the training
dataset, as it is regarded to be unbiased in the voting.
The prediction results of the testing dataset are shown in
Table 7. By comparing the results in Table 5 with those
Table 3. The SMV_IS results.
Software-Integration                                  Detected   Correct Prediction
FPROM                                                 64.57%     59.32%
FPROM&FirstEF                                         73.00%     57.31%
FPROM&FirstEF&TSSG                                    77.07%     56.04%
FPROM&FirstEF&TSSG&eponine                            76.07%     60.99%
FPROM&FirstEF&TSSG&eponine&TSSW                       73.57%     63.35%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan               72.71%     62.71%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM        73.57%     64.12%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM&EP3    73.36%     64.01%
Table 4. The WMV_IS results.
Software-Integration                                  Detected   Correct Prediction
FPROM                                                 64.57%     59.32%
FPROM&FirstEF                                         67.57%     60.37%
FPROM&FirstEF&TSSG                                    69.29%     64.58%
FPROM&FirstEF&TSSG&eponine                            69.43%     68.62%
FPROM&FirstEF&TSSG&eponine&TSSW                       69.01%     70.08%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan               68.79%     68.87%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM        68.64%     69.01%
FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM&EP3    68.57%     69.03%
Table 5. The prediction results of the training dataset.
Prediction Methods                                         Detected   Correct Prediction
The best single software (FPROM)                           64.57%     59.32%
SMV (8 identifiers together)                               73.36%     63.01%
WMV (8 identifiers together)                               68.57%     69.03%
SMV_IS (FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM)    73.57%     64.12%
WMV_IS (FPROM&FirstEF&TSSG&eponine&TSSW)                   69.01%     70.08%
Table 6. The performance of 8 promoter identifiers using testing sets.
Identifiers Detected Rate Correct Prediction Rate
ProSOM 48.15% 40.63%
EP3 28.04% 45.30%
Proscan 34.92% 45.52%
TSSG 59.52% 47.77%
TSSW 62.96% 49.64%
eponine 46.56% 49.93%
FirstEF 32.80% 60.08%
FPROM 67.99% 61.78%
in Table 7, we can tell that WMV_IS performs the best
in both training dataset and testing dataset. However, we
cannot tell whether SMV or SMV_IS performs better
than the best single software FPROM in the testing,
since they produce lower prediction accuracy than
FPROM. In conclusion, several observations can be
made: 1) The prediction rate of integrating several iden-
tifiers is not necessarily better than the best single iden-
tifier, e.g. the SMV and SMV_IS have lower correct
prediction rates than the best single identifier; 2) In all
cases, the prediction rate is greater with the identifier
selection than those without; 3) The prediction rate of
WMV is greater than that of SMV.
Table 7. The prediction results of the testing dataset.
Software-Integration                                       Detected   Correct
The best single software (FPROM)                           67.99%     61.78%
SMV (8 identifiers together)                               71.16%     59.33%
WMV (8 identifiers together)                               65.87%     67.30%
SMV_IS (FPROM&FirstEF&TSSG&eponine&TSSW&Proscan&ProSOM)    70.10%     62.82%
WMV_IS (FPROM&FirstEF&TSSG&eponine&TSSW)                   66.67%     67.83%
4. CONCLUSIONS
We introduce a voting system that integrates several eukaryotic
promoter identifiers to predict promoters in the human genome.
We find that integrating several identifiers through simple
voting does not necessarily improve the prediction performance.
However, when the identifiers are weighted by their prediction
accuracies, the prediction performance is improved. Moreover,
filtering the identifiers improves the prediction accuracy
compared with using all identifiers without filtering. The order
in which identifiers are added, provided by mRMR, may not be
truly optimal, since mRMR makes the list without attempting to
integrate the identifiers; this could be a topic for future
research.
5. ACKNOWLEDGEMENTS
This work is supported by National Basic Research Program of China
(2004CB518603), grant from Shanghai Commission for Science and
Technology (KSCX2-YW-R-112), and the grant supported by Shang-
hai Leading Academic Discipline Project (J50101).
REFERENCES
[1] Abeel, T., Saeys, Y., Bonnet, E., Rouze, P. and Van de
Peer, Y. (2008) Generic eukaryotic core promoter predic-
tion using structural features of DNA. Genome Research,
18(2), 310-323.
[2] Abeel, T., Saeys, Y., Rouze, P. and Van de Peer, Y. (2008)
ProSOM: Core promoter prediction based on unsupervised
clustering of DNA physical profiles. Bioinformatics,
24(13), i24-31.
[3] Davuluri, R.V., Grosse, I. and Zhang, M.Q. (2001) Com-
putational identification of promoters and first exons in
the human genome. Nature Genetics, 29(4), 412-417.
[4] Down, T.A. and Hubbard, T.J. (2002) Computational
detection and location of transcription start sites in ma-
mmalian genomic DNA. Genome Research, 12(3), 458-
461.
[5] Prestridge, D.S. (1995) Predicting Pol II promoter se-
quences using transcription factor binding sites. Journal
of Molecular Biology, 249(5), 923-932.
[6] Solovyev, V.V. and Shahmuradov, I.A. (2003) PromH:
Promoters identification using orthologous genomic sequences.
Nucleic Acids Research, 31(13), 3540-3545.
[7] Solovyev, V.V. and Salamov, A. (1997) The Gene-Finder
computer tools for analysis of human and model organ-
ism genome sequences. The Fifth International Confer-
ence on Intelligent Systems for Molecular Biology, 294-
302.
[8] Werner, T. (1999) Models for prediction and recognition
of eukaryotic promoters. Mamm Genome, 10(2), 168-
175.
[9] Altincay, H. and Demirekler, M. (2000) An information
theoretic framework for weight estimation in the com-
bination of probabilistic classifiers for speaker identifica-
tion. Speech Communication, 30(4), 255-272.
[10] Liu, R. and States, D.J. (2002) Consensus promoter iden-
tification in the human genome utilizing expressed gene
markers and gene modeling. Genome Research, 12(3),
462-469.
[11] Lam, L. and Suen, C.Y. (1994) A theoretical-analysis of
the application of majority voting to pattern-recognition.
12th IAPR International Conference on Pattern Recogni-
tion, Jerusalem, Israel, 418-420.
[12] Lam, L. and Suen, C.Y. (1997) Application of majority
voting to pattern recognition: An analysis of its behavior
and performance. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 27(5), 553-568.
[13] Stajniak, A., Szostakowski, J. and Skoneczny, S. (1997)
Mixed neural-traditional classifier for character recogni-
tion. SPIE-International Society for Optical Engineering,
2949, 102-110.
[14] Huang, Y.S. and Suen, C.Y. (1995) A method of combin-
ing multiple experts for the recognition of unconstrained
handwritten numerals. IEEE Transactions on Pattern An-
alysis and Machine Intelligence, 17(1), 90-94.
[15] Lam, L., Huang, Y.S. and Suen, C.Y. (1997) Combination
of multiple classifier decisions for optical character rec-
ognition. In: Handbook of Character Recognition and
Document Image Analysis, Edited by Bunke, H. and
Wang, P.S.P., World Scientific Publishing Company, New
Jersey, 79-101.
[16] Rahman, A.F.R., Alam, H. and Fairhurst, M.C. (2002)
Multiple Classifier Combination for Character Recogni-
tion: Revisiting the Majority Voting System and Its
Variation. In: Lecture Notes in Computer Science, Spri-
nger Berlin/Heidelberg, 2423, 319-328.
[17] Suen, C.Y., Nadal, C., Mai, T.A., Legault, R. and Lam, L.
(1990) Recognition of totally unconstrained handwritten
numerals based on the concept of multiple experts. In:
International Workshop Frontiers in Handwriting Rec-
ognition, Montreal.
[18] Ho, T.K., Hull, J.J. and Srihari, S.N. (1992) Combination
of Decisions by Multiple Classifiers. In: Structured Docu-
ment Image Analysis, Edited by Baird, H.S., Bunke, H.,
Yamamoto, K., Springer Verlag New York, Inc., New Jersey,
188-202.
[19] Rahman, A.F.R. and Fairhurst, M.C. (1997) Exploiting
second order information to design a novel multiple ex-
pert decision combination platform for pattern classifica-
tion. Electronics Letters, 33(6), 476-477.
[20] Rohlfing, T., Russakoff, D.B. and Maurer, C.R. (2004)
Performance-Based Classifier Combination in Atlas-
Based Image Segmentation Using Expectation-Maxi-
mization Parameter Estimation. IEEE Transactions on
Medical Imaging, 23(8), 983-994.
[21] Paik, J., Jung, S. and Lee, Y. (1993) Multiple combined
recognition system for automatic processing of credit
card slip applications. In: The Second International Con-
ference on Document Analysis and Recognition, IEEE
Computer Society Press, Washington, 520-523.
[22] Won, H.H., Kim, M.J., Kim, S. and Kim, J.W. (2008)
EnsemPro: an ensemble approach to predicting transcrip-
tion start sites in human genomic DNA sequences. Ge-
nomics, 91(3), 259-266.
[23] Peng, H., Long, F. and Ding, C. (2005) Feature selection
based on mutual information: criteria of max-dependency,
max-relevance, and min-redundancy. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 27(8),
1226-1238.
[24] Bucher, P., Périer, R.C., Praz, V. and Schmid, C. (2006)
The eukaryotic promoter database user manual. Nucleic
Acids Research, 34, D82-85.