Multitude Classifier Using Rough Set Jelinek Mercer Naïve Bayes for Disease Diagnosis

doi:10.4236/cs.2016.76059

Paper Menu >>

Journal Menu >>

Circuits and Systems, 2016, 7, 701-708

Published Onlin e May 2016 in Sci Res. http://www.sc irp.org/journal/cs

http://dx.doi.org/10.4236/cs.2016.76059

How to cite this paper: Prema, S. and Umamaheswari, P. (2016) Multitude Classifier Using Rough Set Jelinek Mercer Naïve

Bayes for Disease Diagnosis. Circuit s and Syste ms, 7, 701-708. http://dx.doi.org/10.4236/cs.2016.76059

Multitude Classifier Using Rough Set Jelinek

Mercer Naïve Bayes for Disease Diagnosis

S. Prema1, P. Umamah eswari2

1Information and Communication Engineering, Anna Univer sit y, Chennai, India

2Comput er Science and Technology, MIT Campus, Anna Uni ver sity , Chennai, India

Received 29 February 2016; accepted 9 May 2016; published 12 May 2016

This work is lic ensed under the Creati ve Commons Attribution International License (CC B Y).

http://creativecommons.org/licenses/by/4.0/

Abstract

Classificati on m odel has r eceived grea t atte nti on in any d om ain of rese a rch an d al so a reliable to ol

for medic al dise ase diag nosis. The d om ai n of clas sific ation m odel is us ed in disease d iagnosis,

disease predic tio n, bio i nform atics, c rime p redictio n and so on. H owever , an efficie nt diseas e di-

agnosis model w as comp rom is ed the disease predictio n. In this paper, a Rough Set Rule-based

Multitude Classifier (RS -RMC ) is dev eloped to improve the disease p redict ion r ate and enhance

the cla ss accur acy of dise ase bei ng d iagnosed . Th e RS-RMC involve s two steps. I nitiall y, a Roug h

Set mod el is used fo r F eat u re Sel ectio n ai ming at mi nimizin g th e execu tion tim e f or obt aining the

disease fe atu re s et. A Multitude Classifier mod el is presented in second step fo r detecti on of he ar t

disease and for efficient cl assification . The Naï ve Bayes Classifie r algo rith m is designed for effi-

cient identification of clas ses to meas u re th e rel atio nship bet ween d isease fea tu res and i mproving

disease prediction ra te. Ex p erimental an alysis sh ow s that RS -RMC is u sed to reduce the ex ecution

time for ext rac tin g the d is ease fe ature w ith minimu m false posi tive r ate compared to th e sta te-of-

the-art w orks .

Keywords

Classificati on M odel , Disea se Diagnosis, Rou gh Set M odel , Fe ature Sel ection , Multitude Classifier,

Mercer Naïve

1. Introduction

In a conventional classification model, the classification strategy identifies and selects the best classifier on the

basis of experimental assessment with various individual classifiers. In a diversion from the conventional ap-

proach, the use of Multitude Classifier System (MCS) has been presented as an alternative approach to improve

classification accuracy of the disease being detected.

S. Prema, P. Umamaheswari

702

Potential Management of Ventricular Arrhythmias (PPM-VA) [1] identified the stroke pattern and manage-

ment of ventricular Arrhythmias that wer e largely related to stroke for the effective diagnosis of disease at an

early stage. Prediction of Events using Spatio Spectro Temporal Data (PE-SST D) [2] provided with a case st udy

on stroke resulting in the accuracy of the disease being diagnosed. However, both the above methods lack the

class accuracy of disease diagnosis with the increase in the feature.

To improve the detection rate of disease for Interstitial Lung Disease (ILD) in [3], a method was presented

improving the detection of disease at an early stage. However, the classification accuracy remained unsolved. To

address the issues related to class accuracy, Striatial Binding Ratio (SBR) was used in [4] to improve the class

accuracy rate. A functional classification model [5] for early detection of heart failure using classification sche-

ma was presented for improving class accuracy rate. Classification methods for bipolar disorders [6] were de-

signed to improve diagnostic reliability. Another method using neural network and decision based support sys-

tem was designed [7] to improve the classification accuracy rate. However, accuracy with respect to scalability

remained unaddressed.

An efficient classification approach using ANN and Feature Subset Selection [8] was presented to improve

the accuracy of disease being detected. In [9], with the objective of i mproving classification rate, Diagnostic and

Statistical Manual of Mental Disorders was presented.

In accordance with the above-mentioned advantages of both disease classification and disease diagnosis, in

this paper, a new framework called Rough Set Rule-based Multitude Classifier (RS-RMC) is proposed to in-

crease the disease prediction rate and efficiency of the classification accuracy of disease being diagnosed.

2. Design of Rough Set Rule-Based Multitude Classifier

To address the problem of disease diagnosis at an early stage, a framework is proposed based on Rough Set

Rule -based Multitude Classifier. The Rough Set Rule-based Multitude Classifier uses disease features based on

similar type of medical diseased data to identify the relationship between the disease features for efficient classi-

fication. Figure 1 shows the b lock diagram of Rough Set Rule-based Multitude Classifier.

As sho wn in Figure 1, Cleveland Heart Disease dataset extracted from UCI reposito ry is given as input. T he

blo ck di agra m shows a two -stage process. In the first stage, Rough Set Feature Selection model is applied to the

input dataset to extract the disease features. Feature Reduct applied in Rough Set model reduce the disease fea-

ture without losing significant information. This is performed through lower and upper approximation without

chan ging the values of d i s eas e features.

The second stage goes through the Multitude Classifier model called as the Mercer Naïve Bayes Classifica-

tion model. This is performed using Naïve Bayes Disease Classifier algorithm aiming to improving the class

accuracy. Finally, the Jelinik Mercer Multitude Classifier is used as a smoothing technique for obtaining ap-

pro xi mati on f unc t ion increasing the disease prediction rate.

Figure 1 . Block diagram of rough set rule-based multitude classifier.

S. Prema, P. Umamaheswari

703

2.1. Construction of Rough Set (Model) Feature Selection (Minimize Execution Time)

The first step in Roug h Set Rule-based Multit ude Classifier ( RS-RMC) fr amewor k is the const ructi on of Ro ugh

Set Feature Selection (RSFC) to reduce the complexity by minimizing the redundant disease features contained

within the set of feature patterns. The feature selection using Rough Set model minimizes the redundant disease

features by selecting those significant disease features that are most essential for Multitude Classifier repre-

sented in the pattern set.

Let us consider “DFP” the set of all disease feature patterns, “F” the set of all disease features, then the value

of disease feature “a” in disease feature pattern “P” is given as below

( )

,, where&fa PaDFPPF∈∈

(1)

From (1), all disease features with disease feature patterns are identified. In order to reduce computational

complexity and memory requirements, the disease feature patterns with the features identified is herein treated

as a Rough Set model. The objective of using Rough Set model is to reduce the redundant features and therefore

minimizes the execution time to obtain relevant features.

With the reduced features, the indiscernible relationship for identifying the features is obtained. For any two

disease patterns,

( )

,ab

in “DFP” indisce r nib le relatio n for disease features is given as below

() ( )

,,,for all

faPfbPa P= ∈

(2)

For example, given the disease features (Chest Pain, Blood Pressure, Heart rate), the indiscernible set are,

“

() ()()

( )

{ }

1326354

,,,,,,PP PP PP P

”. The Rough Set Feature Selection in RS-RMC framework performs two op-

erations called, lower and upper approximation to measure the significance of the disease feature from the dis-

ease feature pattern set. Then, the lower and upper approximation is obtained as given below

( ){}

[]

DFP AxDFPxA

=∈⊆

(3)

( ){}

[ ]

DFP AxDFPxA

′=∈⊆ ≠∅

(4)

From (3) and (4), the disease features using lower approximation is identified, where “A” is t he set of dise ase

feature patterns in “DFP” tha t are surely in “A”. On the o the r hand the upp er ap proxi mat ion with “A” a s the set

of disease feature patterns in “DFP” i s probably in “A”. Finally, with the obj ective of reducing the disease fea-

ture set without losing significant information, disease feature Reduct Set is applied to the resultant set. A dis-

ease feature Reduct Set is then defined as a subset “P” of disease features (i.e., reduct disease features) “f” is

given as be lo w,

() ()

fDFPP DFP

αα

(5)

By applying (5), the RS-RMC framework searches for disease feature reduct set of least cardinality. As a re-

sult, RSFC preprocesses disease feature patterns without changing the values of the disease features, aiming at

minimizing the execution time for obtaining the disease features. Using this resultant disease features relation-

ship between features helps in the easy and early diagnosis of disease.

2.2. Jelinik Mercer Naïve Bayes Classifier

The second step in Ro ugh Set Rule-based Multitude Classifier ( RS-RMC) fra me wor k is the effective identifica-

tion of disease features relationship using Jelinik Mercer Naïve Bayes Classifier. With the objective of increas-

ing the disease prediction rate, Jelinik Mercer Naïve Bayes Classifier is applied for disease diagnosis after mi-

nimi zi ng the disease features.

Jelinik Mercer Naïve Bayes Classifier for RS-RMC framework is based on the Bayes rules. The Bayes rules

for Classifier a pplies the cond itional probability rule that measures the maximum likelihood of a property for the

given d isease features. Let us consider a scenario with a patient observed to have disease with certain symptoms

(i.e., features). In order to perform Multitude Classifier, Jelinik Mercer Naïve Bayes is applied to measure the

relationship between the disease features and identify whether the disease being diagnosed is correct or not.

The Naïve Bayes Classifier assumes that the presence of a specific disease feature is unrelated to the presence

of any other disease feature. The Naïve Bayes Classifier then efficiently predicts that given the reduct disease

S. Prema, P. Umamaheswari

704

features “f”, belongs to the class “

” then there exists disease feature relationship between “

” and “f” as

given as be lo w

( )

PPcf

Pf Pf



∗



 





(6)

where







denotes the maximum posterior hypothesis for class “

”. By applying the above formula,

conditional probability of a disease pattern belonging to each disease is efficiently identifie d improvi ng the cla ss

accuracy. Based on the conditional probability of disease pattern, the instance (i.e., feature) is classified as the

class with the hi ghest conditional prob a bility.

Figure 2 shows t he Naïve Ba yes Disease Classi fier algorit hm for efficie nt identificatio n of classes that helps

in measuring the relationship between disease features. As a result, disease diagnosis is made in an efficient

manner improving the disease prediction rate. Figure 2 shows the design of Naïve Bayes Disease Classifier al-

gorithm. From the above algorithm, for each dataset, the features in the dataset are identified. Once the features

are identified, the list of patients along with their associated classes is obtained. Based on the disease features,

reduct disease features are identified to reduce the complexity without losing the values of disease feature. Fi-

nally, maximum posterior hypothesis is applied to the reduct disease features for efficient disease diagnosis.

2.3. Jelinik Mercer Multitude Classifier Model

The Jelinik Mercer in RS-RMC framework classifies Multitude Classifier for effective disease diagnosis. Jelinik

Mercer Multitude Classifier is used as a smoothing technique to obtain an approximation funct ion for multit ude

of disease features. It is formulated as given below

()

f ff

P PP

cfcf cf

ββ

 

=−∗ +

 



 

(7)

From (7), “

Pcf







” represent the smoothened probability of a test, given the patient medical information

with e xisti ng tests a nd “

” ranges between “0” and “1”, “

Pcf







” representing the maximum li kelihood esti-

mation in class feature “

”. This in turn increases the disease prediction rate.

Figure 2 . Naïve bayes diseas e classifier algorithm.

S. Prema, P. Umamaheswari

705

3. Experimental Settings

The performance of the Rough Set Rule-based Multitude Classifier (RS-RMC) framework is experimented us-

ing Cleveland Heart Disease Dataset extracted from UCI repository from Cleveland Clinic Foundation. Heart

disease data set available at http://arc hive.ics.uci.ed u/ ml/datasets/heart+Disease [10]. The data set has 76 raw

attributes. However, all of the published experiments only refer to 11 of them. The RS-RMC framework is si-

mulated usin g MATLAB.

The experimental work is compared against the existing Prevention and Potential Management of Ventricular

Arrhythmias (PPM-VA) [1] and Prediction of Events using Spatio Spectro Temporal Data (PE-SSTD) [2] to

identify the effectiveness of RS-RMC framework. The performance of the RS-RMC framework is measured in

terms of disease prediction rate, execution time and false positive rate on effective disease diagnosis and class

accuracy.

3.1. Execution Time

The execution time is the time taken to obtain the disease feature set. It is expressed in terms o f milliseconds and

is formulated as given below:

( )

ETTime F=

(8)

From (8), “F” denotes the features in the Cleveland Heart Disease dataset. Lower the time taken to execute,

more efficient the method is said to b e

In Figure 3, results are reported for various classification methods for Heart Disease Dataset. On classific a-

tion using PPM-V A and P E-SSTD, the execution time for obtaining two features were observed to be 0.40 ms

and 0.51 ms, whereas using RS-RMC, the execution time reduced to 0.35 ms.

Figure 3 shows the time taken to obtain different features with differing sizes where features ranging from 2

to 13 were considered for experimental settings. As it can be seen the execution time steeply increases as the

number of features increases, regardless of the method applied. This is because an increasing fraction of the

features capacity is employed to obtain the features and diagnose disease at an early stage.

For instance wit h fo ur fea tur e s (i. e., Chest Pain, Blood Pressure, Blood Sugar, Heart Rate), the execution time

is 0.58 ms using RS-RMC framework, whereas 0.66 ms and 0.70 ms using the existing PPM-VA a nd P E -SSTD

respectively. Moreover, the execution time is comparatively minimized using RS-RMC compared to the other

methods which are demonstrated in Figure 3. This is because by applying Rough Set model that only selects

those features that are highly essential and therefore reduces the redundant disease features. Therefore using

RS-RMC the execution time for o btaining the features is reduced by 12.93% compared to PPM-VA. In a similar

manner, by applying inter values to the disease features reduces the execution time by 31.45% compared to

PE-SSTD.

Figure 3 . Measure o f execution time.

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Execution time (ms)

Features

RS-RMC

PPM-VA

PE-SSTD

S. Prema, P. Umamaheswari

706

3.2. False Positive Rate

The false posit ive rate on disease diagnosis is the ratio o f absent e vents (i.e. , disea se diagnosis) that yield posi-

tive test (i.e. , identified as disease though not) outcomes. Therefore, False Positive Rate (FPR) is the ratio of

number of false positives to the total patients for conducting experiments. The mathematical formulation for

FPR i s given be l ow

( )

No of false positivesidentified with disease

FPR Total patientsdiagnosed with disease

(9)

From (9), denotes the false positive rate. Lower the false positive rate, more efficient the method is said to be

and is measured in terms of percentage (%).

Figure 4 shows the false positive rate under different simulation setting. The experiments were conducted

with different number of patients and the FPR for the correspo nding was measured. From Figure 4 we can see

that the value of FPR is comparatively lower in RS-RMC than the other two methods PPM-VA and PE-SST D.

Figure 4 illustrates the i mpact of false positi ve rate and compared with t wo state-of-the-art works for 30 pa-

tients. Figure 4 compares all the performance improvement based on the false positives provided by the three

methods. As the number of patient increases, small rise and small fall off value is recorded irrespective of the

methods used. But, RS-RMC recorded low FPR with the application of Feature Reduct in Rough Set model. By

applying the Feature Reduct in Rough Set model, false positive rate for disease feature identification is reduced

and therefore the disease prediction rate is also minimized. In addition by applying the Feature Reduct disease

feat ure set is red uced witho ut losi ng the infor matio n. Thi s sub seque ntly he lps i n reduc ing t he false posit ive ra te

by 36.48% compared to PPM-VA and 74.54% compared to PE-SSTD respectively.

3.3. Disease Prediction Rate

Disease prediction rate measures the rate of disease being predicted correctly without any assumption. The Dis-

ease Prediction Rate (DPR) is the ratio of successful prediction of disease to the total number of patients and is

given as be lo w.

Successfully predicted as disease

DPR Total patients

(10)

The performance of the different disease diagnosis method for Cleveland Heart Disease dataset is shown in

Figure 5. It is observed that by applying RS-RMC, the disease prediction rate is increased by 6.41% to 30.64%

with that of PPM-VA and PE-SST D.

Figure 4 . Measure o f false positive rate with respect to patients.

False Positi ve Rate (%)

No. of patients

RS-RMC

PPM-VA

PE-SSTD

S. Prema, P. Umamaheswari

707

Figure 5 . Measure o f d isease pred iction rate.

Figure 5 shows the comparison of the disease prediction rate of traces with number of patients ranging from 5

to 30 and applied in Matlab. Five features were considered for obtaining the disease prediction using the three

methods RS-RMC, PPM-VA and PE-SSTD respectively. From the figure, the value of disease prediction rate

achieved using the proposed RS-RMC framework is higher when compared to two other existing techniques

namely, PPM-VA [1] and P E-SSTD [2]. Besides we can also observe that by increasing the number of patients

who provide their disease features, the disease prediction rate is increased using all the methods. But compara-

tively, it is higher in RS -RMC framework because the relationship between disease features is efficiently identi-

fied using Naïve Bayes Classifier algorithm. By applying Naïve Bayes Classifier algorithm, e fficient identifica-

tion of classes through maximum posterior hypothesis is evaluated that helps in measuring the relationship be-

tween disease features and significantly improves the disease prediction rate by 18.52% and 38.74% compared

to PPM-VA a n d PE-SSTD respectively.

3.4. Classification Accuracy

Figure 6 shows the classification accuracy using RS-RM, PPM-VA and PE-SSTD respectively. To extract the

classification accuracy, 30 patients with 20 female and 10 male patients in the age group of 40 - 55 years were

considered.

Figure 6 sh ows the classif ication accuracy recorded using the three methods RS-RMC, PPM-VA and PE-SSTD.

From the fig ur e it is illustrative that the classification accuracy is improved in the prop osed RS-RM C fr amework

compared to two other methods. RS-RMC offers an improved disease diagnosis model by increasing disease

prediction rate and tru e positiv e rate an d decreasin g t he f alse positi v e rate f or disease diag n osis m odel. Unl ike th e

existing methods, RS-RMC used Jelinik Mercer where relationship between the features are efficiently identified

and obtain approximation function for multitude of disease features. The RS-RMC framework i mprove i t s clas-

sification accuracy by reducing the execution time for obtaining the features by 55.51% and by handling over half

of its disease feature set in effective disease diagnosis.

4. Conclusion

In this paper, we considered the design of a Multitude Classifier Disease Diagnosis framework to improve dis-

ease prediction rate and class accuracy in the field of medical domain is presented. A Multitude Classifier

framework is introduced, and considered the problem of efficient disease diagnosis in that framework. The

RS-RMC fr ame work o ffers le ss fa lse po sitive r ate with lesser execution time using Rough Set Feature Selection

and Feature Reduct model. Analysis of disease prediction rate demonstrates that RS-RMC framework provides

higher heart disease prediction rate with the aid of Naïve Bayes Classifier algorithm. Finally, Jelinik Mercer in

RS-RMC framework significantly classifies Multitude Classifier using approximation function for multitude of

disease features. The performance of RS-RMC framework was compared to other disease diagnosis model,

Disease Prediction Rate (%)

No. of patients

RS-RMC

PPM-VA

PE-SSTD

S. Prema, P. Umamaheswari

708

Figure 6 . Measure of classificat ion accurac y.

PPM-VA and PE-SSTD respectively. We compared the performance with many different system parameters,

and evaluated the performance in terms of different metrics, such as execution time, disease prediction rate, false

positive rate and classification accuracy. The results show that RS-RMC framework offers better performance

with an improvement of classification accuracy by 55.51% and disease prediction rate by 28.63% compared to

PPM-VA and PE-SSTD respectively.

References

[1] Koppikar, S., Baranch uk, A., Guzmán , J.C. and Morillo, C.A. (2013) Stro ke and Ventricular Arrhythmias. Internation-

al Journ al of Cardiology, Elsevier, 7.

[2] Kasab ov, N., Feigin, V., Hou, Z.-G., Chen, Y.X. , Li an g , L., Krishnamurthi, R., Othman, M. and P armar, P. (2014)

Evol vin g Spi kin g Neural Ne t wor ks for P erson alised Modelling, Classification and Pred iction of Spatio-Temporal Pat-

terns with a Case Study on Stro ke. Neu ro Co mpu ti ng, El sevier, V ol. 134, 269-27 9.

[3] Meyer, K.C. (2014) Diagnosis and Man agement of Interstitial Lung Disease. Springer Open Journal.

[4] Prashanth, R., Roy, S.D., Mandal, P.K. and Ghosh, S. (2014) Automatic Classification and P red iction M od els for Early

Par kin son ’s Disease Diagnosis fro m SPECT Imag ing. Exp ert S ystems with Applications, Elsevier, 41, 33 33 -3342.

http://dx.doi.org/10.1016/j.eswa.2013.11.031

[5] Ch awla, L.S., Herzog, C.A., Costanzo, M.R., Tumlin, J., Kellum, J.A., McCullough, P.A. and Ronco, C. (2014) Pro-

posal for a Funct ion al Classification System of Heart Failure in Patients with End-Stage Ren al Disease. Journal of the

American College of Cardiology, Elsevier, 63, 1246-1252. http://dx.doi.org/10.1016/j.jacc.2014.01.020

[6] de Dios, C., Goikol ea, J.M., Colomb, F., Morenoc, C. and Vietab, E. (20 14) Bipolar Disorders in the New DSM-5 and

ICD-11 Classifications. Elsevier, 7, 179-185.

[7] Ghwa nme h, S., Mohammad, A. and Al-Ibrahim, A. (2013) Innovati ve Artificial Neural Netwo rks -Based Decisio n

Support System for H eart Diseas e s Diagnosi s. Journal of Intelligent Learning Systems and Applications, 5, 176-183.

http://dx.doi.org/10.4236/jilsa.2013.53019

[8] Jabb ar, M.A., Deekshatu lu, B.L. an d Chandra, P. (2013) Classification of Heart Disease Using Artificial Neural Net-

wor k and Featur e Subset Selection . Global Journal of Computer Science and Technology Neural & Artificial Intelli-

gence, 13 , 5-14 .

[9] Hecker s, S., Barch , D.M., Bustillo, J., Gaebel, W., Gur, R., Malaspin a, D., Owen , M.J., S chultz, S., Tandon, R.,

Tsuang, M., Van Os, J. and Carpenter, W. (2013) Structure of the Psychotic Disorders Classification in DSM 5. Sch i-

zophrenia Research, Elsevier .

[10] Frank, A. and Asuncion, A. (2010) UCI Machine Learning Repository. Universit y of Californ ia, School of Information

and Computer Science, Irvine. http://archive.ics.uci.edu/ml

050 100

RS-RMC

PPM-VA

PE-SSTD

Classification accuracy

Classification

accuracy