Comparison of Various Classification Techniques Using Different Data Mining Tools for Diabetes Diagnosis

examples of class 1 and 268 of class 2.

This data set is extracted from a larger database originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases. The purpose of the study is to investigate the relationship between the diabetes diagnostic result and a list of variables that represent physiological measurements and medical attributes. The data set in the UCI repository contains 768 observations and 9 variables, with no missing values reported. However, as some researchers point out, it contains a number of impossible values, such as a body mass index of 0 and a plasma glucose concentration of 0. Furthermore, one attribute (2-hour serum insulin) contains almost 50% impossible values; to keep the sample size reasonably large, this attribute is removed from the analysis. There are 236 observations with at least one impossible value of glucose, blood pressure, triceps skin thickness, or body mass index. The data set has nine variables, including the binary response variable; all other attributes are numeric-valued. The attributes are given below:

1) Number of times pregnant
2) Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3) Diastolic blood pressure (mm Hg)
4) Triceps skin fold thickness (mm)
5) 2-hour serum insulin (mu U/ml)
6) Body mass index (weight in kg/(height in m)^2)
7) Diabetes pedigree function
8) Age (years)
9) Class variable (0 or 1)

4. Methodology

We use different classification techniques in this research. These techniques and their running parameters are given below:

4.1. Multilayer Perceptron

The multilayer perceptron (MLP) [11] is one of the most commonly used neural network classification algorithms. The architecture used for the MLP during simulations with the PIDD data set consisted of a three-layer feed-forward neural network: one input, one hidden, and one output layer. Selected parameters for the model are: learningRate = 0.3/0.15; momentum = 0.2; randomSeed = 0; validationThreshold = 20; Number of Epochs = 500.
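As an illustration only (not the authors' exact WEKA setup), a comparable three-layer MLP can be sketched with scikit-learn, whose MLPClassifier exposes rough analogues of the parameters above (learning_rate_init, momentum, and max_iter for the number of epochs); the data below is a synthetic stand-in for PIDD:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the PIDD attributes (8 numeric features after
# dropping 2-hour serum insulin); labels are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 8))
y = (X[:, 1] + X[:, 5] > 0).astype(int)  # "glucose" + "BMI" drive the class

# One hidden layer gives the input/hidden/output architecture of the study
clf = MLPClassifier(hidden_layer_sizes=(8,), solver="sgd",
                    learning_rate_init=0.3, momentum=0.2,
                    max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

The parameter names here are scikit-learn's, not WEKA's; they mirror, but are not identical to, the options listed above.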

4.2. BayesNet

BayesNet [12] learns Bayesian networks under two assumptions: attributes are nominal (numeric ones are pre-discretized) and there are no missing values (any such values are replaced globally). Learning a network involves two parts: searching for a network structure and estimating the conditional probability tables. In this study we run BayesNet with the SimpleEstimator and the K2 search algorithm, without using ADTree. K2 is a greedy search algorithm that works as follows. Suppose we know a total ordering of the nodes. Initially each node has no parents. The algorithm then incrementally adds the parent whose addition most increases the score of the resulting structure; when no addition of a single parent can increase the score, it stops adding parents to the node. Since an ordering of the nodes is known beforehand, the search space under this constraint is much smaller than the entire space, and we do not need to check for cycles: the total ordering guarantees that there is no cycle in the deduced structures. Furthermore, under appropriate assumptions, the parents of each node can be chosen independently.
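The greedy parent-selection loop of K2 can be sketched in Python. This is an illustrative skeleton, not WEKA's implementation; score is a placeholder for a network-quality measure such as the K2 metric:

```python
def k2_parents(order, score, max_parents=3):
    """Greedy K2 search: given a total ordering of the nodes, pick
    parents for each node by repeatedly adding the single candidate
    that most improves the score, stopping when no addition helps."""
    parents = {v: set() for v in order}
    for i, v in enumerate(order):
        candidates = set(order[:i])  # only earlier nodes: no cycles possible
        while len(parents[v]) < max_parents:
            current = score(v, parents[v])
            best, best_gain = None, 0.0
            for c in candidates - parents[v]:
                gain = score(v, parents[v] | {c}) - current
                if gain > best_gain:
                    best, best_gain = c, gain
            if best is None:  # no single parent improves the score
                break
            parents[v].add(best)
    return parents
```

Because each node only considers parents that precede it in the ordering, the deduced structure is acyclic by construction, matching the argument above.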

4.3. Naïve Bayes

The Naïve Bayes [12] classifier provides a simple approach, with clear semantics, to representing and learning probabilistic knowledge. It is termed naïve because it relies on two important simplifying assumptions: it assumes that the predictive attributes are conditionally independent given the class, and that no hidden or latent attributes influence the prediction process.
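Since the PIDD attributes are numeric, a Gaussian Naïve Bayes model is one natural instantiation of these assumptions. The sketch below uses scikit-learn with made-up rows (not actual PIDD records) purely to show the fit/predict flow:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical rows: [plasma glucose, body mass index, age]; class 1 = diabetic
X = np.array([[85, 26.6, 31], [183, 23.3, 32], [89, 28.1, 21],
              [137, 43.1, 33], [116, 25.6, 30], [168, 38.0, 34]])
y = np.array([0, 1, 0, 1, 0, 1])

# Each attribute is modelled independently per class (the "naive" assumption)
nb = GaussianNB().fit(X, y)
print(nb.predict([[150, 35.0, 33]]))  # → [1]
```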

4.4. J48graft (C4.5 Decision Tree Revision 8)

The C4.5 algorithm, developed by Quinlan [13], is perhaps the most popular tree classifier to date. The Weka classifier package has its own version of C4.5, known as J48 or J48graft; J48graft is an optimized implementation of C4.5 rev. 8. For this study, the C4.5 classifier was used on the TANAGRA platform and the J48graft classifier on the WEKA platform. J48graft is run in this study with the parameters: confidenceFactor = 0.25; minNumObj = 2; subtreeRaising = True; unpruned = False. C4.5 is run with the parameters: Min size of leaves = 5; Confidence-Level for pessimistic pruning = 0.25. The final decision tree built by the algorithm is depicted in Figure 1.
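For illustration (this is not the tree of Figure 1), an entropy-based decision tree in the spirit of C4.5 can be grown with scikit-learn: criterion="entropy" mirrors C4.5's information-gain splitting, and min_samples_leaf plays the role of "Min size of leaves", scaled down here for the toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows: [plasma glucose, BMI]; class 1 = diabetic
X = np.array([[90, 25], [100, 30], [160, 35], [170, 40],
              [95, 22], [155, 33], [85, 27], [165, 38]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2,
                              random_state=0).fit(X, y)
print(export_text(tree, feature_names=["glucose", "bmi"]))  # readable rules
```

Note that scikit-learn's CART-style trees differ from C4.5 in pruning and in multiway splits; the sketch only conveys the general mechanism.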

4.5. Fuzzy Lattice Reasoning (FLR)

The Fuzzy Lattice Reasoning (FLR) classifier induces descriptive, decision-making knowledge (rules) in a mathematical lattice data domain, including the space R^N. Tunable generalization is possible based on non-linear (sigmoid) positive valuation functions; moreover, the FLR classifier can deal with missing data. Learning is carried out incrementally and quickly by computing disjunctions of join-lattice interval conjunctions, where a join-lattice interval conjunction corresponds to a hyperbox in R^N. In this study we evaluated the FLR classifier in WEKA with the parameters: Rhoa = 0.5; Number of Rules = 2.
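The lattice join of two intervals mentioned above has a concrete geometric reading: the join of two hyperboxes in R^N is the smallest hyperbox enclosing both. A minimal sketch (illustrative only, not WEKA's FLR code):

```python
def join(box_a, box_b):
    """Lattice join of two hyperboxes, each given as (lows, highs):
    the smallest hyperbox containing both."""
    lows = [min(a, b) for a, b in zip(box_a[0], box_b[0])]
    highs = [max(a, b) for a, b in zip(box_a[1], box_b[1])]
    return (lows, highs)

def contains(box, point):
    """True if the point lies inside the hyperbox."""
    lows, highs = box
    return all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs))

# Two boxes in R^2 and their join
print(join(([0, 0], [1, 1]), ([2, 2], [3, 3])))  # → ([0, 0], [3, 3])
```

An FLR rule is, roughly, such a hyperbox labelled with a class; training grows hyperboxes by joins, subject to a size threshold governed by the rho parameter.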

Copyright © 2013 SciRes. JSEA