Precise detection of PD in its early stages is important. Precise results can be achieved through data-mining classification techniques such as Naïve Bayes, support vector machine (SVM), multilayer perceptron (MLP) neural network, and decision tree. In this paper, four classifiers based on Naïve Bayes, SVM, the MLP neural network, and the decision tree (j48) are used to classify the PD dataset, and their performances are examined when they are implemented upon the actual PD dataset, the discretized PD dataset, and a selected set of attributes from the PD dataset. The dataset used in this study comprises a range of voice signals from 31 people: 23 with PD and 8 healthy. The results show that Naïve Bayes and the decision tree (j48) yield better accuracy when applied to the discretized PD dataset with cross-validation test mode, without applying any attribute-selection algorithm. SVM gives a high accuracy of 96.5% when implemented on the discretized PD dataset with a percentage-split test mode (70% for training, 30% for testing). The MLP neural network gives the highest accuracy when used to classify the actual PD dataset without discretization, attribute selection, or a change of test mode.
Parkinson’s disease (PD) is a progressive, neurodegenerative disease that belongs to the group of conditions called motor system disorders. The condition of Parkinson’s disease sufferers worsens over time as normal bodily functions, including breathing, balance, movement, and heart function, deteriorate [
Other neurodegenerative disorders include Alzheimer’s disease, Huntington’s disease, and amyotrophic lateral sclerosis (Lou Gehrig’s disease). An estimated seven to ten million people worldwide suffer from Parkinson’s disease. The occurrence of Parkinson’s increases with age, but an estimated four percent of people with PD are diagnosed before the age of 50 [
Data mining in medicine is a research area that combines sophisticated representational and computing techniques with the insights of expert physicians to produce tools for improving healthcare. Data mining is a computational process for finding hidden patterns in datasets by building predictive or classification models that can be learnt from past experience and applied to future cases. With the vast amount of medical data available to hospitals, medical centers, and medical research organizations, the field of medicine supported by data mining techniques can increase healthcare quality and help physicians make decisions about their patients’ care. There are various techniques for classification, such as support vector machine (SVM), neural networks, decision tree, and Naïve Bayes. The objective of this study is to analyze and compare the performance of four of the abovementioned classification techniques on Parkinson’s diagnosis. First, we compare the classifiers’ performance on the actual and discretized PD datasets, and then we compare their performance using an attribute-selection algorithm.
Several studies have focused on using data mining techniques for the automatic identification of Parkinson’s disease.
Mohammad S. Islam et al. [
Aprajita Sharma and Ram Nivas [
Shian Wu and Jiannjong Guo [
Geetha Ramani and G. Sivagami [
A. H. Hadjahamadi and Taiebeh J. Askari [
Yahia Alemami and Laiali Almazaydeh [
Rashidah et al. [
We conduct an analysis on real-world PD data, where the disease is diagnosed using several features extracted from the human voice [
These extracted voice features are used to diagnose PD and to determine which subjects had actually developed the disease and which were healthy.
This study applies several classification methods, including Naïve Bayes, SVM, MLP neural network, and decision tree (j48), on the PD dataset. The goals of this study are as follows:
1) Examine which of the above classifiers gives the best performance when applied to the actual PD dataset.
2) Examine the effect of attribute selection for the PD dataset on the performances of the mentioned classifiers.
3) Examine the effect of discretizing the PD dataset on the performances of the classifiers.
Attribute selection is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature-selection technique is that the data contain many redundant or irrelevant features [
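The idea can be illustrated with a minimal, pure-Python sketch. The study uses Weka's CfsSubsetEval; here a simpler stand-in, ranking features by their absolute correlation with the class label, conveys the same intuition. The feature names and toy values below are invented for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, labels):
    """Rank features by absolute correlation with the class label (descending)."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: "jitter" tracks the label closely, "noise" does not.
features = {
    "jitter": [0.1, 0.2, 0.8, 0.9, 0.7, 0.15],
    "noise":  [0.5, 0.4, 0.5, 0.4, 0.5, 0.4],
}
labels = [0, 0, 1, 1, 1, 0]
print(rank_features(features, labels))  # "jitter" ranks first
```

CfsSubsetEval additionally penalizes redundancy between the selected features themselves, which a per-feature ranking like this does not capture.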
Feature name | No. | Meaning
---|---|---
MDVP:Fo (Hz) | F1 | Average vocal fundamental frequency
MDVP:Fhi (Hz) | F2 | Maximum vocal fundamental frequency
MDVP:Flo (Hz) | F3 | Minimum vocal fundamental frequency
MDVP:Jitter (%) | F4 | Jitter as a percentage
MDVP:Jitter (Abs) | F5 | Absolute jitter in microseconds
MDVP:RAP | F6 | Relative average perturbation
MDVP:PPQ | F7 | Period perturbation quotient
Jitter:DDP | F8 | Average absolute differences between cycles, divided by the average period
MDVP:Shimmer | F9 | Local shimmer
MDVP:Shimmer (dB) | F10 | Local shimmer in decibels
Shimmer:APQ3 | F11 | 3-point amplitude perturbation quotient
Shimmer:APQ5 | F12 | 5-point amplitude perturbation quotient
MDVP:APQ | F13 | 11-point amplitude perturbation quotient
Shimmer:DDA | F14 | Absolute differences between the amplitudes of consecutive periods
NHR | F15 | Noise-to-harmonics ratio
HNR | F16 | Harmonics-to-noise ratio
DFA | F17 | Signal fractal scaling exponent
PPE | F18 | Pitch period entropy
spread1 | F19 | Nonlinear measure of fundamental frequency variation
spread2 | F20 | Nonlinear measure of fundamental frequency variation
RPDE | F21 | Recurrence period density entropy
D2 | F22 | Correlation dimension (nonlinear dynamical complexity measure)
Status | — | Health status of the subject: one = Parkinson’s, zero = healthy
Discretization of a continuous-valued attribute consists of transforming it into a finite number of intervals and re-encoding, for all instances, each value of this attribute by associating it with its corresponding interval. There are many ways to realize this process [
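As a concrete illustration, equal-width binning, one of the simplest discretization schemes, can be sketched as follows. The f0 values are invented, and the study's actual discretization filter may differ:

```python
def equal_width_bins(values, k):
    """Discretize continuous values into k equal-width intervals,
    returning the bin index (0..k-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        bins.append(min(idx, k - 1))  # clamp the maximum value into the last bin
    return bins

# Hypothetical MDVP:Fo (Hz) values, re-encoded into 3 intervals
f0 = [110.0, 120.5, 180.3, 240.0, 150.2]
print(equal_width_bins(f0, 3))  # [0, 0, 1, 2, 0]
```

Equal-frequency binning and supervised (entropy-based) discretization are common alternatives; they differ only in how the interval boundaries are chosen.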
The Naïve Bayes classifier is a supervised learning method based on the concept of probability: it assigns a new observation to the most probable class. The classification process comprises two stages, as follows [
1) Training stage: using the training samples, the method estimates the class priors and the per-class probability distributions of the attributes.
2) Prediction stage: for a test sample, the method computes the posterior probability of the unknown instance for each class and assigns it to the class with the largest posterior probability, a rule known as maximum a posteriori (MAP).
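The two stages can be sketched in pure Python for discretized (categorical) features. The toy bin labels and the small floor used for unseen attribute values are illustrative choices, not the study's implementation:

```python
from collections import defaultdict

def train(samples, labels):
    """Training stage: estimate class priors and per-feature likelihoods
    from counts (discretized/categorical features assumed)."""
    priors = defaultdict(float)
    likelihoods = defaultdict(lambda: defaultdict(float))
    for x, y in zip(samples, labels):
        priors[y] += 1
        for i, v in enumerate(x):
            likelihoods[y][(i, v)] += 1
    n = len(labels)
    model = {}
    for y, count in priors.items():
        feats = {k: c / count for k, c in likelihoods[y].items()}
        model[y] = (count / n, feats)
    return model

def predict(model, x):
    """Prediction stage: pick the class with the largest posterior (MAP)."""
    best, best_score = None, -1.0
    for y, (prior, feats) in model.items():
        score = prior
        for i, v in enumerate(x):
            score *= feats.get((i, v), 1e-6)  # tiny floor for unseen values
        if score > best_score:
            best, best_score = y, score
    return best

# Toy discretized data: feature bins "low"/"high"; 1 = PD, 0 = healthy
X = [("high", "low"), ("high", "high"), ("low", "low"), ("low", "high")]
y = [1, 1, 0, 0]
model = train(X, y)
print(predict(model, ("high", "low")))  # prints 1 (PD has the larger posterior)
```

The "naïve" part is the product over features, which assumes they are conditionally independent given the class.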
SVM is used in supervised learning models with associated learning algorithms that analyze data and recognize patterns for classification. Given a set of training samples, each marked as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate classes are divided by a clear gap that is as wide as possible. New examples are then mapped into that space and predicted to belong to a class based on which side of the gap they fall on [
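Once trained, prediction with a linear SVM reduces to checking which side of the separating hyperplane a point falls on. The weights below are hypothetical, and the training step (finding the maximum-margin hyperplane from the support vectors) is omitted:

```python
def svm_predict(w, b, x):
    """Classify by which side of the hyperplane w·x + b = 0 the point falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

# Hypothetical hyperplane assumed to have been learned from training data
w, b = [1.0, -0.5], -0.25

print(svm_predict(w, b, [1.0, 0.5]))  # 1.0 - 0.25 - 0.25 = 0.5 >= 0 -> class 1
print(svm_predict(w, b, [0.0, 1.0]))  # -0.5 - 0.25 = -0.75 < 0   -> class 0
```

The "wide gap" property matters because points near the margin are the ones most likely to be misclassified under noise.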
An MLP network comprises three layers. A three-layer MLP is a fully connected feed-forward neural network consisting of an input layer, a hidden layer, and an output layer. The input layer is not counted as a computational layer, because its neurons only present the inputs and perform no processing; the output layer (PD or healthy) corresponds to the categorization result [
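A forward pass through such a three-layer network can be sketched as follows. The weights are hand-picked for illustration, sigmoid activations are assumed, and bias terms are omitted for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, w_out):
    """Forward pass of a three-layer MLP. The input layer does no computation;
    it only presents the feature vector to the hidden layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return output  # > 0.5 -> PD, otherwise healthy

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network
w_hidden = [[2.0, -1.0], [-1.5, 1.0]]
w_out = [3.0, -3.0]
score = mlp_forward([1.0, 0.2], w_hidden, w_out)
print("PD" if score > 0.5 else "healthy")
```

In practice the weights are learned by backpropagation over the training set rather than set by hand.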
Decision trees represent a supervised approach to classification. A decision tree is a simple structure in which non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes. j48 is the Weka implementation of the C4.5 algorithm. The C4.5 algorithm generates a classification decision tree for a given dataset by recursive partitioning of the data. The tree is grown using a depth-first search strategy. The algorithm considers all possible tests that can split the dataset and selects the test that gives the best information gain. For each discrete attribute, one test with as many outcomes as the number of distinct values of the attribute is considered. For each continuous attribute, binary tests involving every distinct value of the attribute are considered. To compute the entropy gain of all these binary tests efficiently, the training data belonging to the considered node are sorted on the values of the continuous attribute, and the entropy gains of the binary cuts based on each distinct value are calculated in a single pass over the sorted data. This process is repeated for each continuous attribute [
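The entropy-gain scan over a continuous attribute can be sketched in a few lines. The values are toy data, and C4.5's gain-ratio refinement and pruning are omitted:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result

def best_binary_cut(values, labels):
    """Scan the sorted distinct values of a continuous attribute and return
    the threshold with the highest information gain, as C4.5 does."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, len(pairs)):
        t = pairs[i][0]
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        if not left or not right:
            continue
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Toy continuous attribute with a clean class boundary between 0.3 and 0.6
values = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 1, 1]
print(best_binary_cut(values, labels))  # threshold 0.6, gain 1.0 (perfect split)
```

This is the per-attribute building block; the full algorithm repeats it at every node over all attributes and recurses on the resulting partitions.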
The supervised learning algorithms are applied one after the other. The confusion matrix is a useful tool for determining how well a classifier classifies the instances of different classes. It reports the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The classifier accuracy is calculated, and a comparative study is done to determine the best classifier algorithm.
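From these four counts, accuracy is simply the fraction of correctly classified instances. Using the MLP counts reported in the actual-dataset table (TP = 46, TN = 9, FP = 2, FN = 1):

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = correctly classified instances / all tested instances."""
    return (tp + tn) / (tp + tn + fp + fn)

# MLP on the actual PD dataset: 55 of 58 test instances correct
print(round(accuracy(46, 9, 2, 1) * 100, 1))  # 94.8
```

Note that with 75% of subjects having PD, accuracy alone can mask poor performance on the minority (healthy) class, which is why the full confusion matrix is reported.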
The PD dataset was divided as follows: 70% for training and 30% for testing. The experiment was performed on the abovementioned algorithms as follows:
Apply the abovementioned algorithms one by one on the actual PD dataset without applying any filter algorithm. The Naïve Bayes algorithm classifies the PD dataset with 58.6% accuracy. The SVM yields 86.2% accuracy. The MLP neural network offers 94.8% accuracy. The decision tree (j48) provides 74% accuracy.
Using the attribute-selection algorithm CfsSubsetEval with BestFirst search (-D 1 -N 5) to filter the PD dataset, the attributes selected were MDVP:Fo (Hz), MDVP:Fhi (Hz), MDVP:Flo (Hz), MDVP:RAP, MDVP:APQ, NHR, spread1, spread2, and D2. The accuracies obtained in this case are: Naïve Bayes, 72.4%; MLP neural network, 91.3%; SVM, 86.2%; and decision tree (j48), 82.7%.
Classifier | Tested instances | FN | FP | TN | TP | Accuracy
---|---|---|---|---|---|---
Naïve Bayes | 58 (30%) | 23 | 1 | 10 | 24 | 58.6%
SVM | 58 (30%) | 1 | 7 | 4 | 46 | 86.2%
MLP | 58 (30%) | 1 | 2 | 9 | 46 | 94.8%
Decision tree (j48) | 58 (30%) | 6 | 9 | 5 | 38 | 74%
Classifier | Tested instances | FN | FP | TN | TP | Accuracy
---|---|---|---|---|---|---
Naïve Bayes | 58 (30%) | 14 | 2 | 9 | 33 | 72.4%
SVM | 58 (30%) | 1 | 7 | 4 | 46 | 86.2%
MLP | 58 (30%) | 0 | 5 | 6 | 47 | 91.3%
Decision tree (j48) | 58 (30%) | 7 | 3 | 8 | 40 | 82.7%
Applying the classifiers on the discretized PD dataset, we obtained different accuracies: Naïve Bayes, 79.3%; MLP, 94.8%; SVM, 96.5%; and decision tree (j48), 89.6%.
When the test mode is changed, the classifiers give different accuracies. Using cross-validation test mode instead of a percentage split of the dataset into training and test sets led to a significant change in the accuracy of some classifiers, while others showed no change.
As a result, we conclude the following:
Naïve Bayes gives better performance when implemented on the discretized PD dataset with cross-validation test mode, yielding 84.6%, which is the best accuracy obtained compared with its performance on the actual PD data and on the selected attributes from the PD data.
SVM yields 96.5%, a high accuracy, when implemented on the discretized PD data with percentage-split test mode (70% training, 30% test).
Decision tree (j48) gives better performance when implemented on the discretized PD data, yielding 89.6%. Its performance can be further enhanced using cross-validation test mode, through which it yields 92.3%.
The results show that the best performance is obtained by the MLP neural network for both the actual and discretized PD data, i.e., 94.8%. Moreover, the attribute-selection algorithm and cross-validation test mode had no significant effect on MLP performance when used in PD classification.
The aim of this study was to recognize how different classifiers perform when implemented on the PD dataset, to evaluate their performance, and to examine the effect of attribute selection, discretization, and test mode on classifier performance. A comparative study of the Naïve Bayes, SVM, MLP, and decision tree (j48) classifiers on the PD dataset was performed by implementing the classifiers upon the following:
Actual PD dataset.
Discretized PD dataset.
Classifier | Tested instances | FN | FP | TN | TP | Accuracy
---|---|---|---|---|---|---
Naïve Bayes | 58 (30%) | 9 | 3 | 8 | 38 | 79.3%
SVM | 58 (30%) | 1 | 1 | 10 | 46 | 96.5%
MLP | 58 (30%) | 3 | 1 | 11 | 44 | 94.8%
Decision tree (j48) | 58 (30%) | 4 | 2 | 9 | 43 | 89.6%
Testing mode: 10-fold cross-validation

Classifier | Actual PD dataset | Applying discretization | Applying attribute selection
---|---|---|---
Naïve Bayes | 69% | 84.6% | 77.9%
SVM | 87.6% | 93.8% | 87.1%
Neural network (MLP) | 91% | 91% | 90%
Decision tree (j48) | 85.6% | 92.3% | 87.6%
Selected set of attributes from the PD dataset.
Shifting between percentage-split and 10-fold cross-validation test modes.
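The 10-fold cross-validation test mode partitions the data so that every instance is tested exactly once. A minimal index-splitting sketch, assuming the PD dataset's 195 voice recordings (consistent with the 58 test instances at a 30% split in the tables above):

```python
def k_fold_indices(n, k):
    """Partition sample indices 0..n-1 into k folds; each fold serves once
    as the test set while the remaining k-1 folds form the training set."""
    folds = [list(range(n))[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, test))
    return splits

# 195 recordings, 10 folds: each model trains on ~175 samples, tests on ~20
splits = k_fold_indices(195, 10)
print(len(splits), len(splits[0][1]))  # 10 20
```

Compared with a single 70/30 split, this averages performance over ten train/test partitions, which is why it can shift the reported accuracies of some classifiers.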
From the experimental results, we conclude that Naïve Bayes and decision tree (j48) yield better accuracy when implemented upon the discretized PD dataset with cross-validation test mode, without applying any attribute-selection algorithm. SVM gives high accuracy when implemented on the discretized PD dataset with a percentage split (70% for training, 30% for testing). The MLP neural network gives the highest accuracy when used to classify the actual PD dataset without discretization, attribute selection, or a change of test mode.
In conclusion, data discretization enhanced the performance of all classifiers except MLP. The attribute-selection algorithm increased only the performance of Naïve Bayes and decision tree (j48). The choice of test mode significantly affected only some classifiers, notably Naïve Bayes and decision tree (j48), while the others showed little change.
Mohamed, G.S. (2016) Parkinson’s Disease Diagnosis: Detecting the Effect of Attributes Selection and Discretization of Parkinson’s Disease Dataset on the Performance of Classifier Algorithms. Open Access Library Journal, 3: e3139. http://dx.doi.org/10.4236/oalib.1103139