In medical imaging, Computer Aided Diagnosis (CAD) is a rapidly growing, dynamic area of research. In recent years, significant efforts have been made to improve computer aided diagnosis applications, because errors in medical diagnostic systems can result in seriously misleading medical treatments. Machine learning is important in Computer Aided Diagnosis: objects such as organs may not be identified accurately with a simple equation, so pattern recognition fundamentally involves learning from examples. In the biomedical field, pattern recognition and machine learning promise improved accuracy in the perception and diagnosis of disease. They also promote objectivity in the decision-making process. For the analysis of high-dimensional and multimodal biomedical data, machine learning offers a worthy approach for building sophisticated and automatic algorithms. This survey paper provides a comparative analysis of different machine learning algorithms for the diagnosis of diseases such as heart disease, diabetes, liver disease, dengue and hepatitis. It draws attention to the suite of machine learning algorithms and tools that are used for disease analysis and the corresponding decision-making process.
Artificial Intelligence (AI) enables computers to think, making them much more intelligent. Machine learning is a subfield of AI. Many researchers believe that intelligence cannot be developed without learning. There are many types of machine learning techniques, as shown in
Supervised, unsupervised, semi-supervised, reinforcement, evolutionary and deep learning are the types of machine learning techniques. These techniques are used to classify data sets.
1) Supervised learning: Given a training set of examples with suitable targets, the algorithm learns from this training set to respond correctly to all feasible inputs. Supervised learning is also called learning from exemplars. Classification and regression are the two types of supervised learning.
Classification: It predicts a class label such as Yes or No, for example, “Is this tumor cancerous?” or “Does this cookie meet our quality standards?”
Regression: It answers “How much?” and “How many?” questions by predicting a numeric value.
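A minimal Python sketch of the two kinds of supervised learning, assuming invented toy numbers rather than data from any surveyed study: `nearest_neighbour` (a hypothetical helper) answers a Yes/No question by copying the label of the closest training example, and `fit_line` (also hypothetical) answers a “how much” question with an ordinary least-squares line.

```python
import math

# Classification: label a new case with the label of its closest training example.
def nearest_neighbour(train, query):
    """train is a list of (features, label) pairs; features are tuples of numbers."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

# Regression: ordinary least-squares fit of y = a * x + b.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

# Invented examples: (tumour size, irregularity score) -> "Yes"/"No".
train = [((1.0, 0.1), "No"), ((1.2, 0.2), "No"), ((3.5, 0.9), "Yes"), ((4.0, 0.8), "Yes")]
print(nearest_neighbour(train, (3.8, 0.7)))

a, b = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
print(a, b)
```

Both functions learn only from the labelled examples they are given, which is the defining trait of supervised learning.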
2) Unsupervised learning: Correct responses or targets are not provided. Unsupervised learning tries to find similarities among the input data and groups the data based on these similarities. This is also known as density estimation. Unsupervised learning includes clustering [
Clustering: It groups data into clusters on the basis of similarity.
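Clustering can be illustrated with k-means, one common clustering algorithm chosen here as an example (the text does not prescribe it). In this sketch the points are invented, `k_means` is a hypothetical name, and the first k points serve as initial centres, a simplification; practical implementations usually use random or k-means++ initialization.

```python
import math

def k_means(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers.
    Initial centres are simply the first k points (a simplifying assumption)."""
    centres = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (keep it if empty).
        centres = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centres, clusters = k_means(points, k=2)
print(centres)
```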
3) Semi-supervised learning: Semi-supervised learning is a class of supervised learning techniques that also uses unlabeled data for training (generally a small amount of labeled data with a large amount of unlabeled data). It lies between unsupervised learning (unlabeled data) and supervised learning (labeled data).
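One simple semi-supervised strategy is self-training, sketched below under assumptions of our own: a nearest-centroid classifier is fitted on two labelled points and then used to pseudo-label the unlabelled points, which are folded back into the training set for the next round. `self_train` and `centroid` are hypothetical names, and the coordinates and class names are invented.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def self_train(labelled, unlabelled, rounds=3):
    """Self-training with a nearest-centroid classifier (an assumption chosen for brevity).
    Returns the labelled data plus pseudo-labelled points, and the centres of the last round."""
    data = list(labelled)
    centres = {}
    for _ in range(rounds):
        classes = sorted({label for _, label in labelled})
        # Fit: one centre per class, using real and pseudo-labelled points.
        centres = {c: centroid([x for x, lab in data if lab == c]) for c in classes}
        # Pseudo-label every unlabelled point with its nearest class centre.
        pseudo = [(x, min(classes, key=lambda c: math.dist(x, centres[c]))) for x in unlabelled]
        data = list(labelled) + pseudo
    return data, centres

labelled = [((0.0, 0.0), "healthy"), ((10.0, 10.0), "ill")]
unlabelled = [(1.0, 0.5), (2.0, 1.0), (9.0, 8.0), (8.5, 9.5), (4.0, 4.0)]
data, centres = self_train(labelled, unlabelled)
print(centres)
```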
4) Reinforcement learning: This form of learning is inspired by behaviorist psychology. The algorithm is told when its answer is wrong, but not how to correct it; it has to explore and test various possibilities until it finds the right answer. It is also known as learning with a critic, because the critic scores answers but does not recommend improvements. Reinforcement learning differs from supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Moreover, it focuses on on-line performance.
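The “learning with a critic” idea can be sketched as an epsilon-greedy multi-armed bandit, a standard toy reinforcement learning problem picked here for illustration. The learner only observes a reward after each action, never which action would have been correct. The success probabilities `[0.2, 0.5, 0.8]` are hypothetical and hidden from the learner, and `epsilon_greedy` is a hypothetical name.

```python
import random

def epsilon_greedy(success_probs, steps=5000, epsilon=0.1, seed=0):
    """The learner receives only a reward signal (the 'critic'); it is never told
    which action was correct, so it must explore."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit the best estimate
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
print(best, counts)
```

With `epsilon=0.1` the learner spends roughly 10% of its steps exploring, which is how it discovers that the third action pays off most often.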
5) Evolutionary learning: Biological evolution can be considered a learning process: organisms adapt to improve their survival rates and their chances of having offspring. By using the idea of fitness to check how accurate a solution is, we can use this model in a computer [
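A toy genetic algorithm makes the fitness idea concrete. The sketch assumes a bit-string encoding and uses the number of 1-bits (the classic “OneMax” problem) as a stand-in fitness function; in a diagnostic setting the fitness would instead measure something like classification accuracy. Tournament selection, one-point crossover, mutation and elitism are common choices, not ones prescribed by the text, and `genetic_algorithm` is a hypothetical name.

```python
import random

def genetic_algorithm(n_bits=20, pop_size=30, generations=100, mutation_rate=0.02, seed=1):
    """Toy GA. Fitness = number of 1-bits ('OneMax'), a hypothetical stand-in
    for 'how accurate the solution is'."""
    rng = random.Random(seed)
    fitness = sum  # counts the 1-bits in a 0/1 list
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly chosen individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = genetic_algorithm()
print(sum(best), best)
```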
6) Deep learning: This branch of machine learning is based on a set of algorithms that model high-level abstractions in data. It uses a deep graph with multiple processing layers, made up of many linear and nonlinear transformations.
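The phrase “many linear and nonlinear transformations” can be illustrated with a two-layer network whose weights are hand-picked, an assumption for illustration only: deep learning learns its weights from data, and real networks have many more layers. No single linear transformation can compute XOR, but a linear layer followed by a ReLU nonlinearity and a second linear layer can. All function names here are hypothetical.

```python
def relu(v):
    """Nonlinear transformation applied element-wise."""
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    """Linear transformation: one output per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def xor_network(x1, x2):
    # Layer 1: linear transformation followed by ReLU.
    hidden = relu(dense([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], [x1, x2]))
    # Layer 2: linear read-out combining the two hidden features.
    (out,) = dense([[1.0, -2.0]], [0.0], hidden)
    return out

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```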
Pattern recognition and data classification have long been valuable. Humans have a very strong ability to sense their environment, and they act on what they perceive from it [
Initially, ML algorithms were designed and employed to analyze medical data sets. Today, ML offers various tools for efficient data analysis. Especially in the last few years, the digital revolution has provided comparatively low-cost and accessible means for collecting and storing data. Modern hospitals are equipped with machines for data collection and examination, enabling them to collect and share data in large information systems. ML technologies are very effective for analyzing medical data, and much work has been done on diagnostic problems. Correct diagnostic data are kept as medical records or reports in modern hospitals or their data sections. To run an algorithm, correct diagnostic patient records are entered into a computer as input. Results can then be obtained automatically from previously solved cases. Physicians use the derived classifier as an aid when diagnosing new patients, gaining speed and accuracy. These classifiers can also be used to train non-specialists or students to diagnose the problem [
In the past, ML has given us self-driving cars, speech recognition, efficient web search, and an improved understanding of the human genome. Today machine learning is so pervasive that one may use it many times a day without knowing it. Many researchers consider it the best way to make progress toward human-level AI. Machine learning techniques (MLT) can explore electronic health records, which generally contain high-dimensional patterns and multiple data sets. Pattern recognition, a central theme of MLT, supports prediction and decision-making for diagnosis and treatment planning. Machine learning algorithms are capable of handling large amounts of data, combining data from dissimilar sources, and integrating background information into the study [
Many researchers have worked on different machine learning algorithms for disease diagnosis, and it is widely accepted that machine learning algorithms work well in diagnosing different diseases. A schematic overview of the diseases diagnosed by machine learning techniques is shown in
Otoom et al. [
experimenting with the holdout test, SVM attains 88.3% accuracy. In the cross-validation test, both SVM and Bayes Net provide 83.8% accuracy, while FT attains 81.5%. The 7 best features are selected with the Best First selection algorithm, and cross-validation is used for validation. On these 7 selected features, Bayes Net attains 84.5% accuracy, SVM 85.1%, and FT 84.5%.
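The two evaluation protocols mentioned above differ in how the data are split. A holdout test trains once and tests once on a reserved portion; k-fold cross-validation holds out each of k folds in turn and averages the accuracy. The surveyed study used WEKA's built-in procedures; the standard-library sketch below is only illustrative, and `k_fold_indices`, `cross_validate` and the one-feature threshold classifier are hypothetical. Folds are contiguous for simplicity, so in practice the rows would normally be shuffled or stratified first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, k, train_fn, predict_fn):
    """Mean accuracy over k folds; each fold is held out exactly once for testing."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Hypothetical classifier: threshold halfway between the two class means.
def train_threshold(xs, ys):
    zeros = [x for x, t in zip(xs, ys) if t == 0]
    ones = [x for x, t in zip(xs, ys) if t == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

X = [1, 9, 2, 8, 3, 7, 4, 6, 1.5, 8.5]  # invented single-feature values
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cross_validate(X, y, 5, train_threshold, predict_threshold))
```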
Vembandasamy et al. [
Use of data mining approaches has been suggested by Chaurasia and Pal [
Parthiban and Srivatsa [
Tan et al. [
Analysis:
In the existing literature, SVM offers the highest accuracy, 94.60%, in 2012, as in
Machine Learning Technique | Author | Year | Disease | Source of Data Set | Tool | Accuracy |
---|---|---|---|---|---|---|
Bayes Net | Otoom et al. | 2015 | CAD (Coronary artery disease) | UCI | WEKA | 84.5% |
SVM | Otoom et al. | 2015 | CAD (Coronary artery disease) | UCI | WEKA | 85.1% |
FT | Otoom et al. | 2015 | CAD (Coronary artery disease) | UCI | WEKA | 84.5% |
Naive Bayes | Vembandasamy et al. | 2015 | Heart Disease | Diabetic Research Institute in Chennai | WEKA | 86.419% |
Naive Bayes | Chaurasia and Pal | 2013 | Heart Disease | UCI | WEKA | 82.31% |
J48 | Chaurasia and Pal | 2013 | Heart Disease | UCI | WEKA | 84.35% |
Bagging | Chaurasia and Pal | 2013 | Heart Disease | UCI | WEKA | 85.03% |
SVM | Parthiban and Srivatsa | 2012 | Heart Disease | Research institute in Chennai | WEKA | 94.60% |
Naive Bayes | Parthiban and Srivatsa | 2012 | Heart Disease | Research institute in Chennai | WEKA | 74% |
Hybrid Technique (GA + SVM) | Tan et al. | 2009 | Heart Disease | UCI | LIBSVM and WEKA | 84.07% |
Advantages and Disadvantages of SVM:
Advantages: It constructs accurate classifiers, is less prone to overfitting, and is robust to noise.
Disadvantages: It is a binary classifier, so for multi-class classification it must use pairwise classification. Its computational cost is high, so it runs slowly [
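As a rough illustration of how a linear SVM is trained, the sketch below minimizes the regularized hinge loss with Pegasos-style sub-gradient descent, a swapped-in training method chosen for brevity; the surveyed studies used WEKA, LIBSVM or MATLAB implementations instead. Labels are assumed to be +1/-1, the 2-D points are invented, and `train_linear_svm`/`svm_predict` are hypothetical names. For multi-class problems, the pairwise scheme mentioned above would train one such binary model per pair of classes.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style sub-gradient descent on the regularised hinge loss.
    Labels must be +1/-1; a constant 1.0 feature acts as the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x = list(xi) + [1.0]
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink weights (regularisation)
            if margin < 1:  # hinge loss active: push w towards the correct side
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w

def svm_predict(w, xi):
    score = sum(wj * xj for wj, xj in zip(w, list(xi) + [1.0]))
    return 1 if score >= 0 else -1

X = [(2, 2), (3, 1), (1, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(w, [svm_predict(w, x) for x in X])
```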
Iyer et al. [
Meta learning algorithms for diabetes diagnosis have been discussed by Sen and Dash [
Experimental work to predict diabetes was done by Kumari and Chitra [
Sarwar and Sharma [
Ephzibah [
Analysis:
A Naive Bayes based system is helpful for diagnosing diabetes. Naive Bayes offers the highest accuracy, 95%, in 2012. The results show that this system can make good predictions with minimal error, and that this technique is important for diagnosing diabetes. In 2015, however, Naive Bayes achieved a lower accuracy of 79.57% (79.5652%); that proposed model for detecting diabetes would require more training data for creation and testing.
Advantages and Disadvantages of Naive Bayes:
Advantages: It improves classification performance by eliminating unrelated features. Its performance is good, and it takes little computational time.
Disadvantages: This algorithm needs a large amount of data to attain good outcomes. It is lazy, as it stores all the training examples [
Machine Learning Technique | Author | Year | Disease | Source of Data Set | Tool | Accuracy |
---|---|---|---|---|---|---|
Naive Bayes | Iyer et al. | 2015 | Diabetes | Pima Indian Diabetes dataset | WEKA | 79.5652% |
J48 | Iyer et al. | 2015 | Diabetes | Pima Indian Diabetes dataset | WEKA | 76.9565% |
CART | Sen and Dash | 2014 | Diabetes | Pima Indian Diabetes dataset from UCI | WEKA | 78.646% |
AdaBoost | Sen and Dash | 2014 | Diabetes | Pima Indian Diabetes dataset from UCI | WEKA | 77.864% |
LogitBoost | Sen and Dash | 2014 | Diabetes | Pima Indian Diabetes dataset from UCI | WEKA | 77.479% |
Grading | Sen and Dash | 2014 | Diabetes | Pima Indian Diabetes dataset from UCI | WEKA | 66.406% |
SVM | Kumari and Chitra | 2013 | Diabetes | UCI | MATLAB 2010a | 78% |
Naive Bayes | Sarwar and Sharma | 2012 | Type-2 diabetes | Different sectors of society in India | MATLAB with SQL Server | 95% |
GA + Fuzzy Logic | Ephzibah | 2011 | Diabetes | UCI | MATLAB | 87% |
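Gaussian Naive Bayes, one common variant for numeric attributes, is sketched below; the text does not say which variant each study used. The model keeps a prior and a per-feature mean and variance for every class, and classifies by adding log-probabilities, which is where the “naive” independence assumption enters. The glucose and BMI values are invented for illustration, and `fit_gaussian_nb`/`predict_gaussian_nb` are hypothetical names.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior probability plus mean and variance of each feature."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A tiny constant keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class maximising log P(class) + sum of log P(feature | class),
    treating features as independent given the class."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Invented (glucose, BMI) records.
X = [(85, 22), (90, 24), (95, 23), (88, 21), (150, 33), (165, 35), (140, 31), (158, 36)]
y = ["non-diabetic"] * 4 + ["diabetic"] * 4
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (160, 34)))
```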
Vijayarani and Dhayanand [
A study of intelligent techniques for classifying liver patients was performed by Gulia et al. [
Rajeswari and Reena [
Analysis:
For liver disease diagnosis, the FT tree algorithm gives the highest result compared with the other algorithms. When the FT tree algorithm is applied to the liver disease data set, the time taken to build the model is short compared with other algorithms. It shows improved performance on the attributes of the data set: it fully classified the attributes and achieves 97.10% accuracy. The results indicate that this algorithm plays an important role in improving the classification accuracy on the data set. The accuracy graph of the algorithms is shown in
Advantages and Disadvantages of FT:
Advantage: Easy to interpret and understand; fast prediction.
Disadvantage: Calculations are complex, mainly if values are uncertain or if several outcomes are linked.
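Tree learners such as ID3, C4.5 (implemented in WEKA as J48) and FT all grow a tree by choosing splitting attributes; FT additionally fits logistic-regression models inside the tree, which this sketch does not attempt. The sketch shows only the ID3-style information-gain criterion (C4.5 uses a normalized variant, the gain ratio). The attributes and records are invented, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute (ID3-style)."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [
    {"alcohol": "yes", "fatigue": "yes"},
    {"alcohol": "yes", "fatigue": "no"},
    {"alcohol": "no", "fatigue": "yes"},
    {"alcohol": "no", "fatigue": "no"},
]
labels = ["liver", "liver", "healthy", "healthy"]
best = max(["alcohol", "fatigue"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

The tree would split first on the attribute with the highest gain, then repeat the same choice inside each branch.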
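Machine Learning Technique | Author | Year | Disease | Source of Data Set | Tool | Accuracy |
---|---|---|---|---|---|---|
SVM | Vijayarani and Dhayanand | 2015 | Liver Disease | ILPD from UCI | MATLAB | 79.66% |
Naive Bayes | Vijayarani and Dhayanand | 2015 | Liver Disease | ILPD from UCI | MATLAB | 61.28% |
J48 | Gulia et al. | 2014 | Liver Disease | UCI | WEKA | 70.669% |
MLP | Gulia et al. | 2014 | Liver Disease | UCI | WEKA | 70.8405% |
Random Forest | Gulia et al. | 2014 | Liver Disease | UCI | WEKA | 71.8696% |
SVM | Gulia et al. | 2014 | Liver Disease | UCI | WEKA | 71.3551% |
Bayesian Network | Gulia et al. | 2014 | Liver Disease | UCI | WEKA | 69.1252% |
Naive Bayes | Rajeswari and Reena | 2010 | Liver Disease | UCI | WEKA | 96.52% |
K Star | Rajeswari and Reena | 2010 | Liver Disease | UCI | WEKA | 83.47% |
FT tree | Rajeswari and Reena | 2010 | Liver Disease | UCI | WEKA | 97.10% |
Tarmizi et al. [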
by using data mining models. Dengue is becoming a severe infectious disease, causing problems in countries with humid weather such as Thailand, Indonesia and Malaysia. Decision Tree (DT), Artificial Neural Network (ANN), and Rough Set Theory (RS) are the classification algorithms used in this study to predict dengue. The data set was taken from the Public Health Department of Selangor State. The WEKA data mining tool is used with two tests: 10-fold cross-validation and percentage split (PS). With 10-fold cross-validation, DT achieves 99.95% accuracy, ANN 99.98%, and RS 100%. With PS, both DT and ANN achieve 99.92% accuracy, while RS achieves 99.72%.
Fathima and Manimeglai [
Ibrahim et al. [
Analysis:
Different machine learning techniques are used to diagnose dengue, one of the serious infectious diseases. As in
Machine Learning Technique | Author | Year | Disease | Source of Data Set | Tool | Accuracy |
---|---|---|---|---|---|---|
DT | Tarmizi et al. | 2013 | Dengue | Public Health Department of Selangor State | WEKA | 99.95% |
ANN | Tarmizi et al. | 2013 | Dengue | Public Health Department of Selangor State | WEKA | 99.98% |
RS | Tarmizi et al. | 2013 | Dengue | Public Health Department of Selangor State | WEKA | 100% |
SVM | Fathima and Manimeglai | 2012 | Arbovirus (dengue) | King Institute of Preventive Medicine and surveys of many hospitals and laboratories in Chennai and Tirunelveli, India | R project version 2.12.2 | 90.42% |
MFNN | Ibrahim et al. | 2005 | Dengue | 252 hospitalized patients | MATLAB Neural Network Toolbox | 90% |
other algorithms. In 2005 and 2012, researchers used different algorithms but did not attain the highest result. In 2013, accuracy was improved by using RS, which is capable of handling uncertainty, noise and missing data. The RS classifier developed for classification is based on rough set theory, and its attribute selection enables it to outperform the other models. RS is a promising rule-based method that yields meaningful information. RS is also better than a neural network in terms of time, since an NN takes much time to build a model. DT is a complex as well as costly algorithm. RS does not need any initial or additional information about the data, whereas DT does.
Advantages and Disadvantages of RS:
Advantages: It is easy to understand and provides a direct interpretation of the results obtained. It evaluates the significance of data. It is appropriate for both qualitative and quantitative data. It discovers hidden patterns and finds minimal sets of data. It can find relationships that cannot be identified by statistical methods.
Disadvantages: It has few limitations, yet it is still not widely used.
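The core rough set notions behind the RS classifier can be sketched in a few lines: objects with identical condition-attribute values are indiscernible, the lower approximation of a target set collects the equivalence classes lying entirely inside it, and the upper approximation collects those that overlap it. The gap between them (the boundary region) is how rough sets represent uncertainty. The patient records below are invented and the function names are hypothetical; a full RS classifier would also compute reducts and induce rules.

```python
from collections import defaultdict

def equivalence_classes(table, attributes):
    """Group objects that are indiscernible on the chosen condition attributes."""
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in attributes)].add(obj)
    return list(classes.values())

def approximations(table, attributes, target):
    """Lower approximation: classes entirely inside the target set (certain members).
    Upper approximation: classes that overlap the target set (possible members)."""
    lower, upper = set(), set()
    for eq in equivalence_classes(table, attributes):
        if eq <= target:
            lower |= eq
        if eq & target:
            upper |= eq
    return lower, upper

table = {
    "p1": {"fever": "high", "rash": "yes"},
    "p2": {"fever": "high", "rash": "yes"},
    "p3": {"fever": "high", "rash": "no"},
    "p4": {"fever": "normal", "rash": "no"},
    "p5": {"fever": "high", "rash": "no"},
}
dengue = {"p1", "p2", "p3"}
lower, upper = approximations(table, ["fever", "rash"], dengue)
print(sorted(lower), sorted(upper))
```

Here p3 and p5 share the same symptoms but only p3 has dengue, so both fall in the boundary region rather than the lower approximation.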
Ba-Alwi and Hintaya [
Karlik [
Sathyadevi [
Analysis:
Many algorithms have been used for diagnosis of different diseases.
Advantages and Disadvantages of NN:
Advantages: Adaptive learning, self-organization, real-time operation, and fault tolerance via redundant information coding.
Disadvantages: Reducing overfitting requires great computational effort. The sample size must be large. Training is time-consuming. The relations between input and output variables are not developed through engineering judgment, so the model behaves like a black box [
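A feed-forward neural network trained with backpropagation can be sketched from scratch as follows, assuming one hidden layer of sigmoid units, a squared-error loss and online updates; these are choices of ours, not settings reported in the survey. The toy training set is the logical AND function, chosen only because it is small, and `train_mlp` is a hypothetical name. The nested loops over epochs, samples and weights also hint at the computational cost noted above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(X, y, hidden=3, lr=0.5, epochs=5000, seed=0):
    """One hidden layer of sigmoid units, squared-error loss, online backpropagation."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Each weight row has one extra entry that acts as the bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
        hb = h + [1.0]
        return xb, h, hb, sigmoid(sum(w * v for w, v in zip(w2, hb)))

    for _ in range(epochs):
        for x, target in zip(X, y):
            xb, h, hb, out = forward(x)
            # Output error signal: derivative of 0.5 * (out - target)^2 through the sigmoid.
            delta_out = (out - target) * out * (1 - out)
            # Hidden error signals, propagated back through the (old) output weights.
            delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent weight updates.
            w2 = [w - lr * delta_out * v for w, v in zip(w2, hb)]
            for j in range(hidden):
                w1[j] = [w - lr * delta_h[j] * v for w, v in zip(w1[j], xb)]
    return lambda x: forward(x)[3]

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
net = train_mlp(X, y)
print([round(net(x), 3) for x in X])
```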
Machine Learning Technique | Author | Year | Disease | Source of Data Set | Tool | Accuracy |
---|---|---|---|---|---|---|
Naive Bayes | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 96.52% |
Naive Bayes Updateable | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 84% |
FT | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 87.10% |
K Star | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 83.47% |
J48 | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 83% |
LMT | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 83.6% |
NN | Ba-Alwi and Hintaya | 2013 | Hepatitis | UCI | WEKA | 70.41% |
Naive Bayes | Karlik | 2011 | Hepatitis | UCI | RapidMiner | 97% |
Feed-forward NN with backpropagation | Karlik | 2011 | Hepatitis | UCI | RapidMiner | 98% |
C4.5 | Sathyadevi | 2011 | Hepatitis | UCI | WEKA | 71.4% |
ID3 | Sathyadevi | 2011 | Hepatitis | UCI | WEKA | 64.8% |
CART | Sathyadevi | 2011 | Hepatitis | UCI | WEKA | 83.2% |
For the diagnosis of heart, diabetes, liver, dengue and hepatitis diseases, several machine learning algorithms perform very well. From the existing literature, it is observed that the Naive Bayes algorithm and SVM are widely used for the detection of diseases, and both offer better accuracy than other algorithms. Artificial neural networks are also very useful for prediction; they also show the maximum output, but take more time than other algorithms. Tree algorithms are also used, but they have not attained wide acceptance due to their complexity; they show enhanced accuracy when they respond correctly to the attributes of the data set. RS theory is not widely used, but it presents the maximum output.
Statistical estimation models that are not capable of producing good performance results have flooded the assessment area. Statistical models fail to handle categorical data, to deal with missing values, and to cope with large numbers of data points. All these reasons underline the importance of MLT. ML plays a vital role in many applications, e.g. image detection, data mining, natural language processing, and disease diagnostics, and in all these domains it offers possible solutions. This paper provides a survey of different machine learning techniques for the diagnosis of diseases such as heart disease, diabetes, liver disease, dengue and hepatitis. Many algorithms have shown good results because they identify the attributes accurately. From previous studies, it is observed that for the detection of heart disease, SVM provides an improved accuracy of 94.60%. Diabetes is accurately diagnosed by Naive Bayes, which offers the highest classification accuracy of 95%. FT provides 97.10% accuracy for liver disease diagnosis. For dengue detection, 100% accuracy is achieved by RS theory. The feed-forward neural network classifies hepatitis correctly with 98% accuracy. The survey highlights the advantages and disadvantages of these algorithms, and improvement graphs of machine learning algorithms for disease prediction are presented in detail. From the analysis, it can be clearly observed that these algorithms provide enhanced accuracy on different diseases. This survey paper also presents a suite of tools developed in the AI community. These tools are very useful for analyzing such problems and provide opportunities for an improved decision-making process.
Fatima, M. and Pasha, M. (2017) Survey of Machine Learning Algorithms for Disease Diagnostic. Journal of Intelligent Learning Systems and Applications, 9, 1-16. https://doi.org/10.4236/jilsa.2017.91001