
510 B. Keltch et al. / J. Biomedical Science and Engineering 3 (2010) 509-516
Copyright © 2010 SciRes. JBiSE
with tasks like diagnosis [3].
Today the importance of diagnosis as a task requiring
computer support in routine clinical situations receives
much less emphasis. The strict focus on the medical set-
ting has now broadened across the healthcare spectrum,
and instead of AI Medical systems, it is more typical to
describe them as CDSS [3].
In our project, we evaluated and compared two tech-
niques that will be core forecasting tools in a CDSS,
which are Data Mining and Neural Networks.
The main purpose of doing data mining and knowledge
discovery on the medical database is to predict disease
and disease classification. Classification and prediction
are two forms of data analysis which can be used to de-
scribe the model of the important data type or predict the
future trends of the data [5].
Commonly used data mining algorithms are: associa-
tion rules, decision trees, rough sets, statistical analysis,
neural networks, support vector machines, fuzzy clus-
tering, Case-Based Reasoning (CBR), Bayesian fore-
casting and visualization technology [6]. The common
methods used in auxiliary diagnosis of clinical disease
are 1) Bayes discriminate analysis 2) artificial neural
network 3) decision tree.
However within the context of the study, the focus
will be concentrated on decision tree and neural net-
work.
2.1. Decision Tree
The decision tree is a very efficient machine learning
classification algorithm. It is the origin in the concept
of learning systems CLS, and then progress to ID3
method. In the end, it evolved to c5.0 which can han-
dle continuous attributes. Well-known decision tree
methods are CART and Assistant [5].
Decision tree learning uses a decision tree as a pre-
dictive model which maps observations about an item
to conclusions about the item’s target value. In these
tree structures, leaves represent classifications and bra-
nches represent conjunctions of features that lead to
those classifications. One of the biggest advantages of
the method is that the learning process does not require
the user to understand a lot of background knowledge
[6].
Nonetheless, data mining is a complex process to
identify the useful information from large data sets.
Although it is common to focus on the development,
analysis and application of algorithms, the data selec-
tion and data pre-processing are the most timecon-
suming activity in the entire data mining process,
which affects the process and results [6].
2.2. Neural Network
Artificial Neural Networks have been proven to build
efficient rule extraction/classification and forecasting
applications. They provide a powerful non-linear ma-
chine learning techniques and are able to extract rele-
vant features from large data sets. They are able to uti-
lize and compare equally quantitative and qualitative
data which is common in the clinical environments [7].
Neural Networks can handle redundant features as wei-
ghts are learned from the training data.
The primary disadvantage of neural networks is that
they will always arrive at a solution for any data set.
This means that the quality of the resulting model is
highly dependent on the quality and breath of the trai-
ning data. It is easy to over-train the model, so that it
can only predict the training data set. This can be mi-
nimized by assuring that convergence stopping stra-
tegies are effective and that the training data set is rep-
resentative of the solutions space. Also, the resulting
neural network solution is simply a node structure with
inter-node weights. This requires validation of the out-
put by additional statistical methods (i.e. decision trees)
that are more understandable by subject matter experts.
3. AI ASSISTED CLINICAL DECISION
SUPPORT SYSTEM
This study will focus on the demonstration and incorpo-
ration of neural network and decision tree techniques
into a Clinical Decision Support System. These two AI
techniques were selected because of their complemen-
tary attributes. Neural networks provide little definition
of their predictive result, while decision tree output pro-
vides clear connections to historical data. Both methods
will provide different information to the physician. Other
AI methods should be considered in future work.
The database we plan on using for this study was col-
lected at Chiba University hospital in Japan, and is a
Practice of Knowledge Discovery in Databases (PKDD)
2005 Discovery Challenge dataset [11]. The data set
contains patient data, laboratory data, and liver biopsies
data on 771 hepatitis B and C patients. The goal will be
to evaluate whether laboratory examinations can be used
to estimate the stage of liver fibrosis. If this is possible,
physicians may be able to use laboratory examinations
as substitutes for biopsies and to aid in the treatment
scenario. Liver biopsy is an invasive procedure and en-
tails risk to patients. Decision tree and neural network
methods based on a historical dataset will aid physicians
in the development of treatment plans for Hepatitis pa-
tients.
The overall architectural representation of the system
is shown in Figure 1.
The system is designed to be utilized by a physician to
assist them in the development of a treatment plan for
Hepatitis B and Hepatitis C patients. This system ad-
dresses an important question for the treatment – “Sho-
uld a liver biopsy procedure be conducted?” This proce-
dure provides important information on the advancement