
J. B. MACHADO, S. DO LAGO PEREIRA 293
Does it rain?
Yes
o
Take umbrella Don’t take umbrella
Figure 2. A decision tree for the “umbrella problem”.
input variable. The recursion terminates when the subset
at a node has all the same value for the target variable, or
when splitting no longer enhance the predictions. This
process of top-down partitioning is a kind of greedy al-
gorithm, and is the most common strategy for learning
decision trees from data. After construction and valida-
tion, the resulting d ecision tree can be used to emulate an
efficient making decision process.
To guarantee the efficiency of the decision tree, the
inductive learning algorithm uses the concepts of en tropy
and information gain [4] to choose the input variable to
label each nonterminal node. The entropy is a measure
based on the occurrence probability of each possible event
(i.e., values of the input variables). The information gain
represents the estimated reduction on the entropy value
resulting from the partition of the set of examples, ac-
cording to the values of the input variable selected to
label a node.
Formally, entropy and information gain can be defined
as follow. Let E be a training dataset with examples of
the form
12
,,,,
m
xxy, where each xi is the value of
an input variable vi, for 1, and y is the target
variable value. Also, for a given input variable vi with k
possible values, let pi be the proportion of tuples in E
where the input variable vi has value xi. The information
gain g for an input variable vi is defined in terms of en-
tropy h as follows:
im
2
1
log
j
ii
i
hEp p
1
,ij
ij
kvx
i
j
E
gEv hEhE
E
v
x
The information gain is equal to the total entropy for
an input variable if and only if, for each value of that
variable, the target variab le has the same value.
3.2. Expert Systems
In artificial intelligence, an expert system [13,14] is a
computer program that emulates the ability of decision
making of a human expert. In fact, by reasoning over
facts and rules available in a knowledge base, an expert
system is capable of solving very complex problems.
The standard architecture of an expert system (Figure
1) consists of a user interface that allows the communi-
cation with the user, a knowledge base that stores the
knowledge about the sp ecific application domain, and an
inference engine that uses the available knowledge to
solve problems propo sed by the user.
In the expert system proposed in this work, the knowl-
edge base is implemented as a set of decision trees (one
tree for each risk) and the inference engine is a procedure
that selects a proper decision tree in the knowledge base
and, by reasoning with the rules encoded on this tree,
decides whether a specific risk can or cannot occur, ac-
cording to the projects characteristics informed by the
user.
The decision trees used to populate the knowledge
base of the expert system are automatically generated by
an algorithm of supervised inductive learning. The in-
ductive reasoning implemented by this algorithm allows
the generation of rules about conditions that necessarily
implies specific risks, by analyzing a set of documents
with lessons learned in previously developed projects.
These rules form, in fact, a predictive model that can be
used to identify risks in new projects.
To identify risks in a new project, all that a risk man-
ager needs to do is to access the user interface of the ex-
pert system and inform the projects characteristics. Then,
the expert system should answer with a list of risks
automatically identified for that p roject.
4. The Experiment with the Expert System
To verify the effectiveness of the proposed solution for
automatic risk identification, the expert system of Figure
1 was implemented in the Java programming language,
based on the inductive tree learning algorithm ID3 [4].
This section reports some details of the experiment
performed with the system and discusses some empirical
results, as well.
4.1. Knowledge Base Populating
In order to decide whether a new project has a specific
risk, the expert system must use a list of known risks. As
said before, this list can be generated from a collection of
documents describing lessons learned in previous pro-
jects. Basically, there are two types of risk: generic risks,
which threat the most part of projects, and specific risks,
that threat the specific project under evaluation. Generic
risks can be easily detected by the expert system. On the
other hand, the detection of specific risks is more com-
plicated because, if they were not detected in previous
projects, the knowledge of the expert system might be
insufficient to detect their presence i n a new project.
Moreover, to compare new projects with previous
projects, and decide whether they are similar or not, the
expert system needs to use a predefined set of character-
istics which are common for all projects. These charac-
Copyright © 2012 SciRes. IIM