Journal of Software Engineering and Applications, 2013, 6, 13-17
doi:10.4236/jsea.2013.63b004 Published Online March 2013 (http://www.scirp.org/journal/jsea)
Copyright © 2013 SciRes. JSEA
Bayesian Network and Factor Analysis for Modeling Pine
Wilt Disease Prevalence
Mingxiang Huang1, Liang Guo2, Jianhua Gong3, Weijun Yang2
1T Information Center, Ministry of Environmental Protection of China, Beijing, China; 2GuangZhou Urban Planning & Design Survey Research Institute, GuangZhou, China; 3State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China.
Received 2013
ABSTRACT
A Bayesian network (BN) model was developed to predict susceptibility to pine wilt disease (PWD). The distribution of PWD was identified using QuickBird and unmanned aerial vehicle (UAV) images taken at different times. Seven factors that influence the distribution of PWD were extracted from the QuickBird images and were used as the independent variables. The BN model predicted PWD with high accuracy (area under the ROC curve = 0.934). In a sensitivity analysis, elevation (EL), the normalized difference vegetation index (NDVI), the distance to settlements (DS) and the distance to roads (DR) were strongly associated with PWD prevalence, whereas slope (SL) exhibited the weakest association. The study showed that BN is an effective tool for modeling PWD prevalence and quantifying the impact of various factors.
Keywords: Pine Wilt Disease; Bayesian Network; Modeling; Factor Analysis
1. Introduction
Pine wilt disease (PWD) is caused by the pinewood ne-
matode, Bursaphelenchus xylophilus. This nematode is
vectored by the pine sawyer beetle, Monochamus alter-
natus, which disperses the nematode to healthy trees. The
pinewood nematode was first reported as a new species
during the 1930s in the USA [1,2], and it was introduced
to Japan at the beginning of the 20th century [3-6]. From
Japan, the pinewood nematode has spread to Korea,
Taiwan, and China and has devastated pine forests in
East Asia [7]. In China, PWD has become the most se-
rious disease of pine trees; the affected areas have
reached nearly 80,000 ha, and 50,000,000 trees have
been killed by the disease [8].
Bayesian networks (BNs), also known as belief networks, were first proposed by Pearl [9]. BN models graphically and probabilistically represent correlative and causal relationships among variables [10] and can be used to analyze a problem domain and predict the consequences of intervention [11]. BNs have several distinct advantages when compared with other decision models, such as decision trees and neural networks. The principal advantage of BNs is their graphical construction, which shows the relationships among the variables more clearly and facilitates combining empirical data and expert knowledge [10]. BNs have been successfully used to solve ecological and environmental problems in natural resource conservation and management [12-15]. In addition, combining BNs and GIS enables the creation of spatial representations of model-based management [16]. BNs are also robust when using geospatial data that may contain multiple uncertainties caused by errors in position, feature classification, resolution, attributes, data completeness, currency, and logical consistency [17].
The objectives of the present study were (i) to construct a BN model for predicting PWD prevalence from related independent variables and (ii) to conduct a sensitivity analysis that evaluates the influence of these variables on PWD.
2. Study Area
The study area is situated in the southwest of Xiangshan County, Zhejiang Province, China. It covers approximately 1.37 km², bounded by latitudes 29°22′47″ N–29°23′36″ N and longitudes 121°44′59″ E–121°45′53″ E. The area has a subtropical monsoon climate, with an average annual temperature of 16 °C, an average annual rainfall of 1463 mm, an elevation ranging from 16 m to 215 m above sea level, and an annual total solar radiation of 103 kcal/cm².
3. Materials and Methods
3.1. PWD Data Collection
In this study, QuickBird images (PAN and MS) with high spatial resolution (0.61 m) and unmanned aerial vehicle (UAV) images with higher spatial resolution (0.3 m), the latter captured by a FUJIFILM-FinepixZ10fd camera, were acquired at different times. A detailed description of the two types of images is given in Table 1. To detect areas damaged by PWD, the images were preprocessed and then classified with a different classifier for each image type (Table 1). The results for the QuickBird and UAV images are shown in Figure 1.
3.2. Factor Selection and Discretization
In this study, seven factors that influence PWD transmission were chosen as the variables in the BN model: the normalized difference vegetation index (NDVI), elevation (EL), aspect (AS), slope (SL), distance to settlements (DS), distance to roads (DR), and the total number (TN) of PWD cases in a 20 m × 20 m neighborhood. NDVI, EL, AS, SL and TN represent environmental factors, whereas DS and DR are measures of human activity.

Table 1. The characteristics of the pine wilt disease (PWD) spatial data acquired from the QuickBird and unmanned aerial vehicle (UAV) images.

| Collected data | Image description | Preprocessing | Classifier | Accuracy for PWD |
|---|---|---|---|---|
| QuickBird PAN and MS image | Acquisition date: April 20, 2006; Scene ID: 1010010004ED9A00; Resolution: 0.61 m | Orthorectification and Pansharp fusion | Gauss-Markov Random Field Segmentation [32] | User accuracy: 89.19%; Producer accuracy: 86.84% |
| UAV image | Acquisition date: October 28, 2007; Flight height: 650 m; Sensor: FUJIFILM-FinepixZ10fd; Resolution: 0.3 m | Image mosaicking and orthorectification | Bayes Classifier of Idrisi, Andes Edition [33] | User accuracy: 90.00%; Producer accuracy: 92.31% |

Figure 1. PWD identification based on images (A: UAV, B: QuickBird).
3.3. Bayesian Networks
A Bayesian network is an annotated directed acyclic graph that represents a joint probability distribution [18]. This representation consists of an ordered pair, (G, P). The first component, G, is a directed acyclic graph (DAG) whose vertices correspond to the random variables X1, ..., Xn. The second component, P, describes a conditional probability distribution for each variable given its parents in G. Together, these two components specify a unique distribution for X1, ..., Xn.
Graph G represents conditional independence assumptions that allow the joint distribution to be decomposed, which reduces the number of parameters. Specifically, G encodes the Markov assumption: each variable Xi is independent of its non-descendants given its parents in G. Accordingly, P = {p(x1|y1), ..., p(xn|yn)}, where yi denotes the values of the parents of xi. By applying the chain rule for probabilities and the properties of conditional independence, any joint distribution that satisfies these assumptions can be decomposed into the following product form:
$$p(x) = \prod_{i=1}^{n} p(x_i \mid y_i) \tag{1}$$
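As a minimal sketch of Equation (1), the following Python snippet evaluates the joint probability of a full assignment in a hypothetical three-node network (PWD → EL, PWD → TN, EL → TN). The arcs mirror part of the TAN structure described in this paper, but the binary states and every probability value are invented for illustration only.

```python
from itertools import product

# Hypothetical network: each node maps to the tuple of its parents.
parents = {"PWD": (), "EL": ("PWD",), "TN": ("PWD", "EL")}

# cpt[node][parent_values][value] = p(value | parent_values)
# All numbers below are illustrative, not taken from the study.
cpt = {
    "PWD": {(): {0: 0.9, 1: 0.1}},
    "EL": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.4, 1: 0.6}},
    "TN": {
        (0, 0): {0: 0.95, 1: 0.05},
        (0, 1): {0: 0.80, 1: 0.20},
        (1, 0): {0: 0.70, 1: 0.30},
        (1, 1): {0: 0.50, 1: 0.50},
    },
}


def joint_probability(assignment):
    """Multiply p(x_i | y_i) over all nodes, as in the product form."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob


# p(PWD=1, EL=1, TN=1) = 0.1 * 0.6 * 0.5
p = joint_probability({"PWD": 1, "EL": 1, "TN": 1})

# Summing the product form over all assignments should give 1.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

The three local tables hold 1 + 2 + 4 = 7 free parameters here, the same as a full joint table over three binary variables; the saving only appears once some arcs are absent, which is the situation the Markov assumption describes.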
4. Result and Discussion
4.1. Structure and Validation
In the present study, tree-augmented naive Bayes (TAN) was chosen as the BN structure. The Bayesian network was developed using BNT (Bayes Net Toolbox for Matlab). The dataset was partitioned into a training set used to construct the BN structure (88.9%, n = 10056) and a validation set used to assess the model accuracy (11.1%, n = 1257).
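The TAN structure search can be sketched in a few lines. The snippet below is not the BNT implementation used in the study; it is a NumPy sketch of the standard TAN procedure (conditional mutual information between each pair of features given the class, followed by a maximum spanning tree), run on synthetic data. The function names `cond_mutual_info` and `tan_tree` are hypothetical.

```python
import numpy as np
from itertools import combinations


def cond_mutual_info(x, y, c):
    """Empirical I(X; Y | C) in nats for discrete 1-D integer arrays."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            p_x = np.mean(xs == xv)
            for yv in np.unique(ys):
                p_y = np.mean(ys == yv)
                p_xy = np.mean((xs == xv) & (ys == yv))
                if p_xy > 0:
                    cmi += p_c * p_xy * np.log(p_xy / (p_x * p_y))
    return cmi


def tan_tree(features, c, root=0):
    """Return {feature index: parent feature index} for the feature tree.

    In the full TAN model every feature additionally has the class
    node as a parent; only the feature-to-feature arcs are built here.
    """
    k = features.shape[1]
    w = np.zeros((k, k))
    for i, j in combinations(range(k), 2):
        w[i, j] = w[j, i] = cond_mutual_info(features[:, i], features[:, j], c)
    parent = {root: None}
    while len(parent) < k:
        # Prim's algorithm: attach the outside node with the heaviest edge.
        i, j = max(
            ((i, j) for i in parent for j in range(k) if j not in parent),
            key=lambda e: w[e],
        )
        parent[j] = i
    return parent


# Synthetic data: x1 mostly copies x0, x2 is independent noise.
rng = np.random.default_rng(0)
n = 2000
c = rng.integers(0, 2, n)
x0 = rng.integers(0, 3, n)
x1 = np.where(rng.random(n) < 0.9, x0, rng.integers(0, 3, n))
x2 = rng.integers(0, 2, n)
tree = tan_tree(np.column_stack([x0, x1, x2]), c)
```

With feature 0 as the root, the strongly dependent feature 1 is attached to it first, which is analogous to EL becoming the parent of most other impact factors in the trained model.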
Figure 2 shows the tree structure of the BN model and the strong dependences between the independent variable nodes (the 7 impact factors) and the dependent variable node (PWD), which constitute an optimized structure with the PWD node as the root. The arcs between the parent and child nodes and the conditional probability tables (CPTs) were learned during the training process; together they define the BN. The CPT of each child node specifies the probability of each of its states for every combination of the parent node values. The CPT of the PWD node is shown as an example in Table 2. As shown in Figure 2, all of the impact factor nodes are child nodes of the PWD node. In addition, the EL node is the parent node of all other impact factor nodes (except for NDVI) because of the strong relationship between EL and the other variables. The ROC curve of the BN model is shown in Figure 3, and the performance measurements are listed in Table 3. The area under the ROC curve is 0.934 (95% confidence interval = 0.920-0.948), indicating excellent discriminatory capability. In addition, the optimal cut-off point was chosen by maximizing the sum of the sensitivity and specificity, which occurs at a sensitivity of 82.4% and a specificity of 89.9%.
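The cut-off rule described above, maximizing sensitivity plus specificity (equivalent to maximizing Youden's J statistic), can be sketched as follows. The scores and labels are synthetic and the function name `best_cutoff` is hypothetical; this is not the study's validation data.

```python
import numpy as np


def best_cutoff(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing sens + spec."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = None
    for t in np.unique(scores):
        pred = scores >= t                    # predicted positive at this cut-off
        sens = np.mean(pred[labels == 1])     # true positive rate
        spec = np.mean(~pred[labels == 0])    # true negative rate
        if best is None or sens + spec > best[1] + best[2]:
            best = (float(t), float(sens), float(spec))
    return best


# Synthetic predicted probabilities of PWD and true labels (illustrative).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
threshold, sensitivity, specificity = best_cutoff(scores, labels)
```

On this toy data the rule selects the threshold 0.60, where four of the five positives are detected and no negative is misclassified.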
Figure 2. The trained TAN model structure (nodes: PWD, EL, NDVI, AS, SL, TN, DS, DR).
Table 2. The conditional probability table (CPT) for the pine wilt disease (PWD) node.

| Parent node EL | Parent node PWD | TN = 0 | TN = 1 | TN = 2-5 | TN > 5 |
|---|---|---|---|---|---|
| 0-22.276 | 0 | 0.9957 | 0.0021 | 0.0021 | 0 |
| 0-22.276 | 1 | 0.9897 | 0.0034 | 0.0034 | 0.0034 |
| 22.276-30.104 | 0 | 0.9473 | 0.0176 | 0.0336 | 0.0015 |
| 22.276-30.104 | 1 | 0.7221 | 0.0354 | 0.2071 | 0.0354 |
| 30.104-101.479 | 0 | 0.797 | 0.0764 | 0.1081 | 0.0185 |
| 30.104-101.479 | 1 | 0.6721 | 0.1152 | 0.1911 | 0.0216 |
| >101.479 | 0 | 0.8083 | 0.0695 | 0.0995 | 0.0226 |
| >101.479 | 1 | 0.6868 | 0.0951 | 0.1621 | 0.056 |
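For readers who want to work with Table 2 programmatically, the sketch below stores the table as a Python dictionary keyed by (EL class, PWD state) and checks that each row is a probability distribution over the four TN classes. The values are copied from Table 2; the dictionary layout and variable names are illustrative assumptions, and small deviations from 1 reflect rounding in the published values.

```python
# Column order of the TN classes in Table 2.
tn_classes = ["0", "1", "2-5", ">5"]

# Rows of Table 2, keyed by (EL class, PWD state).
cpt = {
    ("0-22.276", 0): [0.9957, 0.0021, 0.0021, 0.0],
    ("0-22.276", 1): [0.9897, 0.0034, 0.0034, 0.0034],
    ("22.276-30.104", 0): [0.9473, 0.0176, 0.0336, 0.0015],
    ("22.276-30.104", 1): [0.7221, 0.0354, 0.2071, 0.0354],
    ("30.104-101.479", 0): [0.797, 0.0764, 0.1081, 0.0185],
    ("30.104-101.479", 1): [0.6721, 0.1152, 0.1911, 0.0216],
    (">101.479", 0): [0.8083, 0.0695, 0.0995, 0.0226],
    (">101.479", 1): [0.6868, 0.0951, 0.1621, 0.056],
}

# Each row should sum to 1 up to rounding of the published figures.
row_sums = {key: sum(row) for key, row in cpt.items()}
max_deviation = max(abs(s - 1.0) for s in row_sums.values())

# Example lookup: probability of 2-5 neighboring cases for an
# infected cell in the 22.276-30.104 elevation class.
p = cpt[("22.276-30.104", 1)][tn_classes.index("2-5")]
```

Comparing the PWD = 0 and PWD = 1 rows within one elevation class shows how the presence of PWD shifts probability mass toward higher TN classes.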
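Table 3. The performance measurements of the Bayesian network (BN) model.

| Area | TPR (sensitivity) | TNR (specificity) | Asymptotic 95% confidence interval, lower bound | Asymptotic 95% confidence interval, upper bound |
|---|---|---|---|---|
| 0.934 | 82.4% | 89.9% | 0.920 | 0.948 |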
Figure 3. The ROC curve of the BN model.
4.2. BN Sensitivity Analysis
Table 4 shows the results of the sensitivity analysis. The
most important PWD impact factor is EL. The impor-
tance of EL can be explained by the strong relationships
between the elevation gradient and temperature and soil
moisture variations, which further affect the PWD dis-
tribution. In this study, EL also had distinctively high
positive correlations with DR and DS, implying that the
low-elevation regions were closer to human settlements
and roads. Due to the human activities that accelerate
PWD transmission, pine stands that are close to settle-
ments and roads are more likely to be affected by PWD.
These phenomena may explain why EL has the strongest association with PWD.
The second most important factor is NDVI. The NDVI
distribution for a plant (or an entire plot) characterizes
the state of the plant (age, leaf area index, and health to
some extent). The age of a tree also influences its sus-
ceptibility to PWD. There is an increased risk of devel-
oping pine wilt in trees that are more than 10 years old.
Relevant research has confirmed that the growing condi-
tions of pine trees, such as DBH (diameter at breast
height), crown diameter and height, are inversely corre-
lated with pine sawyer population density and the inci-
dence of PWD. Therefore, a larger NDVI value is asso-
ciated with a lower PWD incidence.
The influence of human activities on PWD is easily
understood. According to findings by Togashi and Shi-
gesada [7], human activities such as lumbering and
transporting pine logs infected with the nematodes and
their insect vector accelerate the spread of PWD by in-
creasing the risk of PWD transmission from infected pine
stands to surrounding trees. However, human interven-
tion can also consist of silvicultural methods for control-
ling PWD in infected areas. These preventive measures
specifically include clear-cutting infected pine trees and
burning the infected branches and logs.
Table 4. Sensitivity analysis results ranked in decreasing order of influence on PWD prevalence based on mutual information or entropy reduction.

| Node | Mutual info | Entropy reduction (%) |
|---|---|---|
| PWD | 0.18390 | 100 |
| EL | 0.05796 | 31.5 |
| NDVI | 0.04762 | 25.9 |
| DS | 0.02126 | 11.6 |
| DR | 0.01569 | 8.53 |
| TN | 0.00783 | 4.26 |
| AS | 0.00233 | 1.27 |
| SL | 0.00005 | 0.026 |
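The percentages in Table 4 are consistent with dividing each node's mutual information with PWD by the entropy of the PWD node (0.18390, the first row). The sketch below reproduces that arithmetic from the published values; treating entropy reduction as this ratio is an inference from the table, not a definition stated in the text.

```python
# Mutual information between each impact factor and PWD (Table 4).
mutual_info = {
    "EL": 0.05796,
    "NDVI": 0.04762,
    "DS": 0.02126,
    "DR": 0.01569,
    "TN": 0.00783,
    "AS": 0.00233,
    "SL": 0.00005,
}

# Entropy of the PWD node, taken from the first row of Table 4.
h_pwd = 0.18390

# Assumed definition: entropy reduction (%) = 100 * I(PWD; X) / H(PWD).
reduction_pct = {node: 100.0 * mi / h_pwd for node, mi in mutual_info.items()}

# Rank the factors by decreasing influence on PWD.
ranking = sorted(reduction_pct, key=reduction_pct.get, reverse=True)

for node in ranking:
    print(f"{node:5s} {reduction_pct[node]:6.2f} %")
```

The SL entry comes out slightly above the published 0.026% because the mutual information is reported to only one significant digit.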
The correlation between TN and PWD is also strong. In fact, TN represents the density of pine sawyers in the infected pine trees. Because pine sawyers are the vector of the pinewood nematode, their movement is an important means of PWD dispersal.
4.3. Marginal Probability Distributions
Figure 4 shows the prior and posterior marginal probability distributions for the impact factors; the solid lines represent the distributions without evidence of PWD (the a priori distributions), and the dotted lines represent the distributions with evidence of PWD (the a posteriori distributions). The belief change associated with PWD evidence can be read from Figure 4 for every impact factor. PWD prevalence is highly sensitive to elevation above 30 m, with belief increasing by 88.07% for elevations between 30 m and 100 m and decreasing by 26.16% for elevations above 100 m. NDVI also has a significant influence on PWD, with belief decreasing as NDVI increases. Because pine trees with greater DBH, crown diameter and height have higher NDVI values, pine trees with better growth conditions are less prone to PWD. The belief changes associated with AS are more complex. In particular, a southern exposure increases belief by 2.13% when there is evidence of PWD, suggesting that sun exposure can enhance pine nematode survival. In fact, sun exposure and adequate light are conducive to pine sawyer breeding, resulting in higher PWD prevalence on the sunny side. The influence of slope is also interesting. The crossover point in the 6°-26° slope range in Figure 4 shows that a gentle slope (6°-16°) is beneficial to the health of pine trees and is associated with a belief decrease of 31.40%, whereas a steeper slope (16°-26°) increases belief by 24.16% when there is evidence of PWD.
Figure 4. The prior and posterior marginal probability distributions of the variables in the BN model.
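One way to read the belief changes quoted above is as the relative difference between the posterior and prior probability of a factor class once PWD = 1 is entered as evidence. The text does not define the measure explicitly, so the formula, the class boundaries and all probabilities in this sketch are assumptions chosen only to illustrate the computation.

```python
import numpy as np

# Hypothetical elevation classes and probabilities (not from the study).
el_classes = ["<22 m", "22-30 m", "30-100 m", ">100 m"]
prior = np.array([0.20, 0.20, 0.30, 0.30])            # assumed P(EL)
p_pwd_given_el = np.array([0.02, 0.05, 0.20, 0.08])   # assumed P(PWD=1 | EL)

# Bayes' rule: P(EL | PWD=1) = P(PWD=1 | EL) * P(EL) / P(PWD=1)
p_pwd = float(np.sum(p_pwd_given_el * prior))
posterior = p_pwd_given_el * prior / p_pwd

# Assumed definition of belief change: relative shift from prior to posterior.
belief_change_pct = 100.0 * (posterior - prior) / prior

for name, change in zip(el_classes, belief_change_pct):
    print(f"{name:9s} {change:+7.2f} %")
```

Under this reading, a class whose PWD rate exceeds the overall rate gains belief and a class below the overall rate loses it, which matches the qualitative pattern reported for elevation.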
5. Conclusions
In this study, a BN approach was used to model the prevalence of PWD. The results showed that EL, NDVI, DS and DR were important impact factors for explaining PWD prevalence in the study area. Based on the marginal probability distributions, PWD prevalence is highly sensitive to elevation above 30 m, with belief increasing by 88.07% for elevations between 30 m and 100 m and decreasing by 26.16% for elevations above 100 m. Future work should address data uncertainty, include additional impact factors among the independent variables and apply the new hybrid BN model.
6. Acknowledgment
This research was supported by the National Natural
Science Foundation of China (Project No. 40901233).
REFERENCES
[1] Liebhold, A.M., Macdonald, W.L., Bergdahl, D., Maestro, V.C., Invasion by exotic forest pests - a threat to forest ecosystems. Forest Science 41(2) 1-49. (1995)
[2] Waage, J.K., Reaser, J.K., A global strategy to defeat
invasive species. Science 292(5521) 1486-1486. (2001)
[3] Beckenbach, K., Smith, M.J., Webster, J.M., Taxonomic
affinities and intra-and interspecific variation in Bursa-
phelenchus spp. as determined by polymerase chain reac-
tion. Journal of Nematology 24(1) 140. (1992)
[4] Iwahori, H., Tsuda, K., Kanzaki, N., Izui, K., Futai, K.,
PCR-RFLP and sequencing analysis of ribosomal DNA
of Bursaphelenchus nematodes related to pine wilt dis-
ease. Fundamental and Applied Nematology 21(6)
655-666. (1998)
[5] Mamiya, Y., History of pine wilt disease in Japan. Journal
of Nematology 20(2) 219. (1988)
[6] Tares, S., Abad, P., Bruguier, N., de Guiran, G., Identification and evidence for relationships among geographical isolates of Bursaphelenchus spp. (pinewood nematode) using homologous DNA probes. Heredity 68(2) 157-164. (1992)
[7] Togashi, K., Shigesada, N., Spread of the pinewood ne-
matode vectored by the Japanese pine sawyer: modeling
and analytical approaches. Population Ecology 48(4)
271-283. (2006)
[8] Zhao, B.G., Pine wilt disease in China. In: Pine Wilt Disease. Springer Japan, Part I, 18-25. (2008)
[9] Pearl, J., 1988. Probabilistic reasoning in intelligent sys-
tems: networks of plausible inference. Morgan Kauf-
mann.
[10] McCann, R.K., Marcot, B.G., Ellis, R., Bayesian belief
networks: applications in ecology and natural resource
management. Canadian Journal of Forest Research 36(12)
3053-3062. (2006)
[11] Heckerman, D., A tutorial on learning with Bayesian
networks. Innovations in Bayesian Networks 33-82.
(2008)
[12] Bromley, J., Guidelines for the use of Bayesian networks
as a participatory tool for Water Resource Management.
(2005)
[13] Castelletti, A., Soncini-Sessa, R., Bayesian Networks and
participatory modelling in water resource management.
Environmental Modelling & Software 22(8) 1075-1088.
(2007)
[14] Marcot, B.G., Holthausen, R.S., Raphael, M.G., Rowland,
M.M., Wisdom, M.J., Using Bayesian belief networks to
evaluate fish and wildlife population viability under land
management alternatives from an environmental impact
statement. Forest Ecology and Management 153(1-3)
29-42. (2001)
[15] Nyberg, J.B., Marcot, B.G., Sulyma, R., Using Bayesian
belief networks in adaptive management. Canadian Jour-
nal of Forest Research 36(12) 3104-3116. (2006)
[16] Stelzenmüller, V., Lee, J., Garnacho, E., Rogers, S.I., Assessment of a Bayesian Belief Network-GIS framework as a practical tool to support marine planning. (2010)
[17] Dlamini, W.M., A Bayesian belief network analysis of
factors influencing wildfire occurrence in Swaziland. En-
vironmental Modelling & Software 25(2) 199-208. (2010)
[18] Friedman, N., Linial, M., Nachman, I., Pe'er, D., Using
Bayesian networks to analyze expression data. Journal of
Computational Biology 7(3-4) 601-620. (2000).