Open Journal of Applied Sciences, 2013, 3, 41-46
Published Online March 2013 (http://www.scirp.org/journal/ojapps)
Copyright © 2013 SciRes. OJAppS
Area-Correlated Spectral Unmixing Based on Bayesian
Nonnegative Matrix Factorization
Xiawei Chen, Jing Yu, Weidong Sun
Department of Electronic Engineering, Tsinghua University, Beijing, China
Email: hawaiix@gmail.com, yujing@tsinghua.edu.cn, wdsun@tsinghua.edu.cn
Received 2012
ABSTRACT
To solve the problem of the spatial correlation between adjacent areas in traditional spectral unmixing methods, we propose an area-correlated spectral unmixing method based on Bayesian nonnegative matrix factorization. In the proposed method, the spatial correlation between two adjacent areas is expressed by a prior probability density function, and the endmembers extracted from one of the adjacent areas are used to estimate the prior probability density functions of the endmembers in the current area, which act as a constraint in the iterative spectral unmixing process. Experimental results on both synthetic and real hyperspectral images demonstrate the effectiveness and efficiency of the proposed method, which can provide a useful tool for spatial correlation and comparison analysis between adjacent or similar areas.
Keywords: Hyperspectral Image; Spectral Unmixing; Area-Correlation; Bayesian Nonnegative Matrix Factorization
1. Introduction
Spectral unmixing is a technique to extract a collection of pure spectra (called endmembers) by decomposing the mixed pixels of a hyperspectral image and estimating the corresponding fractions (called abundances) of these endmembers [1]. Most existing spectral unmixing methods focus on the hyperspectral images themselves, and no extra information is needed in the unmixing procedure. Vertex Component Analysis (VCA) [2], the Pixel Purity Index (PPI) [3] and N-FINDR [4] are widely used methods whose basic idea is to extract the endmembers by identifying the vertices of the simplex they form. VCA and PPI project the hyperspectral data into a low-dimensional space, based on the fact that the vertices of a simplex remain vertices after projection into a low-dimensional space, while N-FINDR tries to find a fixed number of pixels that maximize the volume of the simplex formed by these pixels. Iterative Error Analysis (IEA) [5] updates every endmember by selecting the pixel with the maximum error in the image reconstructed from the latest endmember set, in an iterative process. A newer trend is to apply nonnegative matrix factorization to spectral unmixing, since all the elements of the endmember matrix and the abundance matrix are nonnegative [6, 7]. The methods mentioned above require no extra information, and therefore only a few constraints need to be considered when they are implemented.
But for comprehensive analysis, we also need to consider information beyond the hyperspectral image itself, such as spectra from a spectral library and hyperspectral images observed at a different time or in a different area. If we could bring such extra information into the current framework of spectral unmixing, spatial correlation and comparison analysis between different images could be carried out. Plaza et al. put forward an Automatic Morphological Endmember Extraction (AMEE) method using the correlation of the spatial distribution of different substances [8]. Zortea et al. proposed a pre-processing method on a pixel-neighborhood scale to enhance the accuracy of spectral unmixing [9]. Zhang focused on spectral unmixing supported by a digital geomorphology model and a geographical information system [10]. Dobigeon et al. presented a semi-supervised spectral unmixing method based on a hierarchical Bayesian model [11], in which spectra from a spectral library are used in the unmixing procedure. However, the endmembers of hyperspectral images observed at different times or in different areas may show not only strong temporal or spatial correlation but also certain differences; the problem is how to use these kinds of correlations in the spectral unmixing procedure.
In this paper, we concentrate only on the spatial correlation between hyperspectral images observed in adjacent or similar areas. Our method is based on Bayesian nonnegative matrix factorization [12, 13], and the spatial correlation between hyperspectral images observed in adjacent or similar areas is used, in the form of a prior probability density function, as a constraint on the spectral unmixing procedure. The remainder of this paper is organized as follows: Section 2 gives a detailed description of the proposed method; Section 3 discusses the experimental results on both synthetic and real hyperspectral data; Section 4 draws the conclusions.
2. Spatial Correlation Method
2.1. Area-Correlated Spectral Unmixing
A hyperspectral image $X$ can be written as $X = SA + E$, where $S = (s_1, s_2, \cdots, s_P) \in \mathbb{R}_+^{L \times P}$ is the endmember matrix with the $k$th endmember represented by $s_k$, $A = (a_1, a_2, \cdots, a_P)^T \in \mathbb{R}_+^{P \times N}$ is the abundance matrix in which $a_k$ corresponds to the $k$th endmember, and $E = (E_{i,j})_{L \times N} \in \mathbb{R}^{L \times N}$ is the noise matrix, with $P$ the number of endmembers and $N$ the number of mixed pixels. The noise $E_{i,j}$ is assumed to be i.i.d. white Gaussian noise with zero mean and variance $\sigma^2$. The objective of spectral unmixing is to find the proper $S$ and $A$ that minimize $\|X - SA\|_F^2$ with the Frobenius norm $\|\cdot\|_F$ (or another norm).
The basic idea of area-correlated spectral unmixing is to use the endmembers common to adjacent or similar areas as an aid in the spectral unmixing procedure. On one hand, they can be used as initial values after proper pre-processing with the current hyperspectral image; on the other hand, we bring them into the iterative spectral unmixing procedure as a priori knowledge.
The whole procedure of our area-correlated spectral unmixing method consists of three steps: pre-processing, Bayesian nonnegative matrix factorization and post-processing.
In pre-processing, for each endmember extracted from the adjacent area, we search for similar pixels in the current hyperspectral image. If the number of such similar pixels is too small, we abandon this endmember; otherwise, we calculate the average spectrum of these pixels and put it into the initial endmember matrix. We fill the rest of the initial endmember matrix with random pixels selected from the current hyperspectral image, and the abundance matrix is also initialized randomly.
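The pre-processing step above can be sketched as follows. This is only an illustration: the function name, the similarity threshold `tau` and the minimum pixel count `min_pixels` are our own assumptions, not values given in the paper.

```python
import numpy as np

def init_endmembers(X, S_adj, P, tau=0.95, min_pixels=20, rng=None):
    """Initialize the endmember matrix for the current image X (L x N)
    from endmembers S_adj (L x K) extracted in an adjacent area.

    For each adjacent-area endmember, pixels of X whose spectral
    correlation with it exceeds `tau` are collected; if enough are
    found, their average spectrum becomes an initial endmember.
    Remaining columns are filled with random pixels of X."""
    rng = np.random.default_rng(rng)
    L, N = X.shape
    init = []
    for k in range(S_adj.shape[1]):
        s = S_adj[:, k]
        # correlation coefficient between s and every pixel spectrum of X
        Xc = X - X.mean(axis=0)
        sc = s - s.mean()
        corr = sc @ Xc / (np.linalg.norm(sc) * np.linalg.norm(Xc, axis=0) + 1e-12)
        similar = np.where(corr > tau)[0]
        if similar.size >= min_pixels:   # keep only well-supported endmembers
            init.append(X[:, similar].mean(axis=1))
    # fill the rest of the matrix with randomly selected pixels
    while len(init) < P:
        init.append(X[:, rng.integers(N)])
    S0 = np.column_stack(init[:P])
    A0 = rng.random((P, N))              # abundances initialized randomly
    return S0, A0
```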
The similarity of two spectra is measured by the Spectral Correlation-coefficient Distance (SCD). The SCD of spectra $x$ and $y$ is defined by Equation (1):

$$\mathrm{SCD}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\, \sqrt{\sum_i (y_i - \bar{y})^2}}, \qquad (1)$$

where $\bar{x} = \frac{1}{L}\sum_j x_j$, $\bar{y} = \frac{1}{L}\sum_j y_j$ and $L$ is the number of bands.
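Equation (1) is straightforward to compute; a minimal numpy version might look like:

```python
import numpy as np

def scd(x, y):
    """Spectral Correlation-coefficient Distance of Equation (1):
    the Pearson correlation between two spectra x and y over L bands."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / (np.sqrt(xc @ xc) * np.sqrt(yc @ yc))
```

Two proportional spectra give an SCD of 1, and anti-correlated spectra give -1.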
Besides, we also need to estimate the number of endmembers. If the estimated number is less than the real one, it is evident that some endmembers cannot be extracted. To avoid this situation, our strategy is to increase the estimated number of endmembers.
After pre-processing, Bayesian nonnegative matrix factorization with the spatial correlation constraint, as described in Section 2.2, is applied to the current hyperspectral data to estimate the endmembers and their abundances. When the spectral unmixing procedure is finished, the extracted endmembers are clustered and the abundances are merged within every cluster in the post-processing stage.
2.2. Bayesian Nonnegative Matrix Factorization
Assuming that $S$, $A$ and $\sigma^2$ are statistically independent, Bayes' rule gives

$$p(S, A, \sigma^2 \mid X) = \frac{p(X \mid S, A, \sigma^2)\, p(S, A, \sigma^2)}{p(X)} = \frac{p(X \mid S, A, \sigma^2)\, p(S)\, p(A)\, p(\sigma^2)}{p(X)} \propto p(X \mid S, A, \sigma^2)\, p(S)\, p(A)\, p(\sigma^2). \qquad (2)$$
One possible way is to find $S$ and $A$ by maximizing Equation (2). But, due to the complexity of Equation (2), the optimization process would be computationally expensive. Instead, in Bayesian nonnegative matrix factorization, Gibbs sampling [14], a sampling method based on Markov Chain Monte Carlo (MCMC), is applied to estimate $S$, $A$ and $\sigma^2$.
Gibbs sampling estimates the model parameters by sampling from the posterior probability density functions of these parameters. These samples converge to samples taken from the joint posterior probability density function of the model parameters; Reference [14] gives the proof of the convergence of Gibbs sampling. The procedure of Gibbs sampling can be described as follows: in every round of sampling, we sample each parameter in turn; when we sample one of the parameters, we fix the rest and sample from the posterior probability density function of the current parameter; the sample is then used to update the posterior probability density function of the next parameter. Therefore, the posterior probability density functions of $S$, $A$ and $\sigma^2$ are required.
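The sampling loop just described can be sketched as follows. The three per-block conditional samplers are passed in as placeholder callables (their bodies would implement the posterior conditionals of Equations (8), (11) and (4)); the function name and signature are our own.

```python
import numpy as np

def gibbs_unmix(X, S0, A0, n_rounds, sample_s, sample_a, sample_var):
    """Skeleton of the Gibbs sampler described above: in every round each
    parameter block (one endmember column, one abundance row, the noise
    variance) is sampled from its posterior conditional on all the others."""
    S, A = S0.copy(), A0.copy()
    var = 1.0
    P = S.shape[1]
    for _ in range(n_rounds):
        for k in range(P):
            S[:, k] = sample_s(X, S, A, var, k)   # draw s_k given the rest
        for k in range(P):
            A[k, :] = sample_a(X, S, A, var, k)   # draw a_k given the rest
        var = sample_var(X, S, A)                 # draw sigma^2 given the rest
    return S, A, var
```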
$E_{i,j}$ is assumed to be i.i.d. white Gaussian noise, that is,

$$p(X \mid S, A, \sigma^2) = \prod_{i,j} \mathcal{N}\!\left(X_{i,j};\, \hat{X}_{i,j},\, \sigma^2\right), \qquad (3)$$

where $\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ and $\hat{X} = SA$.
An inverse Gamma distribution is chosen for the noise variance $\sigma^2$, and the posterior probability density function of the noise variance is given by
$$p(\sigma^2 \mid X, S, A) = \mathcal{G}^{-1}(\sigma^2; k, \theta) = \frac{\theta^{k}}{\Gamma(k)}\, (\sigma^2)^{-k-1}\, e^{-\frac{\theta}{\sigma^2}}, \qquad (4)$$

where $k = \frac{LN}{2} + k_0$ and $\theta = \frac{1}{2}\sum_{i,j} E_{i,j}^2 + \theta_0$.
According to Bayes' rule, the posterior probability density function of $s_k$ is

$$p(s_k \mid X, S_k, A, \sigma^2) \propto \mathcal{N}(s_k;\, \mu_{s_k},\, \Sigma_{s_k})\, p(s_k), \qquad (5)$$

where $S_k$ is the remaining matrix after deleting the $k$th column of the endmember matrix $S$,

$$\mathcal{N}(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^L \det(\Sigma)}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$

is the $L$-dimensional Gaussian distribution function, and $\mu_{s_k}$ and $\Sigma_{s_k}$ are the mean vector and the covariance matrix of $s_k$, estimated as

$$\mu_{s_k} = \left(\mu_{S_{1,k}}, \mu_{S_{2,k}}, \cdots, \mu_{S_{L,k}}\right), \qquad
\mu_{S_{i,k}} = \frac{\sum_j \left(X_{i,j} - \sum_{m \neq k} S_{i,m} A_{m,j}\right) A_{k,j}}{\sum_j A_{k,j}^2},$$

$$\Sigma_{s_k} = \mathrm{diag}\left\{\sigma^2_{S_{1,k}}, \sigma^2_{S_{2,k}}, \cdots, \sigma^2_{S_{L,k}}\right\}, \qquad
\sigma^2_{S_{i,k}} = \frac{\sigma^2}{\sum_j A_{k,j}^2}, \quad i = 1, 2, \cdots, L. \qquad (6)$$
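The conditional parameters of Equation (6) can be computed in a few lines of numpy; the function name below is our own. The residual with the $k$th endmember's contribution removed gives the numerator in one matrix product.

```python
import numpy as np

def endmember_conditional(X, S, A, var, k):
    """Mean vector and common diagonal variance of Equation (6) for the
    k-th endmember, conditioned on the other endmembers and abundances."""
    # residual with the k-th endmember's contribution added back:
    # R = X - sum_{m != k} s_m a_m^T
    R = X - S @ A + np.outer(S[:, k], A[k, :])
    denom = np.sum(A[k, :] ** 2)
    mu = R @ A[k, :] / denom      # mu_{S_{i,k}} of Equation (6)
    sigma2 = var / denom          # sigma^2_{S_{i,k}}, identical for all bands i
    return mu, sigma2
```

In the noiseless case, when X = SA exactly, the conditional mean recovers the true column $s_k$.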
The posterior probability density functions of the endmembers are thus multi-dimensional Gaussian distribution functions weighted by their prior probability density functions. The next step is to choose proper prior probability density functions for the endmembers. We adopt the following distribution as the prior probability density function of the endmember:

$$p(s_k) \propto \begin{cases} 1 - \lambda\, e^{-\left|v_k^T s_k\right|} & 0 \le s_k \le 1 \\ 0 & \text{otherwise} \end{cases}, \qquad (7)$$

where the vector $v_k$ is orthogonal to all the column vectors of $S_k$, $0$ and $1$ represent the all-zero and all-one $L$-dimensional vectors respectively, and $\lambda$ is a tuning parameter. One reason we form the prior probability density function of the endmembers in this way is that, given a linear combination $c$ of the column vectors of $S_k$, we have $v_k^T c = 0$, and therefore the corresponding density value is suppressed (it vanishes as $\lambda \to 1$), which reduces the possibility of extracting a spectrum mixed from the existing endmembers. Another reason is that, when projected onto $v_k$, endmembers tend to have larger values of $\left|v_k^T s_k\right|$, which leads to larger probability density values. By substituting the prior probability density function of Equation (7), the posterior probability density function of the endmember becomes

$$p(s_k \mid X, S_k, A, \sigma^2) \propto \begin{cases} \mathcal{N}(s_k;\, \mu_{s_k},\, \Sigma_{s_k})\left(1 - \lambda\, e^{-\left|v_k^T s_k\right|}\right) & 0 \le s_k \le 1 \\ 0 & \text{otherwise} \end{cases}. \qquad (8)$$
As to the abundances, similarly, the posterior probability density function of the abundance $a_k$ is

$$p(a_k \mid X, S, A_k, \sigma^2) \propto \mathcal{N}(a_k;\, \mu_{a_k},\, \Sigma_{a_k})\, p(a_k), \qquad (9)$$
where $A_k$ represents the remaining matrix after deleting the $k$th row of $A$, with $\mu_{a_k}$ and $\Sigma_{a_k}$ estimated as

$$\mu_{a_k} = \left(\mu_{A_{k,1}}, \mu_{A_{k,2}}, \cdots, \mu_{A_{k,N}}\right), \qquad
\mu_{A_{k,j}} = \frac{\sum_i S_{i,k} \left(X_{i,j} - \sum_{n \neq k} S_{i,n} A_{n,j}\right)}{\sum_i S_{i,k}^2},$$

$$\Sigma_{a_k} = \mathrm{diag}\left\{\sigma^2_{A_{k,1}}, \sigma^2_{A_{k,2}}, \cdots, \sigma^2_{A_{k,N}}\right\}, \qquad
\sigma^2_{A_{k,j}} = \frac{\sigma^2}{\sum_i S_{i,k}^2}, \quad j = 1, 2, \cdots, N. \qquad (10)$$
Since the $k$th endmember is assumed to be distributed randomly over the observed area, the prior probability density function of the abundances is modeled as a uniform distribution. Thus, the posterior probability density function of the abundance $a_k$ becomes

$$p(a_k \mid X, S, A_k, \sigma^2) \propto \begin{cases} \mathcal{N}(a_k;\, \mu_{a_k},\, \Sigma_{a_k}) & 0 \le a_k \le 1 \\ 0 & \text{otherwise} \end{cases}. \qquad (11)$$
When estimating the model parameters mentioned above, Slice Sampling [15] is used to sample from the posterior probability density functions of these parameters.
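A minimal univariate slice-sampling update in the style of [15] (stepping-out and shrinkage), as one might use for each scalar draw, is sketched below; only the unnormalized log-density is required.

```python
import numpy as np

def slice_sample(logp, x0, w=1.0, max_steps=50, rng=None):
    """One univariate slice-sampling update: draw an auxiliary height
    under the density at x0, step out an interval containing the slice,
    then shrink until a point inside the slice is found."""
    rng = np.random.default_rng(rng)
    # auxiliary "height" under the density at x0
    log_y = logp(x0) + np.log(rng.random())
    # step out an interval [lo, hi] containing the slice
    lo = x0 - w * rng.random()
    hi = lo + w
    for _ in range(max_steps):
        if logp(lo) <= log_y:
            break
        lo -= w
    for _ in range(max_steps):
        if logp(hi) <= log_y:
            break
        hi += w
    # shrink toward x0 until an accepted point is found
    while True:
        x1 = lo + (hi - lo) * rng.random()
        if logp(x1) > log_y:
            return x1
        if x1 < x0:
            lo = x1
        else:
            hi = x1
```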
3. Experiments and Discussions
3.1. Experiments on Synthetic Data
Ten spectra are selected from the spectral library of the United States Geological Survey (USGS) for generating synthetic hyperspectral images. For each pixel in a synthetic hyperspectral image, we first pick several of the ten spectra, then generate a set of abundances that sum to one, and finally mix the spectra linearly according to these abundances.
Two synthetic hyperspectral images are generated to simulate the hyperspectral images of two adjacent areas. Six spectra are used to generate the first image and eight the second, with four spectra shared between them. White Gaussian noise is added to each image, with an SNR of approximately 30 dB. First, the first image is processed with the Vertex Component Analysis (VCA) method to extract its endmembers, which are used for initialization.
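The synthetic-data generation just described can be sketched as follows; the function and parameter names are our own, and the per-pixel number of active spectra `n_active` is an assumption.

```python
import numpy as np

def make_synthetic(spectra, n_pixels, n_active=3, snr_db=30, rng=None):
    """Generate one synthetic hyperspectral image: each pixel linearly
    mixes a few library spectra with abundances that sum to one, and
    white Gaussian noise is added at roughly the given SNR.
    `spectra` is an L x P matrix of library signatures."""
    rng = np.random.default_rng(rng)
    L, P = spectra.shape
    A = np.zeros((P, n_pixels))
    for j in range(n_pixels):
        picked = rng.choice(P, size=n_active, replace=False)
        a = rng.random(n_active)
        A[picked, j] = a / a.sum()           # abundances sum to one
    X = spectra @ A                          # linear mixing model
    signal_power = np.mean(X ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    X += rng.normal(0.0, np.sqrt(noise_power), X.shape)
    return X, A
```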
Figure 1 shows the result, where the red curves represent the real spectra and the blue curves the endmembers extracted by our method. Figure 1(a) is the final result, in which the first row corresponds to the four initial endmembers from pre-processing and the second row shows the new endmembers obtained from the spectral unmixing procedure. Table 1 lists the Spectral Correlation-coefficient Distances and the Spectral Angle Distances (SAD) between the real spectra and the extracted endmembers. Figure 1(b) plots the error between the original image and the reconstructed image, where $\mathrm{error} = \|X - SA\|_F^2$ with the Frobenius norm $\|\cdot\|_F$. As we can see from Figure 1(a) and Table 1, the extracted endmembers fit the real spectra well. The error between the original image and the reconstructed image decreases sharply as the sampling proceeds, which indicates that our method is able to estimate the endmember matrix and the abundance matrix correctly.
Arngren et al. give another form of prior probability density function, with a volume constraint [16]: the volume of the simplex formed by the endmembers is used as a constraint in the prior probability density function of the endmember matrix. Figure 2(a) shows the results using the two different prior probability density functions, where the red curves are again the real spectra, the green curves are the results under the volume constraint and the blue curves are the results of our method. The results of our method fit the real spectra better. Figure 2(b) shows the error comparison between the two methods. At the beginning, the errors of both methods decrease at almost the same pace; as sampling goes on, the error of our method drops faster than that of the method with the volume constraint.
3.2. Experiments on Real Data
Two adjacent square areas of size 100×100 are cropped from the AVIRIS data. The VD method [17] and the Hysime method [18] are used to estimate the number of endmembers of both images. Table 2 gives the estimates of both methods, showing that the Hysime method gives a larger estimated number.
Table 1. SCD and SAD between the real spectra and the extracted endmembers shared by both images.

Endmember    SCD         SAD
1            0.950335    7.303504
2            0.973790    8.976964
3            0.977021    7.848447
4            0.997171    5.941172

Table 2. Estimated numbers of endmembers in both images using the VD and Hysime methods.

           VD method    Hysime method
Image 1    3            16
Image 2    4            18
According to [17], the VD method gives a relatively better estimate than other methods when applied to the AVIRIS data. Figure 3(a) illustrates the extracted endmembers and the corresponding abundances. Five spectra remain after clustering, which is consistent with the result of the VD method. Figure 3(b) plots the error between the original image and the reconstructed image, showing that the error declines rapidly in the first few rounds of iteration and approaches zero in the end.
To avoid missing endmembers, we increase the estimated number of endmembers. But this strategy has two disadvantages: some endmembers may be extracted repeatedly, and mixed spectra may be extracted. The solution to the first problem is clustering. As to the second problem, one possible solution is to consider the abundances while clustering: if a mixed spectrum is extracted, its abundance is most likely small, which can be used as a clue to discard poorly extracted endmembers.
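A rough sketch of this post-processing heuristic is given below. The greedy correlation-based merge and both threshold values are illustrative assumptions; the paper does not specify the clustering algorithm.

```python
import numpy as np

def prune_endmembers(S, A, scd_thresh=0.99, abd_thresh=0.05):
    """Post-processing sketch: merge near-duplicate endmembers (greedy,
    by spectral correlation) and discard endmembers whose mean abundance
    is negligible, since those are likely spurious mixed spectra."""
    keep_S, keep_A = [], []
    for k in range(S.shape[1]):
        if A[k].mean() < abd_thresh:         # likely a spurious mixed spectrum
            continue
        s = S[:, k]
        merged = False
        for i, t in enumerate(keep_S):
            sc, tc = s - s.mean(), t - t.mean()
            corr = sc @ tc / (np.linalg.norm(sc) * np.linalg.norm(tc))
            if corr > scd_thresh:            # same cluster: merge abundances
                keep_A[i] = keep_A[i] + A[k]
                merged = True
                break
        if not merged:
            keep_S.append(s)
            keep_A.append(A[k].copy())
    return np.column_stack(keep_S), np.vstack(keep_A)
```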
Furthermore, our method relies on the existence of common endmembers between adjacent areas, which means it should be applied only in certain situations. For instance, a large hyperspectral image that cannot be processed in one pass can be split into several smaller pieces, and our method then applied to the small hyperspectral images. Another application is to areas that are not geographically adjacent yet share some common ground surface types.
4. Conclusions
In this paper, we propose an area-correlated spectral unmixing method based on Bayesian nonnegative matrix factorization. Common endmembers within adjacent areas are used to assist the spectral unmixing procedure. Experimental results show that these endmembers can serve as a kind of prior knowledge in the procedure and improve the results. The proposed method can provide a useful tool for spatial correlation and comparison analysis between adjacent or similar areas. Future research will focus on investigating other correlations between the hyperspectral images of adjacent areas, optimizing the sampling process, and using these correlations more efficiently.
Figure 1. Experimental results on the synthetic data: (a) comparison of the real spectra and the extracted endmembers; (b) error between the original image and the reconstructed image.
Figure 2. Comparison of the two different prior probability density functions: (a) extracted endmembers of the two methods; (b) errors of the two methods.
Figure 3. Experimental results on the real data: (a) endmembers extracted from the AVIRIS data with their abundances; (b) error between the original image and the reconstructed image.
5. Acknowledgements
This work is supported by the National Science Foundation (No. 61171117) and the National Science & Technology Pillar Program (No. 2012BAH31B01) of China.
REFERENCES
[1] N. Keshava and J. F. Mustard, “Spectral Unmixing,”
Signal Processing Magazine, IEEE, 2002. 19(1): p.
44-57. doi:10.1109/79.974727
[2] J. Nascimento and J. Dias, “Vertex Component Analysis:
A Fast Algorithm to Unmix Hyperspectral Data,” IEEE
Transactions on Geoscience and Remote Sensing, 2005.
43(4): p. 898-910. doi:10.1109/TGRS.2005.844293
[3] J. W. Boardman, F. A. Kruse and R. O. Green, “Mapping
Target Signatures via Partial Unmixing of AVIRIS Data,”
1995: Pasadena, CA.
[4] M. E. Winter, “N-FINDR: An Algorithm for Fast Autonomous Spectral End-member Determination in Hyperspectral Data,” Proceedings of SPIE, 1999. p. 266-275. doi:10.1117/12.366289
[5] L. Sun, Y. Zhang and B. Guindon, “Improved Iterative
Error Analysis for Endmember Extraction from Hyper-
spectral Imagery,” Proceedings of SPIE,
2008. doi:10.1117/12.799232
[6] D. D. Lee and H. S. Seung, “Algorithms for Non-negative
Matrix Factorization,” Advances in Neural Information
Processing Systems, 2001: p. 556-562.
[7] V. P. Pauca, J. Piper and R. J. Plemmons, “Nonnegative
Matrix Factorization for Spectral Data Analysis,” Linear
Algebra and Its Applications, 2006. 416(1SI): p.
29-47. doi:10.1016/j.laa.2005.06.025
[8] A. Plaza, P. Martinez, R. Perez and J. Plaza, “Spatial/Spectral Endmember Extraction by Multidimensional Morphological Operations,” IEEE Transactions on Geoscience and Remote Sensing, 2002. 40(9): p. 2025-2041. doi:10.1109/TGRS.2002.802494
[9] M. Zortea and A. Plaza, “Spatial Preprocessing for End-
member Extraction,” IEEE Transactions on Geoscience
and Remote Sensing, 2009. 47(8): p.
2679-2693. doi:10.1109/TGRS.2009.2014945
[10] B. Zhang, “Hyperspectral Data Mining Supported by Temporal and Spatial Information,” Ph.D. Thesis, Institute of Remote Sensing Applications, Chinese Academy of Sciences, 2002.
[11] N. Dobigeon, J.-Y. Tourneret and C.-I. Chang,
“Semi-supervised Linear Spectral Unmixing Using a
Hierarchical Bayesian Model for Hyperspectral Imagery,”
IEEE Transactions on Signal Processing, 2008. 56(7): p.
2684-2695. doi:10.1109/TSP.2008.917851
[12] M. N. Schmidt, “Linearly Constrained Bayesian Matrix
Factorization for Blind Source Separation,” Advances in
Neural Information Processing Systems, 2009.22: p.
1624-1632.
[13] M. N. Schmidt, O. Winther and L. K. Hansen, “Bayesian
Non-negative Matrix Factorization,” Independent Com-
ponent Analysis and Signal Separation, 2009. p. 540-547.
[14] G. Casella and E. I. George, “Explaining The Gibbs
Sampler,” American Statistician, 1992. 46(3): p.
167-174. doi:10.1080/00031305.1992.10475878
[15] R. M. Neal, “Slice Sampling,” Annals of Statistics, 2003.
31(3): p. 705-767. doi:10.1214/aos/1056562461
[16] M. Arngren, M. N. Schmidt and J. Larsen, “Bayesian Nonnegative Matrix Factorization with Volume Prior for Unmixing of Hyperspectral Images,” Proceedings of the 2009 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009), 2009, 6 pp. doi:10.1109/MLSP.2009.5306262
[17] C. I. Chang and Q. Du, “Estimation of Number of Spec-
trally Distinct Signal Sources in Hyperspectral Imagery,”
IEEE Transactions on Geoscience and Remote Sensing,
2004. 42(3): p. 608-619. doi:10.1109/TGRS.2003.819189
[18] J. M. Bioucas-Dias and J. M. P. Nascimento, “Hyper-
spectral Subspace Identification,” IEEE Transactions on
Geoscience and Remote Sensing, 2008. 46(8): p.
2435-2445. doi:10.1109/TGRS.2008.918089