Journal of Signal and Information Processing, 2011, 2, 184-189
doi:10.4236/jsip.2011.23025 Published Online August 2011 (http://www.SciRP.org/journal/jsip)
Copyright © 2011 SciRes. JSIP
Improving Mutual Coherence with Non-Uniform
Discretization of Orthogonal Function for Image
Denoising Application
Hani Nozari1, Alireza Siamy2
1Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran; 2Department of Electrical Engi-
neering, Mazandaran University, Babol, Iran.
Email: hani_nozari@elec.iust.ac.ir
Received April 27th, 2011; revised May 20th, 2011; accepted May 27th, 2011.
ABSTRACT
This paper presented a novel method on designing redundant dictionary from known orthogonal functions. Usual way
of discretization of continuous functions is uniform sampling. Our experiments show that dividing the function defini-
tion interval with non-uniform measure makes the redundant dictionary sparser and it is suitable for image denoising
via sparse and redundant dictionary. In this case the problem is to find an appropriate measure in order to make each
atom of dictionary. It has shown that in sparse approximation context, incoherent dictionary is suitable for sparse ap-
proximation method . According to th is fact we defin e some op timization pro blems to find the best parameter of distribu-
tion measure (in our study normal distribution). For better convergence to optimum point we used Genetic Algorithm
(GA) with enough diversity on initial population. We show the effect of this type of dictionary design on exact sparse
recovery support. Our results also show the advantage of this design method on image denoising task.
Keywords: Grassmaniann Frames, Normal Distribution, Mutual Coherence, Genet i c Algori thm, Image Denoising
1. Introduction
Since the sparse representation of signals has leaded to
improvements in many applications such as coding, de-
noising, feature extraction and etc. in signal and image
processing, there has been a growing interest in the study
of this method in recent years. Sparse and redundant re-
presentation modeling of data assumes an ability to de-
scribe signal as linear combinations of a few atoms from
a pre-specified dictionary [1], where the linear coeffi-
cients are sparse. In this case we use a matrix
, as a dictionary, either to compactly
express or efficiently approximate a signal. Let
and
:
dN
dN
Dd
Nd
y
N
x be the given signal and the coefficient vec-
tor respectively. We can express sparse equation as

0, 0
2
:min
x
Px
subject toyDx

(1)
where 0
been proposed in order to reach an approximate solution
and all of them can be categorized as greedy algorithms
like matching pursuit (MP) [1] and its derivations [3],
and relaxation method, like basis pursuit (BP) [4]. Image
denoising via sparse and redundant representation uses
this fact. Suppose Z is zero-mean white Gaussian noise
with known standard deviation
, and X be a cleaned
image. Y = X + Z would be noisy image. Image denoising
via redundant dictionary is a kind of energy minimization
[5] and it can be expressed by following functional ex-
pression

2
2
22
0
,,
ij ij
ij i
ij
j
jiXDXY Rf

 
X
(2)
The first term of above expression says the difference
between measured image Y and its denoised version X.
second and third part of this expression represent sparse
approximation. Choosing a dictionary D which have to
be an overcomplete dictionary is one of the most impor-
tant fundamental choices that must be considered [6]. An
overcomplete dictionary D that leads to sparse represen-
x
is the norm, counting the nonzero en-
tries of a vector. With
0
l
0
, we have sparse representa-
tion which actually is a NP-hard problem and cannot be
solved in a reasonable time [2]. Many algorithms have
Improving Mutual Coherence with Non-Uniform Discretization of Orthogonal Function for 185
Image Denoising Application
tations can either be chosen as a pre-specified set of
functions (dictionary of this type include wavelets [7],
Curvelets [8], Contourlets [9] and bandelets [10], among
others.) or designed by adapt its content to fit a given set
of sample signals such as method of optimal direction
(MOD) [11], the K-SVD [12] and others. Using a
pre-specified dictionary leads to creating simple and fast
algorithms for the evaluations of the sparse representa-
tion. These types of dictionaries are called non-adaptive
dictionaries. Adaptive dictionary is time consuming for
sparse approximation so working with pre-specified
functions still have its own interest.
In this paper we used different ways to signals atom
from pre-specified functions. Each column of the dic-
tionary is selected from a set of orthogonal functions in a
specific dimension. In order to make a dictionary of size
we should sample d times from each N atoms.
The main question is how one can sample the function
definition interval in order to achieve a sparser dictionary.
To divide the interval of function definition in order to
select each sample of function formed with normal dis-
tribution
dN
,
, we used some optimization tech-
niques to find the best parameter of normal distribution.
For confidence in convergence we implement our opti-
mization task with Genetic Algorithm (GA). As describe
later our objective function has several extremum so
enough diversity on initial population decrease the possi-
bility of trapping in local minimum. With this discretiza-
tion, the result dictionary has smaller mutual coherence
than uniform one as a result of this fact better Peak-Sig-
nal-to-Noise-Ratio (PSNR) achieve in image processing.
This type of dictionary design is also improved the re-
quired time for image denoising.
2. Tight Frames
The concept of a frame is a generalized concept of an
orthonormal basis. Each vector in the space can be rep-
resented as the sum of the elements in the frame, but it is
not necessarily unique. The frame theory gives energy
equivalence conditions to solve both synthesis and analy-
sis problems with stable operators. A family
p
is a
frame of the Hilbert space . If then
we can have [7]:
d
V0BA
2
2
Λ
,,
p
m
hVAhh Bh
 
2
(3)
where the usual Hermitian inner product is denoted with
.,. , and . is written for the associated norm. when A =
B we will have tight frames. We can achieve the unit
norm tight frames when 1
p
. With concatenating N
frames of we have overcomplete dictionary. In fol-
lowing part of this section we explain the process of de-
signing overcomplete dictionary from a specific family
of frames.
d
2.1. Overcomplete Dictionary
Suppose
12
,
N

dN
D
are a set of N orthogonal con-
tinuous functions in the [a,b] interval. In order to make N
vectors of size “d” (d < N), we should sample these func-
tions d times on its definition interval. The outcome of
this process is a :
dN d
N
matrix called over-
complete dictionary that is used in sparse representations.
Which distribution we should use to sample these frames?
The answer of this question completely depends on the
type of orthonormal frames, but experiments show that
normal distribution is suitable for majority of orthonor-
mal functions. Normal distribution
,
has two
parameters; mean
and standard deviation
. Dif-
ferent parameters of this distribution define different type
of discretization so we have distinct dictionary for each
pair of parameters. In this paper we want to discuss dif-
ferent methods in order to find the best parameters of
normal distribution, after that we define some optimiza-
tion problem to find these parameters. At first we men-
tion some definitions for dictionary selection in sparse
representations. A substantial parameter in dictionary
selection for sparse representation is mutual coherence.
The mutual coherence of a dictionary D, , express
the coherency between atoms of dictionary. This pa-
rameter defined by inner product between each distinct
normalized atoms of dictionary D [13]
D

||
max
T
ij
ij TT
ijij
aa
Daaaa
(4)
Minimum
D
is suitable for sparse representation.
For an orthogonal matrix D, we have . But for
a redundant dictionary mutual coherence is greater than
zero because of its redundancy in known dimension. For
an overcomplete dictionary (N > d) it has shown that
there exist a lower bound. It has been shown in [14] that
for all full rank matrix of size

0D
dN

1
Gkn
nk

(5)
G
is the mutual coherence of Grassmannian frames
which a dictionary with minimum coherence [14]. In
order to have sparse representation, finding a dictionary
with minimum of
D
is the competing area [15],
[16]. As mention in previous section different sampling
of orthogonal set make different redundant dictionary so
with suitable tuning in sampling process we can hope to
find a dictionary that has minimum distance to Grass-
mannian frames. So we need an objective function to
Copyright © 2011 SciRes. JSIP
Improving Mutual Coherence with Non-Uniform Discretization of Orthogonal Function for
186
Image Denoising Application
minimize this distance. Following part represent con-
struction of our objective function.
2.2. Optimization Problem
Let
12
,,,
N
D

.
T
DDD
be a dictionary in , we de-
fine as gram matrix of D. Each element of
this matrix represents the coherency of each two atoms
that obtain with an inner product between corresponding
atoms. Suppose G
be a square matrix in which the
absolute value each of off diagonal elements are
dN
and
the diagonal element is 1 (Gram matrix of Grassmannian
frames). In order to make a dictionary, which have min-
imum distance in frobenious norm to Grassmannian
frames, we can define our objective function as a dis-
tance minimization of
D
and
G
ٛ
:DG
F
OBJ  (6)
.
F
stands for frobenious norm;
D
and G is
Gram matrix of dictionary D and Grassmannian frames
respectively
.
F
stands for frobenious norm. This is a
big objective function to minimization, so we need to
break down this function in order to achieve better opti-
mization problem. In other words, we want to link this
problem to sampling process of orthogonal set. The main
ongoing question is how to sample orthogonal functions.
Suppose

k
, be a set of N orthogonal func-
tions in the [a,b] interval. To divide this interval to d
sample we use methods that are explained as
1kN
,1
00 1
,,1,2,
kkk
ktttd
 (7)
where , and the values of
1
0
d
k
k
ba

k
are de-
termined by normal distribution
,,t
. As it has
been explained before, each element of
D

is an inner
product of corresponding atoms
(
 

,, ,,jti

 ,,t
ijD). We can
rewrite the objective function as below:

 

12
2
,,
DGD G
Fij
i
,
F
ujij
s





 (8)
So with this assumption the objective function only
depends on ,
. So we can use any optimization algo-
rithm to minimize this objective function [17]. The value
of
,
F
us or distance between Gram matrix of de-
signed dictionary and Grassmannian frames implicitly
depend on orthogonal function, so different set of or-
thogonal function has different value for

,
F
us.
2.3. Solving the Problem: GA
Objective function that defined in (7) is a non-convex
problem. We might be in trouble if we want to solve this
problem with gradient methods; because these methods
need an initial intelligent guess on initial point for confi-
dence in real convergence and not trapped in local min-
ima. GA is a method that we should use to conquest to
this problem. The main part of GA is its initial popula-
tion that should have enough diversity. Our experiments
show that uniform distribution on
and
is suit-
able for finding absolute minimum. This algorithm re-
peatedly modifies a population of individual solutions.
At each step, it randomly selects individuals from the
current population to be parents and uses them to pro-
duce children for the next generation [18]. Number of
variables in optimization process is two. In the next sec-
tion we implement our algorithm on trigonometric func-
tion as an orthogonal set.
3. Implementation and Results
This section illustrated our simulation results. As men-
tion above we can apply this algorithm on any set of or-
thogonal function. We implement our algorithm on Si-
nusoid functions
sin nx as a specific set of or-
thogonal function. At first we perform our optimization
process and find the best parameters of normal distribu-
tion and compare the distance to Grassmannian frames
with uniform version of this function. Second part of this
section demonstrated the ability of this kind of dictionary
design on sparse coding. The last part shows the effect of
normal sampling type dictionary design on image de-
noising.
3.1. Design Dictionary
sin nx is an orthogonal set in
0, π
. We want to
compare uniform and normal distributions in order to
make overcomplete dictionary in sparse representations.
To optimize the (8) function we used “ga” command of
Matlab software. We implement our algorithm on several
lengths of dictionary’s atom and different redundancy
factor and plotted the distance of Gram matrix of each
dictionary from Grassmannian frames in Figure 1. As it
is obvious, the dictionary that has been designed with
normal distribution is closer to Grassmannian frames
than the uniform one. Optimize parameters for normal
distribution, are listed in Table 1 for different atom
length and dictionary redundancy.
3.2. Sparse Coding
This part demonstrated the ability of sampling with nor-
mal distribution in terms of exact sparse recovery. In this
experiment we use the method that presented in [19] and
plotted the probability of success in sparse coding versus
the cardinality of solution and our result be depicted in
Figure 1. As it clear from the Figure 2, normal type dis-
ribution has better performance. t
Copyright © 2011 SciRes. JSIP
Improving Mutual Coherence with Non-Uniform Discretization of Orthogonal Function for
Image Denoising Application
Copyright © 2011 SciRes. JSIP
187
22.5 33.5 44.5 5
4
6
8
10
12
14
16
18
20
22
Redundancy
Distance Fr om Grassmannian Frames
22.5 33.5 44.5 5
10
15
20
25
30
35
40
45
50
Redundancy
Distance From Gras smannian frames
Uniform DistributionUniform Distribut ion
Normal Distributi onNormal Distributi on
(a) (b)
22.5 33.5 44.5 5
10
20
30
40
50
60
70
Redundancy
Distance From Grassmannian frames
22.5 33.5 44.5 5
20
30
40
50
60
70
80
90
100
Redundancy
Distance From Grassmannian frames
Uniform Distri bu tion
Uniform Distribution
Normal DistributionNormal Distributi on
(c) (d)
Figure 1. Distance from Grassmannian fr ames for different length of atom and redunda ncy factor (a) d = 16, (b) d = 64, (c) d
= 128, (d) d = 256.
3.3. Image Denoising process for Boat image in different noise level. Paying
attention to the figures the advantages of using this type
of design is obvious. In Figure 4 these two types of
overcomplete dictionaries are shown as an element in
gray scale.
As a result of this type of dictionary design, we per-
formed our algorithm on image denoising. We used im-
age denoising method which was proposed in [17]. In
this type of image denoising a controversial point is se-
lecting overcomplete dictionary. Signal to noise ratio
(PSNR) of resulted image with these two types of over-
complete dictionaries are illustrated in Table 2 (the value
of PSNR are average for 10 times running the pro-
gram )The length of each atom of dictionary in this ex-
periment selected to 64 and redundancy factor is 4. A
considerable point in this method of image denoising is
decreasing of denoising process time in normal type dic-
tionary design. Figure 3 shows the time of denoising
4. Conclusions
In this paper we defined new method on overcomplete
dictionary design from orthogonal functions. Our experi-
ment showed that non-uniform sampling of thesefunction
makes a dictionary that is closer to Grassman nian frames
Table 2. PSNR of denoised image for uniform and normal
type sampling dictionary (bolded numbers are related to
normal type dictionary de sign).
Table 1. Optimized
,
for each dictionary.
Redundancy
d
2 3 4 5
16 (–0.25,0.64) (–0.3,0.88) (1.76,0.1) (–0.6,0.7)
64 (–0.25,0.8) (–0.45,0.58) (1.02, –0.35) (–0.65,1.95)
128 (–0.45,1) (–0.5,0.8) (1.4, –0.35) (–0.65,1.22)
256 (–0.5,1.16) (–0.5,1.26) (1, –0.5) (–0.70,1.22)
Noise Level
Image
10 20 30 40 50
barbara 33.40/
33.59
29.80/
30.00
27.71/
27.83
26.16/
26.34
25.01/
25.13
lena 34.61/
34.73
31.51/
31.64
29.65/
29.76
28.34/
28.41
27.29/
27.41
boat 33.01/
33.10
29.61/
29.69
27.72/
27.83
26.48/
26.49
25.50/
25.54
pepper34.07/
34.18
31.43/
31.51
29.73/
29.87
28.40/
28.51
27.46/
27.54
Improving Mutual Coherence with Non-Uniform Discretization of Orthogonal Function for
188
Image Denoising Application
0 2 4 6810 12 1416
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Cardinality of the Solution
Pr o b ab ility of Success
N ormal Distribut ion Dictionary
Uniform Distribution Dictionary
Cardinality of the Solution
Probability of the Soccess
Figure 2. Comparison of two type dictionary design in exact
sparse recovery.
10 15 202530 35 40 45 50
0
50
100
150
200
250
300
350
Nois e Level
Required Tim e F or Denoising (sec ond)
Normal Distribution
Uniform Di stribution
Normal Distribution
Uniform Distri bution
Figure 3. Time required for denoising process for two type
dictionary design.
The uniform DST dic tionary
The non-uniform DS T dic tionary
(a) (b)
Figure 4. (a) uniform DST dictionary, (b) non-uniform
(normal) DST dictionary.
compared to the uniform one. Normal distribution is used
for sampling and its parameters (mean and standard de-
viation) are optimized through some optimization proce-
sses. Proposed method design has some advantage in
sparse coding and we show this result with plotted the
probability of success in exact sparse recovery. In this
framework we showed that this type of dictionary design
has better performance than uniform sample dictionary
design in image denoising application with improving the
PSNR of result image and decrease the required time for
image denoising. As mention in paper, we can apply this
type redundant dictionary design on any orthogonal set
and also we can use other distribution for sampling the
interval of function definition.
REFERENCES
[1] S. Mallat and Z. Zhang, “Matching Pursuits with Time
Frequency Dictionaries,” IEEE Transactions on Signal
Processing, Vol. 41, No. 12, 1993, pp. 3397-3415.
doi:10.1109/78.258082
[2] G. Davis, “Adaptive Nonlinear Approximations,” Ph.D.
Dissertation, New York University, New York, 1994.
[3] T. Blumensath and M. Davies, “Gradient Pursuits,” IEEE
Transactions on Signal Processing, Vol. 56, No. 6, 2008,
pp. 2370-2382. doi:10.1109/TSP.2007.916124
[4] S. Chen, D. Donoho and M. Saunders, “Atomic Decom-
position by Basis Pursuit,” SIAM Journal on Scientific
Computing, Vol. 20, No. 1, 1998, pp. 33-61.
doi:10.1137/S1064827596304010
[5] M. Elad, “Sparse and Redundant Representations: From
Theory to Applications in Signal and Image Processing,”
Springer, New York, 2010.
[6] R. Rubinstein, A. M. Bruckstein and M. Elad, “Diction-
aries for Sparse Representation Modeling,” IEEE Pro-
ceedingsSpecial Issue on Applications of Sparse Rep-
resentation & Compressive Sensing, Vol. 98, No. 6, April
2010, pp. 1045-1057.
[7] S. Mallat, “A Wavelet Tour of Signal Processing,” 3rd
Edition, Academic, New York, 2009.
[8] E. J. Candès and D. L. Donoho, “Curvelets—A Surpris-
ingly Effective Nonadaptive Representation for Objects
with Edges,” Vanderbilt University Press, Nashville, 1999.
[9] M. N. Do and M. Vetterli, “The Contourlet Transform: An
Efficient Directional Multiresolution Image Representa-
tion,” IEEE Transaction on Image Processing, Vol. 14, No.
12, 2005, pp. 2091-2106. doi:10.1109/TIP.2005.859376
[10] E. LePennec and S. Mallat, “Sparse Geometric Image
Representations with Bandelets,” IEEE Transaction on
Image Processing, Vol. 14, No. 4, 2005, pp. 423-438.
doi:10.1109/TIP.2005.843753
[11] K. Engan, S. O. Aase and J. H. Husoy, “Method of Opti-
mal Directions for Frame Design,” IEEE International
Conference on Acoustics, Speech, and Signal Processing,
Phoenix, 15-19 March 1999, pp. 2443-2446.
[12] M. Aharon, M. Elad and A. M. Bruckstein, “The K-SVD:
An Algorithm for Designing of Overcomplete Dictionaries
for Sparse Representation,” IEEE Transactions on Signal
Processing, Vol. 54, No. 11, 2006, pp. 4311-4322.
doi:10.1109/TSP.2006.881199
Copyright © 2011 SciRes. JSIP
Improving Mutual Coherence with Non-Uniform Discretization of Orthogonal Function for
Image Denoising Application
Copyright © 2011 SciRes. JSIP
189
[13] M. Aharon, M. Elad and A. M. Bruckstein, “On the Uni-
queness of Overcomplete Dictionaries, and a Practical
Way to Retrieve Them,” Journal of Lineaar Algebra and
Applications, Vol. 416, No. 1, 2006, pp. 48-67.
[14] T. Strohmer and R. Heath, “Grassmannian Frames with
Applications to Coding and Communication,” Applied and
Computational Harmonic Analysis, Vol. 14, No. 3, 2003,
pp. 257-275. doi:10.1016/S1063-5203(03)00023-X
[15] J. Tropp, I. Dhillon, R. Heath Jr. and T. Strohmer, “De-
signing Structural Tight Frames via an Alternating Pro-
jection Method,” IEEE Transactions on Information
Theory, Vol. 51, No. 1, 2005, pp. 188-209.
doi:10.1109/TIT.2004.839492
[16] M. Yaghoobi, T. Blumensath and M. Davies, “Dictionary
Learning for Sparse Approximations with the Majorization
Method,” IEEE Transactions on Signal Processing, Vol.
57, No. 6, 2009, pp. 2178-2191.
doi:10.1109/TSP.2009.2016257
[17] J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright,
“Convergence Properties of the Nelder-Mead Simplex
Method in Low Dimensions,” SIAM Journal of Optimiza-
tion, Vol. 9 No. 1, 1998, pp. 112-147.
doi:10.1137/S1052623496303470
[18] D. E. Goldberg, “Genetic Algorithms in Search, Optimi-
zation and Machine Learning,” Addison Wesley, Reading,
1989.
[19] M. Elad and M. Aharon, “Image Denoising via Sparse and
Redundant Representations over Learned Dictionaries,”
IEEE Transactions on Image Processing, Vol. 15, No. 12,
December 2006, pp. 3736-3745.