Journal of Signal and Information Processing, 2011, 2, 117-124
doi:10.4236/jsip.2011.22016 Published Online May 2011 (http://www.SciRP.org/journal/jsip)
Copyright © 2011 SciRes. JSIP
Speech Enhancement Using Cross-Correlation
Compensated Multi-Band Wiener Filter Combined
with Harmonic Regeneration
Venkata Rama Rao1, Rama Murthy2, K. Srinivasa Rao3
1Department of ECE, Gudlavalleru Engineering College, Gudlavalleru, India; 2Jayaprakash Narayan College of Engineering,
Dharmapur, Mahabubnagar, India; 3Principal, TRR College of Engineering, Pathancheru, India.
Email: chvramaraogec@gmail.com, mbrmurthy@gmail.com, principaltrr@gmail.com
Received December 6th, 2010; revised April 25th, 2011; accepted April 29th, 2011.
ABSTRACT
Speech signals are in general corrupted by noise, and the noise does not affect the speech signal uniformly over
the entire spectrum. An improved Wiener filtering method is proposed in this paper for reducing background noise from
speech signals in colored noise environments. Because the sensitivity of the human ear varies nonlinearly across the
frequency spectrum, a nonlinear multi-band Bark-scale frequency spacing approach is used. The cross-correlation between
the speech and noise signals is considered in the proposed method to reduce colored noise. To overcome the harmonic
distortion introduced in the enhanced speech, the suppressed harmonics are regenerated. Objective and subjective tests
were carried out to demonstrate the improvement in the perceptual quality of speech achieved by the proposed
technique.
Keywords: Speech Enhancement, Wiener Filter, Critical Band and Speech Harmonics
1. Introduction
In many speech communication systems, recovering the speech signal from a corrupted speech signal with
background noise is a challenging task, especially at low signal-to-noise ratio (SNR) values. Speech quality
and intelligibility might significantly deteriorate in the presence of background noise, especially when the
speech signal is subject to subsequent processing, such as automatic speech recognition and speech coding.
Due to the use of automatic speech processing systems in a variety of real-world applications, speech
enhancement has become an important topic of research. Several speech enhancement systems are available in the
literature [1-4]. The enhancement of a noise-corrupted speech signal can be done using the Wiener filtering
technique [5,6], the spectral subtraction method [7] or the Kalman filtering technique. The power spectral
subtraction and the Wiener filtering algorithms are widely used because of their low computational complexity
and impressive performance.
In general, in these algorithms the enhanced speech spectrum is obtained by subtracting an estimated noise
spectrum from the noisy speech spectrum or by multiplying the noisy spectrum by a gain function. Let the noisy
speech, clean speech and noise signals be denoted by $y(n)$, $x(n)$ and $d(n)$ respectively in the time domain.
If it is assumed that the noise is additive, then $y(n)$ can be expressed as:
$$y(n) = x(n) + d(n) \tag{1}$$
Applying the fast Fourier transform (FFT) to (1), at the $m$th frame and $k$th frequency bin, $y(n)$ can be
represented as:
$$Y(m,k) = X(m,k) + D(m,k) \tag{2}$$
where $Y(m,k)$, $X(m,k)$ and $D(m,k)$ are the FFT coefficients of the noisy speech, clean speech and noise
signals. An estimate of the clean speech component, denoted as $\hat{X}(m,k)$, can be obtained by multiplying
$Y(m,k)$ by the filter gain function $W(m,k)$ as given in (3):
$$\hat{X}(m,k) = W(m,k)\,Y(m,k) \tag{3}$$
The phase of the noisy speech is kept unchanged since it is assumed that the phase distortion is not
perceived by the human ear. It is well known that the frequency resolution of human hearing is non-uniform
and is usually described by critical bands or the Bark scale. Real-world noise does not affect the speech
signal uniformly over the whole spectrum; therefore, weighting the noise spectrum by a constant factor over
the whole frequency range may also remove speech.
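As a rough illustration of (2) and (3), the sketch below applies a real, non-negative gain to the spectrum of one frame and shows that only the magnitudes are scaled while the noisy phase is kept. The naive DFT helper, the function names, the 8-sample frame and the gain values are all assumptions made for this example; they are not taken from the proposed method.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) DFT of one frame (a stand-in for an FFT routine)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def apply_gain(noisy_spectrum, gain):
    """Eq. (3) for a single frame: X_hat(k) = W(k) * Y(k)."""
    return [w * y for w, y in zip(gain, noisy_spectrum)]

# Hypothetical noisy frame and hypothetical per-bin gains in [0, 1].
y = [0.9, 0.1, -0.7, -0.2, 0.8, 0.3, -0.6, -0.1]
W = [1.0, 0.8, 0.5, 0.2, 0.1, 0.2, 0.5, 0.8]

Y = dft(y)
X_hat = apply_gain(Y, W)

for k in range(len(Y)):
    # Magnitude is scaled by W(k); the phase is the phase of the noisy bin.
    print(k, round(abs(Y[k]), 4), round(abs(X_hat[k]), 4),
          round(cmath.phase(Y[k]), 4), round(cmath.phase(X_hat[k]), 4))
```

Because the gain is real and non-negative, multiplying by it cannot rotate a bin in the complex plane, which is why no separate phase handling appears in the sketch.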
A new multi-band approach to the Wiener filter method that reduces colored noise is developed. The method
uses a different weighting factor for each frequency sub-band. The factor also includes cross-correlation
components between the clean speech and noise signals. The perceptual quality of the enhanced speech can be
improved using non-linear Bark-scaled frequency spacing, based on the fact that the sensitivity of the human
ear varies nonlinearly across the frequency spectrum.
In most spoken languages, voiced sounds represent a large amount (around 80%) of the pronounced sounds. In
the classic short-time suppression techniques some harmonics are considered as noise-only components and are
consequently suppressed by the noise reduction process. This is one major limitation of those methods. To
overcome this limitation, a method called regeneration of suppressed harmonics, which takes into account the
harmonic characteristic of speech, is proposed. In this approach, the output signal of a classic noise
reduction technique is further processed to create an artificial signal in which the missing harmonics are
automatically regenerated. This artificial signal is used to refine the a priori SNR used to compute a
spectral gain.
2. Multi-Band Wiener Filter
In real environments, the noise spectrum is not uniform over all frequencies. For example, in the case of
engine noise most of the noise energy is concentrated at low frequencies. The sensitivity of the human ear
varies nonlinearly across the frequency spectrum. The principle of psychoacoustics [8,9] suggests that a
spectral gain may be shared among adjacent high frequency components. A commonly used scale for signifying
the critical bands is the Bark scale, which divides the audible frequency range of 16 kHz into 24 abutting
bands. Figure 1 illustrates the relationship between the frequency in hertz and the critical band rate in
Bark. An approximate analytical expression to describe the conversion from linear frequency $f$ (in kHz)
into the critical band number $b$ (in Bark) is:
$$b(f) = 13\arctan(0.76 f) + 3.5\arctan\left[\left(\frac{f}{7.5}\right)^{2}\right]$$
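The conversion to the Bark scale can be sketched in a few lines. The function name and the choice of taking the input in hertz are assumptions made for this example; the constants 0.76 and 7.5 are the ones in the expression, which apply when the frequency is expressed in kHz.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical band rate (Bark) for a frequency given in Hz."""
    f_khz = f_hz / 1000.0  # the constants 0.76 and 7.5 assume f in kHz
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

# Print the critical band rate at a few illustrative frequencies.
for f in (100, 500, 1000, 2000, 4000, 8000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")
```

The mapping is roughly linear at low frequencies and compresses towards high frequencies, which is what makes the sub-bands wider as frequency increases.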







Figure 1. (a) Critical band rate and (b) Frequency.
In the frequency range from 0-8 kHz, there are 18 critical bands. Therefore the spectral Wiener filter was
modified for a critical band analysis to obtain the power spectral density on a Bark scale:
$$Y_i(b) = \sum_{k \,\in\, \text{band } i} |Y_i(k)|^2, \quad i = 1, 2, \ldots, K \tag{4}$$
where $i$ is the critical band number, $K = 18$ is the total number of critical bands and $k$ is the
frequency index, which ranges between the lower and upper frequency boundaries of the critical band $i$.
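A minimal sketch of the band grouping in (4) is given below. It assumes that each FFT bin is assigned to the band given by the integer part of its Bark value, and it uses a hypothetical flat power spectrum with 129 bins at a sampling rate of 8000 Hz; neither the assignment rule nor these numbers are stated in the text.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical band rate (Bark) for a frequency given in Hz."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

def bark_band_power(power_spectrum, sample_rate):
    """Sum |Y(k)|^2 over the bins of each critical band.

    power_spectrum holds |Y(k)|^2 for the bins from 0 Hz up to sample_rate / 2.
    """
    n_bins = len(power_spectrum)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    nyquist = sample_rate / 2.0
    bands = {}
    for k, p in enumerate(power_spectrum):
        f_hz = k * nyquist / (n_bins - 1)
        i = int(hz_to_bark(f_hz))  # assumed rule: band index = integer Bark value
        bands[i] = bands.get(i, 0.0) + p
    return [bands[i] for i in sorted(bands)]

# Hypothetical flat power spectrum: 129 bins (256-point FFT), 8000 Hz sampling.
power = [1.0] * 129
bands = bark_band_power(power, 8000)
print(len(bands), bands)
```

With a flat input, the band powers grow with the band index simply because higher bands contain more bins, which makes the non-uniform spacing visible.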
The sub-band Wiener filter is derived according to the minimum mean square error (MMSE) criterion between
the ideal and estimated sub-band speech signals in each of the sub-bands. Its cost function in one sub-band
is defined as
$$\varepsilon_i = E\left[\left(S_i(k) - \hat{S}_i(k)\right)^2\right] \tag{5}$$
where $E[\cdot]$ represents the expectation operator. $\hat{S}_i(k)$ and $S_i(k)$ denote the estimated and
ideal sub-band speech signals in the $i$th sub-band respectively.
In each sub-band, the noise suppression is performed by multiplying the sub-band noisy speech by the Wiener
filter gain:
$$\hat{S}_i(k) = G_i\,Y_i(k) \tag{6}$$
By substituting (6) in (5) and simplifying, we get $\varepsilon_i$ as
$$\varepsilon_i = (1-G_i)^2 E\left[S_i^2(k)\right] + G_i^2 E\left[D_i^2(k)\right] - 2G_i(1-G_i) E\left[S_i(k)D_i(k)\right] \tag{7}$$
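The intermediate steps behind (7) can be written out as follows, with $\varepsilon_i$ denoting the cost function of (5) and assuming that the additive model of (2) carries over to each sub-band, that is $Y_i(k) = S_i(k) + D_i(k)$:

```latex
\begin{aligned}
\varepsilon_i &= E\left[\left(S_i(k) - G_i Y_i(k)\right)^2\right] \\
              &= E\left[\left((1-G_i)\,S_i(k) - G_i\,D_i(k)\right)^2\right] \\
              &= (1-G_i)^2 E\left[S_i^2(k)\right] + G_i^2 E\left[D_i^2(k)\right]
                 - 2G_i(1-G_i)\,E\left[S_i(k)D_i(k)\right]
\end{aligned}
```

The last term is the cross term between speech and noise; it is the only term that distinguishes the crosscorrelation compensated filter from the conventional one.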
2.1. Conventional Wiener Filter
In the conventional Wiener filter it is assumed that $S_i(k)$ and $D_i(k)$ are zero mean and uncorrelated in
each sub-band, so that (7) can be simplified to
$$\varepsilon_i = (1-G_i)^2 E\left[S_i^2(k)\right] + G_i^2 E\left[D_i^2(k)\right] \tag{8}$$
By setting the derivative of (8) with respect to the weighting factor $G_i$ to zero, the weighting factor
can be derived as
$$G_i = \frac{E\left[S_i^2(k)\right]}{E\left[S_i^2(k)\right] + E\left[D_i^2(k)\right]} = \frac{\sigma_{S_i}^2}{\sigma_{S_i}^2 + \sigma_{D_i}^2} = \frac{\sigma_{S_i}^2}{\sigma_{Y_i}^2} \tag{9}$$
where $\sigma_{S_i}^2$, $\sigma_{D_i}^2$ and $\sigma_{Y_i}^2$ represent the variance of the sub-band clean
speech, noise and noisy speech in the $i$th sub-band respectively.
2.2. Crosscorrelation Compensated Wiener Filter
The autocorrelation sequences of one frame of clean speech, together with the background noise and the noisy
version of the same speech signal, are shown in Figure 2. The autocorrelation sequence of the noisy speech
signal is not exactly equal to the sum of the autocorrelations of the noise and clean speech signals. This
indicates the existence of crosscorrelation between the clean speech signal and the noise signal [10].
Therefore, we cannot neglect the crosscorrelation between $S_i(k)$ and $D_i(k)$. Then by differentiating (7)
with respect to $G_i$, equating to zero and simplifying, we get
$$G_i = \frac{E\left[S_i^2(k)\right] + E\left[S_i(k)D_i(k)\right]}{E\left[S_i^2(k)\right] + E\left[D_i^2(k)\right] + 2E\left[S_i(k)D_i(k)\right]} \tag{10}$$
Since we have access only to the corrupted signal $Y_i(k)$ but not to $S_i(k)$, it is not possible to
estimate the cross-correlation terms between $S_i(k)$ and $D_i(k)$ directly. Hence, instead of calculating
the cross term between $S_i(k)$ and $D_i(k)$, we estimate the crosscorrelation between $Y_i(k)$ and
$D_i(k)$. Then
$$\begin{aligned}
E\left[Y_i(k)D_i(k)\right] &= E\left[\left(S_i(k) + D_i(k)\right)D_i(k)\right] \\
E\left[S_i(k)D_i(k)\right] &= E\left[Y_i(k)D_i(k)\right] - E\left[D_i^2(k)\right]
\end{aligned} \tag{11}$$
By considering the crosscorrelation between $Y_i(k)$ and $D_i(k)$,
$$E\left[Y_i(k)D_i(k)\right] = \delta\,|Y_i(k)|\,|D_i(k)| \tag{12}$$
where $\delta$ is the crosscorrelation coefficient [9] for estimating the correlation between the noisy
speech signal and noise in a sub-band. By substituting (11) and (12) in (10), the filter gain $G_i$ can be
obtained as
$$\begin{aligned}
G_i &= \frac{E\left[S_i^2(k)\right] + \delta\,|Y_i(k)|\,|D_i(k)| - E\left[D_i^2(k)\right]}{E\left[Y_i^2(k)\right]} \\
G_i &= \frac{\xi_i(k) + \delta\,\dfrac{|Y_i(k)|\,|D_i(k)|}{E\left[D_i^2(k)\right]} - 1}{\gamma_i(k)}
\end{aligned} \tag{13}$$
where
$$\xi_i(k) = \frac{E\left[S_i^2(k)\right]}{E\left[D_i^2(k)\right]} \quad \text{and} \quad \gamma_i(k) = \frac{E\left[Y_i^2(k)\right]}{E\left[D_i^2(k)\right]}$$
represent the a priori SNR and the a posteriori SNR in the $i$th sub-band respectively.
Figure 2. Autocorrelation sequences of clean speech, noisy speech, noise and sum of the clean speech and noise signals.
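The relation between the conventional gain of (9) and the crosscorrelation compensated gain of (10) to (13) can be checked numerically. The sketch below is only illustrative: the function names, the symbol `delta` for the crosscorrelation coefficient and all numeric sub-band statistics are hypothetical, and the cross term $E[Y_i(k)D_i(k)]$ is modeled as `delta * |Y| * |D|`.

```python
import math

def wiener_gain(speech_power, noise_power):
    """Eq. (9): conventional gain, speech and noise assumed uncorrelated."""
    return speech_power / (speech_power + noise_power)

def gain_with_cross_term(speech_power, noise_power, cross_sd):
    """Eq. (10): gain when the cross term E[S D] is known."""
    return (speech_power + cross_sd) / (speech_power + noise_power + 2.0 * cross_sd)

def compensated_gain(speech_power, noise_power, noisy_power, y_mag, d_mag, delta):
    """Eq. (13): gain with E[Y D] modeled as delta * |Y| * |D|."""
    cross_yd = delta * y_mag * d_mag
    return (speech_power + cross_yd - noise_power) / noisy_power

# Hypothetical sub-band statistics with correlated speech and noise.
speech_power, noise_power, cross_sd = 4.0, 1.0, 0.5
noisy_power = speech_power + noise_power + 2.0 * cross_sd  # E[Y^2] = 6.0
cross_yd = cross_sd + noise_power                          # Eq. (11): E[Y D] = 1.5
y_mag, d_mag = math.sqrt(noisy_power), math.sqrt(noise_power)
delta = cross_yd / (y_mag * d_mag)                         # Eq. (12) solved for delta

g9 = wiener_gain(speech_power, noise_power)
g10 = gain_with_cross_term(speech_power, noise_power, cross_sd)
g13 = compensated_gain(speech_power, noise_power, noisy_power, y_mag, d_mag, delta)
print(g9, g10, g13)
```

In this example the compensated gain agrees with (10), and it differs from the conventional gain only because the cross term is nonzero. When speech and noise are uncorrelated, $E[Y_i(k)D_i(k)] = E[D_i^2(k)]$ and (13) falls back to (9).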
3. Regeneration of Suppressed Harmonics
The output signal $\hat{S}_i(k)$, or $\hat{s}(t)$ in the time domain, obtained by the multi-band Wiener
filter presented in the previous section still suffers from distortions. This is inherent to the estimation
errors introduced by the noise spectrum estimation, since it is very difficult to get reliable instantaneous
estimates in single-channel noise reduction techniques. Since 80% of the pronounced sounds are voiced on
average, the distortions generally turn out to be harmonic distortions. Indeed, some harmonics are considered
as noise-only components and are suppressed. For that reason, we propose to process the distorted signal to
create a fully harmonic signal where all the missing harmonics are regenerated. This signal will then be used
to compute a spectral gain able to preserve the speech harmonics. This will be called the speech harmonic
regeneration step and can be used to improve the results of any noise reduction technique, not only the
multi-band Wiener filter.
A simple and efficient way to restore speech harmonics consists of applying a nonlinear function NL (e.g.,
absolute value, minimum or maximum relative to a threshold, etc.) to the time signal enhanced in a first
procedure with a classic noise reduction technique. Then, the artificially restored signal $s_{harm}(t)$ is
obtained by
$$s_{harm}(t) = NL\left(\hat{s}(t)\right) \tag{14}$$
In this work, half-wave rectification is used as the nonlinear function and applied to the signal. The
nonlinearity restores harmonics at the right frequencies, but their amplitudes are biased. As a consequence,
this signal cannot be used directly as a clean speech estimate. Nevertheless, it contains very useful
information that can be exploited to refine the a priori SNR.
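The effect of the nonlinearity in (14) can be seen on a toy signal. In the sketch below a pure sinusoid stands in for an enhanced signal whose second harmonic has been suppressed; after half-wave rectification, energy reappears at twice the fundamental frequency. The signal length, the fundamental bin and the helper names are assumptions chosen only for this illustration.

```python
import cmath
import math

def half_wave_rectify(signal):
    """Nonlinear function NL: keep positive samples, zero the negative ones."""
    return [max(v, 0.0) for v in signal]

def bin_magnitude(signal, k):
    """Normalized magnitude of DFT bin k."""
    n = len(signal)
    acc = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(signal))
    return abs(acc) / n

n, f0 = 64, 4  # hypothetical frame length and fundamental bin
s_hat = [math.sin(2.0 * math.pi * f0 * t / n) for t in range(n)]
s_harm = half_wave_rectify(s_hat)

before = bin_magnitude(s_hat, 2 * f0)   # second harmonic before rectification
after = bin_magnitude(s_harm, 2 * f0)   # second harmonic after rectification
print(before, after)
```

The regenerated harmonic sits at the correct frequency, but its amplitude is set by the nonlinearity rather than by the speech, which is why the rectified signal is used only to refine the a priori SNR.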
  



$$\xi_i^{harm}(k) = \frac{\rho(k)\,E\left[\hat{S}_i^2(k)\right] + \left(1 - \rho(k)\right)E\left[S_{harm}^2(k)\right]}{E\left[D_i^2(k)\right]} \tag{15}$$
The parameter $\rho(k)$ is used to control the mixing level of $E[\hat{S}_i^2(k)]$ and $E[S_{harm}^2(k)]$.
The properties of this parameter are:
• when the estimate $\hat{S}_i(k)$ provided by the multi-band Wiener filter algorithm is reliable, the
harmonic regeneration process is not needed and $\rho(k)$ should be equal to 1.
• when the estimate $\hat{S}_i(k)$ provided by the multi-band Wiener filter algorithm is unreliable, the
harmonic regeneration process is required to correct the estimate and $\rho(k)$ should be equal to 0
(or any other constant value depending on the chosen nonlinear function).
The $\rho(k)$ parameter can be chosen constant to realize a compromise between the two estimators
$\hat{S}_i(k)$ and $S_{harm}(k)$. In the present work, we propose to choose $\rho(k) = G_i$ to match the
above properties. The refined a priori SNR is then used to compute a new spectral gain [11-13]:
$$G_i^{harm} = \frac{\xi_i^{harm}(k) + \delta\,\dfrac{|Y_i(k)|\,|D_i(k)|}{E\left[D_i^2(k)\right]} - 1}{\gamma_i(k)} \tag{16}$$
where $\delta$ is the crosscorrelation coefficient of (12) and $\gamma_i(k)$ is the a posteriori SNR.
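A minimal sketch of (15) and (16) is given below. The function names, the symbols `rho` and `delta` for the mixing parameter and the crosscorrelation coefficient, and the numeric values are all hypothetical; the sketch only shows how the two power estimates are mixed and how the refined a priori SNR enters the gain.

```python
def refined_a_priori_snr(rho, est_power, harm_power, noise_power):
    """Eq. (15): mix the first-pass estimate and the harmonic signal powers."""
    return (rho * est_power + (1.0 - rho) * harm_power) / noise_power

def harmonic_gain(xi_harm, gamma, y_mag, d_mag, noise_power, delta):
    """Eq. (16): gain computed from the refined a priori SNR."""
    return (xi_harm + delta * y_mag * d_mag / noise_power - 1.0) / gamma

# Hypothetical values for one sub-band.
est_power, harm_power, noise_power = 2.0, 6.0, 1.0
rho = 0.5            # the text proposes using the first-pass gain G_i here
y_mag, d_mag = 2.0, 1.0
gamma = y_mag ** 2 / noise_power
delta = 0.5

xi_harm = refined_a_priori_snr(rho, est_power, harm_power, noise_power)
g_harm = harmonic_gain(xi_harm, gamma, y_mag, d_mag, noise_power, delta)
print(xi_harm, g_harm)
```

Setting `rho` to 1 reproduces the a priori SNR of the first-pass estimate alone, and setting it to 0 relies entirely on the harmonic signal, matching the two limiting properties listed for the mixing parameter.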
4. Results and Discussion
To evaluate and compare the performance of the proposed method, simulations are carried out with NOIZEUS
[14], a database widely used in testing speech enhancement algorithms. The noisy database contains 30 IEEE
sentences (produced by three male and three female speakers) corrupted by eight different real-world noises
at different SNRs. In this work, speech signals degraded with seven types of noise at global SNR levels of
0 dB, 5 dB, 10 dB and 15 dB were used. The noises were airport, babble, car, street, train, restaurant and
station noises. The objective quality measures used for the evaluation of the proposed method are the
segmental SNR and noise reduction (NR) values. It is well known that the segmental SNR is more accurate in
indicating the speech distortion than the overall SNR. Higher segmental SNR and NR values indicate weaker
speech distortion and better perceived quality of the processed speech signal [15]. The performance of the
proposed method is compared with that of the Wiener filter and the multi-band Wiener filter.
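The exact segmental SNR formula is not spelled out here, so the sketch below uses a common definition as an assumption: the SNR is computed per frame, limited to a fixed range and averaged over frames. The frame length of 256 follows the segment size quoted for Table 1; the limits of -10 dB and 35 dB, the function name and the test signal are assumptions.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average of per-frame SNRs in dB, each limited to [lo_db, hi_db]."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    values = []
    for m in range(n_frames):
        a, b = m * frame_len, (m + 1) * frame_len
        sig = sum(c * c for c in clean[a:b])
        err = sum((c - e) ** 2 for c, e in zip(clean[a:b], enhanced[a:b]))
        if sig == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        snr = hi_db if err == 0.0 else 10.0 * math.log10(sig / err)
        values.append(min(max(snr, lo_db), hi_db))
    if not values:
        raise ValueError("no frame with signal energy")
    return sum(values) / len(values)

# Hypothetical test: the "enhanced" signal is the clean signal scaled by 0.9,
# so the error power is 1% of the signal power in every frame.
clean = [math.sin(2.0 * math.pi * 5.0 * t / 256.0) for t in range(512)]
enhanced = [0.9 * c for c in clean]
seg_snr = segmental_snr(clean, enhanced)
print(round(seg_snr, 2), "dB")
```

Averaging per-frame values in dB weights quiet and loud frames equally, which is the usual argument for preferring the segmental SNR over a single global SNR.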
Table 1 shows the segmental SNR of the enhanced speech, with segment size equal to 256, for various noise
levels. The proposed method outperforms the Wiener filter and the multi-band Wiener filter in almost all
cases. Table 2 compares the NR values. It reveals that the proposed method introduces low speech distortion
and keeps the residual noise at an acceptable level. The time-domain waveforms of the enhanced speech are
shown in Figure 3, where the clean speech signal is corrupted by airport noise at 0 dB SNR. It shows that
the proposed method can efficiently remove the background noise.
Figure 4 shows the comparison of spectrograms. The background noise can be efficiently removed by the
proposed method. Listening tests also indicate that the proposed method efficiently reduces the background
noise with less speech distortion.
Table 1. Segmental SNR (dB) of the enhanced speech in various noise environments.
Noise type and SNR (dB)   Wiener filter   Multi-band Wiener   Proposed method
Airport-0                 -4.37           -2.39               -1.76
Airport-5                 -2.57            0.67                0.87
Airport-10                -0.06            0.43                0.46
Airport-15                 1.88            3.13                3.43
Babble-0                  -4.59           -1.91               -1.14
Babble-5                  -1.39            0.05                0.36
Babble-10                  0.03            2.36                2.48
Babble-15                  2.71            3.06                3.97
Car-0                     -3.93           -1.02               -1.28
Car-5                     -1.65            1.69                1.75
Car-10                     0.68            2.40                2.60
Car-15                     2.31            2.71                2.98
Street-0                  -2.88           -1.97               -1.39
Street-5                  -2.13           -0.29               -0.11
Street-10                  1.20            2.42                2.54
Street-15                  2.25            2.48                2.88
Train-0                   -3.45           -2.13               -1.77
Train-5                   -0.86            0.93                1.90
Train-10                  -0.39            1.69                2.86
Train-15                   2.62            2.57                3.57
Restaurant-0              -5.49           -3.44               -3.05
Restaurant-5              -3.61           -0.15                0.21
Restaurant-10             -0.49            1.28                1.56
Restaurant-15              1.80            2.47                2.68
Station-0                 -3.62            0.51                0.89
Station-5                 -1.93            1.18                1.57
Station-10                 0.95            2.39                2.89
Station-15                 2.72            2.86                3.52
Table 2. Noise reduction values in various noise environments.
Noise type and SNR (dB)   Wiener filter   Multi-band Wiener   Proposed method
Airport-0                 -4.37           25.00               26.05
Airport-5                 -2.57           25.98               26.86
Airport-10                -0.06           24.01               24.25
Airport-15                 1.88           26.22               26.92
Babble-0                  -4.59           25.9                -1.14
Babble-5                  -1.39           25.85               26.08
Babble-10                  0.03           26.60               26.69
Babble-15                  2.71           26.12               26.46
Car-0                     -3.93           26.30               26.56
Car-5                     -1.65           27.69               27.75
Car-10                     0.68           26.89               27.03
Car-15                     2.31           25.81               25.89
Street-0                  -2.88           25.34               25.92
Street-5                  -2.13           25.36               25.55
Street-10                  1.20           25.86               26.86
Street-15                  2.25           25.24               25.67
Train-0                   -3.45           25.63               25.99
Train-5                   -0.86           26.64               26.89
Train-10                  -0.39           25.96               26.10
Train-15                   2.62           25.44               25.56
Restaurant-0              -5.49           23.50               23.88
Restaurant-5              -3.61           25.42               25.79
Restaurant-10             -0.49           25.61               25.89
Restaurant-15              1.80           25.42               25.56
Station-0                 -3.62           28.37               28.78
Station-5                 -1.93           27.11               27.67
Station-10                 0.95           26.76               26.98
Station-15                 2.72           25.87               26.02
Figure 3. Time-domain waveforms of (a) the clean speech, (b) the noisy speech, and the enhanced speech using (c) the Wiener
filter, (d) the multi-band Wiener filter and (e) the proposed method.
Figure 4. Spectrograms of (a) the clean speech, (b) the noisy speech, and the enhanced speech using (c) the Wiener filter, (d)
the multi-band Wiener filter and (e) the proposed method.
5. Conclusions
This paper presents an improved Wiener filtering method that takes into account the non-uniform effect of
colored noise on the spectrum of speech. The proposed method includes the cross-correlation terms between the
clean speech and noise. The multi-band Wiener filtering method reduces the residual musical tones that appear
in speech enhanced by Wiener filtering. A noise reduction technique based on the principle of harmonic
regeneration is also proposed. Classic techniques, including the multi-band Wiener filter, suffer from
harmonic distortions when the SNR is low. This is mainly due to estimation errors introduced by the noise PSD
estimator. To solve this problem, a nonlinearity is used to regenerate the degraded harmonics of the
distorted signal in an efficient way.
The resulting artificial signal is used to refine the a priori SNR, which is then used to compute a spectral
gain that preserves speech harmonics and hence avoids distortions. Results are given in terms of segmental
SNR and noise reduction values. All these results demonstrate the good performance of the proposed method.
REFERENCES
[1] Y. Ephraim, “Statistical-Model-Based Speech Enhance-
ment Systems,” Proceedings of IEEE, Vol. 80, No. 10,
1992, pp. 1526-1555. doi:10.1109/5.168664
[2] J. R. Deller, J. G. Proakis and J. H. L. Hansen, “Discrete-Time
Processing of Speech Signals,” Macmillan,
New York, 1993.
[3] S. V. Vaseghi, “Advanced Digital Signal Processing and
Noise Reduction,” 2nd Edition, John Wiley & Sons Ltd.,
Chichester, 2000. doi:10.1002/0470841621
[4] Y. Gui and H. K. Kwan, “Adaptive Subband Wiener Fil-
tering for Speech Enhancement Using Critical-Band
Gammatone Filterbank,” 48th Midwest Symposium on
Circuits and Systems, Vol. 1, 7-10 August 2005, pp. 732-
735.
[5] J. S. Lim and A. V. Oppenheim, “Enhancement and
Bandwidth Compression of Noisy Speech,” Proceedings
of IEEE, Vol. 67, No. 12, 1979, pp. 1586-1604.
doi:10.1109/PROC.1979.11540
[6] Y. Ephraim and D. Malah, “Speech Enhancement Using a
Minimum Mean Square Error Short-Time Spectral Amplitude
Estimator,” IEEE Transactions on Acoustics, Speech, and
Signal Processing, Vol. 32, No. 6, 1984, pp. 1109-1121.
[7] S. F. Boll, “Suppression of Acoustic Noise in Speech
Using Spectral Subtraction,” IEEE Transactions on
Acoustics, Speech, and Signal Processing, Vol. 27, No. 2,
1979, pp. 113-120. doi:10.1109/TASSP.1979.1163209
[8] E. Zwicker and H. Fastl, “Psychoacoustics,” Springer
Verlag, Berlin, 1990.
[9] K. A. Sheela, Ch. V. R. Rao, K. S. Prasad and A. V. N.
Tilak, “A New Noise Reduction Pre-Processor for Mobile
Voice Communication Using Perceptually Weighted
Spectral Subtraction Method,” 3rd International Confer-
ence on Mobile Ubiquitous and Pervasive Computing,
VIT University, 16-19 December 2006.
[10] G. Farahani, S. M. Ahadi and M. M. Homayounpoor,
“Robust Feature Extraction of Speech via Noise Reduc-
tion in Autocorrelation Domain,” Lecture Notes in Com-
puter Science 4105, Springer-Verlag, 2006, pp. 466-473.
[11] P. Scalart and J. V. Filho, “Speech Enhancement Based
On a Priori Signal to Noise Estimation,” Proceedings of
IEEE International Conference on Acoustics, Speech
Signal Processing, Atlanta, Vol. 2, May 1996, pp. 629-
632. doi:10.1109/ICASSP.1996.543199
[12] J. E. Porter and S. F. Boll, “Optimal Estimators for Spec-
tral Restoration of Noisy Speech,” Proceedings of IEEE
International Conference on Acoustics, Speech Signal
Processing, Vol. 9, March 1984, pp. 53-56.
[13] I. Cohen, “Optimal Speech Enhancement under Signal
Presence Uncertainty Using Log-Spectral Amplitude Es-
timator,” IEEE Signal Processing Letters, Vol. 9, No. 4,
2002, pp. 113-116. doi:10.1109/97.1001645
[14] A Noisy Speech Corpus for Evaluation of Speech En-
hancement Algorithms, 2011.
http://www.utdallas.edu/~loizou/speech/noizeus/
[15] Y. Hu and P. C. Loizou, “Evaluation of Objective Quality
Measures for Speech Enhancement,” IEEE Transactions
on Audio, Speech, and Language Processing, Vol. 16, No.
1, 2008, pp. 229-238. doi:10.1109/TASL.2007.911054