Journal of Software Engineering and Applications, 2012, 5, 193-199
doi:10.4236/jsea.2012.512b037 Published Online December 2012 (http://www.SciRP.org/journal/jsea)
Copyright © 2012 SciRes. JSEA
Identification of Noisy Utterance Speech Signal using
GA-Based Optimized 2D-MFCC Method and a
Bispectrum Analysis
Benyamin Kusumoputro1, Agus Buono2, Lina3
1Department of Electrical Engineering, Universitas Indonesia, Jakarta, Indonesia; 2Department of Computer Science, Bogor Agricultural University, Bogor, Indonesia; 3Department of Computer Science, Tarumanagara University, Jakarta, Indonesia.
Email: kusumo@ee.ui.ac.id, pudesha@yahoo.co.id, lina@untar.ac.id
Received 2012
ABSTRACT
One-dimensional Mel-Frequency Cepstrum Coefficients (1D-MFCC) combined with power spectrum analysis are commonly used for feature extraction in speaker identification systems. However, because this one-dimensional feature extraction subsystem shows a low recognition rate for utterance speech signals under harsh noise conditions, we have developed a speaker identification system based on two-dimensional Bispectrum data, which is theoretically more robust to additive Gaussian noise. As the processing sequence of the 1D-MFCC method cannot be applied directly to two-dimensional Bispectrum data, in this paper we propose a 2D-MFCC method as an extension of the 1D-MFCC method, together with an optimization of the 2D filter design using Genetic Algorithms. Using the 2D-MFCC method with Bispectrum analysis as the feature extraction technique, we then use a Hidden Markov Model as the pattern classifier. We experimentally evaluate the developed methods on utterance speech signals buried in various levels of noise. The experimental results show that, for utterance signals without added noise, the 2D-MFCC method without GA optimization achieves a high recognition rate comparable to that of the 1D-MFCC method. However, when the utterance signal is buried in Gaussian noise, the developed 2D-MFCC shows a higher recognition capability, especially when the 2D-MFCC optimized by Genetic Algorithms is used.
Keywords: 2D Mel-Frequency Cepstrum Coefficients; Bispectrum; Hidden Markov Model; Genetic Algorithms
1. Introduction
Research on automatic speech and voice identification systems has attracted much interest in the last few years, motivated by the growth of applications in many areas, such as diagnosis of a rotor crack [1], classification of unknown radar targets [2], medical disease identification [3], and personal and gender identification for security systems [4,5]. Speaker-based personal identification is the process of determining a registered speaker when an utterance speech signal is provided. In this machine-based speech identification, a gallery of speeches is first enrolled in the system and coded for subsequent searching. When an unidentified speech is presented to the system, it is thoroughly compared with each coded speech in the gallery, and the identification is accomplished when a suitable match occurs.
A speaker identification system can be divided into two subsystems, i.e., a feature extraction subsystem and a classifier subsystem. The main function of the feature extraction subsystem is to transform the input utterance speech signal into a set of features, while the classifier subsystem identifies the speaker by comparing the features extracted from his or her input speech signal with those of a database of known speakers.
Conventional feature extraction subsystems usually use Mel-Frequency Cepstrum Coefficients (MFCC) together with power spectrum analysis [6]. The power spectrum analysis method, however, shows a low recognition rate when classifying utterance speech signals under harsh noise conditions [7]. To solve this problem, higher order signal analysis, i.e. the bispectrum analysis method, is utilized: the third-order cumulants, and hence the bispectrum, of a Gaussian process are identically zero, so the bispectrum is theoretically robust to Gaussian noise [8], as has also been shown empirically [7,9]. Because an utterance in bispectrum data is represented as a pattern in a 2D decision space, bispectrum analysis requires a two-dimensional filter design; for that purpose, we have developed the 2D-MFCC filter design method explained here.
The remainder of this paper is organized as follows. In Section 2, we describe the speaker identification system and formulate the 2D-MFCC filter design. Section 3 presents the optimization of the 2D-MFCC filter design using Genetic Algorithms. Section 4 describes the experimental setup and presents results demonstrating the effectiveness of the proposed method. Finally, Section 5 summarizes this study and suggests directions for future research.
2. Speaker Identification System
The focus of this paper is to develop a feature extraction subsystem that increases the recognition rate of the classifier subsystem (an HMM) when classifying an utterance speech signal buried in harsh noise. In the developed method, the feature extraction subsystem uses a 2D-MFCC filter design to extract the 2D information contained in the Bispectrum data. The Bispectrum data is represented as a 2D array of M×M elements in the 2D frequency space (f1, f2). In this section, we briefly review the 1D-MFCC filter construction and then present the proposed 2D-MFCC filter construction for representing the Bispectrum data.
We further improve the feature extraction subsystem using a Genetic Algorithm (GA). The GA optimizes the filter characteristics such that the difference between the feature vector of a speech signal without added noise and the feature vector of the same speech signal with added Gaussian noise is as small as possible. Reducing the difference between these two signals from the same speaker increases the chance that the speaker is recognized correctly. Because the learning method of the classifier subsystem is also an important factor in the recognition rate, as in other soft computing methods, this research uses a Hidden Markov Model (HMM) trained with the Baum-Welch algorithm [10].
In the learning phase, samples of each speaker's speech for a certain phrase or word are entered into the speech database, and these samples are used to train the classifier to build a reference model for each registered speaker. In the application phase, the input utterance speech signal is compared with each reference model stored in the database, and the classifier selects as the winner the speaker whose reference model yields the highest likelihood score.
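The decision rule above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes each speaker's reference model is an HMM with diagonal-Gaussian emissions whose parameters were already estimated (for example by Baum-Welch, not shown), and the function names, model layout and toy parameters are hypothetical. Each utterance, a sequence of per-frame feature vectors, is scored with the forward algorithm in the log domain, and the speaker with the highest log-likelihood wins.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x_t | mean_s, diag(var_s)) for every frame t and state s; shape (T, S)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, S, D)
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (T, S)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (S,)
    return -0.5 * (quad + log_norm[None, :])


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis (entries must be finite)."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def forward_loglik(X, log_pi, log_A, means, variances):
    """log p(X | model) via the forward algorithm, computed in the log domain."""
    log_b = log_gauss_diag(X, means, variances)                   # emission scores
    alpha = log_pi + log_b[0]
    for t in range(1, len(X)):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(logsumexp(alpha, axis=0))


def identify(X, models):
    """Return the speaker whose reference model gives the highest log-likelihood."""
    scores = {name: forward_loglik(X, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Toy usage: two hypothetical 3-state speaker models over 13-dimensional features.
S, D = 3, 13
log_pi = np.log(np.full(S, 1.0 / S))
log_A = np.log(np.full((S, S), 1.0 / S))
models = {
    "speaker_A": (log_pi, log_A, np.zeros((S, D)), np.ones((S, D))),
    "speaker_B": (log_pi, log_A, np.full((S, D), 3.0), np.ones((S, D))),
}
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(54, D))   # 54 frames drawn near speaker_A's means
winner, scores = identify(X, models)
```

Working in the log domain avoids the numerical underflow that products of many small emission probabilities would otherwise cause over a 54-frame utterance.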
2.1. Power Spectrum Analysis with 1D-MFCC Method
Suppose each tone of an utterance speech signal with an actual frequency f (Hz) is represented on the Mel-frequency scale according to the relationship

$$\hat{f}_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{1}$$

This mapping is approximately linear for actual frequencies below 1000 Hz and logarithmic for frequencies above 1000 Hz.
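Eq. (1) and its inverse can be written directly. In this minimal sketch the function names are our own; the inverse is what a filter-bank designer uses to map Mel-spaced center points back to Hz.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Eq. (1): f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(f_mel):
    """Inverse of Eq. (1): f = 700 * (10 ** (f_mel / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(f_mel, dtype=float) / 2595.0) - 1.0)


# The constants are chosen so that 1000 Hz maps to approximately 1000 mel.
f = np.array([0.0, 500.0, 1000.0, 4000.0, 5500.0])
mels = hz_to_mel(f)
recovered = mel_to_hz(mels)   # round trip back to Hz
```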
For filtering the 1D Mel-frequency data, the 1D-MFCC filter design method provides triangular filters with a height of 1 at the center point and 0 at the left and right ends. As can be seen in Figure 2(a), the ith 1D-MFCC filter can be described by three vertex points, (f_{i-1}, 0), (f_i, 1) and (f_{i+1}, 0), with i = 1, …, M. Clearly, determining the center point of each filter and the distance between two adjacent center points is essential [7,9,10].
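The triangle defined by these three vertices can be evaluated on a grid of frequencies as the minimum of a rising and a falling ramp, clipped at zero. This is a minimal sketch: `mel_spaced_edges` follows the common practice of spacing the filter points uniformly on the Mel scale, which we assume corresponds to the conventional design of Figure 2(a), and M = 20 filters is an illustrative choice, not a value stated in the paper.

```python
import numpy as np


def triangular_filterbank(edges, freqs):
    """Evaluate M triangular filters H_i(f) on the frequencies `freqs`.

    edges = [f_0, f_1, ..., f_{M+1}] (increasing); filter i = 1..M has vertices
    (f_{i-1}, 0), (f_i, 1) and (f_{i+1}, 0). Returns an array of shape (M, len(freqs)).
    """
    edges = np.asarray(edges, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    M = len(edges) - 2
    H = np.zeros((M, len(freqs)))
    for i in range(1, M + 1):
        left, center, right = edges[i - 1], edges[i], edges[i + 1]
        rising = (freqs - left) / (center - left)      # 0 at f_{i-1}, 1 at f_i
        falling = (right - freqs) / (right - center)   # 1 at f_i, 0 at f_{i+1}
        H[i - 1] = np.clip(np.minimum(rising, falling), 0.0, 1.0)
    return H


def mel_spaced_edges(M, f_max):
    """Assumed conventional design: M + 2 points equally spaced on the Mel scale of Eq. (1)."""
    mel_max = 2595.0 * np.log10(1.0 + f_max / 700.0)
    mels = np.linspace(0.0, mel_max, M + 2)
    return 700.0 * (10.0 ** (mels / 2595.0) - 1.0)


fs, n_fft = 11000, 512                           # 11 kHz sampling, 512-sample frames (Section 4)
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft   # rfft bin frequencies in Hz
edges = mel_spaced_edges(M=20, f_max=fs / 2)     # M = 20 is an illustrative choice
H = triangular_filterbank(edges, freqs)          # H[i - 1, f] plays the role of H_i(f)
```

Because every filter is fully determined by `edges`, moving the center points (as the GA of Section 3 does) only changes this one array.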
The Mel-frequency spectrum coefficient MFS_i is calculated as the sum of the log-magnitude spectrum weighted by the ith filter:

$$MFS_i = \sum_{f=0}^{N-1} \log\big(|X(f)|\big)\, H_i(f) \tag{2}$$

where i = 1, …, M, M is the number of triangular filters, and N is the number of FFT coefficients. |X(f)| is the magnitude of the fth coefficient of the FFT of the input utterance signal, and H_i(f) is the height of the ith triangular filter at point f.
The MFCC_k values are then calculated using the Discrete Cosine Transform, which transforms the Mel-frequency spectrum coefficients back into the time (cepstral) domain:

$$MFCC_k = \sum_{i=1}^{M} MFS_i \cos\left(\frac{k\,(i-0.5)\,\pi}{M}\right) \tag{3}$$

where k = 1, …, K and K is the number of coefficients.
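Eqs. (2) and (3) can be sketched for a single frame as follows. This is a minimal sketch under stated assumptions: the filter bank `H` is sampled on the rfft bins of the frame, a small `eps` (our addition) keeps the logarithm finite, and the random filter bank in the usage lines is only a placeholder for a real triangular design.

```python
import numpy as np


def mfs_1d(frame, H, eps=1e-10):
    """Eq. (2): MFS_i = sum_f log(|X(f)|) * H_i(f), one value per triangular filter.

    H has shape (M, len(frame) // 2 + 1), i.e. it is sampled on the rfft bins.
    eps is an assumption that keeps log() finite for empty bins.
    """
    X = np.fft.rfft(frame)
    return H @ np.log(np.abs(X) + eps)


def mfcc_from_mfs(mfs, K):
    """Eq. (3): MFCC_k = sum_i MFS_i * cos(k * (i - 0.5) * pi / M), k = 1..K."""
    M = len(mfs)
    i = np.arange(1, M + 1)
    k = np.arange(1, K + 1)[:, None]
    return np.cos(k * (i - 0.5) * np.pi / M) @ mfs


# Toy usage: one 512-sample frame and a placeholder (random, non-negative) filter bank.
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 11000) + 0.01 * rng.normal(size=512)
H = rng.random((20, 257))
mfcc = mfcc_from_mfs(mfs_1d(frame, H), K=13)   # 13 coefficients, as in Section 4
```

A constant MFS vector maps to all-zero coefficients for k = 1, …, 2M − 1, which is why the cosine transform discards the overall level and keeps only the shape of the filtered spectrum.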
2.2. Bispectrum Analysis with 2D-MFCC Method
Bispectrum analysis of an utterance speech signal can be explained as follows. If {x(k)}, k = 0, ±1, ±2, … is a real random process, then its third-order cumulant $c_3^x(\tau_1,\tau_2)$ is

$$c_3^x(\tau_1,\tau_2) = \sum_{(s_1,\ldots,s_p)} (-1)^{p-1}(p-1)!\; E\Big\{\prod_{i\in s_1} z_i\Big\}\cdots E\Big\{\prod_{i\in s_p} z_i\Big\} \tag{4}$$

where $z_1 = x(k)$, $z_2 = x(k+\tau_1)$, $z_3 = x(k+\tau_2)$, and the summation extends over all partitions $(s_1, s_2, \ldots, s_p)$, p = 1, 2, 3, of the set of integers (1, 2, 3). The Bispectrum, also referred to as the third-order cumulant spectrum, is the 2D Fourier transform of the third-order cumulant sequence:

$$C_3^x(\omega_1,\omega_2) = \sum_{\tau_1=-\infty}^{+\infty}\sum_{\tau_2=-\infty}^{+\infty} c_3^x(\tau_1,\tau_2)\,\exp\{-j(\omega_1\tau_1+\omega_2\tau_2)\} \tag{5}$$
In the case of a zero-mean stationary process, the third-order cumulant reduces to the third-order moment:

$$c_3^x(\tau_1,\tau_2) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\} \tag{6}$$
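Eq. (6) suggests a direct sample estimate, which also illustrates why the bispectrum is insensitive to additive Gaussian noise: the third-order cumulant of a Gaussian signal is zero, so adding independent Gaussian noise leaves $c_3$ unchanged in expectation. This is a minimal sketch; the function name and the toy signals (an exponential, hence skewed, signal and white Gaussian noise) are illustrative assumptions.

```python
import numpy as np


def third_order_cumulant(x, tau1, tau2):
    """Sample estimate of Eq. (6): c3(tau1, tau2) = E{x(t) x(t + tau1) x(t + tau2)}.

    The sample mean is removed first because Eq. (6) holds for zero-mean processes.
    Lags may be negative; only indices t with all three samples in range are used.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    lo = max(0, -tau1, -tau2)
    hi = len(x) - max(0, tau1, tau2)
    t = np.arange(lo, hi)
    return float(np.mean(x[t] * x[t + tau1] * x[t + tau2]))


rng = np.random.default_rng(0)
n = 200_000
skewed = rng.exponential(1.0, n)     # non-Gaussian: third central moment is 2
gauss = rng.normal(0.0, 1.0, n)      # Gaussian: all third-order cumulants are 0
c_skewed = third_order_cumulant(skewed, 0, 0)
c_gauss = third_order_cumulant(gauss, 0, 0)
c_noisy = third_order_cumulant(skewed + gauss, 0, 0)   # stays close to c_skewed
```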
Basically, there are two approaches to estimating the Bispectrum, i.e. a parametric approach and a conventional approach. The conventional approach comprises three classes: an indirect technique, a direct technique and the complex demodulates method. Because of its simplicity, in this research the Bispectrum data is estimated using the conventional indirect method, the details of which are presented elsewhere [11].
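The indirect method can be sketched as: split the signal into segments, remove each segment's mean, estimate the third-order cumulants over a finite lag range, average them over segments, and take the 2D DFT of the averaged cumulants, which is a truncated form of Eq. (5). This is a minimal sketch under stated assumptions: the segment length, hop, maximum lag L and grid size are illustrative (a 128-point grid would match the 128 × 128 grid used in Eq. (15)), and the lag window that practical estimators apply before the DFT is omitted for brevity.

```python
import numpy as np


def cumulant_lags(seg, L):
    """Biased estimates c3(tau1, tau2) for -L <= tau1, tau2 <= L of one zero-mean segment."""
    N = len(seg)
    c = np.zeros((2 * L + 1, 2 * L + 1))
    for a, t1 in enumerate(range(-L, L + 1)):
        for b, t2 in enumerate(range(-L, L + 1)):
            t = np.arange(max(0, -t1, -t2), N - max(0, t1, t2))
            c[a, b] = np.sum(seg[t] * seg[t + t1] * seg[t + t2]) / N
    return c


def bispectrum_indirect(x, seg_len=512, hop=256, L=32, nfft=128):
    """Indirect bispectrum estimate on an nfft x nfft grid (needs len(x) >= seg_len, nfft > 2L)."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - seg_len + 1, hop)
    c3 = np.zeros((2 * L + 1, 2 * L + 1))
    for s in starts:
        seg = x[s:s + seg_len]
        c3 += cumulant_lags(seg - seg.mean(), L)
    c3 /= len(starts)
    # Store lag tau at index tau mod nfft, so fft2 evaluates
    # sum_{tau1, tau2} c3(tau1, tau2) exp(-j(w1 tau1 + w2 tau2)) at w = 2*pi*m / nfft.
    idx = np.arange(-L, L + 1) % nfft
    grid = np.zeros((nfft, nfft))
    grid[np.ix_(idx, idx)] = c3
    return np.fft.fft2(grid)


rng = np.random.default_rng(0)
x = rng.exponential(1.0, 2048)                               # toy non-Gaussian signal
B = bispectrum_indirect(x, seg_len=256, hop=128, L=8, nfft=32)
```

Because the averaged cumulants are real and symmetric in their two lags, the estimate satisfies B(ω1, ω2) = B(ω2, ω1) and B(−ω1, −ω2) = conj(B(ω1, ω2)), which is a quick sanity check on any implementation.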
Figure 1. The construction of the 2D-MFCC filter and its calculation for Bispectrum data $B(f_{1,m}, f_{2,n})$.

Since Bispectrum data is represented in the two frequency dimensions f1 and f2, a 2D-MFCC filter, instead of a 1D-MFCC filter, should be used to process the data. To develop the 2D-MFCC filter, we first construct a 1D-MFCC filter bank in each of the f1 and f2 dimensions, with filter centers $f_{1,i}$, $i = 1, \ldots, M$, in the first dimension and $f_{2,j}$, $j = 1, \ldots, N$, in the second dimension, where $M = N$. We then combine the 1D-MFCC filter $H_i(f_{1,m})$ and the 1D-MFCC filter $H_j(f_{2,n})$ into an integrated 2D-MFCC filter $H_{i,j}(f_{1,m}, f_{2,n})$ with a pyramid shape, as depicted in Figure 1(a). The base of this pyramid is a rectangle with corners $(f_{1,i-1}, f_{2,j-1})$, $(f_{1,i+1}, f_{2,j-1})$, $(f_{1,i-1}, f_{2,j+1})$ and $(f_{1,i+1}, f_{2,j+1})$, as can be seen in Figure 1(b), and its apex lies above the point $(f_{1,i}, f_{2,j})$. The lines connecting $(f_{1,i}, f_{2,j})$ to the corners $(f_{1,i-1}, f_{2,j-1})$, $(f_{1,i+1}, f_{2,j-1})$, $(f_{1,i+1}, f_{2,j+1})$ and $(f_{1,i-1}, f_{2,j+1})$ are denoted line a, line b, line c and line d, respectively, and their equations can be written as:
line a:

$$f_2 = \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i-1}}\,(f_1 - f_{1,i}) + f_{2,j} \tag{7}$$

line b:

$$f_2 = \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i+1}}\,(f_1 - f_{1,i}) + f_{2,j} \tag{8}$$

line c:

$$f_2 = \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i+1}}\,(f_1 - f_{1,i}) + f_{2,j} \tag{9}$$

line d:

$$f_2 = \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i-1}}\,(f_1 - f_{1,i}) + f_{2,j} \tag{10}$$
Using these lines, the base of the pyramid filter can be divided into four quadrants, as can be seen in Figure 1(c). Suppose we have Bispectrum data $B(f_{1,m}, f_{2,n})$ at the point $(f_{1,m}, f_{2,n})$ of the two-dimensional frequency space, as depicted in Figure 1(d). The height of the 2D filter at that point is calculated by first determining the quadrant in which the point lies and then calculating $H_{i,j}(f_{1,m}, f_{2,n})$ with the following algorithm.

1. If $f_{2,n} > f_{2,j-1}$ and

$$f_{2,n} < \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i-1}}\,(f_{1,m}-f_{1,i}) + f_{2,j}, \qquad f_{2,n} < \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i+1}}\,(f_{1,m}-f_{1,i}) + f_{2,j},$$

then $B(f_{1,m}, f_{2,n})$ ∈ quadrant I, and

$$H_{i,j}(f_{1,m}, f_{2,n}) = \frac{f_{2,n}-f_{2,j-1}}{f_{2,j}-f_{2,j-1}} \tag{11}$$

2. If $f_{2,n} < f_{2,j+1}$ and

$$f_{2,n} > \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i+1}}\,(f_{1,m}-f_{1,i}) + f_{2,j}, \qquad f_{2,n} > \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i-1}}\,(f_{1,m}-f_{1,i}) + f_{2,j},$$

then $B(f_{1,m}, f_{2,n})$ ∈ quadrant II, and

$$H_{i,j}(f_{1,m}, f_{2,n}) = \frac{f_{2,j+1}-f_{2,n}}{f_{2,j+1}-f_{2,j}} \tag{12}$$

3. If $f_{1,m} > f_{1,i-1}$ and

$$f_{2,n} > \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i-1}}\,(f_{1,m}-f_{1,i}) + f_{2,j}, \qquad f_{2,n} < \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i-1}}\,(f_{1,m}-f_{1,i}) + f_{2,j},$$

then $B(f_{1,m}, f_{2,n})$ ∈ quadrant III, and

$$H_{i,j}(f_{1,m}, f_{2,n}) = \frac{f_{1,m}-f_{1,i-1}}{f_{1,i}-f_{1,i-1}} \tag{13}$$
4. If $f_{1,m} < f_{1,i+1}$ and

$$f_{2,n} > \frac{f_{2,j}-f_{2,j-1}}{f_{1,i}-f_{1,i+1}}\,(f_{1,m}-f_{1,i}) + f_{2,j}, \qquad f_{2,n} < \frac{f_{2,j}-f_{2,j+1}}{f_{1,i}-f_{1,i+1}}\,(f_{1,m}-f_{1,i}) + f_{2,j},$$

then $B(f_{1,m}, f_{2,n})$ ∈ quadrant IV, and

$$H_{i,j}(f_{1,m}, f_{2,n}) = \frac{f_{1,i+1}-f_{1,m}}{f_{1,i+1}-f_{1,i}} \tag{14}$$
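The four-quadrant algorithm can be sketched without explicit quadrant tests. Each face of the pyramid is a linear ramp (the right-hand sides of Eqs. (11)-(14)), and because the pyramid is convex, its height at any point of the base equals the smallest of the four ramps, which is exactly the ramp of the face (quadrant) containing the point; outside the base the result is clipped to 0. This is a minimal sketch; the function name and the illustrative center values are our own assumptions.

```python
import numpy as np


def pyramid_height(f1, f2, c1, c2):
    """Height H_ij(f1, f2) of the 2D pyramid filter of Eqs. (11)-(14).

    c1 = (f1_{i-1}, f1_i, f1_{i+1}) and c2 = (f2_{j-1}, f2_j, f2_{j+1}).
    Inside the base, the minimum of the four face ramps is the ramp of the
    quadrant that contains (f1, f2); outside the base it is clipped to 0.
    """
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    ramps = np.stack([
        (f2 - c2[0]) / (c2[1] - c2[0]),   # quadrant I, Eq. (11)
        (c2[2] - f2) / (c2[2] - c2[1]),   # quadrant II, Eq. (12)
        (f1 - c1[0]) / (c1[1] - c1[0]),   # quadrant III, Eq. (13)
        (c1[2] - f1) / (c1[2] - c1[1]),   # quadrant IV, Eq. (14)
    ])
    return np.clip(ramps.min(axis=0), 0.0, 1.0)


# Evaluate one filter on a 128 x 128 grid (hypothetical centers, in Hz).
f = np.linspace(0.0, 5500.0, 128)
F1, F2 = np.meshgrid(f, f, indexing="ij")
H_ij = pyramid_height(F1, F2, (1000.0, 1500.0, 2100.0), (1000.0, 1500.0, 2100.0))
```

An equivalent way to see the same filter is as the pointwise minimum of the two 1D triangles, min(H_i(f1), H_j(f2)), which makes explicit how the 2D-MFCC filter is built from the two 1D-MFCC filters.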
Using the same calculation as in the 1D-MFCC method (see Eq. (2)), the Mel-frequency Bispectrum coefficients MFS(i, j) are calculated as

$$MFS(i,j) = \sum_{m=1}^{128}\sum_{n=1}^{128} \log\big[B(f_{1,m}, f_{2,n})\big]\, H_{i,j}(f_{1,m}, f_{2,n}) \tag{15}$$

where $H_{i,j}(f_{1,m}, f_{2,n})$ is the 2D filter height and m, n = 1, …, 128 index the 128 × 128 Bispectrum grid. The 2D-MFCC coefficients $MFCC_k$ are then calculated through a 2D cosine transform:

$$MFCC_k = \sum_{i=1}^{N}\sum_{j=1}^{N} MFS(i,j)\cos\left(\frac{k\,(i-0.5)\,\pi}{N}\right)\cos\left(\frac{k\,(j-0.5)\,\pi}{N}\right) \tag{16}$$

where the sums run over the N filters in each frequency dimension, and k = 1, 2, 3, …, K, with K the number of coefficients.
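Eqs. (15) and (16) can be sketched with two tensor contractions. This is a minimal sketch under stated assumptions: because the bispectrum is complex-valued, the sketch takes the logarithm of its magnitude plus a small `eps`, and the random grid and random filters in the usage lines are placeholders for a real bispectrum estimate and real pyramid filters.

```python
import numpy as np


def mfs_2d(B, H2d, eps=1e-10):
    """Eq. (15): MFS(i, j) = sum_m sum_n log(B(f1_m, f2_n)) * H_ij(f1_m, f2_n).

    B is the complex bispectrum grid (128 x 128); H2d has shape (N, N, 128, 128).
    Using |B| and adding eps are assumptions that keep the logarithm real and finite.
    """
    log_b = np.log(np.abs(B) + eps)
    return np.einsum("ijmn,mn->ij", H2d, log_b)


def mfcc_2d(mfs, K):
    """Eq. (16): MFCC_k = sum_i sum_j MFS(i, j) cos(k(i-0.5)pi/N) cos(k(j-0.5)pi/N)."""
    N = mfs.shape[0]
    i = np.arange(1, N + 1)
    k = np.arange(1, K + 1)[:, None]
    C = np.cos(k * (i - 0.5) * np.pi / N)          # C[k-1, i-1]
    return np.einsum("ki,ij,kj->k", C, mfs, C)


# Toy usage with placeholder data: a random complex grid and random non-negative filters.
rng = np.random.default_rng(0)
B = rng.normal(size=(128, 128)) + 1j * rng.normal(size=(128, 128))
H2d = rng.random((8, 8, 128, 128))                 # N = 8 filters per dimension (illustrative)
mfcc = mfcc_2d(mfs_2d(B, H2d), K=7)
```

Note that Eq. (16) uses the same index k in both cosine factors, so it yields K coefficients rather than a K × K array; the `einsum` subscripts `"ki,ij,kj->k"` encode exactly that.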
3. Optimization of 2D-MFCC Filter using Genetic Algorithms
The 2D-MFCC method computes the Mel-Bispectrum coefficients $MFCC_k$ (Eq. (16)) of the two-dimensional Bispectrum Mel-frequency data $B(f_{1,m}, f_{2,n})$ by calculating the 2D filter height $H_{i,j}(f_{1,m}, f_{2,n})$. Since the center position of each filter is essential in determining this height, optimizing the positions of the filter centers is necessary for reducing the total error. The goal of the optimization process, here Genetic Algorithms [17], is therefore to minimize the difference between the Mel-Bispectrum coefficients of a speech signal buried in Gaussian noise and those of the same signal without Gaussian noise, by designing an optimized 2D filter height $H_{i,j}(f_{1,m}, f_{2,n})$.
The chromosome representation is constructed as follows. Suppose M is the maximum number of triangular filters in each frequency dimension, f1 and f2, and F is the maximum frequency of each dimension. Let the distances between the filter center positions be $x_1, x_2, x_3, \ldots, x_{M+1}$, such that $x_1 + x_2 + x_3 + \ldots + x_{M+1} = F$, where $x_1$ is the distance from the origin to the first filter center, $x_i$ (i = 2, 3, …, M) is the distance between the (i−1)th and ith filter centers, and $x_{M+1}$ is the distance from the Mth filter center to F. To represent an optimized set of filters for the 2D-MFCC, each distance between two filter centers is coded into 7 binary digits. A chromosome representing a set of filters is therefore a binary string of 7(M+1) digits: the first seven digits code $x_1$, the second seven digits code $x_2$, and so on. As a simple illustration, suppose we have four triangular filters in a one-dimensional frequency domain with center positions 2.5, 4.5, 6.5 and 8.0, respectively, and a maximum frequency F of 10. The distances between the filter centers are $x_1 = 2.5$, $x_2 = 4.5 - 2.5 = 2$, $x_3 = 6.5 - 4.5 = 2$, $x_4 = 8 - 6.5 = 1.5$ and $x_5 = 10 - 8 = 2$. The chromosome then consists of 5 loci, $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$, each coded by 7 binary digits, giving 7 × 5 = 35 digits.
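The encoding in the worked example can be sketched as follows. The paper does not state how a real-valued distance is mapped to 7 bits, so this sketch assumes a uniform quantization, gene = round(x_i / F × 127), and a decoder that rescales the decoded distances so that they sum to F; the function names are hypothetical.

```python
import numpy as np

BITS = 7  # binary digits per locus, as in the text


def encode(distances, F):
    """Quantize each distance x_i to a 7-bit gene (assumed mapping: round(x_i / F * 127))."""
    levels = 2 ** BITS - 1
    genes = np.rint(np.asarray(distances, dtype=float) / F * levels).astype(int)
    genes = np.clip(genes, 0, levels)
    return "".join(format(int(g), f"0{BITS}b") for g in genes)


def decode(chromosome, F):
    """Inverse mapping; distances are rescaled so that x_1 + ... + x_{M+1} = F (assumption)."""
    genes = np.array([int(chromosome[p:p + BITS], 2)
                      for p in range(0, len(chromosome), BITS)], dtype=float)
    distances = genes / genes.sum() * F      # assumes at least one non-zero gene
    centers = np.cumsum(distances)[:-1]      # the M filter centers
    return distances, centers


chrom = encode([2.5, 2.0, 2.0, 1.5, 2.0], F=10.0)   # worked example: 5 loci
distances, centers = decode(chrom, F=10.0)          # centers close to 2.5, 4.5, 6.5, 8.0
```

With 7 bits per locus the decoded centers differ from the original ones only by the quantization step, about F/127.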
The fitness function is defined so that the selected set of filters produces Mel-Bispectrum coefficients $MFCC_k$ with very similar characteristics for the input speech signal with added Gaussian noise and for the same signal without added noise. This fitness function can be formulated as

$$fitness(i) = \frac{d(B_1, B_3)\, d(B_2, B_4)}{d(B_1, B_2)\, d(B_3, B_4)} \tag{17}$$

where $B_1$ is the Bispectrum data $B(f_{1,m}, f_{2,n})$ of the signal without added noise, $B_2$ is the Bispectrum data of the signal with 20 dB Gaussian noise added, $B_3 = B_2 - B_1$, $B_4$ is the Bispectrum data of the 20 dB Gaussian noise itself, and $d(B_k, B_l)$ is the distance between the feature vector of Bispectrum data $B_k$ and that of Bispectrum data $B_l$.
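A fitness evaluation can be sketched as follows, reading Eq. (17) as fitness = d(B1, B3) d(B2, B4) / (d(B1, B2) d(B3, B4)), so that a small clean-versus-noisy distance d(B1, B2) raises the fitness. This is a minimal sketch under stated assumptions: d is taken to be the Euclidean distance between feature vectors, and `feature` is a hypothetical callable standing in for the 2D-MFCC computed with the filters encoded by the chromosome under evaluation.

```python
import numpy as np


def fitness(feature, B1, B2, B4):
    """Eq. (17): d(B1, B3) d(B2, B4) / (d(B1, B2) d(B3, B4)) with B3 = B2 - B1.

    feature: maps a bispectrum grid to a feature vector (hypothetical stand-in for
    the 2D-MFCC of the chromosome being evaluated).
    d(., .) is taken to be the Euclidean distance between feature vectors (assumption).
    """
    def d(Bk, Bl):
        return float(np.linalg.norm(feature(Bk) - feature(Bl)))

    B3 = B2 - B1
    return d(B1, B3) * d(B2, B4) / (d(B1, B2) * d(B3, B4))


def toy_feature(B):
    """Placeholder feature extractor used only for this illustration."""
    return np.array([B.mean(), B.std(), np.abs(B).max()])


rng = np.random.default_rng(0)
clean = rng.normal(size=(16, 16))             # stands in for B1
noise = 0.3 * rng.normal(size=(16, 16))       # stands in for B4
score = fitness(toy_feature, clean, clean + noise, noise)
```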
A conventional roulette wheel is then used to select the winning chromosomes in the population: the chance of a chromosome being selected is proportional to its fitness value. Crossover is then used to combine two chromosomes into their offspring; in this research, an arithmetic crossover technique is utilized. Given two parents $X = (x_1, x_2, x_3, \ldots, x_{N+1})$ and $Y = (y_1, y_2, y_3, \ldots, y_{N+1})$, arithmetic crossover produces the offspring $X' = aX + (1-a)Y$ and $Y' = (1-a)X + aY$, with $a \in (0, 1)$.

Mutation transforms a chromosome into its offspring by changing its internal genes. The mutation starts by selecting a chromosome to be mutated, followed by randomly generating two integers p and q, p, q ∈ [0, N+1], where N is the number of filters used. The mutation is then performed by inverting the order of the loci between the selected points.
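The three GA operators can be sketched as follows. This is a minimal sketch, not the authors' code: it applies the operators to decoded, real-valued distance vectors (our assumption about how arithmetic crossover is combined with the 7-bit encoding), and the function names and toy parents are hypothetical.

```python
import numpy as np


def roulette_select(fitnesses, rng):
    """Roulette wheel: index k is chosen with probability fitness_k / sum(fitness)."""
    p = np.asarray(fitnesses, dtype=float)
    return int(rng.choice(len(p), p=p / p.sum()))


def arithmetic_crossover(x, y, rng):
    """Offspring X' = a X + (1 - a) Y and Y' = (1 - a) X + a Y, with a drawn uniformly."""
    a = rng.uniform(0.0, 1.0)   # samples [0, 1); a == 0 is practically never drawn
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return a * x + (1.0 - a) * y, (1.0 - a) * x + a * y


def inversion_mutation(x, rng):
    """Reverse the order of the loci between two distinct random cut points p < q."""
    x = np.array(x, dtype=float)    # work on a copy
    p, q = sorted(rng.choice(len(x) + 1, size=2, replace=False))
    x[p:q] = x[p:q][::-1].copy()
    return x


rng = np.random.default_rng(0)
parent_x = np.array([2.5, 2.0, 2.0, 1.5, 2.0])   # loci x_1..x_5 of the worked example
parent_y = np.array([2.0, 2.0, 2.0, 2.0, 2.0])
chosen = roulette_select([0.2, 0.5, 0.3], rng)
child_x, child_y = arithmetic_crossover(parent_x, parent_y, rng)
mutant = inversion_mutation(child_x, rng)
```

A convenient property of this choice is that both operators preserve the constraint $x_1 + \ldots + x_{M+1} = F$: crossover forms a convex combination of two vectors with that sum, and inversion only reorders the loci.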
Figure 2 compares a sample 1D-MFCC filter bank designed using the conventional method with one designed using the GA-based optimization method. It clearly shows that the two designs yield different filter-bank patterns, which leads to better performance in the application.
4. Experimental Setup and Results
Several experiments were conducted to evaluate the performance of the developed system. The utterance speech signals were recorded as WAV files from ten Indonesian speakers aged 12 to 28 years. They were asked to say 'pudesha' with normal tone and intonation, but were allowed to lengthen their pronunciation. Each speaker uttered the word 80 times. Each utterance was digitized at a sampling rate of 11 kHz with a duration of 1.28 s and read frame by frame, with 512 samples per frame and an overlap of 256 samples between adjacent frames. A 50%:50% training/testing split is used: 400 utterances form the training set and the other 400 utterances form the testing set.
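The framing and data-split arithmetic above can be checked with a short sketch. The function name is hypothetical and the zero signal is only a placeholder for a recording; the text gives the rate as 11 kHz, and 11025 Hz would yield the same number of frames.

```python
import numpy as np


def frame_signal(x, frame_len=512, overlap=256):
    """Split x into overlapping frames; returns an array of shape (n_frames, frame_len)."""
    hop = frame_len - overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.asarray(x)[idx]


fs = 11_000                                   # "11 kHz" in the text
utterance = np.zeros(int(round(1.28 * fs)))   # placeholder for one 1.28 s recording
frames = frame_signal(utterance)
print(frames.shape)                           # (54, 512)

n_utterances = 10 * 80                        # ten speakers, 80 utterances each
n_train = n_utterances // 2                   # 50%:50% split: 400 training, 400 testing
```

Each utterance therefore yields a sequence of 54 frames, and hence 54 feature vectors, for the HMM classifier.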
The bispectrum of each frame is estimated using the conventional indirect method, as explained in [8]. We calculate the filtered bispectrum B(f1m, f2n) of each frame and convert it into MFCC_k coefficients using the cosine transform (Eq. (3), or its 2D form, Eq. (16)). The number of coefficients K is set to 13, so the bispectrum of each frame is represented by a feature vector of 13 elements. For a fair comparison, the same value is used for the other feature extraction methods, including the conventional 1D-MFCC method.
A Hidden Markov Model is used as the classifier in all of the experiments, and three feature extraction subsystems, i.e. the conventional 1D-MFCC method, the 2D-MFCC method and the 2D-MFCC-GA method, are examined and compared. Figure 3 compares the recognition rates of 1D-MFCC and 2D-MFCC for uttered speech signals without added Gaussian noise. Note that in these experiments, we used various numbers of hidden states in the HMM classifier for comparison. The experimental results depicted in Figure 3 show that when the three methods are used to classify utterance speech signals without added Gaussian noise, the recognition rates are very high, i.e. 98.4%, 99.4% and 99.0% for 1D-MFCC, 2D-MFCC and 2D-MFCC-GA, respectively. These comparable results show that the 2D-MFCC method is not necessary for classifying utterance speech signals without added noise.

Figure 2. (a) Conventional one-dimensional MFCC filter bank; (b) MFCC filter bank optimized using GA.

Figure 3. Comparison of recognition rate between 1D-MFCC and 2D-MFCC for a speech signal without noise addition.
This result also confirms that the 1D-MFCC method, which is commonly used in conventional systems, works well enough to classify speakers when there is no noise disturbance. The figure also shows that the number of hidden states used in the HMM classifier has no noticeable influence on the recognition rate of the system. In the subsequent experiments, we therefore use an HMM with three hidden states for convenience.
When 20 dB Gaussian noise is added to the utterance speech signal, however, the recognition rate of the 1D-MFCC method drops significantly. To increase the recognition rate, we analyzed the MFCC_k values of both methods as a function of the coefficient index k = 1, …, K, as depicted in Figure 4. These figures clearly show that the first coefficient of both methods is very sensitive to the addition of Gaussian noise, suggesting that omitting this coefficient when calculating the MFCC values could increase the recognition rate of both methods.
In the next experiments, we removed the first coefficient of the MFCC methods; the experimental results for the utterance speech signal without Gaussian noise as input are depicted in Figure 5. It is clear that all three feature extraction subsystems achieve higher recognition rates without the first coefficient, especially when the 2D-MFCC with GA is utilized.
This confirms that removing the first coefficient does not affect the recognition rate of the feature extraction subsystems for the utterance speech signal without added Gaussian noise (OSS: original speech signal without Gaussian noise addition). When the utterance speech signal is buried in 20 dB Gaussian noise (OSS+20dB: original signal with 20 dB Gaussian noise addition), however, the maximum recognition rates are 54.4% for the 1D-MFCC method, 70.5% for 2D-MFCC and 88.5% for 2D-MFCC-GA.
[Figure panels: (a) 1D-MFCC and (b) 2D-MFCC, each comparing the original signal with the original signal plus 20 dB noise]

Figure 4. Recognition rate comparison of the three methods after removing the first coefficient of the MFCC method.

Figure 5. Comparison of the MFCCk coefficients for the original signal and the signal with 20 dB Gaussian noise added.

Figure 6. Recognition rate of utterance speech signals with 20 dB Gaussian noise added, using the 1D-MFCC, 2D-MFCC and 2D-MFCC-GA methods with K = 12.
The next experiment was conducted by burying the utterance speech signal in harsher noise conditions, i.e. 10 dB and 0 dB. A complete comparison of the recognition rates of 2D-MFCC and 2D-MFCC-GA for utterance speech signals with added Gaussian noise of 20 dB, 10 dB and 0 dB is depicted in Figure 6. As shown in this figure, the recognition rate decreases as the Gaussian noise intensity increases. It can also be seen that, at every Gaussian noise level, the 2D-MFCC feature extraction subsystem optimized by GA for the Bispectrum signal always performs better than the 2D-MFCC without GA.
5. Conclusions
We have developed the 2D-MFCC feature extraction method for processing Bispectrum data from utterance speech signals. In this paper, we have also developed a GA-based optimization of the filter design to increase the recognition capability of the system, especially for uttered speech signals with added Gaussian noise. The recognition rate of the systems using 2D-MFCC, with or without GA optimization, is comparable with that of 1D-MFCC for uttered speech signals under normal conditions. However, these recognition rates decrease significantly when Gaussian noise is added to the uttered speech signal.
Further analysis shows that the first coefficient of both 2D-MFCC and 1D-MFCC is largely influenced by the addition of Gaussian noise; by eliminating this coefficient, the performance of the 2D-MFCC improves markedly, reaching recognition rates of 70.5% and 88.5% for 2D-MFCC without GA and 2D-MFCC with GA, respectively, compared with 59.4% for the 1D-MFCC method. Further analysis of the contribution of these coefficients to system performance is now under investigation, in order to develop a more robust speaker identification system, especially in harsh noise environments.
6. Acknowledgement
The authors would like to acknowledge Universitas Indonesia for funding this research. Part of this research was also supported by the Ministry of National Education of Indonesia.
REFERENCES
[1] Z. Li, J. Sun, J. Han, F. Chu and Y. He, Parametric bispectrum analysis of cracked rotor based on blind identification of time series models, IEEE Proceeding of Intelligent Control and Automation, Vol. 2, 2006, pp. 5729-5733.
[2] I. Jouny, E. D. Garber and R. L. Moses, Radar target identification using the bispectrum: a comparative study, IEEE Trans. Aerospace and Electronic Systems, Vol. 31, No. 1, 1995, pp. 69-77.
[3] E. S. Fonseca, R. C. Guido, A. C. Silvestre and J. C. Pereira, Discrete wavelet transform and support vector machine applied to pathological voice signals identification, IEEE Proceeding of International Symposium on Multimedia, 2005.
[4] Z. Wang and H. Wang, Voice identification system based on server, IEEE Proceeding of Intern. Conf. on Computer Application and System Modeling, Vol. 9, 2010, pp. 384-387.
[5] M. Abdollahi, E. Valavi and H. A. Noubari, Voice-based gender identification via multiresolution frame classification of spectro-temporal maps, IEEE Proceeding of Intern. Joint Conf. on Neural Networks, 2009, pp. 1-4.
[6] T. D. Ganchev, Speaker Recognition, PhD Dissertation, Wire Communications Laboratory, Department of Computer and Electrical Engineering, University of Patras, Greece, 2005.
[7] B. Kusumoputro, A. Triyanto, M. I. Fanany and W. Jatmiko, Speaker identification in noisy environment using bispectrum analysis and probabilistic neural networks, IEEE Proceeding of Intern. Conf. on Computational Intelligence and Multimedia Application, 2001, pp. 118-123.
[8] C. L. Nikias and A. P. Petropulu, Higher Order Spectra Analysis: A Nonlinear Signal Processing Framework, Prentice-Hall, Inc., New Jersey, 1993.
[9] T. E. Ozkurt and T. Akgul, Robust text-independent speaker identification using bispectrum slice, IEEE Proceeding of Signal Processing and Communications Applications, 2004, pp. 418-421.
[10] L. Luo and L. F. Chaparro, Parametric identification of systems using a frequency slice of the bispectrum, IEEE Proceeding of Intern. Conf. on Acoustics, Speech and Signal Processing, Vol. 3, 1991, pp. 3481-3484.
[11] L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
[12] C. Cornaz and U. Hunkeler, An Automatic Speaker Recognition System, Mini-Project, http://www.ifp.uiuc.edu/~minhdo/teaching/speaker_recognition, accessed August 15, 2008.
[13] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd Edition, Springer, 1996.