Journal of Information Security, 2011, 2, 59-68
doi:10.4236/jis.2011.22006 Published Online April 2011 (h t tp : // ww w .scirp.o rg/journal /j is)
Copyright © 2011 SciRes. JIS
An Authentication Method for Digital Audio Using a
Discrete Wavelet Transform
Yasunari Yoshitomi, Taro Asada, Yohei Kinugawa, Masayoshi Tabuse
Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan
E-mail: yoshitomi@kpu.ac.jp
Received November 11, 2010; revised February 15, 2011; accepted February 26, 2011
Abstract
Recently, several digital watermarking techniques have been proposed for hiding data in the frequency do-
main of audio files in order to protect their copyrights. In general, there is a tradeoff between the quality of
watermarked audio and the tolerance of watermarks to signal processing methods, such as compression. In
previous research, we simultaneously improved the performance of both by developing a multipurpose opti-
mization problem for deciding the positions of watermarks in the frequency domain of audio data and ob-
taining a near-optimum solution to the problem. This solution was obtained using a wavelet transform and a
genetic algorithm. However, obtaining the near-optimum solution was very time consuming. To overcome
this issue essentially, we have developed an authentication method for digital audio using a discrete wavelet
transform. In contrast to digital watermarking, no additional information is inserted into the original audio by
the proposed method, and the audio is authenticated using features extracted by the wavelet transform and
characteristic coding in the proposed method. Accordingly, one can always use copyright-protected original
audio. The experimental results show that the method has high tolerance of authentication to all types of
MP3, AAC, and WMA compression. In addition, the processing time of the method is acceptable for every-
day use.
Keywords: Authentication, Audio, Copyright Protection, Tolerance to Compression, Wavelet Transforms
1. Introduction
Recent progress in digital media technology and distribu-
tion systems, such as the Internet and cellular phones, has
enabled consumers to easily access, copy, and modify di-
gital content, such as electric documents, images, audio,
and video. Therefore, techniques to protect the copyrights
for digital data and prevent unauthorized duplication or
tampering are urgently needed.
Digital watermarking (DW) is a promising method of
copyright protection for digital data. Several studies have
investigated audio DW [1-12]. Two important properties
of audio DW are inaud ibilit y o f DW-introd uced d istor tio n,
and robustness to signal processing methods, such as
compression. In addition to these properties, the data rate
and complexity o f the DW have attracted attention when
discussing the performance of a DW.
We have attempted to develop a method in which 1)
the DW can be sufficiently extracted from the watermar-
ked audio, even after compression, and 2) the quality of
the audio remains high after embedding the DW. How-
ever, there is generally a tradeoff between these two pro-
perties. Therefore, we focus on this tradeoff and attempt
to overcome this critical difficulty by optimizing the po-
sitions of the DW in the frequency domain. Recently, di-
gital audio distributed over the Internet and cellular phone
systems is often modified by compression, which is one
of the easiest and most effective ways to defeat a DW
without significantly deteriorating the quality of the au-
dio.
In previous research, we simultaneously improved both
the extraction performance of the DW and the quality of
the DW-contained audio by developing a multipurpose
optimization problem for deciding the positions of the
DW in the frequency domain and obtaining a near-opti-
mum solution for the problem using a discrete wavelet
transform (DWT) and a genetic algorithm (GA) for reali-
zing high tolerance to MP3 compression, which is the
most popular compression technique [13,14]. Our method
enabled us to embed the DW in an almost optimal man-
ner within any digital audio. However, obtaining the
near-optimum solution was very time consuming. In the
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
60
present study, to overcome this issue essentially, we have
developed an authentication method for digital audio to
protect the copyrights. In contrast to the DW, no addi-
tional information is inserted into the original audio by
the proposed method, and the digital audio is authenti-
cated using features extracted using the DWT and char-
acteristic coding of the proposed method. This paper
presents an analysis of the performance of the method.
2. Wavelet Transform
The original audio data

0
k
s
, which is used as the level-0
wavelet decomposition coefficient sequence, where k
denotes the element number in the data, are decompo-
sed into the multi-resolution representation (MRR) and
the coarsest approximation by repeatedly applying the
DWT. The wavelet decomposition coefficient sequence

j
k
s
at level j is decomposed into two wavelet de-
composition coefficient sequences at level 1j
by
using (1) and (2):
 
12
j
j
knkn
n
s
ps
(1)
 
12
j
j
knkn
n
wqs
(2)
where k
p and k
q denote the scaling and wavelet se-
quences, respectively, and

1j
k
w denotes the develop-
ment coefficient at level 1j. The development coeffi-
cients at level J are obtained using (1) and (2) itera-
tively from0j to 1jJ. Figure 1 shows the pro-
cess of multi-resolution analysis by DWT.
In the present study, we use the Daubechies wavelet
for the DWT, according to the references [14,15]. As a
result, we obtain the following relation b etween k
p and
k
q:

1
1k
kk
qp
 (3)
We select the Daubechies wavelet because we com-
pared the results by the proposed method with those by
our retorted method [13,14], where the Daubechies wa-
velet was used for the DW.
Figure 1. Multi-resolution analysis by the DWT.
3. Proposed Authentication Algorithm
It is known that the histogram of the wavelet coefficients
of each domain of the MRR sequences has a distribution
which is centered at approximately 0 when the DWT is
performed on audio data, as shown in Figure 2 [13]. In
the present study, the above phenomenon is exploited for
the authentication of the audio signal. The procedure is
described below.
3.1. Selective Coding
3.1.1. Setting of Parameters
For the coding of the audio, we obtain the histogram of
the wavelet coefficients V at the selected level of an
MRR sequence (see Figure 3). Like the DW techniques
for images [15,16] and digital audio [13,14], we set the
following coding parameters:
The values of
Th minus and ()Th plus (see Fig-
ure 3) are chosen such that the non-positive wavelet co-
efficients (m
S in total frequency) are equally divided
into two groups by
Th minus, and th e positive wav elet
coefficients (
p
S in total frequency) are equally divided
into two groups by ()Th plus. Next, the values of 1T,
2T, 3T, and 4T, the parameters for controlling the
authentication precision, are chosen to satisfy the fol-
lowing conditions :
1)

12034TTh minusTTThplusT.
2) The value of 1T
S, the number of wavelet coeffi-
cients in
1,TThminus, is equal to 2T
S, the number
of wavelet coefficients in

,2Th minusT
. In short,
12TT
SS
.
3) The value of 3T
S, the number of wavelet coeffi-
cients in
3,TThplus
, is equal to 4T
S, the number
ofwavelet coefficients in


,4Th plusT. In short,
34TT
SS
.
Figure 2. Histogram of the wavelet coefficients of an MRR
sequence at level 3 (jazz) [13].
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
61
Figure 3. Schematic diagram of the histogram of MRR wa-
velet coefficients.
4) 13TmT p
SS SS.
In the present study, the values of both 1Tm
SS and
1Tp
SS are set to 0.3, which was determined experi-
mentally.
3.1.2. Domain Segmentation in a Wavelet Coefficient
Histogram
In preparation of a coding for authentication, the proce-
dure separates the wavelet coefficients V of an MRR
sequence into five sets (hereinafter referred to as A, B, C,
D, and E), as shown in Figure 4, using the following
criteria:
,1
SC
A
VV VVT ,
,1 2
SC
BVVVTVT ,
,2 3
SC
CVVVT VT ,
,3 4
SC
DVVVTVT ,
,4
SC
EVVVT V ,
where SC
V is the set of wavelet coefficients in the ori-
ginal audio file.
3.1.3. Selective Coding Algorithm
The wavelet coefficients of an MRR sequence are coded
according to the following rules, in which i
V denotes
one of wavelet coefficients:
When i
VC, flag i
f is set to be 1, and bit i
b is
set to be 0.
When

i
VAE, flag i
f is set to be 1, and bit
i
b is set to be 1.
When

i
VBD , flag i
f is set to be 0, and bit
i
b is set to be 0.5.
For the authentication of the digital audio, we use a
Figure 4. Five sets (A, B, C, D, and E) are described by a
histogram of wavelet coefficients V of an MRR sequence for
the assignment of a bit.
code C (hereinafter referred to as an original code),
which is the sequence of i
b defined above. For the co-
ding and authentication, we assign a sequence number
and a flag for each wavelet coefficient. The flag 1
i
f
for a i
V means that the i
V is assigned a bit (i
b=0 or 1)
for a coding. The flag 0
i
f
for a i
V provides that the
i
V is not assigned a bit of 0 or 1: i
b is externally set to
be 0.5 as an arbitrary constant and the value of i
b does
not influence the performance of the proposed method
described in Section 3.2. The exclusion of all i
V be-
longing to the sets B and D, where the magnitude of the
i
V are intermediate, from the objects for coding is a no-
vel feature of the present study.
3.2. Authentication
3.2.1. Setting of Parameters
We authenticate not only an original digital audio file but
also a signal-processed version. Compression, one exam-
ple of signal processing, is often applied to digital audio
for the purposes of distribution via the Internet or for
saving on a computer. Through the same procedure as
described in Section 3.1, we applied the DWT to digital
audio and obtained a histogram of wavelet coeffcients
V
at the same level of the DWT as that of the coding
for the original audio file, which is described in Section
3.1. Then, we set the authentication parameters as fol-
lows:
The values of
Th minus
and

Th plus
(see Fig-
ure 5) are chosen such that the non-positive wavelet co-
efficients (m
S
in total frequency) are equally divided
into two groups by
Th minus
, and the positive wave-
let coefficients (
p
S
in total frequency) are equally di-
vided into two groups by

Th plus
. Next, the values of
1T
, 2T
, 3T
, and 4T
, the parameters for control-
ling the authentication precision, are chosen to satisfy the
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
62
following conditions:
1)
 
12034TTh minusTTThplusT
 

.
2) The value of 1T
S, the number of wavelet coeffi-
cients in


1,TTh minus
 , is equal to 2T
S, the number
of wavelet coefficients in

,2Th minusT

. In short,
12TT
SS

.
3) The value of 3T
S, the number of wavelet coeffici-
ents in

3,TThplus

, is equal to 4T
S, the number of
wavelet coefficients in


,4Thplus T
. In short,
34TT
SS

.
4) 13TmT p
SS SS
 
.
In the present study, the values of both 1Tm
SS

and
3Tp
SS

are set to be 0.3, the same as the settings used
for the coding for the original audio file, which is de-
scribed in Section 3.1.
3.2.2. Domain Segmentation in a Wavelet Coefficient
Histogram
In the preparation of a coding for authentication, the pro-
cedure separates the wavelet coefficients V of an MRR
sequence into three sets (hereinafter referred to as F, G,
and H), as shown in Figure 5, using the following crite-
ria:

,
AC
F
VVVVTh minus

 ,
 
,
AC
GV VVThminusVThplus
 
 ,

,
AC
H
VVVTh plusV
 
 ,
where AC
V is the set of wavelet coefficients of the a
target audio file for making the code for authentication.
Figure 5. Three sets (F, G, and H), indicated on the histo-
gram, of MRR wavelet coefficients used for the authentica-
tion.
3.2.3. Authentication Algorithm
The wavelet coefficients of an MRR sequence are coded
according to the following rules, in which i
V
denotes
one of wavelet coefficients:
When 1
i
f
and i
VG
, bit i
b is set to be 0.
When 1
i
f
and
i
VFH
 , bit i
bis set to be 1.
When 0
i
f
, bit i
b
is set to be 0.5.
When 0
i
f
, i
b
is externally set to be 0.5 as an ar-
bitrary constant and the value of i
b does not influence
the performance of the proposed method described be-
low.
For the authentication of the digital audio, we use the
code
C (hereinafter referred to as an authentication
code), which is the sequence of i
b defined above. The
authentication ratio
A
R (%) is defined by the follow-
ing:

1
1
100 1
N
iii
iN
i
i
f
bb
AR
f

(4)
where N is the number of wavelet coefficients assign-
ed flags in the coding for the original audio file, which is
described in Section 3.1. According to (4), the values of
neither i
b nor i
b
influence the value of
A
R in the
case that 0
i
f
, which provides that the corresponding
i
V
is not assigned a bit of 0 or 1 for the coding for the
original audio file.
For using the proposed method, we need store flags
i
f
and an original code C of each audio file whose
copy right we should protect. In calculating (4) for the
authentication of an original audio file, we do not use an
original audio file but the flags i
f
and the original code
C of the original audio file.
4. Experiment
In this section, we describe computer experiments and
their results for evaluating the performance of the propo-
sed method.
4.1. Method
The experiment was performed in the following compu-
tational environment: the personal computer was a Dell
Dimension 8300 (CPU: Pentium IV 3.2 GHz; main me-
mory: 2.0 GB); the OS was Microsoft Windows XP; the
development language was Microsoft Visual C++ 6.0.
Five music audio files, which were composed of the
first entries in the five genre categories: classical, jazz,
popular, rock, and hiphop in the music database RWC
for research purpose [17], were copied from CDs onto a
personal computer as WAVE files with the following
specifications: 44.1 kHz, 16 bits, and monaural. For each
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
63
music audio file selected from the database, one 10-se-
cond clip of music audio (hereinafter referred to as an
original music audio clip) was extracted starting at 1 mi-
nute from the beginning of the audio file and saved on a
personal computer. In addition, for each of the five music
audio files mentioned above, several 10-second audio
clips were extracted by shifting the start-time 1 second at
a time from the beginning of the audio file and were
saved on a personal computer for use in evaluating the
performance of the proposed method. For the purpose of
evaluating the tolerance of authentication to compression,
MP3, AAC, and WMA compression systems were each
used to compress the original music audio clip to bit
rates of 64, 96, and 128 kbp s. The process of the experi-
ment was as follows: obtain the code of the original
WAVE file, compress the file by MP3, AAC, and WMA,
and then convert the compressed files into the WAVE
files used for the authentication.
For the DWT, we use Daubechies wavelets. Level 8
was chosen as the standard for the DWT based on an
analysis of preliminary experiments, and this level was
used for most of the experiments. The influence of the
level of the DWT on the authentication ratio was also
analyzed as part of the experiments.
Instead of the Dell Dimension 8300 (CPU: Pentium IV
3.2 GHz; main memory: 2.0 GB), Dell Dimension
DXC051 (CPU: Pentium IV 3.0 GHz, memory: 1.0 GB)
is used only for the comparison with the reported study
[13,14].
4.2. Results and Discussion
4.2.1. Au thenticatio n Pr ocess
Table 1 illustrates the process of authentication of audio
clips. The jumps in the wavelet coefficient number, such
as from 573 to 578, indicate that the intervening wavelet
coefficients belong to either the set B or D, which are out
of assignment of a bit to 0 or 1 for the coding for the ori-
ginal music audio clip. The authentication ratio
A
R de-
fined by (4) was

6 7100 in the case of Table 1,
where the bits of the music audio clip after MP3 com-
pression were equal to those of the original music except
for wavelet coefficient number 57 9.
4.2.2. Robustness to Comp ression
Whenever we applied the proposed method to the five
original music audio clips, the authentication ratio was
100%. When we applied it to several music clips com-
pressed by MP3, AAC, and WMA, the authentication
ratio was at or near 100% (Table 2).
4.2.3. Authenticatio n Ra ti os for Other
Non-Signal-Processed Music Audio Clips
The purpose of authentication is to protect the copyright
Table 1. Authentication process (hiphop).
Wavelet
coeffi-
cients No.Original After MP3
compression
(128kbps)
Bit
correspondence
set bit set bit
572 C 0 G 0 Yes
579 C 0 F 1 No
580 C 0 G 0 Yes
584 C 0 G 0 Yes
588 E 1 H 1 Yes
589 C 0 G 0 Yes
590 C 0 G 0 Yes
Table 2. Authentication tolerance to compression (%).
Compression
MethodBit rate
(kbps) ClassicalJazz Popular RockHiphop
128 99.86 99.86 99.86 100 99.57
96 99.86 100 99.86 100 99.86
MP3
64 99.72 99.72 99.86 100 98.28
128 100 100 100 100 97.55
96 99.86 100 99.86 100 100
AAC
64 100 100 99.14 99.7299.00
128 100 100 100 100 100
96 100 100 100 100 100
WMA
64 100 99.86 100 100 100
(%)
on audio data. When the music audio file targeted for
being authenticated was different from that used for ma-
king the code of the original music audio clip, the au-
thentication ratios
A
R defined by (4) were about 50%
(more precisely, they fell in the range 44.09 to 55.62%),
which was about half of the authentications ratios when
authenticating the same clip as the original music audio
clip (100% in all cases in this experiment; see Table 3).
An authentication ratio of 50% corresponds to the value
in the case that randomly generated bits are used for i
b
and/or i
b
in (4). Accor ding ly, th e pr oposed meth od dis-
tinguishes an original music audio clip from each of the
other four used in this experiment.
Using the original code obtained from the original mu-
sic audio clip, which was the 10-second clip extracted
staring at 1 minute from the beginning of each of the five
music audio files, we calculated the authentication ratio
to the 10-second clips extracted by shifting the start-time
for the clip 1 second at a time. For each of the original
music audio clips, the authentication ratio was 100%
when an original code was used as the authentication
code (Fi g u re 6 ). In Figure 6, the point 60 seconds on the
horizontal axis corresponds to the case that the original
code is used as the authentication code. Not including
these cases, the authentication ratio for jazz, popular, and
rock music audio fell mostly in the 40 to 60% range. In
contrast, the authentication ratio for classical and hiphop
varied according to the start time. Not including the case
of using the original code for the authentication code, the
highest authentication ratio was 93.95%, which was ob-
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
64
Table 3. Authentication ratio (%) in all combinations of
original and authentication.
Original
Clas-
sical Jazz Popular RockHi-
phop
Classical 100 44.2247.98 52.8953.61
Jazz 55.62 100 49.86 51.0146.54
Popular 44.09 53.89100 49.5752.88
Rock 47.69 50.8649.42 100 48.13
Authenti-
cation
Hiphop 48.41 50.4347.55 51.44100
()
Figure 6. Authentication ratios using clips shifted 1 second
at a time for each of the five selected audio clips.
served for hiphop. Accordingly, the threshold of the au-
thentication ratio for judging authentication of an origi-
nal music audio clip should be about 95%. As the au-
thentication ratios to music clips extracted from the mu-
sic audio files, from which the original music audio clip
were obtained, stayed under 95% (again, excluding the
cases of using the identical clip), we conclude that the
probability of getting an authentication ratio above 95%
would be small if we applied the proposed method to
other music selected from the database. In other words,
we propose that music audio be judged as authenticated
when the file gives an authentication ratio of 95% or
higher for a certain clip taken from a music audio file.
When we used 95% as a threshold for the authentication
ratio, both the false negative and positive rates for the
authentication of the music audio clip were zero in the
both cases shown in Table 3 and Figure 6.
4.2.4. Influence of DWT Level on Authentication Ratio
All authentication ratios described above were obtained
using a DWT at level 8. The tolerances of the authentica-
tion ratio to signal processing by MP3, AAC, and WMA
at DWT levels of 2 to 8 with bit rates of 128, 96, and 64
kbps are shown for each bit rate in Tables 4-6, respecti-
vely. The authentication ratio does not noticeably change
at bit rates of from 64 to 128 kbps. The authentication
ratio tends to be slightly higher with increases in the
DWT level of the original coding, which is the same as
that of the authentication coding. For DWT levels of 7 or
8, the authentication ratio exceeds 95% for all settings of
MP3, AAC, and WMA compression tested. The lowest
authentication ratio, 94.57%, occurred for DWT level 6
applied to the hiphop audio clip compressed by AAC
with a bit rate of 128 kbps. The number of data of the
original music audio clip, which is treated as the amount
of data at DWT level 0, was 441,000. The number of
wavelet coefficients of MRR sequences was reduced by
half for an increase of DWT level by one, meaning that
the number of 0 or 1 bits in both the original and the au-
thentication coding was also reduced by half.
4.2.5. Comparison with Watermarking
There is generally a tradeoff between 1) the tolerance of
the DW to signal processing, such as compression, and 2)
the quality of the music audio after embedding the DW.
In other words, to improve the first property tends to cau-
se a deterioration of the second property. We had over-
come this critical difficulty of the DW by optimizing the
positions of the DW in the frequency domain [13,14],
[18-20]. However, it took much time to get the condition
for embedding the DW by the reported method.
Figure 7 shows the relationship between the quality of
music audio and the detection rate of the DW after MP3
compression, using the jazz clip as the original music au-
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
65
Table 4. Authentication ratio (%) of music audio com-
pressed by MP3, AAC, and WMA at DWT levels of 2 to 8
with a bit rate of 128 kbps.
(1) Classical
Signal processing
DWT level MP3 AAC WMA
2 99.07 99.41 99.99
3 99.67 99.69 100
4 99.99 99.9 100
5 99.98 99.98 100
6 100 100 100
7 100 100 100
8 99.86 100 100
(2) Jazz
Signal processing
DWT level MP3 AAC WMA
2 98.63 99.33 99.95
3 99.64 99.8 100
4 99.97 99.99 100
5 100 100 99.98
6 100 100 100
7 100 100 99.93
8 99.86 100 100
(3) Popular
Signal processing
DWT level MP3 AAC WMA
2 92.78 95.08 99.95
3 95.05 98.5 100
4 97.37 99.7 99.99
5 98.53 99.98 100
6 99.71 100 100
7 100 100 100
8 99.86 100 100
(4) Rock
Signal processing
DWT level MP3 AAC WMA
2 94.2 95.88 99.81
3 96.72 98.79 99.98
4 98.89 99.85 100
5 99.64 99.98 100
6 100 100 99.96
7 100 100 100
8 100 100 100
(5) Hiphop
Signal processing
DWT level MP3 AAC WMA
2 95.65 61.31 99.6
3 96.27 72.89 99.69
4 96.85 84.3 99.69
5 98.19 91.64 99.89
6 99.67 94.57 100
7 99.93 96.67 100
8 99.57 97.55 100
(%)
Table 5. Authentication ratio (%) of music audio com-
pressed by MP3, AAC, and WMA at DWT levels of 2 to 8
with a bit rate of 96 kbps.
(1) Classical
Signal processing
DWT level MP3 AAC WMA
2 95.95 98.76 99.96
3 96.92 99.08 99.99
4 97.89 99.48 100
5 98.53 99.64 100
6 99.53 99.78 100
7 99.71 100 100
8 99.71 99.86 100
(2) Jazz
Signal processing
DWT level MP3 AAC WMA
2 96.48 98.62 99.97
3 98.42 99.39 100
4 99.79 99.95 100
5 100 99.98 99.98
6 100 100 100
7 100 100 99.93
8 100 99.86 100
(3) Popular
Signal processing
DWT level MP3 AAC WMA
2 85.61 89.7 99.62
3 94.74 94.7 99.9
4 97.43 98.4 99.85
5 98.79 99.42 99.98
6 99.89 100 100
7 99.86 100 100
8 99.86 100 100
(4) Rock
Signal processing
DWT level MP3 AAC WMA
2 89.79 92.56 99.33
3 95.37 96.3 99.78
4 98.58 98.94 99.98
5 99.69 99.8 100
6 99.96 99.89 99.96
7 100 100 100
8 100 100 100
(5) Hiphop
Signal processing
DWT level MP3 AAC WMA
2 92.76 94.86 99.46
3 94.13 96.16 99.54
4 95.38 97.43 99.6
5 97.01 98.84 99.84
6 98.91 99.67 100
7 99.57 99.93 100
8 99.86 100 100
(%)
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
66
Table 6. Authentication ratio (%) of music audio com-
pressed by MP3, AAC, and WMA at DWT levels of 2 to 8
with a bit rate of 64 kbps.
(1) Classical
Signal processing
DWT level MP3 AAC WMA
2 98.18 98.08 99.88
3 98.87 98.51 99.96
4 99.54 99 99.98
5 99.91 98.73 100
6 99.96 99.13 100
7 99.93 99.42 100
8 99.86 100 100
(2) Jazz
Signal processing
DWT level MP3 AAC WMA
2 92.41 97.23 99.85
3 95.45 98.45 100
4 98.39 99.54 100
5 99.93 99.95 99.98
6 100 99.89 100
7 100 99.86 99.93
8 99.71 100 100
(3) Popular
Signal processing
DWT level MP3 AAC WMA
2 73.87 81.86 96.73
3 87.94 90.43 99.44
4 95.21 96.02 99.42
5 98.26 97.77 99.84
6 99.78 98.8 100
7 100 99.57 100
8 99.86 99.14 100
(4) Rock
Signal processing
DWT level MP3 AAC WMA
2 82.67 88.43 96.12
3 91.55 93.67 99.14
4 97.09 97.56 99.69
5 99.4 98.91 99.95
6 99.93 99.17 99.96
7 100 99.35 100
8 100 99.71 100
(5) Hiphop
Signal processing
DWT level MP3 AAC WMA
2 82.13 92.95 97.86
3 86 94.35 98.85
4 89.31 95.82 98.78
5 92.62 96.7 99.31
6 97.43 97.9 99.86
7 98.48 98.26 99.86
8 98.27 98.99 99.86
(%)
Figure 7. Relationship between sound quality after embed-
ding the DW and detection rate of the DW [13].
dio clip and 96-kbps MP3 compression [13]. The same
original music audio clip was also used in the present ex-
periment. In order to achieve a high detection rate of the
DW and high quality of the original music audio clip af-
ter embedding the DW, the reported method using a ge-
netic algorithm was effective, as shown in Figure 7. In
the present study, the authentication ratio for the same
original music audio clip as that used for getting the re-
sults of Figure 7 was 100%, and a deterioration in the
quality of the original music audio clip did not occur,
which corresponds to an infinite value on the horizontal
axis shown in Figure 7.
Moreover, it took 2.41 104 to 3.20 104 s and 1.59
102 to 1.85 103 s (with the personal computer referred
to as PC2), respectively, to embed the DW using as the
formula of the optimization problem the original problem
and the partial problem (which had a much smaller sear-
ch space) [14], while it took 2.05 101 to 2.10 101 s
(with the personal computer referred to as PC2) and 2.03
101 to 2.19 101 s (with the personal computer re-
ferred to as PC1) for one coding for an original music
audio clip in the present study (Tab le 7). In th e reported
study [13,14], the experiment was performed in the fol-
lowing computation environment: the personal computer
was a Dell Dimension DXC051 (CPU: Pentium IV 3.0
GHz; main memory: 1.0 GB), which is referred to as
PC2 in Table 7; the OS was Microsoft Windows XP; the
development language was Microsoft Visual C++ 6.0.
The average time for one coding for an original music
audio clip was less than 105 times that to embed the DW
using as the formula of the optimization problem the
original problem, and less than 103 times that to embed
the DW using as the formula of the optimization problem
the partial problem in the reported study. In addition, no
deterioration in quality of the original music audio clip
ever occurred using the proposed method. These two
factors strongly suggest that the proposed method is far
superior to th e re ported method.
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
67
Table 7. Comparison of time(s) to obtain a coding of the
proposed method or to embed the DW using as the formula
of the optimization problem the original problem and the
partial problem of the reported study [14].
DW
Coding Original
problem Partial
problem
PC1 PC2 PC2 PC2
Classical 2.19 × 101 2.05 × 101 3.20 × 104 1.04 × 103
Jazz 2.03 × 1 0 1 2.06 × 101 2.52 × 104 5.12 × 102
Popular 2.19 × 101 2.10 × 101 2.41 × 104 1.85 × 103
Rock 2.19 × 101 2.08 × 101 2.44 × 104 2.02 × 1 02
HipHop 2.19 × 101 2.08 × 101 2.78 × 104 1.59 × 102
Average 2.16 × 101 2.07 × 101 2.67 × 104 7.5 3 × 102
PC1: Dell Dimension 8300 (CPU: Pentium IV 3.2 GHz; main memory: 2.0
GB); PC2: Dell Dimension DXC051 (CPU: Pentium IV 3.0GHz; main
memory: 1.0 GB).
5. Conclusions
We have developed an authentication method for music
audio using a DWT. When we applied this method to five
original music audio clips, the authentication ratio was
100%. Moreover, for music audio data compressed by
MP3, AAC, or WMA, the authentication ratio was always
at or near 100%. We used flags for distinguishing the
wavelet coefficients used for storing a 0 or 1 bit of the
original and authentication coding from other coefficien-
ts. The method never deteriorated the quality of the orig-
inal music audio because it does not change it. When a
level 8 DWT was used, which was the standard in this
experiment, the mean time for the coding for the original
music audio clip was 1
2.16 10
s and that for the au-
thentication was 1
2.22 10
s for a 10-second original
music audio clip. We propose that a music audio file
should be judged to be authenticated when the file gives
a 95% or higher authentication ratio for a certain clip
taken from the music audio file.
For using the proposed method, we need to store in a
data base 1) flags used for selective coding, and 2) an
original code of each audio file whose copy right we
should protect. In calculating the authentication ratio for
the authentication of an original audio file, we do not
need an original audio file but 1) the flags, and 2) the
original code of the original audio file.
6. References
[1] D. Kirovski and H. S. Malvar, “Spread-Spectrum Water-
marking of Audio Signals,” IEEE Transactions on Signal
Processing, Vol. 51, No. 4, 2003, pp. 1020-1033.
doi:10.1109/TSP.2003.809384
[2] K. Yeo and H. J. Kim, “Modified Patchwork Algorithm:
A Novel Audio Watermarking Scheme,” IEEE Transac-
tions on Speech and Audio Processing, Vol. 11, No. 4,
2003, pp. 381-386.
[3] S. Wu, J. Huang, D. Huang and Y. Q. Shi, “Efficiently
Self-synchronized Audio Watermarking for Assured Au-
dio Data Transmission,” IEEE Transactions on Broad-
casting, Vol. 51, No. 1, 2005, pp. 69-76.
doi:10.1109/TBC.2004.838265
[4] X. Y. Wang and H. Zhao, “A Novel Synchronization
Invariant Audio Watermarking Scheme Based on DWT
and DCT,” IEEE Transactions on Signal Processing, Vol.
54, No. 12, 2006, pp. 4835-4840.
doi:10.1109/TSP.2006.881258
[5] S. Xiang and J. Huang, “Histogram-based Audio Water-
marking against Time-Scale Modification and Cropping
Attacks,” IEEE Transactions on Multimedia, Vol. 9, No.
7, November 2007, pp. 1357-1372.
doi:10.1109/TMM.2007.906580
[6] S. Kirbiz, A. N. Lemma, M. U. Celik and S. Katzenbeis-
ser, “Decode-Time Forensic Watermarking of AAC Bit-
streams,” IEEE Transactions on Information Forensics
and Security, Vol. 2, No. 4, 2007, pp. 683-696.
doi:10.1109/TIFS.2007.908194
[7] D. J. Coumou and G. Sharma, “Insertion, Deletion Codes
with Feature-Based Embedding: A New Paradigm for
Watermark Synchronization with Applications to Speech
Watermarking,” IEEE Transactions on Information Fo-
rensics and Se curity, Vol. 3, No. 2 , 2008, pp. 153-165.
doi:10.1109/TIFS.2008.920728
[8] S. Xianga, H. J. Kimb and J. Huanga, “Audio Water-
marking Robust against Time-Scale Modification and
MP3 Compression,” Signal Processing, Vol. 88, No. 10,
2008, pp. 2372-2387.
doi:10.1016/j.sigpro.2008.03.019
[9] X. Y. Wang, P. P. Niu and H. Y. Yang, “A Robust, Digi-
tal-audio Watermarking Method,” IEEE Multimedia, Vol.
16, No. 3, 2009, pp. 60- 69.
doi:10.1109/MMUL.2009.44
[10] N. K. Kalantari, M. A. Akhaee, S. M. Ahadi and H.
Amindavar, “Robust Multiplicative Patchwork Method
for Audio Watermarking,” IEEE Transactions on Audio,
Speech and Language Processing, Vol. 17, No. 6, 2009,
pp. 1133-1141.
[11] X. Y. Wanga, P. P. Niub and H. Y. Yangb, “A Robust
Digital Audio Watermarking Based on Statistics Charac-
teristics,” Pattern Recognition, Vol. 42, No. 11, 2009, pp.
3057-3064.
doi:10.1016/j.patcog.2009.01.015
[12] K. Yamamoto and M. Iwakiri, “Real-Time Audio Wa-
termarking Based on Characteristics of PCM in Digital
Instrument,” Journal of Information Hiding and Multime-
dia Signal Processing, Vol. 1, No. 2, 2010, pp . 59-71.
[13] S. Murata, Y. Yoshitomi and H. Ishii, “Optimization of
Embedding Position in an Audio Watermarking Method
Using Wavelet Transform,” Autumn Research Presenta-
tion Forums of ORSJ, Japanese, October 2007, pp.
210-211.
[14] S. Murata, “Optimization of Embedding Position in an
Audio Watermarking Method Using Wavelet Trans-
form,” Master’s Thesis, Osaka University, Suita, Japa-
nese, 2006, pp. 53.
Y. YOSHITOMI ET AL.
Copyright © 2011 SciRes. JIS
68
[15] D. Inoue and Y. Yoshitomi, “Watermarking Using
Wavelet Transform and Genetic Algorithm for Realizing
High Tolerance to Image Compression,” Journal of the
IIEEJ, Vol. 38, No. 2, March 2009, pp. 136-144.
[16] M. Shino, Y. Choi and K. Aizawa, “Wavelet Domain
Digital Watermarking Based on Threshold-Variable De-
cision,” Technical Report of IEICE, DSP2000-86, Japa-
nese, Vol. 100, No. 325, September 2000, pp. 29-34.
[17] M. Goto, H. Hashiguchi, T. Nishimura and R. Oka,
“RWC Music Database: Database of Copyright-Cleared
Musical Pieces and Instrument Sounds for Research Pur-
poses,” Transactions of IPSJ, Japanese, Vol. 45, No. 3,
March 2004, pp. 728-738.
[18] M. Tanaka and Y. Yoshitomi, “Optimization Problem for
Embedding Position in an Audio Watermarking Based on
Logarithmic Amplitude Modification for Realizing High
Tolerance to MP3 Compression,” Autumn Research
Presentation Forums of ORSJ, Japanese, September 2006,
pp. 70-71.
[19] M. Tanaka and Y. Yoshitomi, “Digital Audio Water-
marking Method with MP3 Tolerance Using Genetic Al-
gorithm,” Proceedings of 11th Czech-Japan Seminar on
Data Analysis and Decision Making Under Uncertainty,
Sendai, September 2008, pp. 81-85.
[20] R. Tachibana, “Capacity Analysis of Audio Watermark-
ing Based on Logarithmic Amplitude Modification
against Additive Noise,” IEICE Transactions on Funda-
mentals of Electronics, Communications and Computer
Sciences, Japanese, Vol. J86-A, No. 11, November 2003,
pp. 1197-1206.