Audio Watermarking Using Wavelet Transform and Genetic Algorithm for Realizing High Tolerance to MP3 Compression

doi:10.4236/jis.2011.23010

Paper Menu >>

Journal Menu >>

Journal of Information Security, 2011, 2, 99-112

doi:10.4236/jis.2011.23010 Published Online July 2011 (http://www.SciRP.org/journal/jis)

Audio Watermarking Using Wavelet Tr ansform and

Genetic Algorithm for Realizing High Tolerance to MP3

Compression

Shinichi Murata1, Yasunari Yoshitomi2, Hiroaki Ishii3

1Panasonic Corporation, Kadoma, Osaka, Japan

2Graduate Sc h ool of Lif e an d Envi ronmental Scien ces, Kyoto Prefectural University, Kyoto, Japan

3School of Science and Techn ol o gy , Kwansei Gakuin University, Sanda, Hyogo, Japan

E-mail: yoshitomi@kpu.ac.jp

Received April 12, 2011; revised June 8, 2011; accept ed J u ne 20, 2011

Abstract

Recently, several digital watermarking techniques have been proposed for hiding data in the frequency do-

main of audio signals to protect the copyrights. However, little attention has been given to the optimal posi-

tion in the frequency domain for embedding watermarks. In general, there is a tradeoff between the quality of

the watermarked audio and the tolerance of watermarks to signal processing methods, such as compression.

In the present study, a watermarking method developed for a visual image by using a wavelet transform was

applied to an audio clip. We also improved the performance of both the quality of the watermarked audio and

the extraction of watermarks after compression by the MP3 technique. To accomplish this, we created a mul-

tipurpose optimization problem for deciding the positions of watermarks in the frequency domain and ob-

taining a near-optimum solution. The near-optimum solution is obtained by using a genetic algorithm. The

experimental results show that the proposed method generates watermarked audios of good quality and high

tolerance to MP3 compression. In addition, the security was improved by using the characteristic secret key

to embed and extract the watermark information.

Keywords: Audio Watermarking, Genetic Algorithm, Optimization, Wavelet Transforms, Secret Key

1. Introduction

Recent progress in digital media and digital distribution

systems, such as the Internet and cellular phones, has

enabled us to easily access, copy, and modify digital con-

tent, such as electric documents, images, audio, and

video. Under these circumstances, techniques to protect

the copyrights of digital data and to prevent unauthorized

duplication or tampering of these data are strongly de-

sired.

Digital watermarking (DW) is a promising method for

the copyright protection of digital data. Several studies

have investigated audio DW [1-12]. Currently, digital

audio clips distributed ov er the Internet or cellular phone

systems are often modified by compression, which is one

of the easiest and most effective ways to overcome DW

without significantly deteriorating the quality of the au-

dio. Two important properties in audio DW are the in-

audibility of the distortion due to DW, and the robustness

against signal processing methods, such as compression.

In addition to these properties, the data rate and the com-

plexity of DW have attracted attention when discussing

the performance of DW.

We developed a method in which 1) a digital water-

mark can be sufficiently extracted from watermarked

audio, even after compression, and 2) the quality of the

audio remains high after embedding the digital water-

mark. However, there generally is a trade-off relation

between these two properties.

In the present study, we improved both the extraction

of digital watermarks and the quality of the watermarked

audio by developing a multipurpose optimization prob-

lem for deciding the positions of digital watermarks in

the frequency domain and obtaining a near-optimum

solution by using a discrete wavelet transform (DWT)

and a genetic algorithm (GA) [13,14] for realizing high

tolerance to compression by MP3, which is the most

popular compression technique. The proposed method

100 S. MURATA ET AL.

enables us to embed digital watermarks in a near-opti-

mum manner for each audio file. In addition, the security

of the watermarked audio is improved by using a char-

acteristic secret key to embed and extract digital water-

marks.

2. Wavelet Transform

Original audio data (0)

is used as the level-0 wavelet

decomposition coefficient sequence, where denotes

the element number in the data. The data is decomposed

into the multi-resolution representation (MRR) and the

coarsest approximation by repeatedly applying a DWT.

The wavelet decomposition coefficient sequence

()

level is decomposed into two wavelet decomposition

coefficient sequences at level by (1) and (2):

j1j

(1) ()

knk





, (1)

(1) ()

knk





n

s, (2)

where 2nk

and 2nk

denote the scaling and wavelet

sequences, respectively, and denotes the devel-

opment coefficient at level

pq(1)j

w



. The development co-

efficients at level

are obtained by using (1) and (2)

iteratively from to

0j1jJ



. Figure 1 shows the

process of a multi-resolution analysis by DWT.

The signal is re-composed by using (3) repeatedly

from to .

1jJ 0j

(1)() ()

nnkknk

spsqw













j

(3)

In the present study, we use the Daubechies wavelet

for DWT. As a result, we obtain the following relation

between and :

2nk

p2nk

q



q － (4)

3. Wavelet Domain Digital Watermarking

Based on Threshold-Variable Decision

It is known that the histogram of the wavelet coefficients

of each domain of MRR sequences has a distribution that

Figure 1. Multi-resolution analysis by the DWT.

is centered at approximately 0 when DWT is performed

on a natural visual image [15]. For an audio clip, we also

found the same phenomena. Figure 2 shows an example

of an audio histogram.

In the present research, the technique [15] for exploit-

ing the above phenomena on a natural image for embed-

ding a digital watermark on the wavelet coefficients of

MRR sequences is applied to audio DW. The procedure

is described below.

3.1. Embedment of Watermark Information

3.1.1. Setti ng of Parameter s

For the watermarking of an audio clip, we obtain the

histogram of the wavelet coefficients at the selected

level of MRR sequences. Figure 3 shows a schematic

diagram of the histogram of the wavelet coefficients

of an MRR sequence. As with the DW techniques for

images [15,16], we set the following watermarking pa-

rameters:

The values of and (see Figure

3) are chosen such that the non-positive wavelet coeffi-

cients (m in total frequency) are equally divided into

two groups by , and the positive wavelet

coefficients (

(minus)Th

(plus)Th

S in total frequency) are equally divided

into two groups by . Next, the values of ,

, , and , which are the parameters for con-

trolling the embedment strength, are cho sen to satisfy the

following conditions :

(plus)Th

41T

2T3TT

1) 1 (minus)203(plus)4TThT TThT





2) The value of , the number of wavelet coeffi-

cients in , is equal to , the number

of wavelet coefficients in [( . In short,

minus(1, ())TTh 2T

us),Tmin2)Th

12TT



3) The value of , the number of wavelet coeffi-

cients in , is equal to , the number of

lus)]

(3,(pTTh 4T

Figure 2. Histogram of the wavelet coefficients of an MRR

sequence at level 3 (jazz).

S. MURATA ET AL.

101

Figure 3. Schematic diagram of the histogram of MRR

wavelet coefficients.

wavelet coefficients in (. In short, (plus), 4)Th T3T



13TmT p

SS SS.

In the present study, the values of both 1Tm

SS and

3Tp

SS are set to 0.2, which was determined experi-

mentally.

3.1.2. Embedment of Watermark Information

The wavelet coefficients of MRR are rewritten according

to the following rules in embedding digital watermark.

Here, deno tes one of the wavelet coefficients.

1) In the case that bit in watermark W is 0,

when , is changed to .

VT3VTi

V2T3Twhen , is changed to .

i2TV

Twhen , is kept.

2) In the case that bit in watermark W is 1,

when , is changed to .

TV

0VT i

4V1T4.Twhen , is changed to

i1VTi

Vwhen or , is kept.

i i

The wavelet coefficient i

V is set in the range of

when bit i in watermark W is 0,

whereas the DWT coefficient is set in the range of

or V when bit in watermark W is 1.

The frequency of the change of i toward the inside is

expected to be approximately equal to the change toward

the outside when the number of 0 bits is approximately

the same as the number of 1 bits.

iTV

TVT

VTi

4T

3.1.3. Generation of Water m arked Audio

The inverse DWT (IDWT) is performed to wavelet coef-

ficients embedded with the watermark to obtain the

audio with the watermark.

V

3.2. Detection of the Watermark

3.2.1. Presumption of Parameters

The watermarked audio, which may be modified by sig-

nal processing methods, such as MP3 compression, is

converted into wavelet coefficients. The wavelet coeffi-

cient in the region where the watermark information is

embedded is denoted as V



.

For the histogram of V



, the two parameters (minus)Th



and (plus)Th



, which correspond to Th and

for the histogram of V before embedding

the watermark, are obtained in the same manner as that

for and , mentioned in Section

3.1.1.

(minus)

Th lus)

(minus)

Th (minuTh(plus)Th



and Th (plus)



can be used as pre-

sumptive values for and Th , respec-

tively, because the distribution of the histogram of

(mTh inus)(plus) V



is expected to be approximately the same as that of

before embedding the watermark.

VThe watermarked audio can undergo certain types of

audio processing, including compression, such that the

difference between the distribution of the histogram of



 after audio processing and that of V before embed-

ding the watermark is not negligible. In such a case, it

may not be persuasive that and

(minus)Th(plus)Th



can be used as presumptive values for and

, respectively. (mTh inus)

Th lus)

3.2.2. Detection of Watermark Information

When the wavelet coefficient i

V is in the range of



(minus) (plus)

ThV Th



 



, the corresponding bit i



in measured watermark is judged to be 0. When

the DWT coefficient is in the range of



i

(minus)

VTh







or i(plus)VTh







W, the corresponding

bit



 in the measured watermark is judged to

be 1.



The detection rate (%) is defined as the per-

centage of correspondence between the bit i in wa-

termark W and the corresponding bit in measured

watermark

()dxW

W



4. Use of the Secret Key

When we embed a digital watermark by using the partial

problem described in Section 5, the watermark is pro-

duced by using a secret key ()S



, which is composed of

a row of



integers randomly selected once or less per

integer in the integer range from 1 to , as shown in

the example below N

(400)





, where



is the number

of bits of the watermark and is the total number of

wavelet coefficients that are candidates embedded with

DW.

102 S. MURATA ET AL.

(400)(271, 72, 39, 990, 524, 88, , 1011, 688, 312)S

(5)

Each value and order of numbers in ()S



indicate

the position of each bit of the digital watermark in the

DW region. Here, the position in the region is expressed

as a one-dimensional coordinate. For example, the first

number, 271, and the second number, 72, in

mean that the first and second bits of the watermark are

set for the wavelet coefficients at the coordinates of 271

and 72 in the DW region, respectively.

(400)S

Figure 4. The DW positions of wavelet coefficients decided

by secret key. S(4) = (271, 72, 39, 990) in the case of DWT

level 3.

As described in Section 3.1.2, the selected wavelet co -

efficient in the target region is changed according to the

value of each bit of the digital watermark, the secret key,

and the shift value of the coordinate, which is de-

scribed in Section 5. The value of each bit of the digital

watermark and the secret key decide an initial bit pattern

in the positions in th e DW region.

The DW positions of the wavelet coefficients decided

by secret key presented below in (6) are simply

demonstrated in Figure 4.

(4)S

Figure 5. The DW positions of wavelet coefficients decided by

secret key. S'(4) = {281, 82, 49, 1000} obtained from S(4) =

(271, 72, 39, 990) by using the shift value k = 10 in the case of

DWT level 3.

(4)(271, 72, 39, 990)S (6)

The coordinate shift is performed by generating ()S





such that the shift value is added to all values of the ele-

ments in ()S



. For example, it is assumed that the shift

value is 10. As a result,



, (12)

( 4){281, 82, 49, 1000}S (7)





,,,

xxx, (13)

The DW positions of the wavelet coefficients decided

by secret key described in (7) are simply demon-

strated in Figure 5.

(4)S{0, 1}



, (14)

where ,



denote the values of the -th sound data

before and after embedding the digital watermark, re-

spectively; denotes the total number of wavelet co-

efficients at the DWT level selected for embedment;

are constants; and i

,aR

is a 0 - 1 variable that de-

cides the embedment of the watermark on the corre-

sponding wavelet coefficient, where 1 denotes an em-

bedment and 0 denotes a non-embedment.

5. Optimal Watermarking Problem

Because our approach of DW optimization is on the first

and challenging stage, we formulate the problem in a

simple way on the viewpoint of optimization problem.

Therefore, in the present study, we formulate the opti-

mization problem as minimization for distortion. We can

also formulate the problem using the constraint of keep-

ing distortion less than the masking threshold. Such more

elaborate approach is our next target.

When the number of wav elet coefficients that are pos-

sible targets for digital watermark embedment becomes

larger, the solution space of becomes larger, with the

result that a search for an optimal or near-optimal solu-

tion is time-consuming or difficult. Accordingly, we de-

fine the partial problem as follows:

To minimize the error caused by watermarking

and to maximize the detection rate (%) of digital

watermark after compression by the MP3 technique un-

der a restriction condition on , an optimum water-

marking problem is formulated as follows:

e( )xd( )x

d( )xM inimize e()Ps



　, (15)

Maximize d()

, (16)

Minimize e()Px, (8) Subject to d()

a　　, (17)

Maximize d()x, (9)



e( )N









, (18)

Subject to d()a　x　, (10)



e( )N











, (11) is i





, (19)

S. MURATA ET AL.

103





, (20)



,,,

cc cc, (21)



,,,



xxx, (22)

{0, 1}

c, (23)

{0, 1}

x, (24)

where

is an integer variable ranging from 0 to 1N



;

i is a 0 - 1 constant that decides the digital watermark

embedment on the corresponding wavelet coefficient,

where 1 denotes an embedment and 0 denotes a non-

embedment, and i

are the same as

those described for . We prepare a random initial pa t-

tern for

, , , , ,

yyNaRx





,,,

cccP

d()

c to solve .



For getting the detection rate , we use the wa-

termark, the wavelet transformation level, and the

(near-)optimum solution for or as input data to

the decoder for the proposed method.

PP

In the present study, we use DWT. However other op-

timal watermarking problems can be formulated using

other transforms as shown in our previous study [17,18]

where discrete Fourier transform and a watermarking

method prop osed in the reported study [19] were used.

6. GA Approach

GA is one of the most acknowledged methods for near-

optimization. We presume that and might have

many locally optimum solutions. According to our ex-

periences, GA was fairly effective for searching the ac-

ceptable near-optimum solutions for many discrete opti-

mization problems even when many locally optimum

solutions could exist. We use GA in the present study

because we have much more successful experiences on

GA application to optimization problems than those on

other methods. Other acknowledged techniques such as

tabu search [20] for solving discrete optimization prob-

lems could be ot her options f or sol vi n g and P

PP



The GA, which is based on biological evolution, has

been applied for solving the optimization problem. The

solution of the optimization problem is expressed as a

genotype. In each generation, there is a population com-

posed of several individuals identified by their genotypes.

The basic idea of GA is that if the number of better indi-

viduals is increased by generation updating, an optimum

or approximately optimum solution, as expressed by an

individual, will eventually be obtained. Chromosome

composed of genes is string specifying an individual. For

the generation updating, crossover and mutation are per-

formed. The crossover takes two parent strings and gen-

erates two offspring strings. The mutation changes se-

lected strings in a random way. In the references [13,14],

the GA is explained in detail.

In this section, we explain our approach for obtaining

near-optimum solutions for and by using a GA.

PP

6.1. Coding

The GA coding for Experiment 1, described below, is

performed as follows.

A gene is expressed by a bit of value 0 or 1. Accord-

ingly, each chromosome is composed of a row of bits.

The total number of bits is , in which bits of the

higher ranks are associated with the level of DWT and

m n

(mn)



subordinate bits are associated with the shift

value

, described for in the binary expression

(Figure 6). P

When an individual associated with a level that actu-

ally does not exist in a list of levels used for DWT is

generated in the GA process, the individual is judged to

have a fatal gene and is deleted, and a new individual is

generated.

The GA coding for Experiment 2, described below, is

performed as follows.

A gene is expressed by a bit of value 0 or 1 for .

Accordingly, each chromosome is composed of a row of

bits. The total number of bits is . Each bit is assigned

to each DWT coefficient for possible embedment of the

watermark (Figure 7). The value of 1 for a bit means

that the corresponding DWT coefficient is selected as an

object of watermarking, whereas 0 denotes non-embed-

ment for the corresponding DWT coefficient.

A gene is also expressed by a bit of value 0 or 1 for



. Accordingly, each chromosome is composed of a

row of bits. The total number of bits is . The chromo-

some expresses shift value l

described in P



in the

binary expression (Figure 8).

Figure 6. Schematic diagram of a chromosome structure in

Experiment 1 for P'.

Figure 7. Schematic diagram of a chromosome structure in

Experiment 2 for P.

104 S. MURATA ET AL.

Figure 8. Schematic diagram of a chromosome structure in

Experiment 2 for P'.

6.2. Strategy

For , a one-point crossover and a mutation by the

exchange(s) of pairs of 0 and 1 on a chromosome are

used, while for a two-point crossover and a one-point

mutation are used. The fitness function

P

(1)fdae

is used for bo th P and , where d, a, and e were in tro-

duced in Section 5. P

7. Numerical Experiments

In this section, we describe our computer experiments

and the results for evaluating the performance of the pro-

posed method.

7.1. Method

The experiment was performed in the following compu-

tational environment: the personal computer was a Dell

Dimension DXC051 (CPU: Pentium IV 3.0 GHz; main

memory: 1.0 GB); the OS was Microsoft Windows XP;

the development language was Microsoft Visual C++

6.0.

For DWT, we use Daubechies wavelets, which were

successfully used in related research on DW techniques

for images [16]. Moreover, a string composed of 400

randomly generated bits is used as the watermark infor-

mation.

Five music audio files, composed of the first entry in

five genre categories (classical, jazz, popular, rock, and

hiphop) in the research music database RWC [21], were

copied from CDs onto the personal computer as WAVE

files with the follo wing specifications: 44.1 kH z, 16 bits,

and monaural. For each music audio file selected from

the database, one 10-sec clip of the music audio (hereaf-

ter referred to as the original music audio clip) was ex-

tracted starting at 1 minute from the beginning of the

audio file and saved on a personal computer. The water-

marked music audio clip was produced by embedding a

digital watermark on the original audio clip by the pro-

posed method.

In Experiment 1, where the near-optimum solutions

for were obtained by using GA to evaluate the tol-

erance of watermarking to compression, MP3, AAC, and

WMA compression systems were each used to compress

the watermarked music audio clip to bitrates of 64, 96,

and 128 kbps. The bitrate of 32 kbps was also used for

MP3. Moreover, for , the fitness function value of the

near-optimum solutions obtained with GA was compared

with the fitness function values of the feasible solutions

at the initial generation.

P

In Experiment 2, the near-optimum solutions for P and



were obtained by using GA, and the performances of

those solu tions were compared with respect to the calcu-

lation times for getting those solution s, the quality of the

watermarked music audio clip, and the detection rate of

the watermarks after compression by the MP3 technique.

Moreover, for P and P



, the near-optimum solutions

obtained by using GA were compared with the solutions

produced by random generation of individual, neglecting

the restrictions (10) for P and (17) for , with respect

to the quality of the watermarked music audio clip, and

the detection rate of the watermarks after compression by

the MP3 technique.

P

7.2. Procedure

The procedure in the experiment is as follows.

Step 1:

First, an initial population consisting of several indi-

viduals is generated. In the process of generating an ini-

tial population having a given number of individuals, the

individual that does not meet the restriction or that has a

fatal gene is deleted as soon as it is produced, and a new

individual is generated. If all individuals generated in

300 continuous trials do not meet the restriction or have

at least one fatal gene, the procedure is terminated. When

an initial population having a given number of feasible

individuals is generated, go to Step 2.

Step 2:

The embedment of digital watermark according to the

condition decided by each individual, the sound com-

pression with the MP3 technique and the detection of

digital watermark after MP3 decoding are performed,

and then the fitness is calculated. When the generation is

final, the procedure is terminated. Otherwise, go to Step

3. Step 3:

The roulette strategy for selection, crossover, and mu-

tation are performed. Go to Step 2.

The near-optimum solution, which is defined as the

solution having the highest fitness through all genera-

tions, is obtained by repeating the process from Steps 2

to 3 until the given final generation.

7.3. Conditions

Table 1 shows the conditions of the GA strategy. In ad-

dition to the conditions shown in Table 1, the lower

bound of the detection rate a, described by the restriction

conditions (10) and (17), was set to 90. Table 2 shows

S. MURATA ET AL.

105

Table 1. Conditions of the GA strategy.

Exp.

No. Problem Population

size Generation

loop Crossover

rate Mutation

rate

1 P' 30 100 0.6 0.1

P' 30 100 0.6 0.1

2 P 50 200 0.6 0.2

Table 2. Conditions of chromosome structure in GA.

Exp. No. Problem Parameter values

1 P' 19, 3mn

P' 16l

2 P 400k

the conditions of the chromosome structure in GA

( are introduced in Section 6).

, , , klmn

For Experiment 1, 32 and 64 kbps were used as the bi-

trates of the MP3 compression and DWT levels ranging

from one to eight were selected as the search range. For

Experiment 2, 96 kbps was used as the bitrate of the

MP3 compression for level 3 of DWT, and 32 kbps was

used for levels 4 to 6 of DWT.

For Experiment 1, we obtained the segmental-signal-

to- quantization-noise ratio (hereafter referred to as

SNR ), as defined by (25) and (26).



10, , ,

10log jj

jjrjr

SNRyy y

















 jr













, (25)

eg j

SNR SNR

N

, (26)

where ,,

rjr



j denote the values of the r-th sound data

of frame before and after embedding the digital wa-

termark, respectively, and ,

NN denote the number

of sound data at frame and the number of frames to

be measured for j

eg , respectively. In the present

study, we used 209 ms as the time length of one frame.

When calculating

SNR

eg , we excluded the frame with

j, which means there is no ch ange of all values

of the sound data of frame . An audio frame size of

209 ms is adopted for comparison with results obtained

by our previous watermarking method [17,18], where the

frame size was decided in the relation to the cond ition of

watermarking.

SNR

SNR  j

7.4. Results and Discussions

7.4.1. Experiment 1

In this subsubsection , to examine the performance of the

proposed method in Experiment 1, the fitness function

value of the near-optimum solution for the partial prob-

lem P



is first compared with that obtained from 30

feasible solutions generated at random for the partial

problem P



. Next, the tolerances of watermark obtained

by the proposed method to compression by MP3, AAC,

and WMA are shown, and the time to obtain the

near-optimum solution for the partial problem P



shown to check the practicability of the proposed meth-

od.

Each DWT level obtained as an element composed of

a chromosome of the near-optimum solution by GA was

4 for classical and jazz, 5 for rock and hiphop, and 6 for

popular music for 32 kbps of the bitrate condition of

MP3, while the DWT level for 64 kbps as the bitrate

condition of MP3 was 3 for classical, jazz, and rock mu-

sic, 4 for popular, and 5 for hiphop.

As shown in Figures 9-13, GA successfully found a

good solution considered as the near-optimum solution

for each case. Table 3 shows the tolerance of watermark

to the compression. Table 4 shows the time to obtain a

near-optimum solution. For the MP3 bitrate condition of

32 and 64 kbps in the process of GA, the average detec-

tion rate after compression by MP3 with the bitrate of 32

to 128 kbps was 98.39% and 93.88%, respectively, and

the average time to obtain the near-optimum solution

was 5.94 × 102 and 3.78 × 102 sec, respectively. As a

condition for obtaining the high tolerance of watermark

to MP3 compression, 32 kbps was better than 64 kbps.

However, it took more time to obtain the near-optimum

solution for the MP3 bitrate condition for 32 kbps than

the time for 64 kbps. Moreover, for the MP3 bitrate con-

dition of 32 kbps and 64 kbps in the process of GA, the

average detection rate after compression by AAC with

the bit rate of 64 to 128 kbps was 93.71% and 84.62%,

respectively, and that by WMA with the bitrate of 64 to

128 kbps was 98.33% and 95.07%, respectively. As

shown in Table 5,

eg after embedding the water-

mark on the condition of the near-optimum solution ob-

tained by the proposed method was 69.7 to 78.6 dB.

These values suggest that noise due to the watermarking

was difficult to perceive.

SNR

7.4.2. Experiment 2

In this subsubsection , to examine the performance of the

proposed method in Experiment 2, the performances of

the near-optimum solutions for the original problem P

and the partial problem P



and those obtained from the

solutions generated at random for the original problem P

and the partial problem P



are compared with respect

to the detection rates of the watermarks after MP3 com-

pression and by the errors of the watermarking. Next, the

times to obtain the near-optimum solution for the origi-

nal problem P and the partial problem are shown to P

S. MURATA ET AL.

106

check the practicability of the proposed method.

As shown in Figures 14 to 18, GA successfully found

a good solution considered as the near-optimum solution

in each case for the original problem P and the partial

problem . Moreover, the near-optimum solution for

the original problem P had better performance as a con-

dition for embedding watermarks than that for the partial

problem

P



. However, the time to obtain the near-op-

timum solution for the original problem P was approxi-

mately 2 to 200 times, compared with that for the partial

Figure 9. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at

random (Experiment 1, music file: classical). Left figure: MP 3 bit rate; 32 kbps. Right figure: MP3 bit rate; 64 kbps.

Figure 10. Comparison of fitness function value between the near-optimum solution and the 3 0 feasible solutions generated at

random (Experiment 1, music file: jazz). Left figure: MP3 bit rate; 32 kbps. Right figur e : MP3 bit rate ; 64 kbps.

Figure 11. Comparison of fitness function value between the near-optimum solution and the 3 0 feasible solutions generated at

random (Experiment 1, music file: popular). Left figure: MP3 bit rate; 32 kbps. Right figure: MP3 bit rate; 64 kbps.

S. MURATA ET AL.

107

Figure 12. Comparison of fitness function value between the near-optimum solution and the 3 0 feasible solutions generated at

random (Experiment 1, music file: rock). Left figure : MP 3 bit rate ; 32 kbps. Right figur e: MP3 bit rate; 64 kbps.

Figure 13. Comparison of fitness function value between the near-optimum solution and the 3 0 feasible solutions generated at

random (Experiment 1, music file: hiphop). Left figure: MP3 bit rate; 32 kbps. Right figure : MP 3 bit r ate; 64 kbps.

problem (Table 6). Practically, the watermarking

decided by a near-optimum solution for the partial prob-

lem is recommended. In additio n, an initial solution

for the partial problem can be considered to be a

secret key. Therefore, the partial problem

P



has an

advantage over the original problem P from the view-

point of watermarking security.

7.4.3. Comparison with Another Technique

For making the technical level of the proposed method

clear, we compared the results by the proposed method

with those by our retorted method [17,18]. In our re-

ported study, another optimal watermarking problem was

formulated using discrete Fourier transform and a wa-

termarking method proposed in the reported study [19].

In our reported stud y, a string composed of 92 r andomly

generated bits was used as the watermark information,

the same clips as used in the present study were used,

and

eg was measured using the same condition as

that used in the present study. Table 7 shows the toler-

ance of watermark to the compression in using our re-

ported method. Comparing Table 7 with Table 3, it is

clear that the proposed method had higher tolerance to

compression by each of MP3, AAC, and MWA than that

by our reported study. As shown in Table 8,

SNR

after embedding the watermark on the condition of the

near-optimum solution obtained by our reported method

was 36.8 to 52.2 dB. Although the amount of watermark

information used in the present study was more than 4

times to that of our reported study, the proposed method

realized lower noise than that by our reported method

(Tables 5 and 8).

SNR

7.4.4. Discussion for Practi cal Setup

It takes much time to apply our techn ique to a song of 3 -

5 minutes as one clip. For practical usage of our ap-

proach, we will select some short clips of 10 second, for

example. Then, we will use our approach to each short

clip. The methodology for effective selections of short

lips in a song is our next target. c

S. MURATA ET AL.

108

Table 3. Watermarking tolerance to compression measured by detection rate (%) in Experiment 1.

(a) MP3 bitrate as GA condition: 32 kb ps

Compression

Method Bitrate (kbps)

Classical Jazz Popular Rock Hiphop

128 99.75 99.75 99.5 99.25 99.25

96 99.75 99.75 99.5 99.25 99.25

64 99.25 99.25 98.25 97.25 98.25

MP3

32 93.75 93.5 98.25 97.25 97.75

128 99 99 98.25 97 97.25

96 89.75 91.15 95.25 94.5 93.25

AAC

64 87.25 87 92.5 93 91.5

128 99.75 99.75 99.5 99.75 99.25

96 98.75 97.75 99.5 98.75 99

WMA

64 98.25 95.5 97.25 96 96.25

(%)

(b) MP3 bitrate as GA condition: 64 kbps

Compression

Method Bitrate (kbps)

Classical Jazz Popular Rock Hiphop

128 100 100 99.25 100 98.25

96 99.25 99.25 98.75 97.75 97.5

64 95.5 95.5 95.25 92 96.25

MP3

32 80.25 80.25 88.5 75.5 88.5

128 95.25 95.25 93.5 87.75 96.25

96 80.25 80.25 87 74.75 91.5 AAC

64 77.75 77.75 80.25 67.25 84.5

128 100 99 99.75 97.75 98.5

96 97.75 93 96.25 92.75 94.75 WMA

64 94.25 89 94 89 90.25

(%)

Table 4. Time (sec.) to obtain a near-optimum solution in

Experiment 1.

MP3 bitrate 32 kbps 64 kbps

Classical 4.02 × 102 3.98 × 102

Jazz 4.20 × 102 2.70 × 102

Popular 1.16 × 103 3.93 × 102

Rock 4.67 × 102 3.17 × 102

HipHop 5.19 × 102 5.12 × 102

Table 5. SNRseg [dB] after embedding a watermark on the

condition of near-optimum solution obtained by the pro-

posed method in Experiment 1.

MP3 bitrate 32 kbps 64 kbps

Classical 70.1 73.2

Jazz 76.3 78.6

Popular 69.7 73.9

Rock 72.9 78.1

HipHop 73.6 71.1

S. MURATA ET AL.

109

Figure 14. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file:

classical).

Figure 15. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file:

jazz).

110 S. MURATA ET AL.

Figure 16. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file:

popular).

Figure 17. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file:

rock).

S. MURATA ET AL.

111

Figure 18. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file:

hiphop).

It is difficult to model the compression attack in gen-

eral. In this paper, the framework of watermark optimi-

zation is shown on th e viewpoint of realizing good qu al-

ity and high tolerance to MP3 compression. Because

MP3 compression is one of the easiest attacks to water-

mark, we have selected it as an example. In addition, the

high tolerance of watermark by the proposed method to

each of AAC and WMA compression, which are repre-

sentative audio compression techniques, is also shown in

this paper.

8. Conclusions

A method for embedding digital watermark using DWT

and GA to realize high tolerance to compression by MP3

is proposed. The proposed method enables us to embed

digital watermark in a near-optimum manner for each

music audio clip. Moreover, the near-optimum solution

for the original problem P and partial problem P



, and

the initial pattern of digital watermark for are used

as secret keys in extracting the digital watermark. The

experimental results show that the proposed method

generates watermarked audio of good quality and high

tolerance to MP3 compression.

P

9. Acknowledgements

The authors would like to thank Associate Professor H.

Okuhara of the Gr aduate School of Osaka University for

his valuable advice on this research. The authors would

also like to thank Associate Professor M. Tabuse of

Kyoto Prefectural University for his valuable support of

this research.

10. References

[1] D. Kirovski and H. S. Malvar, “Spread-Spectrum Water-

marking of Audio Signals,” IEEE Transactions on Signal

Processing, Vol. 51, No. 4, April 2003, pp. 1020-1033.

doi:10.1109/TSP.2003.809384

[2] I.-K. Yeo and H. J. Kim, “Modified Patchwork Algorithm:

A Novel Audio Watermarking Scheme,” IEEE Transac-

tions on Speech and Audio Processing, Vol. 11, No. 4,

July 2003, pp. 381-386. doi:10.1109/TSA.2003.812145

[3] S. Wu, J. Huang, D. Huang and Y. Q. Shi, “Efficiently

Self-Synchronized Audio Watermarking for Assured Au-

dio Data Transmission,” IEEE Transactions on Broad-

casting, Vol. 51, No. 1, March 2005, pp. 69-76.

doi:10.1109/TBC.2004.838265

[4] X. Y. Wang and H. Zhao, “A Novel Synchronization

Invariant Audio Watermarking Scheme Based on DWT

and DCT,” IEEE Transactions on Signal Processing, Vol.

54, No. 12, December 2006, pp. 4835-4840.

doi:10.1109/TSP.2006.881258

[5] S. Xiang and J. Huang, “Histogram-Based Audio Wa-

terMarking against Time-Scale Modification and Crop-

ping Attacks,” IEEE Transactions on Multimedia, Vol. 9,

112 S. MURATA ET AL.

No. 7, November 2007, pp. 1357-1372.

doi:10.1109/TMM.2007.906580

[6] S. Kirbiz, A. N. Lemma, M. U. Celik and S. Katzenbeis-

ser, “Decode-Time Forensic Watermarking of AAC Bit-

streams,” IEEE Transactions on Information Forensics

and Security, Vol. 2, No. 4, December 2007, pp. 683-696.

doi:10.1109/TIFS.2007.908194

[7] D. J. Coumou and G. Sharma, “Insertion, Deletion Codes

with Feature-Based Embedding: A New Paradigm for

Watermark Synchronization with Applications to Speech

Watermarking,” IEEE Transactions on Information Fo-

rensics and Security, Vol. 3, No. 2, June 2008, pp.

153-165. doi:10.1109/TIFS.2008.920728

[8] S. Xianga, H. J. Kimb, and J. Huanga, “Audio Water-

marking Robust against Time-Scale Modification and

MP3 Compression,” Signal Processing, Vol. 88, No. 10,

October 2008, pp. 2372-2387.

doi:10.1016/j.sigpro.2008.03.019

[9] X. Y. Wang, P. P. Niu and H. Y. Yang, “A Robust, Digi-

tal-Audio Watermarking Method,” IEEE Multimedia, Vol.

16, No. 3, July 2009, pp. 60-69.

doi:10.1109/MMUL.2009.44

[10] N. K. Kalantari, M. A. Akhaee, S. M. Ahadi and H.

Amindavar, “Robust Multiplicative Patchwork Method

for Audio Watermarking,” IEEE Transactions on Audio,

Speech, and Language Processing, Vol. 17, No. 6, Au-

gust 2009, pp. 1133-1141.

[11] X. Y. Wanga, P. P. Niub and H. Y. Yangb, “A Robust

Digital Audio Watermarking Based on Statistics Charac-

teristics,” Pattern Recognition, Vol. 42, No. 11, Novem-

ber 2009, pp. 3057-3064.

[12] K. Yamamoto and M. Iwakiri, “Real-Time Audio Wa-

termarking Based on Characteristics of PCM in Digital

Instrument,” Journal of Information Hiding and Multi-

media Signal Processing, Vol. 1, No. 2, April 2010, pp.

59-71.

[13] D. Goldberg, “Genetic Algorithm in Search, Optimization,

and Machine Learning,” Addison-Wesley, Reading, Bos-

ton, 1989.

[14] J. H. Holland, “Adaptation in Natural and Artificial Sys-

tems,” The University Michigan Press, Ann Arbor, 1975,

and MIT Press, Cambridge, 1992.

[15] M. Shino, Y. Choi and K. Aizawa, “Wavelet Domain

Digital Watermarking Based on Threshold-Variable De-

cision,” Technical Report of IEICE, DSP2000-86, in Ja-

panese, Vol. 100, No. 325, September 2000, pp. 29-34.

[16] D. Inoue and Y. Yoshitomi, “Watermarking Using Wave-

let Transform and Genetic Algorithm for Realizing High

Tolerance to Image Compression,” Journal of the Insti-

tute of Image Electronics Engineers of Japan, Vol. 38,

No. 2, March 2009, pp. 136-144.

[17] M. Tanaka and Y. Yoshitomi, “Digital Audio Water-

marking Method with MP3 Tolerance Using Genetic Al-

gorithm,” Proceedings of the 2006 IEICE General Con-

ference, Tokyo, 21 March 2006, p. 182.

[18] M. Tanaka and Y. Yoshitomi, “Digital Audio Water-

marking Method with MP3 Tolerance Using Genetic Al-

gorithm,” Proceedings of 11th Czech-Japan Seminar on

Data Analysis and Decision Making under Uncertainty,

Sendai, 15-17 September 2008, pp. 81-85.

[19] R. Tachibana, “Capacity Analysis of Audio Watermark-

ing Based on Logarithmic Amplitude Modification

against Additive Noise,” IEICE Transactions on Funda-

mentals of Electronics, Communications and Computer

Sciences, in Japanese, Vol. J86-A, No. 11, November

2003, pp. 1197-1206.

[20] F. Glover, “Future Paths for Integer Programming and

Links to Artificial Intelligence,” Computers and Opera-

tions Research, Vol. 13, No. 5, May 1986, pp. 533-549.

doi:10.1016/0305-0548(86)90048-1

[21] M. Goto, H. Hashiguchi, T. Nishi mura and R. Oka , “R WC

Music Database: Database of Copyright-Cleared Musical

Pieces and Instrument Sounds for Research Purposes,”

Transactions of IPSJ, in Japanese, Vol. 45, No. 3, March

2004, pp. 728-738