Journal of Information Security, 2011, 2, 99112 doi:10.4236/jis.2011.23010 Published Online July 2011 (http://www.SciRP.org/journal/jis) Copyright © 2011 SciRes. JIS Audio Watermarking Using Wavelet Tr ansform and Genetic Algorithm for Realizing High Tolerance to MP3 Compression Shinichi Murata1, Yasunari Yoshitomi2, Hiroaki Ishii3 1Panasonic Corporation, Kadoma, Osaka, Japan 2Graduate Sc h ool of Lif e an d Envi ronmental Scien ces, Kyoto Prefectural University, Kyoto, Japan 3School of Science and Techn ol o gy , Kwansei Gakuin University, Sanda, Hyogo, Japan Email: yoshitomi@kpu.ac.jp Received April 12, 2011; revised June 8, 2011; accept ed J u ne 20, 2011 Abstract Recently, several digital watermarking techniques have been proposed for hiding data in the frequency do main of audio signals to protect the copyrights. However, little attention has been given to the optimal posi tion in the frequency domain for embedding watermarks. In general, there is a tradeoff between the quality of the watermarked audio and the tolerance of watermarks to signal processing methods, such as compression. In the present study, a watermarking method developed for a visual image by using a wavelet transform was applied to an audio clip. We also improved the performance of both the quality of the watermarked audio and the extraction of watermarks after compression by the MP3 technique. To accomplish this, we created a mul tipurpose optimization problem for deciding the positions of watermarks in the frequency domain and ob taining a nearoptimum solution. The nearoptimum solution is obtained by using a genetic algorithm. The experimental results show that the proposed method generates watermarked audios of good quality and high tolerance to MP3 compression. In addition, the security was improved by using the characteristic secret key to embed and extract the watermark information. Keywords: Audio Watermarking, Genetic Algorithm, Optimization, Wavelet Transforms, Secret Key 1. Introduction Recent progress in digital media and digital distribution systems, such as the Internet and cellular phones, has enabled us to easily access, copy, and modify digital con tent, such as electric documents, images, audio, and video. Under these circumstances, techniques to protect the copyrights of digital data and to prevent unauthorized duplication or tampering of these data are strongly de sired. Digital watermarking (DW) is a promising method for the copyright protection of digital data. Several studies have investigated audio DW [112]. Currently, digital audio clips distributed ov er the Internet or cellular phone systems are often modified by compression, which is one of the easiest and most effective ways to overcome DW without significantly deteriorating the quality of the au dio. Two important properties in audio DW are the in audibility of the distortion due to DW, and the robustness against signal processing methods, such as compression. In addition to these properties, the data rate and the com plexity of DW have attracted attention when discussing the performance of DW. We developed a method in which 1) a digital water mark can be sufficiently extracted from watermarked audio, even after compression, and 2) the quality of the audio remains high after embedding the digital water mark. However, there generally is a tradeoff relation between these two properties. In the present study, we improved both the extraction of digital watermarks and the quality of the watermarked audio by developing a multipurpose optimization prob lem for deciding the positions of digital watermarks in the frequency domain and obtaining a nearoptimum solution by using a discrete wavelet transform (DWT) and a genetic algorithm (GA) [13,14] for realizing high tolerance to compression by MP3, which is the most popular compression technique. The proposed method
100 S. MURATA ET AL. enables us to embed digital watermarks in a nearopti mum manner for each audio file. In addition, the security of the watermarked audio is improved by using a char acteristic secret key to embed and extract digital water marks. 2. Wavelet Transform Original audio data (0) k is used as the level0 wavelet decomposition coefficient sequence, where denotes the element number in the data. The data is decomposed into the multiresolution representation (MRR) and the coarsest approximation by repeatedly applying a DWT. The wavelet decomposition coefficient sequence k () k at level is decomposed into two wavelet decomposition coefficient sequences at level by (1) and (2): j1j (1) () 2 j knk nn ps , (1) (1) () 2 j knk n wq n s, (2) where 2nk and 2nk denote the scaling and wavelet sequences, respectively, and denotes the devel opment coefficient at level pq(1)j k w 1j . The development co efficients at level are obtained by using (1) and (2) iteratively from to 0j1jJ . Figure 1 shows the process of a multiresolution analysis by DWT. The signal is recomposed by using (3) repeatedly from to . 1jJ 0j (1)() () 22 jj nnkknk k spsqw j k k p (3) In the present study, we use the Daubechies wavelet for DWT. As a result, we obtain the following relation between and : 2nk p2nk q 1 1k k q － (4) 3. Wavelet Domain Digital Watermarking Based on ThresholdVariable Decision It is known that the histogram of the wavelet coefficients of each domain of MRR sequences has a distribution that Figure 1. Multiresolution analysis by the DWT. is centered at approximately 0 when DWT is performed on a natural visual image [15]. For an audio clip, we also found the same phenomena. Figure 2 shows an example of an audio histogram. In the present research, the technique [15] for exploit ing the above phenomena on a natural image for embed ding a digital watermark on the wavelet coefficients of MRR sequences is applied to audio DW. The procedure is described below. 3.1. Embedment of Watermark Information 3.1.1. Setti ng of Parameter s For the watermarking of an audio clip, we obtain the histogram of the wavelet coefficients at the selected level of MRR sequences. Figure 3 shows a schematic diagram of the histogram of the wavelet coefficients of an MRR sequence. As with the DW techniques for images [15,16], we set the following watermarking pa rameters: V V The values of and (see Figure 3) are chosen such that the nonpositive wavelet coeffi cients (m in total frequency) are equally divided into two groups by , and the positive wavelet coefficients ( (minus)Th (minus)Th (plus)Th S S in total frequency) are equally divided into two groups by . Next, the values of , , , and , which are the parameters for con trolling the embedment strength, are cho sen to satisfy the following conditions : (plus)Th 41T 2T3TT 1) 1 (minus)203(plus)4TThT TThT S. 2) The value of , the number of wavelet coeffi cients in , is equal to , the number of wavelet coefficients in [( . In short, 1T minus(1, ())TTh 2T S us),Tmin2)Th 12TT SS . 3) The value of , the number of wavelet coeffi cients in , is equal to , the number of 3T lus)] S (3,(pTTh 4T S Figure 2. Histogram of the wavelet coefficients of an MRR sequence at level 3 (jazz). Copyright © 2011 SciRes. JIS
S. MURATA ET AL. 101 Figure 3. Schematic diagram of the histogram of MRR wavelet coefficients. wavelet coefficients in (. In short, (plus), 4)Th T3T S . 4 4) T S 13TmT p SS SS. In the present study, the values of both 1Tm SS and 3Tp SS are set to 0.2, which was determined experi mentally. 3.1.2. Embedment of Watermark Information The wavelet coefficients of MRR are rewritten according to the following rules in embedding digital watermark. Here, deno tes one of the wavelet coefficients. i 1) In the case that bit in watermark W is 0, V i when , is changed to . W 2 i VT3VTi V V2T3Twhen , is changed to . i2TV i Twhen , is kept. 3 i V W i 2) In the case that bit in watermark W is 1, i when , is changed to . 10 i TV 0VT i V 4V1T4.Twhen , is changed to i1VTi Vwhen or , is kept. i i The wavelet coefficient i V is set in the range of when bit i in watermark W is 0, whereas the DWT coefficient is set in the range of or V when bit in watermark W is 1. The frequency of the change of i toward the inside is expected to be approximately equal to the change toward the outside when the number of 0 bits is approximately the same as the number of 1 bits. 4 iTV 2 i TVT 1 i VTi 3W i V i W V 4T 3.1.3. Generation of Water m arked Audio The inverse DWT (IDWT) is performed to wavelet coef ficients embedded with the watermark to obtain the audio with the watermark. i V 3.2. Detection of the Watermark 3.2.1. Presumption of Parameters The watermarked audio, which may be modified by sig nal processing methods, such as MP3 compression, is converted into wavelet coefficients. The wavelet coeffi cient in the region where the watermark information is embedded is denoted as V . For the histogram of V , the two parameters (minus)Th and (plus)Th , which correspond to Th and for the histogram of V before embedding the watermark, are obtained in the same manner as that for and , mentioned in Section 3.1.1. (minus) (p Th lus) (minus) Th (minuTh(plus)Th s) and Th (plus) can be used as pre sumptive values for and Th , respec tively, because the distribution of the histogram of (mTh inus)(plus) V is expected to be approximately the same as that of before embedding the watermark. VThe watermarked audio can undergo certain types of audio processing, including compression, such that the difference between the distribution of the histogram of V after audio processing and that of V before embed ding the watermark is not negligible. In such a case, it may not be persuasive that and (minus)Th(plus)Th can be used as presumptive values for and , respectively. (mTh inus) (p Th lus) 3.2.2. Detection of Watermark Information When the wavelet coefficient i V is in the range of (minus) (plus) i ThV Th , the corresponding bit i W in measured watermark is judged to be 0. When the DWT coefficient is in the range of i W V (minus) i VTh or i(plus)VTh i W, the corresponding bit in the measured watermark is judged to be 1. W The detection rate (%) is defined as the per centage of correspondence between the bit i in wa termark W and the corresponding bit in measured watermark ()dxW i W W. 4. Use of the Secret Key When we embed a digital watermark by using the partial problem described in Section 5, the watermark is pro duced by using a secret key ()S , which is composed of a row of integers randomly selected once or less per integer in the integer range from 1 to , as shown in the example below N (400) , where is the number of bits of the watermark and is the total number of wavelet coefficients that are candidates embedded with DW. N Copyright © 2011 SciRes. JIS
102 S. MURATA ET AL. (400)(271, 72, 39, 990, 524, 88, , 1011, 688, 312)S (5) Each value and order of numbers in ()S indicate the position of each bit of the digital watermark in the DW region. Here, the position in the region is expressed as a onedimensional coordinate. For example, the first number, 271, and the second number, 72, in mean that the first and second bits of the watermark are set for the wavelet coefficients at the coordinates of 271 and 72 in the DW region, respectively. (400)S Figure 4. The DW positions of wavelet coefficients decided by secret key. S(4) = (271, 72, 39, 990) in the case of DWT level 3. As described in Section 3.1.2, the selected wavelet co  efficient in the target region is changed according to the value of each bit of the digital watermark, the secret key, and the shift value of the coordinate, which is de scribed in Section 5. The value of each bit of the digital watermark and the secret key decide an initial bit pattern in the positions in th e DW region. k The DW positions of the wavelet coefficients decided by secret key presented below in (6) are simply demonstrated in Figure 4. (4)S Figure 5. The DW positions of wavelet coefficients decided by secret key. S'(4) = {281, 82, 49, 1000} obtained from S(4) = (271, 72, 39, 990) by using the shift value k = 10 in the case of DWT level 3. (4)(271, 72, 39, 990)S (6) The coordinate shift is performed by generating ()S such that the shift value is added to all values of the ele ments in ()S . For example, it is assumed that the shift value is 10. As a result, 1 N i i R , (12) ( 4){281, 82, 49, 1000}S (7) 12 ,,, N xxx, (13) The DW positions of the wavelet coefficients decided by secret key described in (7) are simply demon strated in Figure 5. (4)S{0, 1} i x , (14) where , ii yy denote the values of the th sound data before and after embedding the digital watermark, re spectively; denotes the total number of wavelet co efficients at the DWT level selected for embedment; are constants; and i i N ,aR is a 0  1 variable that de cides the embedment of the watermark on the corre sponding wavelet coefficient, where 1 denotes an em bedment and 0 denotes a nonembedment. 5. Optimal Watermarking Problem Because our approach of DW optimization is on the first and challenging stage, we formulate the problem in a simple way on the viewpoint of optimization problem. Therefore, in the present study, we formulate the opti mization problem as minimization for distortion. We can also formulate the problem using the constraint of keep ing distortion less than the masking threshold. Such more elaborate approach is our next target. When the number of wav elet coefficients that are pos sible targets for digital watermark embedment becomes larger, the solution space of becomes larger, with the result that a search for an optimal or nearoptimal solu tion is timeconsuming or difficult. Accordingly, we de fine the partial problem as follows: P P To minimize the error caused by watermarking and to maximize the detection rate (%) of digital watermark after compression by the MP3 technique un der a restriction condition on , an optimum water marking problem is formulated as follows: e( )xd( )x d( )xM inimize e()Ps , (15) Maximize d() , (16) Minimize e()Px, (8) Subject to d() a , (17) Maximize d()x, (9) 2 1 e( )N ii i yy , (18) Subject to d()a x , (10) 2 1 e( )N ii i yy x , (11) is i c , (19) Copyright © 2011 SciRes. JIS
S. MURATA ET AL. 103 1 N i i cR , (20) 12 ,,, cc cc, (21) 12 ,,, N xxx, (22) {0, 1} i c, (23) {0, 1} i x, (24) where is an integer variable ranging from 0 to 1N ; i is a 0  1 constant that decides the digital watermark embedment on the corresponding wavelet coefficient, where 1 denotes an embedment and 0 denotes a non embedment, and i are the same as those described for . We prepare a random initial pa t tern for c , , , , , ii yyNaRx P ,,, 12 cccP d() x c to solve . For getting the detection rate , we use the wa termark, the wavelet transformation level, and the (near)optimum solution for or as input data to the decoder for the proposed method. PP In the present study, we use DWT. However other op timal watermarking problems can be formulated using other transforms as shown in our previous study [17,18] where discrete Fourier transform and a watermarking method prop osed in the reported study [19] were used. 6. GA Approach GA is one of the most acknowledged methods for near optimization. We presume that and might have many locally optimum solutions. According to our ex periences, GA was fairly effective for searching the ac ceptable nearoptimum solutions for many discrete opti mization problems even when many locally optimum solutions could exist. We use GA in the present study because we have much more successful experiences on GA application to optimization problems than those on other methods. Other acknowledged techniques such as tabu search [20] for solving discrete optimization prob lems could be ot her options f or sol vi n g and P PP P . The GA, which is based on biological evolution, has been applied for solving the optimization problem. The solution of the optimization problem is expressed as a genotype. In each generation, there is a population com posed of several individuals identified by their genotypes. The basic idea of GA is that if the number of better indi viduals is increased by generation updating, an optimum or approximately optimum solution, as expressed by an individual, will eventually be obtained. Chromosome composed of genes is string specifying an individual. For the generation updating, crossover and mutation are per formed. The crossover takes two parent strings and gen erates two offspring strings. The mutation changes se lected strings in a random way. In the references [13,14], the GA is explained in detail. In this section, we explain our approach for obtaining nearoptimum solutions for and by using a GA. PP 6.1. Coding The GA coding for Experiment 1, described below, is performed as follows. A gene is expressed by a bit of value 0 or 1. Accord ingly, each chromosome is composed of a row of bits. The total number of bits is , in which bits of the higher ranks are associated with the level of DWT and m n (mn) subordinate bits are associated with the shift value , described for in the binary expression (Figure 6). P When an individual associated with a level that actu ally does not exist in a list of levels used for DWT is generated in the GA process, the individual is judged to have a fatal gene and is deleted, and a new individual is generated. The GA coding for Experiment 2, described below, is performed as follows. A gene is expressed by a bit of value 0 or 1 for . Accordingly, each chromosome is composed of a row of bits. The total number of bits is . Each bit is assigned to each DWT coefficient for possible embedment of the watermark (Figure 7). The value of 1 for a bit means that the corresponding DWT coefficient is selected as an object of watermarking, whereas 0 denotes nonembed ment for the corresponding DWT coefficient. P k A gene is also expressed by a bit of value 0 or 1 for P . Accordingly, each chromosome is composed of a row of bits. The total number of bits is . The chromo some expresses shift value l described in P in the binary expression (Figure 8). Figure 6. Schematic diagram of a chromosome structure in Experiment 1 for P'. Figure 7. Schematic diagram of a chromosome structure in Experiment 2 for P. Copyright © 2011 SciRes. JIS
104 S. MURATA ET AL. Figure 8. Schematic diagram of a chromosome structure in Experiment 2 for P'. 6.2. Strategy For , a onepoint crossover and a mutation by the exchange(s) of pairs of 0 and 1 on a chromosome are used, while for a twopoint crossover and a onepoint mutation are used. The fitness function P P (1)fdae is used for bo th P and , where d, a, and e were in tro duced in Section 5. P 7. Numerical Experiments In this section, we describe our computer experiments and the results for evaluating the performance of the pro posed method. 7.1. Method The experiment was performed in the following compu tational environment: the personal computer was a Dell Dimension DXC051 (CPU: Pentium IV 3.0 GHz; main memory: 1.0 GB); the OS was Microsoft Windows XP; the development language was Microsoft Visual C++ 6.0. For DWT, we use Daubechies wavelets, which were successfully used in related research on DW techniques for images [16]. Moreover, a string composed of 400 randomly generated bits is used as the watermark infor mation. Five music audio files, composed of the first entry in five genre categories (classical, jazz, popular, rock, and hiphop) in the research music database RWC [21], were copied from CDs onto the personal computer as WAVE files with the follo wing specifications: 44.1 kH z, 16 bits, and monaural. For each music audio file selected from the database, one 10sec clip of the music audio (hereaf ter referred to as the original music audio clip) was ex tracted starting at 1 minute from the beginning of the audio file and saved on a personal computer. The water marked music audio clip was produced by embedding a digital watermark on the original audio clip by the pro posed method. In Experiment 1, where the nearoptimum solutions for were obtained by using GA to evaluate the tol erance of watermarking to compression, MP3, AAC, and WMA compression systems were each used to compress the watermarked music audio clip to bitrates of 64, 96, and 128 kbps. The bitrate of 32 kbps was also used for MP3. Moreover, for , the fitness function value of the nearoptimum solutions obtained with GA was compared with the fitness function values of the feasible solutions at the initial generation. P P In Experiment 2, the nearoptimum solutions for P and P were obtained by using GA, and the performances of those solu tions were compared with respect to the calcu lation times for getting those solution s, the quality of the watermarked music audio clip, and the detection rate of the watermarks after compression by the MP3 technique. Moreover, for P and P , the nearoptimum solutions obtained by using GA were compared with the solutions produced by random generation of individual, neglecting the restrictions (10) for P and (17) for , with respect to the quality of the watermarked music audio clip, and the detection rate of the watermarks after compression by the MP3 technique. P 7.2. Procedure The procedure in the experiment is as follows. Step 1: First, an initial population consisting of several indi viduals is generated. In the process of generating an ini tial population having a given number of individuals, the individual that does not meet the restriction or that has a fatal gene is deleted as soon as it is produced, and a new individual is generated. If all individuals generated in 300 continuous trials do not meet the restriction or have at least one fatal gene, the procedure is terminated. When an initial population having a given number of feasible individuals is generated, go to Step 2. Step 2: The embedment of digital watermark according to the condition decided by each individual, the sound com pression with the MP3 technique and the detection of digital watermark after MP3 decoding are performed, and then the fitness is calculated. When the generation is final, the procedure is terminated. Otherwise, go to Step 3. Step 3: The roulette strategy for selection, crossover, and mu tation are performed. Go to Step 2. The nearoptimum solution, which is defined as the solution having the highest fitness through all genera tions, is obtained by repeating the process from Steps 2 to 3 until the given final generation. 7.3. Conditions Table 1 shows the conditions of the GA strategy. In ad dition to the conditions shown in Table 1, the lower bound of the detection rate a, described by the restriction conditions (10) and (17), was set to 90. Table 2 shows Copyright © 2011 SciRes. JIS
S. MURATA ET AL. 105 Table 1. Conditions of the GA strategy. Exp. No. Problem Population size Generation loop Crossover rate Mutation rate 1 P' 30 100 0.6 0.1 P' 30 100 0.6 0.1 2 P 50 200 0.6 0.2 Table 2. Conditions of chromosome structure in GA. Exp. No. Problem Parameter values 1 P' 19, 3mn P' 16l 2 P 400k the conditions of the chromosome structure in GA ( are introduced in Section 6). , , , klmn For Experiment 1, 32 and 64 kbps were used as the bi trates of the MP3 compression and DWT levels ranging from one to eight were selected as the search range. For Experiment 2, 96 kbps was used as the bitrate of the MP3 compression for level 3 of DWT, and 32 kbps was used for levels 4 to 6 of DWT. For Experiment 1, we obtained the segmentalsignal to quantizationnoise ratio (hereafter referred to as eg SNR ), as defined by (25) and (26). 2 2 10, , , 11 10log jj NN jjrjr rr SNRyy y jr , (25) 1 1f N eg j j f SNR SNR N , (26) where ,, , rjr yy j denote the values of the rth sound data of frame before and after embedding the digital wa termark, respectively, and , f NN denote the number of sound data at frame and the number of frames to be measured for j eg , respectively. In the present study, we used 209 ms as the time length of one frame. When calculating SNR eg , we excluded the frame with j, which means there is no ch ange of all values of the sound data of frame . An audio frame size of 209 ms is adopted for comparison with results obtained by our previous watermarking method [17,18], where the frame size was decided in the relation to the cond ition of watermarking. SNR SNR j 7.4. Results and Discussions 7.4.1. Experiment 1 In this subsubsection , to examine the performance of the proposed method in Experiment 1, the fitness function value of the nearoptimum solution for the partial prob lem P is first compared with that obtained from 30 feasible solutions generated at random for the partial problem P . Next, the tolerances of watermark obtained by the proposed method to compression by MP3, AAC, and WMA are shown, and the time to obtain the nearoptimum solution for the partial problem P is shown to check the practicability of the proposed meth od. Each DWT level obtained as an element composed of a chromosome of the nearoptimum solution by GA was 4 for classical and jazz, 5 for rock and hiphop, and 6 for popular music for 32 kbps of the bitrate condition of MP3, while the DWT level for 64 kbps as the bitrate condition of MP3 was 3 for classical, jazz, and rock mu sic, 4 for popular, and 5 for hiphop. As shown in Figures 913, GA successfully found a good solution considered as the nearoptimum solution for each case. Table 3 shows the tolerance of watermark to the compression. Table 4 shows the time to obtain a nearoptimum solution. For the MP3 bitrate condition of 32 and 64 kbps in the process of GA, the average detec tion rate after compression by MP3 with the bitrate of 32 to 128 kbps was 98.39% and 93.88%, respectively, and the average time to obtain the nearoptimum solution was 5.94 × 102 and 3.78 × 102 sec, respectively. As a condition for obtaining the high tolerance of watermark to MP3 compression, 32 kbps was better than 64 kbps. However, it took more time to obtain the nearoptimum solution for the MP3 bitrate condition for 32 kbps than the time for 64 kbps. Moreover, for the MP3 bitrate con dition of 32 kbps and 64 kbps in the process of GA, the average detection rate after compression by AAC with the bit rate of 64 to 128 kbps was 93.71% and 84.62%, respectively, and that by WMA with the bitrate of 64 to 128 kbps was 98.33% and 95.07%, respectively. As shown in Table 5, eg after embedding the water mark on the condition of the nearoptimum solution ob tained by the proposed method was 69.7 to 78.6 dB. These values suggest that noise due to the watermarking was difficult to perceive. SNR 7.4.2. Experiment 2 In this subsubsection , to examine the performance of the proposed method in Experiment 2, the performances of the nearoptimum solutions for the original problem P and the partial problem P and those obtained from the solutions generated at random for the original problem P and the partial problem P are compared with respect to the detection rates of the watermarks after MP3 com pression and by the errors of the watermarking. Next, the times to obtain the nearoptimum solution for the origi nal problem P and the partial problem are shown to P Copyright © 2011 SciRes. JIS
S. MURATA ET AL. Copyright © 2011 SciRes. JIS 106 check the practicability of the proposed method. As shown in Figures 14 to 18, GA successfully found a good solution considered as the nearoptimum solution in each case for the original problem P and the partial problem . Moreover, the nearoptimum solution for the original problem P had better performance as a con dition for embedding watermarks than that for the partial problem P P . However, the time to obtain the nearop timum solution for the original problem P was approxi mately 2 to 200 times, compared with that for the partial Figure 9. Comparison of fitness function value between the nearoptimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: classical). Left figure: MP 3 bit rate; 32 kbps. Right figure: MP3 bit rate; 64 kbps. Figure 10. Comparison of fitness function value between the nearoptimum solution and the 3 0 feasible solutions generated at random (Experiment 1, music file: jazz). Left figure: MP3 bit rate; 32 kbps. Right figur e : MP3 bit rate ; 64 kbps. Figure 11. Comparison of fitness function value between the nearoptimum solution and the 3 0 feasible solutions generated at random (Experiment 1, music file: popular). Left figure: MP3 bit rate; 32 kbps. Right figure: MP3 bit rate; 64 kbps.
S. MURATA ET AL. 107 Figure 12. Comparison of fitness function value between the nearoptimum solution and the 3 0 feasible solutions generated at random (Experiment 1, music file: rock). Left figure : MP 3 bit rate ; 32 kbps. Right figur e: MP3 bit rate; 64 kbps. Figure 13. Comparison of fitness function value between the nearoptimum solution and the 3 0 feasible solutions generated at random (Experiment 1, music file: hiphop). Left figure: MP3 bit rate; 32 kbps. Right figure : MP 3 bit r ate; 64 kbps. problem (Table 6). Practically, the watermarking decided by a nearoptimum solution for the partial prob lem is recommended. In additio n, an initial solution for the partial problem can be considered to be a secret key. Therefore, the partial problem P P P P has an advantage over the original problem P from the view point of watermarking security. 7.4.3. Comparison with Another Technique For making the technical level of the proposed method clear, we compared the results by the proposed method with those by our retorted method [17,18]. In our re ported study, another optimal watermarking problem was formulated using discrete Fourier transform and a wa termarking method proposed in the reported study [19]. In our reported stud y, a string composed of 92 r andomly generated bits was used as the watermark information, the same clips as used in the present study were used, and eg was measured using the same condition as that used in the present study. Table 7 shows the toler ance of watermark to the compression in using our re ported method. Comparing Table 7 with Table 3, it is clear that the proposed method had higher tolerance to compression by each of MP3, AAC, and MWA than that by our reported study. As shown in Table 8, SNR eg after embedding the watermark on the condition of the nearoptimum solution obtained by our reported method was 36.8 to 52.2 dB. Although the amount of watermark information used in the present study was more than 4 times to that of our reported study, the proposed method realized lower noise than that by our reported method (Tables 5 and 8). SNR 7.4.4. Discussion for Practi cal Setup It takes much time to apply our techn ique to a song of 3  5 minutes as one clip. For practical usage of our ap proach, we will select some short clips of 10 second, for example. Then, we will use our approach to each short clip. The methodology for effective selections of short lips in a song is our next target. c Copyright © 2011 SciRes. JIS
S. MURATA ET AL. Copyright © 2011 SciRes. JIS 108 Table 3. Watermarking tolerance to compression measured by detection rate (%) in Experiment 1. (a) MP3 bitrate as GA condition: 32 kb ps Compression Method Bitrate (kbps) Classical Jazz Popular Rock Hiphop 128 99.75 99.75 99.5 99.25 99.25 96 99.75 99.75 99.5 99.25 99.25 64 99.25 99.25 98.25 97.25 98.25 MP3 32 93.75 93.5 98.25 97.25 97.75 128 99 99 98.25 97 97.25 96 89.75 91.15 95.25 94.5 93.25 AAC 64 87.25 87 92.5 93 91.5 128 99.75 99.75 99.5 99.75 99.25 96 98.75 97.75 99.5 98.75 99 WMA 64 98.25 95.5 97.25 96 96.25 (%) (b) MP3 bitrate as GA condition: 64 kbps Compression Method Bitrate (kbps) Classical Jazz Popular Rock Hiphop 128 100 100 99.25 100 98.25 96 99.25 99.25 98.75 97.75 97.5 64 95.5 95.5 95.25 92 96.25 MP3 32 80.25 80.25 88.5 75.5 88.5 128 95.25 95.25 93.5 87.75 96.25 96 80.25 80.25 87 74.75 91.5 AAC 64 77.75 77.75 80.25 67.25 84.5 128 100 99 99.75 97.75 98.5 96 97.75 93 96.25 92.75 94.75 WMA 64 94.25 89 94 89 90.25 (%) Table 4. Time (sec.) to obtain a nearoptimum solution in Experiment 1. MP3 bitrate 32 kbps 64 kbps Classical 4.02 × 102 3.98 × 102 Jazz 4.20 × 102 2.70 × 102 Popular 1.16 × 103 3.93 × 102 Rock 4.67 × 102 3.17 × 102 HipHop 5.19 × 102 5.12 × 102 Table 5. SNRseg [dB] after embedding a watermark on the condition of nearoptimum solution obtained by the pro posed method in Experiment 1. MP3 bitrate 32 kbps 64 kbps Classical 70.1 73.2 Jazz 76.3 78.6 Popular 69.7 73.9 Rock 72.9 78.1 HipHop 73.6 71.1
S. MURATA ET AL. Copyright © 2011 SciRes. JIS 109 Figure 14. Comparison between the nearoptimum solutions and the solutions generated at random (Experiment 2, music file: classical). Figure 15. Comparison between the nearoptimum solutions and the solutions generated at random (Experiment 2, music file: jazz).
110 S. MURATA ET AL. Figure 16. Comparison between the nearoptimum solutions and the solutions generated at random (Experiment 2, music file: popular). Figure 17. Comparison between the nearoptimum solutions and the solutions generated at random (Experiment 2, music file: rock). Copyright © 2011 SciRes. JIS
S. MURATA ET AL. Copyright © 2011 SciRes. JIS 111 Figure 18. Comparison between the nearoptimum solutions and the solutions generated at random (Experiment 2, music file: hiphop). It is difficult to model the compression attack in gen eral. In this paper, the framework of watermark optimi zation is shown on th e viewpoint of realizing good qu al ity and high tolerance to MP3 compression. Because MP3 compression is one of the easiest attacks to water mark, we have selected it as an example. In addition, the high tolerance of watermark by the proposed method to each of AAC and WMA compression, which are repre sentative audio compression techniques, is also shown in this paper. 8. Conclusions A method for embedding digital watermark using DWT and GA to realize high tolerance to compression by MP3 is proposed. The proposed method enables us to embed digital watermark in a nearoptimum manner for each music audio clip. Moreover, the nearoptimum solution for the original problem P and partial problem P , and the initial pattern of digital watermark for are used as secret keys in extracting the digital watermark. The experimental results show that the proposed method generates watermarked audio of good quality and high tolerance to MP3 compression. P 9. Acknowledgements The authors would like to thank Associate Professor H. Okuhara of the Gr aduate School of Osaka University for his valuable advice on this research. The authors would also like to thank Associate Professor M. Tabuse of Kyoto Prefectural University for his valuable support of this research. 10. References [1] D. Kirovski and H. S. Malvar, “SpreadSpectrum Water marking of Audio Signals,” IEEE Transactions on Signal Processing, Vol. 51, No. 4, April 2003, pp. 10201033. doi:10.1109/TSP.2003.809384 [2] I.K. Yeo and H. J. Kim, “Modified Patchwork Algorithm: A Novel Audio Watermarking Scheme,” IEEE Transac tions on Speech and Audio Processing, Vol. 11, No. 4, July 2003, pp. 381386. doi:10.1109/TSA.2003.812145 [3] S. Wu, J. Huang, D. Huang and Y. Q. Shi, “Efficiently SelfSynchronized Audio Watermarking for Assured Au dio Data Transmission,” IEEE Transactions on Broad casting, Vol. 51, No. 1, March 2005, pp. 6976. doi:10.1109/TBC.2004.838265 [4] X. Y. Wang and H. Zhao, “A Novel Synchronization Invariant Audio Watermarking Scheme Based on DWT and DCT,” IEEE Transactions on Signal Processing, Vol. 54, No. 12, December 2006, pp. 48354840. doi:10.1109/TSP.2006.881258 [5] S. Xiang and J. Huang, “HistogramBased Audio Wa terMarking against TimeScale Modification and Crop ping Attacks,” IEEE Transactions on Multimedia, Vol. 9,
112 S. MURATA ET AL. No. 7, November 2007, pp. 13571372. doi:10.1109/TMM.2007.906580 [6] S. Kirbiz, A. N. Lemma, M. U. Celik and S. Katzenbeis ser, “DecodeTime Forensic Watermarking of AAC Bit streams,” IEEE Transactions on Information Forensics and Security, Vol. 2, No. 4, December 2007, pp. 683696. doi:10.1109/TIFS.2007.908194 [7] D. J. Coumou and G. Sharma, “Insertion, Deletion Codes with FeatureBased Embedding: A New Paradigm for Watermark Synchronization with Applications to Speech Watermarking,” IEEE Transactions on Information Fo rensics and Security, Vol. 3, No. 2, June 2008, pp. 153165. doi:10.1109/TIFS.2008.920728 [8] S. Xianga, H. J. Kimb, and J. Huanga, “Audio Water marking Robust against TimeScale Modification and MP3 Compression,” Signal Processing, Vol. 88, No. 10, October 2008, pp. 23722387. doi:10.1016/j.sigpro.2008.03.019 [9] X. Y. Wang, P. P. Niu and H. Y. Yang, “A Robust, Digi talAudio Watermarking Method,” IEEE Multimedia, Vol. 16, No. 3, July 2009, pp. 6069. doi:10.1109/MMUL.2009.44 [10] N. K. Kalantari, M. A. Akhaee, S. M. Ahadi and H. Amindavar, “Robust Multiplicative Patchwork Method for Audio Watermarking,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 6, Au gust 2009, pp. 11331141. [11] X. Y. Wanga, P. P. Niub and H. Y. Yangb, “A Robust Digital Audio Watermarking Based on Statistics Charac teristics,” Pattern Recognition, Vol. 42, No. 11, Novem ber 2009, pp. 30573064. [12] K. Yamamoto and M. Iwakiri, “RealTime Audio Wa termarking Based on Characteristics of PCM in Digital Instrument,” Journal of Information Hiding and Multi media Signal Processing, Vol. 1, No. 2, April 2010, pp. 5971. [13] D. Goldberg, “Genetic Algorithm in Search, Optimization, and Machine Learning,” AddisonWesley, Reading, Bos ton, 1989. [14] J. H. Holland, “Adaptation in Natural and Artificial Sys tems,” The University Michigan Press, Ann Arbor, 1975, and MIT Press, Cambridge, 1992. [15] M. Shino, Y. Choi and K. Aizawa, “Wavelet Domain Digital Watermarking Based on ThresholdVariable De cision,” Technical Report of IEICE, DSP200086, in Ja panese, Vol. 100, No. 325, September 2000, pp. 2934. [16] D. Inoue and Y. Yoshitomi, “Watermarking Using Wave let Transform and Genetic Algorithm for Realizing High Tolerance to Image Compression,” Journal of the Insti tute of Image Electronics Engineers of Japan, Vol. 38, No. 2, March 2009, pp. 136144. [17] M. Tanaka and Y. Yoshitomi, “Digital Audio Water marking Method with MP3 Tolerance Using Genetic Al gorithm,” Proceedings of the 2006 IEICE General Con ference, Tokyo, 21 March 2006, p. 182. [18] M. Tanaka and Y. Yoshitomi, “Digital Audio Water marking Method with MP3 Tolerance Using Genetic Al gorithm,” Proceedings of 11th CzechJapan Seminar on Data Analysis and Decision Making under Uncertainty, Sendai, 1517 September 2008, pp. 8185. [19] R. Tachibana, “Capacity Analysis of Audio Watermark ing Based on Logarithmic Amplitude Modification against Additive Noise,” IEICE Transactions on Funda mentals of Electronics, Communications and Computer Sciences, in Japanese, Vol. J86A, No. 11, November 2003, pp. 11971206. [20] F. Glover, “Future Paths for Integer Programming and Links to Artificial Intelligence,” Computers and Opera tions Research, Vol. 13, No. 5, May 1986, pp. 533549. doi:10.1016/03050548(86)900481 [21] M. Goto, H. Hashiguchi, T. Nishi mura and R. Oka , “R WC Music Database: Database of CopyrightCleared Musical Pieces and Instrument Sounds for Research Purposes,” Transactions of IPSJ, in Japanese, Vol. 45, No. 3, March 2004, pp. 728738 Copyright © 2011 SciRes. JIS
