Paper Menu >>
Journal Menu >>
Journal of Information Security, 2011, 2, 59-68 doi:10.4236/jis.2011.22006 Published Online April 2011 (h t tp : // ww w .scirp.o rg/journal /j is) Copyright © 2011 SciRes. JIS An Authentication Method for Digital Audio Using a Discrete Wavelet Transform Yasunari Yoshitomi, Taro Asada, Yohei Kinugawa, Masayoshi Tabuse Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan E-mail: yoshitomi@kpu.ac.jp Received November 11, 2010; revised February 15, 2011; accepted February 26, 2011 Abstract Recently, several digital watermarking techniques have been proposed for hiding data in the frequency do- main of audio files in order to protect their copyrights. In general, there is a tradeoff between the quality of watermarked audio and the tolerance of watermarks to signal processing methods, such as compression. In previous research, we simultaneously improved the performance of both by developing a multipurpose opti- mization problem for deciding the positions of watermarks in the frequency domain of audio data and ob- taining a near-optimum solution to the problem. This solution was obtained using a wavelet transform and a genetic algorithm. However, obtaining the near-optimum solution was very time consuming. To overcome this issue essentially, we have developed an authentication method for digital audio using a discrete wavelet transform. In contrast to digital watermarking, no additional information is inserted into the original audio by the proposed method, and the audio is authenticated using features extracted by the wavelet transform and characteristic coding in the proposed method. Accordingly, one can always use copyright-protected original audio. The experimental results show that the method has high tolerance of authentication to all types of MP3, AAC, and WMA compression. In addition, the processing time of the method is acceptable for every- day use. Keywords: Authentication, Audio, Copyright Protection, Tolerance to Compression, Wavelet Transforms 1. Introduction Recent progress in digital media technology and distribu- tion systems, such as the Internet and cellular phones, has enabled consumers to easily access, copy, and modify di- gital content, such as electric documents, images, audio, and video. Therefore, techniques to protect the copyrights for digital data and prevent unauthorized duplication or tampering are urgently needed. Digital watermarking (DW) is a promising method of copyright protection for digital data. Several studies have investigated audio DW [1-12]. Two important properties of audio DW are inaud ibilit y o f DW-introd uced d istor tio n, and robustness to signal processing methods, such as compression. In addition to these properties, the data rate and complexity o f the DW have attracted attention when discussing the performance of a DW. We have attempted to develop a method in which 1) the DW can be sufficiently extracted from the watermar- ked audio, even after compression, and 2) the quality of the audio remains high after embedding the DW. How- ever, there is generally a tradeoff between these two pro- perties. Therefore, we focus on this tradeoff and attempt to overcome this critical difficulty by optimizing the po- sitions of the DW in the frequency domain. Recently, di- gital audio distributed over the Internet and cellular phone systems is often modified by compression, which is one of the easiest and most effective ways to defeat a DW without significantly deteriorating the quality of the au- dio. In previous research, we simultaneously improved both the extraction performance of the DW and the quality of the DW-contained audio by developing a multipurpose optimization problem for deciding the positions of the DW in the frequency domain and obtaining a near-opti- mum solution for the problem using a discrete wavelet transform (DWT) and a genetic algorithm (GA) for reali- zing high tolerance to MP3 compression, which is the most popular compression technique [13,14]. Our method enabled us to embed the DW in an almost optimal man- ner within any digital audio. However, obtaining the near-optimum solution was very time consuming. In the Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 60 present study, to overcome this issue essentially, we have developed an authentication method for digital audio to protect the copyrights. In contrast to the DW, no addi- tional information is inserted into the original audio by the proposed method, and the digital audio is authenti- cated using features extracted using the DWT and char- acteristic coding of the proposed method. This paper presents an analysis of the performance of the method. 2. Wavelet Transform The original audio data 0 k s , which is used as the level-0 wavelet decomposition coefficient sequence, where k denotes the element number in the data, are decompo- sed into the multi-resolution representation (MRR) and the coarsest approximation by repeatedly applying the DWT. The wavelet decomposition coefficient sequence j k s at level j is decomposed into two wavelet de- composition coefficient sequences at level 1j by using (1) and (2): 12 j j knkn n s ps (1) 12 j j knkn n wqs (2) where k p and k q denote the scaling and wavelet se- quences, respectively, and 1j k w denotes the develop- ment coefficient at level 1j. The development coeffi- cients at level J are obtained using (1) and (2) itera- tively from0j to 1jJ. Figure 1 shows the pro- cess of multi-resolution analysis by DWT. In the present study, we use the Daubechies wavelet for the DWT, according to the references [14,15]. As a result, we obtain the following relation b etween k p and k q: 1 1k kk qp (3) We select the Daubechies wavelet because we com- pared the results by the proposed method with those by our retorted method [13,14], where the Daubechies wa- velet was used for the DW. Figure 1. Multi-resolution analysis by the DWT. 3. Proposed Authentication Algorithm It is known that the histogram of the wavelet coefficients of each domain of the MRR sequences has a distribution which is centered at approximately 0 when the DWT is performed on audio data, as shown in Figure 2 [13]. In the present study, the above phenomenon is exploited for the authentication of the audio signal. The procedure is described below. 3.1. Selective Coding 3.1.1. Setting of Parameters For the coding of the audio, we obtain the histogram of the wavelet coefficients V at the selected level of an MRR sequence (see Figure 3). Like the DW techniques for images [15,16] and digital audio [13,14], we set the following coding parameters: The values of Th minus and ()Th plus (see Fig- ure 3) are chosen such that the non-positive wavelet co- efficients (m S in total frequency) are equally divided into two groups by Th minus, and th e positive wav elet coefficients ( p S in total frequency) are equally divided into two groups by ()Th plus. Next, the values of 1T, 2T, 3T, and 4T, the parameters for controlling the authentication precision, are chosen to satisfy the fol- lowing conditions : 1) 12034TTh minusTTThplusT. 2) The value of 1T S, the number of wavelet coeffi- cients in 1,TThminus, is equal to 2T S, the number of wavelet coefficients in ,2Th minusT . In short, 12TT SS . 3) The value of 3T S, the number of wavelet coeffi- cients in 3,TThplus , is equal to 4T S, the number ofwavelet coefficients in ,4Th plusT. In short, 34TT SS . Figure 2. Histogram of the wavelet coefficients of an MRR sequence at level 3 (jazz) [13]. Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 61 Figure 3. Schematic diagram of the histogram of MRR wa- velet coefficients. 4) 13TmT p SS SS. In the present study, the values of both 1Tm SS and 1Tp SS are set to 0.3, which was determined experi- mentally. 3.1.2. Domain Segmentation in a Wavelet Coefficient Histogram In preparation of a coding for authentication, the proce- dure separates the wavelet coefficients V of an MRR sequence into five sets (hereinafter referred to as A, B, C, D, and E), as shown in Figure 4, using the following criteria: ,1 SC A VV VVT , ,1 2 SC BVVVTVT , ,2 3 SC CVVVT VT , ,3 4 SC DVVVTVT , ,4 SC EVVVT V , where SC V is the set of wavelet coefficients in the ori- ginal audio file. 3.1.3. Selective Coding Algorithm The wavelet coefficients of an MRR sequence are coded according to the following rules, in which i V denotes one of wavelet coefficients: When i VC, flag i f is set to be 1, and bit i b is set to be 0. When i VAE, flag i f is set to be 1, and bit i b is set to be 1. When i VBD , flag i f is set to be 0, and bit i b is set to be 0.5. For the authentication of the digital audio, we use a Figure 4. Five sets (A, B, C, D, and E) are described by a histogram of wavelet coefficients V of an MRR sequence for the assignment of a bit. code C (hereinafter referred to as an original code), which is the sequence of i b defined above. For the co- ding and authentication, we assign a sequence number and a flag for each wavelet coefficient. The flag 1 i f for a i V means that the i V is assigned a bit (i b=0 or 1) for a coding. The flag 0 i f for a i V provides that the i V is not assigned a bit of 0 or 1: i b is externally set to be 0.5 as an arbitrary constant and the value of i b does not influence the performance of the proposed method described in Section 3.2. The exclusion of all i V be- longing to the sets B and D, where the magnitude of the i V are intermediate, from the objects for coding is a no- vel feature of the present study. 3.2. Authentication 3.2.1. Setting of Parameters We authenticate not only an original digital audio file but also a signal-processed version. Compression, one exam- ple of signal processing, is often applied to digital audio for the purposes of distribution via the Internet or for saving on a computer. Through the same procedure as described in Section 3.1, we applied the DWT to digital audio and obtained a histogram of wavelet coeffcients V at the same level of the DWT as that of the coding for the original audio file, which is described in Section 3.1. Then, we set the authentication parameters as fol- lows: The values of Th minus and Th plus (see Fig- ure 5) are chosen such that the non-positive wavelet co- efficients (m S in total frequency) are equally divided into two groups by Th minus , and the positive wave- let coefficients ( p S in total frequency) are equally di- vided into two groups by Th plus . Next, the values of 1T , 2T , 3T , and 4T , the parameters for control- ling the authentication precision, are chosen to satisfy the Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 62 following conditions: 1) 12034TTh minusTTThplusT . 2) The value of 1T S, the number of wavelet coeffi- cients in 1,TTh minus , is equal to 2T S, the number of wavelet coefficients in ,2Th minusT . In short, 12TT SS . 3) The value of 3T S, the number of wavelet coeffici- ents in 3,TThplus , is equal to 4T S, the number of wavelet coefficients in ,4Thplus T . In short, 34TT SS . 4) 13TmT p SS SS . In the present study, the values of both 1Tm SS and 3Tp SS are set to be 0.3, the same as the settings used for the coding for the original audio file, which is de- scribed in Section 3.1. 3.2.2. Domain Segmentation in a Wavelet Coefficient Histogram In the preparation of a coding for authentication, the pro- cedure separates the wavelet coefficients V of an MRR sequence into three sets (hereinafter referred to as F, G, and H), as shown in Figure 5, using the following crite- ria: , AC F VVVVTh minus , , AC GV VVThminusVThplus , , AC H VVVTh plusV , where AC V is the set of wavelet coefficients of the a target audio file for making the code for authentication. Figure 5. Three sets (F, G, and H), indicated on the histo- gram, of MRR wavelet coefficients used for the authentica- tion. 3.2.3. Authentication Algorithm The wavelet coefficients of an MRR sequence are coded according to the following rules, in which i V denotes one of wavelet coefficients: When 1 i f and i VG , bit i b is set to be 0. When 1 i f and i VFH , bit i bis set to be 1. When 0 i f , bit i b is set to be 0.5. When 0 i f , i b is externally set to be 0.5 as an ar- bitrary constant and the value of i b does not influence the performance of the proposed method described be- low. For the authentication of the digital audio, we use the code C (hereinafter referred to as an authentication code), which is the sequence of i b defined above. The authentication ratio A R (%) is defined by the follow- ing: 1 1 100 1 N iii iN i i f bb AR f (4) where N is the number of wavelet coefficients assign- ed flags in the coding for the original audio file, which is described in Section 3.1. According to (4), the values of neither i b nor i b influence the value of A R in the case that 0 i f , which provides that the corresponding i V is not assigned a bit of 0 or 1 for the coding for the original audio file. For using the proposed method, we need store flags i f and an original code C of each audio file whose copy right we should protect. In calculating (4) for the authentication of an original audio file, we do not use an original audio file but the flags i f and the original code C of the original audio file. 4. Experiment In this section, we describe computer experiments and their results for evaluating the performance of the propo- sed method. 4.1. Method The experiment was performed in the following compu- tational environment: the personal computer was a Dell Dimension 8300 (CPU: Pentium IV 3.2 GHz; main me- mory: 2.0 GB); the OS was Microsoft Windows XP; the development language was Microsoft Visual C++ 6.0. Five music audio files, which were composed of the first entries in the five genre categories: classical, jazz, popular, rock, and hiphop in the music database RWC for research purpose [17], were copied from CDs onto a personal computer as WAVE files with the following specifications: 44.1 kHz, 16 bits, and monaural. For each Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 63 music audio file selected from the database, one 10-se- cond clip of music audio (hereinafter referred to as an original music audio clip) was extracted starting at 1 mi- nute from the beginning of the audio file and saved on a personal computer. In addition, for each of the five music audio files mentioned above, several 10-second audio clips were extracted by shifting the start-time 1 second at a time from the beginning of the audio file and were saved on a personal computer for use in evaluating the performance of the proposed method. For the purpose of evaluating the tolerance of authentication to compression, MP3, AAC, and WMA compression systems were each used to compress the original music audio clip to bit rates of 64, 96, and 128 kbp s. The process of the experi- ment was as follows: obtain the code of the original WAVE file, compress the file by MP3, AAC, and WMA, and then convert the compressed files into the WAVE files used for the authentication. For the DWT, we use Daubechies wavelets. Level 8 was chosen as the standard for the DWT based on an analysis of preliminary experiments, and this level was used for most of the experiments. The influence of the level of the DWT on the authentication ratio was also analyzed as part of the experiments. Instead of the Dell Dimension 8300 (CPU: Pentium IV 3.2 GHz; main memory: 2.0 GB), Dell Dimension DXC051 (CPU: Pentium IV 3.0 GHz, memory: 1.0 GB) is used only for the comparison with the reported study [13,14]. 4.2. Results and Discussion 4.2.1. Au thenticatio n Pr ocess Table 1 illustrates the process of authentication of audio clips. The jumps in the wavelet coefficient number, such as from 573 to 578, indicate that the intervening wavelet coefficients belong to either the set B or D, which are out of assignment of a bit to 0 or 1 for the coding for the ori- ginal music audio clip. The authentication ratio A R de- fined by (4) was 6 7100 in the case of Table 1, where the bits of the music audio clip after MP3 com- pression were equal to those of the original music except for wavelet coefficient number 57 9. 4.2.2. Robustness to Comp ression Whenever we applied the proposed method to the five original music audio clips, the authentication ratio was 100%. When we applied it to several music clips com- pressed by MP3, AAC, and WMA, the authentication ratio was at or near 100% (Table 2). 4.2.3. Authenticatio n Ra ti os for Other Non-Signal-Processed Music Audio Clips The purpose of authentication is to protect the copyright Table 1. Authentication process (hiphop). Wavelet coeffi- cients No.Original After MP3 compression (128kbps) Bit correspondence set bit set bit 572 C 0 G 0 Yes 579 C 0 F 1 No 580 C 0 G 0 Yes 584 C 0 G 0 Yes 588 E 1 H 1 Yes 589 C 0 G 0 Yes 590 C 0 G 0 Yes Table 2. Authentication tolerance to compression (%). Compression MethodBit rate (kbps) ClassicalJazz Popular RockHiphop 128 99.86 99.86 99.86 100 99.57 96 99.86 100 99.86 100 99.86 MP3 64 99.72 99.72 99.86 100 98.28 128 100 100 100 100 97.55 96 99.86 100 99.86 100 100 AAC 64 100 100 99.14 99.7299.00 128 100 100 100 100 100 96 100 100 100 100 100 WMA 64 100 99.86 100 100 100 (%) on audio data. When the music audio file targeted for being authenticated was different from that used for ma- king the code of the original music audio clip, the au- thentication ratios A R defined by (4) were about 50% (more precisely, they fell in the range 44.09 to 55.62%), which was about half of the authentications ratios when authenticating the same clip as the original music audio clip (100% in all cases in this experiment; see Table 3). An authentication ratio of 50% corresponds to the value in the case that randomly generated bits are used for i b and/or i b in (4). Accor ding ly, th e pr oposed meth od dis- tinguishes an original music audio clip from each of the other four used in this experiment. Using the original code obtained from the original mu- sic audio clip, which was the 10-second clip extracted staring at 1 minute from the beginning of each of the five music audio files, we calculated the authentication ratio to the 10-second clips extracted by shifting the start-time for the clip 1 second at a time. For each of the original music audio clips, the authentication ratio was 100% when an original code was used as the authentication code (Fi g u re 6 ). In Figure 6, the point 60 seconds on the horizontal axis corresponds to the case that the original code is used as the authentication code. Not including these cases, the authentication ratio for jazz, popular, and rock music audio fell mostly in the 40 to 60% range. In contrast, the authentication ratio for classical and hiphop varied according to the start time. Not including the case of using the original code for the authentication code, the highest authentication ratio was 93.95%, which was ob- Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 64 Table 3. Authentication ratio (%) in all combinations of original and authentication. Original Clas- sical Jazz Popular RockHi- phop Classical 100 44.2247.98 52.8953.61 Jazz 55.62 100 49.86 51.0146.54 Popular 44.09 53.89100 49.5752.88 Rock 47.69 50.8649.42 100 48.13 Authenti- cation Hiphop 48.41 50.4347.55 51.44100 (%) Figure 6. Authentication ratios using clips shifted 1 second at a time for each of the five selected audio clips. served for hiphop. Accordingly, the threshold of the au- thentication ratio for judging authentication of an origi- nal music audio clip should be about 95%. As the au- thentication ratios to music clips extracted from the mu- sic audio files, from which the original music audio clip were obtained, stayed under 95% (again, excluding the cases of using the identical clip), we conclude that the probability of getting an authentication ratio above 95% would be small if we applied the proposed method to other music selected from the database. In other words, we propose that music audio be judged as authenticated when the file gives an authentication ratio of 95% or higher for a certain clip taken from a music audio file. When we used 95% as a threshold for the authentication ratio, both the false negative and positive rates for the authentication of the music audio clip were zero in the both cases shown in Table 3 and Figure 6. 4.2.4. Influence of DWT Level on Authentication Ratio All authentication ratios described above were obtained using a DWT at level 8. The tolerances of the authentica- tion ratio to signal processing by MP3, AAC, and WMA at DWT levels of 2 to 8 with bit rates of 128, 96, and 64 kbps are shown for each bit rate in Tables 4-6, respecti- vely. The authentication ratio does not noticeably change at bit rates of from 64 to 128 kbps. The authentication ratio tends to be slightly higher with increases in the DWT level of the original coding, which is the same as that of the authentication coding. For DWT levels of 7 or 8, the authentication ratio exceeds 95% for all settings of MP3, AAC, and WMA compression tested. The lowest authentication ratio, 94.57%, occurred for DWT level 6 applied to the hiphop audio clip compressed by AAC with a bit rate of 128 kbps. The number of data of the original music audio clip, which is treated as the amount of data at DWT level 0, was 441,000. The number of wavelet coefficients of MRR sequences was reduced by half for an increase of DWT level by one, meaning that the number of 0 or 1 bits in both the original and the au- thentication coding was also reduced by half. 4.2.5. Comparison with Watermarking There is generally a tradeoff between 1) the tolerance of the DW to signal processing, such as compression, and 2) the quality of the music audio after embedding the DW. In other words, to improve the first property tends to cau- se a deterioration of the second property. We had over- come this critical difficulty of the DW by optimizing the positions of the DW in the frequency domain [13,14], [18-20]. However, it took much time to get the condition for embedding the DW by the reported method. Figure 7 shows the relationship between the quality of music audio and the detection rate of the DW after MP3 compression, using the jazz clip as the original music au- Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 65 Table 4. Authentication ratio (%) of music audio com- pressed by MP3, AAC, and WMA at DWT levels of 2 to 8 with a bit rate of 128 kbps. (1) Classical Signal processing DWT level MP3 AAC WMA 2 99.07 99.41 99.99 3 99.67 99.69 100 4 99.99 99.9 100 5 99.98 99.98 100 6 100 100 100 7 100 100 100 8 99.86 100 100 (2) Jazz Signal processing DWT level MP3 AAC WMA 2 98.63 99.33 99.95 3 99.64 99.8 100 4 99.97 99.99 100 5 100 100 99.98 6 100 100 100 7 100 100 99.93 8 99.86 100 100 (3) Popular Signal processing DWT level MP3 AAC WMA 2 92.78 95.08 99.95 3 95.05 98.5 100 4 97.37 99.7 99.99 5 98.53 99.98 100 6 99.71 100 100 7 100 100 100 8 99.86 100 100 (4) Rock Signal processing DWT level MP3 AAC WMA 2 94.2 95.88 99.81 3 96.72 98.79 99.98 4 98.89 99.85 100 5 99.64 99.98 100 6 100 100 99.96 7 100 100 100 8 100 100 100 (5) Hiphop Signal processing DWT level MP3 AAC WMA 2 95.65 61.31 99.6 3 96.27 72.89 99.69 4 96.85 84.3 99.69 5 98.19 91.64 99.89 6 99.67 94.57 100 7 99.93 96.67 100 8 99.57 97.55 100 (%) Table 5. Authentication ratio (%) of music audio com- pressed by MP3, AAC, and WMA at DWT levels of 2 to 8 with a bit rate of 96 kbps. (1) Classical Signal processing DWT level MP3 AAC WMA 2 95.95 98.76 99.96 3 96.92 99.08 99.99 4 97.89 99.48 100 5 98.53 99.64 100 6 99.53 99.78 100 7 99.71 100 100 8 99.71 99.86 100 (2) Jazz Signal processing DWT level MP3 AAC WMA 2 96.48 98.62 99.97 3 98.42 99.39 100 4 99.79 99.95 100 5 100 99.98 99.98 6 100 100 100 7 100 100 99.93 8 100 99.86 100 (3) Popular Signal processing DWT level MP3 AAC WMA 2 85.61 89.7 99.62 3 94.74 94.7 99.9 4 97.43 98.4 99.85 5 98.79 99.42 99.98 6 99.89 100 100 7 99.86 100 100 8 99.86 100 100 (4) Rock Signal processing DWT level MP3 AAC WMA 2 89.79 92.56 99.33 3 95.37 96.3 99.78 4 98.58 98.94 99.98 5 99.69 99.8 100 6 99.96 99.89 99.96 7 100 100 100 8 100 100 100 (5) Hiphop Signal processing DWT level MP3 AAC WMA 2 92.76 94.86 99.46 3 94.13 96.16 99.54 4 95.38 97.43 99.6 5 97.01 98.84 99.84 6 98.91 99.67 100 7 99.57 99.93 100 8 99.86 100 100 (%) Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 66 Table 6. Authentication ratio (%) of music audio com- pressed by MP3, AAC, and WMA at DWT levels of 2 to 8 with a bit rate of 64 kbps. (1) Classical Signal processing DWT level MP3 AAC WMA 2 98.18 98.08 99.88 3 98.87 98.51 99.96 4 99.54 99 99.98 5 99.91 98.73 100 6 99.96 99.13 100 7 99.93 99.42 100 8 99.86 100 100 (2) Jazz Signal processing DWT level MP3 AAC WMA 2 92.41 97.23 99.85 3 95.45 98.45 100 4 98.39 99.54 100 5 99.93 99.95 99.98 6 100 99.89 100 7 100 99.86 99.93 8 99.71 100 100 (3) Popular Signal processing DWT level MP3 AAC WMA 2 73.87 81.86 96.73 3 87.94 90.43 99.44 4 95.21 96.02 99.42 5 98.26 97.77 99.84 6 99.78 98.8 100 7 100 99.57 100 8 99.86 99.14 100 (4) Rock Signal processing DWT level MP3 AAC WMA 2 82.67 88.43 96.12 3 91.55 93.67 99.14 4 97.09 97.56 99.69 5 99.4 98.91 99.95 6 99.93 99.17 99.96 7 100 99.35 100 8 100 99.71 100 (5) Hiphop Signal processing DWT level MP3 AAC WMA 2 82.13 92.95 97.86 3 86 94.35 98.85 4 89.31 95.82 98.78 5 92.62 96.7 99.31 6 97.43 97.9 99.86 7 98.48 98.26 99.86 8 98.27 98.99 99.86 (%) Figure 7. Relationship between sound quality after embed- ding the DW and detection rate of the DW [13]. dio clip and 96-kbps MP3 compression [13]. The same original music audio clip was also used in the present ex- periment. In order to achieve a high detection rate of the DW and high quality of the original music audio clip af- ter embedding the DW, the reported method using a ge- netic algorithm was effective, as shown in Figure 7. In the present study, the authentication ratio for the same original music audio clip as that used for getting the re- sults of Figure 7 was 100%, and a deterioration in the quality of the original music audio clip did not occur, which corresponds to an infinite value on the horizontal axis shown in Figure 7. Moreover, it took 2.41 104 to 3.20 104 s and 1.59 102 to 1.85 103 s (with the personal computer referred to as PC2), respectively, to embed the DW using as the formula of the optimization problem the original problem and the partial problem (which had a much smaller sear- ch space) [14], while it took 2.05 10−1 to 2.10 10−1 s (with the personal computer referred to as PC2) and 2.03 10−1 to 2.19 10−1 s (with the personal computer re- ferred to as PC1) for one coding for an original music audio clip in the present study (Tab le 7). In th e reported study [13,14], the experiment was performed in the fol- lowing computation environment: the personal computer was a Dell Dimension DXC051 (CPU: Pentium IV 3.0 GHz; main memory: 1.0 GB), which is referred to as PC2 in Table 7; the OS was Microsoft Windows XP; the development language was Microsoft Visual C++ 6.0. The average time for one coding for an original music audio clip was less than 10−5 times that to embed the DW using as the formula of the optimization problem the original problem, and less than 10−3 times that to embed the DW using as the formula of the optimization problem the partial problem in the reported study. In addition, no deterioration in quality of the original music audio clip ever occurred using the proposed method. These two factors strongly suggest that the proposed method is far superior to th e re ported method. Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 67 Table 7. Comparison of time(s) to obtain a coding of the proposed method or to embed the DW using as the formula of the optimization problem the original problem and the partial problem of the reported study [14]. DW Coding Original problem Partial problem PC1 PC2 PC2 PC2 Classical 2.19 × 10−1 2.05 × 10−1 3.20 × 104 1.04 × 103 Jazz 2.03 × 1 0 −1 2.06 × 10−1 2.52 × 104 5.12 × 102 Popular 2.19 × 10−1 2.10 × 10−1 2.41 × 104 1.85 × 103 Rock 2.19 × 10−1 2.08 × 10−1 2.44 × 104 2.02 × 1 02 HipHop 2.19 × 10−1 2.08 × 10−1 2.78 × 104 1.59 × 102 Average 2.16 × 10−1 2.07 × 10−1 2.67 × 104 7.5 3 × 102 PC1: Dell Dimension 8300 (CPU: Pentium IV 3.2 GHz; main memory: 2.0 GB); PC2: Dell Dimension DXC051 (CPU: Pentium IV 3.0GHz; main memory: 1.0 GB). 5. Conclusions We have developed an authentication method for music audio using a DWT. When we applied this method to five original music audio clips, the authentication ratio was 100%. Moreover, for music audio data compressed by MP3, AAC, or WMA, the authentication ratio was always at or near 100%. We used flags for distinguishing the wavelet coefficients used for storing a 0 or 1 bit of the original and authentication coding from other coefficien- ts. The method never deteriorated the quality of the orig- inal music audio because it does not change it. When a level 8 DWT was used, which was the standard in this experiment, the mean time for the coding for the original music audio clip was 1 2.16 10 s and that for the au- thentication was 1 2.22 10 s for a 10-second original music audio clip. We propose that a music audio file should be judged to be authenticated when the file gives a 95% or higher authentication ratio for a certain clip taken from the music audio file. For using the proposed method, we need to store in a data base 1) flags used for selective coding, and 2) an original code of each audio file whose copy right we should protect. In calculating the authentication ratio for the authentication of an original audio file, we do not need an original audio file but 1) the flags, and 2) the original code of the original audio file. 6. References [1] D. Kirovski and H. S. Malvar, “Spread-Spectrum Water- marking of Audio Signals,” IEEE Transactions on Signal Processing, Vol. 51, No. 4, 2003, pp. 1020-1033. doi:10.1109/TSP.2003.809384 [2] K. Yeo and H. J. Kim, “Modified Patchwork Algorithm: A Novel Audio Watermarking Scheme,” IEEE Transac- tions on Speech and Audio Processing, Vol. 11, No. 4, 2003, pp. 381-386. [3] S. Wu, J. Huang, D. Huang and Y. Q. Shi, “Efficiently Self-synchronized Audio Watermarking for Assured Au- dio Data Transmission,” IEEE Transactions on Broad- casting, Vol. 51, No. 1, 2005, pp. 69-76. doi:10.1109/TBC.2004.838265 [4] X. Y. Wang and H. Zhao, “A Novel Synchronization Invariant Audio Watermarking Scheme Based on DWT and DCT,” IEEE Transactions on Signal Processing, Vol. 54, No. 12, 2006, pp. 4835-4840. doi:10.1109/TSP.2006.881258 [5] S. Xiang and J. Huang, “Histogram-based Audio Water- marking against Time-Scale Modification and Cropping Attacks,” IEEE Transactions on Multimedia, Vol. 9, No. 7, November 2007, pp. 1357-1372. doi:10.1109/TMM.2007.906580 [6] S. Kirbiz, A. N. Lemma, M. U. Celik and S. Katzenbeis- ser, “Decode-Time Forensic Watermarking of AAC Bit- streams,” IEEE Transactions on Information Forensics and Security, Vol. 2, No. 4, 2007, pp. 683-696. doi:10.1109/TIFS.2007.908194 [7] D. J. Coumou and G. Sharma, “Insertion, Deletion Codes with Feature-Based Embedding: A New Paradigm for Watermark Synchronization with Applications to Speech Watermarking,” IEEE Transactions on Information Fo- rensics and Se curity, Vol. 3, No. 2 , 2008, pp. 153-165. doi:10.1109/TIFS.2008.920728 [8] S. Xianga, H. J. Kimb and J. Huanga, “Audio Water- marking Robust against Time-Scale Modification and MP3 Compression,” Signal Processing, Vol. 88, No. 10, 2008, pp. 2372-2387. doi:10.1016/j.sigpro.2008.03.019 [9] X. Y. Wang, P. P. Niu and H. Y. Yang, “A Robust, Digi- tal-audio Watermarking Method,” IEEE Multimedia, Vol. 16, No. 3, 2009, pp. 60- 69. doi:10.1109/MMUL.2009.44 [10] N. K. Kalantari, M. A. Akhaee, S. M. Ahadi and H. Amindavar, “Robust Multiplicative Patchwork Method for Audio Watermarking,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 6, 2009, pp. 1133-1141. [11] X. Y. Wanga, P. P. Niub and H. Y. Yangb, “A Robust Digital Audio Watermarking Based on Statistics Charac- teristics,” Pattern Recognition, Vol. 42, No. 11, 2009, pp. 3057-3064. doi:10.1016/j.patcog.2009.01.015 [12] K. Yamamoto and M. Iwakiri, “Real-Time Audio Wa- termarking Based on Characteristics of PCM in Digital Instrument,” Journal of Information Hiding and Multime- dia Signal Processing, Vol. 1, No. 2, 2010, pp . 59-71. [13] S. Murata, Y. Yoshitomi and H. Ishii, “Optimization of Embedding Position in an Audio Watermarking Method Using Wavelet Transform,” Autumn Research Presenta- tion Forums of ORSJ, Japanese, October 2007, pp. 210-211. [14] S. Murata, “Optimization of Embedding Position in an Audio Watermarking Method Using Wavelet Trans- form,” Master’s Thesis, Osaka University, Suita, Japa- nese, 2006, pp. 53. Y. YOSHITOMI ET AL. Copyright © 2011 SciRes. JIS 68 [15] D. Inoue and Y. Yoshitomi, “Watermarking Using Wavelet Transform and Genetic Algorithm for Realizing High Tolerance to Image Compression,” Journal of the IIEEJ, Vol. 38, No. 2, March 2009, pp. 136-144. [16] M. Shino, Y. Choi and K. Aizawa, “Wavelet Domain Digital Watermarking Based on Threshold-Variable De- cision,” Technical Report of IEICE, DSP2000-86, Japa- nese, Vol. 100, No. 325, September 2000, pp. 29-34. [17] M. Goto, H. Hashiguchi, T. Nishimura and R. Oka, “RWC Music Database: Database of Copyright-Cleared Musical Pieces and Instrument Sounds for Research Pur- poses,” Transactions of IPSJ, Japanese, Vol. 45, No. 3, March 2004, pp. 728-738. [18] M. Tanaka and Y. Yoshitomi, “Optimization Problem for Embedding Position in an Audio Watermarking Based on Logarithmic Amplitude Modification for Realizing High Tolerance to MP3 Compression,” Autumn Research Presentation Forums of ORSJ, Japanese, September 2006, pp. 70-71. [19] M. Tanaka and Y. Yoshitomi, “Digital Audio Water- marking Method with MP3 Tolerance Using Genetic Al- gorithm,” Proceedings of 11th Czech-Japan Seminar on Data Analysis and Decision Making Under Uncertainty, Sendai, September 2008, pp. 81-85. [20] R. Tachibana, “Capacity Analysis of Audio Watermark- ing Based on Logarithmic Amplitude Modification against Additive Noise,” IEICE Transactions on Funda- mentals of Electronics, Communications and Computer Sciences, Japanese, Vol. J86-A, No. 11, November 2003, pp. 1197-1206. |