Journal of Information Security
Vol.4 No.1(2013), Article ID:27544,7 pages DOI:10.4236/jis.2013.41003

Peak-Shaped-Based Steganographic Technique for MP3 Audio

Raffaele Pinardi1, Fabio Garzia1,2, Roberto Cusani1

1Department of Information, Electronics and Telecommunications Engineering Sapienza University of Rome, Rome, Italy

2Wessex Institute of Technology, Southampton, UK


Received September 4, 2012; revised October 12, 2012; accepted October 22, 2012

Keywords: Peak-Shaped Steganography; MP3 Steganography


The aim of this work is the development of a steganographic technique for the MP3 audio format, which is based on the Peak Shaped Model algorithm used for JPEG images. The proposed method relies on the statistical properties of MP3 samples, which are compressed by a Modified Discrete Cosine Transform (MDCT). After the conversion of MP3, it’s possible to hide some secret information by replacing the least significant bit of the MDCT coefficients. Those coefficients are chosen according to the statistical relevance of each coefficient within the distribution. The performance analysis has been made by calculating three steganographic parameters: the Embedding Capacity, the Embedding Efficiency and the PSNR. It has been also simulated an attack with the Chi-Square test and the results have been used to plot the ROC curve, in order to calculate the error probability. Performances have been compared with performances of other existing techniques, showing interesting results.

1. Introduction

Steganography techniques are used to hide secret information in the most common audio/video formats. There are three main different kinds of audio/video steganography [1]:

1) insertion steganography, where the secret message is inserted in the cover object;

2) substitution steganography, where some bits of the cover object are substituted with the bits of the secret message;

3) constructing steganography, where an ad hoc cover object is generated to contain the secret message.

The developed technique is based on LSB steganography, a substitution steganography, that replaces the least significant bit of the audio/video file with the secret message bit. This method is very simple to implement and does not allow the human eye/hear to perceive significant changes in the stego object. In Figure 1 an example is shown: the letter “A” is embedded in the audio samples replacing the least significative bit.

However, this technique has lower resistance to the statistical attacks since with a proper steganalysis it is possible to detect the secret information. To solve this problem, Model Based Steganography can be used [2]. The cover object is divided into two parts, and, to embed the secret information. The first part is the most relevant, and it will not be modified. The second one is less relevant with respect the other and it will contain the secret message. The division is based on the statistical model of the cover object.

After the embedding process, contains the secret information, called . The union between this part and is the stego object.

The purpose of this paper is to present a new steganographic algorithm for the MP3 [3-5] format based on the change, in the Peak Shaped Based for the JPEG [6], of the discrepancy equation, adapting it to vectors and studying the statistical distribution of the MDCT [3-5] coefficients. In the following the analysis of the performance of the proposed algorithm is shown and it is demonstrated that this method does not introduce audible distortion when the signal audio is reproduced. Further, it is demonstrated that this method does not create relevant statistical differences in the samples distribution, showing its suitability for steganographic applications and its robustness to steganographic attacks.

2. The MP3 Format and Compression

The MP3 format was born to have good audio quality and low file size [3,4]. An audio file is first converted into a digital format, with a sampling, and then it is processed with the human psychoacoustic model [3-5].

Figure 1. LSB example: the letter “A” is inserted into an audio file by replacing the least significant bit.

With this model it is possible to delete the frequency that the human ear can’t hear; an algorithm characterized by this properties is called “lossy” because it deletes some information. This happens with a compression, that uses the MDCT [3,5], the Modified Discrete Cosine Transform, described in the following equation:


that is another version of the DCT-II used in the JPEG format, where:


and  is a window (it is possible to choose different kinds of windows).

The audio samples are processed in a MDCT filterbank. The audio sequence is divided in “frame”, each frame contains M samples and is processed as in Figure 2.

3. Peak Shaped Based Steganography and MP3

3.1. Peak Shaped Based Steganography

The Peak Shaped Based (PSB) [6] steganography has been used for the JPEG format. This is a method based on:

Ÿ  LSB steganography;

Ÿ  Least significant bit;

Ÿ  Model Based steganography [2].

The JPEG coefficient are, for first, divided by group, indicated with g(b) [6]:


and by offset, indicated with O(b) [6]:


where b is the JPEG coefficient and .

Figure 2. MDCT filterbank.

The PSB algorithm is based on an assumption, from the properties of the JPEG coefficients statistical distribution processed by the algorithm F5, which is [6]:



where h(b) indicates the histogram of the coefficient b. With this assumption is possible to calculate a probability, called “offset probability”.

Subsequently the coefficients are processed with the “discrepancy”, an operator that allows to calculate the statistical dependence between two closer coefficients. This is defined as follows:


where [6]:


where, as shown in Figure 3, the blocks from 1 to 4 are the neighbors of the 0th. Each block contains 64 DCT-II coefficient, 8*8, and their sequences composing the JPEG image.

3.2. Differences between JPEG and MP3

To apply the PSB algorithm to the MP3 format it is necessary to study the differences between this format and the JPEG standard, in order to identify possible changes. These differences are:

Ÿ  the JPEG uses the DCT-II while the MP3 uses the MDCT;

Ÿ  the JPEG works on blocks; each blocks, or matrix, size is 8*8. Instead, the MP3 works on frame; each frame has dimension equal to 1*1152, that are vectors;

Ÿ  the PSB is based on an assumption from the F5 algorithm, that is used for the JPEG format.

Concerning the first point, it is possible to notice that the PSB works on the coefficients. It is therefore necessary to demonstrate that the DCT-II and the MDCT statistical distributions have the same properties.

Figure 3. Adjacent blocks to the block B0.

Concerning the second point, it is necessary to detect the operations that in the PSB works on matrix and, to apply it on the MP3 format, transforming them in operations that works on vectors.

Concerning the third point, it should be studied the statistical distribution of the MP3 coefficients after the F5-algorithm in order to identify if this distribution has the same properties than the JPEG distribution processed by F5.

3.2.1. MP3 Discrepancy

The JPEG discrepancy works on matrix, as described in paragraph 3.1. It is necessary to modify this operation to enable it to operate on vectors. Considering the MP3 format, and the MP3 frames, it is possible to call one of them as. The discrepancy works on the previous frame and the subsequent. The frames are showed in Figure 4. The mean is done only for 2 frames because 2 frames are taken and each frame contains 576 coefficients.


where is the same as in the JPEG discrepancy.

3.2.2. Statistical Distribution of the MP3 Coefficients

The statistical distribution of the MP3 coefficients, after the compression, is Peak Shaped [7]. This trend can be modeled by the Generalized Gaussian (GG). The GG varies with a parameter, called r, and it is possible to choose different values of r, i.e. r = 2 is a Gaussian, r = 1 is a Laplacian etc. A good approximation for the distribution of the MP3 coefficient has the value r set to 0 or 1,

Figure 4. The frames on which works the MP3 discrepancy.

as shown in Figure 5.

With the parameter r set to 0.5, it is possible to have the best approximation of the statistical distribution. It is possible to choose the value r = 1, the Laplacian distribution, to have a good approximation with low complexity.

The Laplacian distribution is used to approximate the statistical distribution of the JPEG coefficients, as well [8-10].

3.3. The PSB over MP3

Using the previous considerations, it is possible to apply the Peak Shaped Based steganography to the MP3 format. In fact it is possible to utilize the assumption used by this algorithm from the F5 [11] steganography because the MP3 coefficients and the JPEG coefficients have the same statistical distribution. F5 modifies the coefficients, without considering their source.

Having both formats, namely both transformed, the same statistical distribution of the coefficients, the use of different transformed becomes irrelevant to the development of the algorithm.

3.4. The Embedding Process

In the following, the list of steps of the embedding process is reported:

Ÿ  the first step is represented by the analysis of the MP3 statistical distribution;

Ÿ  successively the value of Hg vector that contains the histograms of the MP3 is calculated;

Ÿ  with the Hg values it is possible to exclude the samples that are statistical most significant;

Ÿ  the Hg values allow the calculus, with the algorithm shown in Figure 6, of the offset probability vector, called P;

Ÿ  each frame of the MP3 file, that contains 576 coefficients, is taken and analyzed;

Ÿ  the coefficients b are divided by group, with the g(b) (4), and by offset, with the O(b) (5);

Ÿ  the discrepancy is calculated by means of Equation (9);

Ÿ  with the discrepancy, the vector P and a PRNG, according with the secret key, it is possible to determi-

Figure 5. Statistical distribution of the MP3 coefficients approximated with the GG [7].

Figure 6. Block diagram of the calculus of the offset probability vector P.

nate the coefficients that contain the stego message;

Ÿ  the offset of each coefficient that is possible to modify is changed according to the value of the bit of the secret message, as showed in Table 1.

To extract the secret message the embedding process must be repeated to determinate the coefficients that were modified. The analysis of each offset allows the reconstruction of the secret message.

4. Results

4.1. Performance Parameters

To analyze the performance of a steganographic techni-

Table 1. Offset of each coefficient that is changed.

que three parameters are used:

Ÿ  Embedding capacity;


Ÿ  Embedding efficiency.

4.1.1. Embedding Capacity

The embedding capacity (EC) [7,10] indicates the maximum data size that it is possible to hide in the cover object. It is defined as follows:


4.1.2. PSNR

The Peak to Noise Ratio [10] provides the similarity between the cover object, the original file, and the stego object, the file where the secret message has been hidden. It is defined through the Mean Square Error (MSE):



4.1.3. Embedding Efficiency

Through the Embedding Efficiency (EE) [2,12] it is indicated the average number of bits inserted for each change. It is defined as follows:


4.2. Peak Shaped Based for MP3 Performance

To evaluate the PSB for MP3 performance it is necessary to compare the Efficiency, the Capacity and the PSNR with different kinds of steganographyc algorithms and with the original PSB, that it was implemented for the JPEG format.

4.2.1. JPEG Performance

The embedding capacity and efficiency for the Model Based Steganography for the JPEG format are shown in Table 2.

The PSNR is evaluated for the PSB algorithm for the JPEG format, as shown in Table 3. MB1 and MB2 are two model based techniques used in [2], PSB is the Peak

Table 2. Model based steganography performance [2].

Table 3. PSNR for the JPEG-PSB algorithm [6].

Shaped Steganography used for the JPEG format.

4.2.2. PSB for MP3

To calculate the Embedding Capacity for the MP3-PSB steganography it is necessary to calculate the file size as follows:


A new variable is defined and it is called Ls that is the secret message length. Then the Capacity is:


and the results are shown in Table 4.

The PSNR is evaluated as described in Equation (11) and the results are showed in Table 5.

The Efficiency is calculated by analyzing the number of changes to insert the secret message in the cover object. It is possible to see the performance in Table 6.

4.2.3. MP3 Steganography Comparison

In Tables 7 and 8 the PSNR and the Capacity are illustrated for different kinds of steganographyc algorithms for MP3 format.

4.3. Steganalysys

With the steganalysis [9] it is possible to have a better analysis of the PSB-MP3 performance. This method calculated two probabilities, the False Alarm probability, when a cover object is classified as stego, and the Missed Detection probability, when a stego object is classified as cover. The first one is indicated with the symbol  

Table 4. PSB-MP3 capacity.

Table 5. PSNR for PSB-MP3.

Table 6. Efficiency for the PSB-MP3.

Table 7. Capacity for different algorithm.

Table 8. PSNR for different algorithms.

and the second one with the symbol . The values of these probabilities depend on a threshold, called, that modify the steganalysis system accuracy.

With these probabilities it is possible to calculate other parameters, like the detection probability, , and the error probability :


4.3.1. Chi-Square Test

A steganalytic technique that is possible to use for the PSB-MP3 is the Chi-Square test [12]. Some parameters are calculated with the histograms of the MDCT coefficients probability distribution and the results of the Chi-Square test are compared with the threshold.

One method to calculated the chi-square test is the Zhang-Ping attack [13] that evaluates two variables:



and if it will calculate the chi-square values:


to compare with the threshold.

4.3.2. ROC Curve

This analysis to evaluate the PSB performances is done on a random set of MP3 files. With the comparison with the threshold and the Chi-Square value it is possible to calculate the two probabilities of false alarm and missed detection, shown in Table 9.

With these two probabilities it is possible to graph the ROC curve. This curve indicates the efficiency of the steganalytic method. If the curve is near the first quadrant bisector the steganographyc algorithm is very strong, otherwise the steganalytic method is efficient.

In Figure 7 it is possible to see the ROC curves (solid line) of the PSB algorithm for the MP3. This curve is very smash on the first quadrant bisector (dashed line). This indicates that the steganographyc method is very robust when this steganalytic algorithm is used. When the ROC curve is far from the bisector the steganographic algorithm isn’t very robust or else the steganalytic technique is very efficacious. Instead when this curve is very close to the bisector the technique is very secure, since there is perfect security when the ROC curve is exactly equal to the bisector.

Figure 7. ROC curve for the PSB-MP3.

Table 9. False alarm, missed detection and detection probabilities when the threshold takes different values.

5. Conclusions

A new steganographic algorithm for the MP3 format has been developed by changing, in the Peak Shaped Based for the JPEG, the discrepancy equation, adapting it to vectors and studying the statistical distribution of the MDCT coefficients. The analysis of the performance of this algorithm showed that this method does not introduce audible distortion when the signal audio is reproduced. Further, this method does not create relevant statistical differences in the samples distribution.

The Peak Shaped Based for the MP3 has an high capacity compared to the other algorithms and a good PSNR. In fact the mean Embedding Capacity is equal to 12.75%, higher than the most relevant techniques used for MP3 steganography; the PSNR is equal to 58.21 dB, higher than the PSB for the JPEG.

A steganalytic attack has been simulated to evaluate the robustness of the algorithm. This attack is implemented on the Zhang Ping analysis and on the Chi-Square test. This attack has been adapted to the PSB-MP3 since it was created for the JSteg steganography. By calculating the false alarm probability and the missed detection probability it is possible to draw the ROC curve. The analysis of this curve shows that this attack is not suitable for this steganographic method because the ROC is crushed on the bisector. The error probability, calculated with the ROC curve, tends to 0.5 when the threshold increases; when it takes this values the choice is completely random.

The steganographyc algorithm implemented, as assumed, is resistant to the statistical attacks.


  1. G. Kipper, “Investigator’s Guide to Steganography,” Auerbach Publications, Boca Raton, 2003. doi:10.1201/9780203504765
  2. P. Sallee, “Model-based Steganography,” Springer Verlang, Berlin, 2004.
  3. A. Spanias, T. Painter and V. Atti, “Audio Signal Process And Coding,” John Wiley and Sons, Hoboken, 2007. doi:10.1002/0470041978
  4. U. Zolzer, “Digital Audio Signal Processing,” John Wiley and Sons, Hoboken, 2008. doi:10.1002/9780470680018
  5. J. S. Jacaba, “Audio Compression Using Modified Discrete Cosine Transform: The MP3 Coding Standard,” University of the Philippines, Manila, 2001.
  6. L. Rossi, F. Garzia and R. Cusani, “Peak-Shaped-Based Steganographic Technique for JPEG Images,” EURASIP Journal on Information Security, 2009, Article ID: 382310. doi:10.1155/2009/382310
  7. R. Yu, X. Lin, S. Rahardja and C. C. Ko, “A Statistic Study of the MDCT Coefficient Distribution for Audio,” IEEE International Conference on Multimedia and Expo, Taipei, 30-30 June 2004, pp. 1483-1486.
  8. D. Liu, H. Zhang, M. Polycaropou, C. Alippi and H. He, “Advances in Neural Networks,” Springer Verlag, Berlin, 2011.
  9. R. Bohme, “Advanced Statistical Steganalysis,” Springer, Berlin, 2010.
  10. K. B. Shiva Kumar, K. B. Raja and R. K. Chhotaray, “Sabyasachi Pattanaik, Bit Length Replacement Steganography Based on DCT Coefficients,” International Journal of Engineering Science and Technology, Vol. 2, No. 8, 2010, pp. 3561-3570.
  11. A. Westfeld, “F5-A Steganographic Algorithm,” Springer Verlang, Berlin, 2001.
  12. K. Lee, A. Westfeld and S. Lee, “Category Attack for LSB Steganalysis of JPEG Images,” Springer Verlang, Berlin, 2006.
  13. T. Zhang and X. Ping, “A Fast and Effective Steganalytic Technique against Jsteg-Like Algorithms,” Proceedings of the 2003 ACM Symposium on Applied Computing (SAC), Melbourne, 9-12 March 2003, pp. 307-311.