Paper Menu >>
Journal Menu >>
Circuits and Systems, 2010, 1, 12-17 doi:10.4236/cs.2010.11003 Published Online July 2010 (http://www.SciRP.org/journal/cs) Copyright © 2010 SciRes. CS Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform Michael Tammen1, Mohamed El-Sharkawy2, Hisham Sliman1, Maher Rizkalla1 1Purdue School of Engineering and Technology, Indianapolis, USA 2Egypt Japan University of Science and Technology, Borg Elarab, Alexandria, Egypt E-mail: melshark@iupui.edu Received March 18, 2010; revised April 20, 2010; accepted April 25, 2010 Abstract The Society of Motion Picture and Television Engineers (SMPTE) Standard 421M, commonly known as VC-1, is a state-of-the-art video compression format that provides highly competitive video quality, from very low through very high bit rates, at a reasonable computational complexity. First, this paper presents fast motion compensation methods. The four motion estimation methods examined are fast, three step search, varying diamond, and 2D logarithmic. These methods use less search points than the full spiral scan used in the VC-1 reference software, which allows for faster motion estimation. Second, this paper presents a resid- ual texture based choice of the block size for the Discrete Cosine Transform (DCT). To determine the block size, data is examined after the residual texture has been calculated. This is in contrast to the VC-1 reference software, which uses calculations at the block level to determine the block size. The residual texture of each block is small and uniform, allowing for simplified block choices. Keywords: VC-1, Motion Estimation, Discrete Cosine Transform, Video Compression 1. Introduction VC-1 is a state-of-the-art video compression format that provides highly competitive video quality, from very low through very high bit rates, at a reasonable computational complexity [1-4]. At a high level, VC-1 is similar to oth- er popular video standards since the First Moving Pic- ture Expert Group Standard (MPEG-1). They have sim- ilarities in many different areas, one of which is block- by-block motion compensation. The motion compensa- tion is done using a motion vector from a previously re- constructed frame to determine the displacement. On the decoder side, quantized transform coefficients are en- tropy-decoded, dequantized, and inverse-transformed to produce an approximation of the residual error, which is then added to the motion-compensated prediction to generate the reconstruction [5]. An important feature of VC-1 is adaptive block size transforms for inter-frame coding. Instead of using a fixed block size for every transform, like many previous stan- dards, it has the ability to choose a transform size based on the information in the block or macroblock. VC-1 can choose 8 × 8, 8 × 4, 4 × 8, or 4 × 4 transforms based on the information in the block. The ability to choose a transform size allows VC-1 to deal with areas of conti- nuity and discontinuity with more accuracy and less ringing artifacts than any other video standard [1]. An 8 × 8 block may be transformed by either one 8 × 8 block, two horizontally stacked 8 × 4 s, two vertically stacked 4 × 8 s, or four 4 × 4 blocks. The block sizes can be seen in Figure 1. Another key innovation in VC-1 is the handling of an entire zero block or macroblock. When it happens, it does not send any transform information along, since the in- verse transform of a zero block is going to be zero. Intra frames and intra blocks in predicted frames use 8 × 8 transform by default, meaning no calculations need to be preformed [5]. The way VC-1 signals the transform size is new and unique. It can either be signaled at the frame, macroblock, or block level. For frame level signaling, every block wi- thin the frame will use the same block size. Frame level signaling is useful for low-rate situations to keep the overhead low. If macroblock level signaling is used, then 8 × 84 × 88 × 4 4 × 4 Figure 1. Transform sizes. M. TAMMEN ET AL. Copyright © 2010 SciRes. CS 13 every block in the macroblock (six 8 × 8 blocks) is trans- formed using the same size. Block level signaling is used only for one 8 × 8 block. These different signaling types allow VC-1 to handle both nonstationary data (macro- block and block level signaling) and low-rate situations (frame level signaling) better than previous standards. Currently, VC-1 uses half difference and half sum en- ergies for the transform size decision for a block. It splits the block into four quadrants: left, right, top, and bottom. The left-right half difference, the left-right half sum, the top-bottom half difference, and the top-bottom half sum are then computed. Each of the above values is a running total for all the values in the quadrant. To get the block size, it compares the left-right sum to the left-right difference and determines to chop vertically or not. Then it compares the top-bottom sum and top- bottom difference to determine if it should chop hori- zontally. If both conditions are met a 4 × 4 is used, while if none are met an 8 × 8 is used. Motion estimation (ME) is the process of comparing the macroblock to be coded with all macroblocks within the search area in the reference frame [6-10]. The macro- block in the reference frame which is closest to the macro- block to be coded is chosen as the reference macroblock. Once it is chosen, a motion vector (MV) is assigned. The motion vector provides the coordinates of the best match in an (x, y) form, using the center of the search area as (0, 0). An illustration can be seen below in Figure 2. 2. Motion Estimation Techniques The VC-1 reference software currently uses the full spi- ral search shown in Figure 3 (a search size of 4 is used in the figure) to find the best match. A best match is found by calculating the cost at each location. The spot Figure 2. Motion estimation [11]. with the lowest cost is chosen as the best match. This is currently the only method for motion estimation within VC-1 reference software. It does allow for the search size to be modified but that is all. The first implemented method was the fast motion es- timation. It is based on the H.264 fast motion estimation but is not completely similar. It calculates the cost of each position in a diamond shape. The size of the dia- mond grows or shrinks based on the search size. The pattern is shown below in Figure 4 with a search size of 4. Also, the cost for the center is calculated before this search starts and if no cost in the fast search is lower than the center, then the center is chosen. Second, we implemented a three or four step search. Figure 5 shows the search pattern for this search. First, -4-3-2-10 1 2 3 4 474 75 76 77 78 79 80 81 50 373 44 45 46 47 48 49 26 51 272 43 22 23 24 25 10 27 52 171422189 2 11 28 53 070 41 2071 3 12 29 54 -1 6940196 5 4 13 30 55 -2 68 39 18 17 16 15 14 31 56 -3 67 38 37 36 35 34 33 32 57 -4 66 65 64 63 62 61 60 59 58 Figure 3. Full spiral search. -4-3-2-10 1 2 3 4 4× 3× × 2× × 1× × 0× × -1 × × -2 × × -3 × × -4 × Figure 4. Fast motion estimation. -6-5-4-3-2-1 0 1 2 3 4 5 6 6 2 2 2 5 411 2 1 2 3 3 3 3 2 3 2 3 2 2 1 3 3 3 010 1 -1 -2 -3 -4 11 1 -5 -6 Figure 5. Three step search. M. TAMMEN ET AL. Copyright © 2010 SciRes. CS 14 the cost is computed at the center (0) and eight positions (1) around the center. The positions distance from the center is calculated by 2N-1, where N is 3 if the search size is less than or equal to 12, or 4 if the search size is greater than 12. This allows for better accuracy when the search size is large, because the normal three step search will only be able to search about half of the search area if the search size is greater than 12. The least cost position is chosen (1) and N is decremented. The cost is then cal- culated at eight positions (2) around 1. This again allows for another least cost position to be picked (2) and N is decremented again. Lastly, eight positions (3) have their cost calculated around 2 and the least cost position is chosen as the best match, 3. A four step search would si- mply have one more step and works the same way. Obvi- ously, the best match is not always going to require three or four steps, so if at the end of any step the best cost belongs to the center of the search (0, 1, and 2 for refer- ence) the search terminates. For example, if the cost of 0 is less than the cost at each of the eight positions (1) then the search terminates and 0 is chosen as the best match. Third, a 2D log search was implemented. Figure 6 is an illustration of how the log search works. To determine the distance from the center in which the search is started, d = floor (2 × (log2S – 1)), is computed, where S is the search size. The cost is then calculated at the center (0) and four other positions (1) surrounding it. The position with the least cost is chosen (1) and d remains the same. Next, the cost is computed at the three surrounding posi- tions (2) and if their cost is not lower than the center then d is halved. If d is an odd number, 1 is subtracted from it and then it is halved. The last search we implemented was the varying dia- mond search. This borrows ideas from the three step search, but instead of using eight positions it uses four. The option of a three step or a four step is allowed in this search as well. Figure 7 illustrates what a varying dia- mond search will look like when it uses three steps. The bold positions are those with the lowest cost, and the lower the number, the earlier it is searched. -6 -5 -4 -3 -2 -1 01 2 3 4 5 6 6 5 4 3 1 2 1 0 1 0 1 -1 -2 3 4 -3 2 3 1 3 4 2 -4 3 4 -5 -6 2 Figure 6. 2D log search. -6-5-4-3-2-1 0 1 2 3 4 5 6 6 5 41 3 22 1 02120 1 -1 3 -2 323 -3 3 -4 1 -5 -6 Figure 7. Varying diamond search. 3. Modified Adaptive Block Size Transform To go along with the motion estimation, a modified adap- tive block size transform algorithm is implemented. The current method for choosing a transform size has some drawbacks. Multiple calculations are performed at the block level, which is very costly. Through research, we discovered that the block size is dominated by the residual, which is the prediction error [12]. Using (1), the residual data is computed. The tex- ture that results is small in magnitude and uniform. Compared to the original data, there is a large difference since the original data may not be uniform. Since the data is uniform the block size selection is simplified. We assume that motion estimation has been performed. The algorithm is as follows: 1) Initialize residual texture threshold T8 and T4 for 8 × 8 and 4 × 4 blocks. We perform motion estimation for the current macroblock to get the residual. We assign T8 and T4 to 250 and 50 respectively. 2) Partition the macroblock into 24 (4 × 4) blocks. 3) Calculate the residual block texture of each 4 × 4 block using Equation (1). xxC ij ji nm 3 0 2 3 0 , ,16 1 (1) where m is the number of 8 × 8 blocks and n is 4 × 4 block number in the 8 × 8 block, beginning in the up- per-left and moving right. 4) Sum all the 4 × 4 blocks’ texture in an 8 × 8 block as the 8 × 8 block’s texture. The macroblock texture is the sum of all 6 of the 8 × 8 blocks’ texture. CC n n,mm 3 0 (2) 3 0m m MB CC (3) 5) Create a count of each 4 × 4 residual block whose texture is greater than the threshold T4 and record it in M. TAMMEN ET AL. Copyright © 2010 SciRes. CS 15 count8. Also track the position of each block that is over the threshold in the variable Track_b4. Track_b4 con- tains the value of 1-4 based on which is the most recent block to be over the threshold value. 6) The block size selection is done through the fol- lowing method: a) If count4 = 0, 8 × 8 is chosen for the block size. b) If count4 > 1 and Cm > T8, 4 × 4 is chosen as the block size. c) If track_b4 = 0 or 3 and C1 > = C2 choose 4 × 8, otherwise choose 8 × 4. d) If track_b4 = 1 or 2 and C3 < = C0 choose 8 × 4, otherwise choose 4 × 8. The above procedure is carried out in two different steps in the same subroutine. First, a subroutine is called that partitions the macroblock and carries out all calcula- tions. This includes calculating the residual texture for the macroblock, as well as the sum of all the blocks’ texture. The code then enters a loop, which is executed 6 times, once for each block in the macroblock, where the transform size is selected and the block is transformed. The subroutine where the transform size is selected does not involve any calculations like the standard. It simply gets a pointer to the residual texture results and performs step 6. Not performing any calculations at the block level allows for greater speed. 4. Results The motion estimation techniques and modified adaptive block size transform were implemented on the VC-1 sample encoder. A variety of video clips were chosen to make sure that both changes work on different types of video sequences. Some of the sequences have low detail and low movement, while others have medium detail and low movement or medium movement and low detail. All sequences are run 30 fps, and can be seen in Table 1. The cases covered in Table 1 give varieties of video clips that range from low to high movement and low to high details. More frames are needed for higher move- ment to cover information depth between frames. Table 2 shows the key configurations we used for the encoder. It is important to note that the modified adaptive block size transform will run only if the option ‘BlockType’ is set to ‘Any’. If ‘random’ or ‘iterated’ is set in the option file, then the subroutine is never called by the encoder due to the transform size being set. We tested all se- quences with a search size of 8 and 14, hence both num- bers below. These were two separate tests and will be noted later. All of the other important parameters were set to ‘random’ or ‘iterated’. All sequences were tested to determine the time taken for the motion estimation and adaptive block size trans- form, the signal-to-noise ratio (SNR) for the luminance and chrominance components, and the bitrate. The re- sults of the motion estimation changes are compared against those of the standard. Table 3 summarizes the results with a search size of 8, while Table 4 summarizes the results with a search size of 14. Both tables use per- centages to show how much faster the proposed algo- rithms are than the standard. It is noted that a speed im- provement of up to 38% for search size equals to 14 and 18% for search size equals to 8. Table 1. Video sequences tested. IndexSequence Frames 1 Akiyo (CIF) 300 2 Coastguard (CIF) 300 3 Foreman (QCIF) 300 4 Stefan (CIF) 300 5 Flower (CIF) 250 6 Carphone (QCIF) 90 7 Football (CIF) 90 8 Erik (CIF) 50 Table 2. Key encoder options. Parameter Value Profile: Advanced Level: L0 Frame Pattern: I:P:P:P SearchSize: 8 or 14 DQuant: 1 MVRange: 128 × 64 BlockType: Any Quantizer: Explicit Table 3. Time results with a search size of 8. Percent Saving (%) Sequence FastTSS Diamond 2DLog 1 12.3112.38 13.46 13.04 2 23.8824.55 25.73 24.68 3 15.0415.42 16.11 15.48 4 21.7822.22 22.88 21.72 5 19.7520.80 21.97 21.08 6 10.0311.24 10.67 10.97 7 21.8122.84 23.82 21.95 8 13.8613.77 14.54 14.02 Average 17.3117.90 18.64 17.87 Table 4. Time results with a search size of 14. Percent Saving (%) Sequence FastTSS Diamond 2DLog 1 32.7533.53 34.45 34.39 2 44.1045.60 46.68 46.35 3 33.1734.23 35.04 34.63 4 42.0543.30 44.23 43.56 5 40.6742.68 43.58 43.07 6 27.6428.55 28.87 29.30 7 43.2844.74 45.87 45.03 8 32.1232.85 33.98 33.53 Average 36.9738.18 39.09 38.73 M. TAMMEN ET AL. Copyright © 2010 SciRes. CS 16 Tables 3 and 4 show that the diamond search is the strongest performer followed closely by the 2D log search and three step searches. The fast motion estimation is the worst performer of the group. So strictly from a speed aspect, the diamond search is the best choice. Next, we will examine the signal-to-noise ratio (SNR) results for search sizes of 8 and 14. Table 5 shows the SNR for the reference software. Tables 6 and 7 show the percent degradation in the implemented algorithm for search sizes of 8 and 14 respectively. The negative val- ues in the tables show that the SNR is better in the im- plemented algorithm. The VC-1 encoder in its present state has an overflow issue with some of the sequences. The most reliable data comes from sequences 1, 3, and 6 due to little or no overflow occurring. Sequence 1 is the most reliable, as no overflow occurs during the encoding of this sequence. Therefore, using sequence 1 as the baseline we see that there is a very minimal SNR degradation. But, on aver- age all searches see an improvement in SNR. Lastly, we will look at the bitrates for all of the se- quences. The bitrate numbers can get rather large so we examine the percent degradation numbers. Once again if the number is negative that means the proposed algo- rithm is better than the standard. Tables 8 and 9 show the results below. Aside from sequence 6 there is a very minimal effect on the bitrate. There is a range of both small increases from the standard as well as small decreases, neither of which are above 1% (excluding the 1.79% in sequence 4). As for sequence 6, the bitrate is 9% larger than the stan- dard. The reason of such result will be studied in future work. 5. Conclusions In this paper, we presented four different motion estima- tion techniques and a fast block size selection tool for Table 5. SNR results for the standard with both search sizes. SNR (dB) Search Size 8 Search Size 14 Sequence Y U V Y U V 1 21.31 20.78 17.07 21.31 20.7817.07 2 19.30 25.72 24.88 19.29 25.7124.90 3 15.39 23.81 24.10 15.39 23.8124.10 4 24.89 35.04 36.05 24.84 35.0735.91 5 28.11 22.78 21.37 28.09 23.4522.65 6 10.06 18.51 16.24 10.06 18.5216.25 7 22.34 26.75 26.06 22.34 26.9526.08 8 18.64 20.18 21.50 18.64 20.1821.52 Table 6. SNR results with a search size of 8. Percent Degradation (%) Fast TSS Diamond 2DLog Sequence Y U V Y U V Y U V Y U V 1 0.28 –0.27 –0.16 0.16 –0.11–0.090.20 –0.17–0.16 0.19 –0.16 –0.23 2 –2.48 –2.10 0.76 –2.56–1.04 1.36 –3.30–1.20 1.59 –3.52 –1.71 –1.01 3 0.48 –0.71 –1.17 0.20 –0.46–0.83 0.34 –0.48–0.85 0.33 –0.52 –0.90 4 –5.70 –4.51 –3.79 –6.77 –3.92 –3.93 –6.13 –5.02 –4.30 –5.83 –4.54 –2.91 5 –2.82 0.50 –2.00 –2.92 0.68 0.75 –2.94 0.25 0.73 –2.91 0.80 0.83 6 0.55 0.31 –0.98 0.04 0.70 –0.60 0.13 1.04 –0.96 0.17 0.52 –0.53 7 –0.58 2.15 2.02 –0.88 3.11 –1.43–0.65 1.74 –1.61–0.24 0.96 0.27 8 –4.08 –1.46 –3.91 –3.99 –1.47 –3.56 –3.74 –0.37 –2.69 –3.81 0.11 –2.46 Average –1.79 –0.76 –1.15 –2.09 –0.31 –1.04 –2.01 –0.53 –1.03 –1.95 –0.57 –0.87 Table 7. SNR results with a search size of 14. Percent Degradation (%) Fast TSS Diamond 2DLog Sequence Y U V Y U V Y U V Y U V 1 0.27 –0.13 –0.28 0.17 –0.02 0.15 0.20 –0.20–0.20 0.20 –0.47 –0.28 2 –4.63 –0.98 0.82 –2.91–1.37 1.40 –2.91–1.07 1.44 –2.86 –1.18 1.10 3 0.42 –0.52 36.36 0.21 –0.50–0.83 0.33 –0.53–0.85 0.42 –0.52 36.36 4 –7.45 –3.90 26.66 –6.87 –3.75 –4.28 –4.94 –4.44 –3.56 –7.45 –3.90 26.66 5 –3.24 3.08 –27.7 –2.99 3.66 6.33 –3.01 3.32 6.17 –3.24 3.08 –27.7 6 0.29 0.58 38.17 0.21 0.65 –0.76 0.32 0.55 –0.90 0.29 0.58 38.17 7 –0.11 1.81 14.13 –0.50 1.83 0.43 –0.76 2.68 –1.20–0.11 1.81 14.13 8 –3.87 –0.08 10.08 –4.30 –0.88 –3.54 –3.93 –1.55 –3.32 –3.87 –0.08 10.08 Average –2.29 –0.02 12.28 –2.12 –0.05 –0.18 –1.84 –0.16 –0.30 –2.08 –0.09 12.32 M. TAMMEN ET AL. Copyright © 2010 SciRes. CS 17 Table 8. Bitrate results with a search size of 8. Percent Degradation (%) Sequence Fast TSS Diamond 2DLog 1 0.34 0.20 0.09 0.11 2 –0.24 –0.25–0.25 –0.29 3 –0.01 –0.010 0.01 4 0.01 –0.070 0 5 –0.27 –1.47–0.03 0.26 6 9.68 10.158.81 8.37 7 0.09 0.05 0.26 0.06 8 0.49 0.49 0.37 0.48 Average 1.26 1.14 1.16 1.13 Table 9. Bitrate results with a search size of 14. Percent Degradation (%) Sequence Fast TSS Diamond 2DLog 1 0.33 0.22 0.12 0.13 2 –0.06 –0.08–0.08 0.16 3 –0.01 0 –0.01 –0.01 4 0.21 –0.47–0.48 1.79 5 –0.51 –1.700.02 0.02 6 9.90 10.118.77 8.02 7 –0.28 –0.31–0.29 –0.28 8 0.40 0.40 0.20 0.18 Average 1.25 1.02 1.03 1.25 VC-1. Finding an accurate and fast set of motion vectors is very important to any video codec. Also, adaptive block size transforms are a vital part of any next generation video standard. Overall, the best performer was the dia- mond search which was followed closely by the three and four step search. Both were fast with the diamond having an 18.64% and 39.09% increase in speed with a 0.98% increase in the SNR. The three and four step search had a 17.90% and 38.18% increase in speed with a 0.96% increase in the SNR. Both exhibited an increase in bitrate around 1.10%, which is mainly due to sequence 6. Otherwise they would be much closer to 0. 6. References [1] S. Srinivasan and S. L. Regunathan, “An Overview of VC-1”, Proceeedings of SPIE, VCIP, Beijing, July 2005, pp. 720-728. [2] Proposed SMPTE 421M, “VC-1 Compressed Video Bit- stream Format for Decoding Process,” 2006. http://www. smpte.org [3] Proposed SMPTE RP 227, “VC-1 Bitstreams Transport Encoding,” 2007. http://www.smpte.org [4] Proposed SMPTE RP 228, “VC-1 Decoder and Bit- streams Conformance,” 2008. http://www.smpte.org [5] S. L. Regunathan, A. M. Rohaly, R. Crinon and P. Griffis, “Quality and Compression: The Proposed SMPTE Video Compression Standard VC-1,” SMPTE Motion Imaging Journal, Vol. 114, No. 5-6, 2005, pp. 194-201. [6] M. Ghanbari, “The Cross-Search Algorithm for Motion Estimation,” IEEE Transactions on Communications, Vol. 38, No. 7, 1990, pp. 950-953. [7] P. I. Hosur and K. K. Ma, “Motion Vector Field Adaptive Fast Motion Estimation,” Presented at the 2nd Interna- tional Conference on Information, Communications and Signal Processing, Singapore, December 1999, pp. 7-10. [8] R. Li, B. Zeng and M. Liou, “A New Three-Step Search Algorithm for Block Motion Estimation,” IEEE Transac- tions on Circuits and Systems for Video Technology, Vol. 4, No. 4, 1994, pp. 438-442. [9] L. K. Liu and E. Feig, “A Block-Based Gradient Descent Search Algorithm for Block Motion Estimation in Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 4, 1996, pp. 419-422. [10] L.-M. Po and W.-C. Ma, “A Novel Four-Step Search Algorithm for Fast Block Matching,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 3, 1996, pp. 313-317. [11] S. T. Samant and M. El-Sharkawy, “Modified Motion Vector Searches for H.264/AVC,” International Confer- ence on Computer Engineering & Systems (ICCES'06), Cairo, Egypt, November 2006, pp. 331-336. [12] Z. Wang, Q. Peng and Y. Zeng, “Residual Texture Based Fast Block-Size Selection for Inter-Frame Coding in H.264/AVC,” IEEE Proceedings of the 6th International Conference on Parallel and Distributed Computing Ap- plications and Technologies, Dalian, 2005, pp. 853-855. |