Open Journal of Applied Sciences, 2013, 3, 36-40
Published Online March 2013 (http://www.scirp.org/journal/ojapps)
Copyright © 2013 SciRes. OJAppS

Aerial Video Encoding Optimization Based on x264

Fan Yang, Hongbing Ma
Department of Electronic Engineering, Tsinghua University, Beijing, China
Email: yf693036805@gmail.com, hbma@mail.tsinghua.edu.cn

Received 2012

ABSTRACT

The x264 video codec uses many new encoding techniques from the H.264/AVC video coding standard, which improve compression efficiency. However, the resulting computation is so heavy that x264 is not suited to real-time encoding of high-resolution video. This paper analyzes the characteristics of aerial video and then optimizes the inter-frame mode decision and the motion estimation in x264 according to these characteristics, removing a large amount of unnecessary computation. As a result, computation and encoding time are reduced by about 19%, with only a slight increase in total bits and a slight decrease in PSNR.

Keywords: x264; Aerial Video Encoding; Mode Decision; Motion Estimation

1. Introduction

Real-time encoding and transmission of aerial video is important in practice. As the resolution of aerial video rises to about 1080p, the large data volume makes real-time encoding challenging. Researchers have therefore studied aerial video encoding, focusing mainly on two approaches. (1) Following the H.264/AVC encoding standard [1] and the characteristics of aerial video, a global motion vector estimated from the speed of the aircraft is substituted for the local motion vectors to save encoding time [2]. (2) By identifying new areas and object motion, the previous frame is integrated into a new one [3]. This is a new encoding method that is independent of the traditional encoding standard.

Both methods reduce complexity, and both have limitations. The first is applicable only to aerial video taken while the aircraft is parallel to the horizon.
Otherwise, the method leads to very low compression efficiency. The challenge of the second method lies in recognizing object motion: recognition errors severely degrade the encoding.

H.264/AVC is a video coding standard developed jointly by ITU-T and ISO/IEC, and it offers high compression efficiency. x264 [4] is a widely used open-source implementation of an H.264/AVC encoder. Because x264 does not meet the requirements of real-time encoding of aerial video, it must be optimized before use.

In x264, mode decision accounts for about 50% to 70% of the total encoding time, depending on the command-line parameters. For P frames, the candidate modes are P_skip, P16x16, P16x8, P8x16, P8x8, P4x4, P8x4, P4x8, I16x16, I8x8 and I4x4. For B frames, the candidate modes are B_skip, B16x16, B16x8, B8x16, B8x8, I16x16, I8x8 and I4x4. To obtain the best compression efficiency, the encoding cost of every mode must be calculated and the mode with the lowest cost selected. x264 uses the rate-distortion cost [1] to measure the encoding cost. Calculating the cost of all modes requires heavy computation. How to remove the unnecessary part of this computation is the focus of this paper.

Several improved mode decision schemes have been proposed [5-8]. In short, they work as follows. Because adjacent frames are correlated in time and space, these schemes predict the mode of the current macroblock directly from the modes, motion vectors and content of the surrounding macroblocks. This removes much needless computation, but the prediction itself still requires considerable analysis and computation and may not be accurate, so the gain in practice is limited.

Section 2 analyzes the characteristics of aerial video. Section 3 proposes the mode decision optimization for aerial video.
Section 4 optimizes the motion estimation, and Section 5 presents simulations of the two algorithms. Section 6 concludes the paper.

2. Characteristics of Aerial Video

In aerial video, the change between adjacent frames is caused mainly by the motion and shake of the camera. The motion of objects in the scene produces changes so slight that they can be neglected. This means that the middle part of every frame is formed by rotating and translating the previous frame, while the edge consists of newly visible area.

Figure 1. Video Sequence Change.

From the perspective of motion vectors, the direction of the motion vector is consistent for most macroblocks. The magnitude, however, depends on the angle between the camera and the horizon. When the camera is parallel to the horizon, the magnitude is nearly the same everywhere, and the common vector can be called the global motion vector. When the camera forms an angle with the horizon, the magnitudes for the macroblocks in the upper part of the frame are smaller. In either case, adjacent macroblocks have almost the same motion vector in both direction and magnitude.

In short, aerial video has the following characteristics:
(1) The change between adjacent frames occurs mainly in the edge area.
(2) The motion vectors of most adjacent macroblocks are the same in direction and magnitude.

3. Mode Decision Optimization

In mode decision, large partitions (16x16) are better suited to background or slowly changing areas, because such areas share the same motion vector. Small partitions (4x4) are better suited to quickly changing areas, where splitting the macroblock into smaller blocks and running motion estimation on each of them gives better compression efficiency.

Based on this analysis and the characteristics of aerial video, we gathered statistics on the mode decisions made for aerial video.
We define the edge area as the region lying within 5% of the frame boundary, and we split each frame into two areas, the edge and the middle. We then collected the mode type statistics for each area. Table 1 shows the mode type ratios in the edge and the middle areas.

Table 1. Mode Type Ratio.

| Area | Skip | Inter16x16 | Inter16x8 | Inter8x16 | Inter8x8 | Intra16x16 | Intra8x8 | Intra4x4 |
|------|------|------------|-----------|-----------|----------|------------|----------|----------|
| Edge | 41.33% | 52.78% | 0.13% | 3.81% | 0.08% | 1.45% | 0.39% | 0.01% |
| Mid | 41.18% | 56.93% | 0.11% | 0.25% | 0.07% | 1.0% | 0.44% | 0.02% |

Table 1 shows that in the middle area the skip mode and the inter-frame 16x16 mode together account for about 98% of macroblocks, the intra-frame modes for about 1.5%, and the remaining inter-frame modes for less than 0.5%. The current x264 encoder nevertheless calculates the cost of every inter-frame mode in order to find the modes (other than skip and inter-frame 16x16) that are chosen less than 0.5% of the time. This process consumes a lot of encoding time and improves compression efficiency very little.

We therefore propose the following optimization. In the edge area, we evaluate all intra-frame and inter-frame modes to find the best one. In the middle area, we check only the skip mode, the inter-frame 16x16 mode and the intra-frame modes, omitting the other inter-frame modes. The mode decision flow after optimization is shown in Figure 2.

Figure 2. Mode Decision Flow. [Flowchart boxes: input macroblock; skip mode satisfied?; inter 16x16 mode selection; early termination satisfied?; macroblock in the edge?; other inter mode decision; early termination satisfied?; intra-frame mode decision; output mode.]

4. Motion Estimation Optimization

Motion estimation is the process of searching for the matching macroblock that minimizes the motion compensation cost. This is the most time-consuming process. x264 implements five integer-pixel motion estimation algorithms: dia (diamond search), hex (hexagon search), umh (uneven multi-hexagon search), esa (exhaustive search) and tesa (Hadamard exhaustive search) [4]. In this order, compression efficiency improves, while the computation becomes more complex and the encoding time longer.

We measured the five motion estimation algorithms on aerial video and compared encoding time, total bits and PSNR. The results are shown in Table 2.

Table 2. Comparison of Motion Estimation Algorithms.

| | dia | hex | umh | esa | tesa |
|---|---|---|---|---|---|
| Coding time | 14.00s | 14.84s | 18.13s | 27.52s | 33.35s |
| Total bits | 424.9KB | 424.3KB | 424.2KB | 423.9KB | 422.5KB |
| PSNR | 41.91 | 41.92 | 41.94 | 41.97 | 41.97 |

Table 2 shows that the encoding time increases by up to 138% (from 14.00s for dia to 33.35s for tesa), while total bits and PSNR change by less than 0.6%, which is negligible. According to the characteristics of aerial video, the motion vectors of adjacent macroblocks are similar, so the x264 encoder can predict the motion vector accurately with a simple method such as median prediction. Starting from the predicted motion vector, the encoder quickly finds the matching macroblock that minimizes the motion compensation cost. Therefore, considering both compression efficiency and encoding time, diamond search is the best motion estimation method for aerial video.

Diamond search iteratively matches macroblocks using a diamond-shaped template. For diamond search on aerial video, we collected statistics on the share of macroblocks that require each number of iterations. The results are shown in Table 3.

Table 3. Diamond Search Algorithm Iterative Number Ratio.

| Iterations | Ratio |
|------------|-------|
| 1 | 65.7% |
| 2 | 25.3% |
| 3 | 5.60% |
| 4 | 1.41% |
| 5 | 0.62% |
| 6 | 0.37% |
| 7 | 0.24% |
| 8 | 0.17% |
| 9 | 0.12% |
| 10 | 0.08% |
| 11 | 0.07% |
| 12 | 0.05% |
| 13 | 0.03% |
| 14 | 0.02% |
| 15 | 0.02% |

From Table 3, we can observe the following:
(1) The macroblocks whose iteration count is 1 account for 65.7%. For these macroblocks, the matching macroblock is exactly the predicted one, so the iterative computation is of no use for them.
(2) The macroblocks whose iteration count is 3 or less account for about 96.6%.

The high share of macroblocks with a small iteration count shows that the motion vector prediction is very accurate: the matching macroblock is almost always close to the predicted one. Based on this observation, this paper proposes an algorithm that terminates motion estimation early by means of a self-adaptive threshold.

(1) Initialize the variable threshold = 0.
(2) Calculate the average cost of the predicted matching macroblock:

$$AC = \frac{PC}{H \times W} \qquad (1)$$

where AC is the average cost of the predicted matching macroblock, PC is the cost of the predicted matching macroblock, and H and W are the height and width of the current macroblock. If AC is less than (ratio * threshold), terminate the motion estimation; otherwise go to step 3. Here ratio is a control constant chosen by the user.
(3) Conduct the diamond search. If the iteration count equals 1, assign the average of the threshold and the AC computed in step 2 to the threshold. If the iteration count is greater than 1, finish the motion estimation directly, without updating the threshold.

Notes on the algorithm:
(1) The threshold is initialized only once, the first time motion estimation is conducted.
(2) In step 2, the prediction cost is calculated from the predicted motion vector, which x264 already computes.
(3) Ratio is a constant that controls the filtering. For example, setting ratio to 0 filters nothing, which is the same as no optimization, whereas setting ratio to 100000 means that all motion estimation is omitted. According to our statistics, 0.8 is a reasonable value for ratio.

5. Simulation

This section simulates the optimization algorithms described in Sections 3 and 4 separately, and then combines the two algorithms and simulates them together. The simulation results are as follows.
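The two optimizations being simulated can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the x264 implementation: all function and class names are hypothetical, the 5% edge rule is read here as "the macroblock centre is closer than 5% of the frame width or height to the nearest border", and `diamond_search` stands in for the real search and simply returns its iteration count.

```python
from dataclasses import dataclass

# P-frame candidate modes as listed in the Introduction.
INTRA_MODES = ["I16x16", "I8x8", "I4x4"]
SMALL_INTER_MODES = ["P16x8", "P8x16", "P8x8", "P4x4", "P8x4", "P4x8"]


def in_edge_area(x, y, width, height, fraction=0.05):
    """Assumed reading of the 5% rule: the macroblock centre (x, y) is closer
    than `fraction` of the frame width/height to the nearest border."""
    return (min(x, width - x) < fraction * width
            or min(y, height - y) < fraction * height)


def candidate_modes(is_edge):
    """Modes whose cost is evaluated for a P-frame macroblock (Section 3)."""
    modes = ["P_skip", "P16x16"]
    if is_edge:
        modes += SMALL_INTER_MODES  # full inter search only in the edge area
    return modes + INTRA_MODES


@dataclass
class EarlyTermination:
    """Self-adaptive threshold state shared across macroblocks (Section 4)."""
    ratio: float = 0.8
    threshold: float = 0.0  # step (1): initialised once


def motion_estimate(state, predicted_cost, diamond_search, mb_w=16, mb_h=16):
    """Return True if a search ran, False if it was terminated early.

    `diamond_search` is a hypothetical callable that performs the search and
    returns its iteration count.
    """
    ac = predicted_cost / (mb_h * mb_w)  # step (2), Eq. (1)
    if ac < state.ratio * state.threshold:
        return False  # keep the predicted motion vector, skip the search
    iterations = diamond_search()  # step (3)
    if iterations == 1:
        state.threshold = (state.threshold + ac) / 2
    return True


# Three macroblocks given as (predicted cost, iterations the search would take).
state = EarlyTermination()
ran = [motion_estimate(state, cost, lambda n=n: n)
       for cost, n in [(2560, 1), (768, 1), (2560, 3)]]
```

In this toy run the first macroblock is searched and sets the threshold, the second has a low enough average cost to be terminated early, and the third is searched again without changing the threshold because its search took more than one iteration. With `ratio=0.0` the early exit can never trigger for non-negative costs, which matches the remark that a ratio of 0 is the same as no optimization.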
We captured several aerial videos in the suburbs of Beijing, whose content includes farmland, forest, factories, schools and moving cars. We name the videos video1, video2, video3, video4 and video5.

Figure 3. Aerial Video Samples.

First, we explain the x264 command-line parameters used:
(1) --no-psy disables the psychovisual optimizations. PSNR scores drop when the psychovisual optimizations are enabled.
(2) --qp 30 sets the quantization parameter (qp). A value of 30 is a good threshold for perceived visual quality.
(3) --partitions all selects the best mode from all mode types.
(4) --me [dia, hex, umh, esa, tesa] selects the motion estimation method. In our experiments we choose dia.
(5) --psnr calculates the peak signal-to-noise ratio.

In the following tables, negative values mean a decrease after optimization and positive values mean an increase.

We optimize the x264 encoder as described in Section 3. Table 4 shows the resulting change in encoding time, total bits and PSNR.

Table 4. Change after Mode Decision Optimization.

| Video | Coding time | Total bits | PSNR |
|-------|-------------|------------|------|
| video1 | -15% | 0.4% | -0.01 |
| video2 | -13% | 0.3% | -0.01 |
| video3 | -22% | 0.5% | -0.02 |
| video4 | -12% | 0.9% | -0.02 |
| video5 | -14% | 0.3% | 0 |
| average | -15% | 0.5% | -0.012 |

We optimize the x264 encoder as described in Section 4. Table 5 shows the resulting change.

Table 5. Change after Motion Estimation Optimization.

| Video | Coding time | Total bits | PSNR |
|-------|-------------|------------|------|
| video1 | -3% | -0.04% | -0.05 |
| video2 | -5% | -0.4% | -0.04 |
| video3 | -4% | 0.8% | -0.1 |
| video4 | -8% | 1.5% | -0.1 |
| video5 | -4% | 0.2% | -0.09 |
| average | -5% | 0.4% | -0.08 |

We combine the two algorithms in the x264 encoder. The resulting change is shown in Table 6.

Table 6. Change after Mode Decision and Motion Estimation Optimization.

| Video | Coding time | Total bits | PSNR |
|-------|-------------|------------|------|
| video1 | -17% | 0.6% | -0.05 |
| video2 | -16% | 0.8% | -0.04 |
| video3 | -25% | 0.7% | -0.11 |
| video4 | -18% | 1.5% | -0.11 |
| video5 | -17% | 0.3% | -0.09 |
| average | -19% | 0.8% | -0.08 |

From the experiments we can draw the following conclusions.
1) Both algorithms remove needless computation and so reduce the encoding time, with only a slight increase in total bits and a slight decrease in PSNR on average.
2) The first two experiments show that the mode decision optimization is more effective than the motion estimation optimization.
3) The combined optimization is more effective than either one alone.
4) Both algorithms are simple to implement and have a stable effect on real-time encoding of aerial video.

6. Conclusion

This paper analyzes aerial video and summarizes its characteristics. Based on these characteristics, we optimize the x264 encoder in two ways. First, we omit needless mode decisions according to the position of the macroblock, which reduces computation. Second, we compute a threshold that is used to terminate motion estimation early, which reduces encoding time. Together, the two optimizations save about 19% of the encoding time, with a slight increase in total bits and a slight decrease in PSNR.

REFERENCES

[1] Draft Recommendation and Final Draft International Standard of Joint Video Specification, Mar. 2003, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, Joint Video Team.
[2] Malavika Bhaskaranand and Jerry D. Gibson. "Low-complexity Video Encoding for UAV Reconnaissance and Surveillance." The 2011 Military Communications Conference, Nov. 2011.
[3] Holger Meuel, Marco Munderloh and Jörn Ostermann. "Low Bit Rate ROI Based Video Coding for HDTV Aerial Surveillance Video Sequences." 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 2011.
[4] x264 Software (120711). Available from: http://www.videolan.org/developers/x264.html
[5] Huanqiang Zeng, Canhui Cai and Kai-Kuang Ma. "Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity." IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 4, pp. 491-499, April 2009.
[6] Tiesong Zhao, Hanli Wang, Sam Kwong and C.-C. Jay Kuo. "Fast Mode Decision Based on Mode Adaptation." IEEE Transactions on Circuits and Systems for Video Technology, Vol. 20, No. 5, pp. 697-705, May 2010.
[7] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja and C. C. Ko. "Fast Intermode Decision in H.264/AVC Video Coding." IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 7, pp. 953-958, July 2005.
[8] Po-Hung Chen, Hung-Ming Chen, Mon-Chau Shie, Che-Hung Su, Wei-Lung Mao and Chia-Ke Huang. "Adaptive Fast Block Mode Decision Algorithm for H.264/AVC." 2010 5th IEEE Conference on Industrial Electronics and Applications, June 2010.