Aerial Video Encoding Optimization Based On x264

doi:10.4236/ojapps.2013.31B008

Open Journal of Applied Sciences, 2013, 3, 36-40

Published Online March 2013 (http://www.scirp.org/journal/ojapps)

Fan Yang, Hongbing Ma

Department of Electronic Engineering, Tsinghua University, Beijing, China

Email: yf693036805@gmail.com, hbma@mail.tsinghua.edu.cn

Received 2012

ABSTRACT

The x264 video codec implements many of the newer coding tools of the H.264/AVC video encoding
standard, which improve compression efficiency. However, these tools are so computationally heavy
that x264 is poorly suited to real-time encoding of high-resolution video. This paper analyses the
characteristics of aerial video and, based on them, optimizes inter-frame mode decision and motion
estimation in x264 by removing a large amount of unnecessary computation. As a result, computation
and encoding time are reduced by about 19%, with only a slight increase in total bits and a slight
decrease in PSNR.

Keywords: x264; Aerial Video Encoding; Mode Decision; Motion Estimation

1. Introduction

Real-time encoding and transmission of aerial video is important in many applications. As the
resolution of aerial video rises to about 1080p, the volume of data makes real-time encoding
challenging. Research on aerial video encoding has therefore focused mainly on two approaches.
(1) Based on the H.264/AVC encoding standard[1] and the characteristics of aerial video, the global
motion vector estimated from the speed of the aircraft is substituted for local motion vectors to
save encoding time[2]. (2) By identifying new areas and object motion, the previous frame is
integrated into the new one[3]; this is a new encoding method that is independent of the
traditional encoding standards. Both methods reduce complexity, and both have limitations. The
first applies only to aerial video taken while the aircraft is parallel to the horizon; otherwise
it leads to very low compression efficiency. The challenge of the second lies in recognizing
object motion: recognition errors degrade the encoding badly.

H.264/AVC is a new-generation video encoding standard proposed by ITU-T and ISO, and it offers high
compression efficiency. x264[4] is a widely used open-source implementation of H.264/AVC. Because
x264 is not fast enough for real-time encoding of aerial video, it must be optimized before it can
be used for this purpose.

In x264, mode decision accounts for about 50% to 70% of the total encoding time, depending on the
command line parameters. For a P frame, the candidate modes are P_skip, P16x16, P16x8, P8x16, P8x8,
P4x4, P8x4, P4x8, I16x16, I8x8 and I4x4. For a B frame, the candidate modes are B_skip, B16x16,
B16x8, B8x16, B8x8, I16x16, I8x8 and I4x4. To obtain the best compression efficiency, the encoder
must calculate the encoding cost of every mode and select the one with the lowest cost. x264 uses
the rate-distortion cost[1] to measure the encoding cost. Calculating the cost of all modes
requires heavy computation, and reducing the unnecessary part of it is the focus of this paper.
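For orientation, the rate-distortion cost mentioned above is commonly written in the Lagrangian form below. This is the generic textbook form, not a formula given in this paper; the symbols D (distortion), R (bits spent on the mode), lambda (Lagrange multiplier, typically tied to the quantization parameter) and the candidate set M are notation introduced here, and the exact distortion metric used by x264 depends on its settings.

```latex
J_m = D_m + \lambda \, R_m ,
\qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J_m
```

Every mode added to the candidate set requires one more evaluation of J, which is why shrinking the set reduces encoding time.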

Several improved mode decision schemes have been proposed[5-8]. In short, they work as follows.
Exploiting the temporal and spatial correlation of adjacent frames in aerial video, they predict
the mode of the current macro block directly from the modes, motion vectors and content of the
surrounding macro blocks. This avoids a lot of needless computation. However, the prediction
itself still requires considerable analysis and computation and may not be accurate, so the gain
in practice is limited.

Section 2 analyses the characteristics of aerial video. Section 3 proposes a mode decision
optimization algorithm for aerial video. Section 4 optimizes motion estimation, and Section 5
presents simulations of the two algorithms. Section 6 gives the conclusion.

2. Character of Aerial Video

In aerial video, the change between adjacent frames is caused mainly by the motion and shake of
the camera. The motion of objects in the scene produces changes so slight that they can be
neglected. This means that the middle part of each frame is formed by rotation and translation of
the previous frame, while the edges consist of newly visible area.

Figure 1. Video Sequence Change.

From the perspective of motion vectors, the direction of the motion vector is consistent for most
macro blocks, but its magnitude depends on the angle between the camera and the horizon. When the
camera is parallel to the horizon, the magnitude is nearly the same everywhere, and this common
vector can be called the global motion vector. When the camera forms an angle with the horizon,
the motion vectors of the upper macro blocks have smaller magnitude. Even so, adjacent macro
blocks have almost the same motion vector in both direction and magnitude.

In summary, aerial video has the following characteristics:

(1) The change between adjacent frames occurs mainly in the edge region.

(2) The motion vectors of most adjacent macro blocks are the same in direction and magnitude.

3. Mode Decision Optimization

In mode decision, large partitions (16x16) suit background or slowly changing areas, because such
areas share the same motion vector. Small partitions (4x4) suit quickly changing areas, where
splitting a large macro block into smaller blocks and running motion estimation on each gives
better compression efficiency. Based on this analysis and the characteristics of aerial video, we
collected statistics on mode decision in aerial video. We define the edge area as the region whose
distance to the frame boundary is less than 5%, and split each frame into two areas, the edge and
the middle. We then counted the mode types chosen in each area. Table 1 shows the ratio of each
mode type in the edge and the middle areas.
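The edge/middle split can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper does not say what the 5% is measured against, so the sketch assumes 5% of the frame width horizontally and 5% of the frame height vertically, and it assumes a macro block belongs to the edge if any part of it falls inside that band. The function name and parameters are hypothetical.

```python
MB_SIZE = 16  # H.264 macro block size in pixels


def is_edge_macroblock(mb_x, mb_y, frame_width, frame_height, margin=0.05):
    """Return True if the macro block at column mb_x, row mb_y touches the edge band.

    Assumption: the band is `margin` times the frame width (left/right)
    and `margin` times the frame height (top/bottom).
    """
    margin_x = margin * frame_width
    margin_y = margin * frame_height
    left = mb_x * MB_SIZE
    top = mb_y * MB_SIZE
    right = left + MB_SIZE
    bottom = top + MB_SIZE
    return (
        left < margin_x
        or top < margin_y
        or right > frame_width - margin_x
        or bottom > frame_height - margin_y
    )
```

For a 1920x1080 frame this gives a band of 96 pixels on the left and right and 54 pixels on the top and bottom under the stated assumption.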

Table 1 shows that in the middle area, the skip mode and the inter-frame 16x16 mode together
account for about 98% of macro blocks, the intra-frame modes for about 1.5%, and the remaining
inter-frame modes for less than 0.5%. The current x264 encoder nevertheless pays the cost of
evaluating every inter-frame mode in order to find this less than 0.5% of cases (inter-frame modes
other than skip and inter-frame 16x16).

Table 1. Mode Type Ratio.

Mode          Edge      Mid
Skip          41.33%    41.18%
Inter16x16    52.78%    56.93%
Inter16x8     0.13%     0.11%
Inter8x16     3.81%     0.25%
Inter8x8      0.08%     0.07%
Intra16x16    1.45%     1.0%
Intra8x8      0.39%     0.44%
Intra4x4      0.01%     0.02%

This process consumes a lot of encoding time while improving compression efficiency very little.
We therefore propose the following optimization. In the edge area, we evaluate all intra-frame and
inter-frame modes to find the best one. In the middle area, we check only the skip mode, the
inter-frame 16x16 mode and the intra-frame modes, and omit the other inter-frame modes. Figure 2
shows the mode decision flow after optimization.
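The restricted candidate set can be sketched as below. This is a simplified illustration, not x264 code: the mode labels are taken from Table 1 rather than from x264 identifiers, the per-mode costs are hypothetical inputs, and the early-termination checks of the full flow are left out.

```python
# Mode labels follow Table 1; they are illustrative names, not x264 identifiers.
INTRA_MODES = ["intra16x16", "intra8x8", "intra4x4"]
SMALL_INTER_MODES = ["inter16x8", "inter8x16", "inter8x8"]


def candidate_modes(in_edge):
    """Return the modes to evaluate for a macro block.

    Edge macro blocks keep the full set; middle macro blocks skip the
    inter-frame partitions smaller than 16x16.
    """
    modes = ["skip", "inter16x16"]
    if in_edge:
        modes += SMALL_INTER_MODES
    return modes + INTRA_MODES


def choose_mode(costs, in_edge):
    """Pick the cheapest mode among the candidates.

    `costs` maps a mode label to its encoding cost (hypothetical values).
    """
    return min(candidate_modes(in_edge), key=lambda mode: costs[mode])
```

In the middle area the sketch evaluates five modes instead of eight, which is where the saving in mode decision time comes from.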

Figure 2. Mode Decision Flow

[Flowchart nodes: Input macro block; Satisfy skip mode?; Inter 16x16 mode selection, satisfy early termination?; Macro block in the edge?; Other inter mode decision; Satisfy early termination?; Intra-frame mode decision; Mode decision; Output mode.]

4. Motion Estimation Optimization

Motion estimation is the process of searching for the matching macro block that minimizes the
motion compensation cost. It is the most time-consuming process in the encoder. x264 implements
five integer-pixel motion estimation algorithms: dia (diamond search), hex (hexagon search), umh
(UMHexagon search), esa (exhaustive search) and tesa (Hadamard exhaustive search)[4]. In this
order, the compression efficiency improves, the computation becomes more complex, and the encoding
time grows. We ran the five algorithms on aerial video and compared encoding time, total bits and
PSNR. Table 2 shows the results.

Table 2. Contrast of Motion Estimation Algorithms

Algorithm    Coding time    Total bits    PSNR (dB)
dia          14.00 s        424.9 KB      41.91
hex          14.84 s        424.3 KB      41.92
umh          18.13 s        424.2 KB      41.94
esa          27.52 s        423.9 KB      41.97
tesa         33.35 s        422.5 KB      41.97

Table 2 shows that encoding time increases by up to 138% (tesa relative to dia), while total bits
and PSNR change by less than 0.6%, which is negligible. Because the motion vectors of adjacent
macro blocks in aerial video are similar, the x264 encoder can predict the motion vector
accurately with a simple method such as median prediction. Starting from the predicted motion
vector, the encoder quickly finds the matching macro block that minimizes the motion compensation
cost. Therefore, considering both compression efficiency and encoding time, diamond search is the
best motion estimation method for aerial video.
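The relative changes quoted from Table 2 can be reproduced with a few lines of arithmetic. The sketch below only restates the table values; the function names are introduced here for illustration.

```python
# Values copied from Table 2: (coding time in s, total bits in KB, PSNR).
TABLE2 = {
    "dia": (14.00, 424.9, 41.91),
    "hex": (14.84, 424.3, 41.92),
    "umh": (18.13, 424.2, 41.94),
    "esa": (27.52, 423.9, 41.97),
    "tesa": (33.35, 422.5, 41.97),
}


def percent_change(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def changes_vs_dia(name):
    """Percent change in (coding time, total bits, PSNR) relative to dia."""
    base = TABLE2["dia"]
    row = TABLE2[name]
    return tuple(percent_change(v, b) for v, b in zip(row, base))


if __name__ == "__main__":
    for name in TABLE2:
        time_pct, bits_pct, psnr_pct = changes_vs_dia(name)
        print(f"{name}: time {time_pct:+.1f}%, bits {bits_pct:+.2f}%, PSNR {psnr_pct:+.2f}%")
```

The largest time increase is for tesa, at roughly 138% over dia, while its total bits and PSNR differ from dia by well under 1%.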

The diamond search algorithm iteratively matches macro blocks using a diamond-shaped template. For
diamond search on aerial video, we collected statistics on how often each iteration count occurs.
Table 3 shows the results.

Table 3. Diamond Search Algorithm Iterative Number Ratio.

Iterations    1        2        3        4        5
Ratio         65.7%    25.3%    5.60%    1.41%    0.62%

Iterations    6        7        8        9        10
Ratio         0.37%    0.24%    0.17%    0.12%    0.08%

Iterations    11       12       13       14       15
Ratio         0.07%    0.05%    0.03%    0.02%    0.02%

Table 3 shows the following:

(1) Macro blocks with an iteration count of 1 account for 65.7%. For these macro blocks, the
matching macro block is the same as the predicted one, so the iterative search contributes
nothing.

(2) Macro blocks with an iteration count of 2 or less account for 91.0%, and those with a count of
3 or less for about 96.6%. Such a high share of small iteration counts shows that the motion
vector prediction is very accurate: the matching macro block almost always lies near the predicted
one.
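The cumulative shares behind these observations follow directly from Table 3. The short sketch below restates the table values and accumulates them; the function name is introduced here for illustration.

```python
from itertools import accumulate

# Ratios from Table 3, in percent, for iteration counts 1 through 15.
RATIOS = [65.7, 25.3, 5.60, 1.41, 0.62,
          0.37, 0.24, 0.17, 0.12, 0.08,
          0.07, 0.05, 0.03, 0.02, 0.02]


def cumulative_ratios(ratios):
    """Return the running total of the ratios, one entry per iteration count."""
    return list(accumulate(ratios))


if __name__ == "__main__":
    for count, total in enumerate(cumulative_ratios(RATIOS), start=1):
        print(f"<= {count:2d} iterations: {total:.2f}%")
```

The running total reaches 91.0% at two iterations and 96.6% at three.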

Based on these observations, this paper proposes an algorithm that terminates motion estimation
early using a self-adaptive threshold.

(1) Initialize the variable threshold = 0.

(2) Calculate the average cost of the predicted matching macro block.

$$AC = \frac{PC}{H \times W} \tag{1}$$

AC is the average cost of the predicted matching macro block, PC is the cost of the predicted
matching macro block, and H and W are the height and width of the current macro block.

If the average cost AC is less than (ratio * threshold), terminate the motion estimation;
otherwise go to step 3. Ratio is a control constant chosen by the user.

(3) Conduct the diamond search. If the iteration count equals 1, compute the average of the
threshold and the average cost AC from step 2, and assign it to threshold. If the iteration count
is greater than 1, finish the motion estimation directly without updating the threshold.

Notes on the algorithm:

(1) The threshold is initialized the first time motion estimation is conducted.

(2) In step 2, the prediction cost is calculated from the predicted motion vector, which x264
already computes.

(3) Ratio is a constant that controls the filtering. For example, a ratio of 0 filters nothing,
which is equivalent to no optimization, while a very large ratio such as 100000 means that all
motion estimation is skipped. According to our statistics, 0.8 is a reasonable value for the
ratio.
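The three steps can be sketched as a small stateful helper. This is a minimal illustration under stated assumptions, not the authors' x264 patch: the class name, method name and the `diamond_search` callable (assumed to run the search and return its iteration count) are hypothetical, and the threshold is assumed to stay unchanged when the search needs more than one iteration.

```python
class AdaptiveEarlyTermination:
    """Early termination of motion estimation with a self-adaptive threshold."""

    def __init__(self, ratio=0.8):
        self.ratio = ratio      # control constant; the paper suggests 0.8
        self.threshold = 0.0    # step 1: initialize the threshold to 0

    def estimate(self, predicted_cost, height, width, diamond_search):
        """Process one block; return the iteration count, or None if the search was skipped.

        diamond_search is a hypothetical callable that performs the search
        and returns how many iterations it took.
        """
        # Step 2: average cost of the predicted matching block, Eq. (1).
        avg_cost = predicted_cost / (height * width)
        if avg_cost < self.ratio * self.threshold:
            return None  # keep the predicted motion vector, skip the search

        # Step 3: run the search and adapt the threshold.
        iterations = diamond_search()
        if iterations == 1:
            # The prediction was already the best match: move the threshold
            # halfway toward this block's average cost.
            self.threshold = (self.threshold + avg_cost) / 2.0
        return iterations
```

With the threshold starting at 0, the first block is always searched, because a non-negative average cost is never below zero. A ratio of 0 therefore never skips a search, matching the note that it is equivalent to no optimization.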

5. Simulation

This section simulates the optimization algorithms described in Sections 3 and 4 separately, and
then combines the two algorithms in a joint simulation. The results are as follows.

We captured several aerial videos in the suburbs of Beijing, whose contents include farmland,
forest, factory, school and moving cars. We name the videos video1, video2, video3, video4 and
video5.

Figure 3. Aerial Video Samples

First, we explain the x264 command line parameters used:

(1) --no-psy: disable psychovisual optimization. PSNR falls if psychovisual optimization is
enabled.

(2) --qp 30: qp is the quantization parameter. A value of 30 is a good operating point for
perceived visual quality.

(3) --partitions all: select the best mode from all mode types.

(4) --me [dia, hex, umh, esa, tesa]: select the motion estimation method. Our experiments use dia.

(5) --psnr: calculate the peak signal-to-noise ratio.
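Put together, the parameters above correspond to a command line like the one assembled below. This is a sketch only: the input and output file names are placeholders, the helper name is hypothetical, and the paper does not list the full command it used, so any further options are unknown.

```python
import shlex


def build_x264_command(input_path, output_path, me="dia", qp=30):
    """Assemble an x264 argument list from the parameters listed above.

    input_path and output_path are placeholders supplied by the caller.
    """
    return [
        "x264",
        "--no-psy",             # disable psychovisual optimization
        "--qp", str(qp),        # constant quantization parameter
        "--partitions", "all",  # consider all partition types
        "--me", me,             # integer-pixel motion estimation method
        "--psnr",               # report PSNR
        "-o", output_path,
        input_path,
    ]


if __name__ == "__main__":
    # Prints the command as a single shell-quoted string.
    print(shlex.join(build_x264_command("video1.y4m", "video1.264")))
```

Passing the list to a process launcher rather than a shell string avoids quoting problems if a path contains spaces.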

In the following tables, negative values mean a decrease after optimization and positive values
mean an increase.

We optimized the x264 encoder as described in Section 3. Table 4 shows the resulting change in
encoding time, total bits and PSNR.

Table 4. Change after Mode Decision Optimization.

           Coding time    Total bits    PSNR
video1     -15%           0.4%          -0.01
video2     -13%           0.3%          -0.01
video3     -22%           0.5%          -0.02
video4     -12%           0.9%          -0.02
video5     -14%           0.3%
average    -15%           0.5%          -0.012

We optimized the x264 encoder as described in Section 4. Table 5 shows the change after
optimization.

Table 5. Change after Motion Estimation Optimization.

           Coding time    Total bits    PSNR
video1     -3%            -0.04%        -0.05
video2     -5%            -0.4%         -0.04
video3     -4%            0.8%          -0.1
video4     -8%            1.5%          -0.1
video5     -4%            0.2%          -0.09
average    -5%            0.4%          -0.08

We then combined the two algorithms in the x264 encoder. Table 6 shows the change.

Table 6. Change after Mode Decision and Motion Estimation Optimization.

           Coding time    Total bits    PSNR
video1     -17%           0.6%          -0.05
video2     -16%           0.8%          -0.04
video3     -25%           0.7%          -0.11
video4     -18%           1.5%          -0.11
video5     -17%           0.3%          -0.09
average    -19%           0.8%          -0.08

From the experiments we can draw the following conclusions.

1) Both algorithms remove needless computation and reduce encoding time, with only a slight
change in total bits and a slight decrease in PSNR.

2) The first two experiments show that mode decision optimization saves more encoding time than
motion estimation optimization.

3) The combined optimization saves more encoding time than either algorithm alone.

4) Both algorithms are simple to implement and have a stable effect on real-time encoding of
aerial video.
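The average row of Table 6 can be checked against the per-video rows with simple arithmetic. The sketch below only restates the table values; the function name is introduced here for illustration.

```python
# Per-video rows of Table 6: (coding time %, total bits %, PSNR change).
TABLE6 = {
    "video1": (-17, 0.6, -0.05),
    "video2": (-16, 0.8, -0.04),
    "video3": (-25, 0.7, -0.11),
    "video4": (-18, 1.5, -0.11),
    "video5": (-17, 0.3, -0.09),
}


def column_averages(rows):
    """Average each column over all videos."""
    count = len(rows)
    return tuple(sum(column) / count for column in zip(*rows.values()))


if __name__ == "__main__":
    time_avg, bits_avg, psnr_avg = column_averages(TABLE6)
    print(f"time {time_avg:.1f}%, bits {bits_avg:.2f}%, PSNR {psnr_avg:.3f}")
```

The unrounded averages are about -18.6% for coding time, 0.78% for total bits and -0.08 for PSNR, which round to the values in the average row.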

6. Conclusion

This paper analyses aerial video and summarizes its characteristics. Based on these
characteristics, we optimize the x264 encoder in two ways. First, we omit needless mode decisions
according to the position of the macro block, which reduces computation. Second, we compute a
threshold that terminates motion estimation early, which reduces encoding time. Together, the two
optimizations save about 19% of encoding time, with a slight increase in total bits and a slight
decrease in PSNR.

REFERENCES

[1] Joint Video Team. "Draft Recommendation and Final Draft International Standard of Joint Video Specification." ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, Mar. 2003.

[2] Malavika Bhaskaranand and Jerry D. Gibson. "Low-complexity Video Encoding For UAV Reconnaissance And Surveillance." The 2011 Military Communications Conference, Nov. 2011.

[3] Holger Meuel, Marco Munderloh and Jörn Ostermann. "Low Bit Rate Based ROI Video Coding for HDTV Aerial Surveillance Video Sequence." 2011 IEEE Computer Society Conference On Computer Vision And Pattern Recognition Workshops, June 2011.

[4] x264 Software (120711). Available from: http://www.videolan.org/developers/x264.html

[5] Huanqiang Zeng, Canhui Cai and Kai-Kuang Ma. "Fast Mode Decision For H.264/AVC Based On Macroblock Motion Activity." IEEE Transactions On Circuits And Systems For Video Technology, Vol.19, No.4, pp.491-499, April 2009.

[6] Tiesong Zhao, Hanli Wang, Sam Kwong and C.-C. Jay Kuo. "Fast Mode Decision Based on Mode Adaptation." IEEE Transactions On Circuits And Systems For Video Technology, Vol.20, No.5, pp.697-705, May 2010.

[7] D. Wu, F. Pan, K.P. Lim, S. Wu, Z.G. Li, X. Lin, S. Rahardja and C.C. Ko. "Fast Intermode Decision in H.264/AVC Video Coding." IEEE Transactions On Circuits And Systems For Video Technology, Vol.15, No.7, pp.953-958, July 2005.

[8] Po-Hung Chen, Hung-Ming Chen, Mon-Chau Shie, Che-Hung Su, Wei-Lung Mao, Chia-Ke. Huang. "Adaptive Fast Block Mode Decision Algorithm for H.264/AVC." 2010 the 5th IEEE Conference on Industrial Electronics and Applications, June 2010.