J. Software Engineering & Applications, 2010, 3: 472-476
doi:10.4236/jsea.2010.35053 Published Online May 2010 (http://www.SciRP.org/journal/jsea)
Copyright © 2010 SciRes. JSEA
A Novel Efficient Mode Selection Approach for
H.264
Lu Lu, Wei Zhou
School of Computer Science & Engineering, South China University of Technology, Guangzhou, China.
Email: lul@scut.edu.cn
Received March 11th, 2010; revised April 2nd, 2010; accepted April 3rd, 2010.
ABSTRACT
The H.264 video coding standard introduces motion estimation with multiple block sizes to achieve considerably higher
coding efficiency than other video coding algorithms. However, this comes at the cost of greatly increased computational complexity
at the encoder. In this paper, a method is proposed to eliminate some redundant coding modes that contribute very little
coding gain. The simulation results show that the algorithm can remarkably decrease the complexity at the encoder while
maintaining satisfactory coding efficiency.
Keywords: Video Coding, H.264, Mode Selection
1. Introduction
The JVT (Joint Video Team) introduced a number of advanced features in H.264, also known as MPEG-4 AVC. These improvements achieve significant gains in encoder and decoder performance [1-3]. One of the new features is multi-mode selection, which is the subject of this paper. In the H.264 coding algorithm, block-matching motion estimation is an essential part of the encoder that reduces the temporal redundancy between frames. H.264 supports motion estimation and compensation using block sizes ranging from 16 × 16 to 4 × 4 luminance samples, with many options between the two, as shown in Figure 1. The luminance component of each macroblock can be split in four ways: 16 × 16, 16 × 8, 8 × 16 and 8 × 8. Each of the resulting regions is called a macroblock partition. If the 8 × 8 mode is chosen, each of the four 8 × 8 macroblock partitions within the macroblock can be further split in four ways: 8 × 8, 8 × 4, 4 × 8 or 4 × 4, which are called macroblock sub-partitions. These partitions and sub-partitions give rise to a large number of possible combinations within each macroblock.1
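To make the "large number of possible combinations" concrete, the following minimal sketch counts the partition shapes available to one macroblock. It assumes that only the partition geometry is counted (reference frames and motion vectors are ignored); the names `MB_PARTITIONS`, `SUB_PARTITIONS` and `count_partitionings` are hypothetical and introduced only for illustration.

```python
# Count the ways the luma of one macroblock can be partitioned.
# Assumption: only partition shapes are counted, not motion vectors
# or reference frames.
MB_PARTITIONS = ["16x16", "16x8", "8x16", "8x8"]
SUB_PARTITIONS = ["8x8", "8x4", "4x8", "4x4"]


def count_partitionings() -> int:
    total = 0
    for mode in MB_PARTITIONS:
        if mode == "8x8":
            # Each of the four 8x8 partitions chooses its own
            # sub-partition independently.
            total += len(SUB_PARTITIONS) ** 4
        else:
            total += 1
    return total


print(count_partitionings())  # 3 + 4**4 = 259
```

Under this counting, a single macroblock already admits 259 distinct partitionings, which is why an exhaustive search over all of them is expensive.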
The H.264 standard uses computationally intensive Lagrangian rate-distortion (RD) optimization to choose the best block size for a macroblock. The general equation of Lagrangian RD optimization is given as:
$$J_{mode} = D + \lambda_{mode} \cdot R \qquad (1)$$
where $J_{mode}$ is the rate-distortion cost (RD cost) and $\lambda_{mode}$ is the Lagrangian multiplier; D is the distortion measured between the original macroblock and the reconstructed macroblock located in the previous coded frame, and R reflects the number of bits associated with choosing the mode and the macroblock quantizer value, Qp, including the bits for the macroblock header, the motion vector(s) and all the DCT residue blocks [4,5].
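As an illustration of how Equation (1) drives mode selection, the sketch below evaluates the RD cost for a few candidate modes and keeps the cheapest one. The distortion and rate figures are invented for the example, and the function names `rd_cost` and `best_mode` are hypothetical; a real encoder would obtain D and R by actually coding the macroblock in each mode.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """RD cost of Eq. (1): J_mode = D + lambda_mode * R."""
    return distortion + lam * rate_bits


def best_mode(candidates: dict, lam: float) -> str:
    """Return the name of the mode with the smallest RD cost."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))


# Hypothetical (distortion, rate in bits) pairs, made up for illustration.
candidates = {
    "16x16": (1200.0, 40.0),
    "16x8": (1000.0, 70.0),
    "8x8": (800.0, 150.0),
}

print(best_mode(candidates, lam=1.0))   # 8x8   (costs 1240, 1070, 950)
print(best_mode(candidates, lam=10.0))  # 16x16 (costs 1600, 1700, 2300)
```

A larger multiplier penalizes the extra bits of small partitions more heavily, so the choice shifts toward the 16 × 16 mode.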
The computational complexity required by motion estimation, however, increases linearly with the number of block types used, because block matching needs to be performed for each of them. The JVT reference software JM75C [6] performs a full search for each block type and selects the optimal block type as the final coding mode based on the RD cost function. Although this provides the best coding efficiency, the computational complexity is very high. To reduce the computational requirement, Chang et al. proposed fast multi-block motion estimation [7]. They adopt early termination: the search for modes 16 × 8 and 8 × 16 is skipped if the performance of mode 16 × 16 is “good enough”; otherwise all coding modes are evaluated. This method considers only three inter coding modes: 16 × 16, 16 × 8 and 8 × 16. Another approach, proposed by Andy C. Yu, is based on estimating block detail complexity [8]. Judging by his simulation results it is effective, but it does not consider another critical factor, texture direction, which can also be used to improve coding efficiency significantly.
In this paper, we propose an effective method to eliminate some redundant coding modes in mode selection.
1This paper is supported by Guangdong Technology Project (2009B010800048) and Guangzhou Technology Major Project.
Figure 1. Inter-prediction modes
The paper is organized as follows. The proposed algorithm is described in detail in Section 2. Section 3 presents the simulation and its results. Finally, a conclusion is given in Section 4.
2. Proposed Algorithm
2.1 Block Details
Table 1 shows how the selected modes relate to sequence characteristics.
The choice of partition size has a significant impact on compression performance. In general, according to Table 1, large partition sizes are appropriate for homogeneous areas of the frame, and small partition sizes may be beneficial for detailed areas.
We derive an approach based on summing the total
energy of the AC coefficients to estimate the block detail.
The AC coefficients can be obtained from the DCT coef-
ficients of each block. The definition is:
$$E_{AC} = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \left( F(u,v) \right)^2, \quad (u,v) \neq (0,0) \qquad (2)$$
From (2), $E_{AC}$, the total energy of the AC components of an M × N block, is the sum of the squares of all the DCT coefficients F(u,v) except the DC component at u = 0 and v = 0.
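The DCT coefficients are
$$F(u,v) = c(u)\,c(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[ \frac{(2x+1)u\pi}{2M} \right] \cos\left[ \frac{(2y+1)v\pi}{2N} \right] \qquad (3)$$
where the denominators 2M and 2N equal 16 for an 8 × 8 block, and
$$c(u) = \begin{cases} \sqrt{1/M}, & u = 0 \\ \sqrt{2/M}, & u \neq 0 \end{cases} \qquad c(v) = \begin{cases} \sqrt{1/N}, & v = 0 \\ \sqrt{2/N}, & v \neq 0 \end{cases} \qquad (4)$$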
According to the energy conservation principle, the total energy of an M × N block is equal to the accumulated energy of its DCT coefficients. Thus, (2) can be further simplified as
$$E_{AC} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left( f(x,y) \right)^2 - \frac{1}{MN} \left[ \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \right]^2 \qquad (5)$$
where the first term is the total energy of the image intensities within an M × N block, and the second term is the energy of the DC component, that is, MN times the squared mean intensity. Equation (5) shows that the energy of the AC components of a macroblock is MN times the variance of its pixel intensities, so it can be represented by the variance.
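The equivalence between the DCT-domain definition (2) and the spatial-domain form (5) can be checked numerically. The sketch below is only an illustration: it assumes an orthonormal DCT-II built directly with NumPy, and the helper names `dct_matrix`, `ac_energy_dct` and `ac_energy_spatial` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row u is c(u) * cos((2x + 1) u pi / (2n))."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    c = np.full((n, 1), np.sqrt(2.0 / n))
    c[0, 0] = np.sqrt(1.0 / n)
    return c * np.cos((2 * x + 1) * u * np.pi / (2 * n))


def ac_energy_dct(block: np.ndarray) -> float:
    """Eq. (2): sum of squared DCT coefficients minus the DC term."""
    m, n = block.shape
    coeffs = dct_matrix(m) @ block @ dct_matrix(n).T
    return float(np.sum(coeffs ** 2) - coeffs[0, 0] ** 2)


def ac_energy_spatial(block: np.ndarray) -> float:
    """Eq. (5): total pixel energy minus (sum of pixels)^2 / (M * N)."""
    m, n = block.shape
    return float(np.sum(block ** 2) - np.sum(block) ** 2 / (m * n))


rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

print(np.isclose(ac_energy_dct(block), ac_energy_spatial(block)))       # True
print(np.isclose(ac_energy_spatial(block), block.size * block.var()))   # True
```

The practical consequence is that the detail measure can be obtained from pixel sums alone, without computing a DCT.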
The next step is to evaluate the maximum possible sum of the AC components. The largest variance is obtained from a block with a checkerboard pattern, in which adjacent pixels alternate between the maximum and minimum permissible values, $f_{\max}$ and $f_{\min}$. Thus $E_{\max}$, the maximum sum of the AC components of an M × N block, is
$$E_{\max} = MN\,\frac{f_{\max}^2 + f_{\min}^2}{2} - \frac{MN}{4} \left[ f_{\max} + f_{\min} \right]^2 \qquad (6)$$
Note that $E_{\max}$ can be calculated in advance. The criterion used to assess the detail complexity of a macroblock is then
$$r_d = \frac{\ln(E_{AC})}{\ln(E_{\max})} \qquad (7)$$
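A minimal sketch of Equations (6) and (7) follows. It assumes 8-bit samples (maximum 255, minimum 0), which the text does not state explicitly, and it returns 0 for blocks whose AC energy is at most 1 so that the logarithm is never taken of zero; both choices, and the names `e_max` and `detail_ratio`, are assumptions made for the example.

```python
import math

import numpy as np


def e_max(m: int, n: int, f_max: float = 255.0, f_min: float = 0.0) -> float:
    """Eq. (6): AC energy of an M x N checkerboard of f_max / f_min pixels."""
    return m * n * (f_max ** 2 + f_min ** 2) / 2 - m * n * (f_max + f_min) ** 2 / 4


def detail_ratio(block: np.ndarray, f_max: float = 255.0, f_min: float = 0.0) -> float:
    """Eq. (7): r_d = ln(E_AC) / ln(E_max), with E_AC taken from Eq. (5)."""
    block = block.astype(np.float64)
    m, n = block.shape
    e_ac = float(np.sum(block ** 2) - np.sum(block) ** 2 / (m * n))
    if e_ac <= 1.0:
        # Assumption: (near-)flat blocks are given zero detail, which
        # avoids ln(0) and negative ratios.
        return 0.0
    return math.log(e_ac) / math.log(e_max(m, n, f_max, f_min))


flat = np.full((16, 16), 128)
checker = (np.indices((16, 16)).sum(axis=0) % 2) * 255

print(e_max(16, 16))          # 4161600.0
print(detail_ratio(flat))     # 0.0
print(detail_ratio(checker))  # 1.0
```

A flat block and a full-contrast checkerboard mark the two ends of the range of $r_d$; natural image blocks fall in between.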
In total, H.264 recommends 7 different block sizes for P-frames, namely 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4, as well as SKIP and two INTRA prediction modes, I4MB and I16MB. In our complexity measurement, however, there are only 3 categories, denoted as the LDB, MDB and HDB categories, respectively.
The proposed algorithm provides a recursive way to decide the complexity of each macroblock. First, a macroblock of 16 × 16 pixels is examined with the first
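Table 1. Selected modes (%) for different sequences
| Sequence | Skip | 16×16 | 16×8 | 8×16 | 8×8 | Intra16 | Intra4 |
|---|---|---|---|---|---|---|---|
| Container | 75.8 | 10.4 | 3.5 | 2.7 | 7.3 | 0.3 | 0.0 |
| Foreman | 23.7 | 39.9 | 39.9 | 7.3 | 7.6 | 7.3 | 9.3 |
| Bus | 3.5 | 22.0 | 12.1 | 14.4 | 40.5 | 1.0 | 5.5 |
| Mobile | 4.5 | 31.3 | 7.1 | 6.1 | 6.1 | 49.7 | 0.3 |
Coding conditions: IPPP, 5 reference frames, CABAC, CIF format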
piecewise equation in (7). The LDB category is assigned if the macroblock is recognized as homogeneous. Otherwise, the macroblock is decomposed into four blocks of 8 × 8 pixels. An 8 × 8 block is recognized as high-detailed if it satisfies two conditions: 1) the ratio in (7) is greater than 0.7, in which case it is decomposed into four 4 × 4 blocks, and 2) one of its four decomposed 4 × 4 blocks is high-detailed as well. If an 8 × 8 block satisfies the first condition but not the second, it is still recognized as low-detailed. After checking all the 8 × 8 blocks, the MDB category is given to a macroblock that possesses more than two high-detailed blocks; otherwise the HDB category is assigned. Table 2 displays the relationship between the three categories in the proposed algorithm and the 9 inter-frame prediction modes. The LDB category covers the fewest prediction modes, whereas the HDB category contains all the available modes. The table further indicates that the more detailed a macroblock is, the more prediction modes the proposed algorithm has to check.
The natural logarithm is used to linearize both $E_{\max}$ and $E_{AC}$ so that the range of $r_d$ can be split uniformly into 10 subgroups. In our evaluation, a macroblock with $r_d$ > 0.7 is considered to be a high-detailed block.
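The mapping from the detail ratio to a category and its enabled modes can be sketched as follows. The thresholds 0.3 and 0.7 are taken from Step 1 in Section 2.4 and the mode lists from Table 2; how the boundary values are handled, the assumption that $r_d$ lies in [0, 1], and the names `categorize`, `subgroup` and `ENABLED_MODES` are choices made only for this example.

```python
LOW_THRESHOLD = 0.3   # from Step 1 in Section 2.4
HIGH_THRESHOLD = 0.7  # r_d > 0.7 marks a high-detailed block

# Enabled modes per category, as listed in Table 2.
ENABLED_MODES = {
    "LDB": ["16x16"],
    "MDB": ["16x16", "16x8", "8x16", "8x8"],
    "HDB": ["8x8", "8x4", "4x8", "4x4"],
}


def categorize(r_d: float) -> str:
    """Map the detail ratio r_d to a category.

    Assumption: values exactly equal to a threshold fall into MDB.
    """
    if r_d < LOW_THRESHOLD:
        return "LDB"
    if r_d > HIGH_THRESHOLD:
        return "HDB"
    return "MDB"


def subgroup(r_d: float) -> int:
    """Index (0..9) of the uniform subgroup containing r_d, for 0 <= r_d <= 1."""
    return min(int(r_d * 10), 9)


for r_d in (0.2, 0.5, 0.9):
    cat = categorize(r_d)
    print(r_d, cat, ENABLED_MODES[cat])
```

Only the modes listed for the resulting category need to be passed on to motion estimation, which is where the saving in computation comes from.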
2.2 Object Movement
A macroblock may contain more than one object, with the objects moving in different directions. This includes objects moving over a background at a different velocity. For example, in Figure 2 the object is moving against a static background. In this case, the current block should be divided into two 8 × 16 sub-blocks, where sub-block 0 has a zero motion vector and sub-block 1 has a motion vector chosen to minimize the cost function.
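The benefit of splitting in this situation can be illustrated with a synthetic example. The sketch below uses a random reference frame, a plain sum of absolute differences (SAD) in place of the full cost function, and an integer full search over ±3 pixels; all of these, together with the function name `best_match`, are simplifying assumptions rather than the encoder's actual procedure.

```python
import numpy as np


def best_match(ref, cur, top, left, search=3):
    """Full search: return (sad, (dy, dx)) with the smallest SAD for the
    block `cur` whose top-left corner is at (top, left)."""
    h, w = cur.shape
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = ref[top + dy: top + dy + h, left + dx: left + dx + w]
            sad = int(np.abs(cand - cur).sum())
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    return best


rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(32, 32)).astype(np.int64)

# Current 16x16 macroblock at (8, 8): the left 8 columns are static
# background, the right 8 columns belong to an object displaced by
# 2 pixels horizontally.
cur = np.empty((16, 16), dtype=np.int64)
cur[:, :8] = ref[8:24, 8:16]
cur[:, 8:] = ref[8:24, 18:26]

sad_16x16, mv_16x16 = best_match(ref, cur, 8, 8)
sad_left, mv_left = best_match(ref, cur[:, :8], 8, 8)
sad_right, mv_right = best_match(ref, cur[:, 8:], 8, 16)

print("16x16:", sad_16x16, mv_16x16)
print("8x16 sub-block 0:", sad_left, mv_left)    # SAD 0 with MV (0, 0)
print("8x16 sub-block 1:", sad_right, mv_right)  # SAD 0 with MV (0, 2)
```

No single motion vector matches both halves, so the 16 × 16 SAD stays large, while each 8 × 16 sub-block is predicted exactly by its own vector.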
2.3 Texture Regions
When the edge of a texture is aligned perfectly with the sensor boundaries at a particular time instant, the texture edge is clear and sharp. We describe this texture as having “integer-pixel location”. When the texture undergoes an integer-pixel translational motion, it looks exactly the same in two consecutive frames, except that one is a translation of the other, and the moved texture can be predicted perfectly by integer-pixel motion estimation. If the edges of the texture have a half-pixel offset relative to the sensor, the edges may be blurred as shown in Figure 3(b) and are said to have “half-pixel location”. The original zero-pixel-wide (sharp) edge now becomes one pixel wide (blurred). The pixels at the blurred edges may have only half the intensity of the original ones, which can make motion estimation difficult.
Similarly, if the edges have a quarter-pixel offset, they may
Table 2. Block categories and corresponding modes
| Detail Level | Enabled Modes |
|---|---|
| LDB | 16×16 |
| MDB | 16×16, 16×8, 8×16, 8×8 |
| HDB | 8×8, 8×4, 4×8, 4×4 |
Figure 2. Example of an object moving on a static background
be blurred as shown in Figure 3(c). We describe this texture as having “quarter-pixel location”. The zero-pixel-wide (sharp) object edge becomes one pixel wide (blurred). The pixels at the blurred edges may have 3/4 or 1/4 of the original intensity. Sub-pixel motion estimation algorithms, such as half-pixel or quarter-pixel estimation, use interpolation to predict the sub-pixel shift of the texture relative to the sampling grid.
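The blurring described above can be reproduced on a one-dimensional edge. The sketch below uses plain linear interpolation between neighbouring samples, which is a simplification chosen for clarity (H.264 itself specifies longer interpolation filters for luma); the function name `shift_subpixel` and the sample values are hypothetical.

```python
import numpy as np


def shift_subpixel(signal, frac):
    """Sample `signal` at positions x + frac (0 <= frac < 1) using
    linear interpolation between neighbouring samples."""
    signal = np.asarray(signal, dtype=np.float64)
    return (1.0 - frac) * signal[:-1] + frac * signal[1:]


edge = [0, 0, 0, 100, 100, 100]  # sharp, integer-pixel edge

print(shift_subpixel(edge, 0.0))   # values 0, 0, 0, 100, 100 (still sharp)
print(shift_subpixel(edge, 0.5))   # values 0, 0, 50, 100, 100 (half intensity)
print(shift_subpixel(edge, 0.25))  # values 0, 0, 25, 100, 100 (1/4 intensity)
print(shift_subpixel(edge, 0.75))  # values 0, 0, 75, 100, 100 (3/4 intensity)
```

In each fractional case the edge becomes one pixel wide, with the intermediate sample at 1/2, 1/4 or 3/4 of the original intensity, matching the description in the text.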
Different types of texture (integer, half or quarter) respond differently to fractional motion estimation. For example, the texture in Figure 3(b) (half) can be predicted perfectly from the texture in Figure 3(a) (integer) using half-pixel motion estimation, but not vice versa. Since a macroblock may contain more than one kind of texture, a single integer-, half- or quarter-pixel motion vector is not sufficient to describe the texture content. For example, a macroblock may contain two 8 × 16 sub-blocks, where sub-block 0 contains “half-pixel” texture and sub-block 1 contains “integer-pixel” texture. In this case, the current macroblock should be divided into two 8 × 16 sub-blocks, with a half-pixel motion vector used for sub-block 0 and an integer-pixel motion vector for sub-block 1.
2.4 Algorithm
Former results [9] show that about 70% of the macroblocks often choose mode 1 (16 × 16) as their final block type. The proposed algorithm determines the macroblock detail level and analyzes the information obtained from 8 × 8 block-size ME to predict mode 1 macroblocks in advance and, if possible, their optimal motion vector. If a macroblock is predicted to be a mode 1 macroblock, searching is stopped immediately.
As a result, computation can be saved for mode 2 and mode 3 block-size ME and, in some situations, for mode 1 as well. Three decisions are set up in handling different
Figure 3. (a) Integer-pixel texture; (b) Half-pixel texture; (c) Quarter-pixel texture
video areas: general areas, slow-moving areas and fast-moving areas.
Step 1: If r_d < 0.3 then
- Select 16 × 16 as the only enabled mode (LDB)
Else if 0.3 ≤ r_d ≤ 0.7 then
- Disable 8 × 8, 4 × 8, 4 × 4 (MDB)
Else (r_d > 0.7)
- Enable all of the modes (HDB)
Step 2: Let MV0, MV1, MV2 and MV3 be the motion vectors of the four 8 × 8 sub-blocks of the current macroblock. The following conditions are checked:
C1: If MV0 = MV1 = MV2 = MV3 then
- choose mode 1 (16 × 16) as the final block type
- no further ME is performed
- 16 × 16 MV = 8 × 8 MV0
C2: If three sub-block MVs are the same AND the fourth, unequal MV differs by only one quarter-pixel (1/4) distance then
- choose mode 1 (16 × 16) as the final block type
- no further ME is performed
- 16 × 16 MV = dominant 8 × 8 MV
C3: If the collocated MB in the previous frame is mode 1 AND the magnitudes of MV0, MV1, MV2 and MV3 are all < 4 (i.e. one integer-pixel distance) AND MV0, MV1, MV2 and MV3 have the same direction then
- choose mode 1 (16 × 16) as the final block type
- 8-point local search around MV = {0, 0}
C4: If the magnitudes of all 8 × 8 MVx are >= 3 integer-pixel distances OR the magnitudes of all 8 × 8 MVy are >= 3 integer-pixel distances then
- choose mode 1 (16 × 16) as the final block type
- local search over the 24 points surrounding MV0
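The four conditions of Step 2 can be sketched as a single decision function. The listing leaves several details open, so the sketch makes explicit assumptions: motion vectors are (x, y) tuples in quarter-pixel units; "differs by one quarter pixel" means each component differs by at most one unit; "magnitude" is the larger absolute component; and "same direction" means identical component signs. The names `early_mode1_decision`, `QPEL` and `sign` are hypothetical.

```python
from collections import Counter

QPEL = 4  # assumption: 4 quarter-pixel units per integer pixel


def sign(v):
    return (v > 0) - (v < 0)


def early_mode1_decision(mvs, collocated_is_mode1):
    """mvs: four (x, y) motion vectors of the 8x8 sub-blocks.

    Returns (condition, mv_16x16, refinement) when mode 1 is predicted,
    or None when no condition applies and the normal search continues.
    """
    # C1: all four motion vectors are identical.
    if all(mv == mvs[0] for mv in mvs):
        return ("C1", mvs[0], "none")

    # C2: three identical vectors, the fourth within one quarter pixel.
    dominant, count = Counter(mvs).most_common(1)[0]
    if count == 3:
        odd = next(mv for mv in mvs if mv != dominant)
        if max(abs(odd[0] - dominant[0]), abs(odd[1] - dominant[1])) <= 1:
            return ("C2", dominant, "none")

    # C3: collocated MB was mode 1, all vectors shorter than one integer
    # pixel, and all pointing in the same direction.
    small = all(max(abs(x), abs(y)) < QPEL for x, y in mvs)
    same_dir = len({(sign(x), sign(y)) for x, y in mvs}) == 1
    if collocated_is_mode1 and small and same_dir:
        return ("C3", (0, 0), "8-point local search")

    # C4: all horizontal or all vertical components are at least
    # three integer pixels long.
    if all(abs(x) >= 3 * QPEL for x, _ in mvs) or all(abs(y) >= 3 * QPEL for _, y in mvs):
        return ("C4", mvs[0], "24-point local search")

    return None


print(early_mode1_decision([(4, 0)] * 4, False))
print(early_mode1_decision([(4, 0), (4, 0), (4, 0), (5, 0)], False))
print(early_mode1_decision([(1, 1), (2, 1), (1, 2), (3, 3)], True))
print(early_mode1_decision([(12, 0), (14, 1), (-13, 2), (16, 5)], False))
print(early_mode1_decision([(1, 1), (-1, 1), (1, 2), (3, 3)], True))
```

The last call returns `None`: the opposing horizontal signs fail the same-direction test of C3, which is the rotational case discussed with Figure 4.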
The reason for requiring all the 8 × 8 motion vectors to have the same direction in decision C3 can be illustrated using Figure 4(a). Suppose a macroblock is undergoing a small rotational motion, as shown in Figure 4(a). The motion vectors on the left side of the macroblock point downward and those on the right side point upward. As a result, the current macroblock is likely to be segmented vertically even though the magnitude of the motion vectors is very small.
Figure 5 shows the performance of C1 + C2 + C3 + C4. The hit rate increases for the fast-panning part of the Foreman sequence and is much closer to the optimal one.
3. Simulation Results
The proposed algorithm was implemented in the JVT reference software JM75C. We tested the proposed method on a series of test sequences with different resolutions.
In this paper, two QCIF (176 × 144) sequences, “Foreman” and “Stefan”, are selected to show the results. In the simulation, the sequences are encoded at 30 fps with QP = 10 to 40 in steps of two. The PSNR and bitrate comparison between the proposed algorithm and full search is shown in Table 3.
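The two gain columns of Table 3 are not defined in the text. The sketch below assumes that the PSNR gain is the difference between the proposed algorithm and full search, and that the bitrate gain is the relative bitrate difference with respect to full search, expressed in percent. These definitions are inferred from the tabulated values and reproduce them only to within the rounding of the printed inputs; the function names are hypothetical.

```python
def psnr_gain(psnr_fast: float, psnr_full: float) -> float:
    """Assumed definition: PSNR of the fast algorithm minus full search (dB)."""
    return round(psnr_fast - psnr_full, 2)


def bitrate_gain_percent(br_fast: float, br_full: float) -> float:
    """Assumed definition: (BR_full - BR_fast) / BR_full, in percent.

    A negative value means the fast algorithm spends more bits.
    """
    return (br_full - br_fast) / br_full * 100.0


# Stefan QCIF rows of Table 3.
print(psnr_gain(46.13, 46.14))                          # QP = 14: -0.01
print(round(bitrate_gain_percent(2602.9, 2600.0), 2))   # QP = 10: -0.11
print(round(bitrate_gain_percent(2203.3, 2201.3), 2))   # QP = 12: -0.09
```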
Figure 4. Example of rotational motion in a macroblock that causes segmentation: (a) vertical segmentation; (b) horizontal segmentation
Figure 5. Performance using decisions C1 + C2 + C3 + C4 on the Foreman QCIF sequence
Table 3. PSNR and bitrate comparison between the proposed algorithm (FMFME) and full search (FS) with QP = 10 to 40; (a) Stefan QCIF; (b) Foreman QCIF

(a) Stefan QCIF
| QP | FMFME PSNR (dB) | FMFME BR (kbits) | Full Search PSNR (dB) | Full Search BR (kbits) | Gain (dB) | BR Gain |
|---|---|---|---|---|---|---|
| 10 | 49.48 | 2602.9 | 49.48 | 2600.0 | 0 | -0.11% |
| 12 | 47.64 | 2203.3 | 47.64 | 2201.3 | 0 | -0.09% |
| 14 | 46.13 | 1891.5 | 46.14 | 1889.5 | -0.01 | -0.10% |
| 16 | 44.48 | 1611.5 | 44.48 | 1608.4 | 0 | -0.19% |
| 18 | 42.58 | 1323.5 | 42.58 | 1321.5 | 0 | -0.15% |
| 20 | 40.89 | 1095.2 | 40.89 | 1092.2 | 0 | -0.27% |
| 22 | 39.3 | 903.06 | 39.3 | 900.83 | 0 | -0.25% |
| 24 | 37.36 | 707.5 | 37.36 | 705.61 | 0 | -0.27% |
| 26 | 35.65 | 557.72 | 35.66 | 555.8 | -0.01 | -0.35% |
| 28 | 33.95 | 432.34 | 33.96 | 430.4 | -0.01 | -0.45% |
| 30 | 32.06 | 322.08 | 32.07 | 320.79 | -0.01 | -0.40% |
| 32 | 30.34 | 238.65 | 30.34 | 236.84 | 0 | -0.76% |
| 34 | 28.79 | 177.98 | 28.8 | 177.61 | -0.01 | -0.21% |
| 36 | 27.13 | 127.37 | 27.13 | 127.04 | 0 | -0.26% |
| 38 | 25.66 | 94.1 | 25.68 | 93.35 | -0.02 | -0.80% |
| 40 | 24.33 | 70.58 | 24.36 | 70.35 | -0.03 | -0.50% |
| Average | | | | | -0.00625 | -0.32% |

(b) Foreman QCIF
| QP | FMFME PSNR (dB) | FMFME BR (kbits) | Full Search PSNR (dB) | Full Search BR (kbits) | Gain (dB) | BR Gain |
|---|---|---|---|---|---|---|
| 10 | 49.69 | 1457.1 | 49.69 | 1455.0 | 0 | -0.15% |
| 12 | 47.97 | 1149.2 | 47.97 | 1146.8 | 0 | -0.21% |
| 14 | 46.5 | 923.26 | 46.5 | 921.76 | 0 | -0.16% |
| 16 | 44.89 | 732.14 | 44.9 | 729.57 | -0.01 | -0.35% |
| 18 | 43.11 | 552.33 | 43.12 | 550.69 | -0.01 | -0.30% |
| 20 | 41.52 | 422.31 | 41.53 | 419.7 | -0.01 | -0.62% |
| 22 | 40.03 | 328.54 | 40.03 | 326.17 | 0 | -0.73% |
| 24 | 38.32 | 241.99 | 38.33 | 238.92 | -0.01 | -1.28% |
| 26 | 36.83 | 180.03 | 36.85 | 178.54 | -0.02 | -0.83% |
| 28 | 35.48 | 136.92 | 35.49 | 135.35 | -0.01 | -1.16% |
| 30 | 33.99 | 103.02 | 34 | 101.24 | -0.01 | -1.76% |
| 32 | 32.57 | 77.65 | 32.58 | 76.58 | -0.01 | -1.40% |
| 34 | 31.3 | 60.67 | 31.34 | 59.62 | -0.04 | -1.76% |
| 36 | 29.96 | 45.86 | 30.03 | 45.43 | -0.07 | -0.95% |
| 38 | 28.63 | 35.55 | 28.71 | 35.47 | -0.08 | -0.23% |
| 40 | 27.49 | 28.63 | 27.53 | 28.3 | -0.04 | -1.17% |
| Average | | | | | -0.02 | -0.82% |
The complexity is shown in Table 3. The proposed algorithm reduces the computational cost by 58% on average (equivalent to the complexity of performing motion estimation on 1.7 block types instead of 4 block types) with a negligibly small PSNR degradation (0.013 dB) and a slight increase in bit rate (0.57%).
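The summary figures can be cross-checked with a short calculation. The sketch assumes that the reported PSNR and bitrate figures are the means of the two per-sequence averages in Table 3 and that the complexity reduction follows from the ratio of block types; variable names are hypothetical.

```python
# Complexity: motion estimation on 1.7 block types instead of 4.
block_types_full = 4
block_types_proposed = 1.7
reduction = 1 - block_types_proposed / block_types_full
print(f"{reduction:.1%}")  # 57.5%, i.e. about 58%

# Mean of the per-sequence averages in Table 3 (Stefan, Foreman).
avg_psnr_loss_db = (0.00625 + 0.02) / 2
avg_bitrate_increase_pct = (0.32 + 0.82) / 2
print(round(avg_psnr_loss_db, 3))          # 0.013
print(round(avg_bitrate_increase_pct, 2))  # 0.57
```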
4. Conclusions
In this paper, we propose a method to eliminate some redundant coding modes, which speeds up the process of multi-mode selection. The simulation results show that the algorithm can remarkably decrease the complexity at the encoder while maintaining satisfactory coding efficiency.
REFERENCES
[1] “Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG: Draft Text of Final Draft International Standard for Advanced Video Coding,” H.264 | ISO/IEC 14496-10 AVC, ITU-T.
[2] M. Ghanbari, “Standard Codecs: Image Compression to Advanced Video Coding,” IEE Publishing, 2002.
[3] I. E. G. Richardson, “H.264 and MPEG-4 Video Compression,” Wiley, 2003.
[4] F. S. Yan, “Fast Mode Selection Based on Texture Analysis and Local Motion Activity in H.264/AVC,” 2004 International Conference of Communications, Circuits and Systems, Chengdu, Vol. 1, 27-29 June 2004, pp. 539-542.
[5] G. W. Teng, Z. Y. Zhang, Y. J. Zhang and W. J. Zhang, “Fast Mode Decision Algorithm in Inter Pictures Based on H.264/AVC,” Journal of Optoelectronics·Laser, Vol. 16, No. 7, July 2005, pp. 866-870.
[6] “JVT Reference Software JM75C,” http://bs.hhi.de/~suehring/tm
[7] A. Chang, O. C. Au and Y. M. Yeung, “A Novel Approach to Fast Multi-block Motion Estimation for H.264 Video Coding,” Proceedings 2003 International Conference on Multimedia and Expo, Maryland, Vol. 1, 6-9 July 2003, pp. 539-542.
[8] A. C. Yu, “Efficient Block-size Selection Algorithm for Inter-Frame Coding in H.264/MPEG-4 AVC,” 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Vol. 3, 17-21 May 2004, pp. 69-72.
[9] Y. S. Cui, D. G. Duan and Z. L. Deng, “Fast Motion Estimation Algorithm on H.264,” Journal of Liaoning Institute of Technology, Vol. 24, No. 5, October 2004, pp. 12-15.