Wireless Sensor Network, 2010, 2, 328-336
doi:10.4236/wsn.2010.24044 Published Online April 2010 (http://www.SciRP.org/journal/wsn)
Copyright © 2010 SciRes. WSN
Very Low Bit-Rate Video Coding by Combining
H.264/AVC Standard and 2-D Discrete Wavelet Transform
Ali Aghagolzadeh1,2, Saeed Meshgini1, Mehdi Nooshyar1, Mehdi Aghagolzadeh1
1Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
2Iranian Telecommunication Research Center (ITRC), Tehran, Iran
E-mail: aghagol@tabrizu.ac.ir, saeed_meshgini@tabrizu.ac.ir, nooshyar@tabrizu.ac.ir
Received October 25, 2009; revised November 11, 2009; accepted February 16, 2010
Abstract
In this paper, we propose a new method for very low bit-rate video coding that combines the
H.264/AVC standard with the two-dimensional discrete wavelet transform. In this method, a
two-dimensional wavelet transform is first applied to each video frame independently to extract
its low-frequency components, and the low-frequency parts of all frames are then coded using the
H.264/AVC codec. The high-frequency parts of the video frames are coded by the Run Length Coding
algorithm after a threshold is applied to discard the low-valued coefficients. Experiments show
that, at very low bit rates below 16 kbits/s, the proposed method achieves better rate-distortion
performance than applying the H.264/AVC standard directly to all frames. Applications of the
proposed video coding technique include video telephony, video-conferencing, and transmitting or
receiving video over half-rate traffic channels of GSM networks.
Keywords: Video Coding, H.264/AVC Standard, Run Length Coding, Two-Dimensional Wavelet Transform
1. Introduction
The demands for video transmission and delivery over
both high and low bandwidth channels have been accel-
erated. The high bandwidth applications include digital
video by satellite (DVS) and high-definition television
(HDTV). The low bandwidth applications are dominated
by transmission over the Internet, where the majority of
modems work at speeds below 56 kbits/s [1].
On the other hand, representing video material in digital form requires a large number of bits.
The volume of data generated by digitising a video signal is too large for most transmission
systems, which means that compression is essential for most digital video applications. An
efficient and well-designed video compression system gives very significant performance
advantages for visual communication at both low and high transmission bandwidths. At low
bandwidths, compression enables applications that would not otherwise be possible, such as
basic-quality video telephony over a standard telephone connection. At high bandwidths,
compression can support a much higher visual quality. Video compression and video codecs will
therefore remain a vital part of emerging multimedia applications for the foreseeable future,
allowing designers to make the most efficient use of the available transmission capacity. The
development of video coding technology since 1980 has been bound up with a series of
international standards for video compression. Each of these standards supports a particular
application of video coding (or a set of applications), such as videoconferencing and digital
television [2].
H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the
ISO/IEC Moving Picture Experts Group. The goals of this standardization effort were enhanced
compression efficiency and a network-friendly video representation for both interactive (video
telephony) and non-interactive (broadcast, streaming, storage and video on demand) applications
[3]. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to
the previous standards [4]. However, the H.264/AVC standard, like the previous video coding
standards, produces a number of unacceptable artifacts, such as blockiness, when operated at
very low bit rates. Hence, there is a need for new techniques that improve the coding efficiency
and produce video of acceptable quality in very low bit-rate applications.
In this paper, a new video compression method for
very low bit-rate coding is proposed. The main goal of
this paper is enhancing the compression efficiency
(rate-distortion performance) at very low bit-rate appli-
cations (such as video-conferencing and video teleph-
ony). This has been achieved by combining H.264/AVC
standard and two-dimensional discrete wavelet trans-
form.
Experiments show that the H.264/AVC standard, like the other video coding standards, is well
suited to coding the low-frequency components (the general structure) of video frames, but it
has difficulty encoding the details of objects in video streams, such as boundaries and edges.
Since the techniques employed in this standard use only the statistical dependencies in the
video signal at a block level and do not consider the semantic content of the video, at very low
bit rates (high quantization factors) artifacts are introduced at the block boundaries. Usually
these block boundaries do not correspond to the physical boundaries of the moving objects and
hence visually annoying artifacts are introduced [5]. This problem becomes more pronounced when
the objects in a video frame are displaced rapidly, i.e., when fast motion occurs in the video
stream. Depending on the number of quantization levels used in the coding procedure, some
details of an object are eliminated. The fewer the quantization levels, the more details vanish.
Fast and sudden motion in a video stream can also lead to the loss of some important information
over a limited-capacity channel. The underlying idea of this paper is to combat these problems
by extracting the details from a video sequence and then coding them by a scheme other than the
H.264/AVC standard.
This paper is organized as follows. Section 2 gives a brief analytic discussion of the wavelet
transform. The architecture of the proposed video coding system is then presented in Section 3.
In Section 4, the experimental results obtained by the proposed method are compared with those
of the original H.264 codec, and the possible advantages of the proposed method in different
applications are discussed. Conclusions are given in Section 5.
2. Wavelet Transform
Although the Fourier transform has been the mainstay of
transform-based image and video processing since the late
1950s, a more recent transformation, called the wavelet
transform, is now making it even easier to compress,
transmit, and analyze many images and videos. Unlike the
Fourier transform, whose basis functions are sinusoids,
wavelet transforms are based on small waves, called
wavelets, of varying frequency and limited duration.
The goal of the modern wavelet research is to create a
set of basis functions (or general expansion functions)
and transforms that will give an informative, efficient,
and useful description of a function or signal. Another
central idea is that of multiresolution analysis where the
decomposition of a signal is done in terms of the differ-
ent resolutions of details.
Both the mathematics and the practical interpretations of the wavelet transform seem to be best
served by using the concept of resolution to define the effects of changing scales. To do this,
we will start with a scaling function $\varphi(x)$ rather than directly with the wavelet
$\psi(x)$. After the scaling function is defined from the concept of resolution, the wavelet
functions will be derived from it. Good reviews of the wavelet transform are given in [6] and
[7]; the short review and mathematical interpretation that follow are based on them.
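We define a set of scaling functions in terms of integer translates of the basic scaling
function by
$$\varphi_k(x) = \varphi(x - k), \qquad k \in \mathbb{Z}, \quad \varphi \in L^2. \tag{1}$$
The subspace of $L^2$ spanned by these functions is defined as
$$V_0 = \overline{\mathrm{Span}_k \{\varphi_k(x)\}} \tag{2}$$
for all integers $k \in \mathbb{Z}$. This means that
$$f(x) = \sum_k a_k \varphi_k(x) \quad \text{for any } f(x) \in V_0. \tag{3}$$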
One can generally increase the size of the subspace spanned by changing the spatial scale of the
scaling functions. A two-dimensional family of functions is generated from the basic scaling
function by scaling and translation by
$$\varphi_{j,k}(x) = 2^{j/2} \varphi(2^j x - k) \tag{4}$$
whose span over $k$ is
$$V_j = \overline{\mathrm{Span}_k \{\varphi_k(2^j x)\}} = \overline{\mathrm{Span}_k \{\varphi_{j,k}(x)\}} \tag{5}$$
for all integers $k \in \mathbb{Z}$. This means that if $f(x) \in V_j$, then it can be expressed
as
$$f(x) = \sum_k a_k \varphi(2^j x - k). \tag{6}$$
For $j > 0$, the span can be larger since $\varphi_{j,k}(x)$ becomes narrower and is translated
in smaller steps. It can therefore represent finer details. For $j < 0$, $\varphi_{j,k}(x)$ is
wider and is translated in larger steps. So these wider scaling functions can represent only
coarse information, and the size of the space they span is smaller.
In order to follow our intuitive ideas of scale or resolu-
tion, we formulate the basic requirements of multireso-
lution analysis (MRA) by requiring nested spanned
spaces as
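$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2 \tag{7}$$
or
$$V_j \subset V_{j+1} \quad \text{for all } j \in \mathbb{Z} \tag{8}$$
with
$$V_{-\infty} = \{0\}, \qquad V_{\infty} = L^2. \tag{9}$$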
The space that contains high resolution signals also
contains those of lower resolution.
Because of the definition of $V_j$, all spaces have to satisfy a natural scaling condition:
$$f(x) \in V_j \iff f(2x) \in V_{j+1} \tag{10}$$
which ensures that all elements in a space are simply scaled
versions of the elements in the next space. This relationship
of the spanned spaces is illustrated in Figure 1.
The nesting of the spans of $\varphi(2^j x - k)$, denoted by $V_j$ and graphically illustrated
in Figure 1, is achieved by requiring that $\varphi(x) \in V_1$. This means that if
$\varphi(x)$ is in $V_0$, it is also in $V_1$, the space spanned by $\varphi(2x)$. This means
that $\varphi(x)$ can be expressed in terms of a weighted sum of the shifted $\varphi(2x)$ as
$$\varphi(x) = \sum_n h(n) \sqrt{2}\, \varphi(2x - n), \qquad n \in \mathbb{Z} \tag{11}$$
where the coefficients $h(n)$ are a sequence of real or possibly complex numbers called the
scaling function coefficients (or the scaling filter or the scaling vector) and the $\sqrt{2}$
maintains the norm of the scaling function with the scale of two. This recursive equation is
fundamental to the theory of the scaling functions and is referred to by different names, such
as the refinement equation, the multiresolution analysis (MRA) equation, or the dilation
equation.
The Haar scaling function is the simple unit-width, unit-height pulse function $\varphi(x)$
shown in Figure 2(a). It is obvious that $\varphi(2x)$ can be used to construct $\varphi(x)$ by
$$\varphi(x) = \varphi(2x) + \varphi(2x - 1) \tag{12}$$
which means that relation (11) is satisfied for coefficients $h(0) = 1/\sqrt{2}$,
$h(1) = 1/\sqrt{2}$. The fifth-order Daubechies scaling function shown in Figure 2(c) satisfies
relation (11) for $h(0) = 0.1601$, $h(1) = 0.6038$, $h(2) = 0.7243$, $h(3) = 0.1384$, $\ldots$,
$h(9) = 0.0033$. Also, the fifth-order Symlet scaling function shown in Figure 2(e) satisfies
Equation (11) for $h(0) = 0.0195$, $h(1) = -0.0211$, $h(2) = -0.1753$, $h(3) = 0.0166$,
$\ldots$, $h(9) = 0.0273$. Indeed, the design of a wavelet system comes down to choosing the
coefficients $h(n)$.

Figure 1. Nested vector spaces spanned by the scaling functions.

Figure 2. “Haar”, “db5”, and “Sym5” scaling and wavelet functions: (a) Haar scaling function;
(b) Haar wavelet function; (c) db5 scaling function; (d) db5 wavelet function; (e) Sym5 scaling
function; (f) Sym5 wavelet function.
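As a quick numerical illustration of the refinement relation, the following minimal Python sketch evaluates both sides of Equation (11) for the Haar case on a grid of sample points. The function names `phi` and `refinement_rhs` and the choice of sample grid are assumptions made for this example only; the pulse is taken to be 1 on the half-open interval [0, 1).

```python
import math

# Haar scaling coefficients quoted in the text: h(0) = h(1) = 1/sqrt(2).
h = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def phi(x):
    """Haar scaling function: unit-height pulse on [0, 1) (assumed convention)."""
    return 1.0 if 0 <= x < 1 else 0.0


def refinement_rhs(x, coeffs):
    """Right-hand side of Equation (11): sum_n h(n) * sqrt(2) * phi(2x - n)."""
    return sum(hn * math.sqrt(2) * phi(2 * x - n) for n, hn in enumerate(coeffs))


# Compare both sides of the refinement equation from x = -0.5 to x = 1.5.
samples = [i / 16 - 0.5 for i in range(33)]
max_error = max(abs(phi(x) - refinement_rhs(x, h)) for x in samples)
print("largest deviation between the two sides:", max_error)
```

The deviation is limited to floating-point rounding, which is consistent with Equation (12) being the Haar instance of Equation (11).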
VVVV The important features of a signal can better be de-
scribed or parameterized, not by using
,jk
x
and
increasing to increase the size of the subspace
spanned by the scaling functions, but by defining a
slightly different set of functions
j

,jk
x
that span the
differences between the spaces spanned by the various
scales of the scaling function. These functions called the
wavelet functions. There are several advantages for re-
quiring that the scaling and wavelet functions be or-
0
Copyright © 2010 SciRes. WSN
A. AGHAGOLZADEH ET AL.331
thogonal. Orthogonal basis functions allow simple cal-
culation of expansion coefficients and also Parseval’s
theorem holds that allows partitioning of the signal’s
energy in the wavelet transform domain. The orthogonal
complement of in is defined as . This
means that all members of are orthogonal to all
members of . We require
j
V
jk
1j
V
,,jl
xx
,,jkl
j
W
j
V

j
W

0dx

(13)
for all corresponding $j, k, l \in \mathbb{Z}$. The relationship between the various subspaces
can be seen from the following expansions. From (7), we may start at any $V_j$, say at $j = 0$,
and write
$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2. \tag{14}$$
We now define the wavelet spanned subspace $W_0$ such that
$$V_1 = V_0 \oplus W_0 \tag{15}$$
which extends to
$$V_2 = V_0 \oplus W_0 \oplus W_1. \tag{16}$$
In general, this gives
$$L^2 = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \tag{17}$$
when $V_0$ is the initial space spanned by the scaling function $\varphi(x - k)$. Figure 3
pictorially shows the nesting of the scaling function spaces $V_j$ for the different scales $j$
and how the wavelet spaces are the disjoint differences (except for the zero element) or, in
other words, the orthogonal complements.
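The scale of the initial space is arbitrary and could be chosen at a higher resolution of, say,
$j = 10$ to give
$$L^2 = V_{10} \oplus W_{10} \oplus W_{11} \oplus \cdots \tag{18}$$
or at a lower resolution such as $j = -5$ to give
$$L^2 = V_{-5} \oplus W_{-5} \oplus W_{-4} \oplus \cdots \tag{19}$$
or even at $j = -\infty$, where (17) becomes
$$L^2 = \cdots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots \tag{20}$$
eliminating the scaling space altogether.

Figure 3. Scaling and wavelet functions vector spaces.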
Since these wavelets reside in the space spanned by the next narrower scaling function,
$W_0 \subset V_1$, they can be represented by a weighted sum of the shifted scaling function
$\varphi(2x)$ defined in (11) by
$$\psi(x) = \sum_n h_1(n) \sqrt{2}\, \varphi(2x - n), \qquad n \in \mathbb{Z} \tag{21}$$
for some set of coefficients $h_1(n)$. From the requirement that the wavelets span the
difference or orthogonal complement spaces, and the orthogonality of the integer translates of
the wavelet (or scaling function), it can be shown that the wavelet coefficients (modulo
translations by integer multiples of two) are required by orthogonality to be related to the
scaling function coefficients by
$$h_1(n) = (-1)^n h(1 - n). \tag{22}$$
The function generated by (21) gives the prototype or the mother wavelet $\psi(x)$ for a class
of expansion functions of the form
$$\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k) \tag{23}$$
where $2^j$ is the scaling of $x$, $k$ sets the translation in $x$, and $2^{j/2}$ maintains the
$L^2$ norm of the wavelets for the different scales. The Haar wavelet function, which is
associated with the scaling function in Figure 2(a), is shown in Figure 2(b). For the Haar
wavelet, the coefficients in (21) are $h_1(0) = 1/\sqrt{2}$, $h_1(1) = -1/\sqrt{2}$, which
satisfy Equation (22). The Daubechies and Symlet wavelet functions associated with the scaling
functions in Figures 2(c) and 2(e) are shown in Figures 2(d) and 2(f), respectively.
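The relation between the scaling and wavelet coefficients in Equation (22) can be checked for the Haar case with a few lines of Python. This is only a sketch: the helper name `haar_wavelet_coefficients` is hypothetical, the filter is treated as zero outside its two taps, and longer filters such as db5 would need the additional even shift mentioned in the text ("modulo translations by integer multiples of two").

```python
import math


def haar_wavelet_coefficients(h):
    """Apply Equation (22), h1(n) = (-1)^n * h(1 - n), to a two-tap filter."""
    if len(h) != 2:
        raise ValueError("this sketch only covers the two-tap Haar filter")

    def h_at(n):
        # The filter is assumed to be zero outside its support.
        return h[n] if 0 <= n < len(h) else 0.0

    return [(-1) ** n * h_at(1 - n) for n in range(len(h))]


h = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # Haar scaling coefficients
h1 = haar_wavelet_coefficients(h)          # Haar wavelet coefficients

# Orthogonality of the two filters: their inner product should vanish.
inner_product = sum(a * b for a, b in zip(h, h1))
print("h1 =", h1)
print("inner product of h and h1 =", inner_product)
```

The computed pair agrees in sign and magnitude with the Haar wavelet coefficients quoted above.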
We have now constructed a set of functions $\varphi_k(x)$ and $\psi_{j,k}(x)$ that could span
all of $L^2(\mathbb{R})$. According to (17), any function $g(x) \in L^2(\mathbb{R})$ could be
written
$$g(x) = \sum_{k=-\infty}^{\infty} c(k)\, \varphi_k(x) + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} d(j,k)\, \psi_{j,k}(x) \tag{24}$$
as a series expansion in terms of the scaling function and wavelets. In this way, the first
summation in (24) gives a function that is a low resolution or coarse approximation of $g(x)$.
For each increasing index $j$ in the second summation, a higher or finer resolution function is
added, which leads to more details.
The 1-D Discrete Wavelet Transform: Since
$$L^2 = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \cdots \tag{25}$$
by using (4) and (23), a more general statement for the expansion Equation (24) can be given by
$$g(x) = \sum_k c_{j_0}(k)\, 2^{j_0/2} \varphi(2^{j_0} x - k) + \sum_k \sum_{j=j_0}^{\infty} d_j(k)\, 2^{j/2} \psi(2^j x - k) \tag{26}$$
or
$$g(x) = \sum_k c_{j_0}(k)\, \varphi_{j_0,k}(x) + \sum_k \sum_{j=j_0}^{\infty} d_j(k)\, \psi_{j,k}(x) \tag{27}$$
where $j_0$ could be zero as in (17) and (24), it could be ten as in (18), or it could be
negative infinity as in (20), where no scaling functions are used. The choice of $j_0$ sets the
coarsest scale whose space is spanned by $\varphi_{j_0,k}(x)$. The rest of $L^2(\mathbb{R})$ is
spanned by the wavelets, which provide the high resolution details of the signal. The
coefficients in this wavelet expansion are called the one-dimensional discrete wavelet transform
(1-D DWT) of the signal $g(x)$. If the wavelet system is orthogonal, these coefficients can be
calculated by the inner products
$$c_j(k) = \langle g(x), \varphi_{j,k}(x) \rangle = \int g(x)\, \varphi_{j,k}(x)\, dx \tag{28}$$
and
$$d_j(k) = \langle g(x), \psi_{j,k}(x) \rangle = \int g(x)\, \psi_{j,k}(x)\, dx. \tag{29}$$
The DWT is similar to Fourier series but, in many
ways, is much more flexible and informative. It can be
made periodic like Fourier series to represent periodic
signals efficiently. However, unlike Fourier series, it can
be used directly on non-periodic transient signals with
excellent results.
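For sampled signals, one level of the orthogonal Haar DWT reduces to sums and differences of neighbouring samples. The sketch below is a discrete illustration of the inner products in Equations (28) and (29) under that assumption; the function names and the sample signal are invented for the example, and only a single decomposition level is shown.

```python
import math

SQRT2 = math.sqrt(2)


def haar_dwt_1d(signal):
    """One Haar decomposition level: returns (approximation, detail) lists."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]   # scaling coefficients
    detail = [(a - b) / SQRT2 for a, b in pairs]   # wavelet coefficients
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: rebuilds the original samples."""
    out = []
    for c, d in zip(approx, detail):
        out.extend([(c + d) / SQRT2, (c - d) / SQRT2])
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0]   # arbitrary test samples
approx, detail = haar_dwt_1d(signal)
restored = haar_idwt_1d(approx, detail)

# Parseval's theorem: the energy is split between the two coefficient sets.
energy_in = sum(s * s for s in signal)
energy_out = sum(c * c for c in approx) + sum(d * d for d in detail)
print(restored)
print(energy_in, energy_out)
```

Because the transform is orthogonal, the reconstruction matches the input up to rounding and the signal energy is preserved, as noted earlier for orthogonal wavelet systems.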
The 2-D Discrete Wavelet Transform: The one-dimensional transforms of the previous discussion
are easily extended to two-dimensional functions like images. In two dimensions, a
two-dimensional scaling function, $\varphi(x, y)$, and three two-dimensional wavelets,
$\psi^H(x, y)$, $\psi^V(x, y)$, and $\psi^D(x, y)$, are required. Each is the product of a
one-dimensional scaling function $\varphi$ and the corresponding wavelet $\psi$. Excluding
products of functions with the same variable that produce one-dimensional results, like
$\varphi(x)\psi(x)$, the four possible products produce the separable scaling function
$$\varphi(x, y) = \varphi(x)\, \varphi(y) \tag{30}$$
and the separable directionally sensitive wavelets
$$\psi^H(x, y) = \psi(x)\, \varphi(y) \tag{31}$$
$$\psi^V(x, y) = \varphi(x)\, \psi(y) \tag{32}$$
$$\psi^D(x, y) = \psi(x)\, \psi(y). \tag{33}$$
These wavelets measure functional variations (intensity or gray-level variations for images)
along the different directions: $\psi^H$ measures the variations along columns (for example,
horizontal edges), $\psi^V$ responds to the variations along rows (like vertical edges), and
$\psi^D$ corresponds to the diagonal variations. The directional sensitivity is a natural
consequence of the separability imposed by Equations (31) to (33); it does not increase the
computational complexity of the two-dimensional transform.
Given separable two-dimensional scaling and wavelet functions, extension of the one-dimensional
DWT to two dimensions is straightforward. We first define the scaled and translated basis
functions:
$$\varphi_{j,m,n}(x, y) = 2^{j/2} \varphi(2^j x - m,\, 2^j y - n) \tag{34}$$
$$\psi^i_{j,m,n}(x, y) = 2^{j/2} \psi^i(2^j x - m,\, 2^j y - n), \qquad i = \{H, V, D\}. \tag{35}$$
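The discrete wavelet transform of a function $g(x, y)$ of size $M \times N$ is then
$$c_{j_0}(m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} g(x, y)\, \varphi_{j_0,m,n}(x, y) \tag{36}$$
$$d^i_j(m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} g(x, y)\, \psi^i_{j,m,n}(x, y), \qquad i = \{H, V, D\}. \tag{37}$$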
As in the one-dimensional case, $j_0$ is an arbitrary starting scale and the $c_{j_0}(m, n)$
coefficients define an approximation of $g(x, y)$ at scale $j_0$. The $d^i_j(m, n)$ coefficients
add horizontal, vertical, and diagonal details for scales $j \geq j_0$. We normally let
$j_0 = 0$ and select $N = M = 2^J$ so that $j = 0, 1, 2, \ldots, J-1$ and
$m, n = 0, 1, 2, \ldots, 2^j - 1$. Given the $c_{j_0}(m, n)$ and $d^i_j(m, n)$ of Equations (36)
and (37), $g(x, y)$ is reconstructed via the inverse discrete wavelet transform
$$g(x, y) = \frac{1}{\sqrt{MN}} \sum_m \sum_n c_{j_0}(m, n)\, \varphi_{j_0,m,n}(x, y) + \frac{1}{\sqrt{MN}} \sum_{i=H,V,D} \sum_{j=j_0}^{\infty} \sum_m \sum_n d^i_j(m, n)\, \psi^i_{j,m,n}(x, y). \tag{38}$$
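To make the two-dimensional transform and its inverse concrete, here is a minimal pure-Python sketch of a single-level 2-D Haar decomposition on 2 × 2 pixel blocks. It is an illustration only, not the authors' implementation: the function names are hypothetical, frames are plain lists of rows with even dimensions, and the band naming (LH for differences between horizontal neighbours, HL for differences between vertical neighbours) is an assumption chosen to match the layout of Figure 5(c).

```python
def haar_dwt_2d(frame):
    """One-level 2-D Haar transform; returns the bands (LL, LH, HL, HH)."""
    rows, cols = len(frame), len(frame[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = frame[r][c], frame[r][c + 1]
            d, e = frame[r + 1][c], frame[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)   # local average (low-pass)
            lh_row.append((a - b + d - e) / 2)   # horizontal difference
            hl_row.append((a + b - d - e) / 2)   # vertical difference
            hh_row.append((a - b - d + e) / 2)   # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt_2d(ll, lh, hl, hh):
    """Inverse of haar_dwt_2d: rebuilds the full-size frame."""
    frame = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(s + h + v + d) / 2, (s - h + v - d) / 2])
            bottom.extend([(s + h - v - d) / 2, (s - h - v + d) / 2])
        frame.extend([top, bottom])
    return frame


# A 4 x 4 test frame with a vertical edge between the first two columns.
frame = [[10, 50, 50, 50] for _ in range(4)]
ll, lh, hl, hh = haar_dwt_2d(frame)
restored = haar_idwt_2d(ll, lh, hl, hh)
print(ll, lh, hl, hh)
```

In this toy frame only the LH band is non-zero, and only where the edge lies, which illustrates why the detail bands of smooth regions are dominated by zeros.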
In the next section, we will apply the 2-D discrete wavelet transform to the frames of a video
sequence independently to extract the low-frequency and high-frequency components of each video
frame.
3. Proposed Video Coding System
As mentioned before, the main idea of this paper is to decompose a given video stream into two
separate parts: one part contains the low-frequency components (information about the main
structures and the background of the video frames), and the other contains the high-frequency
components (information about edges, borders, and details of the video frames). This
decomposition of the input video stream is accomplished through the two-dimensional discrete
wavelet transform.
As shown in the previous section, several well-known families of wavelets can be used in image
processing tasks, such as Haar wavelets, Daubechies wavelets and Symlets (short for symmetrical
wavelets). Among these families, the Haar wavelet transform is the simplest and has very low
complexity; for this reason it is used in many signal and image processing applications. Hence,
our proposed method uses the two-dimensional Haar wavelet by default. To check that the
technique generalizes to other types of wavelets, we have also tested the proposed scheme with
the fifth-order two-dimensional Daubechies wavelet and the fifth-order two-dimensional Symlet.
The results are given in Section 4.
Since the H.264 codec is better suited to coding the main structures of objects and the
low-frequency components of a video sequence, the proposed method uses the two-dimensional
wavelet transform to extract the low-frequency components from the video sequence and encodes
them with the H.264 codec. The visual quality of these components depends directly on the
quantization factor and the other parameters of the H.264 video codec. In our proposed method,
the low-frequency part of each frame has much smaller dimensions than the original frame.
Quantizing these parts of the video with more bits and using efficient types of motion
estimation for motion compensation will increase the quality of the reconstructed video.
The remaining parts of the frames in the video stream, which are the high-frequency components,
should be encoded in a different way. The decomposition process produces a large number of very
small coefficients, which can be neglected by setting them to zero through a thresholding
procedure. As a result, zero becomes by far the most frequent symbol in the high-frequency
bands. When a specific symbol is repeated very frequently in a sequence, Run Length Coding (RLC)
is an efficient source coding procedure. For each run of repeated zeros, a single “zero” symbol
is encoded, followed by the number of repetitions. The more often the symbol “zero” is repeated,
the more the sequence is compressed [8]. By applying a proper threshold value, a sufficient
number of zeros is produced, so the compression rate is increased. This hard threshold value
($T$) is simply applied to each transform coefficient value ($P_{i,j}$) of the high-frequency
bands by the following decision equation:
$$P_{i,j} = \begin{cases} P_{i,j}, & |P_{i,j}| > T \\ 0, & \text{otherwise} \end{cases} \tag{39}$$
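The decision rule of Equation (39) translates almost directly into code. The following Python sketch assumes that a band is stored as a list of rows and that a coefficient is kept only when its magnitude strictly exceeds the threshold; the function name and the sample values are illustrative.

```python
def hard_threshold(band, t):
    """Equation (39): keep coefficients with |P| > t, set the rest to zero."""
    return [[p if abs(p) > t else 0 for p in row] for row in band]


# Hypothetical detail-band coefficients (positive and negative values).
band = [[0.2, -5.0, 0.9],
        [3.0, -0.4, -1.0]]
thresholded = hard_threshold(band, 1.0)
print(thresholded)
```

Coefficients whose magnitude equals the threshold are set to zero under this convention.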
Figure 4 shows the block diagram of the overall proposed system. First, the two-dimensional
discrete wavelet transform is applied to the video source. The low-frequency part is then
encoded by the H.264 codec, and the remaining parts, which mostly carry information about the
edges and borders of the video objects, are encoded using the RLC algorithm.
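The run-length step for the thresholded detail bands can be sketched as follows. The paper does not specify the bitstream format, so the token layout used here (non-zero values emitted as they are, each run of zeros emitted as a `(0, run_length)` pair) is purely an assumption for illustration.

```python
def rle_encode(values):
    """Encode a flat coefficient list; zero runs become (0, run_length)."""
    tokens, run = [], 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            tokens.append((0, run))
            run = 0
        tokens.append(v)
    if run:
        tokens.append((0, run))
    return tokens


def rle_decode(tokens):
    """Invert rle_encode and recover the original coefficient list."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            out.extend([0] * t[1])
        else:
            out.append(t)
    return out


coefficients = [0, 0, 0, 7, 0, 0, -3, 0, 0, 0, 0]   # hypothetical scan of a band
tokens = rle_encode(coefficients)
print(tokens)
```

Eleven coefficients are represented here by five tokens; the longer the zero runs produced by thresholding, the larger this saving becomes.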
The two-dimensional wavelet transform is applied to each frame of a given video sequence
independently. Since the video is in QCIF format, each frame contains a luminance (Y) layer and
two chrominance (Cb and Cr) layers; therefore the two-dimensional wavelet transform is applied
three times for each frame. By collecting the LL bands of the luminance and chrominance layers
of each frame and combining them into a sequence of frames, a new video sequence is generated
with much smaller dimensions and the same structure as the original video sequence.
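The size of the reduced video sequence follows from halving each layer in both directions. The short Python sketch below works this out for the QCIF luminance size of 176 × 144 pixels quoted in Section 4; the chrominance size of 88 × 72 assumes 4:2:0 subsampling, which the paper does not state explicitly, and the helper name is hypothetical.

```python
def ll_band_size(width, height, levels=1):
    """Size of the LL band after `levels` dyadic wavelet decompositions."""
    for _ in range(levels):
        if width % 2 or height % 2:
            raise ValueError("dimensions must be even at every level")
        width, height = width // 2, height // 2
    return width, height


# Luminance size from the text; chrominance sizes assume 4:2:0 subsampling.
layers = {"Y": (176, 144), "Cb": (88, 72), "Cr": (88, 72)}
ll_sizes = {name: ll_band_size(w, h) for name, (w, h) in layers.items()}
for name, size in ll_sizes.items():
    print(name, size)
```

Under these assumptions the sequence passed to the H.264 encoder has an 88 × 72 luminance layer, matching the frame size reported in the experiments.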
Figure 4. Block diagram of the proposed system.

Figure 5 shows an example of the two-dimensional wavelet transform. Figure 5(a) is a frame of
the “Suzie” video sequence. Applying a two-dimensional Haar wavelet transform to it gives Figure
5(b). Finally, Figure 5(c) indicates the LL, LH, HL, and HH bands corresponding to Figure 5(b).
Since the LH, HL, and HH bands show the disparity between neighboring pixels in the horizontal,
vertical and oblique directions, respectively, these bands represent the edges and borders in a
video frame. Therefore the regions of the frame that do not contain edges or borders produce
zero or near-zero values in these bands. Applying the hard threshold further increases the
number of “zero” symbols. By increasing the threshold value, more “zero” symbols are produced
and the compression rate is increased; therefore fewer bits are needed for encoding by the RLC
algorithm. In other words, the number of bits used to represent the high-frequency components of
a frame is negligible compared to the number of bits produced by the H.264 encoder to represent
the low-frequency components of that frame [9].

Figure 5. (a) a video frame; (b) two-dimensional Haar wavelet transform of the frame; (c) the
corresponding bands (LL top left, LH top right, HL bottom left, HH bottom right).
4. Experimental Results
In this section, the results of the proposed method are compared with the results of the H.264
default mode. First, we need to choose a proper threshold value. A suitable value for the
threshold can be chosen by cross-validation. The proposed method is applied to some well-known
test video sequences such as “Suzie” and “foreman”. Experiments on these video sequences show
that the best rate-distortion performance is achieved when the hard threshold value is selected
so that about 95 percent of the coefficients in the high-frequency bands are set to “zero”. Note
that, to achieve equal compression rates for the LH, HL, and HH bands and for the different
layers of the input video (luminance and chrominance layers), different threshold values must be
applied to the different bands, since the threshold value required for the HH band is lower than
that required for the LH and HL bands. In Figure 6, the hard threshold value is chosen so that
about 95 percent of the coefficients in every band except the LL band are “zero”; therefore an
equivalent compression is achieved for all three bands. The coefficients produced by the
two-dimensional wavelet transform for the LH, HL, and HH bands can be either positive or
negative; therefore the threshold in decision Equation (39) is applied to their absolute values.
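One simple way to obtain a per-band threshold that zeroes a target fraction of the coefficients is to take the corresponding order statistic of the coefficient magnitudes. The sketch below illustrates this idea; it is an assumption about how such a threshold could be found, not the cross-validation procedure used in the paper, and all names and the synthetic band are hypothetical.

```python
def threshold_for_zero_fraction(band, fraction=0.95):
    """Smallest T such that at least `fraction` of the band satisfies |P| <= T."""
    magnitudes = sorted(abs(p) for row in band for p in row)
    if not magnitudes:
        raise ValueError("band is empty")
    n_zero = round(fraction * len(magnitudes))   # coefficients to be zeroed
    return magnitudes[n_zero - 1] if n_zero > 0 else 0.0


def hard_threshold(band, t):
    """Equation (39): keep coefficients with |P| > t, set the rest to zero."""
    return [[p if abs(p) > t else 0 for p in row] for row in band]


# Synthetic 10 x 10 band with magnitudes 1..100 and alternating signs.
band = [[(-1) ** (r + c) * (10 * r + c + 1) for c in range(10)] for r in range(10)]
t = threshold_for_zero_fraction(band, 0.95)
zeros = sum(p == 0 for row in hard_threshold(band, t) for p in row)
print("threshold:", t, "zeroed coefficients:", zeros)
```

Because each band has its own magnitude distribution, running this separately on the LH, HL, and HH bands of each layer yields different thresholds for the same target fraction, which is consistent with the observation above.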
The rate-distortion plots of the proposed method and the H.264 default mode are compared in
Figure 7 for the “Suzie” video sequence. A rate-distortion plot presents the PSNR over the
different bit rates. PSNR for the default mode is computed by comparing the output video of the
H.264 decoder with the original (input) video pixel-wise, where the dimensions of each frame are
176 × 144 pixels. The proposed method uses the two-dimensional Haar wavelet transform; therefore
the dimensions of each input frame to the H.264 encoder are 88 × 72 pixels. Hence, each frame
coded by H.264 in the proposed method contains one quarter as many pixels as in the original
H.264 mode, resulting in a very large compression rate; nevertheless, PSNR is quite comparable
at very low bit rates.

Figure 6. The hard threshold values are chosen so that about 95 percent of the coefficients in
the high-frequency bands are set to “zero”. Panels: LL (H.264), LH (95%), HL (95%), HH (95%).
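The pixel-wise PSNR comparison can be sketched as follows. The paper does not spell out its PSNR formula, so this example assumes the common definition 10·log10(peak² / MSE) with a peak value of 255 for 8-bit samples; the function name and the test frames are illustrative.

```python
import math


def psnr(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    squared_error, count = 0.0, 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames are empty")
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Hypothetical frames: every decoded pixel is off by 10 grey levels.
original = [[100] * 8 for _ in range(8)]
decoded = [[110] * 8 for _ in range(8)]
value = psnr(original, decoded)
print(round(value, 2), "dB")
```

For a video sequence, a per-frame value of this kind would typically be averaged over all frames before being plotted against the bit rate.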
In order to test the performance of the proposed method with other types of wavelets, we also
test our technique with the fifth-order Daubechies wavelet and the fifth-order Symlet. The
rate-distortion plots of the proposed method with these wavelets are compared with the default
Haar wavelet and the original H.264 mode in Figure 8. As the figure shows, the performance of
our proposed system with these families of wavelets is comparable to that with the Haar wavelet.
This implies that the proposed method can easily be generalized to other suitable types of
wavelets.
Figure 7. Comparison between rate-distortion plots of the original H.264 and the proposed method
by “Haar” wavelet for “Suzie” video sequence.

Figure 8. Comparison among rate-distortion plots of the original H.264 and the proposed method
by “Haar”, “db5”, and “Sym5” wavelets for “Suzie” video sequence.

In order to compare the visual quality of the decoded videos subjectively, we also show a sample
frame of the reconstructed videos for both the original H.264 codec and our proposed system in
Figure 9. As can be seen in Figure 9, the visual quality of the decoded video frame for the
proposed scheme (right-side pictures) is much better than that of the decoded frame for the
original H.264 method (left-side pictures). Although the proposed method cannot achieve good
results at high bit rates, it shows superior results at very low bit rates.

Figure 9. Subjective comparison between the visual qualities of the decoded videos for a sample
frame of “Suzie” video sequence: (a) the original input frame; (b), (d), (f) the outputs of the
H.264 decoder at 10 kbps (PSNR = 27.8), 11 kbps (PSNR = 28.2), and 13 kbps (PSNR = 28.9),
respectively; (c), (e), (g) the outputs of the proposed decoder at 10 kbps (PSNR = 28.7),
11 kbps (PSNR = 29), and 13 kbps (PSNR = 29.7), respectively.
The main advantages of the proposed method are summarized as follows:
Advantage 1: For bit rates between 4 and 16 kb/s (very low bit rates), the PSNR of the proposed
method is higher than the PSNR of the H.264 default mode. Since important detail information is
lost when quantizing with high quantization factors, the proposed method avoids losing this
information by separating it from the original video and coding it with the RLC algorithm. This
property is particularly useful in applications where very low bit rates are required for video
communication (such as videoconferencing and video telephony).
Advantage 2: Compared with the H.264 default mode, the proposed method can achieve good
performance at much lower bit rates. Therefore the proposed method can be used for sending video
over very low capacity channels such as home dial-up connections. There is another case of very
low capacity channels in which our proposed video coding system can be used effectively. In GSM
(Global System for Mobile communication) networks, speech or other data are communicated between
the BTS (Base Transceiver Station) and the MS (Mobile Station) mostly over a half-rate traffic
channel at a rate of 11.4 kbits/s. To transmit or receive a video sequence over this very low
capacity channel, it is better to use the proposed video coding scheme, since at such a bit rate
(11.4 kbits/s) it provides a much more acceptable basic-quality video than the original H.264
codec, as can be seen in Figure 9.
Advantage 3: The most challenging problem of the H.264/AVC standard is its high computational
complexity, which has limited its usage in real-life applications. The computational complexity
of the H.264/AVC standard is directly related to the dimensions of the frames in the video
sequences. Therefore reducing the spatial resolution to a quarter of the original resolution
reduces the computational complexity dramatically. Since the computational complexity of the
wavelet transform is almost negligible in comparison to that of the H.264 codec, the proposed
method is much faster than using the H.264 codec alone. This helps to make the H.264/AVC
standard more compatible with newly emerging applications.
5. Conclusions
In this paper we described a novel video compression approach that combines the H.264/AVC
standard and the two-dimensional discrete wavelet transform. The main goal of our proposed
method is to enhance the performance of the H.264/AVC standard so that it is more reliable for
very low bit-rate applications. To do this, the video information is decomposed into two parts,
the low-frequency components and the high-frequency components, which contain information about
the objects' main structures and edges, respectively. To perform this decomposition, the
two-dimensional discrete wavelet transform is applied to the sequence of frames. Then the
low-frequency parts of all frames are encoded by the H.264/AVC standard, while the
high-frequency parts of the frames are encoded using the RLC algorithm. As revealed by
experiments, the main advantage of the proposed method compared to the H.264 default mode is
that it requires a lower bit rate for the same value of PSNR at very low bit rates. We also
showed that the proposed method is computationally more efficient than the ordinary H.264/AVC
standard.
6. Acknowledgement
This research has been supported by the Iran Telecommunication Research Center, Tehran, Iran,
whose support is gratefully acknowledged.
7. References
[1] B. J. Kim, Z. Xiong and W. A. Pearlman, “Low Bit-Rate Scalable Video Coding with 3-D Set Partitioning in Hierarchical Trees (3-D SPIHT),” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 8, 2000, pp. 1374-1386.
[2] I. E. G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems,” John Wiley & Sons, 2002.
[3] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer and T. Wedi, “Video Coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, Vol. 4, No. 1, 2004, pp. 7-28.
[4] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, 2003, pp. 560-576.
[5] R. Talluri, K. Oehler, T. Bannon, J. D. Courtney, A. Das and J. Liao, “A Robust, Scalable, Object-Based Video Compression Technique for Very Low Bit-Rate Coding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 1, 1997, pp. 221-232.
[6] C. S. Burrus, R. A. Gopinath and H. Guo, “Introduction to Wavelets and Wavelet Transforms: A Primer,” Prentice Hall, 1998.
[7] R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” 2nd Edition, Prentice Hall, 2002.
[8] D. Salomon, “Data Compression: The Complete Reference,” 4th Edition, Springer, Berlin, 2007.
[9] A. Aghagolzadeh, S. Meshgini, M. Nooshyar and M. Aghagolzadeh, “A Novel Video Compression Technique for Very Low Bit-Rate Coding by Combining H.264/AVC Standard and 2-D Wavelet Transform,” Proceedings of 9th International Conference on Signal Processing, Beijing, 2008, pp. 1251-1254.