Int. J. Communications, Network and System Sciences, 2010, 3, 364-368
doi:10.4236/ijcns.2010.34046 Published Online April 2010 (
Copyright © 2010 SciRes. IJCNS
Team Spirit Model Using MPEG Standards for Video
Neetu Singh, Piyush Chauhan, Nitin Rakesh, Nitin Nitin
Department of CSE & IT, Jaypee University of Information Technology, Waknaghat, India
Received December 10, 2009; revised January 18, 2010; accepted February 21, 2010
Today the multimedia content delivery chain faces many challenges: increasing terminal diversity, network heterogeneity and pressure to satisfy user preferences. Customized content is therefore needed to provide users with the best possible experience. In this paper, we address the problem of multimedia customization. For the customized content, we suggest the team spirit model (TSM), which uses the Web edition content description, MPEG-4 standards and the MPEG-21 multimedia framework. It efficiently implements video customization.
Keywords: TSM, MPEG-4, MPEG-21
1. Introduction
Users are already swamped by the quantity of material available on conventional mass media, and today an even larger range of multimedia material is on hand. This offers customers so much choice that they can be overwhelmed by it. At the same time, they expect services that grant them access to this information, an expectation driven by the increased accessibility of technologies on a broad level. This creates the need to customize those services with regard to the user’s receiving device, which could be anything from a handheld to a widescreen terminal. Otherwise, the provided information could be inadequate: if the lowest standard is applied to a user’s widescreen device, the information is under-represented, while a handheld device served at the maximum standard might not be able to properly view or use the service at all [1]. That is why optimization and adaptation are needed, so that the provided service matches the device and maintains or even enhances user-friendliness. It is the only way to cope with the technical convergence of devices that provide access to similar services. In this paper, we probe into the dynamic components involved, which should take user characteristics into account and assure an efficient way to “access any information from any terminal”, possibly after some kind of adaptation.
In this paper, we present the team spirit model, which takes three inputs: Web edition content, which stores all content information; MPEG-4 Advanced Video Coding (AVC) and the MPEG-4 file format for audio and video; and the MPEG-21 usage environment description, which supplies the other descriptions needed to implement adaptation efficiently and finally achieve customization in a well-organized way. In TSM scenarios, it is essential that the desired content can be customized easily and efficiently. To this end, descriptions are available for the two parts that have to be matched or bridged: the content and the usage environment. The major objective of this paper is to probe into the use and development of a TSM system capable of managing and customizing content so that any information can be delivered to different terminals and networks.
2. Related Work
Video customization is a demand of today’s world. It can be achieved efficiently using various models such as the hybrid multimedia access (HMA) model and the Universal Multimedia Access (UMA) model, which use the MPEG-7 and MPEG-21 standards [1-4]. In this paper, we focus on Team Spirit as an adaptation engine to implement video customization using advanced technology based on MPEG standards. The major objective of this paper is to discuss the role of the various MPEG standards in the context of multimedia customization and adaptation and to contribute to a better organization and understanding of multimedia customization for
3. MPEG Standards Used
MPEG-4 enables different software and hardware developers to create multimedia objects possessing better abilities of adaptability and flexibility to improve the quality
of such services and technologies as digital television,
graphics, the World Wide Web and their extensions. This
standard enables developers to better control their content
and to fight more effectively against copyright violation.
Data network providers can use MPEG-4 for data trans-
parency. With the help of standard procedures, MPEG-4
data can be interpreted and transformed into other signal
types compatible with any available network.
3.1. MPEG-4 (Part-14) File Format
The MP4 file format is intended to contain the informa-
tion regarding the media of an MPEG-4 presentation in a
flexible, extensible format that in turn facilitates inter-
change, management, editing, and presentation of the
media. The presentation may be viewed as ‘local’ to the system containing the presentation, or may be delivered via a network or other stream delivery mechanism. The file format is designed to be independent of any particular delivery protocol while enabling efficient support for delivery in general. Figure 1 gives an example of a simple interchange file, containing three streams.
The MP4 file format is composed of object-oriented structures called ‘atoms’. Each atom is identified by a unique tag and a length. Most atoms describe a hierarchy of metadata giving information such as index points, durations, and pointers to the media data. The collection of these atoms is called the ‘movie atom’. The media data itself is located elsewhere; it can be in the MP4 file, contained in one or more ‘mdat’ (media data) atoms, or located outside the MP4 file and referenced via URLs [5].
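The tag-and-length layout of atoms described above can be illustrated with a short sketch. It assumes the common atom header layout of a 32-bit big-endian length followed by a four-character tag, where a length of 1 signals a following 64-bit length and a length of 0 means the atom runs to the end of its container. The helper names `iter_atoms` and `make_atom` are hypothetical and chosen only for this illustration; this is not a complete MP4 reader.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_atoms(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (tag, payload_start, payload_end) for each atom in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Header: 32-bit big-endian length, then a four-character tag.
        size, tag = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A length of 1 means a 64-bit length follows the tag.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A length of 0 means the atom extends to the end of the container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"corrupt atom at offset {offset}")
        yield tag.decode("latin-1"), offset + header, offset + size
        offset += size


def make_atom(tag: bytes, payload: bytes) -> bytes:
    """Build a minimal atom (hypothetical helper used to create sample data)."""
    return struct.pack(">I4s", 8 + len(payload), tag) + payload


if __name__ == "__main__":
    sample = make_atom(b"moov", make_atom(b"trak", b"")) + make_atom(b"mdat", b"\x00" * 4)
    for tag, start, stop in iter_atoms(sample):
        print(tag, stop - start)
```

Because container atoms simply hold further atoms in their payload, the same function can be called again on a payload range to walk the metadata hierarchy.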
3.2. MPEG-4 Video Image Coding Scheme
Figure 1. Example of a simple interchange file. [Figure labels: Mp4 file; Trak (video); IOD; Mdat (time-ordered BIFS, OD, video and audio access units).]
Figure 2. Basic block diagram of MPEG-4 video coder. [Figure labels: DCT, Q, Motion.]
Figure 2 outlines the MPEG-4 video algorithms used to encode rectangular as well as arbitrarily shaped input image sequences.
The basic coding structure involves shape coding (for arbitrarily shaped VOs) and motion compensation as well as DCT-based texture coding (using standard 8 × 8 DCT or shape-adaptive DCT).
An important advantage of the content-based coding approach of MPEG-4 is that the compression efficiency can be significantly improved for some video sequences by using appropriate and dedicated object-based motion prediction “tools” for each object in a scene.
A number of motion prediction techniques can be used to allow efficient coding and flexible presentation of the objects [5]:
Standard 8 × 8 or 16 × 16 pixel block-based motion estimation and compensation, with up to ¼ pel accuracy.
Global Motion Compensation (GMC) for video objects: encoding of the global motion for an object using a small number of parameters. GMC is based on global motion estimation, image warping, motion trajectory coding and texture coding for prediction errors.
Global motion compensation for static “sprites”: a static sprite is a possibly large still image describing a panoramic background. For each consecutive image in a sequence, only 8 global motion parameters describing camera motion are coded to reconstruct the object. These parameters represent the appropriate affine transform of the sprite transmitted in the first frame.
Quarter-pel motion compensation: this enhances the precision of the motion compensation scheme, at the cost of only a small syntactical and computational overhead. An accurate motion description leads to a smaller prediction error and, hence, to better visual quality.
Shape-adaptive DCT: in the area of texture coding, the shape-adaptive DCT (SA-DCT) improves the coding efficiency of arbitrarily shaped objects. The SA-DCT algorithm is based on predefined orthonormal sets of one-dimensional DCT basis functions.
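As an illustration of the first technique in the list above, the following sketch performs integer-pel block-based motion estimation by exhaustive search with a sum-of-absolute-differences (SAD) cost. It is a simplified teaching example under stated assumptions: frames are plain lists of luminance rows, the search range is tiny, and the sub-pel (½ and ¼ pel) refinement mentioned in the text is not implemented. The function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luminance samples


def sad(ref: Frame, cur: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def best_motion_vector(ref: Frame, cur: Frame, bx: int, by: int,
                       n: int = 8, search: int = 2) -> Tuple[Tuple[int, int], int]:
    """Full search over [-search, search]^2; returns ((dx, dy), cost).

    The vector points from the block position in the current frame to the
    best-matching block position in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


if __name__ == "__main__":
    ref = [[x * 7 + y * 13 for x in range(16)] for y in range(16)]
    # Current frame: reference content displaced by 2 pixels horizontally, 1 vertically.
    cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
    print(best_motion_vector(ref, cur, 4, 4))
```

A practical encoder would replace the exhaustive search with a faster strategy and refine the result on interpolated sub-pel positions; the sketch only shows the matching criterion.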
Figure 3 depicts the basic concept for coding an MPEG-4 video sequence using a sprite panorama image. It is assumed that the foreground object (tennis player, image top right) can be segmented from the background and that the sprite panorama image can be extracted from the sequence prior to coding. (A sprite panorama is a still image that describes the content of the background over all frames in the sequence.) The large panorama sprite image is transmitted to the receiver only once, as the first frame of the sequence, to describe the background; the sprite is then stored in a sprite buffer. In each consecutive frame, only the camera parameters relevant for the background are transmitted to the receiver. This allows the receiver to reconstruct the background image for each frame in the sequence based on the sprite [5]. The moving foreground object is transmitted separately as an arbitrary-shape video object. The receiver composes the foreground and background images to reconstruct each frame (bottom picture in Figure 3). For low-delay applications, it is possible to transmit the sprite in multiple smaller pieces over consecutive frames or to build up the sprite at the decoder progressively.
Subjective evaluation tests within MPEG have shown that the combination of these techniques can result in a bit stream saving of up to 50% compared with version 1, depending on content type and data rate.
Figure 3. Sprite coding of video sequence.
3.3. MPEG-4 (Part-10) AVC/H.264
The intent of the H.264/AVC project was to create a
standard capable of providing good video quality at sub-
stantially lower bit rates than previous standards without
increasing the complexity of design so much that it
would be impractical or excessively expensive to imple-
ment. An additional goal was to provide enough flexibil-
ity to allow the standard to be applied to a wide variety
of applications on a wide variety of networks and systems, including low and high bit rates, low- and high-resolution video, and packet networks. The H.264 standard is a “family of standards”, the members of which are the profiles described below.
A specific decoder decodes at least one, but not necessarily all, profiles. The decoder specification describes which of the profiles can be decoded. Scalable video coding as specified in H.264/AVC allows the construction of bitstreams that contain sub-bitstreams that conform to H.264/AVC. For temporal bitstream scalability, i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the bitstream, complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter prediction reference pictures in the bitstream are constructed accordingly. For spatial and quality bitstream scalability, i.e., the presence of a sub-bitstream with lower spatial resolution or quality than the bitstream, NAL (Network Abstraction Layer) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial resolution or quality signal by data of the lower spatial resolution or quality signal, is typically used for efficient coding.
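The temporal scalability case can be sketched as a simple filter: complete access units above a chosen temporal layer are dropped, leaving a sub-bitstream with a lower frame rate. The `AccessUnit` class, its `temporal_id` field and the dyadic layer assignment below are assumptions made for this illustration; they stand in for whatever layer identifier a real bitstream carries, and no actual H.264 parsing is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessUnit:
    """Hypothetical stand-in for one coded picture with a temporal layer label."""
    frame_index: int
    temporal_id: int  # 0 is the base layer


def dyadic_temporal_id(frame_index: int, layers: int = 3) -> int:
    """Assign layers in a dyadic hierarchy, e.g. 0,2,1,2,0,2,1,2 for three layers."""
    gop = 1 << (layers - 1)
    pos = frame_index % gop
    if pos == 0:
        return 0
    tid = layers - 1
    while pos % 2 == 0:
        pos //= 2
        tid -= 1
    return tid


def extract_temporal_substream(stream: List[AccessUnit],
                               max_temporal_id: int) -> List[AccessUnit]:
    """Keep only access units at or below the requested temporal layer."""
    return [au for au in stream if au.temporal_id <= max_temporal_id]


if __name__ == "__main__":
    stream = [AccessUnit(i, dyadic_temporal_id(i)) for i in range(8)]
    for level in range(3):
        kept = extract_temporal_substream(stream, level)
        print(level, [au.frame_index for au in kept])
```

Each step down in `max_temporal_id` halves the frame rate in this dyadic arrangement, which is why a sub-bitstream stays decodable only if lower layers never reference higher ones.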
3.4. MPEG-21 Multimedia Framework
The MPEG-21 framework is based on two essential concepts:
the definition of a fundamental unit of distribution and
transaction (the Digital Item) and the concept of Users
interacting with Digital Items. The Digital Items can be
considered the “what” of the Multimedia Framework
(e.g., a video collection, a music album) and the Users
can be considered the “who” of the Multimedia Frame-
work. In practice, a Digital Item is a combination of re-
sources, metadata, and structure. The resources are the
individual assets or (distributed) resources. The metadata
comprises informational data about or pertaining to the
Digital Item as a whole or to the individual resources
included in the Digital Item.
Finally, the structure relates to the relationships among
the parts of the Digital Item, both resources and metadata.
Within MPEG-21, a resource is defined as an individually identifiable asset such as a video or audio clip, an image, or a textual asset. Hence, terms like multimedia content, media data, image, graphics, video, movie, visual content, audio data, speech content, etc. become unnecessary and should be avoided in the context of MPEG-21 when referring to resources with no specific context. Within any system (such as MPEG-21) that proposes to facilitate a wide range of actions involving “Digital Items”, there is a need for a very precise description for defining exactly what constitutes such an “item”.
Clearly, there are many kinds of content, and probably
just as many possible ways of describing it to reflect its
context of use. This presents a strong challenge to lay out
a powerful and flexible model for Digital Items, which
can accommodate the myriad forms that content can take
(and the new forms it will assume in the future). Such a
model is only truly useful if it yields a format that can be
used to represent any Digital Items defined within the
model unambiguously and communicate them, and in-
formation about them, successfully. The Digital Item
Declaration specification (part 2 of ISO/IEC 21000) provides such flexibility for representing Digital Items. Part 7 of the MPEG-21 standard, Digital Item Adaptation (DIA), was created mainly to address the usage environment description problem [6].
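The notion of a Digital Item as a combination of resources, metadata and structure can be made concrete with a small data-structure sketch. The class and field names below are hypothetical and do not follow the Digital Item Declaration schema; they only mirror the three ingredients named in the text, with nesting used to express structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """An individually identifiable asset (video clip, audio clip, image, text)."""
    identifier: str
    kind: str
    location: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DigitalItem:
    """Resources plus metadata; nested items express the structure."""
    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)
    resources: List[Resource] = field(default_factory=list)
    children: List["DigitalItem"] = field(default_factory=list)

    def all_resources(self) -> List[Resource]:
        """Collect resources of this item and of all nested items, depth first."""
        found = list(self.resources)
        for child in self.children:
            found.extend(child.all_resources())
        return found


if __name__ == "__main__":
    album = DigitalItem(
        identifier="album-1",
        metadata={"title": "Example album"},
        resources=[Resource("cover", "image", "cover.jpg")],
        children=[
            DigitalItem("track-1", {"title": "First"},
                        [Resource("t1", "audio", "track1.mp4")]),
            DigitalItem("track-2", {"title": "Second"},
                        [Resource("t2", "audio", "track2.mp4")]),
        ],
    )
    print([r.identifier for r in album.all_resources()])
```

The music album used here corresponds to the example of a Digital Item given in the text; the file names are placeholders.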
3.5. Web Edition Content Description
Web edition is a database-driven web content management system that stores all “content information”, including text, images, graphics, flash movies and style sheets, in a central database. The advantage of this system is that the page itself does not have to be edited if the content has to be changed. Instead, only the database entry has to be changed and the corresponding page re-saved. In addition, the content stored in the database can be quickly searched or used in other ways. A central database therefore contains all the content on the web site that the user wishes to manage with Web edition. However, the user hardly notices the database, because it runs in the background. Users only see their web sites and the input boxes on the actual web pages. The Web edition program only has to be installed once on the provider's (ISP) server [7].
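The principle described above, where content lives in a central database and a page is regenerated rather than edited, can be sketched with Python's built-in `sqlite3` module. This is not the Web edition API; the table layout, the function names and the page template are assumptions introduced purely for illustration.

```python
import sqlite3
from typing import List


def open_store() -> sqlite3.Connection:
    """Create an in-memory content database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE content (key TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return db


def set_content(db: sqlite3.Connection, key: str, body: str) -> None:
    """Insert or overwrite a single content entry."""
    db.execute("INSERT OR REPLACE INTO content (key, body) VALUES (?, ?)", (key, body))


def render_page(db: sqlite3.Connection, template: str) -> str:
    """Re-save a page: fill the unchanged template from the current database entries."""
    entries = dict(db.execute("SELECT key, body FROM content"))
    return template.format(**entries)


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Return the keys of all entries whose body contains the term."""
    rows = db.execute("SELECT key FROM content WHERE body LIKE ? ORDER BY key",
                      (f"%{term}%",))
    return [key for (key,) in rows]


if __name__ == "__main__":
    db = open_store()
    template = "<h1>{title}</h1><p>{intro}</p>"
    set_content(db, "title", "Video news")
    set_content(db, "intro", "Clips for every terminal.")
    print(render_page(db, template))
    set_content(db, "intro", "Clips adapted to your terminal.")
    print(render_page(db, template))  # template untouched, content updated
```

Only the database entry changes between the two renders; the template, which plays the role of the page, is never edited.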
4. Team Spirit Engine
Team Spirit® Mobile (Figure 4) is a comprehensive voice and video engine SDK. It includes a library of highly optimized, low-MIPS voice and video codecs, together with speech enhancement and network adaptation algorithms, to enable mobile video-over-IP communication on a broad range of handsets [8]. To speed up deployment and secure flawless voice quality on every handset, the engine can use a tuning wizard for automatic adjustment of acoustic echo cancellation (AEC) parameters. MHz-efficiency and Wi-Fi adaptation are extremely important for resource-constrained mobile devices intended for use in public Wi-Fi networks, which are more susceptible to delays, jitter, packet loss, aggregated echo, noise and congestion. The SPIRIT engine handles Wi-Fi-specific network impairments and makes video transmission over Wi-Fi/WiMAX more stable and reliable. The engine can run even on 200 MHz processors. At the same time, it supports both software and hardware accelerators inside popular application processors to deliver CIF-quality video on mobile devices.
5. Team Spirit Model for Video
Starting with the more traditional coding tools such as MPEG-1 and MPEG-2, continuing with recent scalable video coding tools such as MPEG-4 Fine-Grain Scalability (FGS), and passing through Web edition content description, MPEG standardization culminates with the MPEG-21 multimedia framework, which offers a wrapper that allows all the pieces in a multimedia customization chain to integrate and interact with each other.
Figure 4. Layered architecture of team spirit. [Figure labels: Application Layer; Call Manager (SIP, H.323, H.324M); Team Spirit® 3.0 Voice & Video Engine Mobile Framework; Voice Processing; Video Processing; Transport & NAT Traversal Support (RTP, SRTP/RTCP, STUN); OS/Hardware Dependent IO (Sound IO, Video IO); speech codec libraries; DTMF; 3rd party voice; sync; N-way conference mixer; RTP packetizers; video PLC; video codecs H.263 and H.264; 3rd party video.]
Figure 5. Team spirit model.
Figure 5 depicts a possible configuration of a multimedia customization chain using all of these MPEG standards. The adaptation engine requires three inputs. First, MPEG-4 AVC and the MPEG-4 file format handle audio and video processing, covering requirements such as formatting, compression and coding. Second, Web edition supplies the content description: it is a database-driven web content management system that stores all “content information”, and the stored content can be rapidly searched or used in other ways. Lastly, at the end-user side, the MPEG-21 usage environment description (UED) describes the environment (terminal, network, natural environment, and user preferences) where the content is to be consumed.
When the user performs a query or defines a choice, the request is accompanied by the UED, which enables a customizing application to use this information to create the right content variation and provide the best possible user experience. Finally, the TSM adaptation engine at the center of Figure 5 is responsible for matching the user query, either by selecting the most adequate available variation or by performing some adaptation. While processing a user query, the customizing application creates an adapted variation of the Item to be sent to the user; the new variation and its corresponding description may be added to the resources available at the server. The response to the user query may be delivered through a network, possibly using a real-time connection. In this case, the streaming module streams the scalable or non-scalable content to the user. If real-time transcoding is being performed, real-time adjustments to the transcoding process may be made using measures that characterize, for example, the network fluctuations. This is how the customization is achieved.
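The matching step performed by the adaptation engine, selecting the most adequate available variation or otherwise adapting one, can be sketched as follows. The `UsageEnvironment` and `Variation` classes, their fields and the selection rule are assumptions for illustration only: they reduce the UED to a display size and a bandwidth figure, and "adaptation" to computing the target parameters a transcoder would be asked for. This is not the Team Spirit engine's interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UsageEnvironment:
    """Tiny stand-in for a UED: terminal display size and network bandwidth."""
    max_width: int
    max_height: int
    bandwidth_kbps: int


@dataclass(frozen=True)
class Variation:
    """One stored version of a video item."""
    name: str
    width: int
    height: int
    bitrate_kbps: int


def fits(v: Variation, ued: UsageEnvironment) -> bool:
    return (v.width <= ued.max_width and v.height <= ued.max_height
            and v.bitrate_kbps <= ued.bandwidth_kbps)


def customize(variations: List[Variation],
              ued: UsageEnvironment) -> Tuple[Variation, bool]:
    """Return (variation, adapted). adapted is True when no stored variation fits."""
    if not variations:
        raise ValueError("no variations available")
    candidates = [v for v in variations if fits(v, ued)]
    if candidates:
        # Selection: the highest-quality variation the environment can handle.
        best = max(candidates, key=lambda v: (v.bitrate_kbps, v.width * v.height))
        return best, False
    # Adaptation: derive target parameters from the lightest stored variation.
    source = min(variations, key=lambda v: v.bitrate_kbps)
    scale = min(ued.max_width / source.width, ued.max_height / source.height, 1.0)
    adapted = Variation(
        name=source.name + "-adapted",
        width=max(2, int(source.width * scale) // 2 * 2),    # keep dimensions even
        height=max(2, int(source.height * scale) // 2 * 2),
        bitrate_kbps=min(source.bitrate_kbps, ued.bandwidth_kbps),
    )
    return adapted, True


if __name__ == "__main__":
    stored = [Variation("cif", 352, 288, 384), Variation("sd", 720, 576, 2000)]
    print(customize(stored, UsageEnvironment(1920, 1080, 4000)))
    print(customize(stored, UsageEnvironment(176, 144, 128)))
```

As the text notes, an adapted variation produced this way could be added to the stored list so that a later query from a similar terminal is served by selection alone.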
6. Conclusions and Future Work
In this paper, we present a team spirit model for efficient video customization using MPEG standards and Web edition content description. MPEG-4 (part 10) is used for advanced video coding and MPEG-4 (part 14) for the file format. For the usage environment description, we use the MPEG-21 standard. These standards thus act as a base for the development of a TSM system.
TSM systems are capable of managing and customizing content efficiently so that any information can be delivered to different terminals and networks, such as television, phones and other communication media where video customization is needed. In future, after some more functionality is added, the model could deploy all essential features related to video delivery on smart phones, iPhones, etc.
7. References
[1] S. Kim and Y. Yoon, “Video Customization System Using MPEG Standards,” International Conference on Multimedia and Ubiquitous Engineering, 2008.
[2] S. Kim and Y. Yoon, “Universal Multimedia Access Model for Video Delivery,” The 3rd International Conference on Grid and Pervasive Computing - Workshops,
[3] B. S. Manjunath, P. Salembier and T. Sikora, “Introduction to MPEG 7: Multimedia Content Description Language,” Wiley, New York, 2002.
[4] “MPEG Requirement Group, MPEG-21 Multimedia Framework,” Part 1: Vision, Technologies and Strategy, Proposed Draft Technical Report, 2nd Edition, MPEG Waikaloa Meeting, USA, December 2003.
[5] R. Koenen, “MPEG Requirement Group, MPEG-4 Overview,” V.21–Jeju Version, March 2002.
[6] “MPEG MDS Group, Multimedia Framework,” Part 7: Digital Item Adaptation, Final Draft International Standard, MPEG Waikaloa Meeting, USA, December 2003.
[8] “Spirit DSP, Voice and Engine Expert Group,” Datasheet, Juniper Research, February 2002.
[Figure 5 labels: Usage Environment (user preferences, bandwidth, equipment profile); Web Edition Content; MPEG-4 Part 10; MPEG-4 Part 14; Team Spirit.]