Towards High Quality VoIP in 3G Networks An Empirical Approach

doi:10.4236/ijcns.2008.14043

I. J. Communications, Network and System Sciences, 2008, 4, 285-385

Published Online November 2008 in SciRes (http://www.SciRP.org/journal/ijcns/).

Towards High Quality VoIP in 3G Networks

an Empirical Approach

Andres ARJONA¹, Cedric WESTPHAL², Antti YLÄ-JÄÄSKI³,

Martin KRISTENSSON¹, Jukka MANNER³

¹ Nokia Siemens Networks, Finland, ² DoCoMo Labs, USA, ³ Helsinki University of Technology, Finland

Email: {andres.arjona, martin.kristensson}@nsn.com, cwestphal@docomolabs-usa.com,

{antti.yla-jaaski, jukka.manner}@hut.fi

Received August 22, 2008; revised October 10, 2008; accepted October 12, 2008

Abstract

Third generation (3G) packet switched WCDMA networks with high-speed downlink packet access (HSDPA)

are currently being deployed worldwide to provide wireless broadband connectivity. When introducing

HSDPA in 3G networks the end user experience and system capacity with voice over IP applications improve

considerably. When high-speed uplink packet access (HSUPA) is added later, the system capacity and

end user experience will improve even further. This paper uses measurements to analyze VoIP quality over

current Release 5 HSDPA networks. VoIP is expected to be a widely used application over 3G data services.

The results show that even though the introduction of HSDPA significantly reduces the user-to-user voice

delay, the performance is satisfactory only for selected devices. Overall, the end user experience is still

significantly worse than with circuit switched solutions and is not acceptable. The excessive delay that

currently limits VoIP in HSDPA networks can be reduced by using the RLC UNACK mode,

potentially decreasing the jitter buffer size and reducing the terminal processing delay. In the longer term,

HSUPA and several features in 3GPP Release 7 standards will bring further performance improvements in

both user plane latency and system capacity.

Keywords: HSDPA, VoIP, WCDMA, Voice Quality, MOS

1. Introduction

Voice over IP (VoIP) is becoming a widely deployed

service in data networks, and it will penetrate from the

fixed network domain into wireless network domain. The

characteristics of fixed networks and wireless networks

are fundamentally different, which will impact the

performance of services. In this article we analyze the

VoIP service performance in wireless HSDPA and

WCDMA networks.

High Speed Downlink Packet Access (HSDPA) [1]

networks are being intensively deployed to provide

broadband connectivity to mobile devices, such as

handheld terminals and laptops. This broadband wireless

access is able to support voice applications over a packet

data connection instead of traditional circuit switched calls.

With the introduction of multi-radio devices with HSDPA,

WCDMA, and WiFi capabilities as well as integrated

VoIP clients, ubiquitous connectivity across any of these

networks is possible using the same mobile terminal.

However, while the mobile terminal and client are the

same, performance differs depending on the wireless

access in use.

Most studies of VoIP over 3G networks are based on

simulations. However, there is little data on the

performance in actual networks. Since VoIP is expected to

become a widely used application, and comes pre-

configured in many current handsets, it is of great

importance to better understand the performance of such

application over 3G networks. We set out to answer the

following question: is VoIP over 3G networks commercially

viable with current state-of-the-art networks?

This paper studies the quality of VoIP in wireless

networks with multi-radio mobile devices both in the lab

and in live network environment setups by conducting a

methodic performance analysis based on the E-Model

[2,3] (we describe the E-Model in more detail in

Section 2). Likewise, our study will encompass the

signaling performance required for VoIP applications.

The key contribution of the paper is to characterize the

performance of VoIP over 3G network, and to identify the

main differences between HSDPA and WCDMA. We

perform a thorough empirical evaluation of VoIP quality

and signaling performance with HSDPA and WCDMA.

From our evaluation, we will observe that:



VoIP performance is acceptable in HSDPA networks

only for VoIP clients on devices with enough

processing power, such as laptops;



VoIP performance is rarely acceptable in WCDMA

networks, even for those high performance clients;



WCDMA performance can be significantly improved

by having retransmissions only at the BTS, not the

RNC;



The delay introduced by the end-user terminal is a

critical factor in the performance.

Our study takes into consideration both the

performance of the network and also the performance of

real embedded VoIP clients. In addition, we validate the

results of our study by comparing them to the actual

performance in a densely deployed HSDPA network in

Finland. Based on the results, we analyze the primary

differences in performance between simulations found in

the literature, our lab experiences and a live network case

study. Finally, we discuss possible features that can

improve the performance enough in current and future

releases to support VoIP in all handheld devices.

The remainder of the paper is organized as follows.

Section 2 describes our research approach. Sections 3, 4

and 5 present results from a laboratory setup, for VoIP

signaling performance, and from a live network scenario,

respectively. Subsequently, in Section 6 we describe some

standardization improvements. In Section 7 we discuss the

available related works and finally in Section 8 we draw

conclusions.

2. Methodology and Test Environments

Our experiments are composed of measurements in an

HSDPA and WCDMA testbed, as well as in a live HSDPA

network of a Finnish operator. We are interested in

measuring the VoIP service audio quality in both

laboratory and live setups, as well as the SIP signaling latencies for

registering users and setting up calls.

2.1. VoIP Quality Methodology

The evaluation methodology consisted of multiple VoIP

tests carried out in a radio interference free environment.

These conditions were achieved in a laboratory setup by

using an RF room for the BTS and clients [4]. The tests

included different wireless access technologies and

variable combinations of codecs, signal conditions,

number of clients and fading profiles among others.

The main evaluation was carried out with two similar

tools based on the E-Model [2], which is an ITU-T

recommendation for VoIP evaluation. Firstly, with an NSN

proprietary tool, which is an implementation similar to the

one described in [3], and secondly, with IxChariot, which

is a widely used voice evaluation tool [5]. Finally, a third

tool based on the PESQ evaluation model was used to

determine the average end-to-end delay with real

embedded VoIP clients. With such setup, we can evaluate

the performance of the different wireless access

technologies based on the following test objectives:



VoIP quality performance with the E-Model;



Voice quality characterization for different wireless

accesses, signal conditions, configurations and fading

profiles;



Benchmark of two voice quality evaluation tools based

on the E-Model;



Estimation of the average end-to-end delay when a real

embedded VoIP client is used;



Effect of simultaneous background traffic during a

VoIP call;



Characterization of delay sources and possible

optimizations.

The E-Model is a voice quality evaluation model that is

based on network performance metrics. It is based on a

mathematical algorithm and provides an “R” performance

value based on the sum of four “impairment factors”

considered to be cumulative. The algorithm is depicted in

Equation (1) where, “Is” is Signal to Noise Ratio, “Id” is

delay (ms), “Ief” is packet loss (%), and “A” is expectation

factor.

$$R = 100 - I_s - I_d - I_{ef} + A \qquad (1)$$

In practice, ITU-T proposes to use a simplified version

of this algorithm. The simplified algorithm considers that

noise cancellation is encountered in the network and also

dismisses the expectation factor. The expectation variable

is supposed to be used to provide a balance for some

environments in which the user expects a degraded quality,

such as satellite connections. However, since this variable

is merely subjective it is recommended to ignore it. The

simplified algorithm is depicted in Equation (2).

$$R = 93.2 - I_d - I_{ef} \qquad (2)$$

The R value can be mapped to a Mean Opinion

Score (MOS) value, which is a subjective grade for voice

quality based on studies carried out by ITU-T. However,

even though the R-value can match a MOS value, it cannot

predict the absolute opinion of an individual user.
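
As an illustration of how an R value maps to an estimated MOS, the sketch below applies the simplified Equation (2) and then the R-to-MOS conversion given in ITU-T G.107. The function names and the sample impairment values are assumptions made for this example; they are not taken from the measurement tools used in this paper.

```python
def r_factor(i_d: float, i_ef: float) -> float:
    """Simplified E-Model of Equation (2): R = 93.2 - Id - Ief."""
    return 93.2 - i_d - i_ef


def r_to_mos(r: float) -> float:
    """Convert an R value to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Hypothetical impairment values, chosen only to exercise the functions.
for i_d, i_ef in [(0.0, 0.0), (10.0, 11.0), (35.0, 15.0)]:
    r = r_factor(i_d, i_ef)
    print(f"Id={i_d:5.1f} Ief={i_ef:5.1f} -> R={r:5.1f} MOS={r_to_mos(r):.2f}")
```

With no impairments the simplified model tops out at R = 93.2, which corresponds to an estimated MOS of roughly 4.4; any delay or loss impairment lowers the score from there.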

In this paper we calculate the MOS scores with two

tools based on the E-Model: a Nokia proprietary tool and

IxChariot, which is a widely used tool. These tools send

dummy packets that resemble VoIP packets. The packet

size and transmission intervals are tied to the modeled

codec. Based on the received packets, network performance

values are calculated and the E-Model algorithm is

applied to determine a MOS score. Figure 1 shows an

overview of the environment and the E-Model based tools.

In this paper we emphasize the performance of the

G.729 codec, which is the only codec supported by all the

measuring tools used in this research. G.729 is also similar

to AMR-NB. AMR codec is the main building block for a

future codec for 3GPP based networks. ITU-T has set out

standards for maximum voice quality for several codecs,

including G.729 and G.711. However, there is not yet

agreement on a standard AMR codec maximum quality

definition in relation to the E-Model. Therefore, we can

make the assumption that the performance values

measured with G.729 codec are representative and are a

useful basis for our analysis. In addition, G.711 is not an

appropriate codec for wireless networks such as HSDPA

due to its high bitrate. However, G.711 is one of the most

widely supported codecs, and it is widely used in the

Internet. Also, due to legacy equipment it is used in many

cases, even though AMR and other lower bitrate codecs

(e.g. iLBC) are encouraged. For this reason we study both

G.729 and G.711.
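
To make the bitrate argument concrete, the following sketch estimates the IP-level bitrate of a single voice stream per direction. The codec bitrates (8 kbps for G.729, 64 kbps for G.711) are standard; the 20 ms packetization interval and the 40 bytes of uncompressed RTP/UDP/IPv4 headers per packet are assumptions for this example and may differ from the settings of the tools used in the tests.

```python
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20  # assumed: no header compression


def voip_ip_bitrate_kbps(codec_kbps: float, ptime_ms: float = 20.0) -> float:
    """IP-level bitrate of one voice stream, in kbps.

    codec_kbps: codec payload bitrate (e.g. 8 for G.729, 64 for G.711).
    ptime_ms:   assumed packetization interval in milliseconds.
    """
    payload_bytes = codec_kbps * 1000 / 8 * (ptime_ms / 1000)
    packets_per_second = 1000 / ptime_ms
    packet_bits = (payload_bytes + RTP_UDP_IPV4_HEADER_BYTES) * 8
    return packet_bits * packets_per_second / 1000


for name, rate in [("G.729", 8), ("G.711", 64)]:
    print(f"{name}: {voip_ip_bitrate_kbps(rate):.0f} kbps per direction")
```

Under these assumptions a G.711 stream needs about 80 kbps per direction, which is more than a 64 kbps bearer can carry, whereas G.729 needs about 24 kbps.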

2.2. VoIP Signaling Performance Methodology

The evaluation methodology consisted of a variety of

VoIP calls using Nokia N95 terminals. We chose this

terminal due to its widespread penetration in the market

and because it includes an embedded VoIP client by

default. This client can also be configured to work with

other SIP systems (e.g. Gizmo project). We did not use a

3rd party implementation with Skype because there were

no suitable clients for the N95 at the time of the study. We

captured SIP packet traces directly from the mobile

terminal wireless interface [6]. With this setup we

evaluated the available wireless networks against

the following test objectives:



SIP registration delays



VoIP call signaling delays (post-dial, answer-signal,

and call-release delays)

The two main activities in VoIP calls are: first, a

registration to the VoIP server which is required to make

and receive calls, and second, the voice call setup itself.

The packet captures were carried out with an NSN proprietary

tool with a function similar to TCPdump, and analyzed

with Wireshark Protocol Analyzer [7].

Figure 1. VoIP quality test environment.

Figure 2. VoIP signaling test environment.
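
The signaling delays listed in the test objectives can be derived from the timestamps of the captured SIP messages. The sketch below is a minimal illustration and not the tool used in this study: the trace format (a list of timestamp and message pairs), the timestamps themselves, and the choice of which messages delimit each delay are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical caller-side trace: (seconds since capture start, SIP message).
trace: List[Tuple[float, str]] = [
    (0.00, "INVITE"),
    (0.25, "100 Trying"),
    (1.90, "180 Ringing"),
    (6.40, "200 OK (INVITE)"),
    (6.55, "ACK"),
    (30.00, "BYE"),
    (30.70, "200 OK (BYE)"),
]


def first_time(trace: List[Tuple[float, str]], message: str) -> float:
    """Return the timestamp of the first occurrence of a message."""
    for timestamp, name in trace:
        if name == message:
            return timestamp
    raise ValueError(f"{message!r} not found in trace")


def signaling_delays(trace: List[Tuple[float, str]]) -> dict:
    """Compute the three call signaling delays from one call trace."""
    return {
        # Assumed: INVITE sent until ringing indication received.
        "post_dial": first_time(trace, "180 Ringing") - first_time(trace, "INVITE"),
        # Assumed: final response to INVITE until it is acknowledged.
        "answer_signal": first_time(trace, "ACK") - first_time(trace, "200 OK (INVITE)"),
        # Assumed: BYE sent until its final response is received.
        "call_release": first_time(trace, "200 OK (BYE)") - first_time(trace, "BYE"),
    }


print(signaling_delays(trace))
```

Averaging these per-call values over many iterations gives figures comparable in kind to the average delays reported later in the paper.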

All the calls were carried out with two identical

terminals with exactly the same setups, registered to the

same VoIP server in the NSN IP Multimedia Subsystem

(IMS) and via the same wireless access in an interference

free environment. The measured scenarios were

HSDPA-to-HSDPA, WCDMA-to-WCDMA, and WiFi-

to-WiFi calls. The maximum transfer bitrates were set in

the RNC and HLR configurations to model different

wireless access scenarios. For WCDMA, maximum uplink

and downlink transfer rates were fixed at 64/64 kbps and

128/128 kbps. For HSDPA, the downlink was 3.6 Mbps and

the uplink was fixed at 128 kbps. In the case of WiFi,

transfer rates were left with default configuration (802.11g

and maximum transfer rate). Figure 2 shows the test

environment setup.

The core network and IMS system were privately

owned and under very low load. The wireless access

systems were based on NSN Release 5 equipment for

HSDPA and WCDMA tests with default settings. For

WiFi tests, we used a Belkin Pre-N Router with default

configuration. The core network and IMS system were

based on Nokia equipment. The tests executed consisted

of multiple iterations of each of the voice call scenarios

and registration to the VoIP server. We provide average

result values from the measurements. The measurements

took place during February-March 2007.

2.3. Live Network Case Study Methodology

The final stage of our study consisted of evaluating voice

quality in a live HSDPA network. The network evaluation

took place in the Helsinki metropolitan area, and the live

network in use was provided by Elisa, Finland’s largest 3G

operator. The HSDPA coverage in the Helsinki metropolitan

area is densely deployed and assumed to be based

on NSN equipment similar to the one used in our lab

measurements (Release 5 equipment). Therefore, its performance

is directly comparable to our previous results.

The test objectives for this phase are as follows:



Characterize the base performance of the network

(throughput and round trip time) under different signal

conditions



Evaluate the VoIP quality in different signal conditions

(excellent, medium and poor)



Evaluate the average VoIP quality in a mobile scenario

and determine the signal quality distribution for the test

route

Our approach to the live network measurements was

modeled in the following way. First, we made a basic

network performance evaluation in different radio

environments based on signal-to-noise ratio Ec/N0 levels

[4].

Ec/N0 values are an objective figure for quality

conditions because they take into account both signal

strength and the current interference level encountered in

the cell. Based on these basic network performance figures,

we can evaluate the average performance in terms of

maximum downlink and uplink throughput, as well as

average round trip time for a particular Ec/N0 range. As a

result, we are able to define three signal condition ranges:

1) good signal, 2) medium signal, and 3) poor signal.

Second, we evaluate the VoIP quality with the same NSN

Proprietary tool used in previous tests under the three

different signal conditions. This allows us to get a good

metric of what is the quality in a static scenario under

specific signal conditions. Third, we evaluate the average

VoIP quality under a mobile scenario. The test route

chosen crossed a major part of the Helsinki metropolitan

area from West to East. The tests were carried out along

the route in both directions twice. In addition, we

measured the signal levels (Ec/N0) along the whole

driving route and computed statistical distributions of the

values.
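
The classification of Ec/N0 samples into signal condition ranges and the resulting distribution can be sketched as follows. The thresholds and the sample values are hypothetical placeholders chosen for this example; they are not the ranges or measurements of this study.

```python
from collections import Counter
from typing import Iterable


def classify_ecn0(ecn0_db: float, good_min: float = -8.0,
                  medium_min: float = -12.0) -> str:
    """Map an Ec/N0 sample (dB) to a signal condition label.

    The default thresholds are hypothetical, not the ranges used in the study.
    """
    if ecn0_db >= good_min:
        return "good"
    if ecn0_db >= medium_min:
        return "medium"
    return "poor"


def condition_distribution(samples: Iterable[float]) -> dict:
    """Return the fraction of samples falling in each signal condition."""
    labels = [classify_ecn0(s) for s in samples]
    counts = Counter(labels)
    return {label: counts[label] / len(labels)
            for label in ("good", "medium", "poor")}


# Made-up drive-test samples in dB, for illustration only.
samples = [-5.0, -6.5, -9.0, -10.5, -13.0, -7.0, -11.0, -15.5]
print(condition_distribution(samples))
```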

An obvious limitation of our study is the fact that due to

the nature of a live network, we are not able to know or

control the other user traffic that could be taking place at

the same time. Therefore, we are not able to pinpoint the

sources of e.g. a sudden quality drop or reduced bitrate.

However, since we carried out multiple tests, our study

provides a realistic view of what is the actual performance

that could potentially be achieved in the field. The

measurements for the live network study took place during

July and August 2007.

3. VoIP Quality Analysis and Results

3.1. HSDPA/WCDMA VoIP Performance

The tests to evaluate VoIP quality involved the following

variables: signal conditions, wireless access, and fading

profiles. Signal conditions were modeled to provide

different Ec/N0 levels by using attenuators. However, the

results in this paper show that this variable does not make

any substantial difference and therefore, average result

values are given instead. The wireless access technologies

used were restricted to HSDPA/128, WCDMA 128/128

and WCDMA 64/64. There was no reason to use higher

bitrates in this study since VoIP packets require a low

bandwidth. Therefore, we emphasize the limits in which

VoIP can actually be used with an adequate quality level.

The bitrates were fixed and therefore, features that adapt

bitrate (by increasing or decreasing) during packet

switched connections were not used during the tests.

Fading was applied with Propsim C8 fading simulator

using Pedestrian-A 3 km/h and Vehicular-A 30 km/h fading

profiles. The jitter buffer had a depth of 200ms and first

packet play delay of 120ms. That is, all packets are

delayed at least 120ms to provide a cushion for possible

jitter. These are common settings in VoIP clients for

wireless cellular systems. According to Wang et al. [8], a

conservative jitter buffer playout delay is about 150ms.

Our results are consistent and show that the achieved

quality in the HSDPA system is competitive. Based on

ITU-T G.107 [2], quality was on average medium for

HSDPA with both measurement tools (NSN Proprietary

Tool and IxChariot). The average MOS was roughly 3.7

(see Figure 3 and Figure 4). This is a good figure

especially considering that typical PSTN systems provide

MOS values around 3.5. In the case of WCDMA, quality

differed depending on the bitrate used. WCDMA 128/128

provided low quality and WCDMA 64/64 gave a low/poor

quality level. The results also show a difference between

the measurement tools. Our proprietary tool was able to

differentiate more clearly the quality levels between

WCDMA 128/128 and 64/64. However, IxChariot does

not recognize much difference between these two bitrates.

In any case, both tools show that quality in WCDMA is not

optimal and is around MOS 3.0 at its best. WCDMA 64/64

MOS varied between 2.25 and 2.7. ITU states that MOS

below 2.5 is not recommended for voice services and that

nearly all users will be dissatisfied with such a service.

Therefore, we can expect that the end user experience with

VoIP over WCDMA is not stable and will vary.

Table 1 presents the average end-to-end delays in the

experiments (including jitter buffer playout delay). The

results also show very similar performance regardless of

the signal conditions modeled or the fading profile applied.

The reason probably relates to fast power control

mechanisms which are able to handle such changes in

signal conditions in HSDPA and WCDMA. We

recommend that further studies be performed using

noise or traffic generators instead of only modeling signal

scenarios with attenuators.

Figure 3. VoIP performance evaluation with proprietary tool.

Figure 4. VoIP performance evaluation with IxChariot.

Table 1. Average VoIP end-to-end delays (including jitter buffer).

                 Proprietary Tool          IxChariot
Access           PedA 3km    VehA 30km     PedA 3km    VehA 30km
HSDPA/128        215ms       217ms         223ms       225ms
WCDMA 128/128    295ms       300ms         368ms       381ms
WCDMA 64/64      315ms       355ms         370ms       365ms

Figure 5. Effect of background traffic.

Figure 6. Jitter average.

Figure 7. Packet loss percentage.

Finally, we point out that both measurement tools yield

quite similar results, with the exception of WCDMA 64/64.

However, in this case we can observe that our proprietary

tool is actually more accurate than IxChariot, especially

since IxChariot does not seem to recognize any

performance difference between WCDMA 128/128 and

64/64 accesses.

3.2. VoIP Performance with Simultaneous FTP

Background Traffic

We also conducted some experiments where we added

background traffic. The tests included a small number of

simultaneous users running FTP downloads in order to

evaluate if they had any effect on the VoIP performance.

As we expected, a limited number of users cannot affect

VoIP quality (see Figure 5). The reason is tied to the

Round Robin Scheduling used in the system, which

divides bandwidth equally among users. With only 4 simultaneous

users, each user will be given enough bandwidth

on a timely basis (every few milliseconds). Measuring

the effect of background traffic would require tests

with a much larger number of users, e.g. 15-20.

This is out of the scope of this document.

Likewise, testing different scheduling techniques such as

Weighted Proportional Fair is of interest. There

are several simulation based studies [9,10] that study VoIP

capacity gains for different scheduling schemes including

mixed traffic scenarios. However, note that [11]

analytically showed that QoS constraints on VoIP reduce

the benefit from the Proportional Fair algorithm over

Round Robin scheduling.

3.3. Effect of Jitter and Packet Loss

The next test included experiments with jitter and packet

loss. Jitter and packet loss are presented in Figure 6 and

Figure 7. From the results we can see the average jitter and

packet loss measures for different access networks.

The results show an increased jitter and packet loss for

WCDMA 128/128 and 64/64. Further delay analysis

shows that this increase is most likely caused by constant

RLC retransmissions. RLC retransmissions have an effect

on both jitter and packet loss. Every time an RLC

retransmission takes place, it will cause a ~200ms delay

peak. This peak can potentially fill the jitter buffer causing

an overflow, which results in packet loss. Packet loss also

affects voice quality. The frequency of RLC retransmissions

is dependent on the access in use. Figure 8 shows

an example of the RLC retransmissions (200ms delay

peaks) for different wireless access technologies.
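
The interaction between a delay peak and the jitter buffer can be illustrated with a simplified model. The sketch below assumes a fixed playout delay (the 120ms first packet play delay mentioned in Section 3.1) and that any packet arriving after its scheduled playout time is discarded. It does not model buffer overflow or adaptive playout, and the delay values are invented for the example.

```python
from typing import List


def late_packets(network_delays_ms: List[float],
                 playout_delay_ms: float = 120.0) -> List[int]:
    """Return indices of packets that miss their playout deadline.

    Packet i is sent at i * packet_interval and played at
    first_arrival + playout_delay + i * packet_interval, so it is late
    when its network delay exceeds the first packet's delay plus the
    playout delay. The packet interval therefore cancels out.
    """
    first_delay = network_delays_ms[0]
    deadline_ms = first_delay + playout_delay_ms
    return [i for i, d in enumerate(network_delays_ms) if d > deadline_ms]


# Invented one-way delays: roughly 45 ms with one ~200 ms peak on packet 3,
# standing in for an RLC retransmission.
delays = [45.0, 50.0, 40.0, 245.0, 48.0, 44.0]
print(late_packets(delays))
```

In this simplified model a single 200ms peak exceeds the 120ms cushion, so the affected packet is lost to the listener even though the network eventually delivered it.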

Figure 8. RLC retransmissions.

The performance of these wireless accesses would

improve if RLC retransmissions were avoided as much as

possible. One possibility is to use the unacknowledged

mode (UNACK) feature in the RNC. The principle of

operation in HSDPA [1] is such that the BTS estimates

the channel quality of each user based on the physical layer

feedback on the uplink. Subsequently, link adaptation and

scheduling take place at a fast pace. When the packets are

first received at the BTS, they are buffered. Then, the BTS

transmits the packet; however, it will still keep it in the

buffer. The reason is that, in case of a failure in the

transmission (e.g. decoding failure), a retransmission will

take place directly from the BTS without requiring any

action from the RNC. This is a powerful advantage since

the retransmissions are combined at the terminal. However,

if there is a physical layer failure, such as a signaling error,

then an RLC retransmission is required, and packets are

retransmitted from the RNC (see Figure 9). This obviously

results in an increase in delay, which is not beneficial for

services like VoIP. While RLC retransmissions are not a

very frequent event in HSDPA in static scenarios, they are

more likely in mobility scenarios. In contrast, in WCDMA,

all retransmissions are RLC retransmissions requiring

RNC involvement. In the RLC unacknowledged mode,

packets are not retransmitted even if some are lost, for

example due to cell change operation [1].

3.4. Codec Performance Evaluation

Even though our study focus was on low bit rate codecs

(e.g. AMR or G.729), we also evaluated the performance

of the G.711 codec. Using G.711 codec in wireless

environments is not encouraged due to its higher bit rate.

However, since it is one of the most widely supported

codecs, there are cases in which it will be used due to other

codec incompatibilities. The performance was measured

with a proprietary tool. Tests with WCDMA 64/64 using

G.711 failed most of the time or resulted in very long

delays of several seconds and are therefore excluded.

Figure 9. BTS retransmissions handling.

Figure 10. Codec performance evaluation (G.729 and

G.711).

Table 2. G.711 codec jitter and packet loss (PedA 3km).

Access           Jitter Average    Packet Loss %
HSDPA/128        13ms              0.34
WCDMA 128/128    19ms              2.78

Figure 10 shows the VoIP quality comparison for both

G.729 and G.711 codecs. Table 2 summarizes the jitter

average and packet loss encountered when using the

G.711 codec.

3.5. Embedded VoIP Client Evaluation

These tests aimed at determining the additional delay

resulting from real embedded VoIP clients, such as the one

included with the N95. The test setup consisted of

establishing a VoIP call using an IMS system with the

G.729 codec. Subsequently, we measured the offset delay,

that is, the delay between the moment when the original

audio sample occurs to the moment the audio sample is

reproduced in the other calling end. The tool used for

offset measurements was Malden DSLA [12]. Figure 11

shows the measurement environment. The results show

that the total offset delay including the VoIP client

processing delay is rather high (see Table 3). ITU-T

recommends 400ms as the maximum delay for voice

services with a reasonable quality. With delay above this

limit, conversations are not interactive anymore and result

in talker overlaps. Therefore, a voice service with very

high delays results in a situation in which most, if not all

users are dissatisfied. As a comparison, current circuit

switched voice services have a delay of roughly 230-250ms.

With these results we can estimate the client processing

delay by subtracting the average end-to-end delay from

our tests based on the E-Model (215ms, 295ms, and 350ms,

respectively) from the measured offset delay. The result is

roughly 210ms of additional processing delay when using a

real embedded VoIP client.

This value differs considerably from the more optimistic

processing delay estimations of 50-75ms available in

research from [13,14].
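
The subtraction can be written out explicitly. The sketch below uses the delay values reported in Table 3; the variable and function names are chosen for this example only.

```python
# Delays in ms, as listed in Table 3 (G.729 codec).
e2e_model_ms = {"HSDPA/128": 215, "WCDMA 128/128": 295, "WCDMA 64/64": 350}
e2e_with_client_ms = {"HSDPA/128": 425, "WCDMA 128/128": 505, "WCDMA 64/64": 560}


def client_processing_delay(with_client: dict, model_only: dict) -> dict:
    """Subtract the E-Model based delay from the delay measured with the client."""
    return {access: with_client[access] - model_only[access]
            for access in model_only}


processing_ms = client_processing_delay(e2e_with_client_ms, e2e_model_ms)
print(processing_ms)
```

The difference is the same for all three accesses, which is consistent with the extra delay originating in the terminal rather than in the radio access.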

Figure 11. Offset delay measurement environment.

Table 3. Sources of delay (G.729 codec).

                                HSDPA/128    WCDMA 128/128    WCDMA 64/64
RTT Delay                       85ms         170ms            225ms
Jitter Buffer (100-200ms)       130ms        125ms            125ms
Total E2E Delay                 215ms        295ms            350ms
Total E2E Delay, including
embedded client delay           425ms        505ms            560ms

3.6. HSDPA/WCDMA Overall Effect on VoIP

Performance

End-to-End delay is the main reason for low voice quality.

With the average total end-to-end delay values we

can extend the analysis by dividing the sources of delay

(Table 3).

With this estimation it is clear why

VoIP does not perform well in current systems with

handheld terminals, and particularly live networks, even

when the round trip time (RTT) is low. The final

end-to-end delay is just too high. We finalize our VoIP

quality analysis by modeling the resulting VoIP quality

MOS with the additional embedded VoIP client

processing delay based on the E-Model (see Equation (2)).

Figure 12 shows this estimation. The results represent a

case of a laptop client versus using an embedded client in a

handheld device such as the N95 VoIP client. The figure

considers both delay and packet loss impairment factors. It

must be noted though, that in a laptop client there will also

be an additional processing delay. However, such delay is

considerably lower, ~50ms in a worst case scenario [15].

Thus, it is still ~160ms lower than with the mobile device

tested.

Figure 12. Overall VoIP quality with laptop and embedded handheld clients.

Future features such as HSUPA in further 3GPP

releases will slightly improve performance. For example,

the expected average RTT for HSUPA networks is roughly

65ms (a reduction of 20ms compared with HSDPA). This

reduction, however, does not improve the VoIP quality

when using a laptop. That is, the average MOS with a

laptop will still be the same. In contrast, the expected

quality improvement for an embedded client is about 0.2

points in the MOS score. The main reason for the limited

quality improvement is that the major sources of delay,

and therefore, main impairment factors reducing VoIP

quality are not directly related only to the wireless access,

but to the VoIP client implementation. However, as we

described previously, if some HSUPA features like

UNACK mode are enabled in the wireless network, it will

be possible to reduce the size of the jitter buffer

implementation without compromising the VoIP quality.

Furthermore, a reduction in the client processing delay is

extremely important in order to seriously improve the

VoIP quality in the mobile environment.
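
To give a feel for how an additional processing delay lowers the E-Model score, the sketch below combines Equation (2) with two assumptions that are not taken from this paper: the delay impairment approximation Id = 0.024d + 0.11(d - 177.3)H(d - 177.3) proposed by Cole and Rosenbluth, and a fixed equipment impairment of 11 for G.729 with no packet loss. It is an illustration only and is not the computation behind Figure 12, so the numbers it prints should not be read as the paper's results.

```python
def delay_impairment(delay_ms: float) -> float:
    """Assumed delay impairment Id (Cole-Rosenbluth approximation)."""
    i_d = 0.024 * delay_ms
    if delay_ms > 177.3:
        i_d += 0.11 * (delay_ms - 177.3)
    return i_d


def mos_for_delay(delay_ms: float, i_ef: float = 11.0) -> float:
    """Estimated MOS from Equation (2) and the G.107 R-to-MOS conversion.

    i_ef = 11.0 is an assumed equipment impairment for G.729 without loss.
    """
    r = 93.2 - delay_impairment(delay_ms) - i_ef
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# End-to-end delays for HSDPA/128 from Table 3: without and with the
# embedded client processing delay.
for label, delay in [("model only", 215.0), ("embedded client", 425.0)]:
    print(f"{label}: {delay:.0f} ms -> MOS {mos_for_delay(delay):.2f}")
```

Because the assumed impairment grows faster once the delay exceeds about 177ms, every additional millisecond in that region costs more quality, which is why the client processing delay weighs so heavily on the result.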

4. VoIP Signaling Analysis and Results

In this section we analyze the latencies for VoIP using the

Session Initiation Protocol (SIP). This is an important

metric because long delays in the call setup seriously harm

the overall VoIP experience; people have certain

expectations based on the current circuit switched services,

and it is crucial to meet those.

4.1. SIP Registration Setup

The signaling [16] and delay measurements for SIP

Registration to the VoIP server in the IMS system are

depicted in Figure 13. The measurements show that the

registration times with HSDPA and WCDMA are about

30% and 50% higher than with WiFi. While this might not

seem much, we should remember that SIP registration

requires a very limited number of messages. Therefore, as

more messages are required, such as with 3GPP based

registration, delays will increase.

4.2. VoIP Call Signaling

ITU E.721 [17] recommends values for call setup delays

in circuit switched calls. The recommended values for call

setup (post-dial delay) are 3s for local, 5s for toll and 8s

for international connections, with 6s, 8s, and 11s as 95%

values. The “call answer” (answer-signal) delay reflects

the time it takes from the moment the receiving end

accepts the call until the call is actually established. The E.721

recommendation is 0.75s for local, 1.5s for toll, and 2.0s

for international connections, with 1.5s, 3.0s, and 5.0s as

95% values. Finally, “call end” (call-release delay) is the

time it takes for the call to be terminated [18,19]. The

signaling [16] and delay measurements for voice call

setups when a PDP context is active and the terminal is

registered to the IMS system are depicted in Figure 14.

Figure 13. SIP registration signaling delays.

Figure 14. VoIP call signaling delays.

The results show an expected increment in the call

signaling delays depending on the access used. Since all of

the network elements were located in a private network,

the environment could be thought of as providing local

calls. Our results also show that an embedded mobile VoIP

client experiences an increased delay compared to a PC

client, such as the one measured by Curcio and Lunden [18]

with a WCDMA network.

The setup delays for VoIP calls might be impacted by

additional delays in a cellular system in cases where there is

no active PDP context, and also due to a required registration

to the IMS. The PDP context activation delay was

~3 seconds in our tests. Simulations by Pous et al. [20]

propose 2.24 seconds. Based on these values, the always-on

enabled calls can be in line with E.721 recommendations.

However, when the PDP context is not active, the

delay with WCDMA can vary between 11 and 17 seconds,

and thus, exceed the recommended values. HSDPA delay

in this case is around 8 seconds, which is similar to the

recommendation for international calls. However, additional

delays from e.g. traversed networks, gateways, and proxies

could result in larger total delays than those recommended.
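
The comparison against E.721 above can be sketched in a few lines of Python. This is only an illustration: the target values are the post-dial delay figures quoted in the text, the measured values are the approximate setup delays reported above for calls without an active PDP context, and the dictionary layout and the function name classify_setup_delay are assumptions made for this example.

```python
# Illustrative sketch. E.721 post-dial delay targets as quoted in the text (seconds).
# The data layout and the function name are assumptions made for this example.
E721_POST_DIAL = {
    "local": {"mean": 3.0, "p95": 6.0},
    "toll": {"mean": 5.0, "p95": 8.0},
    "international": {"mean": 8.0, "p95": 11.0},
}


def classify_setup_delay(delay_s):
    """Return the call types whose mean post-dial target is met by delay_s."""
    return [kind for kind, target in E721_POST_DIAL.items()
            if delay_s <= target["mean"]]


# Approximate setup delays reported in the text when no PDP context is active.
measured = {"HSDPA": 8.0, "WCDMA (lowest)": 11.0, "WCDMA (highest)": 17.0}

for access, delay in measured.items():
    met = classify_setup_delay(delay)
    print(access, delay, met if met else "exceeds all mean targets")
```

Under these assumptions, the HSDPA value only meets the mean target for international connections, while both WCDMA values exceed every mean target, which matches the discussion above.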

Figure 15. Average throughput in Elisa HSDPA network.

Figure 16. Average round trip time in Elisa HSDPA

network.

5. Live Network Case Study

5.1. Generic HSDPA Performance

In this section we describe the generic evaluation of the

live HSDPA (Release 5 equipment) network performance

in Helsinki. The results for throughput and round trip time

are depicted in Figure 15 and Figure 16. The measurement

results show an increase in round trip time delay when

compared to the average values measured in the lab

environment (85ms). This means that the VoIP quality

(MOS) will be worse than our results in Section 3, and

therefore VoIP support will be even more difficult.

Throughput was measured via multiple file downloads and

uploads from a local server in Finland, while RTT was

measured with 32-byte ICMP Echo Request and Reply

(ping) packets to the same server.

5.2. VoIP Quality

The VoIP quality in the Elisa HSDPA network is likewise

slightly lower than our lab measurements (see Table 4).

The mean opinion score (MOS) was 3.5, 3.5 and 3.3 for good,

medium and poor signal conditions, respectively. However, we have to

consider that once again, the VoIP quality was measured

for laptop based VoIP communication. That is, it does not

account for the additional processing delay for the

terminal VoIP client implementation previously described.

The results in the live case still indicate that VoIP support

for a handheld embedded client will be poor. However,

these values do take into account the jitter buffer play out

delay. The most noticeable difference between the three

scenarios is the packet loss ratio, which increases as the

signal quality decreases.
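
To illustrate how delay and packet loss jointly affect MOS, the following Python sketch applies the simplified E-model of Cole and Rosenbluth [3]. It is not the method used to obtain the measured values in this section. The codec coefficients are the ones commonly used for G.711 and are an assumption (the codec is not stated here), as is treating the reported delay as one-way mouth-to-ear delay. The sketch is therefore not expected to reproduce the measured MOS values.

```python
import math


def r_factor(delay_ms, loss_ratio, g1=0.0, g2=30.0, g3=15.0):
    """Simplified E-model R factor (Cole and Rosenbluth).

    delay_ms   : one-way mouth-to-ear delay in milliseconds (assumed)
    loss_ratio : packet loss as a fraction, e.g. 0.01 for 1 %
    g1, g2, g3 : codec-dependent coefficients; the defaults are the values
                 commonly used for G.711 and are an assumption here
    """
    # Delay impairment: linear term plus a penalty above 177.3 ms.
    i_d = 0.024 * delay_ms
    if delay_ms > 177.3:
        i_d += 0.11 * (delay_ms - 177.3)
    # Equipment impairment: grows logarithmically with packet loss.
    i_e = g1 + g2 * math.log(1.0 + g3 * loss_ratio)
    return 94.2 - i_d - i_e


def mos_from_r(r):
    """Map an R factor to an estimated MOS (standard E-model mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def estimate_mos(delay_ms, loss_percent):
    """Convenience wrapper taking packet loss in percent."""
    return mos_from_r(r_factor(delay_ms, loss_percent / 100.0))


# Delay and loss pairs in the style of the live network measurements.
for delay_ms, loss_percent in [(288, 0.4), (283, 1.0), (266, 2.6), (331, 1.9)]:
    print(delay_ms, loss_percent, round(estimate_mos(delay_ms, loss_percent), 2))
```

In this model, every millisecond of delay above 177.3 ms costs noticeably more than a millisecond below it, which is why delays in the 270 to 330 ms range leave little headroom for any additional terminal processing delay.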

5.3. VoIP Quality in Mobility Scenarios

The mobile environment tests were conducted from a van

driving through a test route at average speeds of 60-80

km/h without stopping. The selected test route crosses the

Helsinki metropolitan area from East to West and is

entirely covered by the Elisa HSDPA network according to

the operator's publicly available coverage map. The test route was

about 18.5km and it took approximately 15min to travel.

The test route was driven several times to validate the

results.

The results show that the average performance is lower

than in static scenarios. A mobile scenario obviously

brings several additional challenges due to the different

cell changes along the test route. There were 28 cell

changes along the route, identified via

changes in the scrambling code used. Table 4 summarizes

the VoIP quality results.

Furthermore, in mobility scenarios the number of RLC

retransmissions required is very noticeable. To

characterize the retransmissions, we conducted an

additional test along the test route in which we sent

continuous ping packets of 32B (see Figure 17). The

results show a large number of delay peaks resulting from

these retransmissions. This further supports our

lab measurements and emphasizes the importance of the

unacknowledged mode feature. We expect that this mode

would remove the majority of the large delay peaks

and thus improve VoIP quality. However, if this mode is

used, the packet loss ratio may

increase, and for that reason it is very important to

validate the results again once the feature is enabled.
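
The effect of such delay peaks on a voice stream can be sketched as follows. The Python example below flags samples that stand out from the median of a delay trace and counts how many packets would arrive after a fixed playout deadline. The trace is hypothetical (it is not measured data), the peak threshold of three times the median is an arbitrary assumption, and using ping-style samples as a proxy for voice packet delay is a simplification.

```python
from statistics import median


def find_delay_peaks(delays_ms, factor=3.0):
    """Return indices of samples exceeding factor x median delay.

    The factor is an arbitrary assumption chosen for illustration.
    """
    threshold = factor * median(delays_ms)
    return [i for i, d in enumerate(delays_ms) if d > threshold]


def late_loss_ratio(delays_ms, playout_deadline_ms):
    """Fraction of packets that arrive after a fixed playout deadline.

    A jitter buffer discards such packets, so delay peaks appear to the
    listener as packet loss.
    """
    late = sum(1 for d in delays_ms if d > playout_deadline_ms)
    return late / len(delays_ms)


# Hypothetical delay trace in milliseconds (not measured data).
trace = [95, 102, 98, 410, 101, 97, 620, 99, 100, 96]

print(find_delay_peaks(trace))      # positions of the delay peaks
print(late_loss_ratio(trace, 200))  # share of packets missing a 200 ms deadline
```

A sketch like this also shows the trade-off discussed above: a shorter playout deadline lowers the end-to-end delay but turns more of the delayed packets into effective losses.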

In addition, during the mobile tests, we recorded the

signal conditions to characterize the signal quality distribution

along the test route. The measurements show that, in

general, a good signal level is highly probable and

that the coverage is well deployed (see Figure 18 and

Figure 19).

Table 4. VoIP quality in Elisa HSDPA network (including jitter buffer).

Scenario                            Delay   Avg. Jitter   Avg. Packet Loss %   MOS
Good Signal (Ec/N0 -3 to -5 dB)     288ms   19ms          0.4                  3.5
Medium Signal (Ec/N0 -7 to -9 dB)   283ms   19ms          1.0                  3.5
Poor Signal (Ec/N0 -11 to -13 dB)   266ms   14ms          2.6                  3.3
Mobile Environment                  331ms   22ms          1.9                  3.2

Figure 17. Round trip time during mobility tests.

Figure 18. Signal quality distribution.

Figure 19. Detailed signal quality distribution.

6. Future Directions

Currently, the performance of VoIP in 3G

networks is far from optimal. However, with some of the

features and improvements in later 3GPP releases, the

performance will improve. For instance, Release 6

equipment reduces RTT to roughly 65ms, and even lower

with Release 7. Likewise, with Release 7 operators have

other choices for deployment prior to full VoIP rollouts.

For instance, advances such as Circuit Switched voice

over HSPA (CS over HSPA) can improve capacity to

similar levels as with VoIP. In this case, traditional voice

is carried over packet data. Hence, since VoIP does not

provide significantly better capacity than CS

over HSPA, operators can delay VoIP deployment

until adequately performing terminals and

networks are available. This, however, is only possible if

several features are upgraded in several network elements.

These improvements occurred while this manuscript was

under review. CS over HSPA is expected to be included in

3GPP Release 7 [21].

7. Related Work

Although there is prior work investigating VoIP

performance in WCDMA and HSDPA systems, it is not

very extensive and is mostly based on simulations. For

instance, some papers [22,23,24] study VoIP performance

in WCDMA and provide some baseline results. In addition,

other works [25,26] provide some estimated values for

processing delays. In these studies, the assumption for the

estimations is based on whether the call is towards a

landline or a mobile end. Some performance simulations

are also available [8,10,13,14,27–29]. However, the

simulations only provide a delay budget rather than a

description of the end user experience. In contrast, our

study focuses on end user experience and VoIP quality

rather than delay budgets alone. The delay budget values

used in simulations vary from 80-150ms for studies

ignoring encoding/processing delays and jitter buffer

implementations [9,27,28,30,31], to 250-300ms for

studies that assume such delays to some extent [8,10,

13,14,29]. In addition, the estimates used in simulations

are in general overly optimistic with regard to, e.g., client

processing delay. Kim [14] considers the processing/

encoding delay to be 50ms, while Ericson [13] assumes

around 75ms. These delay values include the jitter buffer

playout delay as well. They are therefore clearly

too optimistic, especially when compared with our

experimental results with actual handsets and VoIP client

jitter buffer implementations.

Although the exact encoding/processing delays and

jitter buffer playout delays are client specific,

unless they are modeled at least to some extent,

the differences in performance between simulations

and actual deployments will remain very visible.

Therefore, simulation results are at best comparable

to laptop based performance and not to actual

handheld performance, which in the end is the

primary use case for VoIP services. Another simulation

study [28] notes the importance of reducing RLC

retransmissions to improve performance in FTP and

HTTP browsing. However, the study does not address its

importance for VoIP services. Finally, Wager and

Sandlund [32] conduct simulations to determine the

number of VoIP speech frames that may be lost in HSDPA

mobility scenarios.

With regard to VoIP signaling, SIP call setup delays and

signaling performance have been studied previously,

mostly for Internet scenarios. ITU Recommendation

E.721 [17] and an IETF Internet Draft [33] provide

call setup delay recommendations for circuit switched

and Internet Telephony systems, respectively. Additionally,

Eyers and Schulzrinne [19] provide guidelines for Internet

Telephony call setup and signaling transfer delays. With

regard to 3GPP based wireless accesses, Kist and Harris

[34] provide simulations for transfer delays with 3GPP

signaling, while Fathi et al. [35] and Pous et al. [20]

modeled signaling performance. Further, Curcio and

Lundan [18] provide measurements for a WCDMA setup

using laptop clients for local, international and overseas

calls. Most of the mentioned research focuses on

simulations, and does not consider some end user cases

such as calls in wireless environments starting from

different states. Additionally, performance with different

wireless radio accesses and configurations under the same

conditions is not available. Also, the available works do

not use an embedded VoIP client in a handheld mobile

terminal, which yields different delay values compared to

a PC. HSDPA signaling performance has not been

evaluated either. Our research aims at covering these items.

The importance of evaluating a mobile terminal lies in

the fact that the eventual replacement of circuit switched

calls in 3GPP networks (HSDPA and WCDMA) by VoIP

calls will take place with a handheld mobile device and not

with a PC or laptop. Likewise, multi-radio devices can

provide ubiquitous access via different wireless access

technologies with distinct performance characteristics.

The lack of measured performance values in the

literature could be mainly due to the limited availability of

terminals with integrated VoIP clients and of

HSDPA networks. However, with the introduction of

some multi-radio devices with VoIP capabilities (e.g.

Nokia N95, Nokia 6110), it is possible to use VoIP

applications without a PC.

8. Conclusions

Multiple measurements were carried out to evaluate and

characterize the VoIP quality and VoIP signaling

performance in HSDPA and WCDMA wireless accesses.

The results show that HSDPA access is capable of

providing a competitive VoIP quality compared to circuit

switched voice. However, WCDMA in 128/128 and 64/64

bitrate configurations can only provide low and poor

quality. The main issues are long delays and packet losses,

which often occur because RLC retransmissions

overflow the jitter buffer capacity. The main

issue with HSDPA, in turn, is tied not only to the wireless

access performance but also to the mobile device capabilities.

Our results show that embedded mobile VoIP clients can

introduce an increased delay due to processing when

compared to laptop performance. This processing includes

e.g. encoding/decoding, and other operating system tasks.

The additional delay has a considerable voice quality

reduction effect. Further, the test cases

run in a live network showed lower

performance than similar laboratory

measurements. Also, mobility causes a

quite noticeable degradation in VoIP quality. The

degradation is due to handovers along the test route that

increase the ratio of RLC retransmissions.

Therefore, the main measures that can potentially

improve VoIP quality with the current

systems are to reduce the number of RLC

retransmissions by using unacknowledged mode,

to use smaller jitter buffer sizes where possible, and to reduce the

embedded VoIP client processing delays. High quality

VoIP in 3G networks will be possible. However, it is tied

to improvements in several areas such as wireless network

delay, client implementation, and client processing delay.

Finally, a major improvement developed while this

manuscript was in preparation is CS over HSPA, which

improves capacity and thus allows operators to delay

VoIP deployment projects until networks and terminals

have better performance.

9. References

[1] H. Holma, and A. Toskala, “HSDPA/HSUPA for UMTS,”

John Wiley, 2006.

[2] ITU-T, Recommendation G.107 “The E-model, a

computational model for use in transmission planning,”

2003.

[3] R. Cole and J. Rosenbluth, “Voice over IP performance

monitoring,” ACM SIGCOMM’01.

[4] A. Arjona, C. Westphal, A. Ylä-Jääski, and M.

Kristensson, “Towards high quality VoIP in 3G networks:

An empirical study,” in Proceedings IEEE AICT’08,

Athens Greece, June 8–13, 2008.

[5] IxChariot, http://www.ixiacom.com.

[6] A. Arjona and A. Ylä-Jääski, “VoIP call signaling

performance and always-on battery consumption in

HSDPA, WCDMA and WiFi,” in Proceedings IEEE

WiCOM’07, Shanghai China, September 21–23, 2007.

[7] Wireshark Protocol Analyzer,

http://www.wireshark.org.

[8] B. Wang, K. Pedersen, T. Kolding, and P. Mogensen,

“Performance of VoIP on HSDPA,” IEEE Vehicular

Technology Conference VTC’05 Spring, Vol. 4, pp.

2335–2339, May 30–June 1, 2005.

[9] P. Lundén and M. Kuusela, “Enhancing Performance of

VoIP over HSDPA,” in Proceedings IEEE VTC’07 Spring,

April 21–24, 2007.

[10] A. Braga, E. Rodriguez, and F. Cavalcanti, “Packet

scheduling for VoIP over HSDPA in mixed traffic

scenarios,” in Proceedings 17th IEEE PIMRC’06,

September 2006.

[11] H. Kim, “Losing Opportunism: Evaluating Service

Integration in an Opportunistic Wireless System,” IEEE

INFOCOM’07, May, 2007.

[12] Malden DSLA,

http://www.malden.co.uk/dsla.

[13] M. Ericson and S. Wänstedt, “Mixed traffic HSDPA

scheduling–impact on VoIP capacity,” in Proceedings

IEEE VTC’07 Spring, April 21–24, 2007.

[14] Y. Kim, “VoIP Service on HSDPA in Mixed Traffic

Scenarios,” in Proceedings IEEE CIT’06, September 2006.

[15] Cisco, “Understanding delay in packet networks,”

Document ID: 5125, March 2007.

[16] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J.

Peterson, R. Sparks, M. Handley, and E. Schooler, “SIP:

Session initiation protocol,” RFC 3261, IETF June 2002.

[17] ITU, Recommendation E.721, “Network grade of service

parameters and target values for circuit switched services

in the evolving ISDN,” 1999.

[18] I. Curcio and M. Lundan, “SIP call setup delay in 3G

networks,” ISCC’02, Taormina Italy, July 1–4, 2002.

[19] T. Eyers and H. Schulzrinne, “Predicting Internet

telephony call setup delay,” IPTel2000, Berlin, April 2000.

[20] M. Pous, D. Pesch, G. Foster, and A. Sesmun,

“Performance evaluation of a SIP based presence and

instant messaging service,” 3G 2003, June 25–27, 2003.

[21] Nokia Siemens Networks and Nokia, “Supporting CS over

HSPA,” 3GPP R2-073487, August 2007.

[22] F. Poppe, D. Vleeschauwer, and G. Petit, “Choosing the

UMTS air interface parameters, the voice packet size and

the de-jittering delay for a voice-over-IP call between a

UMTS and a PSTN party,” IEEE INFOCOM, Vol. 2, pp.

805–814, April 2001.

[23] F. Poppe, D. Vleeschauwer, and G. Petit, “Guaranteeing

quality of service to packetized voice over the UMTS air

interface,” 8th International Workshop on Quality of

Service, pp. 85–91, June 2000.

[24] R. Cuny and A. Lakaniemi, “VoIP in 3G networks: An

end-to-end quality of service analysis,” in Proceedings

IEEE VTC’03 Spring, April 2003.

[25] ITU-T, Recommendation G.114, “One-way transmission

time,” 2003.

[26] TIATR-41.1.2, “VoIP end to end delay budget planning

for private networks,” Cisco 2000.

[27] G. Rittenhouse and H. Zheng, “Providing VoIP service in

UMTS-HSDPA with frame aggregation,” in Proceedings

IEEE ICASSP’05, March 18–23, 2005.

[28] L. Bajzik, L. Korössy, K. Veijalainen, and C. Vulkán,

“Cross-layer backpressure to improve HSDPA

performance,” in Proceedings IEEE PIMRC’06, Helsinki

Finland, June 2006.

[29] P. Hosein, “Scheduling of VoIP traffic over a time-shared

wireless packet data channel,” in Proceedings IEEE

ICPWC’05, January 23–25, 2005.

[30] Y. Seo and D. Sung, “Performance of VoIP in HSDPA

based on an adaptive power allocation scheme,” in

Proceedings IEEE WCNC’06.

[31] P. Hosein, “Capacity of packetized voice services over

time-shared wireless packet data channels,” in

Proceedings IEEE INFOCOM’05, March 13–17, 2005.

[32] S. Wager and K. Sandlund, “Performance evaluation of

HSDPA mobility for voice over IP,” in Proceedings IEEE

Vehicular Technology Conference VTC’07 Spring, April

22–25, 2007.

[33] H. Lin, T. Seth, A. Broscius and C. Huitema, “VoIP

Signaling performance Requirements and Expectations,”

IETF Draft, June 1999.

[34] A. Kist and R. Harris, “SIP Signaling Delay in 3GPP,”

IFIP Interworking’02, October 13–16, 2002.

[35] H. Fathi, S. Chakraborty, and R. Prasad, “Optimization of

SIP session setup delay for VoIP in 3G wireless

networks,” IEEE Trans. on Mobile Computing, Vol. 5, No.

9, September 2006.