Communications and Network, 2013, 5, 292-297
http://dx.doi.org/10.4236/cn.2013.53B2054 Published Online September 2013 (http://www.scirp.org/journal/cn)
On the Optimization of Real-Time Performance of
Software Defined Radio on Linux OS
Zhen Wang, Limin Xiao, Xin Su, Xin Qi, Xibin Xu
State Key Laboratory on Microwave and Digital Communications, Tsinghua National Laboratory for Information Science and
Technology, Research Institute of Information Technology, Tsinghua University, Beijing, China
Email: wangzhen11@mails.tsinghua.edu.cn
Received July, 2013
ABSTRACT
With the evolution of communication standards, Software Defined Radio (SDR) faces an increasingly important problem: balancing ever more complex wireless communication algorithms against the relatively limited processing capability of hardware. Moreover, competition for computing resources exacerbates the problem and increases the time delay of an SDR system. This paper presents an integrated method for optimizing the real-time performance of SDR on the Linux operating system (OS). The method is composed of three parts: a real-time scheduling policy, which ensures higher priority for SDR tasks; CGROUPS, used to manage and redistribute computing resources; and a fine-grained system timer, which makes process preemption more accurate. According to the experiments, applying the method reduces the round-trip data transfer latency enough to meet the requirement of TD-SCDMA.
Keywords: SDR; Real-time Priority; CGROUP; System Timer
1. Introduction
Software Defined Radio, which holds the promise of fully programmable wireless communication systems [1], has become increasingly important in the development of wireless communication systems. An SDR based on a general-purpose processor (GPP) enables us to dynamically modify software to realize different communication standards, instead of undertaking time-consuming hardware redesign, which consequently reduces development time [2]. Compared with traditional wireless communication systems based on DSPs or FPGAs, an SDR system based on a GPP is more flexible and convenient for researchers to develop [3,4], because high-level programming languages (e.g., C/C++) are generally easier to develop and debug in than Verilog or VHDL.
Although SDR offers many benefits to the development of wireless communication systems, it cannot be ignored that the increasing computational expense brought about by updates of wireless communication protocols poses an immense challenge to the limited computing resources of hardware. Inevitably, the real-time performance of a GPP-based SDR is affected by this problem. Worse still, when a system is composed of many SDR tasks, as in C-RAN [5] (where the whole function of the BTS is designed to be implemented on GPPs), the competition for hardware capacity among these tasks can exceed what multi-core solutions can handle. Without efforts to manage and redistribute the computing resources, the problems mentioned above leave the real-time performance of the SDR system unguaranteed.
As is well known, in a personal computer it is an important task of the OS to manage and allocate computing resources to different kinds of processes according to their priorities. Similarly, redistributing resources and rescheduling processes in an SDR system, which enables us to reserve enough computing resources for the time-critical tasks, is a principal method of improving real-time performance.
The principle and implementation of process scheduling policies on Linux OS have been illustrated in many papers and books such as Understanding the Linux Kernel [6]. By applying the different scheduling policies offered by the Linux kernel to various processes, tasks with real-time priority are allocated more CPU time than other tasks and take precedence in execution. Considering the strict latency demands of the time-critical processes in an SDR system, adjusting priorities by selecting appropriate process-scheduling strategies, instead of scheduling all processes equally, contributes greatly to improving real-time performance without upgrading the hardware.
Although the scheduling strategy promotes the real-time performance of the SDR system, it still leaves much to be desired. For example, setting different priorities for processes is a relatively imprecise way of allocating CPU time. Worse, without constraints on CPU consumption, a process may consume excess or even all of the CPU resources when no other process can preempt it. There are many scenarios where this excess CPU share causes unacceptable utilization or latency [7]. For an SDR system, scheduling latency is introduced when a time-critical process is woken up and must preempt the CPU from the currently running process.
Therefore, CGROUPS (Control Groups), a resource management tool provided by the Linux kernel [8], should be applied. By imposing caps on the CPU usage of other tasks, CGROUPS enables us to spare more CPU bandwidth for the SDR tasks and to reduce the frequency of process switches. Moreover, the CPU shares can be changed dynamically as needed.
As mentioned above, real-time priority guarantees that a time-critical process will occupy the computing resources when competing with other tasks; meanwhile, CGROUPS reduces the occurrence of those competitions by constraining the CPU usage of other tasks. In addition, a high-resolution timer [9] configured in the Linux kernel minimizes scheduling latency by making process switches more immediate, and consequently optimizes the overall real-time performance of the SDR system.
In this paper, real-time priority, CGROUPS and a high-resolution timer are utilized in combination. According to the experiments, this integrated method can constrain the data transfer latency to meet the requirements of 3G standards such as TD-SCDMA.
The rest of this paper is organized as follows. Section 2 describes the system model used to analyze the data transfer latency of an SDR system. Section 3 proposes the integrated method for optimizing the real-time performance of GPP-based SDR. We then analyze the latency requirement of TD-SCDMA in Section 4. In Section 5, we design and conduct experiments based on the proposed system model to validate the effectiveness of the optimization method. Finally, in Section 6, we draw conclusions based on analysis of the experimental results.
2. System Model
To validate the effectiveness of the integrated method for optimizing the real-time performance of SDR systems running on Linux OS, a system model based on a GPP is designed and implemented.

As shown in Figure 1, the system model is composed of three parts: a baseband processing board, an adapter board and an RF front end. We choose a PC running Linux OS as the baseband processing platform, which is mainly responsible for data processing. The adapter board, implemented mainly on an FPGA, plays an important role in controlling the RF front end and transferring data between the PC and the RF front end. The RF front end acts as a transceiver.
On the baseband processing board, our SDR task is composed of three main processes. First, the physical-layer process is responsible for PHY processing such as modulation and demodulation. Second, the high-layer process implements the algorithms of the Data Link Layer and the Network Layer. Finally, the data interface process interacts with the adapter board through USB 2.0. Besides, to better validate the effectiveness of resource management by the proposed method, some other tasks also run on the PC (more tasks than the PC has CPU cores), which increases the consumption of computing resources.
In this model, the latency of data transfer between the RF front end and the baseband processing board reflects the real-time performance of the SDR system. Three factors have major effects on this latency. First, because of the limited interface bandwidth of USB 2.0, transfer delay between the RF front end and the processing board is inevitable. Second, the baseband processing board spends a period of time processing data from the RF front end, which is called the processing time. Finally, scheduling latency under Linux OS is introduced when process switches occur. Taking the data interface process as an example, when data need to be read from the RF front end, the Linux kernel spends some time reallocating computing resources from the currently running process to the data interface process.
3. Integrated Optimization Method
As mentioned above, transfer delay, processing time and scheduling latency limit the real-time performance of an SDR system. Generally, the transfer delay is fixed, because the packet size and interface bandwidth are defined by the transfer protocol and the hardware. The processing time is mainly determined by the algorithmic complexity and the computing capacity. Therefore, a general optimization method should aim to redistribute the computing resources so as to reduce the cost of process scheduling and process switches.

Figure 1. System model.

Considering these features, three main methods are taken in combination: a real-time scheduling policy for the time-critical tasks, constraint of the CPU consumption of disturbance tasks by means of CGROUPS, and reconfiguration of a fine-grained timer in the Linux kernel.
3.1. Real-time Priority
On Linux OS, processes are divided into three categories: interactive processes, batch processes, and real-time processes [6]. Accordingly, there are three scheduling policies: SCHED_NORMAL, SCHED_RR, and SCHED_FIFO. The SCHED_NORMAL policy is designed to schedule conventional, time-shared processes, while SCHED_RR and SCHED_FIFO are aimed at real-time processes.

In order to reduce process response time and avoid process starvation, the Linux kernel schedules processes based on time-sharing technology [10] and allows processes to be preempted in priority order. When a process becomes ready to run, the kernel checks whether its priority is greater than that of the currently running process. If so, the execution of the current process is interrupted and the scheduler is invoked to run the higher-priority process. Additionally, real-time processes enjoy higher priority than all ordinary processes, so a real-time process will not be interrupted by an ordinary process before it finishes executing, while it can preempt an ordinary process when needed, as shown in Figure 2.
Considering this feature of process scheduling, we can specify a real-time scheduling policy for time-critical processes in order to improve real-time performance. In an SDR system, the data interface process running on the baseband processing board is designed to interact with the RF front end and has a strict latency requirement; that is, it must read or write data in the time slot specified by the air interface. Otherwise, if the data transfer is delayed, the speed and quality of data processing are diminished and, as a result, the real-time performance of the SDR system suffers.
Figure 2. Process preemption.
Therefore, the data interface process should be specified as a real-time process, which ensures that it has a higher priority than the other, ordinary processes running on the same platform. The data transfer will not be interrupted by other processes; moreover, when data packages need to be read from the RF front end, the data interface process will seize the CPU even if an ordinary process is currently running.
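As a concrete illustration, the sketch below shows how a process such as the data interface process could request the SCHED_FIFO policy through the standard POSIX interface. This is a minimal sketch, not the paper's actual code; the priority value 80 is an illustrative assumption.

```c
/* Minimal sketch: give the calling process (e.g., the data
 * interface process) a real-time SCHED_FIFO priority.
 * The priority value 80 is illustrative; real-time scheduling
 * requires root privileges or CAP_SYS_NICE. */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct sched_param param;

    param.sched_priority = 80;   /* 1 (lowest) .. 99 (highest) */
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler");
        return EXIT_FAILURE;
    }
    /* ... data interface loop: read/write USB packets ... */
    return EXIT_SUCCESS;
}
```

Equivalently, an already-built binary can be launched under this policy with the util-linux tool chrt, e.g., chrt -f 80 ./data_interface.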
3.2. Computing Resource Redistribution
CGROUPS is an abbreviation of Control Groups, which provides a mechanism for aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior [11]. A set of tasks (processes or threads) is associated with a set of parameters for one or more subsystems. A subsystem is a module that uses the task-grouping facilities provided by CGROUPS to treat groups of tasks in particular ways and to redistribute computing resources such as CPU, memory and block-device I/O as designed. There are five main subsystems in CGROUPS; their details are listed in Table 1.
The initial goal of CGROUPS is to provide a unified framework for resource management, not only integrating the existing subsystems but also providing an interface for new subsystems that may be developed in the future. Nowadays, CGROUPS is applied in a variety of scenarios, especially in network services such as those of Taobao, a leading online C2C auction company in China [12], where it is used as a tool for redistributing computing resources on large servers and for OS-level virtualization [16].
What is important for SDR systems is that we can use the cpu subsystem to assign specific CPU shares to processes running on the same OS and to constrain the CPU usage of less important tasks. In the cpu subsystem, there are two main configuration options for allocating CPU resources: cpu.cfs_period_us and cpu.cfs_quota_us [13].
Reconfiguring the CPU bandwidth by setting values (from 1000 to 1000000 microseconds) for cpu.cfs_quota_us and cpu.cfs_period_us takes effect immediately and efficiently.
Table 1. Subsystems of CGROUPS and their functions.

cpu: allocates the CPU occupancy for a set of tasks
cpuset: assigns CPUs and memory nodes to a set of tasks
memory: memory resource controller
devices: device whitelist controller
blkio: sets limits on the I/O of block devices
The cpu.cfs_quota_us option specifies the maximum CPU time that a set of tasks may use within the period set in cpu.cfs_period_us. For example, when cpu.cfs_quota_us is set to 10000 while cpu.cfs_period_us is 100000, the tasks in the group may use at most 10 ms (milliseconds) of CPU time in every 100 ms, which means their CPU usage is limited to 10 percent.
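The following minimal sketch shows one way this cap could be applied, assuming the cgroup-v1 cpu controller is mounted at /sys/fs/cgroup/cpu; the group name "disturbance" and the pid 4321 are illustrative, not from the paper.

```c
/* Minimal sketch: cap a group of disturbance tasks at 10% of one
 * CPU via the cgroup-v1 cpu controller. Assumes the controller is
 * mounted at /sys/fs/cgroup/cpu; the group name and pid are
 * illustrative. Run as root. */
#include <stdio.h>
#include <sys/stat.h>

static int write_val(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s", val);
    return fclose(f);
}

int main(void)
{
    mkdir("/sys/fs/cgroup/cpu/disturbance", 0755);
    /* 10 ms of CPU time per 100 ms period -> 10 percent cap */
    write_val("/sys/fs/cgroup/cpu/disturbance/cpu.cfs_period_us", "100000");
    write_val("/sys/fs/cgroup/cpu/disturbance/cpu.cfs_quota_us", "10000");
    /* move a disturbance process (pid 4321, illustrative) into the group */
    write_val("/sys/fs/cgroup/cpu/disturbance/tasks", "4321");
    return 0;
}
```

The experiments in Section 5 impose their constraints on the disturbance programs in this manner, by rewriting cpu.cfs_quota_us.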
By constraining CPU usage with cpu.cfs_quota_us and cpu.cfs_period_us, we can prevent excessive consumption by some processes and spare more resources for the time-critical processes in our SDR system. Using CGROUPS, we can isolate the execution of the tasks in an SDR system better than multi-core solutions alone can, because the competition for computing resources cannot be relieved when the number of tasks is greater than the number of cores.
3.3. Fine-grained Timer in the Linux Kernel
The passing of time is important to the Linux kernel, because many kernel functions are time-driven rather than event-driven. These periodic tasks occur on a fixed schedule, driven by the system timer, a programmable piece of hardware that issues an interrupt at a fixed frequency [10]. On each interrupt, the system time is updated and the periodic tasks are performed by the interrupt service routine for this timer.
Indeed, the system timer plays a role similar to the alarm clock of a microwave oven [10]: the alarm makes users aware that the cooking interval has elapsed, while the system timer reminds the computer that one more time interval has elapsed, at a fixed frequency established by the kernel. The frequency of the system timer is programmed at system boot based on a static preprocessor define, HZ.
The main benefit of a higher HZ is greater accuracy in process preemption; consequently, the scheduling latency decreases, which improves the real-time performance of the SDR system. As mentioned above, the system timer interrupt is responsible for decrementing the running process's timeslice count. When the count reaches zero, a flag called NEED_RESCHED is set and the kernel runs the scheduler as soon as possible. Now assume that a given process is running and has 2 ms of its timeslice remaining. In 2 ms, the scheduler should preempt the running process and begin executing a new, more important process. Unfortunately, this event does not occur until the next timer interrupt, which might not come within 2 ms: at worst, the next timer interrupt might be 1/HZ of a second away. With HZ = 100, a process can get nearly 10 extra ms to run; by contrast, with HZ = 1000, the extra time is limited to under 1 ms.
Because the latency created by preemption delay decreases, the real-time performance of the SDR system is improved by the greater accuracy in process preemption. Taking the data interface process as an example, even if it has a higher priority, the delay of data transfer is still out of control when scheduling latency is introduced. In conclusion, a fine-grained system timer lets time-critical processes wait less time to seize the CPU and respond more promptly.
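The effect of timer granularity can be observed empirically by measuring how much a short sleep overshoots its deadline. Below is a minimal sketch, assuming a kernel whose sleep wake-ups are bound to the periodic tick; with CONFIG_HIGH_RES_TIMERS enabled, wake-ups are no longer tick-bound and the overshoot stays small at any HZ.

```c
/* Minimal sketch: estimate scheduling granularity by requesting a
 * 2 ms sleep and measuring the overshoot. On a kernel whose sleeps
 * are bound to the periodic tick, the worst-case overshoot
 * approaches 1/HZ. Link with -lrt on older glibc. */
#include <stdio.h>
#include <time.h>

static long long ns_of(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
    struct timespec req = { 0, 2 * 1000 * 1000 };  /* 2 ms */
    struct timespec before, after;
    long long worst = 0;

    for (int i = 0; i < 1000; i++) {
        clock_gettime(CLOCK_MONOTONIC, &before);
        clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);  /* relative sleep */
        clock_gettime(CLOCK_MONOTONIC, &after);
        long long over = ns_of(&after) - ns_of(&before) - ns_of(&req);
        if (over > worst)
            worst = over;
    }
    printf("worst-case overshoot: %.3f ms\n", worst / 1e6);
    return 0;
}
```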
4. Latency Requirement for TD-SCDMA

As a 3G standard, TD-SCDMA has a strict demand for data transfer latency. In order to calculate the maximum transfer delay that can be tolerated, some details of the physical channel signal format need to be elaborated in advance.
The radio frame of 1.28 Mcps TD-SCDMA has a length of 10 ms and is composed of two 5 ms subframes. In each subframe there are seven traffic time slots and three special time slots, as shown in Figure 3. The 5 ms subframe contains 6400 chips, and each traffic time slot is 864 chips long. A physical channel is transmitted in a burst, which occupies a particular time slot within the allocated radio frames.
Using this subframe structure, TD-SCDMA can operate in both symmetric and asymmetric modes by properly configuring the number of downlink and uplink time slots [15]. Figure 4 takes H-ARQ [15] as an example of the asymmetric mode. The ACK/NACK related to an HS-DSCH must be transmitted on the associated HS-SICH in the next-but-one subframe. The time between the last HS-DSCH and the HS-SICH is spent on processing and uplink transmission. If we take the start of the first downlink time slot as the zero time reference, responses should be ready within 3.45 ms, which is the total duration of the downlink (including the DwPTS). Thus 3.45 ms is the threshold of arrival time.
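The 3.45 ms figure can be reproduced from the chip-level numerology. The derivation below assumes, as one consistent reading of Figure 4, that the downlink comprises five traffic time slots plus the DwPTS, with a traffic slot of 864 chips and a DwPTS of 96 chips per TS 25.221 [15]:

```latex
\begin{align*}
T_c &= \frac{1}{1.28\ \mathrm{Mcps}} = 0.78125\ \mu\mathrm{s} \\
T_{\mathrm{slot}} &= 864\,T_c = 675\ \mu\mathrm{s}, \qquad
T_{\mathrm{DwPTS}} = 96\,T_c = 75\ \mu\mathrm{s} \\
5\,T_{\mathrm{slot}} + T_{\mathrm{DwPTS}} &= 5 \times 675 + 75
  = 3450\ \mu\mathrm{s} = 3.45\ \mathrm{ms}
\end{align*}
```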
In other words, if the uplink data is not ready before the uplink time slot transmission begins, the response fails. In the next section, experiments are proposed to measure the round-trip latency and to calculate the ratio of latencies exceeding 3.45 ms, i.e., the response failure ratio.
Figure 3. Subframe structure.
Figure 4. H-ARQ: the last HS-DSCH in subframe #n and the associated HS-SICH in subframe #n+1; the threshold of arrival time (3.45 ms), measured from the zero time reference, covers the baseband processing time plus the uplink time.
5. Experiment Result and Analysis
To validate the effectiveness of the proposed methods,
we conduct some experiments based on the system
model mentioned in Section 2.
In order to measure the round-trip time of the SDR platform, the whole procedure of data transmission between the PC and the RF front end is illustrated in Figure 5. It begins when the downlink data are received from the air interface and ends when the uplink response is ready to be sent by the RF front end. First, the downlink data arrive at the PC a little later than at the air interface because of the delay of USB 2.0 transmission and process scheduling, as shown in the third and fourth lines of Figure 5. The PC then processes the downlink data and sends the associated response, i.e., ACK/NACK, as shown in the fifth and sixth lines of Figure 5.
In order to quantitatively analyze the influence of the competition for computing resources on the performance of the SDR system, we run our data interface program together with four infinite loops as disturbance programs on the PC (which has two CPU cores). Consequently, the ratio of response failure is 0.7841, in contrast to 0.00024 without any interference on the same platform. This proves that resource competition deteriorates system performance, especially latency, because fewer computing resources are allocated to the data interface process when the number of tasks is greater than the number of CPU cores.

When the data interface process has read the last data packet and is waiting for the next, the kernel allocates the CPU to the disturbance tasks to avoid process starvation. The issue is that when the data interface process is woken up to read data from the adapter board, the kernel spends a period of time reallocating the CPU to it, as analyzed in Section 3.3. Therefore, even after we set real-time priority for the data interface process, the ratio of round-trip latencies over 3.45 ms is only reduced to 0.582, which is still far more than the 0.00024 of the ideal environment.
Next, a series of experiments is conducted to validate the effectiveness of the proposed methods. We set real-time priority for the data interface process and then record the response failure ratios under different CPU usage constraints on the disturbance programs (no constraint, 80%, 60%, 40%, 20% and 10%), imposed by reconfiguring cpu.cfs_quota_us in the cpu subsystem. The system timer frequency is set to 128 Hz, 512 Hz and 1024 Hz in turn.
As analyzed in Section 4, the round-trip latency must be constrained to at most 3.45 ms; otherwise, some uplink responses would not be ready in the RF front end before their uplink time slots, i.e., the responses fail. In the experiments, the time between the first downlink time slot and the arrival of the uplink response is recorded as the round-trip latency, from which the ratio of response failure is calculated. The results are shown in Figure 6: the constraints on the CPU usage of the disturbance programs are plotted on the x-axis, the ratios of response failure on the y-axis, and the system timer frequencies of 128 Hz, 512 Hz and 1024 Hz are represented by the blue, green and red curves, respectively.

Figure 5. Procedure of data transmission in experiments.
Figure 6. Experiment results: ratio of response failure (log scale) versus the constraint on CPU usage of the disturbance programs (percent), for system timer frequencies of 128 Hz, 512 Hz and 1024 Hz.
The system timer of 1024 Hz performs better than those of 512 Hz and 128 Hz (the default value on the PC), because increasing the timer frequency to 1024 Hz lowers the worst-case scheduling delay to nearly 1 ms. Besides, as the constraints on the CPU usage of the disturbance programs become more stringent, the ratio of response failure decreases markedly, which proves that CGROUPS lets us spare more computing resources for our SDR tasks. Better yet, when the CPU consumption of the disturbance programs is limited to 10 percent and the frequency of the system timer is raised from 128 Hz to 1024 Hz, the round-trip latency achieves a satisfactory ratio of response failure, 0.00056, which is of the same order of magnitude as that in an environment free of competition and low enough to meet the requirement of TD-SCDMA.
6. Conclusions
In this paper, we use real-time priority, CGROUPS and a high-resolution system timer in combination to optimize the real-time performance of an SDR system. Taking TD-SCDMA as an example, we justify by experiments that the integrated use of these methods can provide a latency guarantee for data transfer that meets the requirement of this 3G system, even under competition for computing resources from other tasks.
7. Acknowledgements
This work was supported by the State Key Laboratory team building project "the key technology research of integrated wireless communication", the S&T Major Project (2012ZX03003007-004) and the National Science Foundation of Beijing (4110001).
REFERENCES
[1] K. Tan, J. Zhang, J. Fang, H. Liu, Y. Ye, S. Wang, Y. Zhang, H. Wu, W. Wang and G. M. Voelker, "Sora: High Performance Software Radio Using General Purpose Multi-core Processors," In NSDI, 2009.
[2] S. Mamidi, E. R. Blem, M. J. Schulte, et al., "Instruction Set Extensions for Software Defined Radio on a Multithreaded Processor," In Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, September 2005, pp. 266-273.
[3] J. Zhang, K. Tan, S. Xiang, Q. Yin, Q. Luo, Y. He, J. Fang and Y. Zhang, "Experimenting Software Radio with the SORA Platform," In ACM SIGCOMM Computer Communication Review, Vol. 40, No. 4, 2010, pp. 469-470. doi:10.1145/1851275.1851268
[4] P. Guo, X. Qi, L. Xiao and S. Zhou, "A Novel GPP-based Software-Defined Radio Architecture," In Communications and Networking in China (CHINACOM), 7th International ICST Conference on, 2012, pp. 838-842.
[5] C-RAN. http://labs.chinamobile.com/cran/
[6] D. P. Bovet and M. Cesati, "Understanding the Linux Kernel," 3rd Edition, O'Reilly Media.
[7] P. Turner, B. B. Rao and N. Rao, "CPU Bandwidth Control for CFS," In Proceedings of the Ottawa Linux Symposium (OLS), Vol. 10, 2010, pp. 245-254.
[8] H. Ishii, "Fujitsu's Activities for Improving Linux as Primary OS for PRIMEQUEST," Fujitsu Science Technology Journal, Vol. 47, No. 2, 2011, pp. 239-246.
[9] S. T. Dietrich and D. Walker, "The Evolution of Real-Time Linux," In Proc. 7th Real-Time Linux Workshop, 2005, pp. 3-4.
[10] R. Love, "Linux Kernel Development," 3rd Edition, Addison-Wesley.
[11] CGROUPS. http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
[12] The Practice of Resource Control Using Cgroup in Taobao Main Servers. http://wenku.baidu.com/view/19668a5677232f60ddcca113.html
[13] CPU Subsystem. https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enteprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
[14] C. S. Wong, I. K. T. Tan, R. D. Kumari, J. W. Lam and W. Fun, "Fairness and Interactive Performance of O(1) and CFS Linux Kernel Schedulers," In Information Technology, ITSim 2008, International Symposium on, Vol. 4, August 2008, pp. 1-8.
[15] 3GPP TS 25.221 V9.4.0, "Physical Channels and Mapping of Transport Channels onto Physical Channels," November 2011.
[16] M. Rosenblum, "The Reincarnation of Virtual Machines," Queue, Vol. 2, No. 5, p. 34. doi:10.1145/1016998.1017000