Modeling of Data Reduction in Wireless Sensor Networks

doi:10.4236/wsn.2011.38029

Paper Menu >>

Journal Menu >>

Wireless Sensor Network, 2011, 3, 283-294

doi:10.4236/wsn.2011.38029 Published Online August 2011 (http://www.SciRP.org/journal/wsn)

Modeling of Data Reduction in Wireless Sensor Networks

Glenn Patterson, Mustafa Mehmet-Ali

Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada

E-mail: g_patt@ ece.concordia.ca, mustafa@ece.concordia.ca

Received June 27, 2011; revised July 21, 2011; accepted July 31, 2011

Abstract

In this paper, we present a stochastic model for data in a Wireless Sensor Network (WSN) using random

field theory. The model captures the space-time behavior of the underlying phenomenon being observed by

the network. We present results regarding the size and spatial distribution of the regions of the network that

sense statistically extreme values of the underlying phenomenon using the theory of extreme excursion re-

gions. These results compliment many existing works in the literature that describe algorithms to reduce the

data load, but lack an analytical approach to evaluate the size and spatial distribution of this load. We show

that if only the statistically extreme data is transmitted in the network, then the data load can be significantly

reduced. Finally, a simple performance model of a WSN is developed based on a collection of asynchronous

M/M/1 servers that work in parallel. We derive several performance measures from this performance model.

The presented results will be useful in the design of large scale sensor networks.

Keywords: Random Field Theory, Space-Time Behavior, Extreme Values, Data Reduction, Data Load, Mean

Packet Delay

1. Introduction

In many applications, it is anticipated that wireless sen-

sor networks (WSNs) will be composed of a large num-

ber of stationary sensor nodes randomly deployed on a

large terrain to monitor an environmental phenomenon.

The sensors are expected to be very basic devices having

limited computational power, transmission range, data

storage, and energy resources. Thus, it is expected that

the nodes will need to work cooperatively in order to

deliver the collected data to an information sink where it

can be accessed by the end user.

In the literature, there are many examples of algo-

rithms designed to collect the node data in such a way

that the energy consumption in the network is reduced.

Fundamentally, all of these algorithms seek to find a

suitable way to limit the size of the data load (i.e. the total

number of nodes which attempt to forward their data to

the sink). We now discuss several of these algorithms.

In [1], the authors use the idea of representing a spatial

phenomenon with contour lines to reduce number of

nodes which transmit sensor data. They describe a fully

distributed method for forming the contour lines where

the contour lines are constructed from the node data at

the sink. In [2], the Tiny Aggregation (TAG) service is

presented. This scheme uses SQL-based aggregation

queries to reduce the size of the data set. The aggregation

process is performed in-network with aggregates com-

puted on the data as it flows between the sensor nodes

towards the sink. Irrelevant data is discarded and relevant

data is combined into more compact records whenever

possible. In [3], the Clustered Aggregation Technique

(CAG) is presented for collecting data in a WSN. This

algorithm reduces the size of the data set by using que-

ries which exploit the inherent spatial correlation of the

data set. This protocol forms clusters of nodes with val-

ues within some threshold of one another and selects a

cluster head. In each cluster, only the cluster head reports

its value to the sink and therefore the algorithm is lossy.

The authors use simulation to show that a modest trade-

off in accuracy of the data using the CAG algorithm pro-

vides huge savings in terms of energy over the TAG al-

gorithm discussed earlier. In [4], the Power-Efficient

Gathering in Sensor Information Systems (PEGASIS)

data aggregation algorithm is developed. The authors not

only address the issue of energy savings, but also

achieving low latency in data delivery to the sink. One of

their goals is to find a convergecast structure which can

achieve low energy consumption and low latency. Unlike

TAG and CAG which use a tree structure to collect data

from nodes at the sink, PEGASIS uses a chain-based

structure. One node in each chain is selected as a leader.

284 G. PATTERSON ET AL.

Data collection occurs along the chain with the aggregate

value returned to the leader node. The leader node then

transmits the aggregated data to the sink.

To our knowledge, the existing literature is devoid of

research that attempts to determine an analytical model

for the size and spatial distribution of the data load in a

sensor network. Thus, while the existing literature ad-

dresses questions relating to how to reduce the size of the

data collected in a WSN, there are no models that esti-

mate the potential size of the reduced data load. The ex-

isting literature relies solely on simulation results to es-

timate this size. Since simulation faces severe complex-

ity problems in large scale WSNs, there is clearly a need

for analytical models which capture the size of the data

set. Since some data in a WSN may be more critical than

other data, we will be primarily interested in the size of

the data set when only the most critical data is transmit-

ted to the sink. In this respect, the present work will

compliment much of the existing literature devoted to

developing various algorithms for data gathering in sen-

sor networks.

In this work, we model the node data in a sensor net-

work as samples taken from an underlying Gaussian

random field. This random field is used to describe the

space-time behavior of the phenomenon being observed.

Random field models are popular in modeling large-scale

environmental phenomena that exhibit correlated random

variation in time and space [5,6]. We then use the con-

cept of high level excursion sets from random field the-

ory to study the size and spatial distribution of the set of

nodes that experience statistically “extreme” data in the

network. Statistically extreme data contains information

on where the state of the phenomenon is most critical

and in many applications we can imagine that it is this

data which is most important to the end user. It is shown

that the nominal data load experienced by a sensor net-

work can be significantly reduced if only statistically

extreme data values are transmitted to the sink. Then, the

notion of a high level excursion set is extended to define

contour lines of the random field on the network de-

ployment. It is shown that if we only transmit data from

nodes that are close to these contour lines, then, the data

load in the network can be further reduced.

The remainder of this paper is organized as follows. In

Section 2, we introduce random fields and present our

model for the node data in a WSN. In Section 3, we pre-

sent results related to the average data load in a sensor

network and show how the theory of high level excursion

regions of Gaussian random fields may be used to model

the extreme data load in the network. In Section 4, we

develop a performance model for a large scale WSN. In

Section 5, we present numerical and simulation results

regarding the analysis of the total data load and the net-

work performance. Finally, Section 6 contains the main

conclusions of the paper.

2. Random Fields and Modeling

2.1. Random Fields and the Underlying

Phenomenon

In this work, we use random field theory to describe the

behavior of the underlying phenomenon being monitored

by the network. A random field is simply a multidimen-

sional generalization of a one-dimensional random proc-

ess. In the physical sciences, random fields have been

used to model phenomena in such diverse areas as for-

estry, geomorphology, geology, turbulence, and seis-

mology [6]. Consider the following phenomena that are

amenable to modelling with random fields: Depth of

snowfall across a surface during a snowstorm; pollutant

concentration in a lake; shear stress along a fault line in

the earth; height of the ocean surface; amount of recov-

erable solar energy; areal density of the population of a

species; agricultural crop yield; distribution of rainfall on

a crop; inflow of water into a reservoir; density, porosity,

and permeability of soil; intensity of an earthquake [5].

Many physically occurring phenomenon exhibit Gaus-

sian or nearly Gaussian behavior [5,6] and, we will

therefore assume that space-time behavior of the under-

lying phenomenon is described by a Gaussian random

field. We assume that the network has been deployed to

monitor an environment on a subset of two-dimensional

space, , with area denoted by

S



S. We denote





the value of the Gaussian random field at a

location



sand time





0,t



. For a Gaussian ran-

dom field on





0,S



, the joint finite-dimensional

distribution is Gaussian at all points





0,tS (,)

1,i,n



, where n is some arbitrary integer. The co-

variance function between the data at two points













,, ,

j j

tt 0,S



ss is defined as:















,, ,

,, ,,

iij j

iij jiij j

Ct t

EXtXtmt mt







ss ss

(1)

where





,mts denotes the mean value of the random

field at location S



sat time .





0,t

Many phenomena are modeled as being separable in time

and space [5,7], which allows the covariance function to be

expressed as the product of two independent functions,









,, ,,,

iij jSijTi j

Ctt CCttss ss



(2)

where





Sij

Css and are spatial and tem-

poral covariance functions respectively. We note that a

function is only admissible as a covariance function if it

is positive definite. We will consider both separable and

non-separable models in this work.



Tij

Ctt



G. PATTERSON ET AL.

285

We will assume that the covariance function is iso-

tropic in space. This means that the spatial covariance

function may be represented as a function of the Euclid-

ean distance between pointsi

swhich we denote

by ,ij

, S



. then have that: We







SijSij



ss

(3)

Finally, it will also be assumed that the random field is

mean-square differentiable in space. A necessary condi-

tion for a Gaussian random field to be mean-square dif-

ferentiable is that



Sijij



exists and is finite

at ,0



 [5]. In practice, very few space-time phe-

nomena exhibit this property. However, in [5] it is shown

that if even a small amount of local averaging is per-

formed on the signal from the underlying random field,

then the resulting locally averaged field will be mean-

square differentiable. We note that local averaging may

be already “built-in” to the nodes in a WSN. It is ex-

pected that the nodes will have some sensing radius, r,

and we posit that the value that the node stores in mem-

ory will often be obtained by averaging over the region

of radius r centered about the node’s position [8].

Among the common spatial covariance functions [9]

that are mean-square differentiable, is the rational quad-

ratic function given below,



1ij

SR ij



























(4)

where 12

0, 0



. The parameter 1



controls how

fast the spatial covariance dies with distance and the pa-

rameter 2



controls the smoothness of the sample paths

of the random field. Higher values of 1



correspond to

stronger spatial correlation of the random field. This spa-

tial covariance function has the desired property of

mean-square differentiability discussed above.

In [9], a common stationary temporal covariance func-

tion is given. Let ij

ttt  be the time difference

between two samples. Then this covariance function is

given by,



Tij T

CttCt e





 (5)

In [10], the author argues that one setback of separable

covariance models is that they suffer from a lack of

smoothness away from the origin, (0, 0t











,21

ijtij

ij t

Ct t



 



















  (6)

where











is the modified Bessel function of the

second kind of order t







 and ,,







are positive

constants [10].

2.2. Model for Sensor Network Data

We first assume that N static sensor nodes have been

deployed uniformly on . The density of the node de-

ployment is therefore,



NAS



. We represent the

position of each node with a two dimensional vector



s, 1, ,iN





. We then assume that the sensor

network takes samples of a continuous Gaussian random

field at the locations of the sensor nodes at discrete times,

ttt. We let





denote the vector of sam-

pled node values at time . That is,

,1,,K













,, ,,,,

kkkNN

tXtXtX t













Xsss

1, ,kK



. We form the joint vector of node data at all

sampling instants as:



 

,,,

tt t









XX XX

also has a joint Gaussian distribution which will

now be determined. We construct the vector of the

means at each node at the sample instants as:

, ,,

tt t













Mmm m (7)

where









[,, ,,, ,]r

tkkN

mtmtm tms ss

k is the

vector of node means at sampling instant

,,K



.

The covariance between the data at two nodes at two

sampling instants 12

is determined from the par-

ticular covariance model of the underlying random field.

We form the NN



matrix of covariances between

the data at all nodes at times as:

 





121 2

,,,1

,, ,

kki kjk

tt ij

WW CXtXt













XX ss (8)

We then form the super covariance matrix

W as:

NN2

1,1 1,21,

2,1 2,22,

3,13,2 3,3

,1 ,1,

Tr Tr

KK KK

WW W

WWW

WW W











































)



. The

author proposes several non-separable covariance models

among which is an isotropic, stationary, space-time co-

variance model which makes the underlying phenome-

non Markovian in time. Let two points in the field be

separated by distance ,ij



and be sampled at times

seconds apart. For a random field on

t





0, 



this model is then given by:

We can then write the joint distribution of

as:







122

2π

Tr W



 



XX X

xM xM

x (9)

286 G. PATTERSON ET AL.

where . Unless it is other-

wise stated, we will assume throughout this work that the

mean value of the field is constant at a given time across

so that ,

 



, ,,

tt txx xx



mtmts1, ,SiN



, .





0,t

Clearly, in the simulation of sensor network data gen-

eration according to the full joint distribution has dimen-

sionality drawbacks. In cases where the covariance

structure gives rise to node data that is Markovian in

time as in (6), the dimensionality will be reduced. At a

given time instant we have that:

 





 





12 1

,,,

kk k

tt t











XX X

xx xx

Therefore, the data at an arbitrary sampling instant can

be generated based solely on the node data at the previ-

ous sampling instant and does not require knowledge of

the node data at all previous sampling instants. Now

consider the vectors of node data at two consecutive

sampling times, and . In [5], it is shown

that the conditional distribution





t



 

1kk



XX is also

Gaussian with p.d.f given by:

 







1,1 1

112

2π

kkk kkkk

tt N

kkk







 





xm xm

XX xX



(10)

This distribution is determined by the conditional

mean vector and conditional covariance matrix given by:





,1 1,111

1kkkkkkk

kk WW



 

 mm Xm (11)

,| 1,,11,1, 1

kkkkk kkkkkk

WWWWW





 



(12)

where k

, k

and is given by

(8). It is said that when the vector is actually

observed as 

mm



tXX



11kk



,kk



t



, then the observed vector

can be substituted into (11), and (12) in place of

k therefore permitting an updating of the prior

p.d.f. by the posterior p.d.f.

1k











f



 







111

kk tk k



XX xX .

This completes our representation of the data in a WSN.

3. Applications of Excursion Regions of

Random Fields

In the following analysis, we seek to determine the spa-

tial distribution of the data load and the global average

data load in a sensor network when nodes forward their

packets to the sink only if their sensed value is greater

than some threshold, b. The analysis in this section,

without loss of any generality, assumes that the underly-

ing isotropic Gaussian random field has been standard-

ized to have zero mean and unit variance.

3.1. Spatial Distribution of Data Load

As discussed earlier, it will be necessary to limit the

number of nodes which attempt to transmit their data

packets to the sink in a large-scale WSN. As well, we

noted that the sink will often be most interested in where

the state of the phenomenon is most severe or extreme.

The random variation of a phenomenon in space and

time can lead to occurrences of values with significant

deviation from the expected value of the phenomenon.

These values are called “extreme values”. A central topic

in random field theory is the analysis of the size (i.e. area)

of the regions in the plane where a random field exceeds

an extreme level. We will use the theory of the sizes of

these extreme regions in order to develop a spatial un-

derstanding of the data load in a sensor network. Since it

often the extreme data values which are most important

to the end user, we will assume that a sensor node only

attempts to transmit its data packet to the sink if the

value it observes from the underlying field is above some

high level, b.

Let us denote by ,ib

the area of the i’th isolated re-

gion on the network deployment area, where the value of

the field exceeds the level . We will call these

regions “excursion regions”. In [5], the average size (ie.

area) of an isolated excursion region for an isotropic

mean-square differentiable Gaussian random field is

given by:

b



12π,

EA b























(13)

where





b and







are the cumulative distribution

function (CDF) and probability density function (pdf) of

a standard normal random variable respectively. The

parameters Xk



, 1, 2k



, correspond to the directional

derivatives of the field in the direction of each of the

axes in 2-dimensional space and depend on the degree of

spatial correlation. For a spatially isotropic random field,



,



 are equal to one another and are given by,





d, 1, 2

Sij







 

 (14)

For the rational quadratic covariance function, these

derivates can be computed for (4) as 2







,

1, 2k



We note that as the degree of spatial correlation in-

creases, the average area of the isolated excursion re-

gions given by (13) also increases.

Our assumption that the nodes have been uniformly

deployed implies that the number of nodes on an isolated

excursion region is governed by the spatial Poisson pro-

cess with parameter given by the density of the node

G. PATTERSON ET AL.

287

deployment,



NAS



. We call a node whose posi-

tion lies within an isolated excursion region for a given

level b at a sampling time, an “excursion node”. We note

that by virtue of their location, excursion nodes necessar-

ily have a data value exceeding the level b. Let ,ib

denote the number of nodes on the excursion re-

gion for extreme value level b. We have that:

ith



Pr ,

kEA

EA e

Kk b











 

(15)

where we have approximated the area of an excursion

region with its expected value (13). The average number

of nodes on the excursion region is given by:

'ith

ib ib

KEAb







 

(16)

In [5], it is further shown that as the level b increases,

the number of isolated excursion regions approaches a

spatial Poisson process with the parameter given by:















(17)

From (17), the rate of isolated excursion regions in the

plane, b



, is inversely proportional to the average area

of an isolated excursion region. Therefore, as the degree

of spatial correlation increases, the frequency of the ex-

cursion regions in the plane decreases proportionately.

Expressions (13) and (17) imply that when the spatial

correlation of the underlying phenomenon is high, the

nodes that observe extreme values will tend to be located

on a relatively small number of large excursion regions.

On the other hand, for low value of spatial correlation,

the nodes that observe extreme values will tend to be

located on a large number of small excursion regions.

The preceding comments offer great insight into the spa-

tial distribution of excursion nodes (equivalently, the

data load) on the network deployment area.

A natural question arises, how high must the level b be

taken so that (13) and (17) hold. In [5], it is recom-

mended that b be at least twice the standard deviation of

the random field. An excellent study on the accuracy of

these expressions for non-asymptotic levels is found in

[11] where the authors look at a random field model de-

scribing the salt-induced delamination of a concrete slab.

3.2. Average Global Data Load

In this section, we derive an expression for the total av-

erage number of nodes which observe data above an ar-

bitrary level, b, and therefore attempt to forward their

data to the sink.

We define the global excursion area as the total area of

the network deployment where



tbs and denote

this area by b



. In [5] it is shown that for a region with

area 0, the average area within 0 that exceeds b is

given by

a a











01ab when the random field is spa-

tially homogeneous. Since we have assumed that the

underlying phenomenon is spatially isotropic (which

implies homogeneity, see [5,7]), we have that:



 

1AS







b







(18)

where





b is the cumulative distribution function

(CDF) of a standard normal random variable. We note

that this expression only depends on the level of the

threshold b and is independent of the spatial correlation

of the random field.

Expression (18) will allow us to determine the total

average data load experienced by a WSN in response to

the sampling of the phenomenon for an arbitrary value b.

Suppose that a sensor node only attempts to send its own

data packet to the sink if the data value in this packet

exceeds the level b. Denote the total number of excursion

nodes in the network by b

. Since we have assumed

that the sensor nodes have been uniformly deployed on S

with density



, the average number of excursion nodes

in the network (equivalently, the average number of data

packets or data load) at a sampling time is given by:



1Nb













(19)

This last expression shows that the total average num-

ber of excursion nodes is the same as if the data at the

nodes were i.i.d. This obviously follows from the invari-

ance of b



to the degree of spatial correlation. Thus,

the spatial distribution of the data values according to the

covariance model of the random field has no bearing on

the total average data load. In this respect, this expres-

sion describes the data load only on a global scale over

the whole network and does not capture information on

the location of the excursion nodes themselves. If the

level b is extreme, then the results from Section 3.1 can

be used to gain insight into the spatial description of this

total data load at various regions throughout S will often

be needed in large-scale WSNs.

An ability to describe the spatial distribution of the

data load (as in Section 3.1) will provide insight into

local contention for the channel, the spatial distribution

of newly generated data packets, and the spatial distribu-

tion of energy consumption. The utility of our expression

for the average global data load in (19) is useful from the

perspective that all node data must pass through the in-

formation bottleneck that occurs around the sink in a

WSN [8]. Many characteristic performance measures

will be strongly influenced by the behavior of the net-

work in the region closest to the sink and so, (19) de-

scribes the average data load that the nodes in this region

288 G. PATTERSON ET AL.

must bear. Thus, (19) will occupy a fundamentally im-

portant role in assessing the performance of the network.

3.3. Contour Nodes

In many WSN applications, it may be sufficient to rep-

resent the phenomenon with the data that belongs to a set

of discrete contour levels 12

,,,

bb b in the plane. We

note here that it may be assumed that the nodes imple-

ment some form of quantization of the values they sense

from the underlying random field. Since the sensor nodes

have limited storage capacity, we assume that a quanti-

zation of the continuous values of the node data is per-

formed when digitizing the analogue signal. For j



1, ,



, let us define a quantization region of half- width

around each level

b, where 0







. We will

then assume that if the value of the underlying random

field at a node location is in the range jj

,

then the value stored at the node is



δδ,bb

b. Thus, we can

think of all nodes whose values get quantized to a level

b, as belonging to a contour line associated with this

level. We will call such nodes “contour nodes”.

In [7] it is conjectured that for Gaussian random fields,

if the mean and covariance functions are both continuous,

then the sample paths of the random field are almost cer-

tainly continuous. Consider a single contour quantization

level b. Suppose we consider the areas ,δib



and

,δib associated with the excursion regions for

the levels and respectively. The continuity

of the sample paths assures us that each isolated excur-

sion region for level is contained within a corre-

sponding excursion region for level . Due to the

quantization of the sensed data, the number of nodes

“on” the contour line therefore corresponds to the

number of nodes between the boundaries of the two ex-

cursion regions. Let ,ib denote the number of contour

nodes that belong to the i’th contour line. The above

comments imply that:

A'ith

δb

ith

δb

b

b



,,δ,δibib ib

EA EA















δ

(20)

Let bdenote the average number of nodes that form

the contour line for level b over the entire network. From

the discussion above, we have:

bbb

GK K

(21)

where δb

, δb

are given by (19).

3.4. Extension to Non-Stationary Phenomenon

In this section, we give an extension of the preceding

results to scenarios where the Gaussian phenomenon

exhibits a form of spatial non-stationarity. Many natu-

rally occurring phenomena can be characterized as oc-

curring at an epicenter and affecting points in the sur-

rounding environment in inverse proportion to their dis-

tance to the epicenter. We will call such phenomenon

“point-source” type. We model phenomena of the point-

source type by introducing a location dependent mean at

each point in the environment surrounding the epicenter.

Recall our earlier observation that when the mean of a

spatially correlated phenomenon is constant across ,

the average number of excursion nodes in a WSN can be

computed as if the data were i.i.d. An analogous result

holds when the mean is allowed to vary across for

fixed time. In this case, the average number of excursion

nodes can be computed as if the data were independent

but not identically distributed. Denote the data value at

the node

1, ,iN



 by





ts. We assume that each





ts, 1, ,iN



 has a mean value given by





mts, 1,,iN



 and variance . Using the

previous comments, we have the following for the num-

ber of excursion nodes at any time:









Pr ,

tXstb

bmt



















(22)

where , 1,,



, are independent standardized

normal random variables. The accuracy of (22) will be

demonstrated later through simulation results.

4. WSN Performance Model

In this section, we construct a performance model of a

WSN using a contention based MAC protocol in order to

study the performance of the network when only excur-

sion nodes transmit their data to the sink. We seek to

construct a first-cut model that’s usefulness lies in its

simplicity and lack of dependency on the specifics of the

MAC and routing protocols. Next, we state our basic

assumptions:

 The time is diveded into slots of duration



seconds.

 The nodes collect data at discrete time instants every



seconds which will be assumed to be an integer

multiple of slot duration,



. After each sampling

time, only excursion nodes attempt to transmit their

data to the sink.

 Each excursion node will encapsulate its sensor read-

ing into a single packet of fixed length. Non-excursion

nodes only participate in relaying the data packets

from the excursion nodes to the sink. Downstream

excursion nodes may also assist in routing packets

from upstream excursion nodes.

 Sensor nodes access the channel through a modified

CSMA type of MAC protocol. Since we have a dis-

G. PATTERSON ET AL.

289

tributed system, there may be many transmissions

going on in the system simultaneously which are not

synchronized with each other. Thus, the traffic load

will be transported to the sink by a number of asyn-

chronous servers working in parallel.

4.1. Routing

In our performance model, we require a rough descrip-

tion of how data flows in a WSN. We assume without

loss of generality that the WSN has been deployed on a

semi-circle with radius with the sink located at the

origin. It will be assumed that the transmission range of a

node is Tx, which will also be referred as the one-hop

distance. We will assume that a packet advances on a

straight line path towards the sink by the amount of node

transmission range, , following each successful

transmission.

We now determine the minimum distance between two

nodes that experience simultaneous successful packet

transmissions. Each node with a packet to transmit con-

tends for the channel with its neighbors within its trans-

mission range. The successful transmission of a data

packet in a contention neighbourhood implies that there

were no hidden nodes that transmitted during this time

slot. We assume that a node’s listening range is the same

as its transmission radius. Then, each contention neigh-

bourhood can be visualized as two concentric circular

regions of radii Tx and respectively, which is

depicted in Figure 1. In Figure 1, node A is the node

currently attempting transmission to the next hop node B.

The nodes outside of the inner circle of radius Tx cen-

tred at node A and within a distance Tx of node B are

hidden from node A. We note that not all nodes in the

outer circle will corrupt the packet node A sends to node

B (ie. nodes in the outer circle that are further than Tx

from node B). However, we assume that all nodes in the

outer circle are hidden. If node A’s transmission is to be

successful, then no hidden nodes transmit during node

A’s transmission to node B and there are no other colli-

sions with A’s data packet. From this explanation, we

conclude that two nodes having successful packet trans-

missions should be separated by at least 3Tx from each

other. We may therefore assume that each contention

neighbourhood will have a radius of

r2Tx



32 Tx

r. As a re-

sult, the maximum number of simultaneous transmis-

sions in the network will be given by,

2Tx

Hcr



where 32c. Due to channel contention, the number

of simultaneous transmissions in the network will vary in

time.

Figure 1. Illustration of a contention neighbourhood. Nodes

outside the inner-circle do not transmit during a time slot. The

x’s represent sensor nodes in the contention neighbourhood.

4.2. Contention Model and Packet Service Time

Next, we describe the access of the sensor nodes to the

channel. Recall that nodes with packets to forward access

the channel using a slotted CSMA type of protocol. We

note that each successful transmission will be preceded

by a contention interval. In a contention neighbourhood,

let us assume that each node with a packet contends in-

dependently for the channel during a time slot with

probability , which will be called the “attempt” prob-

ability. Further, it will be assumed that

1pn, where

is the number of nodes contending for the channel

within a node’s transmission range. This value of p

maximizes the probability of successful transmission in a

slot,

P, which quickly approaches to a constant value

of 1e

as n increases (stabilization is reached after

about 4



ontending nodes) [20]. That is, 1

Pe

when 1pn



. Assuming that

P has reached this as-

ymptotic value, the contention interval will be geometri-

cally distributed in number of slots with parameter,



. The mean packet service time in seconds is

composed of the sum of this contention interval and the

packet transmission time. We have,





TT

(23)

where

T is defined as the average time to transmit a

packet which is determined by the data rate of the chan-

nel and the average packet length. As is customary, we

assume that packet lengths are exponentially distributed.

Further, we will assume that packet service times, ,

G. PATTERSON ET AL.

290





are exponentially distributed with the mean service time

given by (23).

4.3. Packet Arrival Process

According to (19), the average of number of excursion

nodes in a WSN observing a correlated stationary Gaus-

sian phenomenon is given by . In this

performance model, we will make the simplifying as-

sumption that the packet generation is uniformly distrib-

uted over the network area and that the level b is not

necessarily extreme. This assumption will allow us to

compare the performance of the network when non-ex-

treme and extreme data is transmitted. Thus, even though

the spatial distribution of data in a WSN when the level b

is extreme occurs in clusters of isolated Poisson distrib-

uted excursion regions, we will assume that the location

of each excursion node is random on the deployment

area. We then define the packet generation rate per unit

area as,



bN











(24)

Since the data traffic flows to the sink, the traffic load

per unit area will increase as the distance to the sink de-

creases. It will be assumed that the traffic density will be

same at the equidistance points from the sink. Let us de-

fine as the traffic density (i.e. total packet rate per

unit area) at a distance r from the sink. The traffic that

will pass through the boundary of the semicircle with

radius r will be the total data load generated by all the

excursion nodes located outside of this perimeter. Thus,

we have,







22 22

1π

π2

Rr Rr

rrr

 





(25)

It will be assumed that the traffic density throughout a

contention neighbourhood will have the same value as its

center. Then the total packet arrival rate at a contention

neigborhood whose center is at a distance r from the sink

will be given by,

 

πTx

rrc



r

mean service time given by (23) and mean arrival rate of

(26)

4.4. Packet Delay

We will model each contention neighbourhood as a dis-

tributed server. Each server on the routing path from an

excursion node to the sink can be modelled as a “virtual”

M/M/1 queue. Let us consider a contention neighbour-

hood located at . , 1

jTx

rjrjJ

We will model each server as a M/M/1 queue with

(26) with



. Thus, the mean packet delay of a

packet in a tion neighbourhood at j hops from the

sink will be given by the M/M/1 formula,

conten





d (27)

where







T.

contention neThe ighbourhoods that a packet passes

through on its routing path correspond to a series of

M/M/1 queues in tandem. The queue for the last hop

which interfaces with the sink will have the highest traf-

fic load. Clearly, if this queue is stable, then the queues

of the preceding hops will also be stable. Thus for the

stability of the system, we require that







T.

The average total delay of a packeti- which has orig

(28)

Next, we will determine the average total delay of

ted j hops away from the sink will be given by,

, for 1,,

jjJ

Dd

i

cket in the network. Let us define probability that a

node will be located j hops away from the sink,

Pr(a node is hops from sink)

pj





Since all the excursion nodes generate packets at the

same rate, then, the average total packet delay in the

network is given by,





D (29)

. Numerical and Simulation Results

this section, we plot curves of analytical expressions

.1. Average Data Load Results for a Random

Figure 2, we plot the average number of excursion

derived in the paper and verify their accuracy through

simulation results. In the following, it is assumed that the

underlying phenomenon is described by a spatially iso-

tropic Gaussian random field. All simulation results have

been obtained by averaging over 200 simulation runs. In

each run, the nodes have been randomly placed over the

network area according to uniform distribution.

5Field with a Constant Mean

nodes in the network normalized to the total node popu-

lation using (19) and the normalized average number of

contour nodes for different half-widths





0.025, 0.05, 0.1





using (21). Here we have assumed a

covariance structure (4) with 12

50, 1

rational quadratic



. Since the

nodes are homogeneous, theseond to the results corresp

normalized data load experienced by the network due to

G. PATTERSON ET AL.

291

excursion/contour nodes attempting to forward their

packets to the sink. It may be seen that for an extreme

value level at least twice the standard deviation of data, b

> 2, the excursion node population is well below 5 per-

cent of the total node po- pulation. These results also

illustrate that greater amount of data reduction is possible

when only the contour node population is allowed to

transmit data to the sink.

In Figure 3, we demonstrate the effect of the correla-

tion range parameter, 1



, when the phenomenon has

rational quadratic spatial covariance (4) on the average

area of an isolated excuion region for a standardized

Gaussian random field for levels b = {2, 2.5, 3}, and

fixed 21



, using (13). It may be seen that the average

size of an excursion region increases with the level of

correlan the other hand, for a given level of corre-

lation, the average size decreases with increasing level b.

In Figure 4 we plot the average number of isolated

excursion regions that occur on the network deployment

tion. O

area (17) as a function of 1



for standardized Gaussian

data with rational quadratic spatial covariance (4) for

levels b = {2, 2.5, 3} assumng that the nodes have been

deployed on a circular region with radius 400 meters. It

is seen that the average number of excursion regions de-

creases as the level of correlation increases.

Next, we present simulation results that verify the ac-

curacy of expressions for the average numb

er of excur-

on and contour nodes in the network, using (19) and

(21) respectively. We consider a WSN with N = 2000

sensor nodes deployed on a circular area of radius R =

400 meters. In Tables 1 and 2, we assume that time is

fixed and all network data has been standardized so that



mst ,





 In both tables, the node data is

Figure 3. The average area of an isolated excursion region

(m2) as a function of the correlation range parameter, α1

for levels b = {2.0, 2.5, 3.0}, and fixed roughness parameter

α2 = 1.

Figure 4. Average number of excursion regions on a circu-

lar deployment area with radius 400 metres as a function of

the correlation range parameter, for levels b = {2, 2.5, 3}

on with rational quadrariance where

and fixed roughness parameter α2 = 1.

generated according to an NN joint Gaussian distribu-

ti tic cova150,









(note that the entriee covariance matrix are

computed through (4)). Table 1 presents both the ana-

lytical and simulation results for the average number of

excursion nodes in the network for different levels b. As

may be seen, the analytical and simulation results are in

close agreement with each other over a wide range of

levels b. The last column in the table shows the percent-

age data load based on simulation results which illus-

trates the reduction in the nominal data load that can be

s of th

Figure 2. The average data load normalized to the total

network data load using the excursion and contour node

populations.

292 G. PATTERSON ET AL.

5.2. Average Data Load Results for a Random

d with atially anime Vary

Nex (22)

for m-stationary random fields is accurate under

enarios where the phenomenon occurs as an impulse at

hich has been deployed

achieved by only transmitting data generated by the ex-

cursion nodes for different levels b. It may be seen that

the load drops below one percent for values of b > 2.5.

The Table 2 presents the results for the average number

of contour nodes in the network for different contour

levels b with the contour line half-width fixed as

0.0 5



. Again, we can see that the analytical and

simulation values closely match each other over a wide

range of levels b. The percentage data load in Table II

ates that only transmitting data generated by

contour nodes can further reduce the nominal data load

in a WSN. It may be seen that the load drops below one

percent for values of b > 1.7 when 0.05

demonstr



.

Table 1. Analytical and simulation results for the average

number of excursion nodes in the network.

N = 2000 nodes, R = 400 m, α1 = 50, α2 = 1

Level b Kb Kb

(Analysis) (Simulation) %Data Load

0.00 1000.49.78 0 995.54

0.20

0.50

8 841.4808.21 40.41

30.74 617.08 614.72

0.70 483.93 489.36 24.47

1.00 317.31 320.24 16.01

1.20 230.14 217.69 10.88

1.50 133.61 131.49 6.57

1.70 89.13 84.71 4.24

1.90 57.43 57.68 2.88

2.00 45.50 42.60 2.13

2.20 27.81 27.33 1.37

2.50 12.42 13.03 0.65

2.70 6.93 7.23 0.36

2.90 3.73 4.19 0.21

3.00 2.70 2.54 0.13

Table 2. Analytical and simular average nu-

mbeontour nin networ k.

tion results fo

r of codes

N = 2000 nodes, R = 400 m, α1 = 50, α2 = 1, δ = 0.05

Level b Gb G

(Analysis) b

(Simulation) %Data Load

0.00 79.76 78.96 3.95

0.20

0.50

Fiela Spd Ting

Mea

t, we present simulation results which show that

ean non

an epicentre and induces an isotropic Gaussian random

field in the surrounding region.

In the general scenario, we assume that at time t = 0 an

impulse of magnitude c occurs at the epicenter and con-

sider a WSN with N nodes w

ithin some proximity of the epicentre. For t > 0, we

assume that a Gaussian random process occurs at each

node and that these processes are Markovian in time and

correlated according to the non-separable, Markovian

model given in (6). To model the effect of the impulse on

the node data, we assume that at t = 0 each node senses a

random value with a Gaussian distribution where the

data values are correlated according to (6), but with an

initial mean value inversely proportional to the node’s

distance to the epicentre, ,iE



. Thus, we have





iiE



 for node i. We assume that the mean

value at node i decays according to







0.1

mt me



.

To generate simulation res, we suppose that ults

1000N



nodes have been deployed at random on a

circular area of radius 250R



mis eters. Further, it

ass

chosen on

umed that the location of the epicentre is randomly

the deployment area and the initial impulse is

of magnitude 250c



anes at time 0td strik



. For

any time 0t, the covariance between the node data is

given according to (6) with parameters with 50,





1, 2





. l associate excursion nodes with

the level .0

We wil



. The node data at each sampling time

is generated based on the observed node data at the

g instant according to the conditional

Gaussian tion given by (10). In our simulation,

we generate samples of the node data every 5 seconds

until the phenomenon reaches steady-state (i.e. when the

mean value is 0 at all points across the deployment).

Analytical and simulation results are presented in Table

3. Table 3 shows that the average number of excursion

nodes in the network is accurately predicted by (22)

when the phenomenon exhibits spatial non-stationarity in

the mean sense and also a Markovian temporal evolution.

We note that when computing (22), the variance of each

node’s data at a sample time, 2(,)



s, is computed via

the covariance expression (6) with ,0, 0

ii t

pre-

vious sam

plin

istribu





. The

mean value of each node’s data at a sample time is given

according to the model we in the previous

paragraph.

778.18 6.72 3.84

3.51 70.39 70.11

0.70 62.44 61.54 3.08

1.00 48.38 48.71 2.44

1.20 38.84 39.09 1.95

1.50 25.92 26.66 1.33

1.70 18.82 20.23 1.01

1.90 13.14 14.05 0.70

2.00 10.81 11.57 0.58

2.20 7.11 7.06 0.35

2.50 3.51 3.55 0.18

assumed

G. PATTERSON ET AL.

293

Table 3. Analytical and simulation results for the average

number of excursion nodes after an Impulse phe nome non.

Time (s) Simulation Analysis

0 211.0833 213.3950

5 97.3333 96.1510

10 61.3333 56.9225

15 37.7500 40.0463

20 30.2500 31.7547

25 29.1667 28.1931

30 25.9167 25.3503

35 23.7500 24.3748

40 23.1667 24.0566

45 23.000 22.9921

5.3. Mean Delay Results

this section, we present plots of the average packet

tance to the sink using (27)

nd (28) in a hypothetical large-scale WSN of 5000

delay as a function of hop-dis

nodes deployed on a semi-circle of radius 500 meters.

The transmission range (equivalently, the hop-distance)

is set to 25

r meters. We have assumed the under-

lying phenomenon is a standarized isotropic Gaussian

random field with rational quadratic covariance with

parameter 12

50, 1

values



. As usual, we assume

that only excurison nodes attempt to transmit their pack-

ets to the sink. We assume all excursion node packets

have mean length nd that the data rate of

the channel is 30 kbps. This gives a mean packet trans-

mission time of 2.67 msec





T. Finally, we

assume a slot time of 10 sec

of 10 bytes, a





.

In Figure 5, we show the mean delay experience by a

packet at each hopfor excursion distance in the network

2.0, 2.5} ug (27). For a given

ng (28). This figure illustrates the reduction

not just take b arbitrarily high or there will be no trans-

node levels b = {1.5, sin

vel b, this figure shows the increase in mean delay as

the packet approaches the sink. It also shows how re-

ducing the data load by using extreme levels (b = 2.0,

and b = 2.5 can be considered extreme) reduces the mean

delay of packets, especially as the packets move closer to

the sink.

In Figure 6, we show the mean total delay of packets

originating at each hop distance from the sink in the

network usi

the mean delay achieved by increasing the level asso-

ciated with the excursion nodes. Note how there is only a

small reduction in the delay when the level is increased

from b = 2.0 to b = 2.5 whereas there is a significant

delay readuction when the level is increased from b = 1.0

to b = 2.0. There is clearly, a practical limit to the mini-

mal achievable total average delay. Obviously, we can-

Figure 5. The mean delay experienced by a packet at each

hop distance from the sink.

Figure 6. The total average delay in the WSN for packets

originating at different hop distances from the sink.

missions in the system. The level b, of course, should be

verage

ata load in the network for a given level b. In our sce-

chosen according to the needs of the end-user and as we

have seen, one can use (19) to evaluate the total a

nario, we have 1.0

= 93.2763 nodes, 2.0

= 13.7507

nodes, and 2.5

= 31.0483 nodes.

6. Conclusions

In this pape have presented an r weanalytical model for

e data values observed by the nodes in a WSN. We

nsor network data as samples from a

aussian random field. We have presented results for the

have modeled the se

G. PATTERSON ET AL.

294

tour Maps:

Monitoring and Diagnosis in Sensor Networks,” Com-

s: the International Journal of Computer

unication Networking (ACM), Vol. 50, No.

average total area of the network that experiences data

above an arbitrary level b, and the average area of an

isolated excursion region containing extreme data when

the level b is chosen sufficiently high. It has been shown

that the number of excursion regions on the network de-

ployment approaches a Poisson distribution for extreme

levels, b. The average size of the isolated excursion re-

gions increases while their frequency decreases when the

level of spatial correlation is increased. It has also been

observed that the sum of the average size of excursion

regions and the average excursion node population for a

level b are a function of b alone and not on the degree of

spatial correlation. We have quantified the data reduction

that may be achieved by only transmitting data from ex-

cursion and contour nodes for various levels b. Finally, a

performance model has been presented for large-scale

sensor networks that can be used to derive the mean de-

lay experienced by packets at each hop on their journey

to the sink. The results of the paper will be useful in the

design of large sensor networks in ensuring that the net-

work meets the requirements of the end user.

7. References

[1] X. Meng and T. Nandagopal, et.al. “Con

puter Network

and Telecomm

15, 2006, pp. 2820-2838.

doi:10.1016/j.comnet.2005.09.038

[2] S. R. Madden, M. J. Franklin, J. M. Hellerstein and W.

Hong, “TAG: A Tiny Aggregation Service for Ad-Hoc Sen-

sor Networks”, Proceedings of OSDI, Boston, June 2002.

[3] S. Yoon and C. Shahabi, “The Clustered AGgregation

(CAG) Technique Leveraging Spatial and Temporal

Correlations in Wireless Sensor Networks,” ACM Trans-

actions on Sensor Networks, Vol. 3, No. 1, article No. 3,

March, 2007. doi:10.1145/844128.844142

[4] S. Lindsey and K. M. Sivalingam, “Data Gathering Algo-

rithms in Sensor Networks Using Energy Metrics,” IEEE

Transactions on Parallel and Distributed Systems. Vol.

13, No. 9, 2002, pp. 924-934.

doi:10.1109/TPDS.2002.1036066

[5] E. Vanmarcke, “Random Fields: Analysis and Synthesis,”

iley Encyclopedia of Sta-

n Random Fields

. Willig, “Protocols and Architectures For

ral

MIT Press, Cambridge, 1983.

[6] R. Adler, “Random Fields,” W

tistical Sciences, 2006.

http://www.mrw.interscience.wiley.com/emrw/97804716

67193/ess/article/ess2164/current/html

[7] P. Abrahamsen, “A Review of Gaussia

and Correlation Functions,” 2nd Edition, Norwegian

Computing Center, April 1997.

http://www.math.ntnu.no/~omre/TMA4250/V2007/abrah

amsen2.ps.

[8] H. Karl, A

Wireless Sensor Networks,” Wiley, New York, 2005.

[9] M. Vuran, O. Akan and I. Akyildiz, “Spatio-Tempo

Correlation: Theory and Applications for Wireless Sensor

Networks,” Computer Networks, Vol. 45, No. 3, 2004, pp.

245-259. doi:10.1016/j.comnet.2004.03.007

[10] M. Stein, “Space-Time Covariance Functions,” Journal

of American Statistical Association, Vol. 100, No. 469,

2005, pp. 310-321.

doi:10.1198/016214504000000854

[11] G. A. Fenton, “A Random Field Excursion Model of

Salt-Induced Concrete Delamination,” Journal of Re-

search of the National Institute of Standards and Tech-

nology, Vol. 99, No. 4, 1994, pp. 475-483.