Wireless Sensor Network, 2010, 2, 668-674
doi:10.4236/wsn.2010.29080 Published Online September 2010 (http://www.SciRP.org/journal/wsn)
Copyright © 2010 SciRes. WSN
A Real-Time Urban Traffic Detection Algorithm Based on
Spatio-Temporal OD Matrix in Vehicular Sensor Network*
Ke Zhang, Guangtao Xue
Department of C omput e Science & Technolo gy, S han gh ai Ji ao To n g Uni v ersi t y , Shan gh ai, C hi n a
E-mail: xue-gt@cs.sjtu.edu.cn
Received July 4, 2010; revised August 7, 2010; accepted September 10, 2010
Abstract
Currently, there are kinds of algorithms in order to detect real-time urban traffic condition. Most of these
approaches consider speed of vehicles as a main metric to describe traffic situation. In this paper, we find out
two important observations through several experiments. (1) In urban city, the speed of vehicles is influenced
significantly by some factors such as traffic lights delay and road condition. The actual situation rarely satisfy
hypothesis required for these solutions. Therefore, these traditional algorithms do not work well in practical
environment. (2) Traffic volume on a road segment shows strong pattern and changes smoothly at adjacent
time. This feature of traffic volume inspires us to define a metric: traffic-rate, which is used to detect traffic
condition in real time. In our solution, we develop a novel traffic-detection algorithm based on original-
destination (OD) matrix. We illustrate our approach and measure its performance in real environment. The
performance evaluations confirm the effectiveness of our algorithm.
Keywords: KNN; Linear Least Square; OD Matrix; Traffic Status Detection
1. Introduction
Nowadays, traffic congestion has been a serious problem
in many urban cities. In China, it caused about 5%-8%
GDP wasted each year. Therefore, governments cost
million dollars to build Intelligen t Transportation System
(ITS). In response to these challenges, Original-
Destination (OD) matrix is often required as application
input to provide service for transportation. OD matrix is
a non-negative matrix(, )
f
ij which represents volumes
of traffic from a source ito a destinationj. In a road
network, element(,)
f
ij in the OD matrix means the
number of vehicles from road segment i to road seg-
ment j.However, traditional algorithms to construct OD
matrix do not work well in real world, since their solu-
tions do not consider features of real traffic condition.
Furthermore, the traditional algorithms of traffic detec-
tion have limitations when runs in real environment. In
their solutions, speed of vehicles is considered as a main
metric to identify traffic states. Nevertheless, we find
that speed is influenced significantly by traffic lights.
In this paper, we study this problem using GPS de-
vices which have been equipped in taxis and buses in
Shanghai. All GPS vehicular data are transferred to the
local APs which are deployed in intersections, and GPS
reports would be sent to a data center, where we run our
algorithm to construct OD matrix and analyze traffic
condition. We receive a great support from Shanghai
government and have deployed a cost-effective system to
collect traffic information. However, there are still lots of
challenges to process information in a metropolis like
Shanghai.
First of all, due to vehicular GPS signal is always
varying, all GPS location data we collected are not only
large and noisy but not uniformly distribute. Some areas
near downtown produce most of traffic volumes. Figure
1 shows millions of GPS reports on each road from Janu-
ary 5 to January 20 in 2007. In addition , according to our
statistics, almost 30% vehicles lost their GPS reports.
Besides, the solution must consider the fact that not
every vehicle has installed GPS device.
Although taxies and buses have installed vehicular
wireless communication devices whose communication
range is about 200 meters, it is costly if we deploy AP
every 200 meter. Thus, we need to interpolate traffic
volumes on road segments that do not have AP. Fur
*Supported by the National Natural Science Foundation of China unde
r
Grant No. 60970106, the National Grand Fundamental Research 973
Program of China under Grant No.2006CB303000, and Key Program
of National MIIT under Grant 2009ZX03006-004.
K. ZHANG ET AL.
Copyright © 2010 SciRes. WS N
669
Figure 1. GIS map of Shanghai with spatial distribution of
GPS reports from Jan.5 to Jan.20, 2007.
thermore, because of errors of GPS data (the range of the
error is about 20 meters to 100 meters), we have to firstly
apply our own GPS-correcting algorithms to abandon
“dirty” data and amend incorrect ones.
The last but not the least, it is d ifficult to captu re every
factor which has effect on traffic condition. For instance,
traffic lights delay and complicated condition of road
have significant impact on traffic condition. Meanwhile,
these factors are difficult to be estimated using a simple
model. This is why most proposed algorithms do not
work well in real life.
Fortunately, we find out a spatial-temporal model that
is able to overcome these limitations to reconstruct OD
matrix. In addition, we define a metric: Traffic-Rate,
which is used to describe current traffic condition. We
are going to discuss them in Subsection 2.5.
The remainder of this paper is organized as follows:
Subsection 1.1 provides background and compares our
approach with related work s. We then introduce our spa-
tial-temporal algorithm and traffic-detection solution in
detail in Section 2. We demonstrate our evaluation result
in Section 3. Finally, we present conclusion in Section 4.
1.1. Related Work
Due to importance of OD matrix, there are a number of
works focus on how to estimate OD matrix accurately in
recent years [1-3]. However, most of their solutions work
on highways or freeways, where traffic lights timing and
behavior of drivers are not issues because there are no
intersections and vehicles’ routes fo llowed a certain way.
On the other hand, traffic situation of an urban is so
complicate that the technical conditions which required
for these solutions are difficult to be satisfied. Yin Zhang
et al [4] illustrate a novel algorithm to estimate a traffic
matrix in Internet whereas our solution runs in the Ve-
hicular Sensor Network (VSN), more importantly, they
do not give the fact to prov e their observation.
With rapid evolvement of intelligent traffic system
(ITS), a lot of efforts focus on monitoring traffic and
incidents detections installing sensors like camera on
roads [5]. From a practical perspective, these approaches
need large sums of money to deploy numbers of sensors.
Moreover, Lin et al [6] and Coifman et al [7] propose
schemes to detect traffic condition by installing traffic
detectors. Meanwhile, their solutions work on freeway
not an urban area. How to define an appropriate metric to
describe traffic condition is also a hot topic for research-
ers. For example, Jungkeun Yoon et al [8] analyzed main
factors which influence traffic states. Furthermore, some
researchers [9-11] propose interesting schemes to esti-
mate velocity of vehicles to reflect traffic condition.
However, they do not take condition of roads into ac-
count. Therefore, their solutions have limitations when
runs in real environment.
2. Spatio-Temporal Original-Destination
Matrix
In this section, we give a brief introduction to OD matrix
first. Then we present our algorithm in detail. This con-
struction algorithm consists of two important rules that
are observed by experiments. We take advantage of OD
matrix to implement traffic-detection so lution.
2.1. Original-Destination Matrix
For a road network with
N
intersections which con-
nect
M
road segments, the OD matrix is a
square
M
M
matrix. In this paper, the element (, ,)
f
ijt
in OD matrix represents traffic volumes from iroad to
iroad during[, )tt t
. In order to get more convenience
for calculation, we convert OD matrix as a 2-dimensional
array, where columns represent OD matrix at different
time while the rows represent the evolution of traffic
volumes in each route.
2.2. The Spatial Component
After extracting and analyzing millions of real vehicles
GPS reports, we find the first important fact that there
are strong correlations of traffic volumes among road
segments. In practice, road ito some extent has similar
traffic flows with road
j
.
As is shown in Figure 2, we choose K (K = 4) road
segments which have the most similar traffic-rate (the
rate represents throughput of traffic in a road) evolution
for each road segment. Finally, we find out that most
neighbors consist of a linear combination of the target
road segment. In our solu tion, we develop this discovery
by using an approach: KNN (k-Nearest Neighbors).
K. ZHANG ET AL.
Copyright © 2010 SciRes. WSN
670
010 2030 40 5060
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Time (m i nute)
TrafficRate
Target
Neighbor1
Neighbor2
Neighbor3
Neighbor4
Figure 2. Traffic-rate comparisons between 4 neighbors
and target road.
The KNN algorithm to calculate correlation is given as
follows:
1) A GPS dataset is used as the train dataset to con-
struct a complete OD matrix.
2) Any row in the OD matrix could be considered as a
vector. We take advantage of (1) to calculate similarity
of vectors. Always, we set K = 4 that a road segment has
four neighbors.
3) We use linear regression to find a vector of weights
( )(1,2...)kk K
so thatiroad is expressed by combina-
tion ofkroad . In (1), iroad represents the ith row of OD
matrix.
(, )cos|||| ||||
i
ik
i
k
k
road road
Correlation roadroadroad road

(1)
4
1
()
K
ik
k
roadk road
(2)
Even though we use just one train dataset to find
neighbors for each road segment, the neighbor relations
of the road segment can still be hold for other dataset.
2.3. The Temporal Component
As is show in the Figure 3, the left picture shows a dis-
tribution of traffic-rate from 11:00p.m to 13:00p.m during
one day. We interpolate these values and present them in
the right picture, which implies that traffic rate changes
in a small range (less than 0.1). Thus there is the second
important truth: The evolution of traffic volumes on a
road segment can keep “stable” during a time interval,
which is based on the fact that a number of vehicles do
not appear suddenly in the same time at same place,
namely (, ,)(, ,)
f
ijtfijt t
.
Meanwhile, from the second fact, we can construct a
050100 150
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Tim e (m i nut e)
TrafficRate
050100 150
0.05
0.1
0.15
0.2
0.25
0.3
Tim e (minut e)
TrafficRate
Figure 3. Traffic-rate in 4roads within 2hours.
temporal constraint matrix TM to show the smoothly
changes of traffic volume on each road.
The construction of TM matrix is simple, and the form of
TM matrix is given as follows:
110
01 1
00 1
TM







2.4. Construction of OD Matrix
Linear Least Square procedure is often used when con-
structing OD matrix in industry. Constructin g OD matrix
is modeled as follows:
QBX
(3)
Where Q represents total traffic volumes on the road
segment and can be easily measured by device like traf-
fic detector [6 ,7]. In (3), where
B
is a topological matrix
which represents the connectivity of each road segment
in an area. Our goal is to find Xwhich is the OD matrix.
In practice,
B
is often a low-rank matrix. Hence, there
are probably several solutions to this equation so we
have to answer two questions as follows:
1) Which solution is the best?
2) What constraints should we add to estimate X
more accurately?
As we mentioned above,
B
and Xare sparse ma-
trices. In implementation, we set a rank value as input
and pick the solutio n whose rank is less than our inpu t as
final result.
Through analyzing several experimental results, we
are inspired from the spatial component that each road
can be linear expressed by its neighbors, so we can take
advantage of KNN to build a spatial matrix SP to ex-
press which rows in the OD matrix are neighbors.
The constructio n of SP is given as follows:
1) Use KNN method to find the set of weights for each
K. ZHANG ET AL.
Copyright © 2010 SciRes. WSN
671
road segment.
2) For k = 1, 2, 3,…K, we let (,) 1SP i i
and
(,)( )kSP ijk

Then, a spatial constraint equation is given as follows:
*0SP X
(4)
Meanwhile, according to the temporal feature, the
temporal constraint equation is given as follows:
*0TM X
(5)
Now, we have acquired two constraint equations
through capturing features of traffic volume. In our sys-
tem, we use standard Linear Least Square (we develop
this procedure in Matlab) which is based on (3),(4),(5) to
estimate
X
.
2.5. Real-Time Traffic Condition
Description and Alarm
How to describe traffic condition is a meaningful but
difficult problem which always fascinate researchers
[9-11], who proposed variety of algorithms to try to pre-
dict speed of vehicles. Intuitively, a fast transit velocity
implies a good traffic condition. However, another
non-negligible is that speed can be influenced by many
factors which are difficult to measure, such as behavior
of drivers, traffic lights delay and conditio n of ro ad.
As is shown in Figure 4, we select a taxi randomly
and display its changes of speed during an hour. We can
see that the taxi stop when t = 17min and runs slow at
some time. These are normal phenomenon because of
traffic lights which have a significant impact on speed.
On the other hand, as is described in Figure 4, speed
does not have a regular pattern to pred ict in reality.
Instead, we propose a solution to capture the traffic
status and broadcast warning message to avoid traffic
05 10 15 20 25 30 35 40 45
0
10
20
30
40
50
60
Tim e (M i nute)
Velocity (kmph)
Figure 4. Speed changes during an hour.
co ngestion. We do not just use traffic volumes as a metri c
to identify traffic status. Because different road has dif-
ferent number of lanes and length, it is unreasonable to
use traffic volume to identify traffic jam in such cases.
Considering different condition of roads, in this p aper,
we define “traffic-rate”tRas a metric to describe current
traffic condition. The number of lan es and length of road
are taken into account in our design.
We calculatetRus ing followi ng:
1
1
in
tin out
S
SS
R
(6)
where inSexpresses the number of vehicles that enter into
the road during[,)tt t
, outSexpresses the number of
vehicles that leave the road during [,)tt t .
Meanwhile, we assign a threshold
for each road seg-
ment. From a simple perspective, if we find>tR
, we
think current traffic status is changing. Before we discuss
how to classify traffic status in detail, we give a solution
to calculate
dynamically.
There is a traditional solution to calculate
: we can
analyze the past GPS reports and take advantage of a
maximum a posteriori (MAP) classification with both
likelihood functions and a priori probabilities to calcu-
late
for each road segment. But this solution cost too
much time because of thousands of road segments. More
importantly, it does not have enough flexibility.
Now, we propose an experimental formula below to
calculate
dynamically.
(+)
L
N
VT D
(7)
where V expresses the mean speed of vehicles on the
road and T is an estimation value which can be calcu-
lated by using GPS information to show how long a taxi
may stay in the road. In (7), where D is the traffic light
duration, L represents length of the ro ad an d N represents
number of the lanes. Since on a road, the bigger
L
N
is,
the more vehicles the road can contain.
However, we observe that traffic light duration can be
calculated as follows:
out
L
DS
V
 (8)
whereL, Vand outShave the same mean we mentioned
above. Equation (8) has an intuitive mean that vehicles
should leave the road in traffic light duration.
Then, (7) is equivalent to
=1 + out
N
S
   (9)
Rather than previous studies, we do not simply iden-
tify two traffic states: good and bad. In our design, we
divide current traffic status into five states: FREE,
K. ZHANG ET AL.
Copyright © 2010 SciRes. WSN
672
NORMAL, ALERT, BUSY and OVERLOAD. Since we
must send warning messages of traffic condition to vehi-
cles before traffic condition becomes worse. Another
benefit is that we can make flexible policies when we are
processing traffic information. The rules to identify traf-
fic status on a road are given as follows:
1) 0tR
, the traffic state is FREE;
2) 2tR
 , the traffic state is NORMAL;
3) 23tR
 , the traffic state is ALERT,
4) 34tR
 , the traffic state is BUSY;
5) 41tR
, the traffic state is OVERLOAD.
In our experiments, 0.23
for most cases. When we
detect that current traffic state is ALERT, we start to
send warning message to those related vehicles around
the road.
3. Experiments
3.1. Experimental Setup
We have collected a great number of real traffic data
which are generated in real time by installing GPS de-
vices in taxis and buses from data center in Shanghai.
Even we do not collect all vehicular GPS reports since
some vehicles do not install GPS devices, the evaluations
also show that our solution can work well.
The fields (id, longitude, latitude, velocity, angle,
timestamp) denote the GPS reports, where id identifies a
taxi, the pair (longitude, latitude) shows current coordi-
nates of the taxi, timestamp is the time report, the
pair(velocity, angle) shows current speed and driving
direction of the taxi, which is u sed with Shanghai map to
calculate the short-term destination in our algorithm.
The basic experimental parameters are given as follows:
3.2. Methodology
In order to demonstrate the effectiveness of our solution,
we have developed a mobility model to generate vehicu-
lar GPS reports in a target area. The mobility model con-
sists of three parts:
1) We load GIS map of Shanghai to generate topo-
logical matrix.
2) We set source and destination coordinates for each
node. Traffic light duration is set according to surveys.
3) The nodes send GPS reports randomly and some
nodes do not send. Moreover, for making our simulation
close to real life, we also insert real GPS traces into our
test dataset.
Our initial OD matrix lost elements at some positions
and some elements have error in their value.
At last, we measure performance using the Normal-
ized Mean Absolute Error [4].
,: (,)0
,: (,)0
ˆ
|(,) (,)|
|(,)|
ijMij
ijMij
X
ij Xij
NMAE Xij
(5)
3.3. Performance
Figure 5 shows the performance of our algorithm ST and
comparison between SRSVD Base [4], KNN, and ST
when GPS reports lost randomly. The default parameters
have been listed in Table 1. We can clearly realize that
ST outperforms the other interpolation algorithms in our
experiment. Sin ce we cap ture s patial and temp oral feat ur es
of traffic flows, ST has more rules and techniques to
interpolate missing values than KNN and SRSVD Base.
Although there is high GPS data loss, ST can still keep a
good performance. From the comparison, we find that
there is significant difference between ST and other
algorithms when GPS-LOSS-PROBABILITY = 0.6.
KNN works well if enough GPS reports have been col-
lected in the datacenter. However, the performance of
KNN decreases sharply when more and more GPS data
lost. Because even we have chosen the best K neighbors
for target route, we still do not have another rule to in-
terpolate traffic flows for target road segment.
SRSVD Base does not work well, since it is to some
extent a pure technique of matrix calculation. Therefore,
it is not able to be adapted to complicated traffic condi-
tion.
3.4. Simulation
The basic experim ental pa ram eters are di splay ed in Tabl e 1.
In followed experiments, we assume that 30% vehicles
do not install GPS devices. Figure 6 shows evolutions of
tR on different road segments. At first, we select three
different road segments randomly in center of Shanghai.
Secondly, we collect the GPS reports on these roads
from 11:00 to 12:00 on Jan.7 2007. At last, we recon-
00.1 0.2 0.3 0.40.5 0.6 0.7 0.80.9
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
GP S report l oss probability
NM A E
KNN
SRSVD Base
ST
Figure 5. Performance for GPS report loss.
K. ZHANG ET AL.
Copyright © 2010 SciRes. WS N
673
Table 1: Experimental parameters.
Target Area 2
3.5km
Time duration 2hours (11:00-13:00)
Time granularity 5min
Number of road segments 146
Weather Sunny
structed OD matrix using ST to calculatetR. In this
experiment, we find that traffic lights have little effect on
Traffic-Rate and Traffic-Rat e chan ges smoothly.
In order to demonstrate performance of equations of
:
We select 10 road segments randomly and monitor the
traffic volume from 9:00-22:00 on Jan.10 2007. Then we
calculate tRand
every 5 minutes. Meanwhile, we use
mean speed of these vehicles to detect current traffic
status. Finally, we compare our result with pre-defined
traffic state.
Figure 7 shows that the accuracy of our solution and
comparison with traditional policy. Apparently, fro m this
experiment, speed is not a good metric to detect traffic
condition. Since it is often influenced by many factors,
and there are always some vehicles have “strange” driv-
ing behaviors because of habits of drivers. On the other
side, we can clear see that even 90% of accuracy can be
achieved at sometimes. There are also two least values
which are ab out 0.712 at 12:00 and 0.734 at 17:00. Since
they are two important time points when people come off
duty, and traffic condition at that time is difficult to be
captured. But in general, the performance of thresh-
old-based Traffic-Rate is good and adapted to the real
environment.
In Figure 8, the axis of abscissas represents proportion
of vehicles that installed GPS devices. This experiment
010 2030 40 50 60
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Time (Mi nut e)
Traffic-Rate
road1
road2
road3
Figure 6. Traffic-Rate changes during an hour on 3 roads.
0246810 1214 16
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Tim e (Hour)
Accuracy
Traffic-rate
Speed
Figure 7. Accuracy of threshold-based Traffic-Rate and
mean speed of vehicles.
0.5 0.55 0.60.65 0.70.750.8 0.85 0.9 0.951
0
0. 1
0. 2
0. 3
0. 4
0. 5
0. 6
0. 7
0. 8
0. 9
1
Ve hi cles Instal li ng G PS Devices
Accuracy
Accuracy
Figure 8. Accuracy of threshold-based Traffic-Rate.
proves that the more vehicles send GPS reports, the bet-
ter performance of our algorithm is.
4. Conclusions
In this paper, we have presented a practical solution to
detect traffic condition which is based on OD matrix in
an urban city by using a cost-effective system of vehicle
GPS sensors. Our algorithm consists of spatial and tem-
poral components which are based on neighbors of roads
and stability change of traffic volumes during a time.
Meanwhile, we use traffic-rate to capture traffic status
rather that speed. As a result, our solution overcomes
many limitations in existing approaches and yields a
good performance.
5. References
[1] D. Bhattacharjee, K. C. Sinha and J. V. Krogmeier,
“Modeling the Effects of Traveler Information on Free-
way Origin-Destination Demand Prediction,” Transpor-
tation Research, Vol. 6, No. 9, 2001, pp. 381-398.
K. ZHANG ET AL.
Copyright © 2010 SciRes. WSN
674
[2] G. L. Chang and J. F. Wu, “Recursive Estimation of
Time-Varying Origin-Destination Flows from Traffic
Counts in Freeway Corridors,” Transportation Research
Part B, Vol. 28, No. 2, 1994, pp. 141-160.
[3] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya and
C. Diet. “Traffic Matrix Estimation: Existing Techniques
and New Directions,” Proceedings of ACM SIGCOMM,
2002.
[4] Y. Zhang, M. Roughan, W. Willinger and L. L. Qiu, “Spa-
tio-Temporal Compressive Sensing and Internet Traffic
Matrices,” Proceedings of ACM SIGCOMM, 2009.
[5] Y. Cho, “Estimating Velocity Fields on a Freeway from
Low-Resolution Videos,” IEEE Transactions on Intelli-
gent Transportation Systems, Vol.7, No. 4, 2007, pp.
463-469.
[6] W. Lin and C. Daganzo, “A Simple Detection Scheme for
Delay-Inducing Freeway Incidents,” Transportation Re-
search, Vol. 31A of Part A, 1997, pp. 141-155.
[7] B. Coifman, “Identifying the Onset of Congestion Rapidly
with Existing Traffic Detector,” Transportation Research,
Vol. 37 of Part A, 2003, pp. 277-291.
[8] J. Yoon, B. Noble and M. Liu, “Surface Street Traffic
Estimation,” Proceedings of ACM Mobicom, 2007
[9] H. Z. Zhu, Y.M. Zhu, M. L., Li and L. M. Ni, “SEER:
Metropolitan-Scale Traffic Perception Based on Lossy
Sensory Data,” Proceedings of ACM INFOCOM, 2009.
[10] M. McNally, J. Marca, C. Rindty and A. Koos. “Tracer:
In-Vehicle, Gps-Based, Wireless Technology for Traffic
Surveillance and Management,” Technical Report UCB-
ITS-PRR-2003-23, California Partners for Advanced
Transit and Highways (PATH), July 2003.
[11] J. Ygnace, C. Drane, Y. Yim and R. Lacvivier. “Travel
Time Estimation on the San Francisco Bay Area Network
Using Cellular Phones as Probes,” Technical Report
UCB-ITS-PWB-2000-18, California Partners for Ad-
vanced Transit and Highways (PATH), September 2000.