Journal of Geographic Information System, 2011, 3, 334-344
doi:10.4236/jgis.2011.34031 Published Online October 2011 (http://www.SciRP.org/journal/jgis)
Copyright © 2011 SciRes. JGIS
Moving towards Personalized Geospatial Queries
Giorgos Mountrakis1, Anthony Stefanidis2
1Department of Environmental Resources Engineering, State University of New York College of Environmental
Science and Forestry, Syracuse, USA
2Center for Geospatial Intelligence, Department of Geography & Geoinformation Science, George Mason University,
Fairfax, USA
E-mail: gm@esf.edu, astefani@gmu.edu
Received July 3, 2011; revised August 8, 2011; accepted August 10, 2011
Abstract
Geospatial datasets are typically available as distributed collections contributed by various government or
commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset
for different applications remains a challenging issue. In order to overcome this challenge there is a clear
need to develop the capabilities to take into account complicated patterns of preference describing user
and/or application particularities, and use these patterns to rank query results in terms of suitability. This pa-
per offers a demonstration on how intelligent systems can assist geospatial queries to improve retrieval ac-
curacy by customizing results based on preference patterns. We outline the particularities of the geospatial
domain and present our method and its application.
Keywords: Geospatial Databases, Geographic Information Systems, Geospatial Queries, Similarity Learning,
Preference Modeling, Adaptive Systems, Digital Government.
1. Introduction
Geospatial information enjoys an increasingly important
role in modern day societies, as it is used to support a
variety of activities ranging from long-term planning and
modeling, to emergency response and disaster manage-
ment. Geospatial datasets may be diverse in nature, rang-
ing from digital imagery and raster datasets to thematic
layers of geographic information systems (GIS), vector
data, and diverse sensor feeds. These datasets are col-
lected, stored, and distributed by a variety of federal (e.g.
the National Geospatial-Intelligence Agency—NGA),
state (e.g. various state GIS offices), or local (e.g. town
records) agencies. In addition to these authoritative data-
sets we are now witnessing the emergence of volunteered
and participatory GIS [1], with datasets collected and
contributed by non-profit organizations (e.g. Ushahidi)
or even individuals. Through advancements in sensor
technology, computer hardware, and software we have
now reached the point where massive amounts of diverse
types of geospatial datasets are integrated in distributed
petabyte-size archives.
As the applications that use geospatial datasets are
quite diverse, it is not rare to have the same dataset (e.g.
a specific GIS layer) accessed by different users to sup-
port diverse applications (e.g. location-based services
through cell phone apps), or decision-making activities
(e.g. land use modeling, crisis monitoring). In order to
support query-based information retrieval (IR), geospa-
tial datasets are indexed with metadata describing their
essential properties (e.g. date, scale, accuracy, resolution,
time, provider). Queries are typically performed by having
a user state his/her preference in terms of these
metadata, e.g. “retrieve satellite images of Fukushima,
Japan after March 11 2011, with pixel resolution equal
to or better than 1 meter”. The suitability of available
datasets is then evaluated using standard distance metrics
to compare them to the query request and rank them ac-
cording to their similarity to the query parameters [2].
However, standard IR approaches fail to capture pref-
erence differences among diverse users. For example, a
transportation expert and an emergency responder may
have different preference patterns as they aim to retrieve
satellite imagery depicting an area of interest at a specific
instance. In the above-provided example relating to the
Fukushima earthquake and subsequent tsunami, the
transportation expert would prefer the most recent image
available after the earthquake (e.g. from May, 2011), as
her task may be to update the road network maps and
capture the current state of transportation infrastructure
in the area. On the other hand, the emergency responder
may be interested in imagery showing the tsunami at its
farthest location, before it started receding, to better as-
sess the full extent of the impact area. For this responder
the most recent imagery may therefore be of lesser value
than imagery captured one to two hours after the earth-
quake.
The above is a simple example that shows potential
variations in user-database interaction. Users that attempt
to access collections of geospatial datasets have diversi-
fied information needs, reflecting differences in the ex-
perience and/or task at hand. As the user community of
geospatial information is growing and becoming in-
creasingly more diverse, such preference variations are
becoming the norm rather than an exception. Standard IR
techniques fail to take into account such complex pref-
erence patterns. In order to overcome this shortcoming
we need to support the customization of user query exe-
cution by taking into account the particularities of a
user’s preferences. In this paper we present our approach
to model user preferences in geospatial applications in
order to improve the performance of geospatial queries.
Before we proceed to the specifics of our method, we
address trends in the generation, storage, and delivery of
geospatial information.
2. Geospatial Dataset Availability
Authoritative geospatial datasets have been traditionally
generated, used, and delivered by a variety of local, state,
and federal government agencies. They are typically
available as distributed collections through correspond-
ing portals.
A representative example of a federal collection of
geospatial datasets is the National Atlas (http://www.nationalatlas.gov/),
offering map coverage across the US,
with various themes (e.g. agricultural and transportation
data overlaid on basic maps). The National Geospa-
tial-Intelligence Agency (NGA, www.nga.mil) is offer-
ing access to charts and images. Users can access this
information through its Raster Roam interface. Users
can access a specific file either by selecting an area in a
map display, or by using geographic names in a gazet-
teer-like approach. Furthermore, the Federal Geographic
Data Committee (FGDC) (www.fgdc.gov) of the US
Geological Survey (USGS) offers a distributed discovery
mechanism comprising regional clearinghouses for digital
geospatial dataset delivery. The Environmental Protection
Agency (EPA) offers a wide variety of geospatially-referenced
information (e.g. water quality and hazardous
waste data), queried through a zip-code based
system (http://www.epa.gov/enviro/html/qmr.html). In an
effort to address the particular needs of disaster response,
government agencies have set up dedicated portals that
aggregate specific types of information. For example the
Geospatial Multi-Agency Coordination Group effort
(GeoMAC) (http://www.geomac.gov) aggregates fire-
related information (incl. fire perimeter, terrain, and
MODIS satellite datasets) across the continental United
States. The Natural Hazards Support System (NHSS)
(http://nhss.cr.usgs.gov) is another example of an integrative
portal, offering information on various natural
hazards, e.g. volcanic, earthquake, and flooding informa-
tion, together with satellite imagery. These federal-level
datasets are complemented by countless regional collec-
tions of geospatial datasets collected and distributed
through states, cities, and municipalities.
This early model of government-driven geospatial
dataset collection and administration evolved through the
proliferation of commercial remote sensing and geospa-
tial analysis endeavours. For example, TerraServer
(www.terraserver.com) is a collaboration of commercial
(Microsoft, Compaq) and federal (USGS) partners, that
offers a collection of digital imagery from numerous
providers, arranged by location (e.g. coordinates, city
name, street address, zip code), in various time instances.
USGS photography comprises a few terabytes of data, and
is accessible through several host servers. Another nota-
ble commercialized collection of geospatial datasets is
Mapquest (www.mapquest.com), using maps of the
complete US in a variety of scales (e.g. 1:100,000,
1:25,000), 1-meter resolution aerial photography, and
detailed street maps. Probably the most popular com-
mercial implementation is the adaptation of Keyhole
technology to build Google Earth (earth.google.com),
aggregating a massive collection of satellite and field
data.
The latest evolution of geospatial dataset availability is
the on-going emergence of volunteered and participatory
geographic information (VGI). Crisis mapping is a par-
ticularly relevant example, with the aggregation of au-
thoritative datasets with contributed multimodal infor-
mation to capture the consequences and evolution of a
catastrophic situation [3,4]. Ushahidi (http://ushahidi.
com) and its utilization during the Haiti 2010 earthquake
disaster is by now a classic example of VGI at work. In
addition to these examples, where the general public is
contributing information, we also have services like
Google MapMaker (http://www.google.com/mapmaker),
where citizens are given the opportunity to perform information
extraction tasks, like road centerline delineation,
thus directly contributing geospatial information.
Thus we see that geospatial datasets are made available
in numerous distributed collections of terabyte-sized
archives of government, commercial, or non-profit agencies,
each following established standards and specifications
in terms of a variety of parameters, including accuracy,
format, metadata, scale, and organization. Users access
these datasets through the corresponding agency portals,
either by browsing collections, or by forming metadata-based
queries as mentioned in the previous section.
The challenge faced by applications employing geospatial
databases is to support the diverse needs of various
users that may be accessing the same dataset for different
applications. In order to do so we need to be able to
take into account the complicated patterns of preference
that correspond to a user and/or application, and use
these patterns to rank existing datasets in terms of suitability.
To aid readability, here are the definitions
of three important terms as used within the context
of this paper:
• Similarity refers to how appropriate a given response
is to a geospatial information request.
• Preference relates to users expressing their individual
suitability metrics for similarity.
• Similarity learning is the process of identifying and
expressing in mathematical terms user preference on
suitability.
3. Similarity in Geospatial Information
Before we get into the specifics of our similarity learning
approach, let us first examine the information retrieval
process and the corresponding steps involved. Every
request for geospatial information involves a collection
of methods, some of which have been addressed extensively
in the literature while others are newly investigated.
In Figure 1, a schematic representation of the
query process is shown. The following steps take place:
1) Users request an information object from the database
(or more than one). For example a user may request
all buildings within a given area that are larger than 2000
m2 and within 1 km from a highway exit.
2) Their request is translated into a structured query
that the system understands and that is compatible with
the database collection. In this step the user-provided
information is matched to specific database fields and
content. For example, an ontology may be used to match the
query for “buildings”, a non-existent term in the database,
to “single-family detached houses”, an existent field in
the database, and therefore resolve ambiguity.
3) A query language is used as a mediator between
user and database. This step is essential to convert user
preferences into automated executable code for information
retrieval. One predominant example of such a
language is Structured Query Language (SQL).
4) Based on the user request an indexing mechanism is
used to return all potentially similar objects, in essence
filtering out dissimilar ones to accelerate the retrieval process.
One filtering example is to temporarily identify all
buildings larger than 2000 m2 and ignore all other buildings.
5) On this filtered object collection a similarity algorithm
with properties extracted from a knowledge base is
applied. The output is either a certain number of best
answers (e.g. 10 best datasets) or answers within a specific
similarity range (e.g. higher than 80%). For further
information see section 6 of this paper.
6) The results are presented to the users to assess their
similarity accuracy.
In the above information flow there are several areas
of interest that the database community is working on.
Various disciplines are involved in the process and many
different approaches have been proposed. Specifically,
large distributed information source repositories are created
and issues related to storing and accessing these
databases are investigated. Ontologies are introduced to
compensate for different field descriptions, as well as
multi-node architectures and theoretical database models
to support them. Query languages and indexing mechanisms
for faster information retrieval are developed.
Our work concentrates on step five of the previous list.
The goal is to develop a similarity algorithm that will
rank the results in an accurate way. In order to do so,
when a user is performing a geospatial information request,
some identification information of user preferences
is forwarded to a knowledge base (dotted arrow on
the graph) and the appropriate similarity profile is extracted
and incorporated in the query process.
The current methodologies used for similarity assessment
of geospatial information have a common characteristic:
they are non-adaptable to specific user preferences;
instead they are expressed as pre-defined similarity
measures and remain the same independently of
task/user requirements. Similarity calculation is performed
by storing geospatial information metadata as
points in the feature space and using a distance metric to
measure correlation to these points [5-7]. As mentioned
above, commonly used metadata information includes
expressions of resolution, accuracy, spatial extent, scale,
date, and source. Usually a Minkowskian p-distance [8]
is employed to define the similarity measure and is defined as:

$$L_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
For p = 2 we have the traditional Euclidean distance
metric. If p = 1 then the Minkowskian distance expresses
the Manhattan distance function. Another function is the
Quadratic distance that is a weighted form of the
multi-dimensional Euclidean. Other functions and corresponding
mathematical expressions can be found in [9].
The above functions provide a simple model that allows
efficient indexing by dimensionality reduction techniques.
On the other hand though, this simplicity makes
it impossible for these functions to take into account
complex patterns of preference of diverse users, and use
them to rank query responses accordingly.
Figure 1. Query processing for geospatial information access.
(Diagram labels: User, Request dataset, Formulate Query, Filter Sources,
Grid, Geospatial Information, Knowledge Base, Similarity Profile,
Calculate Similarity, Return best matches, Visualize Results.)
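As an illustration of the fixed distance metrics discussed above, the following minimal Python sketch computes the Minkowskian p-distance between metadata vectors and ranks candidate datasets by it. The function names, the dataset names, and the normalized metadata values are hypothetical and are not taken from any system described in this paper.

```python
from typing import Dict, List, Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """L_p distance between two metadata vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_distance(query: Sequence[float],
                     datasets: Dict[str, Sequence[float]],
                     p: float = 2.0) -> List[str]:
    """Return dataset names ordered from smallest to largest distance."""
    return sorted(datasets, key=lambda name: minkowski_distance(query, datasets[name], p))


# Hypothetical metadata vectors, already scaled to [0, 1]:
# (resolution, age, scale)
query = (0.2, 0.0, 0.5)
datasets = {
    "image_a": (0.2, 0.4, 0.5),
    "image_b": (0.5, 0.1, 0.5),
    "image_c": (0.9, 0.9, 0.1),
}
ranking = rank_by_distance(query, datasets, p=2)
```

Note that such a metric is symmetric: a dataset that is better than requested in some attribute is penalized exactly as much as one that is worse by the same amount, which is the limitation the remainder of the paper addresses.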
So why not develop adaptable similarity methods specifically
designed for geospatial databases? After all, this
has been an active field of research for decades in other
domains (e.g. text retrieval, web mining). The benefits of
such work are obvious, but is the task simple enough to
undertake? A major reason why adaptable similarity models
for geospatial information have not yet progressed significantly
comes from the considerable challenges imposed
by the nature of the problem.
4. Similarity Modeling for Geospatial Data:
Not that Easy after All
Adaptable similarity models for geospatial data impose a
dual difficulty. The first comes from the multiple disciplines
involved in the similarity learning task and it is
not unique to geospatial datasets. The second difficulty
rises from the particularities of the geospatial domain
and corresponding user needs. This uniqueness of the
geospatial domain is the focus of our attention.
4.1. The Interdisciplinary Nature of Similarity
Learning
Three general areas of research have been suggested in
[10], namely psychology, data mining and machine
learning. Briefly discussed here, the psychologists have
concentrated on the human understanding and expression
of similarity. Their research verifies that there is a subjective
parameter (i.e. user dependent), which has to be
addressed for successful similarity modeling. Building
on that, several tasks from the data mining field can be
borrowed to accomplish our goal, tasks such as classifi-
cation, regression, time series analysis and others. As
expected, similarity learning naturally also falls under
the general category of machine learning methods. The
influence from multiple disciplines such as statistics,
databases and artificial intelligence in machine learning
is well-documented in the literature. From the similarity
learning perspective an important distinction of machine
learning algorithms is between supervised and unsuper-
vised methods. Since most similarity learning algorithms
learn from example, in other words they need a supervi-
sor (teacher) to provide them some reference output val-
ues, they belong in the supervised category.
4.2. Particularities of the Geospatial Domain
Similarity learning in database queries is intrinsically
connected with the data types stored. Geospatial data
have important differences to online analytical processing
data, general multi-dimensional data, traditional relational
data or transactional data [11]. This uniqueness is
partially attributed to the integrative nature of GIS. Many
of the issues arise from the fact that geographic data span
a wide range of perspectives and interests from the social
to the physical aspects of the problem [12]. This mixture
of perspectives coupled with the growing infrastructure
for gathering information poses the following challenges:
1) Diverse data types. The wide variety of digital geo-
graphic data imposes a number of constraints/demands to
similarity learning algorithms. Distributed datasets are
becoming increasingly prevalent and important as a source
of geographically referenced data [13] and thus tend to
comprise a variety of geo-referenced multimedia data
types, such as still and video imagery, text, graphics, and
even audio and animations [14].
2) Dimensionality grouping and dependencies. Geospatial
databases tend to be high-dimensional, as for
example location information is accompanied by radio-
metric content, elevation data, ownership information,
and temporal records. It is important to note that among
these multiple dimensions we can recognize groups that
are highly related among themselves, but remain quite
different from other groups. For example, there exists a
high conceptual affinity among the three spatial dimen-
sions (x,y,z) as they are represented by similar structures
and often have comparable values, while there is an ob-
vious lack of such affinity among them and an alphanu-
meric ownership record. Accordingly, dimensions tend
to be grouped together in conceptual features (e.g. spatial
information, thematic attributes).
However, regardless of conceptual affinity, heteroge-
neous features may display high dependency among
them (e.g. space and time). This dependency needs to be
exploited when querying a database in order to recognize
for example complex spatiotemporal events and patterns.
Querying space and time separately would fail to ade-
quately address this inherent spatiotemporal complexity.
Similarly, the radiometric content of satellite imagery
may be highly correlated to sensor information. This
grouping of dimensions and the need to exploit
cross-grouping dependencies is another issue that differentiates
geospatial databases from other high-dimensional
ones.
3) Data volume. Like many disciplines where learning
algorithms are applied, GIS is rich in data. In addition
to traditionally considered geospatial databases (e.g.
maps, photographs), numerous other databases (e.g.
consumer, medical, and financial records) are now incorporating
spatial and temporal attributes and hence offer
the possibility of discovering or confirming geographical
knowledge [15]. As mentioned above, geospatial dataset
collections are now terabyte-sized, and traditional retrieval
methods have a hard time keeping up. Furthermore,
maintaining and evaluating these large amounts of
information is a major challenge, leading to frequent occurrence
of incomplete or missing data.
4) Complexities due to local variation. Earth systems
are so intrinsically interconnected that it is difficult to
isolate an analysis conducted on some part of a system
from the effects of other unmodeled aspects [8]. This
translates into potential generalization problems of simi-
larity algorithms. Measured geographic attributes often
exhibit the seemingly contradictory properties of spatial
correlation and spatial heterogeneity. The former (corre-
lation) refers to the tendency of attributes at some loca-
tions in space to be related, also known as Tobler’s first
law of geography [16]: “Everything is related to every-
thing else but nearby things are more related than distant
things”. However, and despite the effect of spatial correlation
on the major trends of spatial information, geographic
phenomena are often highly localized. Spatial
heterogeneity describes this non-stationarity of most
geographic processes, and expresses the fact that global
parameters do not necessarily describe well the localized
nature of some geographical phenomena.
5) Granularity. In most non-geographic domains,
data objects are meaningfully represented discretely
within the information space without losing important
properties [17]. But this does not seem to extend to geo-
graphic objects [18]: size, shape and boundaries can af-
fect geographic processes, therefore generalization can-
not be achieved without information loss in both raster
and vector representations. Scales and granularities for
measuring time are also complex, preventing a simple
“dimensioning up” of space to include time. Moreover
micro data, observations on individual observational
units, might not always be accessible, e.g. due to dis-
semination, confidentiality or cost constraints. Macro
data (aggregates of micro data) are used instead. Exam-
ples of macro data include counts, frequencies, sums,
averages and other statistics characterizing micro data.
5. Geospatial User Profiles
5.1. Motivation
Until now we defined desired characteristics for a simi-
larity learning algorithm. Similarity is typically calcu-
lated by comparing a stored set of values to the ones the
users query for. First each query value (attribute) is
compared to the corresponding stored one, for example
the time of a stored aerial photograph to the correspond-
ing query value for time, the scale of the stored aerial
photograph to the query and so on for every requested
attribute. Then results from this comparison expressing
similarity within every attribute (similarity in time, scale,
etc) are aggregated to provide an overall similarity met-
ric, a metric showing the overall similarity between the
query and the stored aerial photograph based on these
individual metrics from every attribute.
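The two-stage computation described above (a similarity value per attribute, followed by aggregation) can be sketched as follows. This is only an illustration: the linear per-attribute function, the tolerances, and the weighted-average aggregation are assumptions chosen for brevity, and all names and values are hypothetical.

```python
from typing import Callable, Dict

AttributeSimilarity = Callable[[float, float], float]


def linear_similarity(tolerance: float) -> AttributeSimilarity:
    """Symmetric similarity that falls linearly from 1 to 0 over `tolerance`."""
    def similarity(query_value: float, stored_value: float) -> float:
        return max(0.0, 1.0 - abs(query_value - stored_value) / tolerance)
    return similarity


def overall_similarity(query: Dict[str, float],
                       stored: Dict[str, float],
                       similarities: Dict[str, AttributeSimilarity],
                       weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute similarities (an assumed aggregation)."""
    total_weight = sum(weights[name] for name in similarities)
    weighted_sum = sum(
        weights[name] * similarities[name](query[name], stored[name])
        for name in similarities
    )
    return weighted_sum / total_weight


# Hypothetical query and stored aerial photograph metadata.
query = {"year": 2010.0, "scale_denominator": 10000.0}
stored = {"year": 2005.0, "scale_denominator": 15000.0}
similarities = {
    "year": linear_similarity(tolerance=10.0),                 # 5 years off -> 0.5
    "scale_denominator": linear_similarity(tolerance=20000.0), # 5000 off -> 0.75
}
weights = {"year": 1.0, "scale_denominator": 1.0}
score = overall_similarity(query, stored, similarities, weights)
```

The per-attribute functions are passed in as parameters so that a learned, user-specific function could replace the linear one without touching the aggregation step.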
Existing methodologies concentrate on multi-attribute
(i.e. multi-dimensional) similarity aggregation to provide
an overall similarity metric. In some cases though problem
complexity lies in the similarity calculation within
each dimension separately rather than in their combined
aggregation. This is frequently the case when querying
for GIS datasets. The information retrieval process might
fail because the individual similarity metrics in every
dimension may not be able to capture user similarity
preferences.
A common example of such similarity preference in
GIS is when asymmetric, non-linear user behavior is
exhibited during the direct comparison of attributes. For
example, let us consider a geospatial database and a user
request for an aerial image of specific ground pixel size
for building extraction. User interest decreases gradually
(but not necessarily linearly) as pixel size increases to the
degree that buildings would not be identifiable. Further-
more, the user may have cost considerations (e.g. cost,
storage and processing time) associated with a higher
resolution acquisition. This translates to a similarity relation
that can also be non-linear as resolution improves.
So it is easily understood that we need asymmetrical,
non-linear relations to model user preference within each
attribute comparison. Thus, in geospatial queries user
preferences may be significantly more complex than
general queries (e.g. text queries), while the diversity of
users and applications is further emphasizing the need
for efficient modeling. Therefore, modeling user similar-
ity preference within each attribute can substantially help
geospatial queries. Motivated by these observations, the
focus of our work is to investigate the application of
complex functions for user preference within each attrib-
ute. The integration of similarity results from multiple
attributes is part of our future work.
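To make the asymmetric, non-linear behavior in the pixel-size example concrete, the sketch below uses one sigmoid per side of the query value. It is a hand-written illustration rather than a trained function: the midpoints and steepness values are hypothetical, and normalizing each sigmoid so that an exact match scores 1 is an assumption made here for convenience.

```python
import math


def normalized_sigmoid(difference: float, midpoint: float, steepness: float) -> float:
    """Smooth fall-off that equals 1 at zero difference and tends to 0 for large ones."""
    value = 1.0 / (1.0 + math.exp(steepness * (difference - midpoint)))
    at_zero = 1.0 / (1.0 + math.exp(-steepness * midpoint))
    return value / at_zero


def pixel_size_similarity(query_m: float, stored_m: float) -> float:
    """Similarity of a stored ground pixel size (meters) to the requested one."""
    if stored_m > query_m:
        # Coarser than requested: buildings quickly become unidentifiable.
        return normalized_sigmoid(stored_m - query_m, midpoint=0.5, steepness=8.0)
    # Finer than requested: still usable, but cost and processing time
    # reduce interest more gradually.
    return normalized_sigmoid(query_m - stored_m, midpoint=0.8, steepness=4.0)


exact = pixel_size_similarity(1.0, 1.0)
finer = pixel_size_similarity(1.0, 0.5)
coarser = pixel_size_similarity(1.0, 1.5)
```

For the same 0.5 m deviation, the finer image scores higher than the coarser one, which a symmetric distance metric cannot express.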
5.2. A User Preference-Based Approach
In order to adapt similarity models to user preferences
we developed a relevance feedback algorithm. Users are
presented with a variety of pairs of requested and re-
turned values and are asked to provide a preference met-
ric for each pair. The corresponding training dataset is
created and used as input for our preference learning
method. Figure 2 shows a typical training session, where
the user is given the Query (X axis), and Database value
(Y axis) and is requested to provide a similarity assess-
ment of these two.
The result corresponds to the Similarity value (Z axis).
Figure 2. Training example.
The problem can easily be seen as a surface-fitting one,
where we attempt to substitute the provided three-dimensional
points with a surface (function). For training,
several preference models are used, as expressed
through a variety of fuzzy membership functions (FMFs).
The approach is simple yet effective: gradually increase
the complexity of the underlying FMF until an accept-
able solution is reached. The process begins by interpo-
lating a set of planes to the training dataset [19]. We
examine the resulting accuracy and if it is within the
predefined specifications we end the process. These pre-
defined specifications are in essence thresholds describ-
ing the maximum acceptable error between the interpo-
lated functions and the training points. They can be pre-
set by the database designer or adjusted in real-time by
the user. If the results are not within these thresholds, we
examine the obtained plane parameters. This analysis
leads to a decision whether similarity is dependent on the
query value, their difference metric or the actual database
and query values. We continue by interpolating two sig-
moidal functions whose initial approximations are calcu-
lated from the plane properties. If required accuracy is
not achieved, we provide further modeling capabilities
by further parameterizing the FMFs. At the
last stage we obtain the best possible set of FMFs that
express user preference as presented through the training
set. If accuracy is not yet achieved, we trigger a neural
network process to correct local errors. More information
on the training mechanism and the corresponding mod-
eling capabilities can be found in [19].
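The idea of increasing model complexity only until the error threshold is met can be sketched as below. This is not the training mechanism of [19]: only the first stage (a least-squares plane) follows the text directly, the second stage (one plane per half of the input space) is a simplified stand-in for the sigmoidal and later stages, and the synthetic training samples are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]   # (query value, database value, similarity)
Model = Callable[[float, float], float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    Assumes a non-singular system (non-degenerate training samples)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_plane(samples: List[Sample]) -> Model:
    """Least-squares plane z = a*q + b*d + c via the normal equations."""
    rows = [(q, d, 1.0) for q, d, _ in samples]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * s[2] for r, s in zip(rows, samples)) for i in range(3)]
    a, b, c = solve3(ata, atz)
    return lambda q, d: a * q + b * d + c


def fit_two_planes(samples: List[Sample]) -> Model:
    """One plane per half of the input space (database value below / not below query)."""
    below = fit_plane([s for s in samples if s[1] < s[0]])
    above = fit_plane([s for s in samples if s[1] >= s[0]])
    return lambda q, d: below(q, d) if d < q else above(q, d)


def max_error(model: Model, samples: List[Sample]) -> float:
    """Largest absolute deviation between the model and the training points."""
    return max(abs(model(q, d) - z) for q, d, z in samples)


def fit_with_increasing_complexity(samples: List[Sample], threshold: float):
    """Try models from simplest to most complex; stop at the first acceptable one."""
    stages = [("single plane", fit_plane), ("two planes", fit_two_planes)]
    for name, fit in stages:
        model = fit(samples)
        if max_error(model, samples) <= threshold:
            return name, model
    return name, model  # best effort; further stages would be attempted here


def synthetic_user(q: float, d: float) -> float:
    """Hypothetical user who tolerates values below the query more than values above it."""
    return 1.0 - 0.2 * (d - q) if d >= q else 1.0 - 0.05 * (q - d)


samples = [(float(q), float(d), synthetic_user(q, d)) for q in range(5) for d in range(5)]
stage, model = fit_with_increasing_complexity(samples, threshold=0.01)
```

With these synthetic samples a single plane cannot follow the asymmetric response, so the loop moves on and accepts the two-plane model.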
After the best possible set of functions is identified,
the mathematical properties of the model are stored in
the form of a profile. This profile can also contain a User
ID, and potentially comments/keywords that will allow
usability of the same profile from other users to avoid
retraining the system. For example, such keywords might
be general such as “Photogrammetrist” or “Biologist”, or
more task-specific such as “Airplane feature extraction”,
“Wetland evaluation”.
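A stored profile of this kind could be represented along the following lines. The field names, the JSON serialization, and the keyword lookup are assumptions made for illustration; the paper does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SimilarityProfile:
    """Hypothetical container for the parameters of a trained preference model."""
    user_id: str
    attribute: str                      # e.g. "pixel_size"
    model: str                          # e.g. "two sigmoids"
    parameters: Dict[str, float]
    keywords: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SimilarityProfile":
        return cls(**json.loads(text))


def find_profiles(profiles: List[SimilarityProfile], keyword: str) -> List[SimilarityProfile]:
    """Return profiles tagged with `keyword` so other users can reuse them."""
    wanted = keyword.lower()
    return [p for p in profiles if any(wanted == k.lower() for k in p.keywords)]


profile = SimilarityProfile(
    user_id="user-001",
    attribute="pixel_size",
    model="two sigmoids",
    parameters={"midpoint_coarser": 0.5, "steepness_coarser": 8.0},
    keywords=["Photogrammetrist", "Airplane feature extraction"],
)
restored = SimilarityProfile.from_json(profile.to_json())
matches = find_profiles([profile], "photogrammetrist")
```

Matching on keywords is what would let a new user start from an existing profile instead of retraining the system.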
To further demonstrate the applicability of the method
a representative example is presented below for a cadas-
tre/real estate application. More specifically, this sce-
nario investigates user preference of a geospatial attrib-
ute expressing parcel value per square meter. The func-
tion is composed of two sub-functions, each one applica-
ble in half of the input space (e.g. Xq > Xdb) to compen-
sate for asymmetrical cases. A result of this trained func-
tion can be seen in Figure 3.
Figure 4 shows similarity isolines (0% to 100% at the
graph floor) of the surface from Figure 3, in essence
combinations of query and database values that would
result in the same similarity value. In addition, two spe-
cific user queries are examined through the two slices,
for parcel value per square meter (PVSM) of $500/m2 (in
orange) and $3000/m2 (in green). Examination of these
two sections leads to two remarks:
1) The left side of each of the two sections examines
the case where the returned PVSM value (Xdb) is smaller
than the query PVSM value (Xq). Here the method is able
to express the gradual decrease of user’s interest. Note in
Figure 4 how user flexibility increases as the PVSM
query value becomes larger.
2) The right side of each of the two sections examines
the case where the returned PVSM value (Xdb) is larger
than the query PVSM value (Xq). From the two sections
it is evident that as the query PVSM value (Xq) increases
so does the user flexibility on the obtained results. More
specifically, when users request the retrieval of database
objects with $500/m2 PVSM they are less flexible in accepting
larger values than when querying for a $3000/m2
one.
Figure 3. Example of a user preference function.
Figure 4. Contour plot and query examples of this preference function.
6. Using Profiles in Queries
In order to demonstrate the applicability of our method,
let us consider the following scenario. The City of
Tempe had cameras installed to monitor its downtown
area. Numerous city agencies use this information for
their various needs. For example, let’s consider that im-
agery from these cameras is accessed by both the Police
and Transportation Departments. Let’s also assume that
they perform similar queries, using last year’s New
Year’s Eve imagery database to train personnel in an-
ticipation of this year’s celebrations. They are interested
in recovering an image of the downtown area at 12 mid-
night, to get a snapshot of the situation, so they form a
query to express this request. Even though they form the
same query, the execution of this query proceeds differ-
ently for these two agencies, making use of their prefer-
ences as they are expressed through corresponding pro-
files. Algorithm training is performed based on estab-
lished similarity preferences, and the corresponding
similarity profiles are shown in Figures 6 and 7 for the
Police and Transportation Departments, respectively. For
comparison we also present a generic profile in Figure 5.
By using these different profiles in the query
processing it is feasible to rank available imagery differently,
taking into account the different needs of each agency.
For example, the Police profile has the following main
characteristics:
• The time interval 11 pm - 12 am is of prime
importance, as this is the period with the highest crowd
concentration and overall activity.
• After 12 midnight interest begins dropping, as people
start leaving, but remains high until 3 am.
On the other hand, the Transportation profile has some
other characteristics:
• Its peak is around 12 am, when people (potentially
intoxicated) start leaving the area, posing a higher risk
of accidents.
• Early on, interest is increasing as we move from the
standard traffic patterns of 9 pm to higher traffic loads
by 10:45 pm.
• Interest drops between 10:45 pm and 11:15 pm, as by
that time people have already arrived, and thus vehicle
traffic is limited. It starts picking up again after 11:30
pm as a few people may be leaving earlier.
A sample of 5 images has been ranked to demonstrate
the effects of user preferences. This is shown in Figure 8.
For example, the image from 11:20 pm is ranked first
for the Police Department, even though it deviates from
the query request (midnight) by 40 minutes and another
image lies only 15 minutes away from the query
time (12:15 am). For the reasons mentioned above,
the 11:20 pm image is more suitable for this department’s
needs than the 12:15 am snapshot. Other rankings have
similar explanations based on the special preference
characteristics expressed through the corresponding
profile. Clearly, generic profiles cannot express such
diverse similarity preference patterns, which limits the
effectiveness of query-based information retrieval.
Figure 5. Generic similarity profile.
Figure 6. Police surveillance similarity profile.
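The ranking step described in this section can be sketched as follows. This is a minimal illustration, not the trained profiles of Figures 5 to 7: each profile is approximated by a hypothetical piecewise-linear curve over time (minutes relative to the midnight query), and all control points, names and scores are assumptions chosen only to mimic the qualitative characteristics listed above. The five image times are those appearing in Figure 8.

```python
def interpolate(points, t):
    """Linearly interpolate a profile given as sorted (minute, score) pairs."""
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


# Minutes relative to the query time (midnight = 0). All control points
# below are illustrative assumptions, not values from the paper.
PROFILES = {
    # Interest falls off symmetrically around the query time.
    "generic": [(-180, 0.0), (0, 1.0), (180, 0.0)],
    # Highest between 11 pm and 12 am, stays high until 3 am.
    "police": [(-180, 0.2), (-60, 1.0), (0, 1.0), (180, 0.7), (240, 0.1)],
    # Rises until 10:45 pm, dips until 11:30 pm, peaks around 12 am.
    "transportation": [(-180, 0.3), (-75, 0.6), (-45, 0.3), (-30, 0.3),
                       (0, 1.0), (60, 0.8), (180, 0.2)],
}

# Image acquisition times as offsets from midnight.
IMAGES = {"9:45 pm": -135, "11:20 pm": -40, "12:15 am": 15,
          "12:45 am": 45, "1:30 am": 90}


def rank_images(profile_name):
    """Return image labels sorted from most to least suitable."""
    profile = PROFILES[profile_name]
    return sorted(IMAGES, key=lambda label: interpolate(profile, IMAGES[label]),
                  reverse=True)


if __name__ == "__main__":
    for name in PROFILES:
        print(name, rank_images(name))
```

With these assumed control points, the generic profile places the 12:15 am image first because it is closest to midnight, while the police profile places the 11:20 pm image first, as discussed above. The remaining positions depend entirely on the assumed curves and should not be read as a reproduction of Figure 8.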
7. Conclusions
Geospatial datasets are becoming increasingly multifunc-
tional, as different users may be using the same dataset
for different applications. Accordingly, the successful
functional integration of such datasets in federated
geospatial databases depends upon the ability to meet the needs
of expanding and diverse user communities. Therefore,
the development of efficient information retrieval meth-
ods to support the diverse and complicated preference
patterns of different users and/or applications is a crucial
task for the geoinformatics community.
Figure 7. Traffic monitoring similarity profile.
In this paper we presented an approach to meet this
need through the introduction of user profiles of varying
complexity to model the requirements of different classes
of users when attempting to recover specific geoinforma-
tion. Intelligent systems can assist geospatial queries to
improve retrieval accuracy by customizing results based
on preference patterns. The profiles may vary in their
complexity, thus capturing the underlying preference in-
tricacies that differentiate user groups (e.g. the needs of a
transportation expert versus the ones of a police author-
ity).
As presented in this paper, our method emphasizes
preference modeling within specific attributes (e.g. pref-
erences in time, scale, resolution). Our future plans in-
clude the extension of this work to aggregate these indi-
vidual components into composite multidimensional user
profiles. Depending on the application range of a specific
government agency, these composite profiles may reflect
preferences of a single analyst or of a broader unit with a
specific mission and modus operandi.
While user preference profiles were introduced in this
paper as a tool to support information retrieval tasks,
they also encapsulate operational knowledge: they are
expressions of a user’s typical tasks and processes. Ac-
cordingly, we can recognize a very intriguing indirect
benefit of our approach, namely the ability to identify
similarities in user communities that may be operation-
ally different. For example, by comparing user profiles
between groups of analysts from an environmental and
an emergency response agency we may reach the con-
clusion that they have comparable preferences and tend
to perform similar tasks. This information can be used
for operational alignments across different units/agencies.
Furthermore, preference profiles may be used to priori-
tize data collection and information acquisition needs.
Types of datasets that exhibit high similarity preference
across numerous profiles should be updated more
frequently than others with lower priority. Combined with
the above-mentioned capability to identify clusters of
users with similar needs and preferences across agencies,
this would provide crucial support for the reconfiguration
of government resources to best address evolving needs
and emerging challenges.
Figure 8. Effects of profiles on geospatial query results (images from www.tempe.gov).
[Figure 8 panels: Generic Profile, Police Profile and Transportation Profile, each showing ranking results 1 to 5 for the images taken at 9:45 pm, 11:20 pm, 12:15 am, 12:45 am and 1:30 am.]
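The profile comparison suggested in these conclusions can be sketched in a few lines. The paper does not prescribe a comparison measure, so the sketch below assumes that each profile has been sampled on a common grid of attribute values and uses one minus the mean absolute difference as a simple, hypothetical similarity score. The agency names and sampled values are invented for illustration.

```python
def profile_similarity(a, b):
    """Similarity in [0, 1] between two profiles sampled on the same grid.

    Each profile is a list of preference scores in [0, 1]. The measure
    (1 - mean absolute difference) is an illustrative assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical sampled profiles for three user groups.
environmental = [0.2, 0.6, 1.0, 0.6, 0.2]
emergency = [0.3, 0.7, 1.0, 0.5, 0.2]
archival = [1.0, 0.8, 0.4, 0.2, 0.1]

if __name__ == "__main__":
    print(profile_similarity(environmental, emergency))
    print(profile_similarity(environmental, archival))
```

In this invented example the environmental and emergency groups score as much more similar to each other than either does to the archival group, which is the kind of signal that could prompt a closer look at operational alignment between units.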
8. References
[1] M. F. Goodchild, “Citizens as Sensors: The World of
Volunteered Geography,” GeoJournal, Vol. 69, No. 4,
2007, pp . 211-221. doi:10.1007/s10708-007-9111-y
[2] V. M. Megler and D. Maier, “Finding Haystacks with
Needles: Ranked Search for Data Using Geospatial and
Temporal Characteristics. Scientific and Statistical Data-
base Management,” Scientific and Statistical Database
Management, Vol. 6809, 2011, p p. 55-72.
doi:10.1007/978-3-642-22351-8_4
[3] D. Sui, “The Wikification of GIS and Its Consequences:
Or Angelina Jolie’s New Tattoo and the Future of GIS,”
Computers, Environment, and Urban Systems, Vol. 32,
No. 1, 2008, pp. 1-5.
doi:10.1016/j.compenvurbsys.2007.12.001
[4] S. Liu and A. Iacucci, “Crisis Map Mashups in a Partici-
patory Age,” American Congress on Surveying a nd Map -
ping Bulletin, 2010, pp. 10-14.
[5] D. W. Aha, D. F. Kibler and M. K. Albert, “Instance-
Based Learning Algorithms,” Machine Learning, Vol. 6,
No. 1, 1991, pp. 37-66 . doi:10.1007/BF00153759
[6] W. Cheng and E. Huellermeller, “Combining Instance-
Based Learning and Logistic Regression for Multilable
Classification,” Machine Learning, Vol. 76, No. 2-3,
2009, pp . 211-225. doi:10.1007/s10994-009-5127-5
[7] P. Cunningham, “A Taxonomy of Similarity Mechanisms
for Case-Based Reasoning,” IEEE Transactions on
Knowledge and Data Engineering, Vol. 21, No. 11, 2009,
pp. 1532 -1543. doi:10.1109/TKDE.2008.227
[8] B. Batchelor, “Pattern Recognition: Ideas in Practice,”
New York Plenum Press, New York, 1978, pp. 71-72.
[9] D. R. Wilson and T. R. Martinez, “An Integrated In-
stance-Based Learning Algorithm,” Computational Intel-
ligence, Vol. 16, No. 1, 2000, pp. 1-2 8.
doi:10.1111/0824-7935.00103
[10] G. Mountrakis, P. Agouris and A. Stefanidis, “Similarity
Learning in GIS: An Overview of Definitions, Prerequi-
sites and Challenges,” In: M. Vassilakopoulos, A. Papa-
dopoulos and Y. Manolopoulos, Eds., Spatial Databases:
Technologies, Techniques and Trends, Idea Group Inc.,
Calgary, 2004, pp. 294-321.
doi:10.4018/978-1-59140-387-6.ch013
[11] D. Gunopulos, “Data Mining Techniques for Geospatial
Applications,” National Academies White Paper, 2001 .
[12] M. Gahegan, “Intersection of Geospatial Information and
Information Technology,” National Academies White
Paper, 20 01.
[13] National Research Council, “Distributed Geolibraries:
Spatial Information Resources,” National Academy Press.
Washington, DC, 1999.
[14] A. S. Camara and J. Raper, “Spatial Multimedia and Vir-
tual Reality,” Taylor & Francis, London , 19 99.
[15] H. J. Miller and J. Han, “Geographic Data Mining and
Knowledge Discovery: An Overview,” In: H. J. Miller
and J. Han, Eds., Geographic Data Mining and Knowl-
edge Discovery, Taylor and Francis, London, 2001.
doi:10.4324/9780203468029_chapter_1
[16] W. Tobler, “Cellular Geography,” In: S. Gale and G.
Olsson, Eds., Philosophy in Geography, Reidel, Dortrecht,
1979, pp . 379-38 6.
[17] M. Yuan, B. Buttenfield, M. Gahegan and H. Miller,
“Geospatial Data Mining and Knowledge Discovery,” A
UCGIS White Paper on Emergent Research Themes,
2001.
http://www.ucgis.org/emerging/
[18] J. Lin, Y. Fang, W. Zhang and Z. Huang, “Fundamental
Aspects of Access Control for Geospatial Data,” Interna-
tional Journal of Digital Earth, Vol. 2, No. 3, 2009, pp.
275-289. doi:10.1080/17538940902818329
[19] G. Mountrakis and P. Agouris, “Learning Similarity with
Fuzzy Functions of Adaptable Complexity. 8th Interna-
tional Symposium on Spatial and Temporal Databases,”
Lecture No tes in Comput er Science, Vol. 2750, 2003, pp.
412-429. doi:10.1007/978-3-540-45072-6_24
Copyright © 2011 SciRes. JGIS