When integrating geo-spatial datasets, layers sometimes fail to overlay each other perfectly. In most cases, the cause of misalignment is the cartographic variation of objects forming features in the datasets. This could be due to actual changes on the ground or to the collection and storage approaches used, leading to overlaps or openings between features. In this paper, we present an alignment method that uses adjustment algorithms to update the geometry of features within a dataset or in complementary adjacent datasets so that they align to achieve perfect integration. The method identifies every unique spatial instance in the datasets and the spatial points that define its geometry; the differences are compared and used to compute the alignment parameters. This provides a uniform alignment of geo-spatial features that takes into consideration changes in the different datasets being integrated without affecting the topology and attributes.
The appreciation of geo-spatial information by information system managers and users as the basis for location-based decision-making has created a need for approaches to integrating geo-spatial datasets. Such integration drives the vision of common data storage, which increases the availability and accessibility of already captured geographic information through exchange and sharing. Most geo-spatial information systems use map layers to organize the geographical and geo-spatial objects forming features in datasets. Each layer describes a certain aspect of the modeled real world, e.g. roads, buildings, forest, etc. This provides a natural technique to organize and visualize data from different sources, making it an efficient way of data storage, manipulation, and analysis [
Features, especially on the earth’s surface and in land use, are constantly changing, so there is a continuous need to update, adjust, and align the objects forming features on different layers in geodatabases. These layers are usually updated separately by individuals or organizations concerned with certain aspects and locations. However, storing map layers separately makes it difficult to directly solve topological queries that relate to features belonging to many different layers [
To overcome that, object-based geometry adjustment algorithms [
Geo-spatial datasets are captured using different methods, instruments, reference systems, and geodetic datums, which causes the datasets to vary. Thus, different operations and many algorithms exist for clipping and for finding the intersection between two datasets. Some focus on merging similar geometric objects [
For the approaches that handle combined primitives, we find a large number of research efforts in this domain: 1) approaches dealing with geo-spatial object matching, and 2) methods for geo-spatial object intersection or database updating. Different schemes have been proposed: a) layer overlay approaches that use algorithms for different geo-spatial data merging tasks, and b) more specific ideas that use individual geo-spatial object adjustment strategies to achieve a fully integrated solution. The latter is the focus of this work, which deals with geometry alignment based on current approaches like Boolean polygon matching [
There are two common spatial data models for geo-spatial data storage: raster and vector. In the alignment method, we focus on vector, which is the use of directional lines to represent a geographic feature. Several different vector data models exist; however, only two are commonly used in GIS data storage: computer-aided drafting (CAD) and topologic data structures. The focus is on the topologic model since it maintains spatial relationships among features. Three file types are considered: 1) shapefile, as it has been around since the early 1990s and remains one of the most common data transfer formats, 2) GML, as it is text-based and human-readable, and 3) SpatiaLite, as it uses a single file and is able to store geometries and query them with spatial functions similar to those found in geodatabases like PostGIS.
As we developed the method, we linked topological data structures to the requirements of a geo-spatial framework:
1) The ideal method for improving data usability should be based on an object-oriented data model [
2) Geo-spatial data from different sources with varying scales must be able to be mapped to the same standards, data model, projection, and representation [
3) The relationships among objects should be modeled, and integration of different geo-spatial datasets must handle the three dimensions of integration: horizontal (adjacency), vertical (overlay), and temporal (time) [
The alignment method uses these three characteristics: objects forming features are used as the modeling unit, based on the primary spatial primitives (points, polylines, and polygons), to correct geometrical inconsistency by updating and adjusting objects so that they align in the three dimensions (horizontal, vertical, and temporal) of spatial data. This is done by taking into consideration the desirable characteristics of information systems: being true, up-to-date, standard, flexible, concise, in the desired form, and sufficient for the needs of users, including their need to share [
Geo-spatial data alignment approaches are categorized into global and local/individual. Global methods assume that all features on a layer can be aligned using the same parameters. Methods like “automatic image-map alignment problem using a similarity measure named edge-based code mutual information” [
For vector GIS data alignment, there are certain requirements that have to be fulfilled:
• maintain the meaning of the shapes—this calls for keeping the attributes;
• maintain the relationship between objects—this is handled under topology;
• separate data into layers for easy modeling and analysis—need to keep objects on layers during alignment.
We added the following to guide the development of the alignment method:
• it should be possible to handle individual objects on a layer or parts of objects;
• able to handle points, lines, and polygons or any combination of them.
There are requirements and conditions that must be fulfilled and should exist in datasets during and after alignment. These are grouped into four categories: dataset merging requirements, geo-spatial complementary alignment, dataset transformation requirements, and aligned dataset characteristics.
During integration, there are certain dataset merging requirements that need to be satisfied to have a proper GIS vector dataset or layer.
• There should be no slivers (small unwanted objects) that result from objects intersecting during the merging of datasets. If slivers do appear, they should be adjusted during alignment instead of using clean or removal algorithms to eliminate them;
• There should be no danglings (meaningless points and lines) in the final aligned dataset. Nodes should only exist at the end of lines or at intersection of lines. Vertices should only be along a line where there is change of direction;
• Merging should take place on datasets in the same projection, scale, and datum;
• Merging should be based on the objects forming features and the primary attribute used for identification.
We introduce the term “geo-spatial complementary alignment” to define three different situations that can exist during geometry adjustment and alignment between the Adjustment Dataset (AD) and the Reference Dataset (RD). AD is the dataset that needs updating or has objects that need to be adjusted, while RD is the dataset used as reference when computing the adjustment and alignment parameters:
1) Single Forward Alignment (SFA): where RD has all the details required to update AD and only comparison with AD is needed to compute alignment parameters. Let us take an example of two datasets—AD having residential plots and RD with both residential plots and houses on those plots. If RD has all recent information on houses needed for AD, then the updates for SFA will be applied where values from RD are transferred to AD and nothing is brought backwards to RD.
2) Single Complementary Alignment (SCA): RD does not have all the required details to update and adjust objects in AD. This means some details from AD are needed to supplement those coming from RD before the final alignment of AD can be achieved, even when the dataset of interest is AD. For example, RD has recent information on houses and AD has proper demarcation of land plots for the houses. The two are needed in order to get an updated dataset having plots with properly aligned and corresponding houses.
3) Two-way Complementary Alignment (TCA): Neither RD nor AD has all the required details to stand alone, and both need to be updated. This means that details from each are needed to update the other for both to be useful. For example, AD has recent information on houses and RD has proper demarcation of land plots for the houses, but we need both datasets to be updated. Another example is where RD and AD are adjacent, but the two need to be updated to obtain a perfect boundary without creating openings and overlaps.
There are dataset transformation requirements that the resulting datasets should satisfy including:
• Objects and features can maintain their meaning (primary attributes);
• Transformation can change the meaning of object if it is needed;
• Transfer of primary attribute can be between two different datasets;
• The relationship between objects should be kept;
• Transformation can move datasets from one projection to another;
• Transformation can change geometry primitive type if needed;
• Coordinates of objects can be changed during transformation;
• Coordinate systems can be changed during transformation;
• Transformations can move datasets from one datum to another;
• Transformations can change the number of objects in layer or dataset through addition or deletion.
The characteristics of aligned datasets include the following:
• shapes having meaning (primary attribute);
• relationships between objects maintained;
• data able to be separated into layers for easy modeling and analysis;
• able to identify individual objects or parts of objects on a layer;
• no slivers and danglings;
• final projection being that of the reference dataset.
The object-based geometry adjustment algorithms [
The first task is reading the geo-spatial datasets involved in the alignment as layers: the reference dataset (RD) that will provide alignment values and the adjustment dataset (AD) that has objects to be aligned. The user specifies the AD and RD and identifies the primary attribute (PA), which carries the meaning of features. As the data is read in, the data structures are compared to check the geometry type of objects (points, lines, polygons) before storing them in matrices according to their geometry type. This is vital to ensure that the same geometry types are used when comparing, updating, adjusting, and aligning datasets.
Data preparation involves putting the data into the same projection, cleaning and removing unnecessary geometries, matching the corresponding objects in the datasets, determining differences, and making sure that the geometries being worked on are of the same type: point with point, line with line, and polygon with polygon. This is achieved by algorithms based on geometric, topological, or semantic matching [
Alignment method components and flowchart
If the difference obtained is zero, the layers are the same and the objects match. If the result is not zero, the two datasets vary, and the sign of the difference indicates the direction of the geometrical adjustment, that is, whether to subtract or add. It also helps in identifying the objects that are causing the differences. A positive value means the first dataset has bigger or more geometry components, so either geometry objects have to be reduced in the first dataset or more objects have to be added to the second dataset. A negative value means there are more or bigger objects in the second dataset, so objects have to be added to the adjustment dataset if the requirement is only to update, or reduced in the second (reference) dataset in the complementary case.
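The sign rule can be illustrated with a minimal Python sketch. It assumes the difference is a plain count of objects (first dataset minus second), which is only one possible reading of the description; the method may equally compare sizes or vertex counts. The function name and the returned labels are hypothetical.

```python
def adjustment_direction(n_first, n_second):
    """Return (difference, action) for two object counts.

    Assumption: the difference is the first-minus-second object count.
    """
    diff = n_first - n_second
    if diff == 0:
        return diff, "match"
    if diff > 0:
        # First dataset has more components.
        return diff, "reduce first or add to second"
    # Second dataset has more components.
    return diff, "add to first or reduce second"


# Nakawa example: AD (first) holds 1 object, RD (second) holds 23.
difference, action = adjustment_direction(1, 23)
```

For the Nakawa data the difference is negative, which matches the later step in which objects are added to AD.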
To apply the above, the method takes advantage of the way geo-spatial datasets are organized according to meaning (themes). Each theme is stored independently in a layer, for example a road layer holding types of roads (like highways, avenues) or a building layer holding different buildings (like houses, plazas, gatehouse, arcades, etc.). This provides a natural way and technique to organize and visualize data from different sources, making it an efficient way of data storage, manipulation, and analysis [
During the processing, the requirements vary from one dataset to another. That is why each algorithm is able to run and be called upon to act independently depending on the complementary situations and adjacent requirements.
The method translates and decodes the geometry shape into text by reading and creating the data structure and then storing the text in a matrix. In MATLAB, the function shaperead handles the reading of layers and populates the matrices; for example, S = shaperead('nakawa.shp') reads the nakawa layer and keeps the matrix in the variable S. The alignment method creates a data structure if needed, for example using the struct function in MATLAB: S = struct('Geometry', 'Line', 'BoundingBox', [0 0; 3 3], 'X', [1 2 2 1 1], 'Y', [1 1 2 2 1]). To reference and work on a particular spatial point in a matrix, its row and column numbers are specified in the matrix variable S, row first and then column: S(row, column). The number of objects inside the structure is computed and is used to determine the number of iterations to perform on the structure during the alignment process. Since the x-y coordinates for points, vertices, and nodes of all objects forming features on a layer are read, geometry adjustment takes place by changing the x-y values, which are handled at their respective indexes as variables to ensure that the objects and shapes can be reconstructed.
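To make the indexing idea concrete, here is a minimal sketch in Python rather than MATLAB (Python being the language named for the planned plugin). It mirrors the struct fields above as a dictionary; the helper `move_vertex` is hypothetical and not part of the method as published. Note that Python indexes from 0, whereas MATLAB indexes from 1.

```python
# Python counterpart of the MATLAB struct described in the text.
feature = {
    "Geometry": "Line",
    "BoundingBox": [[0, 0], [3, 3]],
    "X": [1, 2, 2, 1, 1],
    "Y": [1, 1, 2, 2, 1],
}
layer = [feature]  # a layer is modeled here as a list of such structures


def move_vertex(layer, obj_index, vertex_index, new_x, new_y):
    """Overwrite one vertex in place and refresh the bounding box."""
    obj = layer[obj_index]
    obj["X"][vertex_index] = new_x
    obj["Y"][vertex_index] = new_y
    obj["BoundingBox"] = [[min(obj["X"]), min(obj["Y"])],
                          [max(obj["X"]), max(obj["Y"])]]


# The number of objects determines the number of iterations.
for i in range(len(layer)):
    move_vertex(layer, i, 1, 3, 1)  # second vertex moves from (2, 1) to (3, 1)
```

Because only the coordinate lists change, the shape can be rebuilt from the same structure after the edit.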
The method takes advantage of editing geometries in text form, and the following capabilities are provided:
1) Creation of points, lines, and polygons;
2) Moving points, lines, and polygons;
3) Deleting points, lines, and polygons;
4) Inserting, moving, and deleting vertices;
5) Combining and exploding of geometries to and from datasets.
The alignment method runs in such a way that it loads the various adjustment algorithms in different combinations to provide the needed geometry alignment. Different functions in existing algorithms were extracted and grouped into the following sub-algorithms that form the alignment method:
1) Reading the datasets and identifying geometry type;
2) Carrying out dataset to dataset referencing and deciding on type of adjustment;
3) Making the number of objects the same in the two datasets;
4) Updating, adjusting, and aligning the geometries using coordinate values;
5) Writing the aligned dataset onto disk.
To put the method into action, we used Nakawa shapefiles having many but varying objects. The data sources were KCCA (Kampala City Council Authority) and UBOS (Uganda Bureau of Statistics) representing Nakawa division of Kampala City in Uganda as shown on
From the above figure, we are able to see the size and shape of objects but not their structural details (geometry type, location, number of objects, and attributes). The KCCA dataset, used as RD (on the left), has smaller subdivisions called parishes, and the one from UBOS, used as AD (on the right), represents Nakawa as one solid object. Using MATLAB (or any data structure details viewer), we extracted the data structure (see
In the tables, we have the file name characters (Filename), the type of geometry objects inside (ShapeType), the location extent of the dataset (BoundingBox), the number of objects inside (NumFeatures), and the number of attributes associated with each structure (Attributes). The dataset from KCCA (
From the table, the two datasets share the same easting (x-coordinate) range but lie at different positions along the northing (y-coordinate) axis. Putting that on the x-y axes, we get
Further analysis shows that the two datasets cover the same extent of 9057 by 13755 meters on the earth’s surface; they have the same x-values but lie at different locations because of their differing y-values. This is common for datasets that have been captured and stored using different systems. For the two datasets to lie in the same location with the same bounding box, the method carries out dataset-to-dataset referencing, which we term “Dataset Referencing”. This is achieved by translating AD through 9,800,000 meters (the difference between the y-values as computed in
The next step was to deal with the number of objects in the datasets, from
Nakawa from KCCA (left) and UBOS (right)
Location of bounding boxes for Nakawa
Data structure details of Nakawa data from KCCA
Field | Value | Min | Max |
---|---|---|---|
Filename | <3 × 57 char> | | |
ShapeType | Polygon | | |
BoundingBox | [4.5388e+05, 10031395; 4.6294e+05, 10045150] | 4.5388e+05 | 10045150 |
NumFeatures | 23 | 23 | 23 |
Attributes | <18 × 1 struct> | | |
Data structure details of Nakawa data from UBOS
Field | Value | Min | Max |
---|---|---|---|
Filename | <3 × 58 char> | | |
ShapeType | Polygon | | |
BoundingBox | [4.5388e+05, 231395; 4.6294e+05, 245150] | 231395 | 4.6294e+05 |
NumFeatures | 1 | 1 | 1 |
Attributes | <8 × 1 struct> | | |
Nakawa geometry bounding box
Dataset | Lower left corner X | Lower left corner Y | Upper right corner X | Upper right corner Y |
---|---|---|---|---|
Dataset from KCCA | 453879.312 | 10031395 | 462936.3437 | 10045150 |
Dataset from UBOS | 453879.312 | 231395 | 462936.3437 | 245150 |
Difference in Coordinates | 0.0 | 9800000 | 0.0 | 9800000 |
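The dataset referencing step can be reproduced from the bounding-box values in the table. The following Python sketch assumes that referencing is a pure translation by the difference between the lower-left corners, as the 9,800,000 m figure in the text suggests; the function names are hypothetical.

```python
# Bounding boxes from the table: ((x_min, y_min), (x_max, y_max)), in meters.
RD_BBOX = ((453879.312, 10031395.0), (462936.3437, 10045150.0))  # KCCA
AD_BBOX = ((453879.312, 231395.0), (462936.3437, 245150.0))      # UBOS


def referencing_offset(rd_bbox, ad_bbox):
    """Offset (dx, dy) that moves AD's lower-left corner onto RD's."""
    dx = rd_bbox[0][0] - ad_bbox[0][0]
    dy = rd_bbox[0][1] - ad_bbox[0][1]
    return dx, dy


def translate(xs, ys, dx, dy):
    """Shift every coordinate by the same offset (a rigid translation)."""
    return [x + dx for x in xs], [y + dy for y in ys]


dx, dy = referencing_offset(RD_BBOX, AD_BBOX)
# Translate AD's two bounding-box corners as a stand-in for its vertices.
ad_x, ad_y = translate([AD_BBOX[0][0], AD_BBOX[1][0]],
                       [AD_BBOX[0][1], AD_BBOX[1][1]], dx, dy)
```

After the translation, the AD corners coincide with the RD corners, so both datasets occupy the same bounding box.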
This is done by handling one object in RD at a time. The process involves reading the object’s details that are attached to points (in this case, the vertices along the segments that make up the edges of polygons) and transferring them into AD. This is done by inserting the x-y values and corresponding attributes into the matrix holding the AD data structure. The process continues until all objects in RD and their attributes are read and transferred. As a result, AD has the same number of objects as RD, and its attribute matrix grows according to the number of copied objects. This alignment process accomplishes the updating and adjustment of AD.
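A minimal Python sketch of this object-by-object transfer follows. It assumes that the transferred objects supersede the original AD outline, which the text implies by stating that AD ends up with the same number of objects as RD but does not spell out. The dictionary layout, the function name, and the sample coordinates are hypothetical.

```python
import copy


def transfer_objects(rd_layer):
    """Build the updated AD by copying every RD object, one at a time."""
    updated_ad = []
    for obj in rd_layer:
        updated_ad.append({
            "X": list(obj["X"]),  # copy vertex x-values
            "Y": list(obj["Y"]),  # copy vertex y-values
            "attributes": copy.deepcopy(obj["attributes"]),
        })
    return updated_ad


# Two made-up RD polygons (coordinates and names are illustrative only).
rd = [
    {"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0], "attributes": {"name": "A"}},
    {"X": [1, 2, 2, 1, 1], "Y": [0, 0, 1, 1, 0], "attributes": {"name": "B"}},
]
ad = transfer_objects(rd)
```

Copying rather than sharing the lists keeps RD unchanged when AD is adjusted later.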
The final x-y values obtained after the updating and adjustment actions are used to replace the x-y coordinates of objects and are fed back into the matrix of the data structure. Note that only the x-y values of the components of the data structure are replaced in the matrix. This helps to maintain the attributes and the topology/relationships between the objects in the dataset.
The process continues for each vertex and for each object in the dataset being aligned to get a list of values as:
List of x values (Xuv1, Xuv2, Xuv3, Xuv4… Xuvn)
List of y values (Yuv1, Yuv2, Yuv3, Yuv4… Yuvn)
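Writing such lists back into an object can be sketched in Python as below. The helper `write_back` and the sample object are hypothetical, and the check that the vertex count stays the same is an assumption of this sketch rather than a stated rule of the method.

```python
def write_back(obj, new_x, new_y):
    """Replace an object's coordinates; attributes are left untouched."""
    # Assumption of this sketch: coordinate replacement keeps the vertex count.
    if len(new_x) != len(obj["X"]) or len(new_y) != len(obj["Y"]):
        raise ValueError("vertex count must not change")
    obj["X"], obj["Y"] = list(new_x), list(new_y)
    return obj


# Illustrative object; the attribute value is made up.
obj = {"X": [0.0, 1.0, 1.0], "Y": [0.0, 0.0, 1.0],
       "attributes": {"parish": "example"}}
attributes_before = dict(obj["attributes"])

# Updated values (Xuv1..Xuvn, Yuv1..Yuvn) produced by the adjustment step.
x_uv = [0.0, 1.0, 1.0]
y_uv = [10.0, 10.0, 11.0]
write_back(obj, x_uv, y_uv)
```

Only the X and Y entries change, so the attribute record stays attached to the same object.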
After changing the x-y values, the method compares the attributes/meaning by prompting the user to identify the primary attribute for each object. If the user decides to add more attributes, they indicate so. The method proceeds by reading the attributes from the matrix and appending them to the attributes in the data structure of the target dataset.
The aligned objects are transformed from the matrix format into the vector layer and written to the disk. For the case of shapefiles, three files for each shapefile are created with the same base name but varying file extensions. The extensions are .dbf (attribute format—columnar attributes for each shape, in dBase IV format), .shp (shape format—stores the geometry of the objects), and .shx (shape index format—a positional index of the object geometry to allow seeking forwards and backwards quickly). For example a shapefile of districts will have districts.dbf, districts.shp, and districts.shx files.
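As a small illustration of the file naming, the following Python sketch lists the three component files for a given base name using the standard pathlib module. The function name is hypothetical, and the sketch only derives file names; it does not write shapefile contents.

```python
from pathlib import Path

# The three mandatory components named in the text.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf")


def shapefile_components(base):
    """List the three files that together make up one shapefile."""
    return [Path(base).with_suffix(ext) for ext in SHAPEFILE_EXTENSIONS]


names = [p.name for p in shapefile_components("districts")]
```

A check like this can be used after writing to confirm that all three files exist before the aligned layer is loaded elsewhere.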
After the alignment process, a point removal algorithm or bend simplification algorithm may be applied if there is a need to reduce the number of points or the storage, or to achieve line generalization [
Testing was carried out using different datasets and conditions as described under the requirements of the alignment method, whereby the adjustment dataset (AD) was updated and its objects adjusted so that they align using the corresponding reference dataset (RD) values that were fed into the method, as demonstrated in Section 4.3 using the Nakawa dataset.
We have shown that geo-spatial data integration can be effectively carried out by incorporating geometry alignment to update and adjust one dataset with changes from another dataset or a known source. This can be done by using the spatial geometry objects that are manipulated to define all geo-spatial data elements. With this, we obtain a uniform alignment that avoids the slivers and danglings commonly created during data merging due to overlaps, openings, and overshoots among geometries of features on layers as a result of variations in data capture, storage, and manipulation approaches. This supplements effective integration of data from a variety of sources, which contributes to increased understanding and informed decision-making about actions taking place on earth through answering complex questions in geo-spatial information systems. The method was tested on actual geo-spatial datasets, and an analysis showed that the resultant datasets met the requirements of topological vector GIS datasets.
To facilitate day-to-day use by GIS practitioners, we plan to convert the method into an application or extension written in Python that can be used as a plugin in QGIS. This will put all the functionalities into a QGIS menu such as “Geometry Alignment”, with the different functions accessed by QGIS users through its submenus. The functionalities will come from the different independent algorithms that make up the method. The QGIS plugin will include help that links to and explains all functionalities, and a well-documented user guide detailing how to carry out all tasks on actual data.