Automatic road detection in dense urban areas is a challenging application in the remote sensing community, mainly because of the physical and geometrical variations of road pixels, their spectral similarity to other features such as buildings, parking lots, and sidewalks, and obstruction by vehicles and trees. These problems are real obstacles to the precise detection and identification of urban roads from high-resolution satellite imagery. One promising strategy to deal with this problem is using multi-sensor data to reduce the uncertainties of detection. In this paper, an integrated object-based analysis framework was developed for detecting and extracting various types of urban roads from high-resolution optical images and Lidar data. The proposed method is designed and implemented using a rule-oriented approach based on a masking strategy. The overall accuracy (OA) of the final road map was 89.2%, and the kappa coefficient of agreement was 0.83, which shows the efficiency of the method under different conditions and interclass noise. The results also demonstrate the high capability of this object-based method for the simultaneous identification of a wide variety of road elements in complex urban areas using both high-resolution satellite images and Lidar data.
Due to the importance of geospatial information of road networks in urban and suburban areas, during the last decades, a large number of prominent research works have been performed on automatic road detection [
In research by Ying et al. [
For road detection, panchromatic or multi-spectral images alone, especially in urban areas, yield ambiguous results due to the additional complexities. For example, in an aerial photo or a high-resolution satellite image, roads and buildings appear similar because their construction materials are usually the same. As a result, they cannot be readily distinguished [
Two data sets from a region in the city of San Francisco were used in this research. One dataset contains QuickBird's four bands of R, G, B, and NIR with a ground pixel size of 2.4 m (
The major steps of the proposed method are demonstrated in
To prepare Lidar data, three preprocessing operations have been applied as follows:
1) Filtering: this step is mainly designed to eliminate noise from the raw data and to enhance the quality of the Lidar data. An octree filter was used for noise reduction, which also minimizes the required storage space for these data. In this filtering, a cube is initially fitted to the overall 3D space. This cube is then recursively divided into eight smaller cubes until a predefined threshold is reached. Afterward, all points in the smallest cube are removed and replaced by a new point at the centroid of these previous points [
2) Triangulation: after filtering the Lidar point cloud, these data are entered into a triangulation process. Lidar data can be triangulated into two forms, [x, y, z] or [x, y, intensity], using the Delaunay method [
3) Interpolation: In this process, an interpolation technique, such as bilinear interpolation, was applied to the elevation and the intensity data to produce raster images containing this information.
Optical imaging systems offer both high spatial resolution and multispectral capabilities for various earth observation applications. To benefit from these advantages for road detection, data fusion is an efficient way to produce data with both high spatial resolution and multispectral characteristics. In this paper, Gram-Schmidt spectral sharpening was used [
Segmentation is the process of dividing the image into several homogeneous regions consisting of similar pixels. For segmentation, the values of the scale parameter, the weights of spectral heterogeneity ($w_{color}$), shape ($w_{shape}$), smoothness ($w_{smooth}$), and compactness ($w_{compact}$), and the weights of the spectral bands should be determined. The spectral heterogeneity change ($\Delta h_{color}$) and shape heterogeneity change ($\Delta h_{shape}$) combine into the fusion criterion as follows:

$$f = w_{color} \cdot \Delta h_{color} + w_{shape} \cdot \Delta h_{shape} \quad (1)$$

$$w_{color} + w_{shape} = 1 \quad (2)$$

$$w_{compact} + w_{smooth} = 1 \quad (3)$$

$$\Delta h_{shape} = w_{compact} \cdot \Delta h_{compact} + w_{smooth} \cdot \Delta h_{smooth} \quad (4)$$

$$\Delta h_{color} = \sum_{c} w_c \left( n_{merge} \cdot \sigma_{c,merge} - \left( n_{obj1} \cdot \sigma_{c,obj1} + n_{obj2} \cdot \sigma_{c,obj2} \right) \right) \quad (5)$$

$$\Delta h_{smooth} = n_{merge} \cdot \frac{l_{merge}}{b_{merge}} - \left( n_{obj1} \cdot \frac{l_{obj1}}{b_{obj1}} + n_{obj2} \cdot \frac{l_{obj2}}{b_{obj2}} \right) \quad (6)$$

$$\Delta h_{compact} = n_{merge} \cdot \frac{l_{merge}}{\sqrt{n_{merge}}} - \left( n_{obj1} \cdot \frac{l_{obj1}}{\sqrt{n_{obj1}}} + n_{obj2} \cdot \frac{l_{obj2}}{\sqrt{n_{obj2}}} \right) \quad (7)$$

where $n$ is the number of pixels of an object, $\sigma_{c}$ the standard deviation in band $c$, $l$ the perimeter length, and $b$ the perimeter of the object's bounding box.
Urban areas contain objects with a large variety of spectral heterogeneity in each class. Accordingly, the weight of spectral heterogeneity should be kept low; otherwise, the objects will have unbalanced geometrical shapes, and most of the shape and geometric properties will be lost. Regarding the weights for the smoothness and compactness parameters, allocating a higher weight to compactness favors compact objects such as roads and buildings, but makes it challenging to distinguish linear objects from nonlinear ones. Choosing a high value for the shape-heterogeneity weight, together with a high value for the smoothness parameter, gives the best results.
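Equations (1)-(7) can be read as a merge cost for two candidate image objects. The sketch below computes that cost under stated assumptions: the object representation (a dict with pixel values, perimeter `l`, and bounding-box perimeter `b`) and the estimates for the merged object's perimeter and bounding box are illustrative placeholders, not eCognition's internal implementation.

```python
import numpy as np

def merge_cost(obj1, obj2, band_weights, w_color=0.1, w_compact=0.1):
    """Fusion heterogeneity f for merging two image objects (Eqs. (1)-(7)).

    Each object is a dict with:
      'pixels' - (n, bands) array of band values
      'l'      - perimeter length
      'b'      - perimeter of the object's bounding box
    Names and structure are illustrative, not the paper's exact code.
    """
    w_shape = 1.0 - w_color            # Eq. (2)
    w_smooth = 1.0 - w_compact         # Eq. (3)

    merged = np.vstack([obj1['pixels'], obj2['pixels']])
    n1, n2, nm = len(obj1['pixels']), len(obj2['pixels']), len(merged)
    # The merged object's true perimeter/bbox are only known after merging;
    # these are crude stand-in estimates for the sketch.
    lm = obj1['l'] + obj2['l']
    bm = max(obj1['b'], obj2['b'])

    # Spectral heterogeneity change, Eq. (5)
    dh_color = sum(
        w_c * (nm * merged[:, c].std()
               - (n1 * obj1['pixels'][:, c].std()
                  + n2 * obj2['pixels'][:, c].std()))
        for c, w_c in enumerate(band_weights))

    # Smoothness (l/b) and compactness (l/sqrt(n)) changes, Eqs. (6)-(7)
    dh_smooth = nm * lm / bm - (n1 * obj1['l'] / obj1['b']
                                + n2 * obj2['l'] / obj2['b'])
    dh_compact = (nm * lm / np.sqrt(nm)
                  - (n1 * obj1['l'] / np.sqrt(n1)
                     + n2 * obj2['l'] / np.sqrt(n2)))

    dh_shape = w_compact * dh_compact + w_smooth * dh_smooth   # Eq. (4)
    return w_color * dh_color + w_shape * dh_shape             # Eq. (1)
```

In FNEA-style segmentation, a merge is accepted only when this cost stays below the squared scale parameter, which is how the scale value controls object size.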
The best strategy for road identification is based on a step-by-step model, similar to a decision tree procedure [
The presented model includes four nodes. In the first node, the image is classified into two classes: vegetation and non-vegetation. The remaining steps are performed on the non-vegetation class. In the second node, the non-vegetation area is divided into two classes: high and low regions. Geographic objects such as buildings, bridges, and interchange ramps belong to the high-region class, while sidewalks, parking lots, roads, and other open spaces form the ground-level, low-region class. In the third node, the low regions are again classified into two classes: low roads and open spaces. In the fourth node, high regions are divided into buildings and high roads. As a final step, the two road classes are merged to create the final road network.
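The four-node hierarchy above amounts to a small decision tree. A minimal sketch, with placeholder boolean features standing in for the rules developed later in the paper:

```python
def classify_object(obj):
    """Route one image object through the four-node hierarchy.

    `obj` is a dict of precomputed object tests; the feature names used
    here are illustrative placeholders, not the paper's actual rules.
    """
    if obj['is_vegetation']:          # node 1: vegetation mask
        return 'vegetation'
    if obj['is_high_region']:         # node 2: elevated vs. ground level
        # node 4: buildings vs. elevated roads (bridges, ramps)
        return 'high_road' if obj['is_linear'] else 'building'
    # node 3: ground-level objects
    return 'low_road' if obj['is_road_like'] else 'open_space'

def extract_roads(objects):
    # final step: merge the two road classes into one network
    return [o for o in objects
            if classify_object(o) in ('high_road', 'low_road')]
```

Because each node masks out one class before the next test runs, errors made early (e.g. mislabeled vegetation) propagate downward, which is why the later post-processing step matters.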
As mentioned, the designed hierarchical model has four nodes. For each node, based on the targeted objects, specific features can be extracted from the input data and employed to separate them from the rest of the objects. Feature extraction at each node is described as follows:
First node: the first node aims to divide the study area into vegetation and non-vegetation classes. Due to the specific characteristics of vegetation in the red and near-infrared spectral regions of the electromagnetic spectrum, the proper way to identify this class is through the use of vegetation indexes such as the normalized difference vegetation index (NDVI) [
$$NDVI = \frac{NIR - Red}{NIR + Red} \quad (8)$$

$$RVI = \frac{NIR}{Red} \quad (9)$$
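Equations (8) and (9) can be computed per pixel (or per object mean) directly from the NIR and red bands. A minimal sketch; the default thresholds are illustrative values consistent with the rule table reported in this paper, and the small epsilon guarding division by zero is an implementation assumption.

```python
import numpy as np

def vegetation_mask(nir, red, ndvi_thresh=0.12, rvi_thresh=1.2):
    """Per-pixel NDVI (Eq. (8)) and RVI (Eq. (9)) vegetation test."""
    nir = nir.astype(float)
    red = red.astype(float)
    eps = 1e-9  # assumption: avoid division by zero on dark pixels
    ndvi = (nir - red) / (nir + red + eps)
    rvi = nir / (red + eps)
    return (ndvi > ndvi_thresh) & (rvi > rvi_thresh)
```

Requiring both indexes to pass makes the vegetation mask slightly more conservative than NDVI alone.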
Second node: the second node aims to identify and separate high regions from low regions. The features used in this node are the mean difference to neighbors, the mean difference to darker (lower) neighbors, and their product, denoted f1, f2, and f3 respectively, as presented below.
$$f_1 = \text{Mean Diff to neighbors} = \frac{\sum_{i=1}^{n} (DSM_o - DSM_i)}{n} \quad (10)$$

$$f_2 = \text{Mean Diff to darker neighbors} = \frac{\sum_{i=1}^{m} (DSM_o - DSM_i)}{m}, \ \text{for the } m \text{ neighbors with } DSM_o > DSM_i \quad (11)$$

$$f_3 = f_1 \times f_2 \quad (12)$$
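Equations (10)-(12) reduce to simple statistics over an object's neighbor DSM means. A sketch under the assumption that each object's mean DSM and its neighbors' means are already available; the zero fallback for f2 when an object has no lower neighbors is an implementation assumption.

```python
import numpy as np

def dsm_neighbor_features(dsm_obj, dsm_neighbors):
    """f1-f3 of Eqs. (10)-(12): mean DSM difference to all neighbors,
    to lower ("darker") neighbors only, and their product.

    dsm_obj: mean DSM of the object; dsm_neighbors: neighbor means.
    Illustrative helper, not the paper's exact implementation.
    """
    diffs = [dsm_obj - d for d in dsm_neighbors]
    f1 = float(np.mean(diffs))                     # Eq. (10)
    darker = [d for d in diffs if d > 0]           # DSM_o > DSM_i only
    f2 = float(np.mean(darker)) if darker else 0.0 # Eq. (11)
    f3 = f1 * f2                                   # Eq. (12)
    return f1, f2, f3
```

An elevated object (e.g. a building roof) sits above most of its neighbors, so all three features come out positive; ground-level objects give values near or below zero.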
Given the wide variety of sizes and spectral properties of buildings in the study area, it is obvious that they cannot be placed in a single segmentation level as one object. Therefore, in such cases, the lowest scale level has to be chosen for these elements. As a result, some buildings, because of their large size, are segmented into multiple objects. To resolve this problem, a post-processing strategy based on a conceptual feature (f4) has been applied. To do so, first, these two classes are classified. Then, for each object of the low-region class, the percentage of its perimeter adjacent to objects classified as high regions is calculated using Equation (13). The result takes values between 0 and 1. This feature, in fact, searches for objects in the low-region class that are surrounded by the high-region class.
$$f_4 = \text{Rel. Border to High Region Objects} = \frac{\sum L_i}{\text{Object Perimeter}}, \quad L_i = \text{common border with high-region object } i \quad (13)$$
In the second feature (f5), in order to refine the results for each object in the low-region class, the number of adjacent objects classified as high regions is divided by the total number of adjacent objects. The values of this feature are also scaled between 0 and 1 (Equation (14)). This feature, indeed, looks for objects in the low-region class that are statistically surrounded by high-region objects.
$$f_5 = \frac{m}{n} \quad (14)$$

where n is the number of neighbors of the object and m is the number of high-region neighbors.
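Equations (13) and (14) both measure how surrounded a low-region object is by high-region objects, one weighted by border length and one by neighbor count. A sketch under an assumed object representation (shared-border lengths keyed by neighbor id); the field names are illustrative.

```python
def surroundedness_features(obj):
    """f4 (Eq. (13)) and f5 (Eq. (14)) for a low-region object.

    obj['borders'] maps each neighbor id to the shared border length,
    obj['high_neighbors'] is the set of neighbor ids classified as
    high region, obj['perimeter'] is the object's perimeter.
    All field names are assumptions for this sketch.
    """
    shared = sum(length for nid, length in obj['borders'].items()
                 if nid in obj['high_neighbors'])
    f4 = shared / obj['perimeter']                         # Eq. (13)
    f5 = len(obj['high_neighbors']) / len(obj['borders'])  # Eq. (14)
    return f4, f5
```

A low-region fragment that is really part of a large building scores near 1 on both features and can be reassigned to the high-region class.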
Third node: this node separates the low-region class into two sub-classes: low roads and open space. The open-space class includes sidewalks, spaces beside highways, parking lots, and occasional soil coverage. Features used in this node include the averages of return-wave intensity, the contour layer, and slope (Equations (15) to (17)):
$$f_6 = \frac{\sum_{i=1}^{n} Int_i}{n} \quad (15)$$

$$f_7 = \frac{\sum_{i=1}^{n} Cont_i}{n} \quad (16)$$

$$f_8 = \frac{\sum_{i=1}^{n} Slope_i}{n} \quad (17)$$

where n is the number of pixels of each object.
Since there are always spaces such as sidewalks between roads and buildings, the following feature was applied to exploit this property.
$$f_9 = \text{Rel. Border to Low Region Objects} = \frac{\sum L_i}{\text{Object Perimeter}}, \quad L_i = \text{common border with low-region object } i \quad (18)$$
The next feature, i.e., the distance to the nearest low-region object, aims to refine the objects previously classified based on f6, f7, and f8. Here, the ratio of the distance between the considered object's centroid and the nearest low-region object to the length of that object is used:
$$f_{10} = \frac{\text{Distance to the nearest low-region object}}{\text{Length of object}} \quad (19)$$
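Features f9 (Eq. (18)) and f10 (Eq. (19)) can be sketched together. The object representation (centroid, length, perimeter, per-neighbor shared borders) and the use of centroid-to-centroid distance are assumptions of this sketch, not the paper's exact geometry.

```python
import math

def road_context_features(obj, low_region_objs):
    """f9 (Eq. (18)) and f10 (Eq. (19)) for an object under test."""
    # f9: fraction of the perimeter shared with low-region neighbors
    low_ids = {o['id'] for o in low_region_objs}
    shared = sum(length for nid, length in obj['borders'].items()
                 if nid in low_ids)
    f9 = shared / obj['perimeter']

    # f10: distance to the nearest low-region object, over object length
    d_min = min(math.dist(obj['centroid'], o['centroid'])
                for o in low_region_objs)
    f10 = d_min / obj['length']
    return f9, f10
```

A ground-level road segment borders mostly other low-region objects (high f9) and lies close to them (low f10), which is the pattern the node-3 rules look for.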
Fourth node: similar to nodes two and three, in the fourth node we want to identify and separate the objects of the high-roads class from the other objects in the high-region class. Given the linear nature of objects in the roads class, the following features, which capture the linear properties of objects, are used.
$$f_{11} = \frac{\text{Length}}{\text{Width}} \quad (20)$$
The asymmetry feature describes an object's deviation from a regular polygon. An ellipse is fitted around the object, and the feature is derived from the ratio of the minor radius (n) to the major radius (m) of that ellipse (Equation (21)).
$$f_{12} = 1 - \frac{n}{m} \quad (21)$$
The closer the object's shape is to a regular polygon, the closer this value is to zero. The next feature is the ratio of length to width for the main line of the object (Equation (22)).
$$f_{13} = \frac{\text{Length}}{\text{Width}} \ \text{for the main line} \quad (22)$$
The output of this step is the primary road network.
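The elongation and asymmetry features above can be approximated from an object's pixel coordinates. The sketch below uses the eigenvalues of the pixel covariance matrix as a proxy for the fitted ellipse's axes; this is a common approximation, not necessarily the exact formulation eCognition uses.

```python
import numpy as np

def shape_features(coords):
    """Elongation (f11-style) and asymmetry (f12, Eq. (21)) for a pixel set.

    coords: (k, 2) array of pixel coordinates belonging to one object.
    """
    coords = np.asarray(coords, dtype=float)
    cov = np.cov(coords.T)
    eigvals = np.sort(np.linalg.eigvalsh(cov))      # [minor, major] variances
    major, minor = np.sqrt(eigvals[1]), np.sqrt(eigvals[0])
    elongation = major / max(minor, 1e-9)           # length/width proxy (f11)
    asymmetry = 1.0 - minor / max(major, 1e-9)      # Eq. (21): 1 - n/m
    return elongation, asymmetry
```

A road-like strip of pixels gives asymmetry near 1 and large elongation, while a compact square block gives asymmetry near 0.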
Due to the complexity of urban areas and the high density of objects, it is necessary to perform a post-processing step to improve the initial results. To this end, several statistical, spatial, and conceptual features, based on the initial classification results, are used in all the nodes of the hierarchical model except the first one, as follows:
Second node: in this node, to reduce the number of high objects surrounded by objects of the same height, inappropriate segmentation results have to be improved before classification using features f1 - f5. Here, objects for which the absolute difference between their DSM value and the average DSM of their neighbors is less than 30 centimeters are merged using the region-merge algorithm. After this relative improvement, Equation (23) is used to build a feature with better discrimination ability.
$$f_{14} = \frac{f_1}{\left| \text{Mean DSM} - \text{Mean DSM of Neighbor Objects} \right|} \quad (23)$$
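The 30 cm merge test and the composite feature of Equation (23) can be sketched as follows. The object representation and the epsilon guard against a zero denominator are assumptions of this sketch.

```python
def needs_merge(obj, neighbors, tol=0.3):
    """Pre-classification cleanup: objects within `tol` metres of their
    neighbors' mean DSM are candidates for region merging."""
    mean_nb = sum(n['mean_dsm'] for n in neighbors) / len(neighbors)
    return abs(obj['mean_dsm'] - mean_nb) < tol

def f14(obj, neighbors, f1):
    """Composite feature of Eq. (23): f1 scaled by the inverse of the
    absolute DSM difference to the neighborhood mean. The epsilon guards
    the near-zero case that the 30 cm merge step is meant to remove."""
    mean_nb = sum(n['mean_dsm'] for n in neighbors) / len(neighbors)
    return f1 / max(abs(obj['mean_dsm'] - mean_nb), 1e-6)
```

Merging the near-equal-height objects first keeps the denominator of f14 away from zero, so the remaining objects get stable, well-separated feature values.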
Third node: here again, a composite feature, namely compactness, is used. This feature expresses the geometrical compactness of an object. It is produced by dividing the product of the length (m) and width (n) of an object by the object's pixel count.
$$\text{Compactness} = \frac{m \times n}{N} \quad (24)$$
Compactness is equal to 1 when the object has a rectangular shape. The criterion feature has been designed by combining the compactness feature with the average distance of the object from the high-region class objects, as follows:
$$f_{15} = \frac{\text{Compactness}}{\text{Mean of Distance to High Region Objects}} \quad (25)$$
Low-road objects have a lower compactness value due to their linear form. Moreover, among the low-region class elements, they logically lie farther from high-region elements. Accordingly, low-road objects have a lower value of f15. It is obvious that improving the results in the second and third nodes, by decreasing the noise between the high-region and low-region classes, also improves the result of node four.
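Equations (24) and (25) combine into one scalar per object. A minimal sketch; the field names and the centroid-to-centroid distance are illustrative assumptions.

```python
import math

def f15(obj, high_region_objs):
    """Compactness (Eq. (24)) over mean distance to high-region
    objects (Eq. (25)). Field names are illustrative placeholders."""
    compactness = obj['length'] * obj['width'] / obj['n_pixels']   # Eq. (24)
    mean_dist = sum(math.dist(obj['centroid'], h['centroid'])
                    for h in high_region_objs) / len(high_region_objs)
    return compactness / mean_dist                                 # Eq. (25)
```

Both terms push in the same direction for low roads: a linear object has low compactness, and a road corridor sits farther from buildings than an adjacent parking lot does, so the quotient is small for low roads and larger for open space hugging the buildings.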
As mentioned, before image segmentation the data, including the satellite images and Lidar data, should be preprocessed. The preprocessing stage includes initial property extraction from the TIN surface of the Lidar data, its transfer to raster space, and spatial enhancement of the QuickBird multispectral imagery.
An octree filter was used for filtering the Lidar data. To do so, and based on the spatial resolution or ground sample distance (GSD) of the Lidar data, which is 50 centimeters, noise removal is performed with a 1-meter threshold. After filtering, the Delaunay method is used for triangulating the height information and generating a digital surface model (DSM), as well as the contour lines. The triangulation of the return-wave intensity likewise produces an interpolated raster intensity image. Finally, using bilinear interpolation, raster images with a resolution of 60 cm are produced from the Lidar data sets. This value was selected to obtain raster elevation data with the same size and resolution as the satellite imagery. In parallel, the Gram-Schmidt method was used to produce a pan-sharpened image from the original satellite data. The output of this process is an image with four spectral bands and a spatial resolution of 60 cm.
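The TIN-plus-interpolation step above can be reproduced with SciPy: `griddata(method='linear')` interpolates over a Delaunay triangulation of the scattered points, which mirrors the workflow described, though it is not the paper's actual toolchain. The function name and grid construction are assumptions of this sketch.

```python
import numpy as np
from scipy.interpolate import griddata

def rasterize_lidar(points, values, cell=0.6):
    """Interpolate scattered Lidar returns to a regular raster.

    points: (k, 2) x/y coordinates; values: matching elevation or
    intensity samples; cell: output resolution (0.6 m in the paper).
    """
    points = np.asarray(points, float)
    xmin, ymin = points.min(axis=0)
    xmax, ymax = points.max(axis=0)
    xs = np.arange(xmin, xmax + cell, cell)
    ys = np.arange(ymin, ymax + cell, cell)
    gx, gy = np.meshgrid(xs, ys)
    # Linear interpolation over the Delaunay triangulation of the points
    return griddata(points, values, (gx, gy), method='linear')
```

Running it once on the elevation column yields the DSM raster, and once on the intensity column yields the intensity image, both on the same 60 cm grid as the pan-sharpened image.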
eCognition is used in the segmentation step as an object-based image analysis tool, which uses the Fractal Net Evolution Approach (FNEA) for segmentation. FNEA is a region-growing technique based on local criteria that starts with one-pixel image objects [
As mentioned, the SFS method, which is based on the normalized Euclidean distance as a discrimination-possibility criterion, is used for the design of the hierarchical model.
As the table shows, the vegetation class has the highest discrimination possibility, followed by buildings and open spaces in second and third place, respectively. According to these results, a model based on a masking strategy is designed to identify the roads. The hierarchical model in effect defines a rule-based system that categorizes classes in an order that minimizes the noise and maximizes the discrimination possibility: classes are arranged by how easily they can be detected.
In this section, according to the hierarchical model, results of each node are presented. Features used in different classes are shown in
According to the proposed model, the first node aims to divide the study area into vegetation and non-vegetation classes. Due to
| Class Filter | DSM | Intensity | Red | Green | Blue | NIR | Shape | Compactness | Scale |
|---|---|---|---|---|---|---|---|---|---|
| -- | 0.5 | 0.5 | 1 | 1 | 1 | 1 | 0.9 | 0.1 | 20 |
| Non-Vegetation | 0.5 | 0.5 | 1 | 1 | 1 | 1 | 0.9 | 0.1 | 40 |
| Low Region | 0.5 | 0.5 | 1 | 1 | 1 | 1 | 0.9 | 0.1 | 60 |
| High Region | 0.5 | 0.5 | 1 | 1 | 1 | 1 | 0.9 | 0.1 | 70 |

(The DSM through NIR columns give the spectral layer weights.)
| Class | Vegetation | Building | Open Space |
|---|---|---|---|
| Discrimination | 9.435 | 8.315 | 7.856 |
| Class | Level (scale) | Classification rule |
|---|---|---|
| Vegetation | 20 | NDVI > 0.12, RVI > 1.2 |
| High Region | 40 | f1 > 0, f2 > 0, f3 > 0, f4 > 0.6, f5 > 0.6, f14 > 0 |
| Low Region | 40 | f1 < 0, f2 < 0, f3 < 0, f4 < 0.6, f5 < 0.6, f14 < 0 |
| Low Road | 60 | f6 < 95, f7 < 80, f8 < 140, f9 > 0.6, f10 < 0.3, f15 > 0.178 |
| High Road | 70 | f11 > 6, f12 > 0.96, f13 > 6.8 |
In the second node, high regions are separated from the low regions. Therefore, segmentation was performed only on the non-vegetation class (
In the third node, the class of the low region is classified into two sub-classes of the low road and open space classes. Segmentation was, then, performed on the low regions objects (
In the fourth node, we aimed to identify and separate the objects of the high roads class from other objects in the class of the high region. Thus, segmentation was performed on high regions objects (
The result of merging the high roads and low roads, before (
According to the discussion in the object-based post-processing section,
The final results of merging the high roads and low roads are illustrated in
As it can be seen in
For analytical accuracy assessment, several test samples for each class from original data sets are visually extracted and used to calculate the confusion matrix.
The overall accuracy for the identification of the different classes is 89.2%, and the kappa value is 0.83. A detailed description and interpretation of each class is provided subsequently. In the vegetation class, the producer and user accuracies were 99% and 79%, respectively. This indicates the potential of well-known spectral indexes, such as NDVI and RVI, used in object-based classification for extracting vegetation land cover.
| User Class / Sample | Vegetation | High Road | Building | Low Road | Open-space | Sum |
|---|---|---|---|---|---|---|
| Vegetation | 142 | 2 | 26 | 2 | 8 | 180 |
| High Road | 0 | 260 | 0 | 0 | 0 | 260 |
| Building | 0 | 12 | 1009 | 3 | 6 | 1030 |
| Low Road | 0 | 3 | 20 | 352 | 4 | 379 |
| Open-space | 1 | 8 | 96 | 23 | 31 | 159 |
| Unclassified | 0 | 2 | 0 | 0 | 2 | 4 |
| Sum | 143 | 287 | 1151 | 380 | 51 | |
| Producer | 0.99 | 0.91 | 0.88 | 0.93 | 0.61 | |
| User | 0.79 | 1.00 | 0.98 | 0.93 | 0.20 | |
| Overall Accuracy | 89.2% | | | | | |
| KIA | 0.83 | | | | | |
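The reported overall accuracy and kappa follow directly from the confusion matrix above. The short script below reproduces them from the table's counts (rows are classified classes including the small unclassified row, columns are the reference samples).

```python
import numpy as np

# Confusion matrix from the table above
cm = np.array([
    [142,   2,   26,   2,  8],   # Vegetation
    [  0, 260,    0,   0,  0],   # High Road
    [  0,  12, 1009,   3,  6],   # Building
    [  0,   3,   20, 352,  4],   # Low Road
    [  1,   8,   96,  23, 31],   # Open-space
    [  0,   2,    0,   0,  2],   # Unclassified
])

n = cm.sum()
oa = np.trace(cm) / n             # diagonal of the 5 matched classes

# Chance agreement: products of matching row and column marginals
row = cm.sum(axis=1)[:5]          # unclassified has no reference column
col = cm.sum(axis=0)
pe = (row * col).sum() / n**2
kappa = (oa - pe) / (1 - pe)

print(f"OA = {oa:.1%}, kappa = {kappa:.2f}")   # OA = 89.2%, kappa = 0.83
```

Note the Open-space column sums to only 51 reference samples against 159 classified objects, which is why its user accuracy collapses to 20% while the overall figures stay high.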
In the case of the high-road class, the producer and user accuracies were 91% and 100%, respectively. The primary problem in this class relates to the producer accuracy and noise in some samples of the open-space and building classes. Because of the high road density in some parts of the image, an identification problem occurs at object boundaries in this class. On some edges, errors in merging the segmentation results led to roads whose edge regions included sidewalks and buildings. Although there are a few misclassifications between buildings and open space, vegetation, and low roads, the producer and user accuracies for buildings were 88% and 98%, respectively. For the open-space class, the producer and user accuracies were 61% and 20%, respectively. The open-space class is typically spectrally similar to roads and morphologically similar to building roofs; therefore, separating this land-cover class from the other two is quite problematic. In addition, shadows cast by buildings or trees in some areas led to some open spaces being wrongly classified as vegetation. In the low-roads class, the producer and user accuracies are both 93%, which can be attributed to the performance of the post-processing step. The appropriate separation of the high-region and low-region classes greatly facilitated the identification of low roads, as well as of the buildings surrounded by low roads. Confusion with high roads can be considered the leading noise for this class of objects. Also, roads become darker in some areas due to the shadows of adjacent trees or buildings; since these darkened roads are similar to open space from the textural, spectral, and geometrical points of view, they are mistakenly classified as open space. The incorrect merging of some other objects near buildings in the segmentation step, which affected the slope values, may be another reason for this misclassification.
In this paper, an improved framework based on a hierarchical classification model was proposed to identify, extract, and map road networks in a dense urban area. The proposed method was applied to multisource remote sensing data including high-resolution satellite imagery and a Lidar point cloud. One of the essential contributions of the study was the introduction of post-processing operations for improving the results. Another was the design of a step-by-step hierarchical method based on analysis and optimization of the feature space using discrimination-possibility analysis.
To obtain more reliable results, for each node, based on the targeted objects, specific features can be extracted from the input data and employed to separate them from the rest of the objects. A question may arise as to why the post-processing step is not considered one of the leading steps in this method. The reason is as follows. First, the foundation of the result improvement is the use of the objects' initial classification results together with statistical, conceptual, and even spatial analyses. Secondly, classification by the proposed hierarchical model is entirely dependent on the other classes; therefore, an exact, step-by-step class separation cannot be realized, and a post-processing operation on the results is necessary. The spectral and spatial complexity of urban scenes shows the great potential of combining Lidar data, high-resolution imagery, and object-based image analysis for road detection. However, Lidar data may not be available for many urban areas. Utilizing more widely available ancillary data, such as digital surface models extracted from stereo imagery, is especially desirable and will be the focus of our future research.
The authors would like to thank the GRSS Data Fusion Contest committee and DigitalGlobe for providing and making the QuickBird and Lidar data sets from the San Francisco area publicly available.
Milan, A. (2018) An Integrated Framework for Road Detection in Dense Urban Area from High-Resolution Satellite Imagery and Lidar Data. Journal of Geographic Information System, 10, 175-192. https://doi.org/10.4236/jgis.2018.102009