Many supervised classification algorithms have been proposed; however, they are rarely evaluated for specific applications. This research examines the performance of the machine learning classifiers support vector machine (SVM), neural network (NN) and Random Forest (RF) against the maximum likelihood classifier (MLC), a traditional supervised classifier, for forest resource and land cover categorization. The classification is based on a combination of Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR) and Landsat Thematic Mapper (TM) data in Northern Tanzania. Various data categories based on Landsat TM surface reflectance, ALOS PALSAR backscattering and their derivatives were generated for different classification scenarios. Separate and joint processing of the Landsat and ALOS PALSAR data was then performed using the SVM, NN, RF and ML classifiers. The overall classification accuracy (OA), kappa coefficient (KC) and F1 score were computed. The results demonstrate the robustness of SVM and RF in classifying forest resources and land cover using Landsat data alone and using integrated Landsat and PALSAR data (average OA = 92% and F1 = 0.7 to 1). A two-sample t-test was used to evaluate the performance of the classifiers across the different data categories. SVM and RF show no significant difference from each other at the 5% significance level, but both differ significantly from NN and MLC. Overall, the study suggests that the non-parametric classifiers perform better than the parametric classifier.
Classification of satellite images is an important part of remote sensing image analysis, object and pattern recognition, and the mapping and monitoring of forest cover and natural resources. The process is commonly used to generate thematic maps such as forest maps, land cover/use maps and spatial pattern maps. Classification of forest and land cover types from satellite data has been widely adopted. Many supervised image classification algorithms have been developed and used for forest and land cover mapping, ranging from machine learning algorithms to traditional classifiers [
The main objective of this study, therefore, is to evaluate the capability of widely applied parametric and non-parametric supervised classification algorithms for forest resource and land cover mapping in a tropical environment using SAR and optical datasets. Specifically, it assesses which classification algorithm gives the best results when independent and integrated Landsat TM and ALOS PALSAR datasets are used for forest resource and land cover mapping.
The satellite images used in this study cover the Bereko and Duru-Haitemba forest reserves in Babati, Tanzania, which lie between latitudes 4°15' and 4°30' South and longitudes 35°35' and 35°50' East (
Both optical and SAR satellite images were used: Landsat 5 Thematic Mapper (TM) imagery at 30 m spatial resolution acquired on 4 November 2009, and ALOS PALSAR L-band [
for image preprocessing. In addition, a set of points based on Global Positioning System (GPS) and knowledge-based information acquired in October 2009, Normalized Difference Vegetation Index (NDVI) [
Training and validation samples for all land cover classes (including water, shrubs, natural dense forest and moderate forest) were selected based on ground truth data, GPS point locations and knowledge-based information acquired on site. The collected samples were divided into two groups: training samples (70% of the collected samples) and validation samples (30% of the collected samples).
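A per-class (stratified) random split is one way to implement the 70/30 division described above. The sketch below is illustrative only: the study does not say how the samples were randomized, so the function name `stratified_split`, the fixed seed and the stratification by class are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.7, seed=42):
    """Split (features, label) samples into training and validation sets,
    keeping roughly the same class proportions in both (hypothetical helper)."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_train = round(train_fraction * len(group))
        train.extend(group[:n_train])
        validation.extend(group[n_train:])
    return train, validation


# Toy example: 100 "water" and 50 "forest" samples with a single feature each.
samples = [((i,), "water") for i in range(100)] + [((i,), "forest") for i in range(50)]
train, validation = stratified_split(samples)
print(len(train), len(validation))
```

Splitting within each class keeps rare classes represented in both sets, which matters when per-class scores such as F1 are reported.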
ALOS PALSAR HH- and HV-polarized images were acquired in slant-range single look complex (SLC) format. The images were converted from slant-range to ground-range resolution using 9 × 2 multi-looking (nine looks in azimuth and two looks in range) [
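The look-averaging part of 9 × 2 multi-looking can be sketched as block averaging of the SLC intensity. This is a minimal illustration under stated assumptions: rows are azimuth lines and columns are range samples, partial blocks at the image edges are discarded, and the helper names `multilook` and `to_db` are hypothetical. Radiometric calibration, the geometric slant-to-ground range projection and terrain correction are not shown.

```python
import numpy as np


def multilook(intensity, looks_azimuth=9, looks_range=2):
    """Average non-overlapping looks_azimuth x looks_range blocks of an
    intensity (|SLC|^2) image. Rows = azimuth, columns = range (assumed)."""
    rows = intensity.shape[0] // looks_azimuth * looks_azimuth
    cols = intensity.shape[1] // looks_range * looks_range
    trimmed = intensity[:rows, :cols]  # drop edge pixels that do not fill a block
    blocks = trimmed.reshape(rows // looks_azimuth, looks_azimuth,
                             cols // looks_range, looks_range)
    return blocks.mean(axis=(1, 3))


def to_db(intensity):
    """Convert linear backscatter intensity to decibels."""
    return 10.0 * np.log10(intensity)


# Synthetic complex samples standing in for an HH SLC image.
rng = np.random.default_rng(0)
slc = rng.normal(size=(90, 40)) + 1j * rng.normal(size=(90, 40))
hh_ml = multilook(np.abs(slc) ** 2)
print(hh_ml.shape)  # (10, 20)
```

Averaging intensity (not complex amplitude) over several looks reduces speckle at the cost of spatial resolution.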
Several ALOS PALSAR and Landsat TM derivatives were extracted, notably vegetation indices (VI), principal components from Principal Component Analysis (PCA), and the SAR quotient bands HH/HV and HV/HH [
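The band derivatives named above reduce to simple per-pixel arithmetic. The sketch below assumes co-registered NumPy arrays and uses hypothetical function names. It applies the standard NDVI definition (for Landsat 5 TM, band 3 is red and band 4 is near-infrared), computes the SAR quotient and difference bands and the Radar Forest Degradation Index, commonly defined as RFDI = (HH − HV)/(HH + HV), in linear power units (the study does not state whether linear or dB values were used), and derives principal components from a band stack through the eigendecomposition of its covariance matrix.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def sar_derivatives(hh, hv):
    """Quotient, difference and RFDI bands from linear-power HH and HV."""
    return {
        "HH/HV": hh / hv,
        "HV/HH": hv / hh,
        "HH-HV": hh - hv,
        "RFDI": (hh - hv) / (hh + hv),  # Radar Forest Degradation Index
    }


def principal_components(bands, n_components=3):
    """PCA of a (n_bands, rows, cols) stack; returns (n_components, rows, cols)
    ordered by decreasing explained variance."""
    n_bands, rows, cols = bands.shape
    X = bands.reshape(n_bands, -1).T          # pixels x bands
    X = X - X.mean(axis=0)                    # center each band
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return (X @ eigvecs[:, order]).T.reshape(n_components, rows, cols)


red = np.array([[0.05, 0.10]])
nir = np.array([[0.40, 0.30]])
print(ndvi(red, nir))
print(sar_derivatives(np.array([0.2]), np.array([0.05]))["RFDI"])
```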
Various input bands were prepared for image classification. A multi-sensor image fusion approach was adopted [
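The fusion method is not detailed here, so the following is only a sketch of one simple option: pixel-level layer stacking of co-registered, resampled bands into a feature matrix, with each feature z-scored so that reflectance and dB backscatter share a comparable scale. Both the stacking approach and the standardization step are assumptions, and `layer_stack` and `standardize` are hypothetical names. A combination such as subgroup C1 (TM bands 2, 3, 4 with HH and HV) would correspond to five stacked features.

```python
import numpy as np


def layer_stack(*band_groups):
    """Stack co-registered 2-D bands (same grid) into a pixels x features matrix."""
    bands = [b for group in band_groups for b in group]
    shapes = {b.shape for b in bands}
    if len(shapes) != 1:
        raise ValueError(f"bands are not on a common grid: {shapes}")
    stack = np.stack(bands, axis=-1)              # rows x cols x features
    return stack.reshape(-1, stack.shape[-1])     # pixels x features


def standardize(features):
    """Z-score each feature column; constant columns are left centered at 0."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return (features - mean) / std


tm_bands = [np.full((3, 4), 0.1 * k) for k in (2, 3, 4)]          # stand-in reflectance
sar_bands = [np.full((3, 4), -8.0), np.full((3, 4), -15.0)]      # stand-in dB values
features = standardize(layer_stack(tm_bands, sar_bands))
print(features.shape)
```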
Three non-parametric classifiers and one parametric classifier were tested for their ability to classify forest cover and map land cover from the independent bands and from the integrated ALOS PALSAR/Landsat data and their derivatives. Random Forest [
Group | Subgroup | Dataset | Selected Input Data or Combination |
---|---|---|---|
A | A1 | TM surface reflectance | TM bands (2, 3, 4) |
A | A2 | VI and TM GLCM textures | SLAVI, mea_b1, cor_b3, var_b4, cor_b4, con_b4 |
A | A3 | SR and TM derivatives | TM bands (2, 3, 4), SLAVI, cor_b3, var_b4, cor_b4, con_b4 |
B | B1 | AP bands | HH, HV |
B | B2 | AP derivatives | RFDI, HH/HV, HV/HH, HH-HV, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
B | B3 | AP bands, VI and quotient bands | HH, RFDI, HH/HV, HV/HH, HH-HV |
B | B4 | AP bands and AP GLCM textures | HH, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
B | B5 | AP bands and their derivatives | HH, HH/HV, HV/HH, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
C | C1 | SR and AP bands | TM bands (2, 3, 4), HH, HV |
C | C2 | TM derivatives and AP bands | SLAVI, mea_b1, cor_b3, var_b4, cor_b4, con_b4, HH, HV |
C | C3 | TM derivatives and GLCM textures of AP bands | SLAVI, mea_b1, cor_b3, var_b4, cor_b4, con_b4, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
C | C4 | TM and AP derivatives | SLAVI, cor_b3, var_b4, cor_b4, con_b4, HH/HV, HV/HH, HH-HV, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
C | C5 | SR, AP backscattering and their derivatives | TM bands (2, 3, 4), HH, HV, SLAVI, cor_b3, var_b4, cor_b4, con_b4, cor_HH, cor_HV, mea_HH, var_HH, sec_HH, sec_HV |
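Many inputs in the table above are grey-level co-occurrence matrix (GLCM) textures. The sketch below computes a symmetric, normalized GLCM for one window and one pixel offset, then derives five measures. It rests on assumptions: the prefixes mea_, var_, cor_, con_ and sec_ are read as mean, variance, correlation, contrast and second moment, and the quantization levels, window size and offset used in the study are not given. In practice these measures are computed in a moving window to produce a texture band; the helper names are hypothetical.

```python
import numpy as np


def quantize(band, levels=8):
    """Linearly rescale a band to integer grey levels 0 .. levels-1."""
    lo, hi = float(band.min()), float(band.max())
    if hi == lo:
        return np.zeros(band.shape, dtype=int)
    q = np.floor((band - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM for one non-negative offset (row_step, col_step)."""
    dr, dc = offset
    rows, cols = image.shape
    ref = image[: rows - dr, : cols - dc]   # reference pixels
    nbr = image[dr:, dc:]                   # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)
    P = P + P.T                             # count each pair in both directions
    return P / P.sum()


def glcm_features(P):
    """Mean, variance, contrast, second moment and correlation of a symmetric GLCM."""
    i, j = np.indices(P.shape)
    mean = (i * P).sum()
    var = ((i - mean) ** 2 * P).sum()
    contrast = ((i - j) ** 2 * P).sum()
    second_moment = (P ** 2).sum()
    correlation = ((i - mean) * (j - mean) * P).sum() / var if var > 0 else 0.0
    return {"mea": mean, "var": var, "con": contrast,
            "sec": second_moment, "cor": correlation}


window = quantize(np.random.default_rng(2).random((7, 7)), levels=8)
print(glcm_features(glcm(window, levels=8)))
```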
classification algorithm that assumes each class follows a Gaussian (normal) distribution and assigns every pixel to the class with the highest likelihood. The technique is robust and well known for general classification problems. However, it may have difficulty classifying data from different sources, such as optical and SAR data. MLC is one of the most extensively used classifiers in the field.
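The Gaussian decision rule behind MLC can be sketched in a few lines. This is a minimal illustration, not the software used in the study: it assumes equal class priors, drops the constant term of the log-likelihood, and adds a small ridge to each covariance matrix for numerical stability. The class name `GaussianMLC` is hypothetical.

```python
import numpy as np


class GaussianMLC:
    """Gaussian maximum likelihood classifier with equal priors (sketch)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_, self.log_dets_ = [], [], []
        for k in self.classes_:
            Xk = X[y == k]
            # Per-class covariance plus a small ridge to keep it invertible.
            cov = np.cov(Xk, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.means_.append(Xk.mean(axis=0))
            self.inv_covs_.append(np.linalg.inv(cov))
            self.log_dets_.append(np.linalg.slogdet(cov)[1])
        return self

    def log_likelihoods(self, X):
        """Per-class log-likelihood up to a shared constant: -0.5*(ln|S| + Mahalanobis^2)."""
        scores = []
        for mean, inv_cov, log_det in zip(self.means_, self.inv_covs_, self.log_dets_):
            d = X - mean
            mahal = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(-0.5 * (log_det + mahal))
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianMLC().fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))  # [0 1]
```

Because each class is summarized by a single mean and covariance, multimodal or strongly non-Gaussian inputs, such as stacked optical and SAR features, can violate the model's assumption.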
SVM is fundamentally a binary classification method that uses support vectors, the training samples closest to the decision boundary, to define the separating hyperplane [
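To make the support-vector idea concrete, the following sketch trains a linear, binary soft-margin SVM with a Pegasos-style stochastic sub-gradient method and reports the support vectors (samples on or inside the margin). It is a stand-in for illustration only. The kernel, parameters and multi-class strategy used in the study are not stated, and mapping several land cover classes would need a one-vs-rest or one-vs-one wrapper, which is not shown. `lam` (regularization strength) and `epochs` are illustrative values, and the function names are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient descent on the hinge-loss SVM objective.
    Labels must be -1/+1; a bias is handled by a constant feature (also regularized)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * (Xb[i] @ w)        # margin under the current weights
            w *= 1 - eta * lam                 # shrinkage from the regularizer
            if margin < 1:                     # hinge loss active: move towards sample
                w += eta * y[i] * Xb[i]
    return w


def predict_linear_svm(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)


def support_vector_indices(w, X, y, tol=1e-3):
    """Training samples on or inside the margin, i.e. y * f(x) <= 1."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.flatnonzero(y * (Xb @ w) <= 1 + tol)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
w = train_linear_svm(X, y)
print("training accuracy:", (predict_linear_svm(w, X) == y).mean())
print("support vectors:", len(support_vector_indices(w, X, y)))
```

Only the support vectors determine the final hyperplane; samples far from the margin do not affect it, which is one reason SVMs cope well with high-dimensional feature stacks.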
The NN classifier can form arbitrary decision boundaries, adapts easily to various data types and input structures, produces fuzzy output values and generalizes well, which makes it suitable for integrating multiple images [
RF is an ensemble machine learning approach based on classification and regression trees, and it can be used for both image classification and regression analysis [
To test the capability of the parametric and non-parametric classifiers, a validation dataset was used for accuracy assessment. Three measures of classification accuracy were used: overall accuracy (OA), the kappa coefficient (κ) and the F1 score [
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = \frac{2 \times \text{user's accuracy} \times \text{producer's accuracy}}{\text{user's accuracy} + \text{producer's accuracy}} \tag{1}$$
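OA, kappa and the per-class F1 score of Equation (1) can all be derived from a single confusion matrix. The sketch below assumes rows are classified labels and columns are reference labels, so user's accuracy (precision) is computed per row and producer's accuracy (recall) per column. The orientation and the function name `accuracy_metrics` are assumptions for illustration.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Return (overall accuracy, kappa, per-class F1) from a confusion matrix
    with rows = classified labels and columns = reference labels (assumed)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    # Chance agreement expected from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (overall - expected) / (1 - expected)
    users = diag / cm.sum(axis=1)       # user's accuracy (precision)
    producers = diag / cm.sum(axis=0)   # producer's accuracy (recall)
    f1 = 2 * users * producers / (users + producers)
    return overall, kappa, f1


oa, kappa, f1 = accuracy_metrics([[45, 5], [10, 40]])
print(oa, kappa, f1)  # OA = 0.85, kappa = 0.7, F1 ≈ [0.857, 0.842]
```

Kappa discounts the agreement expected by chance, so it is lower than OA whenever the class marginals alone would already yield some correct labels.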
To compare the capability of the four classifiers under study, a two-sample t-test [
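The two-sample t-test mentioned above compares the mean performance of two classifiers relative to the spread of their scores. The source does not say whether a pooled (Student), Welch or paired variant was used, nor exactly which per-group scores formed the samples, so the sketch shows only the pooled-variance form with toy numbers. The name `pooled_t_statistic` is hypothetical. A two-sided p-value can then be obtained from the t distribution, for example with `2 * scipy.stats.t.sf(abs(t), df)`.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled (equal) variance.
    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


t, df = pooled_t_statistic([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)
```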
The classification results attained based on different data groups (A-C) (
Classifier | Class | A1 | A2 | A3 | B1 | B2 | B3 | B4 | B5 | C1 | C2 | C3 | C4 | C5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SVM | DF | 1.00 | 0.98 | 0.98 | 0.48 | 0.43 | 0.07 | 0.22 | 0.52 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 |
SVM | MF | 0.97 | 0.95 | 0.97 | 0.63 | 0.46 | 0.68 | 0.37 | 0.53 | 0.96 | 0.97 | 0.98 | 0.97 | 0.97 |
SVM | SH | 0.98 | 0.96 | 0.98 | 0.61 | 0.76 | 0.58 | 0.76 | 0.69 | 0.98 | 0.97 | 0.97 | 0.96 | 0.98 |
SVM | WA | 0.96 | 0.96 | 0.98 | 0.82 | 0.89 | 0.83 | 0.86 | 0.84 | 0.98 | 0.96 | 0.98 | 0.98 | 0.98 |
SVM | BS | 0.98 | 0.96 | 0.98 | 0.43 | 0.48 | 0.39 | 0.53 | 0.56 | 0.94 | 0.96 | 0.96 | 0.95 | 0.98 |
RF | DF | 1.00 | 1.00 | 1.00 | 0.38 | 0.58 | 0.52 | 0.50 | 0.60 | 1.00 | 1.00 | 0.98 | 0.98 | 0.98 |
RF | MF | 0.97 | 0.93 | 0.96 | 0.47 | 0.53 | 0.52 | 0.53 | 0.61 | 0.96 | 0.94 | 0.94 | 0.94 | 0.97 |
RF | SH | 0.96 | 0.92 | 0.96 | 0.53 | 0.73 | 0.64 | 0.74 | 0.77 | 0.96 | 0.94 | 0.93 | 0.94 | 0.97 |
RF | WA | 0.98 | 0.96 | 0.96 | 0.82 | 0.89 | 0.84 | 0.86 | 0.91 | 0.96 | 0.96 | 0.98 | 0.98 | 0.98 |
RF | BS | 0.96 | 0.95 | 0.96 | 0.38 | 0.53 | 0.46 | 0.54 | 0.67 | 0.96 | 0.96 | 0.95 | 0.96 | 0.96 |
NN | DF | 1.00 | 0.70 | 1.00 | 0.43 | 0.36 | 0.10 | 0.36 | 0.56 | 0.98 | 0.70 | 0.78 | 0.85 | 0.75 |
NN | MF | 0.92 | 0.73 | 0.93 | 0.29 | 0.33 | 0.52 | 0.37 | 0.53 | 0.94 | 0.71 | 0.71 | 0.68 | 0.70 |
NN | SH | 0.92 | 0.88 | 0.95 | 0.00 | 0.84 | 0.75 | 0.84 | 0.73 | 0.98 | 0.86 | 0.88 | 0.85 | 0.89 |
NN | WA | 0.94 | 0.94 | 0.94 | 0.80 | 0.87 | 0.83 | 0.82 | 0.83 | 0.91 | 0.94 | 0.98 | 0.96 | 0.96 |
NN | BS | 0.98 | 0.86 | 0.98 | 0.36 | 0.49 | 0.49 | 0.48 | 0.49 | 0.98 | 0.86 | 0.88 | 0.88 | 0.90 |
MLC | DF | 1.00 | 0.93 | 1.00 | 0.49 | 0.53 | 0.48 | 0.36 | 0.50 | 1.00 | 0.92 | 0.93 | 0.82 | 0.85 |
MLC | MF | 0.94 | 0.70 | 0.94 | 0.18 | 0.19 | 0.15 | 0.37 | 0.20 | 0.92 | 0.87 | 0.90 | 0.79 | 0.77 |
MLC | SH | 0.95 | 0.84 | 0.97 | 0.72 | 0.75 | 0.70 | 0.84 | 0.70 | 0.96 | 0.90 | 0.98 | 0.93 | 0.81 |
MLC | WA | 0.96 | 0.89 | 0.91 | 0.78 | 0.84 | 0.78 | 0.82 | 0.85 | 0.94 | 0.86 | 0.91 | 0.84 | 0.84 |
MLC | BS | 0.95 | 0.96 | 0.95 | 0.41 | 0.63 | 0.40 | 0.48 | 0.49 | 0.98 | 0.85 | 0.96 | 0.85 | 0.84 |
For Random Forest, data groups A and C both provide the best classification accuracy in terms of overall accuracy (average OA = 95.7% and 96.9%, respectively). High F1 score values, ranging between 0.94 and 1, were obtained for all land cover types (
The non-parametric classifiers (RF, SVM and NN) are assessed together with the maximum likelihood classifier (MLC) on different data subgroups.
However, MLC provides the poorest accuracy of the four classifiers. In these groups, SVM and RF perform significantly better than NN and MLC at the 95% confidence level. For group B (SAR backscattering and derivatives), all classifiers performed worse (average KC = 0.50), although in most cases the machine learning algorithms performed better than MLC (
In this research, supervised learning algorithms were compared using independent and integrated Landsat TM and ALOS PALSAR data. The assessment of the four classifiers under study shows that both parametric and non-parametric classifiers perform well when using Landsat TM data (
Pair | t-test value | p-value |
---|---|---|
SVM-MLC | 4.173 | 0.001 |
SVM-NN | 2.233 | 0.045 |
SVM-RF | −0.214 | 0.834 |
NN-MLC | −0.505 | 0.622 |
NN-RF | −2.979 | 0.012 |
RF-MLC | 4.391 | 0.001 |
Notes: A p-value ≤ 0.05 indicates that the two samples are significantly different at the 5% significance level. A p-value greater than 0.05 implies that there is no significant difference between the two samples being compared.
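The decision rule in the notes can be applied mechanically to the p-values reported in the table. The snippet below simply encodes those reported values (it does not recompute them) and labels each pair at α = 0.05; the variable names are illustrative.

```python
ALPHA = 0.05  # 5% significance level

# p-values as reported in the pairwise t-test table.
p_values = {
    "SVM-MLC": 0.001,
    "SVM-NN": 0.045,
    "SVM-RF": 0.834,
    "NN-MLC": 0.622,
    "NN-RF": 0.012,
    "RF-MLC": 0.001,
}

significant = {pair for pair, p in p_values.items() if p <= ALPHA}
for pair, p in p_values.items():
    verdict = "significant" if pair in significant else "not significant"
    print(f"{pair}: p = {p:.3f} -> {verdict}")
```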
When SAR and Landsat data are integrated, all classifiers perform well; however, SVM and RF perform best, significantly better than NN and MLC at the 95% confidence level. Previous studies indicate that parametric classifiers such as MLC are not well suited to multi-source remote sensing data. The superior performance of SVM and RF compared with NN could be due to their ability to handle high-dimensional data [
Considering classifier performance by data category, the results for category A (subgroups A1-A3: Landsat surface reflectance and its derivatives) indicate that the non-parametric classifiers (SVM, RF and NN) as well as MLC perform well (
The performances of all classifiers within each group were compared at the 5% significance level. The two-sample t-test indicates no statistically significant difference between the SVM and RF classifiers (p = 0.834). Both SVM and RF differ significantly from NN (p = 0.045 and p = 0.012, respectively) and from MLC (p = 0.001 in both cases). NN and MLC show no statistically significant difference (p = 0.622) (
The potential of parametric and non-parametric classifiers has been examined based on the integration of Landsat TM and ALOS PALSAR data. All classifiers under study perform well in terms of overall accuracy when using Landsat TM data and its derivatives, although SVM and RF are superior to the others. For SAR data, SVM, RF and NN perform better than MLC. When Landsat and PALSAR data are integrated, SVM and RF appear considerably more powerful than NN and MLC, especially when TM derivatives, backscattering and GLCM textures are combined. Overall, the results indicate the robustness of SVM and RF (at the 5% significance level) for land cover classification in tropical areas. However, selecting a suitable classifier for a given task depends largely on trade-offs among classification accuracy, processing time and computing resources. Based on these results, the researcher recommends that the performance of other classification algorithms, especially object-based classification, be tested in tropical and semi-arid environments. This would reveal their ability to differentiate forest resources and land cover types. Additionally, because new classification algorithms are being developed rapidly, it is essential to evaluate their performance and sensitivity in different environments using various types of remote sensing datasets and high-quality samples. A comprehensive assessment of algorithms across different environment types would make it easier to select an algorithm for a specific remote sensing application.
The author thanks Dr. Veraldo Liesenberg for facilitating the acquisition of ALOS PALSAR L band data. The data was acquired under Cat.1-Proposal 6242 through the European Space Agency (ESA) Third Party Mission. The Landsat TM data was downloaded from the US Geological Survey (USGS) website.
Deus, D. (2018) Assessment of Supervised Classifiers for Land Cover Categorization Based on Integration of ALOS PALSAR and Landsat Data. Advances in Remote Sensing, 7, 47-60. https://doi.org/10.4236/ars.2018.72004
The following abbreviations are used in this manuscript: