Object classification in high-density 3D point clouds for precision farming is challenging because of high intra-class variance and heavy occlusion and overlap, which result from self-similarity and densely packed plant organs, especially in ripe growth stages. To address these application-specific challenges, this contribution experimentally evaluates the performance of local shape descriptors, namely Point-Feature Histogram (PFH), Fast-Point-Feature Histogram (FPFH), Signature of Histograms of Orientations (SHOT), Rotational Projection Statistics (RoPS) and Spin Images, in classifying 3D points into different types of plant organs. On four representative scans of a leaf, a grape bunch, a grape branch and a flower, we achieve accuracies between 94% and 99% with supervised classification using an SVM and between 88% and 96% with a k-means clustering approach. Additionally, different distance measures and the influence of the number of cluster centres are examined.
The automatic analysis of 3D point clouds generated from plant data is an important step on the way to automatic phenotyping, where phenotypes refer to the observable attributes of a plant. Manual phenotyping is widely recognized to be labour-intensive and highly time-consuming, also known as the “phenotyping bottleneck” [
3D scanners allow the generation of 3D data from plants in a non-invasive way, simply by moving the scanner around the object. Today, relatively cheap scanners provide sufficient resolution to capture even fine stalks and work independently of the illumination, which makes it possible to scan directly in the field.
In this work, we concentrate on the classification of points into different plant organs, like stalks, leaves or berries. Based on this, it is possible to estimate yield or reconstruct the plant organs for the final phenotyping [
There are several studies examining the performance of descriptors in different contexts. In urban environments, 3D descriptors like the Signature of Histograms of Orientations (SHOT) were found to deliver the best results [
1) Descriptors have to be able to deal with a high intra-class variance, i.e. different classes of plant organs need to be distinguished, but not instances of the same plant organ. E.g. a grape bunch usually includes berries of different sizes, all of which have to be assigned to the same class.
2) Plants have such fine structures that it is usually not possible to obtain a perfect scan. Therefore, descriptors have to be robust to noise and holes in the data.
3) While in other applications the objects can be expected to be well-separated from each other, a plant consists of several, smoothly connected components. The descriptor must be able to deal with regions with neighbouring points from different plant organs.
In precision farming, different illumination conditions have to be expected, depending on whether the scan was taken inside or outside, by day or by night, and under what weather conditions. This changes the colours of the plant organs. In the case of grape bunches, depending on the cultivar, the berries often have a colour very similar to that of the leaves or, in early development stages, even of the stem skeleton. Additionally, including colour information would likely influence all descriptors in the same way, leaving their relative performance unchanged. Therefore, while colour information is generally available, we decided not to include it in the descriptors.
In summary, this paper presents an experimental evaluation of five of the most prominent local shape descriptors for the classification of 3D point clouds in precision farming. In more detail, we examine the Point-Feature Histogram (PFH), Fast-Point-Feature Histogram (FPFH), Signature of Histograms of Orientations (SHOT), Rotational Projection Statistics (RoPS) and Spin Images with respect to their suitability to assign points in a 3D point cloud to different plant organs. Results on four scans including one representative scene each are presented. For the classification part, a supervised approach using an SVM is compared to an unsupervised k-means clustering.
In the literature, most descriptors can be divided into two classes: histogram and signature descriptors. Descriptors falling into the first class build histograms of the properties of neighbouring points. The second class uses the values of such properties directly as features.
A variety of descriptors representing shape properties of surfaces in 3D point clouds were introduced in the context of object recognition [
All descriptors require the computation of normals. These are derived for each point using a Principal Component Analysis (PCA) of the neighbours within a radius rn around the point. A local reference frame makes the computation of the features invariant to the viewpoint.
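As an illustration, the PCA-based normal estimation could be sketched as follows. This is a minimal pure-Python sketch and not the implementation used in this work: the function names, the brute-force neighbour search and the use of power iteration to find the direction of least variance are all assumptions made for the example.

```python
import math

def neighbours_in_radius(points, query, radius):
    """Return all points within `radius` of `query` (brute-force search)."""
    return [p for p in points if math.dist(p, query) <= radius]

def covariance(points):
    """3x3 covariance matrix of a list of 3D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov

def estimate_normal(points, query, r_n, iterations=200):
    """Approximate the normal at `query` as the direction of least variance
    of the neighbours within radius r_n (hypothetical helper)."""
    nbrs = neighbours_in_radius(points, query, r_n)
    if len(nbrs) < 3:
        raise ValueError("need at least three neighbours to fit a plane")
    cov = covariance(nbrs)
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # The dominant eigenvector of (trace * I - C) is the eigenvector of C
    # with the smallest eigenvalue, so plain power iteration is sufficient.
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        v = [sum(shifted[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v  # unit vector; the sign (orientation) is not resolved here

# Usage: a flat patch in the plane z = 0 has a normal along the z-axis.
patch = [(0.1 * x, 0.1 * y, 0.0) for x in range(-5, 6) for y in range(-5, 6)]
normal = estimate_normal(patch, (0.0, 0.0, 0.0), r_n=0.35)
```

The sketch does not orient the normals consistently, and power iteration can fail if the start vector happens to be orthogonal to the sought direction; a production implementation would use a proper eigen-decomposition.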
We used implementations from the Point Cloud Library [
The PFH [
For the PFH, the properties of the surface spanned by a point p and its neighbours within a support radius r ∈ ℝ are derived by computing a Darboux frame as local reference frame for every pair of points in the neighbourhood. The difference between the two normals is then expressed as a set of three angular features. The PFH is obtained by binning every combination of these angular features into a histogram with b³ bins (b ∈ ℕ), which represents a fully correlated feature space.
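The construction can be sketched in a few lines. The sketch below assumes the angular features as they are commonly defined for the PFH (α, φ and θ relative to a Darboux frame u, v, w), assumes unit-length normals, and skips the rule that decides which point of a pair acts as the source. All function names are hypothetical, and `points` stands for the neighbourhood of one query point.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_features(p_s, n_s, p_t, n_t):
    """Three angular features between two oriented points."""
    diff = tuple(t - s for s, t in zip(p_s, p_t))
    dist = math.sqrt(dot(diff, diff))
    d = tuple(x / dist for x in diff)   # unit vector from source to target
    u = n_s                             # Darboux frame: u, v, w
    v = cross(u, d)
    w = cross(u, v)
    alpha = dot(v, n_t)                 # in [-1, 1]
    phi = dot(u, d)                     # in [-1, 1]
    theta = math.atan2(dot(w, n_t), dot(u, n_t))  # in [-pi, pi]
    return alpha, phi, theta

def bin_index(value, lo, hi, b):
    """Map a value in [lo, hi] to one of b equal-width bins."""
    i = int((value - lo) / (hi - lo) * b)
    return min(max(i, 0), b - 1)

def pfh(points, normals, b):
    """Fully correlated histogram with b**3 bins over all point pairs."""
    hist = [0.0] * (b ** 3)
    pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            alpha, phi, theta = pair_features(points[i], normals[i],
                                              points[j], normals[j])
            ia = bin_index(alpha, -1.0, 1.0, b)
            ip = bin_index(phi, -1.0, 1.0, b)
            it = bin_index(theta, -math.pi, math.pi, b)
            hist[ia + ip * b + it * b * b] += 1.0
            pairs += 1
    return [h / pairs for h in hist] if pairs else hist

# Usage: two points on a flat surface with parallel normals, b = 5.
hist = pfh([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)], b=5)
```

With b = 5 the histogram has 5³ = 125 entries, and the single pair in the usage example falls into the central bin of each of the three features.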
The computation of the FPFH is similar, but instead of computing the angular features for every pair of points in the neighbourhood, a so-called Simplified-Point-Feature Histogram (SPFH) is created, containing the angular features computed only between the point and each of its neighbours. The FPFH is then derived by combining the SPFH of the point itself with those of its neighbours within a support radius r, weighted by their Euclidean distance. Additionally, instead of using a fully correlated feature space, the angular features are binned into three separate histograms, which are then concatenated. This significantly reduces the histogram size of the FPFH.
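The weighting step and the effect on the histogram size can be illustrated as follows. The sketch assumes the commonly cited formulation in which each neighbour SPFH is scaled by the inverse of its distance to the query point and by 1/k for k neighbours; the SPFHs are taken as given, and the function names are hypothetical.

```python
def fpfh(spfh_query, spfh_neighbours, distances):
    """Combine the SPFH of the query point with the SPFHs of its
    neighbours, each weighted by 1 / (k * distance)."""
    k = len(spfh_neighbours)
    result = list(spfh_query)
    for hist, dist in zip(spfh_neighbours, distances):
        for i, value in enumerate(hist):
            result[i] += value / (k * dist)
    return result

def histogram_sizes(b):
    """Number of bins for b subdivisions per angular feature."""
    return {"PFH": b ** 3, "FPFH": 3 * b}

# Usage: toy SPFHs with three bins and two neighbours at distances 1 and 2.
combined = fpfh([1.0, 0.0, 0.0],
                [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                [1.0, 2.0])
sizes = histogram_sizes(5)
```

In the usage example the closer neighbour contributes twice as much as the farther one, and for b = 5 the concatenated FPFH needs 15 bins where the fully correlated PFH needs 125.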
Both PFH and FPFH are parametrized by the number of bins b and the support radius r.
The SHOT descriptor [
We leave the parameters concerning the division of the support structure fixed, relying on the suggestion of the authors that this is a robust choice. This leaves the support radius r and the number of bins b in the histograms as remaining parameters.
Other than the descriptors used so far, RoPS [
For each query point and its neighbours in a radius r, a local reference frame is computed to achieve rotational invariance. Several steps are applied to each of the axes of this reference frame:
1) The local surface is rotated around the current axis;
2) All points in the local surface are projected onto the XY, XZ and YZ planes;
3) For each plane, statistical information about the distribution of the projected points is computed and concatenated in the final descriptor.
The available parameters are the support radius r and the number of bins b in the final descriptor.
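The three steps above can be sketched in simplified form. The sketch rotates only about the x-axis of an already aligned local surface, and it uses the Shannon entropy of a b × b distribution matrix as the only statistic; both are simplifying assumptions for illustration, and all function names are hypothetical.

```python
import math

def rotate_about_x(point, angle):
    """Rotate a 3D point about the x-axis by `angle` (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def project(point, plane):
    """Project a 3D point onto the 'xy', 'xz' or 'yz' coordinate plane."""
    x, y, z = point
    return {"xy": (x, y), "xz": (x, z), "yz": (y, z)}[plane]

def distribution_matrix(points_2d, b):
    """Normalized b x b occupancy matrix over the bounding box of the points."""
    us = [p[0] for p in points_2d]
    vs = [p[1] for p in points_2d]
    u_min, v_min = min(us), min(vs)
    u_span = (max(us) - u_min) or 1.0  # avoid division by zero for flat input
    v_span = (max(vs) - v_min) or 1.0
    matrix = [[0.0] * b for _ in range(b)]
    for u, v in points_2d:
        i = min(int((u - u_min) / u_span * b), b - 1)
        j = min(int((v - v_min) / v_span * b), b - 1)
        matrix[i][j] += 1.0 / len(points_2d)
    return matrix

def entropy(matrix):
    """Shannon entropy of a normalized distribution matrix."""
    return -sum(c * math.log(c) for row in matrix for c in row if c > 0)

def rops_like_signature(local_points, angles, b):
    """One statistic per rotation angle and projection plane, concatenated."""
    signature = []
    for angle in angles:
        rotated = [rotate_about_x(p, angle) for p in local_points]
        for plane in ("xy", "xz", "yz"):
            projected = [project(p, plane) for p in rotated]
            signature.append(entropy(distribution_matrix(projected, b)))
    return signature

# Usage: four points, two rotation angles, 2 x 2 distribution matrices.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
sig = rops_like_signature(pts, [0.0, math.pi / 4], b=2)
```

A full descriptor would repeat the rotation for each axis of the local reference frame and would typically collect more than one statistic per projection.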
A Spin Image [
The Spin Image is parametrized by the support radius r and the size of the Spin Image b (representing width and height of the image).
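To make the roles of r and b concrete, the following sketch accumulates a b × b Spin Image using the usual cylindrical coordinates around an oriented point: the radial distance α from the normal axis and the signed height β along the normal. Limiting the support to α ≤ r and |β| ≤ r is an assumption of this sketch, as are the function name and the bin layout.

```python
import math

def spin_image(p, n, neighbours, r, b):
    """b x b histogram of (alpha, beta) coordinates around the oriented
    point (p, n); n is assumed to be a unit normal."""
    image = [[0] * b for _ in range(b)]
    for x in neighbours:
        d = [x[i] - p[i] for i in range(3)]
        beta = sum(d[i] * n[i] for i in range(3))        # height along the normal
        alpha_sq = sum(c * c for c in d) - beta * beta
        alpha = math.sqrt(max(alpha_sq, 0.0))            # distance from the normal axis
        if alpha > r or abs(beta) > r:
            continue                                     # outside the support region
        col = min(int(alpha / r * b), b - 1)
        row = min(int((r - beta) / (2.0 * r) * b), b - 1)
        image[row][col] += 1
    return image

# Usage: two neighbours inside the support region, one outside.
img = spin_image((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                 [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (5.0, 0.0, 0.0)],
                 r=2.0, b=4)
```

Because only α and β are recorded, all points on a circle around the normal axis fall into the same cell, so the image does not change when the surface is rotated about the normal.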
An experimental quantitative evaluation is performed on four different scans depicted in
The first one (
The second scan (
The third scan (
Finally, the last scan (
and the background, including leaves and stalk of the flower. This is the most challenging data set, as the chosen descriptors rely on shape to differentiate the objects, but petals and leaves are rather similar. The only difference is a slight curvature in the petals, while the leaves are smoother. In many cases, the number of blossoms in a field can already be used as an early estimate of the yield. Therefore, differentiating between blossoms and background would be an important step.
We will refer to the scans as “Leaves”, “Grape”, “Branch” and “Flower” sets in the rest of the paper. All data sets were generated with the Artec Spider 3D scanner with a resolution of 0.1 mm and an accuracy of up to 0.05 mm [
The performance of the descriptors is compared based on one supervised and one unsupervised approach.
For the supervised classification a Support-Vector-Machine (SVM) from the freely available svm-light library [
The unsupervised classification is based on a k-means++ approach [
We vary the number of cluster centres k as former examination showed that in some cases, even when only two types of objects are present in the data (like in our case in the grape and leaves data sets) using more cluster centres can be beneficial, as it allows for a finer clustering [
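For reference, the seeding step that distinguishes k-means++ from plain k-means can be sketched as follows. This is a generic sketch of the seeding scheme on toy 2D data, not the clustering code used in the experiments; the function names are hypothetical, and the squared Euclidean distance is used here only for simplicity.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(data, k, rng):
    """Choose k initial centres. The first is drawn uniformly; each further
    centre is drawn with probability proportional to the squared distance
    to the nearest centre chosen so far."""
    centres = [rng.choice(data)]
    while len(centres) < k:
        weights = [min(squared_distance(x, c) for c in centres) for x in data]
        # Raises ValueError if all weights are zero, i.e. if the data
        # contain fewer than k distinct points.
        centres.append(rng.choices(data, weights=weights, k=1)[0])
    return centres

# Usage: toy descriptors forming two well-separated groups.
data = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]
seeds = kmeanspp_seeds(data, k=2, rng=random.Random(0))
```

Points that are already centres have weight zero and cannot be drawn again, so the seeds are always distinct, and points far from all existing centres are the most likely to be picked next.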
We optimize the radius parameters for normals (rn) and support region (r) separately for each data set using a grid search on the SVM approach. The resulting best parameter combinations can be seen in
In most cases, the tendency is the same for all descriptors on the same data set. On the branch, grape and leaves data sets, combinations of a small rn and a large r lead to the best results. The flower data set is a special case, as the petals differ from the leaves only on a small scale (petals are slightly curved while leaves are rather smooth). Therefore, some descriptors tend to deliver better results if r is chosen relatively small. Only FPFH and SHOT remain stable with a small rn and a large r over all data sets.
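A grid search over the two radius parameters could look roughly as follows. The scoring callback stands in for a complete run of descriptor computation and SVM validation, which is not reproduced here; the function names, the candidate values and the toy scoring function are all hypothetical.

```python
from itertools import product

def grid_search(normal_radii, support_radii, score):
    """Evaluate score(r_n, r) for every combination and return
    (best_accuracy, best_r_n, best_r)."""
    best = None
    for r_n, r in product(normal_radii, support_radii):
        accuracy = score(r_n, r)
        if best is None or accuracy > best[0]:
            best = (accuracy, r_n, r)
    return best

def toy_score(r_n, r):
    """Placeholder for 'compute descriptors, train and validate the SVM'.
    It simply peaks at r_n = 1.0 and r = 5.0."""
    return 1.0 - 0.01 * ((r_n - 1.0) ** 2 + (r - 5.0) ** 2)

# Usage with illustrative candidate values.
best = grid_search([0.5, 1.0, 1.5], [4.0, 5.0, 6.0], toy_score)
```

Since every combination requires recomputing the descriptors and retraining the classifier, the cost grows with the product of the two candidate lists, which is why a coarse grid is a natural choice.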
Former studies showed that as long as the bin sizes are set to sufficiently great values and a fitting distance measure is chosen, the bin sizes do not significantly influence the result [
To judge the quality of the classification, the accuracy is computed on each data set, in the case of the SVM averaged over the 10 validation sets.
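In code, this evaluation amounts to a per-set accuracy followed by an average. The sketch below also reports the sample standard deviation across the validation sets as a measure of spread; that choice, the function names and the toy labels are assumptions made for illustration.

```python
from statistics import mean, stdev

def accuracy(predicted, actual):
    """Fraction of points whose predicted label equals the reference label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def summarize_folds(fold_accuracies):
    """Mean and sample standard deviation over the validation sets."""
    return mean(fold_accuracies), stdev(fold_accuracies)

# Usage: one toy validation set with four points, then a summary of two sets.
acc = accuracy(["leaf", "stalk", "leaf", "leaf"],
               ["leaf", "stalk", "stalk", "leaf"])
avg, spread = summarize_folds([0.9, 1.0])
```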
The accuracy achieved with a supervised classification is presented in
On all data sets, the descriptors achieve accuracies of more than 85%, and of more than 90% when the challenging flower data set is excluded. Still, differences are visible.
The best results are achieved using the FPFHs. Even on the flower data set they yield more than 94% accuracy and up to over 99% on the branch data set.

Best radius parameters rn/r for each descriptor and data set:

Descriptor | Leaves | Grape | Branch | Flower |
---|---|---|---|---|
PFH | 1.5/6.0 | 1.0/5.0 | 0.5/5.0 | 0.5/2.5 |
FPFH | 2.5/6.0 | 1.0/5.0 | 1.5/5.0 | 0.5/5.0 |
SHOT | 1.5/6.0 | 1.0/5.0 | 1.5/6.0 | 0.5/5.5 |
RoPS | 0.5/6.0 | 1.0/4.0 | 0.5/4.5 | 1.0/1.5 |
Spin Images | 2.5/7.0 | 1.5/5.0 | 1.5/5.0 | 1.0/1.5 |


Accuracy in % of the supervised SVM classification for each descriptor and data set:

Descriptor | Leaves | Grape | Branch | Flower |
---|---|---|---|---|
PFH | 96.00 ± 0.23 | 94.68 ± 0.05 | 95.91 ± 0.19 | 88.90 ± 0.20 |
FPFH | 96.87 ± 0.20 | 97.27 ± 0.06 | 99.20 ± 0.01 | 94.74 ± 0.15 |
SHOT | 96.38 ± 0.20 | 95.93 ± 0.03 | 96.71 ± 0.06 | 86.05 ± 0.26 |
RoPS | 94.57 ± 0.17 | 92.64 ± 0.08 | 93.79 ± 0.15 | 86.87 ± 0.16 |
Spin Images | 90.04 ± 0.21 | 94.35 ± 0.05 | 96.40 ± 0.04 | 88.29 ± 0.13 |

The other descriptors perform comparably, with the RoPS descriptor and the Spin Images leading to the worst results, but still close to the others.
The accuracy achieved using a k-means clustering approach with either Euclidean or χ²-distance is depicted in
The PFHs come close to the FPFHs in most cases, delivering better results on the flower data set, but worse on the branch data set.
The RoPS and SHOT descriptors as well as the Spin Images perform poorly with few cluster centres and only stabilize with more than four to six centres. Even then, they do not reach the quality of PFHs and FPFHs. An exception is the branch data set, where the Spin Images perform almost as well as the FPFHs, but require more cluster centres.
All descriptors except FPFHs and RoPS behave less robustly with the Euclidean distance than with the χ²-distance. This suggests that for histogram descriptors with a larger number of bins (between 125 and 352), it is important to use a distance measure designed for histograms.
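The two distance measures can be compared directly on small histograms. The sketch uses one common form of the χ²-distance, half the sum of (a − b)² / (a + b) over all bins with empty bin pairs skipped; the exact variant used in the experiments is not specified here, so this form and the function names are assumptions.

```python
import math

def euclidean_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def chi2_distance(h1, h2):
    """Chi-squared distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:            # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

# Usage: the same absolute difference of 0.1 in a sparse and a full bin.
sparse = chi2_distance([0.0, 1.0], [0.1, 0.9])
full = chi2_distance([0.5, 0.5], [0.6, 0.4])
```

The Euclidean distance is identical for both pairs in the usage example, whereas the χ²-distance is larger for the pair that differs in a nearly empty bin, because each squared difference is divided by the bin mass.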
On the leaves and grape data sets, all descriptors achieve good results of more than 85%. The branch data set is more challenging, but FPFHs and Spin Images both achieve more than 90%. On the flower data set, the SHOT and RoPS descriptors as well as the Spin Images fall below 80%. Only PFHs and FPFHs get close to 90%.
Both the leaves and the grape data set turn out to be rather simple classification problems, as they contain quite differently shaped plant organs. The branch data set combines the classes of the other data sets and is more challenging because more types of objects are included. The most problematic case is the flower data set.
The choice of the radius parameters proves to depend on the application. While the normal radius rn can usually simply be set to a value equal to or greater than the resolution, the support radius r has to be adjusted to the specific requirements. If the plant organs to be distinguished differ only on a small scale, as in the flower data set, this has to be reflected by a smaller r.
As expected, choosing an SVM as a more sophisticated classification method makes the choice of the descriptor almost irrelevant, as all of them achieve very good results. However, an SVM requires a gold standard, and depending on the use case, preparing one can be hard (e.g. labelling the stem skeleton inside a grape bunch manually is almost impossible). Fortunately, FPFHs yield very good results even with unsupervised classification.
The evaluation on the representative sets shows a clear ranking for the SVM-based classification: FPFHs perform best, while the other descriptors all yield results similar to each other. In the case of the k-means clustering, we have on average the following ranking: FPFHs > PFHs > Spin Images > RoPS, SHOT. There are slight deviations, e.g., the Spin Images show the worst results of all descriptors on the Leaves data set, but reach almost the same quality of results as FPFHs on the Branch data set. The same effect can be seen in the SVM results. This suggests that the resolution chosen for the Spin Images in this paper is better suited to distinguish between round and flat or cylindrical shapes than between flat and cylindrical shapes only.
All in all, and despite the exemplary character of the evaluation, the results clearly suggest FPFHs as the descriptor of choice when compared with SHOT, RoPS and Spin Images.
In applications like scan registration, both RoPS and SHOT descriptor were found to outperform PFHs and FPFHs [
In this work, the performance of different descriptors and classification methods in the context of precision farming is shown, represented by four typical settings, including the distinction between leaves, stalks and berries. When using a supervised classification with an SVM, the FPFHs lead to the best result in a tight ranking with the other descriptors. The results achieved with unsupervised k-means clustering show an even more distinct tendency: while the performances of the other descriptors drop, FPFHs still yield results comparable to supervised classification.
So far, we presented experimental results on one representative scan for each type of scan data. To validate our conclusions, the same experiments should be done on a much higher number of data sets.
Furthermore, a reconstruction of plant organs with geometric primitives could be applied to the classified data to derive phenotypes, e.g. for yield estimation, directly from 3D input data.
This work was done within the project “Automated Evaluation and Comparison of Grapevine Genotypes by means of Grape Cluster Architecture” which is supported by the Deutsche Forschungsgemeinschaft (funding code: STE 806/2-1). We thank the DFG for supporting our work.
Mack, J., Trakowski, A., Rist, F., Herzog, K., Töpfer, R. and Steinhage, V. (2017) Experimental Evaluation of the Performance of Local Shape Descriptors for the Classification of 3D Data in Precision Farming. Journal of Computer and Communications, 5, 1-12. https://doi.org/10.4236/jcc.2017.512001