Journal of Signal and Information Processing, 2013, 4, 102-108
doi:10.4236/jsip.2013.43B018 Published Online August 2013 (http://www.scirp.org/journal/jsip)
Segmenting Salient Objects in 3D Point Clouds of Indoor
Scenes Using Geodesic Distances
Shashank Bhatia, Stephan K. Chalup
School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan 2308, NSW, Australia.
Email: shashank.bhatia@uon.edu.au, stephan.chalup@newcastle.edu.au
Received May, 2013.
ABSTRACT
Visual attention mechanisms allow humans to extract relevant and important information from raw input percepts.
Many applications in robotics and computer vision have modeled human visual attention mechanisms using a bottom-up, data-centric approach. In contrast, recent studies in cognitive science highlight advantages of a top-down approach to attention mechanisms, especially in applications involving goal-directed search. In this paper, we propose a top-down approach for extracting salient objects/regions of space. The top-down methodology first isolates different objects in an unorganized point cloud, and then compares each object for uniqueness. A measure of saliency is defined using the properties of geodesic distance on the object's surface. Our method works on 3D point cloud data and identifies salient objects of high curvature and unique silhouette. Being among the most distinctive features of a scene, these objects are robust to clutter, occlusions, and viewpoint changes. We provide the details of the proposed method and initial experimental results.
Keywords: Saliency Detection; 3D Image Analysis; Image Segmentation
1. Introduction
Traditionally, methods of landmark extraction for the
purpose of robot localization have been dependent on the
type of environment and nature of landmarks. Such
methods follow the standard procedure of sequentially
scanning the input percept and aim to match pre-defined
patterns for recognition of landmarks. The necessity of pre-defining patterns associated with landmark locations limits the use of the robot to a specific environment. Furthermore, sequential processing of data requires high computational power to be available on a mobile platform. This sequential processing of pixels or image windows is in contrast to the human visual mechanism. The latter incorporates an attention mechanism that helps humans to focus on the most relevant stimuli based on the task at hand [1-2]. Incorporating a similar strategy in computational vision systems, especially for robotics applications, can
have many advantages. Computational Attention (CA),
commonly known as “Saliency Detection” or “Interest
Point Detection”, aims to identify the regions of sensory
input that stand out from their neighbors and attract the
attention of the subject [3-4]. Due to the convenience that CA offers, it has been adopted in many applications requiring judicious selection of inputs [5], minimization of computational cost [6], or invariance to clutter
[7].
Computational attention in the 2D image domain has been investigated for the past five decades. C. Koch and S. Ullman [8] were the first to provide theoretical foundations of visual attention mechanisms. The authors proposed the creation of different conspicuity maps, each selecting locations in visual space that differ from their surroundings in terms of color and orientation. Further, a
Winner Take All (WTA) neural network was proposed to
combine different conspicuity maps and select the most
salient region. Most of the current visual attention sys-
tems are based on the implementation of the WTA net-
works, developed by L. Itti et al. [9]. In their implemen-
tation, the authors extended the existing theoretical con-
cepts by adding intensity as another feature for comput-
ing the conspicuity map. However, it should be noted that
most of the existing approaches use the intrinsic proper-
ties of an input image. These properties depend on factors like the presence of ambient light, the amount of reflections, the visibility of colors, and the presence of occlusions, and are therefore unstable. This has motivated researchers to utilize 3D depth information (which is independent of ambient light) in the process of determining salient regions. Furthermore, the availability of low-cost 3D capturing devices in recent years has motivated the usage of 3D depth information, especially for systems on mobile robotic agents.
To identify salient regions in a scene, mechanisms that
evaluate intrinsic properties of raw data elements to spot
the regions of potential interest are said to follow bot-
tom-up approaches [10]. Such methods explore the
neighborhood of each data point present in the input, and
assemble points into small clusters having salient char-
acteristics. In this process, the clusters of higher saliency thus obtained may comprise arbitrary points belonging to multiple objects. On the other hand, methods that, instead of operating on individual data elements, evaluate a collection (with all data elements belonging to the same object) are known as top-down approaches. Here, the evaluation is performed after isolating the points into different objects. Many cognitive science studies and
experimental evaluations described in [11] have shown
that bottom-up methods are well suited to applications
involving explorative tasks, but may not be suitable for
goal-directed searching. These studies encourage ideas of
taking an “object” as a unit for attention selection and
support the suitability of top-down approaches for applications involving goal-directed search. However, since most
of the existing methods of 3D saliency detection are
based on bottom-up approaches, their usage has re-
mained limited. Only the most simplistic methods like
[12] and [6] have been used in robotic applications.
In this paper, we propose a simple top-down approach
to extract salient regions from raw 3D point cloud data.
The top-down nature of our approach segments the scene
into physically disconnected regions and then compares
properties of each region for saliency. We define saliency
measures that capture variations in curvature and silhou-
ette (an outline of an object/scene consisting of feature-
less interior) of the corresponding regions, and compare
them with other objects present in the surroundings. We
report initial experiments and results obtained by testing the method in an environment containing objects of different shapes
and degrees of curvature. Section 2 of the paper provides
a short review of related work, followed by the motiva-
tion behind this research. Details of the proposed ap-
proach are provided in Sections 3 and 4. Section 5 de-
scribes the measures and initial experiments conducted,
and concludes the paper.
2. Related Work
The majority of available methods either cannot handle large point clouds or reduce the dimensionality of the point cloud to achieve computational efficiency. In [13], a
multi-scale filtering operator is derived by the convolu-
tion of a Gaussian kernel with the operating surface. The
operator has the property of being proportional to the
curvature of the local area at which it is applied. In effect,
it is directly applicable to 3D point clouds and captures
the variation in shape of the neighborhood of the point of application. However, the need for processing over multiple scales limits its utility to very small point
clouds. To overcome this disadvantage, J. Stuckler and S.
Behnke [7] extended the interest operator to work on
depth images. Similar to the historical intensity-driven
visual attention algorithms, their approach builds a multi-
scale pyramid representation of the depth image to be
used with the operator. However, the approximation to a depth image limits the description of the interest points to the detection of blobs and corner-like fea-
tures. It is important to note that in goal-driven applica-
tions, salient regions are useful only if they can be rec-
ognized as important features. Common to most applica-
tions, extracted regions should be invariant to noise, scale,
and viewpoint transformations. The bottom-up mechanisms of [7] and [12] follow the same convention: they limit the input space to depth images and provide a simplistic method that extracts boundary regions as areas of interest.
While this approach is computationally efficient, it does
not take account of the 3D shape of the objects in deter-
mining the saliency values.
Cole and Harrison were among the first to incorporate
the 3D curvature information directly to identify regions
of interest for the application of robot Simultaneous Lo-
calization and Mapping (SLAM) [6]. The authors utilized
an information-theoretic entropy measure to identify the
regions of maximum random curvature. In contrast to [7],
the authors of [12] used spherical regions to define the scale
space. The degree of saliency was based on the entropy
of normals in each spherical region and its variation over
multiple scales. Using a spherical shape for defining a
scale space leads to the selection of points with the high-
est variation in curvature as the most salient regions.
These points in general may correspond to more than one
object, and therefore do not provide any recognizable
information of the selected region.
Flint et al. [14] defined an interest point to be the one
having the largest principal curvature along all three Euclidean axes. The magnitude of all three principal curvatures
was estimated by calculating the Hessian matrix con-
volved with a Gaussian kernel. Finally, the areas having
the highest determinant of the Hessian matrix were
deemed to be the most salient regions. The experimental re-
sults revealed that the proposed method extracts all cor-
ner and edge points, representing the areas with the high-
est variations in curvature. Using spatial properties of the
cloud data, Akman and Jonker [4] proposed to include
the depth as a criterion for saliency. In their approach,
two different saliency maps were used in combination to
obtain the final saliency map. The first saliency map was
calculated using values of curvature. The second map
was taken as being inversely proportional to the depth of
a region. The farther a region, the lower its saliency
would be. In the application of saliency for object classi-
fication, Potapova and Zillich [15] proposed the extrac-
tion of the orientation of objects in point clouds relative
to the surface on which they are located. This relative
orientation was used to create a saliency map, and was
combined with the traditional 2D saliency approach to
obtain a complete saliency map.
As evident from our review, most of the present re-
search on visual attention mechanism on 3D data has
been focused on bottom-up approaches. These ap-
proaches assume homogeneity of input data, and the de-
tection of saliency is mostly based on the intrinsic prop-
erties. There is no mechanism that suggests the grouping
of raw data into identifiable objects/artifacts present in
the input space. Some of these methods require multi-
scale processing, while others approximate 3D data to
depth images. A particular drawback highlighted for this research is the grouping of multiple objects into a single region of interest. This can potentially degrade the efficiency of goal-directed search applications. Research in
cognitive science has shown the suitability of top-down
attention mechanisms in goal directed applications [11].
Further, top-down mechanisms are more computationally
efficient, as they do not necessarily process all the data
sequentially. In view of these findings, in this paper we
propose a top-down approach for salient region detection.
Our approach first clusters objects present in an input
percept, and then evaluates each separated object for the
degree of attention that it may acquire. With the applica-
tion of our approach, goal directed search applications
have the possibility of increasing computational per-
formance and time efficiency. The major contribution of
our approach is its top-down nature of saliency estima-
tion, which considers an object as an elemental unit for
attention selection.
3. Planar Region Extraction
Planes constitute a major part of point clouds captured from indoor scenes. The planar part of the 3D point cloud does not contribute any variation in curvature; it therefore becomes a distraction and increases computational cost. Additionally, removal of planar regions is required for isolating salient objects from other artifacts present in the scene. Therefore, while seeking to identify regions
with higher curvature, we choose to remove these planar
regions using the RANSAC method as described in [16].
3.1. Local Surface Normals
Planar regions have low or zero curvature associated with them. Leveraging this property, such regions can be identified if the local surface normal (LSN) at each point is known. These LSNs are estimated using the method described in [17]. The surface normal at a query point can be estimated using the eigenvalues and eigenvectors of the covariance matrix of the k nearest neighbors of the query point (Equation (1)).
$$ C = \frac{1}{k}\sum_{i=1}^{k}\left(p_i - \bar{p}\right)\left(p_i - \bar{p}\right)^{T} \qquad (1) $$

$$ C \cdot v_j = \lambda_j \cdot v_j, \quad j \in \{0,1,2\}, \quad 0 \le \lambda_0 \le \lambda_1 \le \lambda_2 \qquad (2) $$
Here $\bar{p}$ is the 3D centroid of the nearest neighbors, $\lambda_j$ is the $j$-th eigenvalue, and $v_j$ its corresponding eigenvector. The eigenvector $v_0$ corresponding to the smallest eigenvalue $\lambda_0$ is the approximation of the normal at the query point, and the ratio of the eigenvalues (Equation (2)) provides an estimate of the variation in curvature at the query point.
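As an illustration, the following minimal NumPy sketch mirrors the estimate of Equations (1)-(2); the brute-force neighbor search and the surface-variation ratio $\lambda_0/(\lambda_0+\lambda_1+\lambda_2)$ are assumptions made for clarity, not the exact implementation of [17].

```python
import numpy as np

def estimate_normal_and_curvature(points, query_idx, k=20):
    """Estimate the local surface normal and a curvature measure at one point,
    following the covariance / eigen-decomposition of Equations (1)-(2).
    `points` is an (N, 3) array; the brute-force k-NN search is a stand-in."""
    # k nearest neighbours of the query point (brute force for clarity)
    d = np.linalg.norm(points - points[query_idx], axis=1)
    nn = points[np.argsort(d)[:k]]

    # Equation (1): covariance of the neighbourhood about its centroid
    centroid = nn.mean(axis=0)
    diff = nn - centroid
    C = diff.T @ diff / k

    # Equation (2): eigen-decomposition; eigenvalues returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    normal = eigvecs[:, 0]   # eigenvector of the smallest eigenvalue approximates the normal

    # Eigenvalue ratio as an assumed surface-variation (curvature) estimate:
    # lambda_0 / (lambda_0 + lambda_1 + lambda_2)
    curvature = eigvals[0] / eigvals.sum()
    return normal, curvature
```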
3.2. Iterative RANSAC
Plane extraction using the Random Sample Consensus (RANSAC) method described in [16] identifies the best-fitting planar region in the input point cloud. As a result, only the single largest planar region is extracted from the input
point cloud, leaving the rest intact. In order to remove
subsequent planar regions, we apply RANSAC recursively. The input point cloud is processed multiple times, separating the single best-fitting planar surface in each pass.
Recursive RANSAC works by feeding back the residual
cloud obtained in the previous iteration. The extraction is
executed until 95% of the point cloud is processed. As a
result, a list of all extracted planar regions is obtained.
This list contains all possible planar regions present in the
input percept. In addition, the list may also contain parts
of planar regions embedded on objects present in the
environment. The resultant cloud may thus contain
occlusions and holes. These holes also lead to incorrect
object based clustering. To overcome this disadvantage,
we employ a statistical high-pass filter, which removes only significantly large planar regions present in the input cloud. The high-pass threshold value is set to $\mu + \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the number of points contained in all the candidate planes. Finally, the candidate planes with a number of points above the threshold are removed from the original cloud.
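A hedged sketch of this iterative extraction is given below. It uses Open3D's segment_plane as a stand-in for the RANSAC implementation of [16], and it assumes the high-pass threshold is the mean plus one standard deviation of the candidate-plane sizes; function and parameter names are illustrative only.

```python
import numpy as np
import open3d as o3d  # stand-in library; the paper relies on the RANSAC of [16]

def remove_dominant_planes(cloud, dist_thresh=0.02, coverage=0.95):
    """Peel off planar regions one at a time until roughly 95% of the points
    have been processed, then discard only candidates larger than (mean + std)
    of the candidate sizes; smaller candidates go back into the residual cloud."""
    total = len(cloud.points)
    residual = cloud
    candidates = []
    processed = 0

    while processed < coverage * total and len(residual.points) > 3:
        # One RANSAC pass: best-fitting plane of the current residual cloud
        _, inliers = residual.segment_plane(distance_threshold=dist_thresh,
                                            ransac_n=3, num_iterations=1000)
        if not inliers:
            break
        candidates.append(residual.select_by_index(inliers))
        residual = residual.select_by_index(inliers, invert=True)
        processed += len(inliers)

    # Statistical high-pass filter on candidate plane sizes (mean + std assumed)
    sizes = np.array([len(c.points) for c in candidates])
    keep_threshold = sizes.mean() + sizes.std() if len(sizes) else np.inf
    for c, s in zip(candidates, sizes):
        if s <= keep_threshold:   # small planar patches (e.g. object facets) are kept
            residual += c
    return residual
```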
Figure 1 displays raw point cloud data of The
Newcastle Robotics Lab, with different objects placed on
the ground. The objects comprise a toy bear, a
humanoid robot, a carton box, and a basketball. These
objects have different properties of curvature and were
chosen to demonstrate the behaviour of the proposed
method with different types of objects. The extracted
planar region points are marked in different shades of grey.
Objects that remain after plane extraction are marked in
black. The data was collected using the Kinect RGB-D
camera [18]. More details of the experimental setup are
provided in Section 4.

Figure 1. Complete point cloud (top), corresponding detected planar regions marked in different shades of grey (bottom), residual objects after plane extraction marked in red (bottom).
4. Salient Region Extraction
In this section we provide details of how saliency is
measured, and how relevant areas of high saliency are extracted.
There are two major phases involved. First, the Euclid-
ean clustering, and second the ranking of the extracted
clusters for saliency. Euclidean clustering divides the
point cloud into smaller objects/regions to be considered
as elements for saliency computation. Finally, the sepa-
rated objects are evaluated for saliency. The following
subsections provide details of each phase.
4.1. Euclidean Clustering
The residual cloud obtained after removal of planar re-
gions contains unlabeled points, some belonging to isolated objects and others being noisy residuals of the planar extraction. In order to perform a top-down object-level
comparison of uniqueness, these objects have to be iden-
tified as separate entities. In other words, the point cloud
needs to be divided into multiple parts, each containing one
isolated object. This is achieved by comparing the
Euclidean distance between neighboring points. Cluster-
ing is performed using a k-nearest neighbor search [19].
Nearest neighbor search starts by selecting a random
point from the residual cloud, and computes the Euclid-
ean distance of the point from its nearest neighbors.
Points that fall within a threshold distance are labeled as part of the object. The search stops when no nearest
neighbor falls under the threshold distance. At this stage
the points found so far are labeled as one group, and the
search starts again by removing the object from the cloud,
and randomly selecting another un-labeled point. The
clustering stops when all points are labeled. For the near-
est neighbor search, we make use of a binary KdTree
implementation as in [20]. This approach divides the re-
sidual cloud into a binary tree structure, enabling easy
and fast nearest neighbor searches. The result of the clus-
tering can be seen in Figure 2. The distance threshold
used here is 0.2 m and as evident, the method identifies
four different objects present in the scene. In the figure,
each object is represented by a different color of the points it
contains.
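A minimal sketch of this clustering step, using SciPy's cKDTree for the radius search, is shown below; the 0.2 m threshold follows the text, while the minimum cluster size is an assumed noise filter rather than a value from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, dist_thresh=0.2, min_size=50):
    """Grow clusters from seed points by repeatedly adding unlabeled neighbours
    lying within `dist_thresh` (0.2 m in the experiments); `points` is (N, 3).
    Returns a cluster id per point; -2 marks points treated as residual noise."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=int)
    current = 0

    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        # Breadth-first growth over the radius-neighbourhood graph
        queue, members = [seed], [seed]
        labels[seed] = current
        while queue:
            idx = queue.pop()
            for nb in tree.query_ball_point(points[idx], r=dist_thresh):
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
                    members.append(nb)
        if len(members) < min_size:   # discard tiny residual clusters as noise
            labels[np.array(members)] = -2
        else:
            current += 1
    return labels
```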
4.2. Saliency Ranking
Multiple point clouds obtained as a result of clustering
are evaluated for uniqueness in two aspects: 1) Variance
of curvature on the object’s surface, 2) Shape of silhou-
ette formed from the object. These properties are cap-
tured together in one measure, defined using the differ-
ence between geodesic and Euclidean distances between
all pairs of points in the object point cloud. The geodesic
distance between any two points of a cloud is the length
of the shortest curve on the surface, connecting these
points. Due to the embedding of the curve on the surface,
in Euclidean space the geodesic distance between two points on a surface of non-zero curvature is always greater than
or equal to their Euclidean distance. Additionally, sur-
faces with high amount of variation in the curvature of
their boundary/silhouette may also have their geodesic
distance between any two points on their boundary greater than the corresponding Euclidean distance.

Figure 2. Clusters obtained after removal of planar regions, and performing Euclidean Clustering on the residual cloud.

Figure 3
illustrates these facts by means of two simple examples.
First, a half sphere is presented with the geodesic distance (green curve) and the Euclidean distance between two points on the surface of the half sphere. The greater value of the geodesic distance is evident from the figure. Secondly,
a curved silhouette of an arbitrary object is presented.
Again, the geodesic distance between two points on the
boundary is greater than their corresponding Euclidean
distance.
Figure 3 conveys that for any object with a curved silhouette and higher curvature on the surface, the sum of geodesic distances between all pairs of points would be high. More precisely, the difference between geodesic and Euclidean distances between all pairs of points on the surface
can be used to identify objects with higher curvature and
complex shape of silhouette. Exploiting the properties of
geodesic distance, we formulate a saliency measure that
captures variation in the curvature as well as the silhou-
ette of the object under study. Consider an object point
cloud $C_k$ comprising a total of $n_k$ points; for each point $p_i$ in $C_k$ we define the following:

$$ G_{ij}^{k} = g\left(p_i, p_j\right), \quad i \neq j,\; i,j \in C_k $$
$$ E_{ij}^{k} = \left\| p_i - p_j \right\|, \quad i \neq j,\; i,j \in C_k $$
$$ P_{ij}^{k} = \left( G_{ij}^{k} - E_{ij}^{k} \right)^{2} $$
$$ V_k = \frac{1}{n_k - 1} \sum_{i \neq j} \left( P_{ij}^{k} - \bar{P}^{k} \right)^{2} $$
$$ S_k = \frac{V_k}{\sum_{k} V_k} \qquad (3) $$
$G_{ij}^{k}$ denotes the geodesic distance from point $p_i$ to $p_j$ in the $k$-th object point cloud, and $E_{ij}^{k}$ denotes the corresponding Euclidean distance between the same points.
Figure 3. Comparison of geodesic distance (dashed line) and
Euclidean distance (solid line) in presence of curvature (left),
on a curved silhouette (right). Note that in both cases, the
geodesic distances are higher than Euclidean distances.
Since the geodesic distances are always greater than or equal to the Euclidean distances for surfaces with higher curvature, any object having higher variability in the difference between $G_{ij}^{k}$ and $E_{ij}^{k}$ will stand out from its surroundings. This variance is captured by $V_k$ and, finally, the geodesic saliency $S_k$ is defined as the normalized value of $V_k$ (normalized over all clusters). The graph in Figure 4 presents the quantity $S_k$ normalized by the
size of the cluster. This normalization factor also ensures
that the value of saliency does not depend on the size and
number of points contained in the point cloud of the ob-
ject. The graph illustrates the variation in the values of
the proposed saliency measure against changes in dis-
tance. There are four different objects present in the input
cloud, namely a toy bear, a Nao humanoid robot, a basket-
ball, and a flat box object. It can be noticed that as the
distance increases, the value of saliency reduces. This
happens due to the addition of noise. Moving away from
objects, the curvature is less observable. This particular
noise addition is sensor dependent, and the current experiments report results obtained using the Microsoft Kinect RGB-D sensor.
Geodesic distances between all pairs of points in each
object point cloud are computed using the Floyd-Warshall
algorithm [21]. The point cloud of each object is con-
verted into a fully connected graph, with each point
treated as a vertex. The algorithm compares distance-minimizing paths between pairs of vertices in the given connected graph and iteratively improves the estimate of the geodesic distance between them. A
more detailed explanation of the algorithm can be found
in [21].
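The sketch below combines the geodesic computation with the saliency measure of Equation (3). It approximates each object's surface by a k-nearest-neighbour graph (an assumption: with a fully connected graph whose edge weights are direct Euclidean distances, geodesic and Euclidean distances would coincide), uses SciPy's Floyd-Warshall implementation, and assumes each cluster's graph is connected.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_saliency(clusters, k=8):
    """For each cluster (an (n, 3) array): build a k-NN surface graph, compute
    all-pairs geodesic distances with Floyd-Warshall [21], and evaluate the
    variance-based measure V_k of Equation (3). Returns S_k over all clusters."""
    V = []
    for pts in clusters:
        n = len(pts)
        # Pairwise Euclidean distances E_ij
        E = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

        # Sparse k-NN graph whose edge weights are Euclidean distances
        tree = cKDTree(pts)
        d, idx = tree.query(pts, k=k + 1)   # first neighbour is the point itself
        rows = np.repeat(np.arange(n), k)
        graph = csr_matrix((d[:, 1:].ravel(), (rows, idx[:, 1:].ravel())),
                           shape=(n, n))

        # All-pairs geodesic distances G_ij (assumes the k-NN graph is connected)
        G = shortest_path(graph, method='FW', directed=False)

        # Equation (3): squared differences P_ij and their variance V_k
        mask = ~np.eye(n, dtype=bool)
        P = (G[mask] - E[mask]) ** 2
        V.append(np.sum((P - P.mean()) ** 2) / (n - 1))

    V = np.asarray(V)
    return V / V.sum()   # S_k: saliency normalized over all clusters
```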
Figure 4. Values of the quantity $\sum_{i \neq j} \left( G_{ij}^{k} - E_{ij}^{k} \right)$, normalized over the size of the cloud. Note that as the distance increases, the saliency values of ball and box converge. This is due to increasing noise in the calculation of curvature with increase in distance.
5. Experimental Evaluation
In order to evaluate our approach we captured point
clouds from different view-angles in the laboratory. Four
objects of varying shapes and curvature were placed on
the floor. 3D point clouds were recorded with a Microsoft Kinect RGB-D sensor [18] that was mounted on a tripod. The viewpoint was varied from
-25 to 25 degrees, in steps of 5 degrees. The distance was
varied between 1.5 m and 2 m, in steps of 0.1 m. The input
clouds were processed on a Dell workstation equipped
with an Intel Xeon® 3.40 GHz processor and 16 GB of
RAM.
5.1. Performance Measures
We utilized the existing measures of repeatability and
overlap, as described in [7], to evaluate our approach. These measures are known to evaluate the qualitative and quantitative performance of a saliency extraction method.
The repeatability of detection of salient regions is de-
fined as the frequency with which the same cluster is ranked with a similar level of saliency. This measures the
stability of the approach with variations in distance and
viewpoint changes. The overlap rate, on the other hand, is
calculated by comparing the location of salient regions
found with variations in distance and viewpoint. If the
salient regions belong to the same location, the overlap is incremented, and vice versa. The location of a salient region was
calculated as the centroid of the point cloud cluster. Since the position of the sensor was changed, these centroids
were transformed from the local frame of reference into
the global frame of reference of the environment. Heatmaps are used to graphically represent this performance
measure. The heatmap used consists of cells, with each
cell representing the value of repeatability/overlap
(scaled between 0 and 1). The rows of the heatmap rep-
resent the distance of evaluation, and the columns repre-
sent the angle in degrees. Altogether, the presented heatmaps are a visual representation of the robustness of the proposed method.
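As the text describes these measures only informally, the following sketch illustrates one way they could be tallied for a single object; the reference rank, reference centroid, and the 0.15 m location tolerance are illustrative assumptions rather than values taken from the paper or from [7].

```python
import numpy as np

def repeatability_and_overlap(observations, ref_rank, ref_centroid, dist_tol=0.15):
    """`observations` is a list of (saliency_rank, centroid_in_world_frame) pairs
    for one object, one entry per (distance, angle) setting, with centroids
    already transformed into the global frame of the environment."""
    ranks = np.array([r for r, _ in observations])
    cents = np.array([c for _, c in observations])

    # Repeatability: fraction of views in which the object keeps (roughly) the same rank
    repeatability = np.mean(np.abs(ranks - ref_rank) <= 1)

    # Overlap: fraction of views in which the detected region lies at the same location
    overlap = np.mean(np.linalg.norm(cents - ref_centroid, axis=1) < dist_tol)
    return repeatability, overlap
```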
5.2. Discussion
Figure 5 shows the resultant heat map reflecting the
overlap and repeatability of salient object detection. The
values of repeatability and overlap are scaled (between 0
and 1) to provide an accurate account of the performance.
The figure demonstrates that the repeatability of the toy bear is higher than that of the robot. This is due to the complexity of the silhouette of the toy bear. The reason behind the higher saliency values of the toy bear is depicted in Figure 3. Additionally, Figure 4 conforms to the results in Figure 5, where the toy bear attained the highest values of saliency.
Figure 5. Bear (top), Ball (row 2), Robot (row 3), and Box
(bottom) performance (left: overlap and right: repeatability)
with respect to view angle and distance. This performance measure
was adapted from [7]. We can see that the proposed method
is robust to viewpoint and scale changes.
The humanoid robot, having a highly curved surface, follows next in the saliency ranking. Despite being smaller than the flat box, it has higher values of saliency. Finally, the robustness is demonstrated by the high
values of repeatability, which in most cases range between 0.7 and 1. It should be noted that the lower values of overlap and repeatability in the case of the bear and the ball are due to the restricted exposure of these objects with changes in the angle of the sensor. Beyond 10 degrees, the bear was not completely visible in the Field of View (FOV) of the sensor. Similarly, the ball was not visible in the FOV when the angle of the sensor was below 0 degrees. Apart from the missing values, all other observa-
tions presented high values of the two measures, which
are most desirable characteristics of salient region ex-
traction methods [7].
6. Conclusions
In this paper, we present a top-down approach for ex-
tracting salient regions/objects from indoor environments.
Our method segregates significant planar regions, and
extracts isolated objects present in the residual point
cloud. Each object is then ranked for saliency based on its curvature and the complexity of its silhouette. These
properties are captured together using the proposed geo-
desic distance measure (Figure 4). The paper has reported initial experiments and demonstrates the capacity of the method to identify objects/regions of higher curvature. Further, testing with variations in viewpoint and distance reveals the stability of the proposed saliency criterion.
These initial experiments demonstrate the advantages
of adapting top-down clustering for the purpose of sali-
ency ranking. A possible limitation of the method could
be identified as the lack of RGB information to support
the selection of salient regions, and future developments
of this research aim to include variations in color for sa-
liency computation.
REFERENCES
[1] T. N. Vikram, M. Tscherepanow and B. Wrede, “A Sali-
ency Map Based on Sampling an Image Into Random
Rectangular Regions of Interest,” Pattern Recognition,
Vol. 45, No. 9, Sep. 2012, pp. 3114-3124.
doi:10.1016/j.patcog.2012.02.009
[2] S. Frintrop, E. Rome and H. I. Christensen, “Computa-
tional Visual Attention Systems and Their Cognitive
Foundations,” ACM Transactions on Applied Perception,
Vol. 7, No. 1, Jan. 2010, pp. 1-39.
doi:10.1145/1658349.1658355
[3] N. Riche, M. Mancas, B. Gosselin and T. Dutoit, “3D
Saliency for Abnormal Motion Selection: The Role of the
Depth Map,” in J. Crowley, B. Draper, and M. Thonnat,
Eds. Computer Vision Systems, Springer Berlin/Heidel-
berg, 2011, pp. 143-152.
doi:10.1007/978-3-642-23968-7_15
[4] O. Akman and P. Jonker, “Computing Saliency Map from
Spatial Information in Point Cloud Data,” Advanced
Concepts for Intelligent Vision Systems, Vol. 6474, 2010,
pp. 290-299. doi:10.1007/978-3-642-17688-3_28
[5] D. Simon, “Fast and Accurate Shape-Based Registration,”
PhD thesis, Robotics Institute, Carnegie Mellon Univer-
sity, Pittsburgh, PA, 1996.
[6] D. Cole and A. Harrison, “Using Naturally Salient Re-
gions for SLAM with 3D Laser Data,” In Proceedings of
International Conference on Robotics and Automation,
Workshop on SLAM, 2005
[7] J. Stückler and S. Behnke, “Interest Point Detection in
Depth Images Through Scale-Space Surface Analysis,” In
2011 IEEE International Conference on Robotics and
Automation (ICRA), 2011, pp. 3568-3574.
doi:10.1109/ICRA.2011.5980474
[8] C. Koch and S. Ullman, “Shifts in Selective Visual Atten-
tion: Towards the Underlying Neural Circuitry,” Human
Neurobiology, Springer-Verlag, Vol. 4, No. 4, 1985, pp.
219-227.
[9] L. Itti, C. Koch and E. Niebur, “A Model of Saliency
Based Visual Attention for Rapid Scene Analysis,” IEEE
Transactions on Pattern Analysis and Machine Intelli-
gence, Vol. 20, No. 11, 1998, pp. 1254-1259.
doi:10.1109/34.730558
[10] C. E. Connor, H. E. Egeth and S. Yantis, “Visual Atten-
tion: Bottom-Up Versus Top-Down,” Current Biology,
Vol. 14, No. 19, 2004, pp. R850-R852.
doi:10.1016/j.cub.2004.09.041
[11] M. Begum and F. Karray, “Visual Attention for Robotic
Cognition: A Survey,” IEEE Transactions on Autonomous
Mental Development, Vol. 3, No. 1, 2011, pp. 92-105.
doi:10.1109/TAMD.2010.2096505
[12] B. Steder and R. Rusu, “NARF: 3D Range Image Fea-
tures for Object Recognition”, In Workshop on Defining
and Solving Realistic Perception Problems in Personal
Robotics at the IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2010.
[13] R. Unnikrishnan and M. Hebert, “Multi-Scale Interest
Regions From Unorganized Point Clouds”, In IEEE
Computer Society Conference, Computer Vision and Pat-
tern Recognition Workshops, 2008, pp. 1-8.
[14] A. Flint, A. Dick and A. v. d. Hengel, “Thrift: Local 3D
Structure Recognition,” In Society on Digital Image
Computing Techniques and Applications, 9th Biennial
Conference of the Australian Pattern Recognition, 2007,
pp. 182-188.
[15] E. Potapova, M. Zillich and M. Vincze, “Calculation of
Attention Points Using 3D Cues,” 35th Annual Workshop
of the Austrian Association for Pattern Recognition
(OAGM/AAPR), May 2011.
[16] R. B. Rusu, "Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments," PhD Thesis, Computer Science Department, Technische Universität München, Oct. 2009.
[17] D. Holz, S. Holzer and R. Rusu, “Real-Time Plane Seg-
mentation Using RGB-D Cameras,” In Proceedings of the
15th RoboCup International Symposium, Istanbul, Tur-
key, 2011, pp. 306-317.
[18] Microsoft Kinect sensor, http://www.xbox.com/en-au/kinect?xr=shellnav, 2013.
[19] E. Fix and J. L. Hodges, “Discriminatory analysis. Non-
parametric discrimination: Consistency properties,” In-
ternational Statistical Review/Revue Internationale de
Statistique, 1989, pp. 238-247.
[20] R. Rusu and S. Cousins, “3D Is Here: Point Cloud Library
(PCL),” In IEEE International Conference on Robotics
and Automation (ICRA), 2011, pp. 1-4.
[21] R. W. Floyd, “Algorithm 97: Shortest path”, Communica-
tions of the ACM, Vol. 5, No. 6, Jun. 1962, p. 2.
doi:10.1145/367766.368168