Journal of Intelligent Learning Systems and Applications, 2011, 3, 45-54
doi:10.4236/jilsa.2011.31006 Published Online February 2011 (http://www.SciRP.org/journal/jilsa)

Apply GPCA to Motion Segmentation

Hongchuan Yu, Jian Jun Zhang
National Centre for Computer Animation, The Media School, Bournemouth University, Poole, UK.
Email: {hyu, jzhang}@bournemouth.ac.uk

Received October 14th, 2010; revised November 25th, 2010; accepted December 7th, 2010

ABSTRACT

In this paper, we present a motion segmentation approach based on a subspace segmentation technique, the generalized PCA. By incorporating cues from the neighborhood of the intensity edges of images, motion segmentation is solved under an algebraic framework. Our main contribution is a post-processing procedure which detects the boundaries of motion layers and further determines the layer ordering. Test results on real imagery confirm the validity of our method.

Keywords: Generalized PCA, Motion Segmentation, Layer Ordering

1. Introduction

An important problem in computer vision is to segment the moving objects of a scene from a video source and partly recover their structure or motion information, such as foreground and background. With widespread demands on video processing, motion segmentation has found many direct applications. Video surveillance systems seek to automatically identify people, objects, or activities of interest in a variety of environments with a set of stationary cameras; motion segmentation can provide low-level motion detection and region tracking cues. Another relatively new application is markerless motion capture for computer animation, which aims to estimate the human body configuration and pose in the real world from a video by locating the joint positions over time and extracting the articulated structure.

Motion segmentation is expected to partly recover the structure and motion information of moving objects from a mutually occluded scene. This includes the following main tasks: (1) labeling the regions of a motion layer segmentation, i.e. assigning pixels to several motion layers; (2) finding their motion, e.g. each layer has its own smooth flow field while discontinuities occur between layers; (3) determining the layer ordering, as the different layers might occlude each other. Motion segmentation is, however, not equivalent to object tracking. Roughly speaking, object tracking follows the segmented objects over an image sequence, although the extension of the rigidity constraint to multiple frames is nontrivial. Motion segmentation aims at the motion layers of a scene rather than the moving objects. For example, if a moving object contains multiple motions at a given moment, it may be divided into several motion layers; when these motion layers share the same motion, they can be merged into a single layer. Hence, motion segmentation usually uses the information from a few successive frames, whereas object tracking focuses on a moving object in a scene and utilizes the information from a whole image sequence. Motion segmentation thus plays the role of a fundamental module in motion analysis and tracking. [1] presented a subspace segmentation method to estimate the motion models of the motion layers based on two successive frames. Built on this subspace segmentation method, this paper will further address two other basic problems of motion segmentation, i.e.
the detection of motion layer boundaries and depth ordering based on two successive frames. The basic idea is to refine a global segmentation to solve these two problems. We first address the subspace segmentation approach for motion model estimation. We then incorporate it with the intensity edge information into a post-processing procedure, which refines the layer boundaries and infers the layer order between two successive frames. These two procedures form a complete algorithm for motion segmentation. Our specific contributions in this paper are: 1) the Polysegment algorithm (a special case of the generalized PCA [2]) is employed to detect the layer boundaries in our post-processing procedure, and 2) the cues from the intensity edges of images are utilized in the detection of the layer boundaries and depth ordering.

Previous Works

Although motion segmentation has long been an active area of research, many issues remain open in computer vision, such as layered motion descriptions [3,4], occlusion detection and depth ordering [5-7], and the estimation of multiple motion models [8,9].

Most popular approaches to motion segmentation revolve around parsing the optical flow field of an image sequence. Because of the well-known aperture problem, the motion vector from optical flow computation can only be determined in the direction of the local intensity gradient. For the sake of completeness of the optical flow field, the motion is assumed to be locally smooth. Obviously, depth discontinuities and multiple independently moving objects usually result in discontinuities of the optical flow. The usual approach is to parameterize the optical flow field and fit a different model (e.g. a 2D affine model) to each moving object, such as the layered representation of the motion field [3]. The challenges of the optical flow-based techniques involve identifying motion layers (or pixel grouping), detecting layer boundaries, and depth ordering. Previous research can mostly be grouped into two categories. The first category determines all of the motion models simultaneously. This can be achieved by parameterising the motions and the segmentation, and using sophisticated statistical techniques to predict the most probable solution. For example, Smith et al. [6] presented a layered motion segmentation approach under a Bayesian framework by tracking edges between frames. In their implementation, the region edge labels were not directly applied to the Bayesian model; they were implicitly determined by the foreground-background orders of the motion layers and the motion layer labels of each region. Kumar et al. [10] presented a learning approach for a generative layered representation of a scene for motion segmentation. To obtain the initial model estimates, they utilized loopy belief propagation, and further refined the initial estimate by using the αβ-swap and α-expansion algorithms. The large number of undetermined parameters in such Bayesian models leads to a difficult tracking problem in a high-dimensional parameter space. The second category is the dominant motion approach [11-13]. A single motion is first fitted to all pixels, and the pixels that agree with that motion are then identified. This process can be repeated recursively on the outlier pixels to provide a full set of layers [12].
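As a concrete illustration of this fit-and-recurse scheme (a minimal sketch only, not the implementation of [11-13]; the thresholds `tol` and `min_pixels` are illustrative assumptions), a NumPy version might look like:

```python
import numpy as np

def fit_affine(x1, x2, u, v):
    """Least-squares 2D affine flow model:
    u = a11*x1 + a12*x2 + a13,  v = a21*x1 + a22*x2 + a23."""
    A = np.stack([x1, x2, np.ones_like(x1)], axis=1)      # N x 3 design matrix
    au, *_ = np.linalg.lstsq(A, u, rcond=None)            # (a11, a12, a13)
    av, *_ = np.linalg.lstsq(A, v, rcond=None)            # (a21, a22, a23)
    return au, av, A

def dominant_motion_layers(x1, x2, u, v, tol=0.5, min_pixels=50):
    """Recursively peel off dominant-motion layers; returns a list of index arrays."""
    idx = np.arange(len(x1))
    layers = []
    while len(idx) >= min_pixels:
        au, av, A = fit_affine(x1[idx], x2[idx], u[idx], v[idx])
        res = np.hypot(A @ au - u[idx], A @ av - v[idx])  # per-pixel flow residual
        inliers = res < tol
        if inliers.sum() < min_pixels:
            break
        layers.append(idx[inliers])
        idx = idx[~inliers]                               # recurse on the outliers
    return layers
```

The cited methods use more elaborate robust estimators than plain least squares; the sketch only shows the recursive structure.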
The central problem faced by this kind of approach is that it is extremely difficult to determine the occluded edges of the moving regions (or motion layers). Furthermore, this problem can result in the failure of depth ordering of the motion layers. However, analytically reasoning about such complex cases is impractical. The main reasons are threefold. First, the smoothing required by the optical flow algorithms makes it difficult to localize the layer boundaries. Second, the optical flow field is usually parameterized by some 2D motion model (e.g. 2D affine), which is a first-order approximation of the perspective model; it is unreliable to apply a 2D model to the boundaries of moving regions. Third, pixels in a neighborhood of the boundaries lie in areas of high intensity gradient, where slight errors or image noise can result in pixels of a very different intensity, even under the correct motion estimate [6]. In this paper, we simplify the problem of motion segmentation based on an algebraic framework: we first obtain a rough global segmentation and then refine it afterwards.

Our work is partially inspired by the subspace segmentation approach to motion model estimation proposed in [1]. This approach can provide a non-iterative and global estimation of the motion layer segmentation. But it is incomplete, since the depth ordering and the detection of layer boundaries are ignored. In this paper we provide a complete solution by developing a novel post-processing procedure that uses the intensity structures of edges for the detection of (1) the motion layer boundaries and (2) the layer order.

In the remainder of this paper, we first briefly review the subspace segmentation approach to motion model estimation [1] in section 2. In section 3, a post-processing procedure is presented for the detection of the layer boundaries and depth ordering. The experimental results and analysis are given in section 4. Our conclusion and future work are given in section 5.

2. Motion Segmentation by GPCA-PDA

The core of our proposed motion segmentation approach is the scheme of segmenting hyperplanes in $\mathbb{R}^K$, which is called the generalized PCA (GPCA) in [2]. Applying the GPCA method to motion model estimation was proposed in [1]. But the resulting motion model estimation can only yield a coarse motion segmentation, i.e. the boundaries of the motion layers are very blurry. Our basic idea is to further refine the boundaries of the resulting motion layers by a post-processing procedure. Before introducing our post-processing procedure, we first briefly review the motion model estimation approach in [1]. The two algorithms used, the GPCA-PDA Alg. and the Polysegment Alg., can be found in [2]. (We also briefly introduce these two algorithms in the Appendix.)

The first problem in motion segmentation is to obtain the layered motion models corresponding to the independently moving regions in a scene (i.e. layer segmentation). We address an algebraic approach in terms of a known optical flow field, which has been presented in [1]. Its distinct advantage over the other approaches is that it can determine all motion layers simultaneously. Given $N$ measurements of the optical flow $\{(u_i, v_i)\}_{i=1}^{N}$ at the $N$ pixels $\{(x_{1i}, x_{2i})\}_{i=1}^{N}$, we can describe them through an affine motion as follows,

$a_{11}x_1 + a_{12}x_2 + a_{13} - u = 0,$
$a_{21}x_1 + a_{22}x_2 + a_{23} - v = 0.$
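Before moving to the hyperplane formulation below, a minimal sketch of how such measurements are stacked into data vectors (illustrative NumPy code; the coordinate convention is an assumption):

```python
import numpy as np

def embed_flow(u, v):
    """Stack each pixel's position and flow into x = (x1, x2, 1, u, v) in R^5.

    u, v : H x W optical flow components (assumed given, e.g. from an
           external optical flow routine such as [15]).
    Returns an N x 5 data matrix, one row per pixel.
    """
    H, W = u.shape
    x2, x1 = np.mgrid[0:H, 0:W]        # x1 = column index, x2 = row index (a convention assumption)
    X = np.stack([x1.ravel(), x2.ravel(),
                  np.ones(H * W),
                  u.ravel(), v.ravel()], axis=1)
    return X
```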
In terms of the hyperplane representation in the Appendix, the solution to the multiple independent affine models can be rephrased as follows. Let $x = (x_1, x_2, 1, u, v)^T \in \mathbb{R}^5$ and let the hyperplane $S_i$ be defined by the basis $b_1 = (a_{11}, a_{12}, a_{13}, a_{14}, 0)^T$ and $b_2 = (a_{21}, a_{22}, a_{23}, 0, a_{24})^T$. We need to segment a mixture of hyperplanes of dimension $d = 3$ in $\mathbb{R}^5$, expressed as,

$S_i = \{x \in \mathbb{R}^5 : (b_1, b_2)^T x = 0\}.$

The original equations of the optical flow already accomplish the projection from $x \in \mathbb{R}^5$ to two individual subspaces of $\mathbb{R}^4$ in a natural way, i.e. each new hyperplane in $\mathbb{R}^4$ can be expressed as,

$(a_{11}, a_{12}, a_{13}, a_{14})(x_1, x_2, x_3, x_4)^T = 0.$

Applying the scheme of Equations (A1)-(A4) in the Appendix then yields the desired basis $B_i = (b_1^{(i)}, b_2^{(i)})$ for each hyperplane $S_i$.

At this point, one can obtain an initial estimate of all of the motion layers simultaneously. This is insufficient for motion segmentation, since we also need to determine the layer boundaries and the occlusion relationships. Besides that, it can be observed that each segmented layer contains some small and isolated spurious regions, and the resulting layer boundaries wander around the real ones. This makes the detection of the layer boundaries difficult. The occluded regions lie in the neighborhood of the layer boundaries. If the occluding edges can be determined correctly, the occluded regions can be segmented correctly. Furthermore, the resulting motion layers can also be linked to the occluded regions through the occluding edges for the depth ordering. Hence, determining the occluding edges is a crucial step. Our development is based on the following observations: (1) the intensity edges include the boundaries of the motion layers; (2) the layer boundaries are not always occluding edges; (3) determining the occluding edges and inferring the occlusion relationship can be fulfilled by testing the neighborhood of edges.
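To make the output of this section concrete, the following minimal sketch assigns each embedded flow vector to its nearest hyperplane, in the spirit of Equation (A4) (illustrative NumPy code; `X` is the $N \times 5$ data matrix from the embedding above and `bases` a list of recovered $5 \times 2$ matrices $B_i$, both assumed given):

```python
import numpy as np

def assign_layers(X, bases):
    """Label each embedded flow vector with its nearest affine-motion hyperplane.

    X     : N x 5 array of vectors x = (x1, x2, 1, u, v).
    bases : list of 5 x 2 arrays B_i = (b1, b2), the normals defining each S_i.
    Returns an N-vector of layer labels in {0, ..., n-1}.
    """
    # Residual of x w.r.t. S_i is ||B_i^T x||; pick the smallest per pixel.
    residuals = np.stack([np.linalg.norm(X @ B, axis=1) for B in bases], axis=1)
    return np.argmin(residuals, axis=1)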
We will introduce the intensity edges of images into the potential occlusion areas for the detection of occluding edges in the next section.

3. Post-Processing Procedure

Let us consider a single viewpoint. The central problem is to detect the occluding edges, because erroneous edge labeling can cause incorrect depth ordering. Most of the techniques considered so far employ only the motion field information for motion segmentation. For each frame, all edges, including the edges of motion layers and the textured edges of objects, are present in the image intensity structure, which can provide a wealth of additional information for motion estimation. Due to their extreme length, a number of measurements can be taken along (or around) them, which leads to a more accurate estimation of motion.

Recent applications have motivated a renewal of motion segmentation by tracking edges [6,14]. Ogale et al. [7] classified the occlusions into three classes. In order to deduce the ordinal depth, they had to fill the occluded regions; this implicitly approximates the occluding edges by filling the neighborhood of the layer boundaries. [6] provides three fundamental assumptions on the relationship between regions and edges to identify the edges of moving regions. We add an extra assumption (the 4th below) and emphasize these four assumptions as follows.

1) As an object moves, all of the edges associated with that object move, with a motion which may be approximately described by some motion model.

2) The motions are layered, i.e. one motion takes place completely in front of another, and the layers are strictly ordered. Typically the layer farthest from the camera is referred to as the background, with nearer foreground layers in front of it.

3) An arbitrarily segmented image region only belongs to one motion model, and hence any occluding boundary is visible as a region edge in the image.

4) For each frame, the intensity edges include the edges of the motion layers.

An important conclusion from these four assumptions is that the layer ordering can be uniquely determined if the layer of each moving region is known and the occluding edges are known. [7] presented the relationship between motion layers and occluded regions, and further emphasized that a motion layer containing an occluded region must be behind another one. Even when the layers of the motion regions are known, ambiguities may still be present in the layer boundary labeling, as shown in Figure 1. In Figure 1(a), due to the occluded region C, we can infer the occlusion relationship between the motion regions A and B, while in Figure 1(b) we cannot find the layer order from the distinct edges of the motion layers. The layer boundaries are not the same as the occluding edges: the layer boundaries include the occluding edges, but they are not always occluding edges. It is infeasible to infer the layer order from the layer boundaries alone.

Figure 1. Illustration of moving regions (A, B) and occluded region (C). (a) The probable layer boundaries are determined by extending the moving region to the occluded region; (b) there is no occlusion region between layers A and B.

We can therefore conclude that the occlusion relationship hides behind the occluded regions, and identifying the occluding edges can reveal the occlusion relationship. The optical flow computation can usually identify the coarse occlusion regions as a by-product [15], which will be adopted in this paper.

The subspace segmentation approach described in section 2 is carried out on a given optical flow field instead of the image intensities. Due to the errors from the optical flow field (e.g. the aperture problem), each resulting motion layer contains two kinds of artifacts: (1) small isolated regions with texture and (2) dark holes over the image plane. It can be observed that a single hole in the middle of a foreground layer runs through to the background layer. Similar problems also exist in the occluded regions. Moreover, the resulting boundaries of the motion layers and their neighborhoods are, in general, highly unreliable areas. Therefore, the segmentation by the subspace segmentation method and the occluded regions detected by the optical flow computation cannot by themselves offer a valid solution to the above two problems.

Consider the neighborhood of the layer boundaries. It can be observed that the occluded regions are contained in the neighborhood of the layer boundaries, as shown in Figure 1(a). The edges' neighborhood contains the rich intensity structures of the image. This can provide sufficient cues to find the layer boundaries and the occluding edges. We rephrase the problem of layer edge detection and depth ordering, and present our post-processing procedure as follows.

The motion models of the layers are determined by the subspace segmentation approach described in section 2, while the layer boundaries and the layers of the occluded regions remain undetermined. The problem we face here is how to determine the layer boundaries and infer the occlusion relationships. In order to do that, we will consider the intensity structures of each frame, the relevant occlusion region map (obtained by [15]) and the relevant boundary map of the initial motion layers (obtained by the subspace segmentation approach). We hereafter denote the intensity edge map by $M_I$, the occlusion region map by $M_O$ and the layer edge map by $M_L$.
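For concreteness, a minimal sketch of how these three maps might be assembled (illustrative Python/OpenCV code; the Canny thresholds and the sources of `occlusion_mask` and `labels` are assumptions, since the paper obtains them from [15] and from section 2 respectively):

```python
import numpy as np
import cv2

def build_maps(frame_gray, occlusion_mask, labels):
    """Assemble the three binary maps used by the post-processing procedure.

    frame_gray     : H x W uint8 image (current frame).
    occlusion_mask : H x W bool mask of coarse occlusion regions (by-product of [15]).
    labels         : H x W integer motion-layer labels from the subspace segmentation.
    """
    M_I = cv2.Canny(frame_gray, 50, 150) > 0   # intensity edge map (thresholds illustrative)
    M_O = occlusion_mask.astype(bool)          # occlusion region map
    # Layer edge map: pixels whose label differs from a right or down neighbor.
    M_L = np.zeros_like(M_O)
    M_L[:-1, :] |= labels[:-1, :] != labels[1:, :]
    M_L[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    return M_I, M_O, M_L
```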
The motion of intensity edges dominates that of their neighborhoods. It is therefore straightforward to utilize the intensity structures of the neighborhood of the edges for detecting the layer boundaries and inferring the occlusion relationship. The post-processing procedure given below is performed over two successive frames, but evidence could be accumulated over an image sequence for a more robust segmentation.

3.1. Construct Pending Areas

For each frame, we first determine some pending areas, which should contain all potential layer boundaries. The detection of layer boundaries is then carried out on the resulting pending areas. To this end, we place a set of windows $w$ of size $n \times n$ along the edges of $M_L$. These small windows may overlap each other. Usually each window $w_i$ is determined by $M_O$ and $M_L$ without a fixed size, i.e. it is expected to be large enough that the resulting set of windows covers the occlusion regions $M_O$ and the layer edge map $M_L$ on the current frame. In our experiments, the minimal size $n$ of $w_i$ is set to 10 pixels.

3.2. Match Scores

Consider the resulting pending areas $W = \bigcup_i w_i$, which contain many intensity edges $l \in M_I$. The potential layer boundaries are contained in $M_I$ according to assumption 4). Thus, for each window $w_i$, we can compute the profile of every point $p$, defined as a vector $pf(p)$ obtained by sampling the intensity derivative in the positive and negative directions of the intensity gradient at $p$. This is illustrated in Figure 2. The point profile is then normalized as,

$pf(p) \leftarrow \frac{pf(p)}{\sum_j pf_j(p)}. \quad (1)$

According to the optical flow field, one can get a pair of corresponding points $p$ and $p'$ on two successive frames.

Figure 2. Illustration of point profiles and refining the matching point. The refining procedure is carried out as a 1D search along the normal line on frame 2.

The match score is taken as the residual error of their profiles as follows,

$e_0(p) = \exp\left(-\frac{\sum_i \left(pf_i(p) - pf_i(p')\right)^2}{2\sigma^2}\right),$

where $\sigma$ is a reference distance and is determined empirically. When the point $p$ is far away from layer boundaries, $e_0(p)$ should approach one, i.e. the neighborhood of $p$ obeys a single motion; otherwise its neighborhood contains multiple motions. Furthermore, we can obtain two more match scores along either side profile of the current point $p$, denoted $e_1(p)$ and $e_2(p)$. If point $p$ belongs to a layer boundary, one of these two scores should approach one while the other approaches zero; otherwise both of them should approach one.
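A minimal sketch of the profile extraction and the match score $e_0$ (illustrative NumPy/SciPy code; the sampling step, profile length and $\sigma$ are assumptions, `grad_dir` is assumed to be the unit intensity-gradient direction at $p$, and `img` may be a derivative image per the profile definition above):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def point_profile(img, p, grad_dir, half_len=5, step=1.0):
    """Sample the profile at p along +/- the intensity gradient direction.

    img      : H x W float image (or derivative image).
    p        : (row, col) point.
    grad_dir : unit 2-vector (drow, dcol) of the intensity gradient at p.
    """
    t = np.arange(-half_len, half_len + 1) * step
    rows = p[0] + t * grad_dir[0]
    cols = p[1] + t * grad_dir[1]
    pf = map_coordinates(img, [rows, cols], order=1)   # bilinear sampling
    # Normalization in the spirit of Eq. (1); abs used here for numerical safety.
    return pf / np.sum(np.abs(pf))

def match_score(pf_p, pf_q, sigma=0.1):
    """Residual-error match score e0 between two normalized profiles."""
    return np.exp(-np.sum((pf_p - pf_q) ** 2) / (2.0 * sigma ** 2))
```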
3.3. Matching

Because of the aperture problem, the motion of edges can only be determined in the direction normal to the edge. This means that the corresponding point $p'$ on the next frame $(i+1)$ lies on the normal line, i.e. the normal at point $p$ on the current frame $(i)$. This is useful, as it restricts the matching to the isointensity contour on frame $(i+1)$ along the edge normal. In order to enhance the intensity edge matching, we add a new match score, the residual error $e_0'(p)$ of the profiles of $p$ and $p'$ along the edge tangent line, as shown in Figure 2. Refining the matching point $p'$ on the next frame $(i+1)$ is thus implemented as a 1D search based on the match score $e_0(p)e_0'(p)$ along the direction of point $p$'s gradient (i.e. the normal line) instead of point $p'$'s gradient, which is also illustrated in Figure 2. After that, one can re-compute the match scores $e_0, e_1, e_2$ of the points $p$ and $p'$ in terms of their individual intensity gradients rather than the normal line.

3.4. Segmentation by Polysegment Alg.

Based on the match scores $e_0, e_1, e_2$ in the pending areas $W$, we apply the Polysegment algorithm described in the Appendix to each of the match scores $e_0, e_1, e_2$ for the layer edge detection. There are two groups here: the group of layer edge points and the group of non-layer edge points. For each match score, we thus get two cluster centers $\mu_1^{(i)}, \mu_2^{(i)}$, $i = 0, 1, 2$; over the three scores, this gives eight combined cluster centers for the points $p \in W$. The layer edge points should cluster around the two centers $cent_1 = (\mu_{\min}^{(0)}, \mu_{\min}^{(1)}, \mu_{\max}^{(2)})$ and $cent_2 = (\mu_{\min}^{(0)}, \mu_{\max}^{(1)}, \mu_{\min}^{(2)})$. The segmentation of $W$ is obtained as,

$i^* = \arg\min_{j=1,\dots,8} \left\| e(p) - cent_j \right\|^2, \quad (2)$

where $e(p) = (e_0, e_1, e_2)(p)$. On this basis, the points of $W$ can be classified into two groups, layer edge points and non-layer edge points.

3.5. Region Merging

For the group of non-layer edge points, one can merge most of the small spurious regions into a big motion layer, i.e. merging small regions with a motion layer by comparing their areas with those of their individual neighbors. This leads to a connected layer. But it can be observed that the detected layer boundaries usually have discontinuities with the group of layer edge points, i.e. they form a set of layer edge segments. This is because some layer edge points are incorrectly classified into the group of non-layer edge points. Based on the areas of the segmented layer regions alone, it is impossible to make a correct region merging decision when these small regions may contain layer edge segments: a layer edge segment indicates that its two sides should occupy different motion layers and can never be merged into a single layer. These regions are thus temporarily left as undetermined regions. On the other hand, a connected layer has a continuous boundary $M_L$. The layer edge segments only prune the region of the layer; they do not form new closed regions within the layer. In our experiments, we simply replace some parts of $M_L$ with the new layer edge segments according to the nearest neighbor criterion. Then, the area comparison strategy is applied to the undetermined regions near the layer boundary for region merging. A layer edge segment separates one region into two motion layers; when merging two or more undetermined regions which share a layer edge segment, the merging procedure should be terminated.
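A minimal sketch of the two-group classification of Equation (2) (illustrative NumPy code; `scores` is the $N \times 3$ array of $(e_0, e_1, e_2)$ per point, and the per-score cluster centers are assumed to come from a 1D Polysegment run such as the one sketched after the Appendix):

```python
import numpy as np
from itertools import product

def classify_layer_edges(scores, centers):
    """Classify points of W into layer-edge / non-layer-edge groups, cf. Eq. (2).

    scores  : N x 3 array, columns are the match scores (e0, e1, e2).
    centers : list of three pairs (mu1, mu2), the cluster centers of each score.
    Returns an N-vector of booleans, True for layer edge points.
    """
    # All eight combined centers (two choices per score).
    cents = np.array(list(product(*centers)))            # 8 x 3
    # Layer-edge centers: (min, min, max) and (min, max, min).
    mins = [min(c) for c in centers]
    maxs = [max(c) for c in centers]
    cent1 = (mins[0], mins[1], maxs[2])
    cent2 = (mins[0], maxs[1], mins[2])
    # Nearest combined center per point, Eq. (2).
    d = ((scores[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    nearest = cents[np.argmin(d, axis=1)]
    return np.all(np.isclose(nearest, cent1), axis=1) | \
           np.all(np.isclose(nearest, cent2), axis=1)
```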
3.6. Depth Ordering

After region merging, one obtains the desired boundaries $M_L$ of the motion layers. If an occluded region belongs to a motion layer, this layer must be behind another one. Our problem can now be rephrased as how to assign the occluded regions to the known motion layers. With the occlusion region map $M_O$, one can first determine which points of the layer boundaries belong to the occluding edges, since the layer boundaries include the occluding edges. The worst case is that the points of the occluding edges are not within $M_O$; but in that case both side profiles of these points should overlap with $M_O$. On this basis, one can determine the points of the occluding edges by checking whether they are within $M_O$ or their profiles overlap with $M_O$. Since some points of the layer boundaries may not belong to the occluding edges, as in Figure 1(b), the depth ordering can only be carried out on the detected occluding edges. Then, for the points of the occluding edges, one can extend their profiles in the direction of the intensity gradient to the known motion layers for their profile labeling, i.e. inferring which layers the two sides of the occluding edges respectively belong to. Furthermore, inferring the occlusion relationship can be fulfilled by comparing the match scores $e_1(p)$ and $e_2(p)$ of each point $p$ of the occluding edges. This is because an occluded region only shares the same motion layer with one of the profiles of an occluding edge. The smaller match score corresponds to the real occlusion region. This implies that the side of an occluding edge with the smaller match score is behind the other side, since it contains the occlusion region. In terms of the profile labeling of the occluding edge points, we can therefore find the ordinal depth.

The post-processing procedure is summarized in the following steps; a schematic driver sketch is given at the end of this section.

1) Extract the pending areas $W$ on each frame;

2) Refine the corresponding points $p'$ on the next frame $(i+1)$, and then re-compute the match scores $e_0(p), e_1(p), e_2(p)$;

3) Apply the Polysegment algorithm to the match scores of $W$ for detecting the points of the layer boundaries;

4) Merge the spurious regions to obtain continuous boundaries of the motion layers;

5) Determine the occluding edges in terms of $M_O$;

6) Extend the profiles of the occluding edge points to the known motion layers for the profile labeling;

7) Compare the match scores $e_1(p)$ and $e_2(p)$ of the occluding edge points $p$ for depth ordering, i.e. $\min(e_1(p), e_2(p))$ corresponds to the occluded region.

This post-processing procedure and the subspace segmentation approach described in section 2 constitute a complete motion segmentation algorithm. Note that in our algorithm, the estimation of all the motion models of a scene is undertaken in the first procedure (the subspace segmentation method), while the detection of the layer boundaries and the depth ordering are carried out in the second procedure (the post-processing procedure). This is different from the previous approaches, where the motion model estimation is usually mixed with the later processing, which makes the algorithms complicated and the implementation difficult.
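The following driver sketch strings the seven steps together (illustrative Python; every helper name, i.e. `extract_pending_areas`, `refine_matches`, `detect_layer_edges`, `merge_regions`, `find_occluding_edges`, `label_profiles` and `order_layers`, is a hypothetical placeholder for the corresponding step, not an API from the paper):

```python
def post_process(frame1, frame2, labels, M_I, M_O, M_L):
    """Post-processing procedure, steps 1)-7); all helpers are hypothetical placeholders."""
    W = extract_pending_areas(M_O, M_L)                   # 1) pending areas
    matches, scores = refine_matches(frame1, frame2, W)   # 2) refine p' and e0, e1, e2
    edge_pts = detect_layer_edges(scores)                 # 3) Polysegment on the scores
    labels, M_L = merge_regions(labels, M_L, edge_pts)    # 4) continuous layer boundaries
    occ_edges = find_occluding_edges(M_L, M_O, matches)   # 5) occluding edges from M_O
    sides = label_profiles(occ_edges, labels)             # 6) profile labeling
    ordering = order_layers(occ_edges, sides, scores)     # 7) min(e1, e2) side is behind
    return labels, ordering
```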
4. Experiments and Analysis

Our algorithm was tested on several image sequences. In this section, the results on the “flower garden” and “Susie calling” sequences are presented. All programs were implemented on the MATLAB platform using a publicly available package, GPCA-PDA [16], and the optical flow code in [15]. All the image sequences used in our experiments are available at [17].

4.1. Flower Garden

In this experiment, we applied our motion segmentation approach to the flower garden sequence of resolution 175×120 pixels. A tree trunk in front of a garden is captured by a camera undergoing translation from left to right. Our goal is to determine the boundaries of the motion layers and find the layer order over two successive frames. Our approach found two motion layers, the tree trunk and the garden background.

Figure 3 gives the segmentation of the affine model using the subspace segmentation approach described in section 2. It can be noted that the occlusion regions from the optical flow fields are crude and contain many spurious small regions. The red arrows illustrate the possible occlusion regions in the successive frames 1 and 2. We show the results of the motion segmentation of frame 1 in Figure 3(1)-(3). Note that the occlusion regions in Figures 3(2) and (3), which are not the layer boundaries, are the interim areas between the foreground and the background. It is impossible to determine the depth ordering using the obtained motion layers before determining the layers of the occluded regions. In addition, the resulting layer boundaries are also unreliable. Similar to the occluded regions, there are many small and isolated spurious regions on the obtained layers. We need to refine the layer boundaries and find the layer order.

Figure 3. Motion segmentation results by the subspace segmentation approach only. (1) The occlusion regions from the optical flow field between frames 1 and 2; (2) and (3) the segmentation results using only the subspace segmentation approach described in section 2.

Figure 4 gives the segmentation results of the subspace segmentation approach followed by the post-processing procedure described in section 3. The boundaries of the motion layers can go across the occluded regions and converge to the desired locations. But we can also note that a patch of ground is classified as the foreground, as shown in Figure 4(1). This is because the motion variance of this patch between the successive frames 1 and 2 is close to that of the “tree”, away from the background. Over multiple frames, the motion of this patch would be distinguished from the “tree”, since the motion of the ground is prone to be modeled by a single affine model. The intensity edge map of frame 1 is obtained by the Canny edge detector and shown in Figure 4(3). It can be observed that the boundaries of the motion layers are contained in the intensity edge map, e.g. the red arrows illustrate the corresponding edges between the layer boundaries and the intensity edges of the image. Moreover, we also show the occluding edges (red) in Figure 4(4). Some of the occluding edges are not contained in the initial occlusion regions, but the profiles of these edge points overlap with the occlusion regions; these points can thus be joined with the occluding edges. Additionally, it can be noted that the layer boundaries include the occluding edges, but are not always occluding edges. Locating the occluding edges can help us find the depth ordering.

Figure 4. Refined and layered motion segmentation. (1) and (2) the segmentation results; (3) the intensity edges of the image; (4) occluded region and occluding edges (red).
4.2. Susie Calling

This sequence shows a hand holding a phone while the head rises slightly. The image resolution is 170×120 pixels. It can be observed that the region of the phone is enwrapped by the head region. Our segmentation approach aims at separating the phone region from the head region. The segmentation results are shown in Figure 5. The region of the phone is in front of the regions of the head and the background; the background region is behind the head region. Due to the rich texture of the hair, the segmented head region contains many small holes, particularly in the hair area. It is difficult to determine the boundaries of the hair. For example, in Figure 5(2), a patch of the hair image is incorrectly classified into the group of the phone region. The post-processing could not merge this patch into the head region either, as shown in Figure 5(4)-(6). This is because the detected layer edge segments go through the occluded regions, as shown in Figure 5(7). Moreover, the regions of the phone and the hair patch were judged to be in front of the head region. This may seem a bit strange: in general, the hair patch belongs to the head region, all the hair should be regarded as a whole body on the head, and there is no occlusion between them (unless individual hairlines are considered). But it can be observed that the occluded regions overlying the layer boundary appear at the bottom right of the image, i.e. around the boundaries between the shoulder and the hair. The motion of the hair is independent of that of the shoulder, and the shoulder region is classified into the head region. Thus, it is acceptable to preserve this patch as an independent layer, as shown in Figure 5(5).

Figure 5. The segmentation of the Susie calling sequence. (1)-(3) the results of the subspace segmentation method; (4)-(6) the results of the post-processing procedure; (7) the layer boundary goes through the occluded regions.

5. Conclusions

In this paper, we proposed a novel approach for motion segmentation based on subspace segmentation techniques. The novelty is that by incorporating the intensity structures of images, our proposed approach can effectively detect the motion layer boundaries and the depth ordering. Different from previous motion segmentation approaches, our approach provides a non-iterative and global solution to motion segmentation under a unified algebraic framework, i.e. the generalized PCA [2,18]. However, it can be noted that our algorithm relies on a given optical flow field. In our experiments, many available optical flow algorithms did not seem suitable for scenarios with a salient rotation element. This restricts the applications of our algorithm. It is crucial to further develop a robust optical flow algorithm; our future work will aim to tackle this challenge.

REFERENCES

[1] R. Vidal and Y. Ma, “A Unified Algebraic Approach to 2-D and 3-D Motion Segmentation,” Journal of Mathematical Imaging and Vision, Vol. 25, No. 3, 2006, pp. 403-421. doi:10.1007/s10851-006-8286-z

[2] R. Vidal, Y. Ma and S. Sastry, “Generalized Principal Component Analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 12, 2005, pp. 1-15. doi:10.1109/TPAMI.2005.244

[3] J. Y. A. Wang and E. H. Adelson, “Layered Representation for Motion Analysis,” IEEE Conference on Computer Vision and Pattern Recognition, New York, 15-17 June 1993, pp. 361-366. doi:10.1109/CVPR.1993.341105

[4] R. Szeliski, S. Avidan and P. Anandan, “Layer Extraction from Multiple Images Containing Reflections and Transparency,” IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, 13-15 June 2000, pp. 246-253.
[5] M. J. Black and D. J. Fleet, “Probabilistic Detection and Tracking of Motion Boundaries,” International Journal of Computer Vision, Vol. 38, No. 3, 2000, pp. 231-245. doi:10.1023/A:1008195307933

[6] P. Smith, T. Drummond and R. Cipolla, “Layered Motion Segmentation and Depth Ordering by Tracking Edges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 4, 2004, pp. 479-494. doi:10.1109/TPAMI.2004.1265863

[7] A. S. Ogale, C. Fermuller and Y. Aloimonos, “Motion Segmentation Using Occlusions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 6, 2005, pp. 988-992. doi:10.1109/TPAMI.2005.123

[8] D. J. Fleet, M. J. Black, Y. Yacoob and A. D. Jepson, “Design and Use of Linear Models for Image Motion Analysis,” International Journal of Computer Vision, Vol. 36, No. 3, 2000, pp. 171-193. doi:10.1023/A:1008156202475

[9] W. Yu, G. Sommer and K. Daniilidis, “Multiple Motion Analysis: In Spatial or in Spectral Domain,” Computer Vision and Image Understanding, Vol. 90, No. 2, 2003, pp. 129-152. doi:10.1016/S1077-3142(03)00011-0

[10] M. P. Kumar, P. H. S. Torr and A. Zisserman, “Learning Layered Motion Segmentation of Video,” Proceedings of the 10th IEEE International Conference on Computer Vision, Beijing, 17-20 October 2005, pp. 33-40. doi:10.1109/ICCV.2005.138

[11] M. Irani, P. Anandan, J. Bergen, R. Kumar and S. Hsu, “Efficient Representations of Video Sequences and Their Representations,” Signal Processing: Image Communication, Vol. 8, No. 4, 1996, pp. 327-351. doi:10.1016/0923-5965(95)00055-0

[12] M. Irani, B. Rousso and S. Peleg, “Computing Occluding and Transparent Motions,” International Journal of Computer Vision, Vol. 2, No. 1, 1994, pp. 5-16. doi:10.1007/BF01420982

[13] G. Csurka and P. Bouthemy, “Direct Identification of Moving Objects and Background from 2D Motion Models,” Proceedings of the IEEE International Conference on Computer Vision, Kerkyra, 20-27 September 1999, pp. 566-571. doi:10.1109/ICCV.1999.791274

[14] T. Papadimitriou, K. I. Diamantaras, et al., “Video Scene Segmentation Using Spatial Contours and 3D Robust Motion Estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 4, 2004, pp. 485-497. doi:10.1109/TCSVT.2004.825562

[15] A. S. Ogale and Y. Aloimonos, “A Roadmap to the Integration of Early Visual Modules,” International Journal of Computer Vision, Vol. 72, No. 1, 2007, pp. 9-25. doi:10.1007/s11263-006-8890-9

[16] Generalized Principal Component Analysis MATLAB codes, available at http://perception.csl.uiuc.edu/gpca/

[17] Video sequences, available at http://www.cipr.rpi.edu/resource/sequences/

[18] R. Vidal, “Generalized Principal Component Analysis (GPCA): An Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation,” Ph.D. Thesis, Electrical Engineering and Computer Sciences, University of California at Berkeley, 2003.

Appendix

Segmenting Hyperplanes of Dimension $K-1$ in $\mathbb{R}^K$

Given a set of points $X = \{x^{(j)} \in \mathbb{R}^K\}_{j=1}^{N}$ in a homogeneous coordinate system, and linear hyperplanes $\{S_i \subset \mathbb{R}^K\}_{i=1}^{n}$ of dimension $\dim(S_i) = k_i = K - 1$, we need to identify the $S_i$. Usually a subspace is given as $b^{(i)} \cdot x = 0$, $b^{(i)} \in \mathbb{R}^K$. Each hyperplane can then be represented as,

$S_i = \{x \in \mathbb{R}^K : x^T b^{(i)} = 0\}.$
Furthermore, an arbitrary point $x$ lies on one of the hyperplanes if and only if,

$x \in S_1 \cup \dots \cup S_n \;\Leftrightarrow\; p_n(x) = \prod_{i=1}^{n} \left(x^T b^{(i)}\right) = y^T c_n = 0,$

where $y = (x_1^n, x_1^{n-1}x_2, \dots, x_K^n)^T \in \mathbb{R}^m$ collects all monomials of degree $n$ in $x$, $c_n \in \mathbb{R}^m$ is a coefficient vector consisting of a set of monomials of the $\{b^{(i)}\}$, and

$m = \binom{n+K-1}{n} = \frac{(n+K-1)!}{(K-1)!\,n!}.$

For a given point set $X$, we have a linear system in $c_n$ as follows,

$L_n c_n = \begin{pmatrix} y(x^{(1)})^T \\ \vdots \\ y(x^{(N)})^T \end{pmatrix} c_n = 0 \in \mathbb{R}^N, \quad (A1)$

where $L_n \in \mathbb{R}^{N \times m}$. When the number of hyperplanes $n$ is known, $c_n$ can be obtained from the null space of $L_n$. In practice, $n$ is determined from $L_n$ itself. For a unique solution of the coefficient vector $c_n$, it is expected that $\mathrm{rank}(L_n) = m(n, K) - 1$, which is a function of the variable $n$. In the presence of noise, let $\mathrm{rank}(L_i) = r$ when $n = i$ and $\sigma_{r+1} \big/ \sum_{j=1}^{r} \sigma_j < \varepsilon$, where $L_i$ is the data matrix of rank $r$, $\sigma_j$ is the $j$th singular value of $L_i$, and $\varepsilon$ is a given threshold.

For any $x$, we have $p_n(x) = c_n^T y(x)$. Each normal vector $b^{(i)}$ can be obtained from the derivatives of $p_n$. Consider the derivative of $p_n(x)$ as follows,

$\frac{\partial p_n(x)}{\partial x} = \frac{\partial}{\partial x} \prod_{i=1}^{n} \left(x^T b^{(i)}\right) = \sum_{i=1}^{n} b^{(i)} \prod_{j \ne i} \left(x^T b^{(j)}\right).$

For a point $x^{(l)} \in S_l$, $\prod_{j \ne i} \left(b^{(j)T} x^{(l)}\right) = 0$ for $l \ne i$. It can be noted that there is only one non-zero term in $\partial p_n(x^{(l)})/\partial x$, namely $b^{(l)} \prod_{j \ne l} \left(b^{(j)T} x^{(l)}\right) \ne 0$. Then, the normal vector of $S_i$, with $x^{(l)} \in S_i$, is obtained as,

$b^{(i)} = \frac{\nabla p_n(x^{(l)})}{\left\|\nabla p_n(x^{(l)})\right\|}. \quad (A2)$

In order to get a set of points lying on each hyperplane respectively, so as to determine the corresponding normal vectors $b^{(i)}$, we can choose a point in the given $X$ close to one of the hyperplanes as follows,

$x^{(l)} = \arg\min_{x \in X,\, \nabla p_n(x) \ne 0} \frac{|p_n(x)|}{\left\|\nabla p_n(x)\right\|}. \quad (A3)$

Given the normal vectors $b^{(i)}$, we can classify the whole point set $X$ into the $n$ hyperplanes in $\mathbb{R}^K$ as follows,

$\mathrm{label}(x) = \arg\min_{1 \le j \le n} \left|x^T b^{(j)}\right|. \quad (A4)$

This algorithm is called the GPCA-PDA Alg. in [2,18].

Polynomial Segmentation Algorithm

Consider the special case of piecewise constant data. Given $N$ data points $x \in \mathbb{R}$, we wish to segment them into an unknown number of groups $n$. This implies that there exist $n$ unknown cluster centers $\mu_1, \dots, \mu_n$ such that $(x - \mu_1) \cdots (x - \mu_n) = 0$, which can be described in polynomial form as follows,

$p_n(x) = \prod_{i=1}^{n} (x - \mu_i) = \sum_{k=0}^{n} c_k x^{n-k} = 0. \quad (A5)$

For the $N$ data points, we have,

$L_n c = \begin{pmatrix} x_1^n & x_1^{n-1} & \cdots & 1 \\ \vdots & & & \vdots \\ x_N^n & x_N^{n-1} & \cdots & 1 \end{pmatrix} \begin{pmatrix} c_0 \\ \vdots \\ c_n \end{pmatrix} = 0, \quad (A6)$

where $L_n \in \mathbb{R}^{N \times (n+1)}$ and $c \in \mathbb{R}^{n+1}$. Usually, the group number is estimated as,

$n = \min\{\, i : \sigma_{i+1} < \varepsilon \,\}, \quad (A7)$

where $\sigma_i$ is the $i$th singular value of $L_i$, the collection of the first $i+1$ columns of $L_n$, and $\varepsilon$ is a given threshold that depends on the noise level. After solving for the coefficient vector $c$ of Equation (A6), we can compute the $n$ roots of $p_n(x)$, which correspond to the $n$ cluster centers $\{\mu_i\}_{i=1}^{n}$. Finally, the segmentation of the data is obtained by,

$i^* = \arg\min_{j=1,\dots,n} (x - \mu_j)^2. \quad (A8)$

The scheme of Equations (A5)-(A8) is called the Polysegment algorithm in [18].
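A minimal sketch of the Polysegment algorithm in this 1D form (illustrative NumPy code; the relative rank-detection threshold `eps` is an assumption standing in for the threshold in (A7)):

```python
import numpy as np

def polysegment(x, max_groups=8, eps=1e-3):
    """1D Polysegment: estimate n, the cluster centers, and the labels, cf. (A5)-(A8).

    x : 1D array of N real data points.
    """
    x = np.asarray(x, dtype=float)
    for n in range(1, max_groups + 1):
        # L_n has columns x^n, x^(n-1), ..., x, 1, cf. (A6).
        L = np.vander(x, n + 1)
        s = np.linalg.svd(L, compute_uv=False)
        if s[-1] < eps * s[0]:                 # rank deficiency detected, cf. (A7)
            # A null vector of L gives the polynomial coefficients c.
            _, _, Vt = np.linalg.svd(L)
            c = Vt[-1]
            mu = np.real(np.roots(c))          # cluster centers are the roots, cf. (A5)
            labels = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)  # (A8)
            return mu, labels
    raise ValueError("no group number up to max_groups fits the data")
```

Applied, for example, to the pooled match scores $e_1$ of section 3.4, such a routine would return the two cluster centers $\mu_1^{(1)}, \mu_2^{(1)}$ used there.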