Journal of Signal and Information Processing, 2011, 2, 205-210
doi:10.4236/jsip.2011.23028 Published Online August 2011
Copyright © 2011 SciRes. JSIP
Efficient Image Stitching in the Presence of
Dynamic Objects and Structure Misalignment
Chao Tao1, Hanqiu Sun2, Changcai Yang1, Jinwen Tian1
1The Institute for Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, China;
2The Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong (China).
Received March 9th, 2011; revised April 20th, 2011; accepted April 29th, 2011.
This paper presents a new method for simultaneously eliminating visual artifacts caused by moving objects and structure misalignment in image stitching. Given that the input images are roughly aligned, our approach is implemented in two stages. In the first stage, we discover motions between the input images and then extract their corresponding regions through a multi-seed based region growing algorithm. In the second stage, with prior information provided by the extracted regions, we perform a graph cut optimization in the gradient domain to determine which pixels to use from each image to achieve seamless stitching. Our method is simple to implement and effective. The experimental results illustrate that the proposed approach produces results comparable or superior to those of state-of-the-art methods.
Keywords: Image Stitching, Motion Estimate, Region Growing, Graph Cut
1. Introduction
Image stitching refers to creating a high-resolution panorama that seamlessly combines two or more images with overlapping fields of view [1]. Many existing methods create pleasing panoramas. However, they usually impose a number of requirements to produce satisfactory results: limited camera translation, limited motion of objects in the scene, and similar exposure settings between images.
Generally, three problems often occur in image stitching: exposure difference, structure misalignment and ghosting artifacts. In this paper, we concentrate on two of them. One is the ghosting effect caused by moving objects within a scene; the other is structure misalignment in the overlapped region, which is usually due to inaccurate registration, motion parallax and similar factors.
For the problem of de-ghosting, Uyttendaele et al. propose to use regions of difference [2]. This technique identifies dynamic objects by checking the source images to see where pixels differ by more than a threshold, and then decides which objects to keep and which ones to erase using a weighted vertex cover algorithm. This algorithm works well, but it may fail when the images have not been well registered and exposure-corrected beforehand. In addition, it does not establish the correspondence between regions relating to the same object. Thus, ambiguous situations may occur when multiple moving objects exist in the scene. Another technique with excellent results is proposed by Agarwala et al. [3]. Its main idea is to place a seam along the edges of objects in the picture, and then pick pixels from one photo or another based on which side of the seam they fall. Similar work has also been done in [4,5]. However, these techniques require the user to manually label all regions of moving objects, and are therefore not suitable for our goal of a fully automated solution.
Several methods have been proposed to alleviate the structure misalignment problem in image stitching. The best-known approach in this field is probably the optimal seam [6]. These methods first compute the color difference in the overlapped area between the two input images. Alternatives include computing the difference in the gradient or texture feature domain [7,8]. The task of finding the optimal boundary is then formulated as the minimization of an energy function, which is usually solved by graph cut [9]. A more sophisticated approach can be found in [10], which is based on structure deformation and propagation. Moreover, this method can simultaneously achieve structure consistency and color correction within the same framework.
After extensively reviewing the previous work on misalignment correction and de-ghosting, we find that each existing approach has its own advantages and disadvantages. Moreover, we did not find any work in the literature that deals with both problems simultaneously. In this paper, we propose a novel technique to address them together. After detecting feature correspondences between the two images, the first step is to discover motions by iteratively applying Random Sample Consensus (RANSAC) [11] in a divide-and-conquer manner. Once motions are found, we take the feature pixels belonging to them as seed points, and use a region growing method to roughly extract the moving objects. However, to completely remove ghosting artifacts, we have to accurately determine which regions in the input images are not static. To this end, we formulate the task as a labeling problem, and remove ghosting artifacts and structure misalignment together via graph cut. The paper is organized as follows. Section 2 gives the details of the proposed algorithm. The experimental results are provided in Section 3, followed by conclusions in Section 4.
2. Our Image Stitching Algorithm
For clarity, in this paper, we consider the basic case of
stitching two roughly aligned images. Our algorithm is
implemented in two stages. In the first stage, we discover
the motions between two input images, and then extract
their corresponding regions. With prior information pro-
vided by the extracted regions, the second stage is to find
an optimal seam, which can simultaneously eliminate
visual artifacts caused by the moving objects and struc-
ture misalignment. We formulate the task as a labeling
problem, and solve it by graph cut. The following sec-
tions describe the details of the algorithm.
2.1. Motion Discovery
Let us denote the two images to be stitched as $I_S$ and $I_T$, and assume that feature correspondences between them have already been established through Scale-Invariant Feature Transform (SIFT) matching [12]. To extract the motions between them, we cluster the correspondences by iteratively applying RANSAC in a divide-and-conquer manner. More specifically, we first select the homography with the largest number of coincident key points. We then remove these points and apply RANSAC again to the remaining points, repeating until the number of points left to be matched falls below some threshold.
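The peel-off procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not our MATLAB implementation: to stay self-contained, the homography model is replaced by a pure 2-D translation fitted from a single match, and the names `fit_translation`, `residual`, `ransac` and `discover_motions`, as well as the parameters `n_iter`, `tol`, `seed` and `min_points`, are hypothetical.

```python
import random


def fit_translation(match):
    """Fit the stand-in motion model (a 2-D translation) from one match."""
    (x, y), (x2, y2) = match
    return (x2 - x, y2 - y)


def residual(model, match):
    """Distance between the predicted and the observed target point."""
    (x, y), (x2, y2) = match
    dx, dy = model
    return ((x + dx - x2) ** 2 + (y + dy - y2) ** 2) ** 0.5


def ransac(matches, n_iter=200, tol=2.0, seed=0):
    """Return the model with the largest consensus set and its inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        model = fit_translation(rng.choice(matches))
        inliers = [m for m in matches if residual(model, m) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def discover_motions(matches, min_points=4, **ransac_args):
    """Repeatedly extract the dominant motion and remove its inliers.

    Stops when fewer than `min_points` matches remain or when the best
    consensus set is smaller than `min_points` (assumed threshold).
    Matches must be hashable, e.g. ((x, y), (x2, y2)) tuples.
    """
    remaining = list(matches)
    motions = []
    while len(remaining) >= min_points:
        model, inliers = ransac(remaining, **ransac_args)
        if len(inliers) < min_points:
            break
        motions.append((model, inliers))
        kept = set(inliers)
        remaining = [m for m in remaining if m not in kept]
    return motions
```

Because each round keeps the largest consensus set among the remaining matches, the first motion returned plays the role of the global background motion whenever the background dominates the scene; the later ones are candidate local motions.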
The above process is similar to the work of [13] and [14]. Unlike them, before running RANSAC we remove mismatches using our previously proposed method [15]. The reason is that, in our experiments, the result was very unstable when the correspondences contained many mismatches, especially when we tried to fit multiple motion models to them. The output of this stage is a set of motions, in which the one with the largest number of coincident image points corresponds to the background (the global background motion), and the others correspond to the moving objects (the local motions). Figure 1 shows an example scene, where two motions are detected by fitting two homography transformations to the feature matches: one for the background (Figure 1(a)), the other for the chair (Figure 1(b)).
2.2. Locating Moving Region
After finding a set of motions, the next step is to find their corresponding regions, so that pixel values from only one of the contributing images are used for them and ghosting artifacts are eliminated. This task can be solved by a multiple-seed based region growing algorithm [16].
Formally, assume that $N$ local motions were found in the last step and that each local motion $m_i$ is defined by a homography transformation $H_i$ with associated feature points $y_i$, where $1 \le i \le N$. Similarly, the global background motion $m_g$ is described by a homography transformation $H_g$ with associated feature points $y_g$. Next, for each $m_i$, we take $y_i$ as seed points, and gradually grow them by including each neighboring pixel $p$ that obeys one of the following criteria:
Criterion 1: Motion cues
We assume that the background is dominant in the scene. Based on this assumption, motion cues are defined as the discrepancy between the local motion and the global background motion, which is written as:
$$|I_S(p) - I_T(f_i(p))| < |I_S(p) - I_T(f_g(p))| \tag{1}$$
where $f_i(p)$ and $f_g(p)$ are mapping functions generated by the local motion $m_i$ and the global motion $m_g$, which map the pixel $p$ in $I_S$ to $f_i(p)$ and $f_g(p)$ in $I_T$, respectively. Their definitions are as follows:
$$f_i(p) = H_i\,p \tag{2}$$
$$f_g(p) = H_g\,p \tag{3}$$
Criterion 2: Color similarity
$$|I_S(p) - I_S(\mathrm{seed})| < \tau_1 \tag{4}$$
Here, $\tau_1$ stands for the intensity threshold.
Figure 1. Fitting motion models using SIFT matches. In this scene, the camera is translating and the chair moves. We detect both motions by fitting two homography transformations to the feature matches: one for the background (Figure 1(a)), the other for the chair (Figure 1(b)).
The advantage of using both cues is twofold. On one hand, motion cues effectively identify the neighboring pixels whose motion is consistent with that of the seed. On the other hand, color similarity keeps the extracted region smooth and complete in cases where object motion information alone detects only a part of the moving object. We run the region growing algorithm twice. The first run generates a set of regions $R^S = \{R_1^S, \dots, R_N^S\}$ in $I_S$. Similarly, the second run generates a set of regions $R^T = \{R_1^T, \dots, R_N^T\}$ in $I_T$ using the inverse motion field relating $I_T$ and $I_S$. Accordingly, the region growing criteria are changed to (5) and (6) as follows:
$$|I_T(p) - I_S(f_i^{-1}(p))| < |I_T(p) - I_S(f_g^{-1}(p))| \tag{5}$$
$$|I_T(p) - I_T(\mathrm{seed})| < \tau_1 \tag{6}$$
where $f_i^{-1}(p) = H_i^{-1}\,p$ and $f_g^{-1}(p) = H_g^{-1}\,p$. To conclude, the result of this stage is a set of region pairs $R = \{(R_i^S, R_i^T)\}$, $1 \le i \le N$, where $R_i^S$ and $R_i^T$ are the regions that are consistent with motion $m_i$ in the two images.
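The growing step of this section can be illustrated with a short sketch. It is a simplified reading of the two criteria rather than our implementation: images are grayscale nested lists, homographies are 3 × 3 nested lists, warped coordinates are rounded to the nearest pixel, growth uses 4-connectivity, and each pixel is compared against the seed it grew from. These choices, and the names `warp`, `sample` and `grow_region` and the parameter `tau`, are assumptions made for the example.

```python
from collections import deque


def warp(H, p):
    """Apply a 3x3 homography to pixel p = (x, y); round to nearest pixel."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (int(round(u / w)), int(round(v / w)))


def sample(img, p, default=float("inf")):
    """Intensity at p, or `default` when p lies outside the image."""
    x, y = p
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return default


def grow_region(I_S, I_T, H_i, H_g, seeds, tau=10):
    """Grow the region of one local motion in I_S from its seed pixels.

    A neighbor q is accepted if it satisfies either cue:
      motion cue: warping q by the local motion H_i matches I_T better
                  than warping it by the background motion H_g;
      color cue:  q's intensity is within `tau` of its seed's intensity.
    """
    region = set(seeds)
    queue = deque((s, s) for s in seeds)  # (pixel, seed it grew from)
    while queue:
        (x, y), seed = queue.popleft()
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if q in region or sample(I_S, q, None) is None:
                continue
            v = I_S[q[1]][q[0]]
            local_err = abs(v - sample(I_T, warp(H_i, q)))
            global_err = abs(v - sample(I_T, warp(H_g, q)))
            motion_cue = local_err < global_err
            color_cue = abs(v - I_S[seed[1]][seed[0]]) < tau
            if motion_cue or color_cue:
                region.add(q)
                queue.append((q, seed))
    return region
```

Running the same function with the two images swapped and the inverse homographies gives the second pass in the other image.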
2.3. Optimal Seam Selection
With the information provided by the extracted regions, we can now create a seam that eliminates structure inconsistency between the images while avoiding passing through moving objects. To do so, we formulate the task as a labeling problem and solve it by graph cut. In the following, $I$ represents the final composite image with the overlapped region $\Omega$. $f_p$ denotes the label of every pixel $p \in I$, which is assigned either 0 or 1. If $f_p = 0$, the value of pixel $p$ comes from image $I_S$; otherwise, it comes from $I_T$.
In order to make the final composite image $I$ look as if it had been captured without the moving objects in the scene, we use information from only one image for each extracted region pair $(R_i^S, R_i^T)$, and ignore the corresponding information in the other image. In other words, we pre-assign the same label $f_p$ to all pixels in the region pair $(R_i^S, R_i^T)$ as follows:
[Equation: the rule that selects the label from $R_i^S$ and $R_i^T$ is not recoverable from the source.]
Next we consider the label assignment for the remaining pixels in the overlapped region $\Omega$, and define an objective function $E(f)$ as the sum of two terms: a data term $E_d$ over all pixels $p$ and an interaction term $V_{p,q}$ over all pairs of neighboring pixels $(p, q)$:
$$E(f) = \sum_{p} E_d(p, f_p) + \sum_{(p,q)} V_{p,q}(p, q, f_p, f_q) \tag{8}$$
where the data term encourages transitions between the extracted regions and their nearby pixels to be natural and seamless. We use the following cost function to express this desired property:
[Equation: the data cost formula is not recoverable from the source.]
where $\sigma$ is the Gaussian scale, set to 1 in the experiments, and $D(p, f_p)$ is the distance between the pixel $p$ and its nearest region that is pre-assigned the label $f_p$. This distance can be calculated by a distance transformation [17].
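The distance used by the data term can be obtained with a simple multi-source breadth-first search, sketched below. The sketch assumes a city-block metric on a 4-connected grid, which is only one of the distance transformations discussed in [17]; the metric is not stated in this paper. The function name `distance_to_label` and the `None` marker for pixels without a pre-assigned label are hypothetical, and the Gaussian weighting of the distance is deliberately not reproduced.

```python
from collections import deque


def distance_to_label(prelabels, label):
    """City-block distance from each pixel to the nearest pre-labeled pixel.

    prelabels: 2-D list holding 0, 1, or None (None = not pre-assigned).
    Returns a 2-D list; entries stay float('inf') if no pixel has `label`.
    """
    h, w = len(prelabels), len(prelabels[0])
    dist = [[float("inf")] * w for _ in range(h)]
    queue = deque()
    # Every pixel already carrying the label is a source at distance 0.
    for y in range(h):
        for x in range(w):
            if prelabels[y][x] == label:
                dist[y][x] = 0
                queue.append((x, y))
    # Breadth-first expansion assigns each pixel its shortest grid distance.
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and dist[ny][nx] == float("inf"):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((nx, ny))
    return dist
```

Calling the function once with `label=0` and once with `label=1` yields the two distance maps needed to evaluate the data term for both candidate labels of every pixel.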
The interaction term takes gradient smoothness into account and penalizes pixel dissimilarity in the gradient domain, which makes the seam favor smooth areas in the overlapped region $\Omega$ and in turn reduces the structure complexity along the seam. Specifically, following [10], we define it piecewise according to whether the labels $f_p$ and $f_q$ agree (11), in terms of a seam cost $S(p, q)$ (12):
[Equations (11) and (12): the formulas are not recoverable from the source.]
where $S_m$ and $S_d$ are two costs measuring the gradient smoothness and similarity between the neighboring pixels $p$ and $q$, $\|\nabla I\|$ denotes the norm of the gradient at each pixel, and $\beta$ is a weight that balances the relative influence of the two costs, set to 0.3 in our experiments. We use the graph cut algorithm to find a labeling that minimizes the objective function.
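For two labels, minimizing an objective of this form reduces to a minimum s-t cut. The sketch below shows that reduction on a generic energy, the sum of per-pixel data costs plus a weight paid whenever two neighbors take different labels. It is an illustration under assumptions: a plain Edmonds-Karp max-flow stands in for the graph cut solver of [9], the pairwise weights are passed in directly rather than computed from the interaction term of this section, and the name `min_cut_labels` and the large constant used in the usage example are hypothetical.

```python
from collections import defaultdict, deque


def min_cut_labels(data_cost, pair_cost):
    """Binary labeling minimizing sum D_p(f_p) + sum w_pq * [f_p != f_q].

    data_cost: {pixel: (cost_of_label_0, cost_of_label_1)}
    pair_cost: {(p, q): w_pq} for neighboring pixels, with w_pq >= 0
    """
    S, T = "source", "sink"
    cap = defaultdict(lambda: defaultdict(float))
    for p, (c0, c1) in data_cost.items():
        cap[S][p] += c1  # cut (paid) when p ends on the sink side: label 1
        cap[p][T] += c0  # cut (paid) when p stays on the source side: label 0
    for (p, q), w in pair_cost.items():
        cap[p][q] += w
        cap[q][p] += w
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break
        # Find the bottleneck capacity along the path, then push flow.
        flow, v = float("inf"), T
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Pixels still reachable from the source keep label 0, the rest get 1.
    return {p: 0 if p in parent else 1 for p in data_cost}
```

As a usage example, on a strip of five pixels with both ends pre-assigned by a large data cost, the cut falls on the weakest link, which is how a seam ends up in the cheapest transition:

```python
BIG = 1e9  # stands in for a hard pre-assignment
data = {0: (0, BIG), 1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (BIG, 0)}
pairs = {(0, 1): 5, (1, 2): 5, (2, 3): 1, (3, 4): 5}
labels = min_cut_labels(data, pairs)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
```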
3. Experimental Results
Having already illustrated the proposed image stitching algorithm by an example, we proceed by further demonstrating the performance of our method on two examples captured under different conditions. A comparison with other methods is also given.
3.1. Result Analysis
We first show a simple case, where only one moving object exists in the scene. The two source images, shown in Figures 2(a) and (b), are provided as input. Figure 2(c) is the initial alignment, where the visual artifact (indicated by the red box) is obvious because of the moving object and inaccurate registration. Figures 2(d) and (e) show the results of AutoStitching [18] and Panorama Factory [19], respectively. As can be seen, although the structure misalignment is alleviated, these methods do not solve the problem of the ghosting effect. Figure 2(f) is obtained by our method: only one instance of the moving object is kept in the final composite image and the structures are properly aligned. This demonstrates that our method can handle moving objects and structure misalignment within the same framework.
Figure 2. (a) and (b) are the two registered images. (c) The initial alignment. (d) AutoStitching. (e) Panorama Factory. (f) Our method.
Next, we apply our algorithm to a more complicated example. This scene is challenging because it contains multiple moving objects with occlusion between them. Figure 3 shows the process. Figures 3(a) and (b) are the original images. Our algorithm first detects the motions between the images, and then roughly extracts their corresponding regions (one for the red box (Figure 3(c)), and the other for the bag (Figure 3(d))). After that, by taking gradient similarity and transition smoothness into account, we obtain the final panoramic image by selecting an optimal seam (indicated by a red curve in Figure 4(c)). As can be seen, our seam favors smooth areas in the overlapped region and avoids passing through moving objects. Thus, the final composite image is pleasing and structurally consistent, while the artifacts caused by moving objects and misregistration still exist in the results of AutoStitching (Figure 4(a)) and Panorama Factory (Figure 4(b)).
Figure 3. (a) and (b) are the two registered images. (c) and (d) The two detected motions, one for the red box, the other for the bag.
Figure 4. (a) AutoStitching. (b) Panorama Factory. (c) Our method.
3.2. Computation Times
We finally discuss the computation time needed for our method. To give the reader an idea, we consider Figures 2(a) and (b) (640 × 480 pixels) as a benchmark. For this result, we use a PC with an Intel Core(TM)2 Duo processor with a 2.4 GHz clock speed and 1 GB of RAM, and use MATLAB as our coding platform to run the algorithm.
We tabulate the computation times for each step in Table 1. As can be seen, the total time needed for image stitching is 18.65 s. In the same setting, the times needed for AutoStitching and Panorama Factory were 12.16 s and 19.1 s, respectively. Therefore, the computation time of our method is acceptable.
Table 1. Computation time for each step of the proposed method.
Step                          Computation time (seconds)
Motion discovery              12.52
Locating the moving region    1.78
Optimal seam selection        4.35
Total time                    18.65
4. Conclusions
A novel technique has been presented to achieve seamless image stitching without producing visual artifacts caused by moving objects and structure misalignment. The proposed method includes two major components: 1) motion discovery; 2) a graph cut based optimization framework for seamless stitching. We create a data cost to ensure that the transition between moving objects and their nearby pixels is natural and seamless, and a smoothness cost to encourage the seam to favor smooth areas. Thus, moving object removal and structure correction are achieved simultaneously within the same framework.
Our method also has some minor limitations. First, our framework relies heavily on feature matches to extract every independent motion in the scene. Thus, it may fail when the correspondences belonging to a motion are not detected. Second, our framework cannot handle exposure differences, which are another challenging problem in the field of image stitching. Therefore, future work aims at making our framework robust to these phenomena, and also at extending it to videos and multiple images.
5. Acknowledgements
The work was supported by the UGC grants for research (nos. 2050423, 2050454).
References
[1] R. Szeliski, “Image Alignment and Stitching: A Tutorial,” Microsoft Research, Technical Report MSR-TR-2004-92,
[2] M. Uyttendaele, A. Eden and R. Szeliski, “Eliminating Ghosting and Exposure Artifacts in Image Mosaics,” Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, 8-14 December 2001, pp. 509-516.
[3] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin and M. Cohen, “Interactive Digital Photomontage,” Proceedings of ACM SIGGRAPH’04, Vol. 23, No. 3, 2004, pp. 294-302.
[4] A. Eden, M. Uyttendaele and R. Szeliski, “Seamless Image Stitching of Scenes with Large Motions and Exposure Differences,” 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp.
[5] A. Mills and G. Dudek, “Image Stitching with Dynamic Elements,” Image and Vision Computing, Vol. 27, No. 10, 2009, pp. 1593-1602. doi:10.1016/j.imavis.2009.03.004
[6] A. A. Efros and W. T. Freeman, “Image Quilting for Texture Synthesis and Transfer,” Proceedings of ACM SIGGRAPH’01, 2001, pp. 341-346.
[7] V. Kwatra, A. Schodl, I. Essa, G. Turk and A. Bobick, “Graphcut Textures: Image and Video Synthesis Using Graph Cuts,” Proceedings of ACM SIGGRAPH’03, Vol. 22, No. 3, 2003, pp. 277-286.
[8] A. Levin, A. Zomet, S. Peleg and Y. Weiss, “Seamless Image Stitching in the Gradient Domain,” Proceedings of European Conference on Computer Vision, 2004.
[9] Y. Boykov, O. Veksler and R. Zabih, “Fast Approximate Energy Minimization via Graph Cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 11, 2001, pp. 1222-1239. doi:10.1109/34.969114
[10] J. Y. Jia and C. K. Tang, “Image Stitching Using Structure Deformation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 4, 2008, pp. 617-631.
[11] M. A. Fischler and R. C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Communications of the ACM, Vol. 24, No. 6, 1981, pp. 381-395.
[12] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110.
[13] J. Wills, S. Agarwal and S. Belongie, “What Went Where,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. 37-44.
[14] P. Bhat, K. C. Zheng, N. Snavely, A. Agarwala, M. Agrawala, M. Cohen and B. Curless, “Piecewise Image Registration in the Presence of Multiple Large Motions,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 2491-2497.
[15] C. Tao, Y. H. Tan, Y. T. Wang and J. W. Tian, “Discard Wide-Baseline Mismatch Using Contour Fragments,” Electronics Letters, Vol. 46, No. 12, 2010, pp. 834-835.
[16] R. Adams and L. Bischof, “Seeded Region Growing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 6, 1994, pp. 641-654.
[17] G. Borgefors, “Distance Transformations in Digital Images,” Computer Vision, Graphics, and Image Processing, Vol. 34, No. 3, 1986, pp. 344-371.
[18] M. Brown and D. G. Lowe, “Automatic Panoramic Image Stitching Using Invariant Features,” International Journal of Computer Vision, Vol. 74, No. 1, 2007, pp. 59-73.
[19] The Panorama Factory, 2011.