Journal of Signal and Information Processing, 2013, 4, 80-85
doi:10.4236/jsip.2013.43B014 Published Online August 2013 (http://www.scirp.org/journal/jsip)
Computer Tomography and Ultrasonography Image
Registration Based on the Cooperation of GPU and CPU
Ying-Chih Lin1, Chien-Liang Huang1, Chin-Sh eng Chen1, Wen-Chung Chang2, Yu-Jen Chen3,
Chia-Yuan Liu4
1Graduate Institute of Automation Technology, National Taipei University of Technology; 2Department of Electrical Engineering,
National Taipei University of Technology; 3Department of Radiation Oncology, Mackay Memorial Hospital; 4Department of Internal
Medicine, Mackay Memorial Hospital.
Email: t100618014@ntut.edu.tw, t97669026@ntut.edu.tw, saint@ntut.edu.tw, wchang@ee.ntut.edu.tw, chenmdphd@gmail.com,
liu.chiayuan@gmail.com
Received April, 2013.
ABSTRACT
Image registration is wildly used in the biomedical image, but there are too many textures and noises in the biomedical
image to get a precise image registration. In order to get the excellent registration performance, it needs more complex
image processing, and it will spend expensive computation cost. For the real time issue, this paper proposes edge gra-
dient direction image registration applied to Computer Tomography (CT) image and Ultrasonography (US) image based
on the cooperation of Graphic Processor Unit (GPU) and Central Processor Unit (CPU). GPU can significantly reduce
the computation time. First, the CT image slice is extracted from the CT volume by the region growing and the interpo-
lation algorithm. Secondly, the image pre-processing is employed to reduce the image noises and enhance the image
features. There are two kinds of the image pre-processing algorithms invoked in this paper: 1) median filtering and 2)
anisotropic diffusion. Last but not least, the image edge gradient information is obtained by Canny operator, and the
similarity measurement based on gradient direction is employed to evaluate the similarity between the CT and the US
images. The experimental results show that the proposed architecture can distinctively improve the efficiency and are
more suitably applied to the real world.
Keywords: Image Registration; Graphic Processor; Computer Tomography; Ultrasonography; Canny Operator
1. Introduction
Image registration is wildly used in the biomedical image.
The image registration between computed tomography
(CT) and ultrasonography (US) has benefits for a variety
of clinical application. Especially, the key challenge in
surgery of abdominal organ is the registration of
pre-operative planning and intra-operative navigation,
because the CT and US images of abdominal organ im-
ages are obtained from the same patient but at different
times and with different imaging modality. Due to the
speckle noise and distortion in US images, accurate reg-
istration of CT and US is still a challenging problem.
There are two categories of image registration: 1)
area-based and 2) feature-based. In the area-based me-
thod, the Normalized Cross Correlation (NCC) is the
most popular one. NCC is used to evaluate the similarity
between the template image and the scene image. Be-
sides, NCC often calculates the similarity pixel by pixel
to deal with the image registration problems, which will
increase the computation cost. On the contrary, the fea-
ture-based method is proposed to reduce the computation
cost. Kown et al [1] proposed a robust hierarchical dis-
tance to compare edge maps in a multi-level pyramid
structure. Steger [2] presented the edge gradient direction
as similarity measure to partially recognize occluded
objects which is very robust against illumination frustra-
tion. The similarity measure in this literature uses the
inner product of normalized edge gradients to obtain the
similarity value.
For the real time issue, the data size of the CT and US
images are abundant, so the area-based and feature-based
methods may work well in biomedical image registration
applications. However, the computation request is not
efficient in biomedical image registration problems.
Furthermore, CPU is not suitable to process large
amount of data. The parallel processor architecture is
proposed to call GPU. It is proposed to deal with the
huge amount data. The performance of GPU is more ef-
ficient to dispose in biomedical image registration appli-
cation. Thus, the cooperation of GPU and CPU architec-
ture can increase the efficiency. In this paper, CT and US
Copyright © 2013 SciRes. JSIP
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU 81
image registration based on the hybrid architecture of
GPU and CPU is proposed.
The signal to noise ratio is different because imaging
principle of CT and US technique are different. In this
study, the image pre-processing for US image includes
median filter and anisotropic diffusion algorithm are used
to reduce the noises and enhance the image quality.
The contributions of this paper include: 1) using the
hybrid architecture of GPU and CPU to save the compu-
tation time, 2) applying the anisotropic diffusion algo-
rithm to reduce the noise and keep the feature of image
edge, 3) defining two kinds of the indices, Fiducial regis-
tration error (FRE) and target registration error(TRE) to
verify the accuracy of image registration.
The rest of the paper is organized as follows: Section 2
provides architecture of proposed image registration.
Section 3~6 shows the detail of each procedure in the
proposed algorithm. Section 7 discusses the experimental
results of the proposed algorithm. Section 8 discusses
analysis of registration error. Finally, conclusions are
presented in Section 9.
2. Architecture of the Proposed Image
Registration Algorithm
The architecture of the proposed algorithm is shown in
Figure 1. There are two phases in this architecture: 1)
training phase and 2) matching phase.
Figure 1. Architecture of the proposed algorithm.
In the training phase, we used CPU to deal with region
growing, slice and canny algorithm. In this study, CT
data is three-dimensional information and the specified
organ, liver organ, from a set of CT images which had
been extracted by using the region growing method.
Through extraction on the specified organ image, there is
a virtual slice that we need. Consequently, the Canny
edge detection is used to extract the edge gradient. Those
gradients information will be fed into the matching phase,
and gradients information which had been extracted from
CT image will be a template in the matching phase.
In the matching phase, we used GPU to deal with me-
dian filter, anisotropic diffusion and registration algo-
rithm. The two preprocessing operators, median filter and
anisotropic diffusion are used to enhance the feature and
reduce the noise on the US image. Sobel edge detection
is employed to get the gradients US image. Finally, the
similarity is calculated by the gradients of template and
search image.
3. Image Pre-Processing
3.1. Median Filter
Median filter is often used to reduce the impulse noise on
an image. Since there is the speckle noise in US images,
it can be mostly improved through the median filter.
3.2. Anisotropic Diffusion
The anisotropic diffusion algorithm [3] is adapted to en-
hance the images and remove noise. Anisotropic diffu-
sion process image based on a linear diffusion from prin-
ciple of heat conduction is given as
(,,)
[(,,)(,,)(,,)]
Ixyt
t
divCxytIxytCxyt ICI
 
(1)
where div is the divergence operator, and Δ are
respectively the gradient and Laplacian operators.
The linear diffusion can be discretely implemented
using eight-neighbors, the formula is shown in
(, ,1)
8
1
(, ,)(|(,,)|)(, ,)
81
Ixyt
I
xytC IxytIxyt
ii i
i
 
(2)
where represent the gradients of eight
neighbors, C is diffusion coefficient, the diffusion coeffi-
cient function is represented as
(,,) 18Ixyti
i

1
)
2
1( )
(II
k
C
(3)
The parameter k is a constant; it acts as an edge strength
threshold. If the k value is too large, the diffusion process
will over-smooth and result in a blurred image. On the
contrary, if the k value is too small, the diffusion process
will stop the smoothing in early iterations and will yield a
restored image similar to the original image.
4. Feature Extraction
4.1. Region Growing Algorithm
This paper adopts the study of Chen et al. [4], who pro-
poses a segmentation method based on the improved
region growing method combined with centroid detection
Copyright © 2013 SciRes. JSIP
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU
82
and intensity distribution analysis to dismember volume
shape of liver from CT volume images. Through the re-
gion growing method, the volume region growing
method is mainly to combine with sets of continuous CT
images to construct its volume character. It is assumed
that CT data has m slices. The seed point location,
cc
, is selected and set in the first slice by the
doctor. In general, this seed point will be set in charac-
teristic of the region, and will start growing from the seed
point. When the intensity of a seed point and its neighbor
points are similar, those neighbor ones are added into the
region. Subsequently, the center of gravity in the previ-
ous slice is inherited to be the seed point in the next slice.
Through this way, continuous images are able to estab-
lish CT volume. The center of gravity of the growing
region in nth slice can be calculated as
11
(, )SPSx Sy1
0
0
, 1,
, 1,
Nn
i
n
c
Nn
i
n
c
Rx
Sxn m
N
Ry
Syn m
N


(4)
where represents the coordinates of region
growing result in n slice. N represents the area of region
range. The result of volume region growing is shown in
Figure 2.
(,
nn
ii
RxRy )
4.2. Extraction of Any Angle Slice
Hu [5] proposed a slice algorithm, using the normal vec-
tor and the interior point to form the virtual clipping
plane. As shown in Figure 3, the plane can be repre-
sented by the point
n
000
(, ,)
P
xyz and a unit normal vec-
tor as
123
, )a a(,na
102 030
()()()ax xayyaz z 0 (5)
In 3D space, the arbitrary plane of unit normal vector
can be determined by the direction angle
,, . The
direction cosine of vectorOPcan be calculated as
11
222
123
22
222
123
33
222
123
cos
cos
cos
aa
OP aaa
aa
OP aaa
aa
OP aaa






(6)
5. Image Registration Algorithm
The gradient based similarity measure is used to evaluate
the similarity between the CT image and the US image.
Figure 2. Result of volume region growing.
),,(
321
aaan
A
P
O
Figure 3. Any angle of slice.
In the matching process, the template model should be
compared to the search image at all locations using a
similarity measure. The idea behind similarity measure is
to take the sum of all normalized dot products of gradient
direction of the template image and search the image over
all points in the model data set. Extracting the edges of
specified organ in CT image, the gradient of the selected
edges along with the coordinate information are stored as
the template model which contains a set of points
(,)
T
iii
p
xy 
. These points are relative to center of grav-
ity and the edge points of the object and corresponding
gradients ,
(, )
T
iii
dtu 1, , in
(, )
T
w
. The points and the
corresponding gradients are respectively denoted as
and ,,
ii ii
ir
crc
, , in candi-
date objects. If there is no rotation angle between tem-
plate and search image, the similarity measure of gradi-
ent based is defined as follows
(, )
T
iii
qrcev 1, , in

,,
222 2
1,,
1
,ii ii
ii ii
nixryci xryc
iixrycixryc
xy n
 
 

tv uw
tuvw (7)
where n is amount of the edge point.
In contrast, if there is a rotation angle between tem-
plate and search image, the similarity measure will be
revised as
,,
22 22
1,,
1
(, , )ii ii
ii ii
ni xrycixryc
eg
iixrycixryc
xy n

 
 
 
 



tv uw
tu vw
(8)
where
is the rotation angle; the is the gra-
dients after rotation; the points (, are the rotated
location of points in search image.
(, )
T
ii

tu
)
T
ii

rc
Copyright © 2013 SciRes. JSIP
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU 83
cos( )sin( )
sin( )cos( )
cos( )sin( )
sin()cos()
ii
ii
ii
ii
rr
cc




 

 

 
 

 


 
tt
uu
(9)
If it perfectly matched between the template model and
the search image, this correction coefficient will be 1.
6. GPGPU MODEL-CUDA
6.1. CUDA Introduction
CUDA (Compute Unified Device Architecture) is a
novel technology of general-purpose computing on the
GPU, which are SIMD (Single Instruction, Multiple Data)
device that is inherently data-parallel. We only take the
float-point operation as an example, and GPU’s compu-
tation speed is several times faster than CPU’s.
For the Programming Framework, CUDA also pro-
vides the magnitude of performance and simpler software
development by using the standard C language [6].
6.2. Features of CUDA GPU
CUDA GPU has the following advantages:
(1) General programming environment: CUDA uses
C programming tools and C compiler, which make the
program have better compatibility.
(2) Higher bandwidth: We take GeForce GTX470 as
the developing platform in this paper, the bandwidth is
96.4GB/s between GPU and device memory, and 2GB/s
between host and device memory.
(3) More powerful parallel computing capability:
GeForce GTX470 has 448 stream processors (1.4GHz).
For hardware, CUDA device contains many stream
multi-processors (SM), and each SM also has several
stream processors. Each SM can operate four types of
memories: constant memory, texture memory and global
memory which can communicate with host memory ex-
cept on-chip shared memory. These cores use on-chip
shared memories and cache to accelerate memory access,
as shown in Figure 4.
For software, in CUDA program any call to a GPU
function from a CPU function must include an execution
configuration [7], which determines the block count,
thread count in a thread-block, and the amount of shared
memory. Each block can only run on one multi-processor,
but each multi-processor can allocate multiple blocks. In
addition, those blocks can concurrently run on a mul-
ti-processor depending on the occupancy of the registers
and shared memory of each SM.
6.3. Implementation of Parallel Computing
As shown in Figure 1, the median filter, anisotropic dif-
fusion and registration algorithm will be implemented
Figure 4. CUDA hardware (GPU) architecture.
with parallel processing technique.
6.3.1. M ed ian Filter
The US image as input image is divided into subimages
and assigned to SM. Each SM stores assigned blocks,
consisting a lot of threads, to do the parallel computation
of convolution of median filter in global memory. After
that, the result of image is kept in global memory.
6.3.2. Ani sotropi c Diffusio n
Using the image of median filter in section 6.3.1, we
make thread and pixel to be in one-to-one correspon-
dence. At first, each thread calculates gradients of eight
neighbors
I
for any located pixel, and then calculates
the diffusion coefficient C. Subsequently, each thread is
going to operate the equation (2) of anisotropic diffusion.
The operation of anisotropic diffusion is shown in Fig-
ure 5.
6.3.3. Re g i stration
The edge data of template and search image divided into
subimages are fed as input data and input image in this
step. We assign each subimage to SM, and the each pixel
is processed by the corresponding thread. Then these
threads play the role of center of gravity to operate the
value of similarity measure in parallel. Subsequently, we
use the compact operation to eliminate the low similarity
value, and then thread checks whether the value of simi-
larity measure is maximum or not. The maximal value is
the target location in the registration process. The opera-
tion of registration is shown in Fig ure 6.
7. Experimental Results
Experiments were carried out with two kinds of test im-
ages for the performance verification. The size of these
two CT and US images are 512512, and 680
512,
respectively. The experiments were performed with Vis-
ual C++ on Intel core i3 530 2.93 GHz with 4GB of
memory.
Copyright © 2013 SciRes. JSIP
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU
84
)( IC
Figure 5. Operation of anisotropic diffusion.
Figure 6. Procedure of registration in CUDA.
In this study, two cases, shown in Figures 7(a)-(d),
are used to verify the registration performance. The re-
sults of registration are shown in Figures 8(a)-(b). Here,
the performance of efficiency is listed in Table 1. Ac-
cording to the experimental results, the object is suc-
cessful recognized in the US image, and computation cost
is dramatically reduced using GPU. The computation
advantage is 23.36 times.
8. Analysis of Registration Error
Fiducial registration error (FRE) and target registration
error (TRE) [8-10] are adapted to evaluate the accuracy
of registration results.
1) FRE: distance of corresponding fiducial points after
registration.
2) TRE: distance of corresponding target points after
registration.
Therefore, the location of fiducial points and target
points are indicated by doctors. Doctors laid position of
vessel, bifurcations, and the lesions to be the target point.
Besides, Doctors put other locations to be fiducial points,
mostly the abdominal aorta and inferior vena cava. Then
the fiducial points and target points are used to calculate
Root Mean Square Deviation (RMSD) by
2
,,
1
()
CT US
n
ii
i
xx
RMSD n
(7)
where n is number of point pairs, ,,CT iUS i
x
xis the dis-
tance between the corresponding point from two different
data sets.
Figures 9(a)-(b), 9(c)-(d) show the location of the tar-
get points and fiducial points from different cases.
The fiducial points and target points are marked as
triangle and circle, respectively. Then the registration
error is listed in Table 2.
(a) (b)
(c) (d)
Figure 7. Test images: (a) Case1 of CT, (b) Case1 of US, (c)
Case2 of CT, (d) Case2 of US.
(a) (b)
Figure 8. Registration results: (a) Case1, (b) Case2.
Table 1. The efficiency of the proposed method.
Execute time
Algorithms
CPU GPU/CPU
Processing in US
(Median + Anisotropic) 1.33 (s) 22.09 (ms)
Feature Extraction in CT 30 (ms) 30 (ms) (CPU)
Feature Extraction in US 6 (ms) 6 (ms) (CPU)
Registration 510.8 (ms) 18.01 (ms)
Total 1.776 (s) 76.1 (ms)
9. Conclusions
It is a tremendous challenge in registration of CT and US
images because the CT and US images of abdominal
organ images are obtained from the same patient but at
different times and with different imaging modality. In
this paper, through the proposed method, CT and US can
Copyright © 2013 SciRes. JSIP
Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU
Copyright © 2013 SciRes. JSIP
85
(a) (b)
(c) (d)
Figure 9. (a) Case1-Target and fiducial points of US (b)
Case1-Target and fiducial points of CT (c) Case2-Target
and fiducial points of US (d) Case2-Target and fiducial
points of CT.
Table 2. Registration error in Case1 and Case2.
Registration error
Case
TRE (mm) FRE (mm)
Case1 2.41 2.341
Case2 3.139 2.054
register accurately to each other. According to experi-
mental results, the computation consuming is improved
by using the configuration of cooperated GPU and CPU
to get computation advantage around 23.36 times. That
implies the architecture of GPU and CPU can be applied
in time critical applications.
REFERENCES
[1] K. Kwon, D. G. Sim and R. H. Park, “Robust Hausdorff
Distance Matching Algorithms using Pyramidal Struc-
tures,” Pattern Recognition, Vol. 34, No.10, 2001, pp.
2005-2013. doi:10.1016/S0031-3203(00)00132-1
[2] C. Steger, “Similarity Measures for Occlusion, Clutter,
and Illumination Invariant Object Recognition,” Lecture
Notes in Computer Science, Vol. 2191, 2001, pp.
148-154.
[3] K. Krissian, R. Kikinis, C. F. Westin and K. Vosburgh,
“Speckle-Constrained Filtering of Ultrasound Images,”
Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, Vol. 2,
2005, pp. 547-552. doi:10.1109/CVPR.2005.331
[4] Y. Chen, Z. Wang, W. Zhao and X. Yang, “Liver Seg-
mentation from CT Images Based on Region Growing
Method,” Bioinformatics and Biomedical Engineering
Beijing, 11-13 June, 2009.
doi:10.1109/ICBBE.2009.5163018
[5] Z. Hu, “Extraction of Any Angle Virtual Slice on 3D CT
Image,” Intelligent Information Technology Application,
Vol.2, 2008, pp. 356-360.
doi:10.1109/IITA.2008.399
[6] NVIDIA Corporation, “NVIDIA CUDA Programming
guide version 1.0. Available,” 2007.
http://docs.nvidia.com/cuda/index.html
[7] M. Houston: “General-Purpose Computation on Graphics
Hardware”, Proceedings of SIGGRAPH, 2007.
[8] C. R. Maurer, J. M. Fitzpatrick, M. Y. Wang, R. L. Gal-
loway, R. J. Maciunas and G. S. Allen, “Registration of
Head Volume Images Using Implantable Fiducial Mark-
ers,” IEEE Transactions Medical Imaging, Vol. 16, No.
4, 1997, pp. 447-462. doi:10.1109/42.611354
[9] A. D. Wiles, A. Likholyot, D. D. Frantz, and T. M. Peters,
“A Statistical Model for Point-Based Target Registration
Error with Anisotropic Fiducial Localizer Error,” IEEE
Transactions Medical Imaging, Vol. 27, No. 3, Mar.,
2008, pp. 378-390. doi:10.1109/TMI.2007.908124
[10] W. Liu, H. Ding, H. Han, Q. Xue, Z. Sun and G. Wang,
“The Study of Fiducial Localization Error of Image in
Point-based Registration,” 31st Annual International
Conference of the IEEE EMBS Minneapolis, Minnesota,
USA, September 2-6, 2009, pp. 5088-509.