Engineering, 2013, 5, 199-202. Published Online October 2013
Copyright © 2013 SciRes. ENG
Matching DSIFT Descriptors Extracted from CSLM
Stefan G. Stanciu1,2*, Dinu Coltuc3, Denis E. Tranca1, George A. Stanciu1
1Center for Microscopy-Microanalysis and Information Processing, University Politehnica of Bucharest, București, Romania
2Light Microscopy and Screening Center, Swiss Federal Institute of Technology, Zurich, Switzerland
3Electric Engineering Department, Valahia University of Târgoviște, Târgovişte, Romania
Email: *
Received May 2013
The matching of local descriptors is currently a key tool in computer vision: a wide variety of methods designed for tasks such as image classification, object recognition and tracking, image stitching, and data mining rely on it. Local feature description techniques are usually developed to provide invariance to the photometric variations specific to the acquisition of natural images, but they are nonetheless also used in biomedical imaging. It has previously been shown that the matching of gradient-based descriptors is affected by image modifications specific to Confocal Scanning Laser Microscopy (CSLM). In this paper we extend our previous work in this direction and show how specific acquisition and post-processing methods alleviate or accentuate this problem.
Keywords: Local Features; Local Descriptors; Feature Matching; SIFT; CSLM
1. Introduction
The detection and description of affine-invariant regions have been topics of high interest during the past decade. Image matching using local invariant features is a key method in many computer vision tasks, such as image retrieval [1], recognition [2,3], wide baseline matching [4], building panoramas [5], microscopy image stitching [6], image-based localization [7,8] and medical image classification [9,10]. In these applications, local invariant features are detected independently in each image, and the features of one image are then matched against those of other images by direct or indirect comparison of their respective feature descriptors. The matched features can subsequently be used to indicate the presence of a particular object, to vote for a particular image, to establish correspondences for epipolar geometry estimation, or to assign an image to a specific class. In all these tasks, the core of the application is the establishment of interest point correspondences between individual image pairs or between an image and a class of images. Among the various methods reported in the literature, the Scale-Invariant Feature Transform (SIFT) [11] has become one of the most popular choices for local feature detection and description because of its high accuracy, relatively low computation time and the availability of open-source implementations.
Confocal scanning laser microscopy (CSLM) is an essential imaging tool in many research fields. It makes it possible to acquire in-focus images from selected depths (optical sections) of both living and fixed specimens in a non-invasive manner. The optical sectioning capability is provided by a pinhole aperture that acts as a spatial filter at the conjugate image plane, rejecting out-of-focus light [12]. The size of the pinhole aperture determines the thickness of the imaged optical section. A stack of optical sections, i.e., 2D confocal planes collected at different depths within the volume, can be used to create 3D reconstructions of the imaged specimen.
In CSLM the illumination light is scanned across the specimen point by point by a mirror on a galvanometer-driven scanner, and the light emitted from the specimen is likewise collected and de-scanned. The in-focus light that passes through the pinhole reaches a photomultiplier tube (PMT), which detects light and converts photon hits into an analogue electron current. Raising the PMT gain (voltage) can amplify a weak signal, but it also amplifies the noise. Pinhole changes are therefore usually accompanied by PMT gain adjustments, in order to reach a balance between the signal intensity and the background noise. Narrowing the pinhole aperture reduces the volume contributing to the image, resulting in lower image intensity and the need for higher signal amplification. Conversely, widening the pinhole aperture yields a higher signal, and the PMT gain is lowered to avoid pixel saturation.
*Corresponding author.
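The pinhole/gain trade-off described above can be illustrated with a deliberately simplified model, sketched below in Python. It assumes that detected photon counts follow Poisson statistics, that the PMT gain acts as a pure linear multiplier (real PMTs also add excess noise and dark current, and the Zeiss gain units are not linear in this sense), and that the output is an 8-bit image. The photon counts, the target gray level and all function names are hypothetical.

```python
import math

# Simplified model (assumption): detected photon counts are Poisson
# distributed and the PMT gain is a pure linear multiplier. PMT excess
# noise, dark current and the real gain/voltage curve are ignored.


def mean_and_noise(photons: float, gain: float) -> tuple[float, float]:
    """Mean and standard deviation of the amplified detector signal."""
    mean = gain * photons
    # Poisson shot noise has std sqrt(photons); gain scales it like the signal.
    std = gain * math.sqrt(photons)
    return mean, std


def snr(photons: float, gain: float) -> float:
    """Signal-to-noise ratio; under this model it does not depend on gain."""
    mean, std = mean_and_noise(photons, gain)
    return mean / std


def gain_for_target(photons: float, target_level: float) -> float:
    """Gain that brings the mean signal to a target gray level."""
    return target_level / photons


def to_8bit(value: float) -> int:
    """Clip an amplified value to the 0..255 range of an 8-bit image."""
    return max(0, min(255, round(value)))


if __name__ == "__main__":
    # Hypothetical mean photon counts per pixel for a narrow and a wide pinhole.
    for label, photons in (("narrow", 25.0), ("wide", 100.0)):
        g = gain_for_target(photons, target_level=128.0)
        print(f"{label}: gain={g:.2f}, SNR={snr(photons, g):.1f}")
    # Keeping the narrow-pinhole gain after widening the pinhole saturates.
    print(to_8bit(gain_for_target(25.0, 128.0) * 100.0))
```

Under these assumptions, raising the gain rescales signal and shot noise together, so it cannot improve the SNR, which is set by the number of detected photons; it only maps the signal into the available gray-level range, and a gain chosen for a narrow pinhole saturates the image once the pinhole is widened.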
It was previously shown that the image modifications associated with pinhole aperture or PMT gain adjustments pose problems to gradient-based techniques designed for the detection and description of affine-invariant regions [6,13]. The experiment presented in this paper extends our previous investigations in this direction, showing how three common CSLM image enhancement methods alleviate or accentuate this problem. These three techniques are line averaging, spatial filtering (median filtering) and deconvolution.
2. Methods
2.1. Image Acquisition
The image set that we use was collected on a mouse kidney section labeled with Alexa Fluor 488 WGA (Invitrogen, Molecular Probes), using a Zeiss LSM 510 CSLM system. We imaged the same field of view under six combinations of pinhole aperture and PMT gain, obtained by decreasing the PMT gain concomitantly with increasing the pinhole aperture. The pinhole aperture was varied between 1 and 2 Airy Units (AU) in steps of 0.2 AU, while the PMT gain was varied between 450 and 400 Zeiss LSM 510 Units (ZU). For each of the six pinhole/PMT gain combinations, we imaged 20 optical sections of 450 µm × 450 µm, collected at 0.750 µm steps along the z axis using a 20×/0.8 NA objective. A larger pinhole aperture corresponds to a thicker optical section. The presented results were obtained using, as support, a reference image of the stack, automatically detected with the reference frame estimator introduced in [14].
For excitation we used the 488 nm line of an argon laser. The fluorescence signal was collected by passing the emitted light through a 530-595 nm band-pass filter. In Figure 1, we present the brightest image of the stack collected at the highest pinhole aperture/lowest PMT gain combination.
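The acquisition design can be checked with a few lines of arithmetic. The sketch below enumerates the pinhole settings stated in the text (1 to 2 AU in 0.2 AU steps, which gives six values), pairs them with PMT gains between 450 and 400 ZU, counts the unordered pairs of settings, and computes the axial extent of a 20-section stack at 0.750 µm spacing. The intermediate gain values (a linear 10 ZU decrement) are a hypothetical assumption, since only the end points are given, and so is forming pairs from one image per setting.

```python
from itertools import combinations

# Pinhole apertures stated in the text: 1 to 2 AU in 0.2 AU steps.
pinholes_au = [round(1.0 + 0.2 * i, 1) for i in range(6)]

# Assumption: only the 450 and 400 ZU end points are given; a linear
# 10 ZU decrement per step is a hypothetical choice for illustration.
gains_zu = [450 - 10 * i for i in range(6)]

settings = list(zip(pinholes_au, gains_zu))

# All unordered pairs of settings, ordered so that the first element has the
# larger pinhole (and therefore the lower gain), as in the matching protocol.
pairs = [(max(a, b), min(a, b)) for a, b in combinations(settings, 2)]

# 20 optical sections spaced 0.750 um apart along z.
n_sections, z_step_um = 20, 0.750
z_extent_um = (n_sections - 1) * z_step_um

print(settings)
print(len(pairs), z_extent_um)
```

With six settings there are 6 × 5 / 2 = 15 unordered pairs, and the stack spans 19 × 0.750 µm = 14.25 µm between its first and last sections.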
2.2. Descriptor Extraction
The SIFT keypoint descriptor is a histogram representation that combines local gradient orientations and magnitudes from a neighborhood around a keypoint. More precisely, the descriptor is a 3D histogram of gradient location and orientation, in which location is quantized into a 4 × 4 grid and the gradient angle is quantized into 8 orientations, 45° apart. The resulting descriptor is a normalized vector of 4 × 4 × 8 = 128 elements [11].
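To make the 4 × 4 × 8 layout concrete, the following is a simplified, SIFT-style descriptor in Python/NumPy. It is a sketch, not Lowe's implementation nor VLFeat's: it omits Gaussian weighting, interpolation between bins and orientation normalization, but it keeps the 4 × 4 spatial grid, the 8 orientation bins 45° apart, and the normalize, clamp at 0.2, renormalize step described in [11]. The function name and the assumption that the patch side equals four times the bin size are our own.

```python
import numpy as np


def sift_like_descriptor(patch: np.ndarray) -> np.ndarray:
    """Simplified SIFT-style descriptor of a square patch (sketch only).

    The patch is split into a 4 x 4 grid of cells; in each cell, gradient
    orientations are accumulated into 8 bins (45 degrees apart), weighted
    by gradient magnitude.
    """
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                  # d/d(row), d/d(column)
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    # The final % 8 guards against angle == 2*pi after floating-point rounding.
    orient_bin = np.floor(angle / (2 * np.pi / 8)).astype(int) % 8

    cell = patch.shape[0] // 4                   # cell side = bin size
    hist = np.zeros((4, 4, 8))
    for r in range(4 * cell):
        for c in range(4 * cell):
            hist[r // cell, c // cell, orient_bin[r, c]] += magnitude[r, c]

    desc = hist.ravel()                          # 4 * 4 * 8 = 128 elements
    norm = np.linalg.norm(desc)
    if norm > 0:
        # Normalize, clamp large entries at 0.2, then renormalize.
        desc = np.minimum(desc / norm, 0.2)
        desc /= np.linalg.norm(desc)
    return desc


if __name__ == "__main__":
    # Horizontal intensity ramp: every gradient points along +x (bin 0).
    ramp = np.tile(np.arange(16.0), (16, 1))
    print(sift_like_descriptor(ramp)[:8])
```

Under the assumption above, bin sizes of 4, 6 and 8 pixels would correspond to patches 16, 24 and 32 pixels wide. VLFeat's `vl_dsift` differs from this sketch in several implementation details, so their outputs will not match exactly.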
Figure 1. Confocal optical section of mouse kidney tissue collected at 1 AU pinhole aperture/450 ZU PMT gain.
The SIFT technique provides solutions for both keypoint detection and description. In this experiment we concentrate on the description capabilities of SIFT, extracting descriptors at fixed locations on a grid. For this purpose we employ the vl_dsift function of the VLFeat library [15] to calculate DSIFT descriptors at fixed grid locations, which, according to its authors, is “roughly equivalent to running SIFT on a dense grid of locations at a fixed scale and orientation”. We use a 10-pixel grid spacing, resulting in 10,404 features per image. The evaluated SIFT bin sizes are 4, 6 and 8 pixels.
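Dense sampling replaces keypoint detection with a regular grid. The hypothetical helper below lists grid locations for a given step; its `margin` parameter is our own assumption, standing in for the border that keeps each descriptor's support inside the image, and it changes the count. Note that 10,404 = 102 × 102, i.e., 102 grid positions along each axis. The pixel dimensions of the images are not stated in the text, so the 1024 × 1024 size and 6-pixel margin in the usage example are hypothetical values chosen only to illustrate that arithmetic.

```python
def dense_grid(width: int, height: int, step: int,
               margin: int = 0) -> list[tuple[int, int]]:
    """Return (x, y) locations on a regular grid with the given step.

    `margin` is a hypothetical parameter: it keeps locations at least that
    many pixels from the image border so that descriptor support fits.
    """
    xs = range(margin, width - margin, step)
    ys = range(margin, height - margin, step)
    return [(x, y) for y in ys for x in xs]


if __name__ == "__main__":
    # Hypothetical image size and margin, chosen only to illustrate 102 x 102.
    grid = dense_grid(1024, 1024, step=10, margin=6)
    print(len(grid))  # 102 positions per axis -> 10404 locations
```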
2.3. Evaluated Methods
Line averaging is a common CSLM acquisition method used to compensate for low SNR at the expense of increased photobleaching. It consists of scanning the same line a specified number of times before adding an averaged instance to the image and moving on to the next line. The averaged instance added to the image is the arithmetic mean of the pixel values over the specified number of scans. By averaging, persistent image content is preserved, while fluctuating image content (usually noise) is attenuated.
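The noise-reduction effect of line averaging can be sketched as follows, assuming additive, zero-mean Gaussian noise that is independent between scans. This is a simplification: real CSLM noise is largely Poisson, and photobleaching between scans is ignored. Under these assumptions, averaging N scans leaves the signal unchanged and divides the noise standard deviation by sqrt(N), so 4-times averaging roughly halves it. Function names and parameter values are hypothetical.

```python
import random
import statistics


def scan_line(true_line, noise_std, rng):
    """One noisy scan: true intensities plus independent Gaussian noise."""
    return [v + rng.gauss(0.0, noise_std) for v in true_line]


def line_average(true_line, n_scans, noise_std, rng):
    """Scan the same line n_scans times and return the per-pixel mean."""
    scans = [scan_line(true_line, noise_std, rng) for _ in range(n_scans)]
    return [sum(values) / n_scans for values in zip(*scans)]


if __name__ == "__main__":
    rng = random.Random(0)
    flat_line = [100.0] * 10_000  # uniform, hypothetical signal
    for n in (1, 4):
        averaged = line_average(flat_line, n, noise_std=10.0, rng=rng)
        # Expected residual noise is 10 / sqrt(n): about 10, then about 5.
        print(n, round(statistics.pstdev(averaged), 2))
```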
Median filtering is a common nonlinear digital filtering technique used to remove noise while preserving edges [16]. It considers each image pixel in turn, in relation to its surroundings, and replaces its value with the median of the pixel values lying in a specified neighborhood. If the neighborhood contains an even number of pixels, the average of the two middle values is used. For a given, fixed window size, median filtering is generally better than Gaussian blurring at removing noise (impulse noise in particular) while preserving edges.
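A minimal pure-Python median filter following the description above is sketched below. The function name and the border handling are our own choices: at the image border the window is truncated to the pixels inside the image, so it can contain an even number of values, in which case `statistics.median` returns the mean of the two middle values, matching the rule stated above. Library implementations may instead pad the border, which changes results near the edges.

```python
import statistics


def median_filter(image, size=3):
    """Median-filter a 2D list of numbers with a size x size window.

    Near the border, the window is truncated to the pixels that fall inside
    the image; for an even number of values, statistics.median returns the
    mean of the two middle values.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = statistics.median(window)
    return out


if __name__ == "__main__":
    # A bright isolated pixel (impulse noise) next to a vertical step edge.
    image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
    image[2][1] = 255
    for row in median_filter(image):
        print(row)
```

In this example the 3 × 3 filter removes the isolated bright pixel while the step edge between the third and fourth columns stays sharp, whereas a Gaussian blur of similar size would spread both.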
Deconvolution techniques are routinely used in microscopy to compensate for the unavoidable convolution of the optical signal generated by the sample with the Point Spread Function (PSF) of the system [17]. This process can be expressed mathematically as:
$$g = f \ast h \qquad (2)$$
where $\ast$ denotes convolution, g is the collected image, f is the real optical signal of the object, and h is the system's PSF. Deconvolution consists of solving Equation (2) for f, given g and h. To deconvolve the images, we used a Classic Maximum Likelihood Estimation (CMLE) method available in the Huygens Professional (SVI, Netherlands) software platform.
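The CMLE algorithm in Huygens Professional is proprietary, so we do not reproduce it. As a stand-in, the sketch below implements Richardson-Lucy deconvolution, a classic iterative maximum-likelihood estimator for Equation (2) under Poisson noise, in one dimension with NumPy. The 1D setting, the Gaussian PSF, the flat initial estimate and the iteration count are simplifying assumptions for illustration only; they are not the settings used in this work.

```python
import numpy as np


def gaussian_psf(size: int, sigma: float) -> np.ndarray:
    """Normalized, symmetric 1D Gaussian PSF with an odd number of taps."""
    x = np.arange(size) - size // 2
    h = np.exp(-0.5 * (x / sigma) ** 2)
    return h / h.sum()


def richardson_lucy_1d(g: np.ndarray, h: np.ndarray, iterations: int = 50,
                       eps: float = 1e-12) -> np.ndarray:
    """Estimate f from g = f * h (convolution) with Richardson-Lucy updates."""
    h = h / h.sum()
    h_mirror = h[::-1]                            # adjoint of the convolution
    f = np.full(g.shape, g.mean(), dtype=float)   # flat, positive start
    for _ in range(iterations):
        blurred = np.convolve(f, h, mode="same")  # predicted image
        ratio = g / np.maximum(blurred, eps)      # measured / predicted
        f = f * np.convolve(ratio, h_mirror, mode="same")
    return f


if __name__ == "__main__":
    f_true = np.ones(64)
    f_true[20] += 100.0
    f_true[40] += 50.0
    h = gaussian_psf(9, 1.5)
    g = np.convolve(f_true, h, mode="same")       # noise-free blurred signal
    f_hat = richardson_lucy_1d(g, h, iterations=100)
    print(round(float(g[20]), 1), round(float(f_hat[20]), 1))
```

The multiplicative update keeps the estimate non-negative. On real, noisy data the iterations are normally stopped early or regularized, because running to convergence amplifies noise.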
3. Results
We consider all pairs of images in the set. The first image of a pair is always the image collected at the higher pinhole aperture and lower PMT gain. Each descriptor extracted from the first image of the pair is matched against the descriptors extracted from the other image using a nearest-neighbor approach with the Euclidean distance. If the matched nearest neighbor is the descriptor extracted at the same x, y coordinates, we count a “true positive”; otherwise, a “false positive”. The performance of the nearest-neighbor matching is evaluated in terms of precision (Equation (1)). Since every descriptor is assigned a nearest neighbor, true positives plus false positives equal the number of descriptors per image, so precision here is the fraction of correctly matched grid locations:
$$\text{Precision} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}} \qquad (1)$$
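The matching protocol and Equation (1) can be sketched in NumPy as follows. We assume that the two descriptor arrays are row-aligned, i.e., row i of both arrays comes from the same (x, y) grid location, which is how a true positive is defined here. Distances are computed in chunks so that 10,404 × 10,404 comparisons do not require a full distance matrix in memory; the function name and chunk size are our own choices.

```python
import numpy as np


def nn_precision(desc_a: np.ndarray, desc_b: np.ndarray, chunk: int = 1024) -> float:
    """Precision of Euclidean nearest-neighbor matching on a dense grid.

    Assumes row i of desc_a and row i of desc_b were extracted at the same
    (x, y) location, so matching a[i] to b[i] is a true positive and matching
    it to any other row is a false positive.
    """
    b_sq = np.sum(desc_b ** 2, axis=1)
    true_pos = 0
    for start in range(0, len(desc_a), chunk):
        a = desc_a[start:start + chunk]
        # Squared Euclidean distances ||a||^2 - 2 a.b + ||b||^2; squaring does
        # not change which neighbor is nearest.
        d2 = np.sum(a ** 2, axis=1)[:, None] - 2.0 * a @ desc_b.T + b_sq[None, :]
        nearest = np.argmin(d2, axis=1)
        true_pos += int(np.sum(nearest == np.arange(start, start + len(a))))
    # Every descriptor receives a match, so TP + FP equals the number of rows.
    false_pos = len(desc_a) - true_pos
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first = rng.random((500, 128))                # hypothetical descriptors
    second = first + 0.05 * rng.standard_normal(first.shape)
    print(nn_precision(first, second))
```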
Table 1 shows the precision of the nearest-neighbor matching of DSIFT descriptors extracted from the image set collected without line averaging and without post-processing (“RAW”). Table 2 reports the precision of the three other evaluated image sets relative to the RAW set: the image set collected without line averaging and post-processed by median filtering with a 3 × 3 filter (“MF”); the image set collected without line averaging and deconvolved with the CMLE approach available in Huygens Professional (“DEC”); and the image set collected with 4-times line averaging (“AV4”).
For the RAW image set, we observed that precision increases with bin size. Median filtering provides a slight improvement, ranging from 4% to 7% depending on the bin size. The image set obtained after deconvolution shows a massive decrease in precision compared to the RAW image set. The decrease varies with bin size and is largest for the smallest bin size (4), where precision drops to 48% of the RAW value. For the image set collected with line averaging, we observe increased precision compared to the RAW image set. This increase is more pronounced for smaller bin sizes, reaching 15% for the smallest bin size. It should be noted that for this image set the increase comes at the cost of additional light exposure, since each line is scanned four times before being added to the image.

Table 1. Precision of nearest-neighbor matching for the image set collected without line averaging and not post-processed (“RAW”).
Image set | Bin size 4 | Bin size 6 | Bin size 8
RAW | 0.43 | 0.56 | 0.63

Table 2. Nearest-neighbor matching precision of image sets “MF”, “DEC” and “AV4”, expressed as a percentage of the precision of the “RAW” image set.
Image set | Bin size 4 | Bin size 6 | Bin size 8
MF | 104% | 106% | 107%
DEC | 48% | 56% | 61%
AV4 | 115% | 108% | 105%
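Because Table 2 expresses precision relative to RAW, approximate absolute precision values can be recovered by multiplying each percentage by the corresponding RAW value in Table 1. The short Python sketch below does this; since the table entries are rounded, the derived values are only approximate and are not separately reported results of this work.

```python
# Table 1: RAW precision for SIFT bin sizes 4, 6 and 8 pixels.
RAW_PRECISION = {4: 0.43, 6: 0.56, 8: 0.63}

# Table 2: precision of each image set as a percentage of RAW precision.
RELATIVE_PERCENT = {
    "MF": {4: 104, 6: 106, 8: 107},
    "DEC": {4: 48, 6: 56, 8: 61},
    "AV4": {4: 115, 6: 108, 8: 105},
}


def absolute_precision(raw, relative):
    """Approximate absolute precision = RAW precision x relative percentage."""
    return {
        name: {b: raw[b] * pct / 100 for b, pct in row.items()}
        for name, row in relative.items()
    }


if __name__ == "__main__":
    for name, row in absolute_precision(RAW_PRECISION, RELATIVE_PERCENT).items():
        print(name, {b: f"{p:.2f}" for b, p in row.items()})
```

For example, this gives roughly 0.49 (AV4) versus 0.45 (MF) at bin size 4, but roughly 0.66 (AV4) versus 0.67 (MF) at bin size 8, and about 0.21 for DEC at bin size 4.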
4. Conclusion
Image modifications associated with combined pinhole aperture/PMT gain changes raise problems for gradient-based local feature description. These problems can be alleviated or accentuated by specific CSLM image acquisition or post-processing methods. With the experiment presented in this paper, we take a first step toward identifying the methods that degrade feature description and those that could be used to increase the performance of gradient-based description techniques. We evaluated three techniques commonly used for CSLM image enhancement. We observed that median filtering and line averaging are associated with an increase in the precision of DSIFT descriptor-based matching, while deconvolution has a negative effect in this regard. We consider research efforts in this direction important, as a wide variety of biomedical computer vision applications rely on local feature description and matching, and their efficient optimization cannot be achieved without identifying the methods that should be avoided and those that should be exploited to enhance the results.
5. Acknowledgements
The presented work was supported by the UEFISCDI PN-II-PT-PCCA-2011-3.2-1162 Research Grant and the CRUS SCIEX NMS-CH Fellowship nr. 12.135. The corresponding author thanks Dr. Gábor Csúcs, Dr. Tobias Schwarz and Dr. Joachim Hehl of the Light Microscopy and Screening Center of ETH Zurich for their support and advice.
References
[1] L. J. Zhi, S. M. Zhang, D. Z. Zhao, H. Zhao, S. K. Lin, D. Z. Zhao and H. Zhao, “Medical Image Retrieval Using SIFT Feature,” Proceedings of the 2009 2nd International Congress on Image and Signal Processing, Vol. 1-9, 2009, pp. 2252-2255.
[2] G. Kordelas and P. Daras, “Viewpoint Independent Object Recognition in Cluttered Scenes Exploiting Ray-Triangle Intersection and SIFT Algorithms,” Pattern Recognition, Vol. 43, 2010, pp. 3833-3845.
[3] M. Brown and S. Susstrunk, “Multi-Spectral SIFT for Scene Category Recognition,” 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 177-184.
[4] J. Matas, O. Chum, M. Urban and T. Pajdla, “Robust
Wide-Baseline Stereo from Maximally Stable Extremal
Regions,” Image and Vision Computing, Vol. 22, 2004,
pp. 761-767.
[5] M. Brown and D. G. Lowe, “Automatic Panoramic Image Stitching Using Invariant Features,” International Journal of Computer Vision, Vol. 74, 2007, pp. 59-73.
[6] S. G. Stanciu, R. Hristu and G. A. Stanciu, “Influence of Confocal Scanning Laser Microscopy Specific Acquisition Parameters on the Detection and Matching of Speeded-Up Robust Features,” Ultramicroscopy, Vol. 111, 2011, pp. 364-374.
[7] P. Piccinini, A. Prati and R. Cucchiara, “Real-Time Object Detection and Localization with SIFT-Based Clustering,” Image and Vision Computing, Vol. 30, 2012, pp.
[8] M. Dawood, C. Cappelle, M. E. El Najjar, M. Khalil and D. Pomorski, “Harris, SIFT and SURF Features Comparison for Vehicle Localization Based on Virtual 3D Model and Camera,” 2012 3rd International Conference on Image Processing Theory, Tools and Applications, 2012, pp. 307-312.
[9] J. C. Caicedo, A. Cruz and F. A. Gonzalez, “Histopathology Image Classification Using Bag of Features and Kernel Functions,” Artificial Intelligence in Medicine, Proceedings, Vol. 5651, 2009, pp. 126-135.
[10] T. Tamaki, J. Yoshimuta, M. Kawakami, B. Raytchev, K. Kaneda, S. Yoshida, Y. Takemura, K. Onji, R. Miyaki and S. Tanaka, “Computer-Aided Colorectal Tumor Classification in NBI Endoscopy Using Local Features,” Medical Image Analysis, Vol. 17, 2013, pp. 78-100.
[11] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, Vol. 60, 2004, pp. 91-110.
[12] J. B. Pawley, “Handbook of Biological Confocal Microscopy,” Springer, New York, 2006.
[13] S. G. Stanciu, R. Hristu, R. Boriga and G. A. Stanciu, “On the Suitability of SIFT Technique to Deal with Image Modifications Specific to Confocal Scanning Laser Microscopy,” Microscopy and Microanalysis, Vol. 16, 2010, pp. 515-530.
[14] S. G. Stanciu, G. A. Stanciu and D. Coltuc, “Automated Compensation of Light Attenuation in Confocal Microscopy by Exact Histogram Specification,” Microscopy Research and Technique, Vol. 73, 2010, pp. 165-175.
[15] A. Vedaldi and B. Fulkerson, “VLFeat: An open and
Portable Library of Computer Vision Algorithms,” 2008.
[16] R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” Addison-Wesley Longman Publishing Co., Inc., Boston, 2001.
[17] W. Wallace, L. H. Schaefer and J. R. Swedlow, “A Workingperson’s Guide to Deconvolution in Light Microscopy,” Biotechniques, Vol. 31, 2001, p. 1076.