Journal of Sensor Technology
Vol.2 No.1(2012), Article ID:17934,15 pages DOI:10.4236/jst.2012.21005

Multidimensional Median Filters for Finding Bumps in Chemical Sensor Datasets

Jeffrey C. Miecznikowski1, Kimberly F. Sellers2, William F. Eddy3

1Department of Biostatistics, Roswell Park Cancer Institute, SUNY University at Buffalo, Buffalo, USA

2Department of Mathematics and Statistics, Georgetown University, Washington DC, USA

3Department of Statistics, Carnegie Mellon University, Pittsburgh, USA

Email: jcm38@buffalo.edu

Received October 14, 2011; revised November 14, 2011; accepted December 6, 2011

Keywords: Bump Hunting; Image Analysis; Spatial Smoothing; Feature Detection; Mathematical Morphology

ABSTRACT

Feature detection in chemical sensor images falls under the general topic of mathematical morphology, where the goal is to detect “image objects”, e.g. peaks or spots in an image. Here, we propose a novel method for object detection that can be generalized to k-dimensional objects obtained from analogous higher-dimensional technologies. Our method is based on the smoothing decomposition Data = Smooth + Rough, where the “rough” (i.e. residual) object from a k-dimensional cross-shaped smoother provides information for object detection. We demonstrate properties of this procedure with chemical sensor applications from various biological fields, including genetic and proteomic data analysis.

1. Introduction

Numerous chemical sensor platforms and technologies require image analysis techniques to isolate the signal from the associated noise in the sensor. In a one-dimensional chemical sensor setting, for example, several technologies produce spectra where scientists can gain information from associated peaks, or grayscale images where the features appear as streaks or lines. Meanwhile, in a two-dimensional setting, associated technologies produce images whose features are spots. Such image analyses usually involve methods where the goal is to identify and quantify the size of an image feature or object, i.e. feature detection and quantification.

Feature detection in multi-dimensional images is an area of great interest in a variety of applications, ranging from astronomy to proteomics [1-7]. Proposed methods employ image segmentation techniques such as watershed methods, thresholding operators, and wavelet reconstruction methods to locate the features contained in a one-dimensional or two-dimensional image. Feature detection in larger, high-dimensional datasets is also a growing area of research; see, for example, [8, 9]. However, the proposed algorithms and methods usually apply only to the application and technology of interest and may not carry over to images of other forms or dimensionality.

Determining the locations and boundaries of chemical sensor features has been studied by computer scientists and engineers (under the guise of image analysis), as well as by mathematicians and statisticians (via mathematical morphology). Mathematical morphology (MM) is the science of analyzing and processing geometric structures (e.g. local maxima) in digital images [10-15]. Common MM operations include opening, closing, thinning, binning, thresholding, and watershed methods; they have been employed in numerous applications including pedestrian detection [16], tumor mass detection [17], and facial feature detection [18,19]. A key component of MM is the choice of structuring element, i.e. the shape used to interrogate the image; its two main descriptive characteristics are its shape and size. In digital images, the structuring element scans the image and alters the pixels in its window using basic operators similar to Minkowski addition. Since the goal is commonly to smooth images by removing statistical noise, the usual practice is to choose a window that is (hyper-)cubical or (hyper-)spherical. Because our goal is feature detection rather than data smoothing, we instead propose an MM technique with a “cross” shaped structuring element, used in conjunction with residual analysis, to aid bump finding in chemical sensor images. We have found that, by choosing the window to be (hyper-)crossical (i.e. shaped like a multi-dimensional cross), the resulting residual image also contains crosses whose centers identify the locations of local maxima.
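To make the “crossical” window concrete, the sketch below builds a k-dimensional cross as a boolean mask with NumPy, alongside the (hyper-)cubical box it replaces. It assumes the cross has arms of c pixels along each of the k axes, so it contains 2kc + 1 pixels; the helper names `cross_footprint` and `box_footprint` are illustrative and do not come from the paper.

```python
import numpy as np


def cross_footprint(k: int, c: int) -> np.ndarray:
    """Boolean k-dimensional cross with arm length c (hypothetical helper).

    The mask spans 2c + 1 cells along every axis; a cell belongs to the
    cross when its offset from the center is nonzero along at most one axis.
    """
    if k < 1 or c < 0:
        raise ValueError("need k >= 1 and c >= 0")
    offsets = np.indices((2 * c + 1,) * k) - c      # shape (k, 2c+1, ..., 2c+1)
    n_nonzero = np.count_nonzero(offsets, axis=0)   # axes on which each cell is off-center
    return n_nonzero <= 1


def box_footprint(k: int, c: int) -> np.ndarray:
    """The conventional (hyper-)cubical window of the same extent, for comparison."""
    return np.ones((2 * c + 1,) * k, dtype=bool)


if __name__ == "__main__":
    print(cross_footprint(2, 2).astype(int))
    # A cross has 2kc + 1 cells; the box has (2c + 1)^k cells.
    print(cross_footprint(3, 4).sum(), box_footprint(3, 4).sum())  # 25 729
```

The cross grows linearly in k while the box grows exponentially, which is one practical reason a cross-shaped window stays cheap in higher dimensions.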

This paper combines aspects of feature detection, data smoothing, and residual analysis to develop a new bump detection method not only for one- or two-dimensional images, but for k-dimensional images for any k. The method is straightforward, and because it applies to images of any dimension, it provides researchers with a detection and quantification method for any chemical sensor technology whose features of interest are bumps.

2. Theoretical Model

In our method, we develop a specialized median smoother (referred to hereafter as the s-median smoother), which computes the median of the intensity values that lie spatially within the cross-shaped structuring element. Consider a k-dimensional (kD) image represented by […] grid or “box” shaped window sequence is used. Here, we instead obtain a residual image that looks like a starburst rather than a cross. As a result, the spot center is potentially more difficult to identify. The shape of the smoothing window (cross versus box) and the summary statistic used (median versus mean) thus affect the residual (R) image and the ability to detect the mountains in an image.
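The smoothing decomposition Data = Smooth + Rough with an s-median smoother can be sketched using `scipy.ndimage.median_filter`, which accepts an arbitrary boolean `footprint`. This is a minimal illustration under our own assumptions (arm length c, reflective boundary handling, a synthetic isotropic Gaussian spot); the helper names are hypothetical and this is not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import median_filter


def cross_footprint(k: int, c: int) -> np.ndarray:
    """k-dimensional cross with arm length c (center included)."""
    offsets = np.indices((2 * c + 1,) * k) - c
    return np.count_nonzero(offsets, axis=0) <= 1


def s_median_residual(image, c: int, window: str = "cross") -> np.ndarray:
    """Rough = Data - Smooth, with Smooth a median over a cross or box window."""
    image = np.asarray(image, dtype=float)
    if window == "cross":
        footprint = cross_footprint(image.ndim, c)
    elif window == "box":
        footprint = np.ones((2 * c + 1,) * image.ndim, dtype=bool)
    else:
        raise ValueError(f"unknown window: {window!r}")
    smooth = median_filter(image, footprint=footprint, mode="reflect")
    return image - smooth


# Usage: an isotropic Gaussian spot (sd = 4 pixels) centred in a 41 x 41 grid.
x, y = np.mgrid[-20:21, -20:21]
spot = np.exp(-(x ** 2 + y ** 2) / (2 * 4.0 ** 2))
rough = s_median_residual(spot, c=4)
rough_box = s_median_residual(spot, c=4, window="box")
```

For this spot the cross residual is positive at the peak and along both axes through it, and exactly zero at a diagonal offset such as (3, 3), because there the cross contains as many higher as lower values. That is the cross pattern described in the text; `rough_box` gives the box-window comparison.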

Rotation invariance is an important property of the mathematical morphology operators used in image detection. Rotation invariance implies that the resultant image does not change when arbitrary rotations are applied to its input argument. In general, our spot finding method is rotation invariant for Gaussian spots with zero correlation (e.g., spots of the type shown in Figure 8). Interestingly, if we induce any nonzero correlation in the spot, the spot finding method is no longer rotation invariant. Figure 16(a) displays a bivariate normal density with a correlation of 0.50 between the two variables, and Figure 16(b) is the residual image from our proposed method. Meanwhile, Figure 16(c) is the result of using the structuring element from Figure 16(b) rotated by 45 degrees. Similarly, Figure 16(d) is Figure 16(a) rotated by 90 degrees, with the corresponding residual images shown in Figures 16(e) and (f). Our proposed spot finding method is not rotation invariant, since the images in Figures 16(b) and 16(e) are clearly different. Although our method is not rotation invariant, it is possible to rotate our structuring element (cross) to align with the major and minor axes of a correlated spot, as in Figures 16(c) and (f). Both of these residual images clearly show a cross shape and are useful for locating the spots in the image. Future work will further explore the characteristics of the cross in each residual image in order to detect spots in correlated images. Note, however, that in our biological applications (e.g. 2D-DIGE), it is reasonable to assume that there is negligible correlation within a spot. For example, in a DIGE image, the spots are created by electrophoresis in two dimensions, where the electrophoresis for each dimension is performed separately. Similarly, in pin-based microarray images, it is reasonable to assume that there is negligible correlation within a given spot.
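One way to realise a rotated structuring element in two dimensions is an “X”-shaped window made of the two diagonals. This is a hedged sketch: the paper does not state how the 45-degree rotation was discretised, so the diagonal mask, the helper names, and the synthetic spot (unit variances, correlation 0.5, grid spacing 0.25) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter


def upright_cross_footprint(c: int) -> np.ndarray:
    """2-D cross aligned with the image axes, arm length c."""
    i, j = np.indices((2 * c + 1, 2 * c + 1)) - c
    return (i == 0) | (j == 0)


def diagonal_cross_footprint(c: int) -> np.ndarray:
    """The same cross rotated by 45 degrees: an 'X' along the two diagonals."""
    i, j = np.indices((2 * c + 1, 2 * c + 1)) - c
    return np.abs(i) == np.abs(j)


def residual(image, footprint) -> np.ndarray:
    """Rough = Data - median of Data over the given window."""
    image = np.asarray(image, dtype=float)
    return image - median_filter(image, footprint=footprint, mode="reflect")


# Unnormalised bivariate normal spot with unit variances and correlation 0.5,
# sampled on a 41 x 41 grid with spacing 0.25.
rho = 0.5
x, y = np.mgrid[-20:21, -20:21] / 4.0
spot = np.exp(-(x ** 2 - 2 * rho * x * y + y ** 2) / (2 * (1 - rho ** 2)))

rough_upright = residual(spot, upright_cross_footprint(4))  # window on the image axes
rough_diag = residual(spot, diagonal_cross_footprint(4))    # window on the spot's axes
```

With a positive correlation the spot's major axis runs along the diagonal x = y, so the X-shaped window in `rough_diag` is aligned with the spot's major and minor axes, in the spirit of Figure 16(c).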

When using the s-median operator for spot finding, the major consideration is the arm length, c, of the smoothing window, or equivalently the number of pixels included in the smoothing window (structuring element). The s-median smoother removes noise from the image, so the size of the smoothing window essentially determines the amount of smoothing applied to the dataset. As Figures 11 and 17 illustrate, the choice of c is critical: choosing c too large will oversmooth the image and blend spots together, while choosing c too small will undersmooth the image and cause spurious spots due to noise to appear as real spots. Since choosing c is essentially choosing a smoothing parameter, several available methods can be considered for choosing an optimal value of c. A general approach to choosing smoothing parameters is cross-validation, as described in [22].
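As one concrete instance of choosing c by cross-validation, the sketch below scores each candidate arm length by leave-one-out prediction error: every pixel is predicted by the median of its cross arms with the pixel itself excluded, and the c with the smallest mean squared error is kept. This particular scheme, the helper names, and the simulated two-bump image are our assumptions; the paper does not specify a cross-validation procedure.

```python
import numpy as np
from scipy.ndimage import median_filter


def arm_footprint(k: int, c: int) -> np.ndarray:
    """k-dimensional cross with arm length c, center pixel excluded (hypothetical helper)."""
    if c < 1:
        raise ValueError("arm length c must be at least 1")
    offsets = np.indices((2 * c + 1,) * k) - c
    return np.count_nonzero(offsets, axis=0) == 1


def loo_cv_score(image, c: int) -> float:
    """Mean squared leave-one-out prediction error of the cross median smoother."""
    image = np.asarray(image, dtype=float)
    # Each pixel is predicted from its 2kc arm pixels only, never from itself.
    # With an even window size, scipy returns the upper of the two middle values.
    pred = median_filter(image, footprint=arm_footprint(image.ndim, c), mode="reflect")
    return float(np.mean((image - pred) ** 2))


def choose_c(image, candidates) -> tuple:
    """Return the candidate arm length with the smallest score, plus all scores."""
    scores = {c: loo_cv_score(image, c) for c in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = np.mgrid[0:64, 0:64]
    bumps = (np.exp(-((x - 28) ** 2 + (y - 32) ** 2) / 18.0)
             + np.exp(-((x - 36) ** 2 + (y - 32) ** 2) / 18.0))
    noisy = bumps + rng.normal(scale=0.05, size=bumps.shape)
    best_c, scores = choose_c(noisy, candidates=range(1, 9))
    print(best_c, {c: round(s, 5) for c, s in scores.items()})
```

Leaving the center pixel out is what makes this a genuine cross-validation score: with the center included, a tiny window would reproduce noisy pixels too easily and favour undersmoothing.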

The optimal choice of c is an instance of the broader statistical bias-variance tradeoff. Choosing c too small leads to a highly variable residual image (too many spurious spots), while choosing c too large leads to a residual image with a large bias (missing small spots). Similarly, the optimal choice of c is related to several other problems in statistics, such as the optimal choice of bandwidth in kernel density estimation [38] and the number of times to smooth a dataset [39]. Various strategies that estimate error quantities (risk) can determine “optimal” smoothing strategies, while other procedures determine smoothing parameters by examining displays such as mode trees [40] or estimates of the mean integrated squared error [41]. To improve the performance of our MM operator in the presence of noise, we have explored applying standard image smoothing techniques to the image prior to applying the MM filter. Future work will examine the utility of applying such “pre-smoothers” to images before applying MM operators. In addition to examining pre-smoothers, we will also examine data-driven cross-validation schemes for choosing an optimal value of c for specific image applications. In the same way that we use pre-smoothers to smooth the image prior to analysis, we will also explore smoothing the resulting residual image.
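A pre-smoothing step of the kind mentioned above can be sketched by applying a Gaussian filter before computing the s-median residual. The Gaussian pre-smoother, its width `sigma`, and the helper names are assumptions chosen for illustration; the text does not say which standard smoothing techniques were explored.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter


def cross_footprint(k: int, c: int) -> np.ndarray:
    """k-dimensional cross with arm length c (center included)."""
    offsets = np.indices((2 * c + 1,) * k) - c
    return np.count_nonzero(offsets, axis=0) <= 1


def presmoothed_residual(image, c: int, sigma: float = 1.0) -> np.ndarray:
    """Gaussian pre-smoother, then Rough = Pre - s-median(Pre) over a cross window."""
    pre = gaussian_filter(np.asarray(image, dtype=float), sigma=sigma, mode="reflect")
    smooth = median_filter(pre, footprint=cross_footprint(pre.ndim, c), mode="reflect")
    return pre - smooth


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x, y = np.mgrid[-20:21, -20:21]
    spot = np.exp(-(x ** 2 + y ** 2) / 32.0)
    noisy = spot + rng.normal(scale=0.05, size=spot.shape)
    raw = noisy - median_filter(noisy, footprint=cross_footprint(2, 4), mode="reflect")
    pre = presmoothed_residual(noisy, c=4, sigma=1.0)
    # Compare how much each residual fluctuates in a corner far from the spot.
    print(np.std(raw[:8, :8]), np.std(pre[:8, :8]))
```

Both `sigma` and c now act as smoothing parameters, so a pre-smoother adds a second tuning choice rather than removing the need to choose c.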

A major concern in proposing image analysis algorithms is comparing them with competing methods. Unfortunately, due to the cost of these technologies and the lack of a gold standard for measuring the signal of the chemical sensor, it is difficult to design statistically appropriate benchmarks or quality control studies to assess these image analysis techniques for a given chemical sensor. Although it is relatively simple to simulate “bumps” or mountains in an image, the difficulty lies in deciding what type of noise to impose on the simulated images. Under most noise distributions, the success of our proposed method will depend on the choice of smoothing parameter, c. A thorough comparison of competing spot finding algorithms across a set of noise distributions is outside the scope of this manuscript. For future work, we propose comparisons such as those in [42, 43] to establish conditions, in simulated and real datasets, under which our methods outperform competing methods. The main goal of this manuscript is to establish a new method for spot finding in images and to demonstrate its performance on a variety of biological images derived from chemical sensors.
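The sketch below shows how simple the bump part of such a simulation is, and isolates the noise model as the free choice the text warns about. The simulator name, its parameters, and the two noise options (Gaussian, and a Student-t with 3 degrees of freedom as a heavy-tailed alternative) are illustrative assumptions, not a benchmark design from the paper.

```python
import numpy as np


def simulate_bumps(shape, centers, sigma=2.0, noise="gaussian", scale=0.05, seed=None):
    """Hypothetical simulator: unit-height isotropic Gaussian bumps plus additive noise.

    Works for any number of dimensions: `shape` and every entry of `centers`
    must have the same length k.
    """
    rng = np.random.default_rng(seed)
    grid = np.indices(shape)                       # one coordinate array per axis
    image = np.zeros(shape, dtype=float)
    for center in centers:
        sq_dist = sum((g - m) ** 2 for g, m in zip(grid, center))
        image += np.exp(-sq_dist / (2.0 * sigma ** 2))
    if noise == "gaussian":
        image += rng.normal(scale=scale, size=shape)
    elif noise == "student_t":
        # Heavy-tailed alternative: scaled Student-t with 3 degrees of freedom.
        image += scale * rng.standard_t(3, size=shape)
    elif noise != "none":
        raise ValueError(f"unknown noise model: {noise!r}")
    return image


if __name__ == "__main__":
    clean = simulate_bumps((64, 64), [(28, 32), (36, 32)], sigma=3.0, noise="none")
    heavy = simulate_bumps((64, 64), [(28, 32), (36, 32)], sigma=3.0,
                           noise="student_t", seed=1)
    print(clean.max(), np.abs(heavy - clean).max())
```

Swapping the noise model while holding the bumps fixed is what a benchmark of the kind proposed in [42, 43] would need to vary.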

Figure 16. Rotational invariance: (a) A scaled bivariate normal density with a correlation of 0.50; (b) The resulting residual image using an R2,4 operator; (c) The resulting residual image when the structuring element in the residual operator used in (b) is rotated 45 degrees to align with the major axis of the spot in (a); (d) A 90 degree rotation of the spot in (a), i.e. a spot with a correlation of –0.50; (e) The resulting residual image using an R2,4 operator; (f) The resulting residual image when the structuring element used in (e) is rotated 45 degrees.

Figure 17. Two nearby mountains: (a) Perspective plot showing two relatively close mountains; (b) The R2,5 operator image associated with the image in (a). The two crosses indicate the presence of two relative maxima in the image; (c) The R2,27 image obtained from the image in (a). In this situation, the two mountains are “blurred” into one cross.

4. Conclusion

This manuscript develops a new method for spot finding and illustrates its utility and applicability on several chemical sensor datasets, including mass spectrometry spectra, gel electrophoresis images, and microarray images. The method extends easily to mountains in k dimensions and can be extended further to quantify the amount of signal present in other emerging chemical sensors whose features have Gaussian profiles.

5. Acknowledgements

The authors are grateful to the Roswell Park Cancer Institute Proteomics laboratory and the Minden laboratory at Carnegie Mellon University for generously providing their data to illustrate our method. We also thank the reviewers of this manuscript for their valuable feedback and insights.

REFERENCES

  1. D. Agard, R. Steinberg, and R. Stroud, “Quantitative Analysis of Electrophoretograms: A Mathematical Approach to Super-Resolution,” Analytical Biochemistry, Vol. 111, No. 2, 1981, pp. 257-268. doi:10.1016/0003-2697(81)90562-5
  2. E. Bertin and S. Arnouts, “SExtractor: Software for Source Extraction,” Astronomy and Astrophysics, Vol. 14, No. 4, 1996.
  3. T. Lindeberg, “Feature Detection with Automatic Scale Selection,” International Journal of Computer Vision, Vol. 30, No. 2, 1998, pp. 79-116. doi:10.1023/A:1008045108935
  4. K. Coombes, H. Fritsche Jr., C. Clarke, J. Chen, K. Baggerly, J. Morris, L. Xiao, M. Hung and H. Kuerer, “Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization,” Clinical Chemistry, Vol. 49, No. 10, 2003, pp. 1615-1623. doi:10.1373/49.10.1615
  5. P. Cutler, G. Heald, I. R. White and J. Ruan, “A Novel Approach to Spot Detection for Two-Dimensional Gel Electrophoresis Images Using Pixel Value Collection,” Proteomics, Vol. 3, No. 4, 2003, pp. 392-401. doi:10.1002/pmic.200390054
  6. D. S. Lalush, “Effects of Spot and Background Defects on Quantitative Data from Spotted Microarrays,” Proceedings of the 25th Annual International Conference of the IEEE, Vol. 4, 2003, pp. 3563-3566.
  7. K. Coombes, S. Tsavachidis, J. Morris, K. Baggerly, M. Hung and H. Kuerer, “Improved Peak Detection and Quantification of Mass Spectrometry Data Acquired from Surface-Enhanced Laser Desorption and Ionization by Denoising Spectra with the Undecimated Discrete Wavelet Transform,” Proteomics, Vol. 5, No. 16, 2005, pp. 4107-4117. doi:10.1002/pmic.200401261
  8. A. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, 1997, pp. 153-158. doi:10.1109/34.574797
  9. B. Guo, R. Damper, S. Gunn and J. Nelson, “A Fast Separability-Based Feature-Selection Method for High-Dimensional Remotely Sensed Image Classification,” Pattern Recognition, Vol. 41, No. 5, 2008, pp. 1653-1662. doi:10.1016/j.patcog.2007.11.007
  10. J. Serra, “Image Analysis and Mathematical Morphology 1982,” Academic Press, New York, 1986, pp. 370-382.
  11. P. Maragos, “Tutorial on Advances in Morphological Image Processing and Analysis,” Optical Engineering, Vol. 26, No. 7, 1987, pp. 623-632.
  12. P. Maragos and R. Schafer, “Morphological Filters Part I: Their Set-Theoretic Analysis and Relations to Linear Shift-Invariant Filters,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 35, No. 8, 1987, pp. 1153-1169.
  13. P. Maragos and R. Schafer, “Morphological Filters Part II: Their Relations to Median, Order-Statistic, and Stack Filters,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 35, No. 8, 1987, pp. 1170-1184. doi:10.1109/TASSP.1987.1165254
  14. P. Maragos, R. Schafer and M. Butt, “Mathematical Morphology and Its Applications to Image and Signal Processing,” Springer, New York, 1996. doi:10.1007/978-1-4613-0469-2
  15. P. Soille, “Morphological Image Analysis: Principles and Applications,” Springer-Verlag, New York, 2003.
  16. D. Gavrila and J. Giebel, “Shape-Based Pedestrian Detection and Tracking,” IEEE Intelligent Vehicle Symposium, Vol. 1, 2002, pp. 8-14.
  17. L. Tarassenko, P. Hayton, N. Cerneaz and M. Brady, “Novelty Detection for the Identification of Masses in Mammograms,” Fourth International Conference on Artificial Neural Networks, Cambridge, 26-28 June 1995, pp. 442-447.
  18. E. Saber and A. Murat Tekalp, “Frontal-View Face Detection and Facial Feature Extraction Using Color, Shape and Symmetry Based Cost Functions,” Pattern Recognition Letters, Vol. 19, No. 8, 1998, pp. 669-680. doi:10.1016/S0167-8655(98)00044-0
  19. Y. Wang, C. Chua and Y. Ho, “Facial Feature Detection and Face Recognition from 2D and 3D Images,” Pattern Recognition Letters, Vol. 23, No. 10, 2002, pp. 1191-1202. doi:10.1016/S0167-8655(02)00066-1
  20. E. Lehmann and G. Casella, “Theory of Point Estimation,” Springer, New York, 1998.
  21. T. Apostol and I. Makai, “Mathematical Analysis,” Addison-Wesley, Reading, 1974.
  22. L. Wasserman, “All of Statistics: A Concise Course in Statistical Inference,” Springer, New York, 2004.
  23. G. Casella and R. L. Berger, “Statistical Inference,” Duxbury Press, Belmont, 1990.
  24. J. Miecznikowski, K. Sellers and W. Eddy, “Multidimensional Median Filters for Finding Bumps,” Technical Report 907, SUNY University at Buffalo, Buffalo, 2009.
  25. M. Karas, U. Bahr, A. Ingendoh, E. Nordhoff, B. Stahl, K. Strupat and F. Hillenkamp, “Principles and Applications of Matrix-Assisted UV-Laser Desorption/Ionization Mass Spectrometry,” Analytica Chimica Acta, Vol. 241, No. 2, 1990, pp. 175-185. doi:10.1016/S0003-2670(00)83645-4
  26. K. Sellers, J. Miecznikowski, S. Viswanathan, J. Minden and W. Eddy, “Lights, Camera, Action: Quantitative Analysis of Systematic Variation in Two-Dimensional Difference Gel Electrophoresis,” Electrophoresis, Vol. 28, No. 18, 2007, pp. 3324-3332. doi:10.1002/elps.200600793
  27. J. Miecznikowski, S. Damodaran, K. Sellers, D. Coling, R. Salvi and R. Rabin, “A Comparison of Imputation Procedures and Statistical Tests for the Analysis of Two-Dimensional Electrophoresis Data,” Proteome Science, Vol. 9, No. 14, 2011, p. 66.
  28. J. Mergliano and J. Minden, “Caspase-Independent Cell Engulfment Mirrors Cell Death Pattern in Drosophila Embryos,” Development, Vol. 130, No. 23, 2003, pp. 5779-5789. doi:10.1242/dev.00824
  29. L. Gong, M. Puri, M. Unlu, M. Young, K. Robertson, S. Viswanathan, A. Krishnaswamy, S. Dowd and J. Minden, “Drosophila Ventral Furrow Morphogenesis: A Proteomic Analysis,” Development, Vol. 131, No. 3, 2004, pp. 643-656. doi:10.1242/dev.00955
  30. J. Miecznikowski, D. Wang, S. Liu, L. Sucheston and D. Gold, “Comparative Survival Analysis of Breast Cancer Microarray Studies Identifies Important Prognostic Genetic Pathways,” BMC Cancer, Vol. 10, No. 1, 2010, p. 573. doi:10.1186/1471-2407-10-573
  31. M. Schena, D. Shalon, R. Davis and P. Brown, “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray,” Science, Vol. 270, No. 5235, 1995, pp. 467-470. doi:10.1126/science.270.5235.467
  32. R. Gentleman, “Bioinformatics and Computational Biology Solutions Using R and Bioconductor,” Springer, New York, 2005. doi:10.1007/0-387-29362-0
  33. C. Schröder, A. Jacob, S. Tonack, T. Radon, M. Sill, M. Zucknick, S. Rüffer, E. Costello, J. Neoptolemos, T. Crnogorac-Jurcevic, et al., “Dual-Color Proteomic Profiling of Complex Samples with a Microarray of 810 Cancer-Related Antibodies,” Molecular & Cellular Proteomics, Vol. 9, No. 6, 2010, pp. 1271-1280. doi:10.1074/mcp.M900419-MCP200
  34. M. Eisen, P. Spellman, P. Brown and D. Botstein, “Cluster Analysis and Display of Genomewide Expression Patterns,” Proceedings of the National Academy of Sciences, Vol. 95, No. 25, 1998, pp. 14863-14868. doi:10.1073/pnas.95.25.14863
  35. A. Alizadeh, M. Eisen, R. Davis, C. Ma, I. Lossos, A. Rosenwald, J. Boldrick, H. Sabet, T. Tran, X. Yu, et al., “Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling,” Nature, Vol. 403, No. 6769, 2000, pp. 503-511. doi:10.1038/35000501
  36. J. Khan, J. Wei, M. Ringnér, L. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. Antonescu, C. Peterson, et al., “Classification and Diagnostic Prediction of Cancers Using Gene Expression Profiling and Artificial Neural Networks,” Nature Medicine, Vol. 7, No. 6, 2001, pp. 673-679. doi:10.1038/89044
  37. G. Fink, P. Spellman, G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein and B. Futcher, “Comprehensive Identification of Cell Cycle-Regulated Genes of the Yeast Saccharomyces Cerevisiae by Microarray Hybridization,” Molecular Biology of the Cell, Vol. 9, No. 12, 1998, pp. 3273-3297.
  38. B. Silverman, “Density Estimation for Statistics and Data Analysis,” Chapman & Hall/CRC, 1986.
  39. J. Tukey, “Exploratory Data Analysis,” Addison-Wesley, New York, 1977.
  40. M. Minnotte and D. Scott, “The Mode Tree: A Tool for Visualization of Nonparametric Density Features,” Journal of Computational and Graphical Statistics, Vol. 2, No. 1, 1993, pp. 51-68. doi:10.2307/1390955
  41. J. Miecznikowski, D. Wang and A. Hutson, “Bootstrap MISE Estimators to Obtain Bandwidth for Kernel Density Estimation,” Communications in Statistics Simulation and Computation, Vol. 39, No. 7, 2010, pp. 1455-1469. doi:10.1080/03610918.2010.500108
  42. Y. Kang, T. Techanukul, A. Mantalaris and J. Nagy, “Comparison of Three Commercially Available DIGE Analysis Software Packages: Minimal User Intervention in Gel-Based Proteomics,” Journal of Proteome Research, Vol. 8, No. 2, 2009, pp. 1077-1084. doi:10.1021/pr800588f
  43. Y. Chao, H. Zengyou and Y. Weichuan, “Comparison of Public Peak Detection Algorithms for MALDI Mass Spectrometry Data Analysis,” BMC Bioinformatics, Vol. 10, No. 4, 2009.