p of interval selection for a unique peak around 1660 cm−1.
Although the highest value of each peak usually occurs at a single frequency point, this frequency was not exactly the same for all the signals within a class. For example, one signal might have ten points between the frequencies of 500 and 504 cm−1,
Figure 5. Raman Spectroscopy for different cancer tissue samples: (a) Breast; (b) Kidney; (c) Testis; and (d) Normal breast tissue.
Figure 6. Flow chart of the normalization process.
Figure 7. Raman Spectroscopy results (a) before and (b) after the normalization.
while another had just four. Therefore a small interval, rather than a single point, was selected to identify each unique peak. For example, if the frequency 1660 cm−1 was selected as a unique peak, the points between 1660 − δ and 1660 + δ had to be selected and studied. The study of each peak was based on the average of the signal’s magnitude over a frequency range of 2δ (identified as PKAvgVal), centered at the preselected unique peak’s frequency. The selection of δ was also critical. On one hand, it had to be large enough to include more than two points for better accuracy; on the other hand, it could not be so large that the range became wider than the peak, which would decrease the overall value of the peak. The value selected in this work was smaller than or equal to the width at half magnitude of the smallest unique peak (2.5 frequency units).
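The interval averaging described above can be sketched as follows. This is a minimal illustration, not the software used in this work: the function name pk_avg_val, the default delta of 2.5 (taken from the "2.5 frequency units" mentioned above), and the two synthetic signals are all assumptions made for the example.

```python
import numpy as np

def pk_avg_val(freqs, magnitudes, center, delta=2.5):
    """Mean magnitude of the samples with center - delta <= freq <= center + delta.

    `delta` defaults to 2.5 as an assumption; the name `pk_avg_val` is hypothetical.
    """
    freqs = np.asarray(freqs, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    mask = np.abs(freqs - center) <= delta
    if not mask.any():
        raise ValueError("no samples fall inside the selected interval")
    return float(magnitudes[mask].mean())

# Two synthetic signals (not measured data) sampled at different
# frequency points around the same peak at 1660 cm^-1.
freqs_a = np.array([1656.0, 1658.0, 1659.0, 1660.0, 1661.0, 1662.0, 1664.0])
mags_a = np.array([0.2, 0.8, 0.9, 1.0, 0.9, 0.8, 0.2])
freqs_b = np.array([1655.0, 1658.5, 1660.5, 1665.0])
mags_b = np.array([0.1, 0.85, 0.95, 0.1])

avg_a = pk_avg_val(freqs_a, mags_a, 1660.0)  # averages five points
avg_b = pk_avg_val(freqs_b, mags_b, 1660.0)  # averages two points
```

Although the two signals have different numbers of points inside the interval, both yield a comparable average, which is the reason for using an interval instead of a single frequency point.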
In the next step, each signal was converted to an n-dimensional point, where n is the number of unique peaks. The frequency of each peak was taken as a dimension, and the signal’s PKAvgVal at that peak was taken as the coordinate of the corresponding point along that dimension. As an example, in the last part of Figure 8, if the frequency of 1660 cm−1 is selected as a unique peak, it represents one of the dimensions in the n-dimensional space of peaks. It is clear from that figure that the PKAvgVal of the Quartz signal in this dimension would be around zero, while the PKAvgVal of Kidney Tumor would be around 1. The data were classified by studying the location of the training classes’ points in the n-dimensional space. Figure 9 shows an example of such points when n equals 3.
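The conversion of a signal to an n-dimensional point might look like the sketch below. It is an illustration under stated assumptions: the function name peak_features, the interval half-width of 2.5, and the synthetic spectrum are hypothetical, and only the peak frequencies 465, 853, and 1003 cm−1 come from this paper.

```python
import numpy as np

def peak_features(freqs, magnitudes, peak_freqs, delta=2.5):
    """Return one coordinate per unique peak: the mean magnitude in
    [peak - delta, peak + delta]. Name and default delta are assumptions."""
    freqs = np.asarray(freqs, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    point = []
    for center in peak_freqs:
        mask = np.abs(freqs - center) <= delta
        if not mask.any():
            raise ValueError(f"no samples within {delta} of {center}")
        point.append(magnitudes[mask].mean())
    return np.array(point)

# Synthetic normalized spectrum (not measured data): zero baseline with
# a flat bump of height 0.5 near 853 and of height 1.0 near 1003.
freqs = np.arange(400.0, 1800.0, 1.0)
signal = np.zeros_like(freqs)
signal[np.abs(freqs - 853.0) <= 5.0] = 0.5
signal[np.abs(freqs - 1003.0) <= 5.0] = 1.0

# n = 3 unique peaks, so the signal becomes a point in 3-D space.
point = peak_features(freqs, signal, [465.0, 853.0, 1003.0])
```

Each training signal mapped this way becomes one point of its class, as in the n = 3 case of Figure 9.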
In this work, two sets of peaks were chosen, one for detecting tumors over quartz and the other for distinguishing breast cancer tumors from the other studied types. The first set included the frequencies 465, 853, 1003, and 1657 cm−1, and the second set included the frequencies 420.3, 640.5, and 1778.5 cm−1. It should be noted that if a selected unique peak does not show enough difference between classes, including it might increase the classification error: such a peak brings the points of the classes closer to each other in that dimension and reduces the overall distances relative to the Parzen window length.
The Parzen Window Classifier was used to classify each testing signal by converting it to an n-dimensional point (Figure 10). To show the accuracy of this method, the Parzen window was applied to every single signal of either the Quartz or the Tumor class. Three hypercube edge lengths (h) of 2, 1, and 0.5 were selected for the Parzen window. Table 1 shows the average difference between the calculated PDF of the signals with respect to the right and wrong classes for correctly detected signals. Although this table shows that a smaller h yields a larger difference in the probabilities, decreasing h may not be desirable. Suppose there is a point that should be classified in class B. If this point is close to some points of class A, decreasing h will make it more certain that the point is classified in class A, which is wrong.
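A Parzen window estimate with a hypercube kernel can be sketched as below. This is a generic textbook form, not necessarily the exact implementation used in this work: the function names, the assumption of equal class priors, and the toy training points are hypothetical. The density at x is the number of training points of a class that fall inside a hypercube of edge h centered at x, divided by N·h^n.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Hypercube Parzen estimate: count of samples inside the cube of
    edge h centered at x, divided by (N * h**n)."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    n_samples, n_dims = samples.shape
    inside = np.all(np.abs(samples - x) <= h / 2.0, axis=1)
    return inside.sum() / (n_samples * h ** n_dims)

def parzen_classify(x, training, h):
    """Pick the class with the larger estimated density (equal priors
    assumed). On a tie, the first class in `training` is returned."""
    densities = {label: parzen_density(x, pts, h)
                 for label, pts in training.items()}
    return max(densities, key=densities.get), densities

# Toy 3-D training points (not measured data), one array per class.
training = {
    "Quartz": np.array([[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.05, 0.05, 0.0]]),
    "Tumor": np.array([[0.9, 1.0, 0.8], [1.0, 0.9, 0.9], [0.8, 0.8, 1.0]]),
}

label, densities = parzen_classify([0.9, 0.9, 0.9], training, h=0.5)
_, densities_wide = parzen_classify([0.9, 0.9, 0.9], training, h=2.0)
```

For this toy data, h = 0.5 separates the classes clearly, while h = 2 places every training point of both classes inside the hypercube, so the two densities coincide. This mirrors the trade-off in the choice of h discussed above.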
To show how much adding a “unique peak” could affect the overall result, two cases were studied, one including a peak at 465 cm−1 and one excluding it (Table 2). The increase in tumor detection accuracy when this peak is included indicates that the tumor signals have PKAvgVal values that are very close to one another and different from the quartz signal PKAvgVal in this dimension. Likewise, the decrease in Quartz detection accuracy indicates that the Quartz signals are spread out in this dimension and some are closer to the Tumor than the
Figure 8. An example of selecting and modifying a range to magnify a peak at 1660 cm−1 as a unique peak for tumor. (a) shows first selection of the range; (b) shows eliminating some parts of sharp change before frequency of 1700 cm−1; (c) shows eliminating the neighbor points with values higher than the unique peak.
Figure 9. An example of converting the signals to n-dimensional points when n = 3.
Figure 10. Detection of tumor by converting the data to three- and two-dimensional spaces made of unique peaks at (a) 465 cm−1; (b) 852 cm−1; and (c) 1003 cm−1.
Table 1. Effect of the value of hypercube edge length on the average difference of PDF of testing signal with respect to the right and wrong classes.
Table 2. Comparison between the results received for spaces including and excluding the peak located at frequency of 465 cm−1.
The above data present the software results for the various types of cancer tissues. Although the objective of this work is to diagnose whether the tissue is cancerous or not, these graphs show the software’s capability to accurately detect various cancerous tissues.
The above algorithm was also used with three frequencies that distinguish between the Quartz and Tumor classes, including breast cancer, and with the data of each group. The testing data are given in Figures 10 and 11, which show the effect of studying the patterns at various frequency shifts.
5. MINIMUM SAMPLE SIZE DETECTION
A test was performed on a number of cancer tissues to find the minimum size of cancer tissue that can be detected with the laser beam of the Raman spectrometer. In this test, tissue a few microns in size, which is smaller than the laser beam diameter, was tested. The tissue was placed on the quartz tube and the test was run for more than an hour because the beam power was low. As can be seen from Figure 12, the results for three different samples are reported and clear distinctions in the Raman shifts are observed. The same tissue with smaller diameters is reported in Figure 13. No clear shifts could be distinguished from those of the quartz tube; subtracting the quartz carrier signal from the sample signal leaves only noise. In conclusion, Raman spectroscopy is capable of detecting samples larger than the laser beam diameter. Typical beam sizes are on the order of microns, which is an advantage over mammography in that respect.
6. CONCLUSIONS AND FUTURE WORK
In this work, a procedure was developed to classify Raman Spectroscopy signals for the diagnosis of cancer tissue. As indicated in Figure 5 above, the cancerous tissues have Raman spectra that differ from that of normal human tissue, which makes this approach to diagnosis feasible. The raw Raman Spectroscopy data was transformed to a 3-D visible form that has the ability to distinguish various types of cancer tissues. This approach was realized by normalizing each signal, converting it to
Figure 11. Detection of tumor by converting the data to three- and two-dimensional spaces made of unique peaks at (a) 420.3 cm−1; (b) 640.5 cm−1; and (c) 1778.5 cm−1.
Figure 12. Sample 134492: Cancer tissue samples of larger size than the laser beam.
Figure 13. Same samples as in Figure 12, but of smaller size than the laser beam. The red oval represents the approximate shape and position of the laser over the sample. Subtraction of the quartz background reveals no significant signals.
an n-dimensional point, and classifying it with the Parzen-Window method. The peak point selection for each interval class affects the choice of study intervals and the value of each point in each dimension. Therefore this procedure is very data dependent and, like every other classifier, could be improved by increasing the training data set. Since the available data sets were few, there is room for improvement in the selection of peaks. However, the process could also be used with a larger number of training data. Based on the available data, it was observed that kidney and breast tumor signals each have their own unique peaks. The minimum size detected via Raman Spectroscopy shows promise for early cancer diagnosis, before the cancer spreads in the human body. Additional investigation with more tissue samples, and verification for various cancer tissues, is needed. The laser power is also important to consider in further study. The work presented here would be valuable to many cancer researchers, including those who develop equipment for in vivo diagnosis.
The team of researchers assembled here will next pursue an experimental model that resembles parts of the human body. With that, a research scheme will be developed to distinguish a single scattered signal among others. For instance, in the case of breast cancer detection, a model that combines cancer tissue, bone, fat cells, etc., should be investigated. The completion of this phase will be followed by a servo mechanism system for the practical realization of the investigation. Such an in vivo approach is reserved for future consideration.
The authors appreciate the assistance of the Indiana University Simon Cancer Center and its support in providing the cancer tissue samples used for this research. The authors also thank Mr. Josh Reid for his time given in the Raman Spectroscopy Laboratories. Special thanks go to Dr. Eliza Du for her guidance in the pattern recognition software, and Dr. Paul Salama for his assistance with the manuscript.
Sample of tissue information provided by the IU Med School.