where |·| refers to the cardinality of a set, f(k, l) is the intensity at pixel position (k, l) in the image of order …, and the order of matrix D is ….
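A minimal sketch of how such a co-occurrence matrix can be counted, assuming an integer-valued image with gray levels 0 … levels−1 and a displacement (dk, dl) that encodes d and θ. The function name `cooccurrence` is hypothetical. Pairs are counted in one direction only (an assumption); Haralick's original symmetric variant would use `D + D.T`.

```python
import numpy as np

def cooccurrence(img, dk, dl, levels):
    """Count pixel pairs (k, l) and (k + dk, l + dl) with intensities (I1, I2).

    img    : 2-D integer array with values in [0, levels).
    dk, dl : row/column offset encoding distance d and orientation theta.
    Returns a (levels x levels) matrix of pair counts (not normalized).
    """
    img = np.asarray(img)
    rows, cols = img.shape
    D = np.zeros((levels, levels), dtype=np.int64)
    for k in range(rows):
        for l in range(cols):
            m, n = k + dk, l + dl
            # Only count pairs whose neighbor lies inside the image.
            if 0 <= m < rows and 0 <= n < cols:
                D[img[k, l], img[m, n]] += 1
    return D
```

For example, `cooccurrence(img, 0, 1, levels)` counts horizontal neighbors at distance 1 (θ = 0°).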
Using the co-occurrence matrix, features can be defined that quantify coarseness, smoothness and texture-related information with high discriminatory power.
Among them, Angular Second Moment (ASM), Contrast, Correlation, Homogeneity and Entropy are a few such measures.
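For reference, commonly used (Haralick-style) definitions of these five measures are sketched below. They assume that P(I1, I2) is the co-occurrence matrix normalized to sum to one, that μ1, μ2 and σ1, σ2 are the means and standard deviations of its row and column marginals, and that 0 log 0 = 0; the exact variants used in the original experiments (for example, the homogeneity denominator) may differ.

```latex
\begin{aligned}
\text{ASM} &= \sum_{I_1}\sum_{I_2} P(I_1, I_2)^2 \\
\text{Contrast} &= \sum_{I_1}\sum_{I_2} (I_1 - I_2)^2 \, P(I_1, I_2) \\
\text{Correlation} &= \sum_{I_1}\sum_{I_2} \frac{(I_1 - \mu_1)(I_2 - \mu_2)\, P(I_1, I_2)}{\sigma_1 \sigma_2} \\
\text{Homogeneity} &= \sum_{I_1}\sum_{I_2} \frac{P(I_1, I_2)}{1 + |I_1 - I_2|} \\
\text{Entropy} &= -\sum_{I_1}\sum_{I_2} P(I_1, I_2) \log P(I_1, I_2)
\end{aligned}
\qquad
\begin{aligned}
\mu_1 &= \sum_{I_1}\sum_{I_2} I_1 \, P(I_1, I_2), \quad
\sigma_1^2 = \sum_{I_1}\sum_{I_2} (I_1 - \mu_1)^2 \, P(I_1, I_2)
\end{aligned}
```

μ2 and σ2 are defined in the same way with I2 in place of I1.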
ASM measures the smoothness (uniformity) of the image: the less smooth the region, the more uniformly distributed P(I1, I2) is and the lower the value of ASM. Contrast is a measure of local intensity variations and takes high values for images of high contrast. Correlation measures the linear dependence between the gray levels of pixel pairs at the given distance and orientation. Homogeneity takes high values for low-contrast images. Entropy is a measure of randomness and takes low values for smooth images. Together, these features provide high discriminative power for distinguishing two different kinds of images.
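The five measures can be computed from a co-occurrence matrix as sketched below. This assumes the Haralick-style definitions with homogeneity using 1 / (1 + |I1 − I2|), which is one of several variants in use; the function name `haralick_features` is hypothetical.

```python
import numpy as np

def haralick_features(D):
    """ASM, contrast, correlation, homogeneity and entropy of a co-occurrence matrix.

    D : square array of pair counts (or probabilities); it is normalized here
        so that P(I1, I2) sums to one.
    """
    P = np.asarray(D, dtype=float)
    P = P / P.sum()
    i1, i2 = np.indices(P.shape)            # gray levels I1 (rows) and I2 (columns)
    mu1, mu2 = (i1 * P).sum(), (i2 * P).sum()
    sd1 = np.sqrt(((i1 - mu1) ** 2 * P).sum())
    sd2 = np.sqrt(((i2 - mu2) ** 2 * P).sum())
    cov = ((i1 - mu1) * (i2 - mu2) * P).sum()
    # Correlation is undefined when a marginal has zero variance (constant region).
    corr = cov / (sd1 * sd2) if sd1 > 0 and sd2 > 0 else float("nan")
    nz = P > 0                              # skip empty cells: 0 * log 0 is taken as 0
    return {
        "asm": (P ** 2).sum(),
        "contrast": ((i1 - i2) ** 2 * P).sum(),
        "correlation": corr,
        "homogeneity": (P / (1.0 + np.abs(i1 - i2))).sum(),
        "entropy": -(P[nz] * np.log(P[nz])).sum(),
    }
```

A diagonal matrix (all pairs have equal intensities) gives zero contrast, homogeneity 1 and correlation 1, while a uniform matrix gives the lowest possible ASM for its size, in line with the descriptions above.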
All second-order features are functions of the distance d and the orientation θ. Thus, if an image is rotated, the values of the features change. In practice, for each d, the values obtained for the four directions (0°, 45°, 90° and 135°) are averaged, which yields features that are approximately rotation invariant.
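The averaging step can be sketched as follows, assuming the usual offsets for θ = 0°, 45°, 90° and 135° (neighbor to the right, upper right, above and upper left) and using contrast as the example measure; all helper names are hypothetical. A 90° rotation of the image only permutes the four directions (reversing pair order for some of them, which contrast ignores), so the averaged value is unchanged, whereas a single-direction value is not.

```python
import numpy as np

def offsets(d):
    """(row, column) offsets for theta = 0, 45, 90 and 135 degrees at distance d."""
    return {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}

def glcm(img, dk, dl, levels):
    """Normalized co-occurrence matrix for one (dk, dl) offset."""
    img = np.asarray(img)
    rows, cols = img.shape
    # Reference pixels whose neighbor (k + dk, l + dl) stays inside the image.
    k0, k1 = max(0, -dk), rows - max(0, dk)
    l0, l1 = max(0, -dl), cols - max(0, dl)
    ref = img[k0:k1, l0:l1]
    nbr = img[k0 + dk:k1 + dk, l0 + dl:l1 + dl]
    D = np.zeros((levels, levels))
    np.add.at(D, (ref.ravel(), nbr.ravel()), 1)   # accumulate pair counts
    return D / D.sum()

def contrast(P):
    i1, i2 = np.indices(P.shape)
    return ((i1 - i2) ** 2 * P).sum()

def direction_averaged(img, d, levels, feature=contrast):
    """Average one second-order feature over the four orientations."""
    vals = [feature(glcm(img, dk, dl, levels)) for dk, dl in offsets(d).values()]
    return float(np.mean(vals))
```

For an image of vertical stripes, the 0° contrast is high and the 90° contrast is zero; after `np.rot90` these two swap, but `direction_averaged` returns the same value for both images.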
4. Experimental Setup and Results
In this section, we investigate different combinations of feature extraction methods and classifiers for the classification of two types of MRI images, i.e. normal images and Alzheimer images. The feature extraction methods under investigation are: features based on first and second order statistics (FSStat), features using Daubechies-4 (Db4) as described by Chaplot et al., and Haar in combination with PCA (HaarPCA) as described by Dahshan et al. We explore the classifiers used by Chaplot et al. (SVM with linear (SVM-L), polynomial (SVM-P) and radial (SVM-R) kernels), those used by Dahshan et al. (K-nearest neighbor (KNN) and the Levenberg-Marquardt Neural Classifier (LMNC)), and C4.5. The polynomial kernel of SVM is used with degrees 2, 3, 4 and 5, and the best results obtained in terms of accuracy are reported. Similarly, the radial kernel (SVM-R) is used with parameter values $10^i$, where $i = 0, \dots, 6$, and only the results corresponding to the highest accuracy are reported. Descriptions of LMNC and of the remaining classifiers can be found in the respective references.
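The kernels and parameter grids described above can be sketched as follows. The kernel forms, and in particular treating $10^i$ ($i = 0, \dots, 6$) as the width σ of the radial kernel, are assumptions rather than the exact settings of the original experiments; `best_setting` and its `evaluate` callback are hypothetical stand-ins for training and testing one configuration.

```python
import numpy as np

# Kernel functions for the three SVM variants (common forms, assumed here).
def linear_kernel(X, Y):
    return X @ Y.T

def polynomial_kernel(X, Y, degree):
    return (X @ Y.T + 1.0) ** degree

def radial_kernel(X, Y, sigma):
    # Squared Euclidean distance between every row of X and every row of Y.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

# Parameter grids from the text: polynomial degrees 2-5, radial parameter 10**i.
POLY_DEGREES = [2, 3, 4, 5]
RBF_PARAMS = [10.0 ** i for i in range(7)]

def best_setting(evaluate, grid):
    """Return (parameter, accuracy) for the grid value with the highest accuracy.

    `evaluate` is a hypothetical callback that trains and tests one setting
    and returns its accuracy.
    """
    scores = [(p, evaluate(p)) for p in grid]
    return max(scores, key=lambda s: s[1])
```

Reporting only the best grid point, as done here, selects the parameter on the same runs used for evaluation, so the reported accuracies are somewhat optimistic for the SVM-P and SVM-R rows.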
Textural features of an image are represented in terms of four first-order statistics (mean, variance, skewness, kurtosis) and five second-order statistics (angular second moment, contrast, correlation, homogeneity, entropy). Since the second-order statistics are functions of the distance d and the orientation θ, the mean and the range of the values obtained for the four directions are calculated for each second-order measure. Thus, the number of features extracted using first and second order statistics is 14 (4 first-order features plus 5 × 2 second-order features).
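The 14-element feature vector can be assembled as sketched below. This assumes population (biased) moments, non-excess kurtosis (3 for a Gaussian), and that the five second-order measures have already been computed for each of the four directions; the function names are hypothetical.

```python
import numpy as np

def first_order_stats(img):
    """Mean, variance, skewness and kurtosis of the pixel intensities.

    Skewness and kurtosis are undefined for a constant image (zero variance).
    """
    x = np.asarray(img, dtype=float).ravel()
    mean = x.mean()
    var = x.var()                                  # population variance
    sd = np.sqrt(var)
    skew = ((x - mean) ** 3).mean() / sd ** 3
    kurt = ((x - mean) ** 4).mean() / sd ** 4      # non-excess kurtosis
    return np.array([mean, var, skew, kurt])

def fsstat_vector(img, second_order):
    """Concatenate 4 first-order features with the mean and range of 5 second-order ones.

    second_order : array of shape (4, 5), one row per direction
                   (0, 45, 90, 135 degrees) and one column per measure
                   (ASM, contrast, correlation, homogeneity, entropy).
    """
    S = np.asarray(second_order, dtype=float)
    if S.shape != (4, 5):
        raise ValueError("expected a 4 x 5 array of second-order measures")
    mean_over_dirs = S.mean(axis=0)                    # 5 values
    range_over_dirs = S.max(axis=0) - S.min(axis=0)    # 5 values
    return np.concatenate([first_order_stats(img), mean_over_dirs, range_over_dirs])
```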
To evaluate the performance, we have considered medical images from the Harvard Medical School website. All normal and diseased (Alzheimer) MRI images are axial, T2-weighted and of size 256 × 256. For our study, we have considered a total of 60 trans-axial image slices (30 from normal brains and 30 from brains affected by Alzheimer's disease). The research works [7-10] have found that the rate of volume loss over a certain period of time within the medial temporal lobe is a potential diagnostic marker of Alzheimer's disease. Moreover, the lateral ventricles are on average larger in patients with Alzheimer's disease. Hence, only those axial sections of the brain in which the lateral ventricles are clearly visible are included in our dataset. As the temporal lobe and the lateral ventricles are closely spaced, our axial samples also sufficiently cover the hippocampus and temporal lobe area, which can serve as good markers to distinguish the two types of images. Figure 2 shows the difference in the lateral ventricle region between a normal and an abnormal (Alzheimer) image.
In the literature, various performance measures have been suggested to evaluate learning models. The most popular among them are: 1) sensitivity, 2) specificity and 3) accuracy.
Sensitivity (true positive fraction, or recall) is the proportion of actual positives that are predicted positive. Mathematically, it can be defined as
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity (true negative fraction) is the proportion of actual negatives that are predicted negative. It can be defined as
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Figure 2. Pyramidal structure of DWT up to level 3.
Accuracy is the probability of correctly identifying individuals, i.e. the proportion of true results, either true positive or true negative. It is computed as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of correctly classified positive cases, TN the number of correctly classified negative cases, FP the number of negative cases incorrectly classified as positive, and FN the number of positive cases incorrectly classified as negative.
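These definitions translate directly into code. A minimal sketch, assuming the Alzheimer class is encoded as the positive label `1`; the function name `performance` is hypothetical.

```python
def performance(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # negatives called positive
    fn = sum(t == positive and p != positive for t, p in pairs)  # positives called negative
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```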
In general, sensitivity indicates how well a model identifies positive cases and specificity measures how well it identifies negative cases, whereas accuracy is expected to measure how well it identifies both categories. Thus, if both sensitivity and specificity are high (low), accuracy will be high (low). However, if one of them is high and the other is low, accuracy will be biased towards the measure associated with the majority class. Hence, accuracy alone is not a good performance measure. It is observed that both Chaplot et al. and Dahshan et al. used highly imbalanced data, whose classification accuracy was therefore highly biased towards one class. Hence, we have constructed a balanced dataset (with samples of both classes in the same proportion) so that the classification accuracy is not biased. Two other performance measures used are the training and testing times of the learning model.
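The bias can be made explicit by writing the number of positive cases as $n_+ = TP + FN$ and the number of negative cases as $n_- = TN + FP$, so that $TP = \text{Sensitivity}\cdot n_+$ and $TN = \text{Specificity}\cdot n_-$. Accuracy is then a prevalence-weighted average of sensitivity and specificity; the numbers in the example are hypothetical.

```latex
\text{Accuracy}
  = \frac{TP + TN}{n_+ + n_-}
  = \frac{n_+}{n_+ + n_-}\,\text{Sensitivity} + \frac{n_-}{n_+ + n_-}\,\text{Specificity}

% Imbalanced example: n_+ = 10, n_- = 90, Sensitivity = 0.2, Specificity = 1.0
\text{Accuracy} = 0.1 \times 0.2 + 0.9 \times 1.0 = 0.92

% Balanced data (n_+ = n_-): both measures get equal weight
\text{Accuracy} = \tfrac{1}{2}\,(0.2 + 1.0) = 0.6
```

With balanced classes, a low sensitivity can no longer be masked by a high specificity, which is the motivation for the balanced dataset used here.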
The dataset was randomly divided into a training set of 12 samples and a test set of 48 samples. The experiment was performed 100 times for each setting, and the average sensitivity, specificity, accuracy, training time and testing time are reported in Table 1. The best results achieved for each classifier under each performance measure are shown in bold. All experiments were carried out on a Pentium 4 machine with 1.5 GB RAM and a 1.5 GHz processor. The programs were developed in MATLAB Version 7 using a combination of the Image Processing Toolbox, the Wavelet Toolbox and PRTools, and were run under Windows XP.
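The evaluation protocol (random 12/48 split, repeated 100 times, metrics averaged) can be sketched as follows. This is an illustration under assumptions, not the authors' code: a 1-nearest-neighbor classifier stands in for the classifiers of Table 1, the split is not stratified, label 1 is the positive class, and the function names are hypothetical.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbor prediction with squared Euclidean distance."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def repeated_holdout(X, y, n_train=12, n_runs=100, seed=0):
    """Average [sensitivity, specificity, accuracy] over random train/test splits.

    Each run draws n_train samples at random for training and tests on the rest.
    Assumes the test set always contains both classes.
    """
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_runs):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        pred = one_nn_predict(X[tr], y[tr], X[te])
        t = y[te]
        tp = np.sum((t == 1) & (pred == 1))
        fn = np.sum((t == 1) & (pred == 0))
        tn = np.sum((t == 0) & (pred == 0))
        fp = np.sum((t == 0) & (pred == 1))
        results.append([tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(t)])
    return np.mean(results, axis=0)
```

For example, `repeated_holdout(X, y)` with a 60 × 14 feature matrix `X` and 0/1 labels `y` returns the averaged sensitivity, specificity and accuracy; timing each call to the classifier would give the training and testing times.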
Table 1. Comparison of performance measure values for each combination of feature extraction technique and classifier.
We can observe the following from Table 1:
1) The classification accuracy with FSStat is significantly higher than with both Db4 and HaarPCA for all classifiers.
2) A similar trend is observed for sensitivity.
3) For specificity, FSStat provides better results than both Db4 and HaarPCA, except with the classifiers SVM-P and LMNC.
4) The difference between sensitivity and specificity is large for both Db4 and HaarPCA in comparison to FSStat. With Db4 and HaarPCA, the accuracy remains relatively high even though the sensitivity is low, because the specificity is high; this suggests that the accuracy obtained is biased.
5) The variation in classification accuracy across classifiers is smaller with FSStat than with both Db4 and HaarPCA.
6) The training time with FSStat is significantly less than with both Db4 and HaarPCA. This is because FSStat yields fewer features and does not involve any computationally intensive transformation such as the PCA in HaarPCA.
7) The testing time per image is negligible in comparison to the training time. However, it is the lowest with FSStat in comparison to both Db4 and HaarPCA.
From the above, it can be observed that the decision system using FSStat performs significantly better in terms of all the measures considered in our experiment.
5. Conclusions and Future Work
In this paper, we investigated features based on First and Second Order Statistics (FSStat), which yield far fewer distinguishable features than those extracted using the DWT, for the classification of MRI images.
Since the classification accuracy of a pattern recognition system depends not only on the feature extraction method but also on the choice of classifier, we investigated the performance of FSStat-based features, in comparison to wavelet-based features, with commonly used classifiers for the classification of MRI brain images. The performance is evaluated in terms of sensitivity, specificity, classification accuracy, and training and testing time.
For all classifiers, the classification accuracy and sensitivity are significantly higher with the textural features than with both wavelet-based feature extraction techniques suggested in the literature. Moreover, FSStat features are found not to be biased towards either sensitivity or specificity. Their training and testing times are also significantly less than those of the other feature extraction techniques. This is because first and second order statistics give far fewer, yet relevant and distinguishable, features and do not involve any computationally intensive transformation, in contrast to the methods proposed in the literature.
In future work, the performance of the proposed approach can be evaluated on MRI images of other diseases to assess its efficacy. We can also explore feature extraction/construction techniques that provide a minimal number of invariant and relevant features to distinguish two or more different kinds of MRI images.