Convolutional neural networks (CNNs) have become very popular in large-scale data processing, and many works have demonstrated that CNNs are a promising tool in many fields, e.g., image classification and image retrieval. In theory, CNN features improve as the number of layers increases. However, more layers dramatically increase the computational cost on the same hardware. Beyond the CNN features themselves, how to mine the latent information they contain is also important. In this paper, we propose a novel approach that uses a deep CNN to extract image features and then applies the Regularized Locality Preserving Indexing (RLPI) method, which makes the features more discriminative by learning a new space from the original data space. First, we apply a deep network (VGG-net) to extract image features and then use RLPI to train a model. Finally, the new feature space generated by this model is used for image retrieval.
In traditional content-based image retrieval (CBIR) systems, low-level features such as color, shape and texture are usually extracted to construct a feature vector that describes each image. Images are then retrieved by comparing, under a suitable similarity measure, the feature vector of the query image with those of the images in the data set. Generally, there are three key issues in CBIR systems: (1) selecting an appropriate feature extraction method, (2) extracting appropriate image features and (3) matching features with an effective method. Many researchers devote most of their attention to the first issue. However, they usually fail to extract the internal structure contained in the features, which is crucial for distinguishing data points. In this paper, we aim to find this internal structure in the original data space. Moreover, convolutional neural networks have developed rapidly since 2012, when Krizhevsky et al. won the ImageNet classification challenge using a CNN [
Indeed, how to mine the latent information contained in these features is another critical issue. We believe that similar features share a certain internal structural link. Our main purpose is therefore to uncover this link, and RLPI is a good choice for this task [
As mentioned before, we use a deep CNN to extract abstract features from images. In our work, we use the VGG-net model, which comprises five network configurations, for image retrieval. These five configurations are labeled with the letters A to E. The width of the convolution layers starts at 64 in the first layer and doubles after each max-pooling layer until it reaches 512, where it remains. In addition to the convolution layers, there are five max-pooling layers. Although VGG-net contains five configurations, their convolution and pooling layers share the same parameter settings. This ensures that the output shape of each convolution group is consistent, no matter how many convolution layers the group contains.
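The shape consistency described above can be checked with a little arithmetic. The following is a minimal sketch, assuming the settings of the original VGG design, which are not stated in the text: 3 × 3 convolutions with stride 1 and padding 1, and 2 × 2 max-pooling with stride 2. The function names and the per-group layer counts for configurations A and D are illustrative.

```python
# Assumed settings (from the original VGG design, not stated in the text):
# 3x3 convolutions with stride 1 and padding 1; 2x2 max-pooling with stride 2.

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after one max-pooling layer."""
    return (size - kernel) // stride + 1

def vgg_group_shapes(input_size=224, convs_per_group=(2, 2, 3, 3, 3)):
    """Return (channels, height, width) after each of the five conv groups."""
    shapes = []
    size, width = input_size, 64
    for n_convs in convs_per_group:
        for _ in range(n_convs):
            size = conv_out(size)        # padding keeps the spatial size unchanged
        size = pool_out(size)            # pooling halves the spatial size
        shapes.append((width, size, size))
        width = min(width * 2, 512)      # width doubles after each pool, capped at 512
    return shapes

# Hypothetical per-group layer counts for configurations A and D.
shapes_a = vgg_group_shapes(convs_per_group=(1, 1, 2, 2, 2))
shapes_d = vgg_group_shapes(convs_per_group=(2, 2, 3, 3, 3))
print(shapes_d)
```

Under these assumptions both configurations produce the same sequence of group output shapes, because each convolution preserves the spatial size and only the pooling layers change it.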
Many studies demonstrate that deeper networks can achieve better performance. However, training deeper networks not only dramatically increases the computational requirements but also demands more capable hardware. In our work, we use the VGG-net model to extract image features. To run this network with moderate computing requirements, each image is rescaled to the same size of 224 × 224, and the network, with the FC-layer removed, then represents it as a 4096-dimensional vector.
LPI was proposed to uncover the discriminative inner structure of the document space and to extract the most discriminative features hidden in the data space. Given a set of data points and a similarity matrix, LPI can be obtained by solving the following minimization problem:
where
where
As the objective function will generate a heavy penalty if neighboring data points
Thus, the minimization problem in Equation (1) can be changed to the following problem:
and the optimal
However, according to [
The following theorem can be used to solve the eigen-problem in Equation (5) efficiently:
Let
with eigenvalue
Based on this theorem, the direct computation of the eigen-problem in Equation (5) can be avoided and the LPI basis function can be acquired through the following two steps:
1) Solve the eigen-problem in Equation (6) to get
2) Find
where
where
where
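The two-step procedure above can be sketched in NumPy. This is only an illustration in the spirit of the spectral-regression formulation of RLPI, not the authors' implementation. Since the equations did not survive in this text, the following are all assumptions: the 0/1 k-nearest-neighbour similarity matrix, the generalized eigen-problem W y = λ D y, the removal of the trivial top eigenvector, the mean-centering, the ridge parameter `alpha`, and every function name.

```python
import numpy as np

def knn_similarity(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour similarity matrix (an assumed choice of W)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                          # symmetrize

def rlpi_fit(X, n_components=2, k=5, alpha=0.1):
    """Two-step sketch: (1) graph eigenvectors, (2) regularized least squares."""
    n, d = X.shape
    W = knn_similarity(X, k)
    deg = W.sum(axis=1)                                # diagonal of D, all > 0 for k >= 1
    # Step 1: solve W y = lambda D y via the symmetric matrix D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(S)                     # eigenvalues in ascending order
    # Take the largest eigenvectors, skipping the trivial one (eigenvalue 1).
    top = eigvecs[:, ::-1][:, 1:n_components + 1]
    Y = top * d_inv_sqrt[:, None]                      # map back: y = D^{-1/2} z
    # Step 2: ridge regression from the centered data onto the eigenvectors.
    mean = X.mean(axis=0)
    Xc = X - mean
    A = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ Y)
    return mean, A

def rlpi_transform(X, mean, A):
    """Project data into the learned space."""
    return (X - mean) @ A

# Toy usage on random data standing in for CNN features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
mean, A = rlpi_fit(X, n_components=2)
Z = rlpi_transform(X, mean, A)
```

The point of the second step is that the projection is obtained from a regularized linear system rather than from a dense eigen-decomposition in the feature dimension, which matters when the features are 4096-dimensional.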
To evaluate the performance of the proposed method, experiments are conducted on the Caltech-256 dataset. For comparison, results from other methods are also presented.
The Caltech-256 dataset contains 29780 images in 256 categories. We select images from the first 70 classes of the Caltech-256 dataset to construct a smaller dataset (referred to as Caltech-70 hereafter) consisting of 7674 images for our experiments. We randomly select 500 images as queries and use the remaining images as the search targets.
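The query/target split described above can be sketched as follows. The function name, the seed and the use of integer image identifiers are illustrative assumptions; the text only specifies that 500 of the 7674 images are chosen at random as queries.

```python
import random

def split_queries(image_ids, n_queries=500, seed=0):
    """Randomly split image ids into query and target sets (seed is an assumption)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    ids = list(image_ids)
    rng.shuffle(ids)
    return ids[:n_queries], ids[n_queries:]

# Integer ids stand in for the 7674 Caltech-70 images.
queries, targets = split_queries(range(7674))
```

With these sizes, the target set contains the remaining 7174 images, and no image appears in both sets.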
Three measures are used to evaluate the performance of different algorithms. The first one is the precision defined as:
where
The precision gives the proportion of relevant images among all images retrieved in a particular search. However, we sometimes want to retrieve more of the relevant images in the database rather than just achieve a very high precision. Thus, the recall is also an important measure of the performance of different algorithms. We define the second measure, recall, as:
where
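As a concrete illustration of these two measures, here is a minimal sketch. It assumes that an image counts as relevant when it shares the query's class label, and it uses the standard definition of recall (relevant images retrieved divided by relevant images in the database), since the formulas themselves did not survive in this text. The function and argument names are hypothetical.

```python
def precision_recall(retrieved_labels, query_label, n_relevant_in_db):
    """Precision and recall for one query.

    Assumption: a retrieved image is relevant if its label equals the query's.
    """
    hits = sum(1 for label in retrieved_labels if label == query_label)
    precision = hits / len(retrieved_labels) if retrieved_labels else 0.0
    recall = hits / n_relevant_in_db if n_relevant_in_db else 0.0
    return precision, recall

# Toy example: 4 images retrieved, 3 relevant, 6 relevant images in the database.
p, r = precision_recall(["cat", "dog", "cat", "cat"], "cat", 6)
print(p, r)  # prints 0.75 0.5
```

Retrieving more images can only raise or maintain the recall, but it typically lowers the precision, which is why the two measures are reported together.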
A. Comparisons with Hash Type Methods
In this section, we compare our proposed method (VGG-RLPI) with many hash type methods [
B. Comparisons with Dimension Reduction Methods
As mentioned above, RLPI is also a dimension reduction method. Thus, comparisons are also performed with two other dimension reduction methods: Principal Component Analysis (PCA) and Linear Graph Embedding (LGE). For a fair comparison, retrieval experiments are first performed with the reduced feature vectors obtained from these three methods to determine the optimal dimension.
In this paper, a novel method that combines a deep CNN with RLPI is proposed for image retrieval. Since the CNN features from the FC-layers have both abstract and global properties, they represent an image well and discriminate well in both classification and information retrieval tasks. However, using the features extracted from the CNN to perform pattern matching directly is inefficient. RLPI, on the other hand, can learn a new feature space that is more discriminative than the original features. Experimental results on the Caltech-70 dataset show that our proposed method outperforms existing hash-based methods and two other popular dimension reduction methods.
Ma, X.X. and Wang, J.J. (2017) Image Retrieval Using Deep Convolutional Neural Networks and Regularized Locality Preserving Indexing Strategy. Journal of Computer and Communications, 5, 33-39. https://doi.org/10.4236/jcc.2017.53004