Content-Based Image Retrieval (CBIR) from large databases is becoming a necessity for many applications such as medical imaging, Geographic Information Systems (GIS), space search and many others. However, the process of retrieving relevant images is usually preceded by extracting discriminating features that can best describe the database images. The retrieval process therefore mainly depends on comparing the captured features, which depict the most important characteristics of the images, instead of comparing the whole images. In this paper, we propose a CBIR method that extracts both color and texture feature vectors using the Discrete Wavelet Transform (DWT) and the Self Organizing Map (SOM) artificial neural network. At query time, texture vectors are compared using a similarity measure (the Euclidean distance) and the most similar image is retrieved. In addition, other relevant images are retrieved using the neighborhood of the most similar image in the data set clustered via SOM. The proposed method demonstrated promising retrieval results on the Wang database compared to existing methods in the literature.
Content-Based Image Retrieval (CBIR) is a technique for searching and indexing images in a large database based on their visual contents, such as colors, textures, shapes or spatial layouts, instead of using tags or other descriptive metadata keywords that might be associated with the images in the database [
Typically, most CBIR systems work by extracting one or more multi-dimensional vectors from each image in the database; this is done offline, before retrieval starts. At query time, the same vectors are extracted from the query image, and a similarity function is then used to quantify the difference between the query image's vector and the other images' vectors in the database. Images whose vectors are similar to the query's are finally retrieved as the result.
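This generic query-time step, comparing the query's feature vector against precomputed database vectors and returning the closest matches, can be sketched as follows; the array shapes and the `retrieve` helper are illustrative, not part of any specific system:

```python
import numpy as np

def retrieve(query_vec, db_vecs, top_k=5):
    """Rank database images by Euclidean distance to the query vector
    and return the indices of the top_k closest ones."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(dists)[:top_k]

# Toy example: four 2-D feature vectors; the query is closest to rows 0 and 2.
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [5.0, 5.0]])
print(retrieve(np.array([1.0, 0.0]), db, top_k=2))  # -> [0 2]
```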
Content-Based Image Retrieval finds applications in many domains such as medical diagnostics, GIS and military applications, pattern recognition, computer vision and many others [
Color features are widely used in CBIR systems as they are independent of image size and orientation [
Stricker and Orengo [
Liu et al. [
In this paper, we decomposed the HSV images using the Discrete Wavelet Transform (DWT) and then quantized the resulting approximation sub band to extract a set of dominant coefficients that form the color vector.
Color vectors are easy to compute and do not take long processing time. However, depending on them as the sole factor for deciding image similarity will usually result in retrieving images with similar color distributions regardless of their content similarity. So extracting texture vectors, which represent the spatial arrangement of pixels in the grey level [
Wavelet-based methods, e.g. the standard wavelet transform and the Gabor wavelet, are the most commonly used techniques to extract texture vectors as they provide better spatial information [
The main motivation of this work was to retrieve the images that best match the query image in colors and textures. We therefore clustered the images based on their color vectors to group images with similar color characteristics in the same cluster. The decision on image similarity was made by calculating the Euclidean distance between the query image's texture vector and the database images' texture vectors. The most texturally similar image (I), which is the one that has the minimum Euclidean distance from the query image, was first retrieved and used to identify the index of the cluster within which the search for further similar images was bounded. Results showed that the proposed method retrieved images with better average precision values than others reported in the literature [
The rest of this paper is organized as follows: Section 2 explains the proposed method; experimental results and discussion are given in Section 3; and Section 4 concludes the work.
Two kinds of vectors were extracted from each image in the database. The first vector held the color information while the other one was used for the texture information. Images were then clustered according to their color vectors, and this process resulted in grouping images with similar color trends in the same and neighboring clusters, as we applied the Self Organizing Map (SOM) clustering technique to make use of its topology-preserving property [
1) Pre-processing: for each image in the database
a) Extract the Color Vector by:
i) Converting the image into the HSV colors space;
ii) Decomposing the image using the DWT for (2) levels;
iii) Quantizing the coefficients of the first color channel of the Approximation (LL2) sub band using SOM;
iv) Taking the most dominant (16) coefficients as the color vector;
b) Extract the Texture Vector by:
i) Converting the image into grey scale image;
ii) Decomposing the image using the DWT for (2) levels;
iii) Computing the mean value for each block of pixels for all resulted sub bands.
2) At query time
a) Extract the texture vector in the same way used for the database images;
b) Compare the texture similarity between the query image and the database images;
c) Retrieve the most similar image (I);
d) Define the cluster within which (I) is located;
e) Retrieve other most texturally similar images from the same cluster of (I).
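The steps above can be sketched end-to-end with toy stand-ins; `color_vector`, `texture_vector` and `cluster` below are deliberately simplified placeholders (the paper's actual extractors use DWT and SOM, described in the following sections):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's extractors (the real ones use DWT + SOM):
def color_vector(img):
    return np.array([img.mean()])   # placeholder for the 16 dominant coefficients

def texture_vector(img):
    return np.array([img.std()])    # placeholder for the block-mean texture vector

def cluster(colors, n_clusters=3):
    # Crude placeholder for SOM clustering: quantile bins on the 1-D color feature.
    flat = np.array([c[0] for c in colors])
    edges = np.quantile(flat, np.linspace(0, 1, n_clusters + 1)[1:-1])
    return np.digitize(flat, edges)

# Pre-processing: extract both vectors for every database "image", then cluster.
db = [rng.normal(loc=m, scale=s, size=(8, 8)) for m, s in
      [(0, 1), (0, 5), (10, 1), (10, 5), (20, 1), (20, 5)]]
textures = [texture_vector(img) for img in db]
labels = cluster([color_vector(img) for img in db])

# Query time: find the most texturally similar image (I), then retrieve
# further candidates from I's cluster only.
q = rng.normal(loc=10, scale=1, size=(8, 8))
qt = texture_vector(q)
best = int(np.argmin([np.linalg.norm(t - qt) for t in textures]))
same_cluster = [i for i, l in enumerate(labels) if l == labels[best] and i != best]
print(best, same_cluster)
```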
Discrete Wavelet Transformation (DWT) decomposes (analyzes) the image (signal) into a set of approximations and details by passing it through two complementary filters (high pass (H) and low pass (L) filters) [
1) The Approximation sub band (LL): describes the low frequency components in the horizontal and vertical directions of the image. It presents the general trend of pixel values (wavelet approximation of the original image);
2) The Horizontal detail sub band (LH): describes the low frequency components in the horizontal direction and the high frequency components in the vertical direction; it represents the horizontal edges;
3) The Vertical detail sub band (HL): describes the high frequency components in the horizontal direction and the low frequency components in the vertical direction; it detects the vertical edges;
4) The Diagonal detail sub band (HH): describes the high frequency components in both directions; it detects the corners.
All of these sub bands can be reassembled back to reproduce the original image without loss of information in a process called reconstruction or synthesis.
DWT decomposes an (R × C) image into 4 sub bands with lower spatial resolution (R/2 × C/2) by down sampling by a factor of (2). For (N) levels of decomposition, a hierarchical structure of (3N + 1) different frequency sub bands results, i.e. three levels of decomposition result in (10) different frequency sub bands as shown in figure 3.
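A minimal numpy sketch of this decomposition, using the Haar wavelet for simplicity (the paper does not name its wavelet, so that choice is an assumption): two levels on a 256 × 256 image yield 64 × 64 level-2 sub bands, and the inverse reassembles the image losslessly, as noted above.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar DWT; returns the LL, LH, HL, HH sub bands,
    each half the input size in both dimensions."""
    a = (x[0::2, :] + x[1::2, :]) / 2   # low-pass along rows
    d = (x[0::2, :] - x[1::2, :]) / 2   # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2  # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2  # detail sub bands in the
    hl = (d[:, 0::2] + d[:, 1::2]) / 2  # three orientations
    hh = (d[:, 0::2] - d[:, 1::2]) / 2  # described above
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reassembles the four sub bands without loss."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

img = np.random.rand(256, 256)
ll1, lh1, hl1, hh1 = haar_dwt2(img)   # level 1: four 128 x 128 sub bands
ll2, lh2, hl2, hh2 = haar_dwt2(ll1)   # level 2: four 64 x 64 sub bands
print(ll2.shape)                      # (64, 64)
print(np.allclose(haar_idwt2(*haar_dwt2(img)), img))  # True: lossless
```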
We used the HSV color space to extract the color vectors from each image in the database as it was widely used in the previous works [
We generated a general approximation of every HSV image by decomposing it for two levels. Decomposing images for more levels would result in more generalization (more down samplings) and more loss of details, which would affect the retrieval results. The decomposition process can be summarized in the following two steps:
1) Decompose the image for the first level, this step produces the first (4) sub bands (LL1, LH1, HL1, HH1) and each of which was down sampled by a factor of (2).
2) Decompose the resulted Approximation sub band (LL1) for another level to produce (LL2, LH2, HL2, HH2) and each of which was also down sampled by a factor of (2).
We then quantized the coefficients of the first color channel (Hue) of the (LL2) sub band, as it represents a general approximation of the image (an image icon), to obtain the most dominant (16) coefficients as discriminating features for the color vector.
We applied the proposed technique on the Wang database after resizing the images to (256 × 256), so the resulting (LL2) sub band had a (64 × 64) size. We found that extracting more than (16) coefficients increased the computation time without adding any significant improvement to the retrieval results.
After extracting the color vectors from the database images, they were clustered in order to group images with similar color characteristics in the same cluster. Quantization and image clustering were both done using the Self Organizing Map (SOM) technique, as neural network algorithms have proved advantageous in vector quantization [
SOM is a competitive unsupervised learning clustering technique that is used to classify unlabeled data into a set of clusters displayed usually in a regular Two-Dimensional array [
Each neuron in the input layer is fully connected to every neuron in the output layer using weighted connections (synapses). The output layer can have a row form (One-Dimensional), a lattice form (Two-Dimensional), or the neurons can be arranged in a Three-Dimensional mesh.
At the beginning, a random value is assigned to each element of every output neuron's weight vector (W = w1, w2, ∙∙∙, wn). These values correspond to the overall density function of the input space and are used to absorb similar input vectors, which have the same dimensionality as the output neurons. Similar input vectors are found according to a predefined similarity measure function, usually the Euclidean distance [
D(w, d) = √( Σ_{v=1..n} (w_v − d_v)² )
where:
D(w, d): the distance function;
w_v: the v-th element of the output neuron vector, which consists of (n) features;
d_v: the v-th element of the training input vector, which consists of (n) features.
In the training phase, each training input vector dv (all coefficients of the 1st color channel of the LL2 sub band in the case of color quantization, or the set of all color vectors in the case of image clustering) seeks its Best Matching Unit (BMU): the output neuron that has the minimum Euclidean distance from the current training input vector. The weights of the BMU and its topological neighbors are then stretched toward this training vector according to the following updating rule [
w_n(e + 1) = w_n(e) + α(e) · h_n(e) · (d(e) − w_n(e))
where:
e: the epoch number, as the training phase runs for a specified number of epochs;
w_n(e + 1): the weight of neuron (n) in the next epoch;
w_n(e): the weight of neuron (n) in the current epoch;
α(e): the learning rate at the current epoch;
h_n(e): the neighborhood kernel around the BMU; it defines the region of influence that a training vector has on the SOM;
d(e): the selected input vector in the current epoch.
By the end of the training phase, the output neurons will have weights that are actually formed by the input space.
In this paper, we used SOM for two purposes: 1) to extract the most dominant coefficients (quantize the coefficients); 2) to group similar images into clusters. In both cases we used the two-dimensional (grid) output layer structure. One input neuron and (16) output neurons were used to extract the most dominant (16) coefficients from the first color channel of the Approximation sub band (LL2), while (16) input neurons (as each color vector has 16 elements) and (9) output neurons were used to cluster the image color vectors into (9) clusters. The number of extracted coefficients and the number of clusters were chosen experimentally, taking the computation time into account.
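A minimal SOM training loop following the updating rule described above; the Gaussian neighbor kernel and the decay schedules for the learning rate and neighborhood width are illustrative choices, not the paper's:

```python
import numpy as np

def train_som(data, grid=(3, 3), epochs=50, alpha0=0.5, sigma0=1.0, seed=0):
    """Minimal SOM training loop implementing
    w_n(e+1) = w_n(e) + alpha(e) * h_n(e) * (d(e) - w_n(e))."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    w = rng.random((rows * cols, data.shape[1]))   # random initial weights
    for e in range(epochs):
        alpha = alpha0 * (1 - e / epochs)          # decaying learning rate
        sigma = sigma0 * (1 - e / epochs) + 1e-3   # shrinking neighborhood
        for d in data:
            bmu = np.argmin(np.linalg.norm(w - d, axis=1))  # best matching unit
            h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1)
                       / (2 * sigma ** 2))         # Gaussian neighbor kernel
            w += alpha * h[:, None] * (d - w)
    return w

# Two well-separated "color" groups should map to different output neurons.
data = np.vstack([np.full((5, 2), 0.1), np.full((5, 2), 0.9)])
w = train_som(data)
bmu = lambda d: int(np.argmin(np.linalg.norm(w - d, axis=1)))
print(bmu(np.array([0.1, 0.1])) != bmu(np.array([0.9, 0.9])))  # True
```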
To extract the texture vector, all images were converted to grey scale images as [
The proposed method has been tested using the Wang database, which has (1000) images in the JPEG format and categorized into (10) categories (African People, Beach, Building, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains and Food).
First of all, images were resized to (256 × 256) and converted from the RGB color space into the HSV color space. They were then down sampled by decomposing them for two levels using the DWT. As a result, the size of each of the resulting level (2) sub bands became (64 × 64).
The first color channel of the LL2 sub band was then quantized using SOM and the most dominant (16) coefficients were selected to form the color vector. After extracting the color vectors, images were clustered into a set of (9) clusters, also using the SOM technique, to group images with similar color characteristics in the same and neighboring clusters.
To extract the texture vectors, all images were converted to grey scale images and also decomposed for two levels using the DWT. The mean value of each (8 × 8) block of coefficients in all the level (2) sub bands was computed and stored as the texture vector, so each image is represented by a vector with (64) elements.
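The block-mean step for a single level-2 sub band can be sketched as follows (a 64 × 64 sub band tiled into 8 × 8 blocks yields 64 means):

```python
import numpy as np

def block_means(sub_band, block=8):
    """Mean of each (block x block) tile of a sub band, flattened row-wise."""
    h, w = sub_band.shape
    return (sub_band.reshape(h // block, block, w // block, block)
                    .mean(axis=(1, 3))
                    .ravel())

sb = np.random.rand(64, 64)   # one level-2 sub band, as in the paper's setup
v = block_means(sb)
print(v.size)                 # 64 means from the 8 x 8 grid of blocks
```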
For every query image, the texture vector is extracted in the same way used for the database images' texture vectors, and the most similar image to the query is then retrieved by calculating the Euclidean distance between the query image's texture vector (Q) and every texture vector (M) in the database according to the following equation:

D = √( Σ_f (Q_f − M_f)² )

where:
D: the distance between the query image texture vector (Q) and an image texture vector (M) in the database;
f: the feature index in the texture vector.
The most similar image to the query (the one that has the smallest distance) is retrieved first, while other texturally similar images are retrieved from the cluster of that image. Each image in the database was taken as a query image and compared to the other (999) images, and only the top (5) similar images were retrieved. The performance of the retrieval system was measured by calculating the precision value according to the following equation, as found in [

Precision = R / T

where:
R: the number of retrieved relevant images;
T: the total number of retrieved images.
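The precision measure is straightforward to compute; a small sketch with hypothetical category labels for a top-5 result:

```python
def precision(retrieved_labels, query_label):
    """Precision = relevant retrieved (R) / total retrieved (T)."""
    relevant = sum(1 for label in retrieved_labels if label == query_label)
    return relevant / len(retrieved_labels)

# Hypothetical top-5 result for a query from the "Horses" category:
print(precision(["Horses", "Horses", "Beach", "Horses", "Flowers"], "Horses"))  # 0.6
```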
Experiments were done by retrieving the most similar images to the query image from the same cluster of the most similar image. Other experiments were also done by retrieving images from neighboring clusters in order to experimentally determine the best retrieval results (figure 5).
The results showed that retrieving images from the same cluster as the most similar image (I) gives the highest average precision values, i.e. the best retrieval results, as the retrieval process focuses on selecting the most texturally similar images from a cluster that bounds images with very similar colors. Retrieving from more than one cluster, by contrast, focuses on retrieving the nearest texturally similar images from clusters whose images have different degrees of color similarity.
Selecting from two clusters (the cluster that has the most similar image and one of its four neighbors) gives the lowest precision values; we took the average value over four retrieval experiments, i.e. retrieving from one of the 4 neighbors at a time, and did the same for retrieving from 3 and 4 clusters. This indicates that the location of the neighboring cluster from which the images are selected affects the retrieval results, as the number of similar images that might be grouped in each of the neighboring clusters differs from one cluster to another.
Our results were also compared with others reported in [
In this article, we proposed a method to retrieve relevant images based on both color and texture vectors. Images were first clustered based on their most dominant (16) color coefficients, while image texture vectors were extracted by converting the images to grey scale, decomposing them for two levels using the DWT and calculating the mean value of each block of pixels from the (4) sub bands of level (2).
Results showed that the proposed method is able to retrieve images with higher average precision values than other methods proposed in the literature by comparing texture similarity alone, without any need to compare color similarities: images are already grouped according to their colors, and the top 5 similar images are retrieved from the same cluster as the image whose texture features are most similar to the query image's.
| No. | Category | Precision | Precision [ | Precision [ |
|---|---|---|---|---|
| 1 | African People | 27.8 | 27 | 13.3 |
| 2 | Beach | 54.2 | 33.4 | 26.15 |
| 3 | Building | 34.4 | 35 | 11.05 |
| 4 | Buses | 52.6 | 32.2 | 17.25 |
| 5 | Dinosaurs | 52.6 | 32.2 | 17.25 |
| 6 | Elephant | 55.6 | 38.4 | 34.9 |
| 7 | Flowers | 82.8 | 29.6 | 49.5 |
| 8 | Horses | 74.8 | 34.6 | 20.8 |
| 9 | Mountains | 50 | 38 | 25.9 |
| 10 | Food | 30.4 | 40 | 15.6 |
| | Average | 55.88 | 33.86 | 31.09 |
(Figure: a query image and its top 5 retrieved images.)
Other techniques may be applied in the future, such as stochastic artificial neural networks like the Restricted Boltzmann Machine (RBM), to extract features that might help match results more accurately.