### Paper Menu >>

### Journal Menu >>

J. Software Engineering & Applications, 2010, 3, 593-602 doi:10.4236/jsea.2010.36069 Published Online June 2010 (http://www.SciRP.org/journal/jsea) Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets Ahmed Fahim1, Abd-Elbadeeh Salem2, Fawzy Torkey3, Mohamed Ramadan4, Gunter Saake1 1Faculty of Information, Otto-Von-Guericke University, Magdeburg, Germany; 2Faculty of Computers & Information, Ain Shams University , Ca iro, Egypt; 3Kafrelshiekh University, Kafrelshiekh, Egypt; 4Faculty of science, Menofyia University, Shebeen Elkoum, Egypt. Email: ahmmedfahim@yahoo.com Received November 11th, 2009; revised December 8th, 2009; accepted December 11th, 2009. ABSTRACT Finding clusters in data is a challenging problem especially when the clusters are being of widely varied shapes, sizes, and densities. Herein a n ew scalable clustering technique which addresses a ll these issues is proposed. In data mining, the purpose of data clustering is to identify useful pa tterns in the underlying dataset. Within the last several yea rs, many clustering alg orithms have been pr oposed in t his area of researc h. Among all these proposed methods, density clusterin g methods are the most important due to their high ability to detect arbitrary shaped clusters. Moreover these methods often show good noise-handling capabilities, where clusters are defined as regions of typical densities separated by low or no density regions. In this paper, we aim at enhancing the well-known algorithm DBSCAN, to make it scalable and able to discover clusters from uneven datasets in which clusters are regions of homogenous densities. We achieved the scalability of the proposed algorithm by using the k-mea ns algorithm to get initial partitio n of the dataset, applying the enhanced DBSCAN on each partition, and then using a merging process to get the actual natural number of clusters in the underlying dataset. This means the proposed algorithm consists of three stages. Experimental results using synthetic datasets show that the proposed clustering algorithm is faster and more scalable than the enhanced DBSCAN counterpart. Keywords: EDBSCAN, Data Clustering, Varied Density Clustering, Cluster Analysis 1. Introduction Over the past several years, even though the computing power has increased exponentially, hard-drive capacity has increased at an order of magnitude greater than that of processor power. Thus the capability to store data has greatly outpaced the capability to process it. As a result, large volumes of data have been generated. The result of this unceasing data collection is that organizations have become data-rich and knowledge-poor [1]. The main purpose of data mining is to extract knowledge from the data at hand, in other words data mining is the process of extracting hidden and interesting patterns from huge datasets. Data clustering is one of the promising techniques of data mining, which groups a set of objects into classes or clusters such that objects within a cluster have high simi- larity in comparison to one another, but they are dissimilar to objects in other clusters. Data clustering algorithms can be classified into four categories; (1) partitioning, (2) hie- rarchical, (3) density-based and (4) grid-based. However, some algorithms may fall into more than one category. By clustering one can identify dense and sparse regions and, therefore, discover overall distribution patterns. Finding clusters in data is challenging when th e clusters are being of widely differing sizes, shapes and densities and when the data contains noise and outliers. Although many al- gorithms exist for finding clusters of different sizes and shapes, there are few al gorithms that can detect clusters of different densities. Basic density based clustering techniques such as DBSCAN [2] and DENCLUE [3] treat clusters as regions of high typical densities separated by regions of no or low densities. So they are able to suitably handle clusters of different sizes and shapes besides effectively separate noise and outliers. But they may fail to identify clusters with varying densities unless the clusters are totally se- parated by sparse regions. There are some algorithms which handle clu sters of different densities, like OPTICS [4] but it does not produce explicit clusters. Traditional Scalable Varied Density Clustering Algorithm for Large Datasets 594 ) DBSCAN algorithm sometimes has trouble with clusters of varying densities. An enhanced DBSCAN algorithm has been developed to discover varied density clusters from uneven datasets [5]. The disadvantage of this algo- rithm is its high computational complexity and it does not scale well with the size of huge datasets. It requires O(n log n); where n is the size of the input dataset. Its main advantages are discovering varied density clusters and requiring only one input parameter; (ma xpts) which de- termines a suitable value for Eps in DBSCAN for each cluster based on the local density. So it is very important to exploit these to two golden advantages and improve the scalability of this algorithm. The original DBSCAN al- gorithm has been merged with k-means algorithm in KIDBSCAN [6], and DBSK [7], and with CLARANS algorithm in [8]. This paper proposes an algorithm that merges among partitioning, de nsity, a nd h iera r chical ba sed m etho ds. It is based on ideas extracted from k-means [9], enhanced DBSCAN [5], and CURE [10]. The enhanced DBSCAN selects a suitable value for its parameter Eps in each cluster, based up on the local density of the starting point in each cluster, and adopts the traditional DBSCAN for each value of Eps. The idea of the enhanced DBSCAN algorithm depends on discovering the highest density clusters at first, and then the Eps is adapted to discover next low density clusters with ignoring the previously clustered points. For more details, one can refer to [5]. The proposed algorithm improves the scalability of the EDBSCAN algorithm via partitioning the dataset in order to reduce the search space of the neighborhoods. Instead of examining the whole dataset, the EDBSCAN searches only in the objects within each partition. Merging stag e is needed to get the final natural number of clusters in the underlying dat a se t . The rest of this paper is organized as follows; some related works are reviewed in Section 2. Section 3 pre- sents the proposed algorithm and describes in details the three stages of it, and analyzes its time complexity. Sec- tion 4 presents some experimental results on different datasets to show the performance of the proposed algo- rithm. Finally the conclusion is presented in Section 5. 2. Related Work Clustering is the organization of the objects in a dataset D into homogeneous and separated groups with respect to a distance or a similarity measure. Its ultimate objective is to assign the similar objects to the same cluster or group, and dissimilar ones to different clusters. Clustering me- thods can basically be classified into two main types; par- titioning and hierarchical base d methods [11]. Partitioning algorithms construct a partition of a dataset D of n objects into a set of k clusters; k is an input parameter for these algorithms. A partitioning algorithm typically starts with an initial partition of D and then uses an iterative control strategy to optimize an objective function. The square error criterion, defined below in (1), is the most com- monly used (mi is the mean of cluster Ci). 2 1( k i ipc i p m (1) The square -error is a good m easure of the within cluster variation across all the partiti ons. The objective herein is to find k partitions that minimize the square error. Thus, square error clustering tries to make the k clusters as compact and se parated as possibl e and it wo rks well whe n clusters are compact clouds that are rather well separated from one another. Each cluster is represented by the gra- vity center of the cluster (k-means algorithm) or by the object that is the most centrally located in the cluster (k-medoids algorithms). Consequently, partitioning algo- rithms use a two-step procedure. First, they determine k representatives minimizing the objective function. Second, they assign each object to the cluster with its representa- tive “closest” to the considered object. This type of algo- rithms discovers only spherical shaped clusters of similar sizes, and requires k as input parameter. Hierarchical algorithms create a hierarchical decom- position of D. The hierarchical decomposition is repre- sented by a dendrogram; a tree that iteratively splits D into smaller subsets until each subset consists of only one object. In such a hierarchy, each node of the tree repre- sents a cluster of D. The dendrogram can either be c reated from the leaves up to the root (agglom erative approach ) or from the root down to the leaves (divisive approach) by merging or dividing clusters at each step. Agglomerative hierarchical clustering (AHC) is more stable but its time and memory space requirements are consistently higher. Therefore it is unfeasible for a large dataset. Moreover, for examples, the single link approach is very susceptible to noise and di fferences in de nsit y. While grou p avera ge and complete link are not as susceptible to noise, but they have trouble with varying densities and cannot hand le clusters of different shapes and sizes [12]. Another hierarchical algorithm called CURE [10] has been proposed, it stops the creation of a cluster hierarchy if a level consists of k clusters, where k is one of several input parameters. It utilizes multiple represen tative points to evaluate the dis- tance between clusters. Thereby, it is adjusting well to arbitrary shaped clusters and avoiding the chain effect problem of th e sin g le-link. Th is results in go od clu stering quality. But this algorithm has several parameters. The parameter setting does have a profound influence on the result. Besides the partitioning and hierarchical approaches, density based clustering methods such as DENCLUE [3] and DBSCAN [2] for m a third clustering type. These are often used in data mining for knowledge discovery. Den- sity-based clustering uses a local cluster criterion, in Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets595 which clusters are defined as regions in the data space where the objects are dense, and remain, separated from one another by low-density regions. The basic idea of the DBSCAN algorithm is that for each point of a cluster the neighborhood of a given radius (Eps) has to contain at least minimum number of points (MinPts), where Eps and MinPts are input parameters. These two parameters are global for the dataset. Furthermore it is not easy to de- termine the best value for Eps, and hence DBSCAN can not discover clusters with varied density unless they are totally separated. Density-based clustering has some ad- vantages over k-clustering and AHC in discovering clus- ters of ar bitrary shapes and sizes. However, in previous studies, it was shown that current density based clustering works well only on a simple dataset where cluster densities are similar [4]. Density based clustering is important for knowledge discovery in databases. Its practical applications include biomedical image segmentation [13], molecular biology and geos- patial data clustering [14], and earth science tasks [2]. The enhanced DBSCAN algorithm [5] has previously been proposed to solve the problem of varied density clusters. In this paper, we improve the scalability of the enhanced DBSCAN by first partitioning the dataset to reduce the search space of the neighborhoods, then ap- plying the enhanced DBSCAN on each partition sepa- rately, and finally applying merging procedure to get the actual number of clusters in the whole dataset. 3. The Proposed Algorithm In this section, we describe the details of the proposed algorithm which called scalable enhanced DBSCAN (SEDBSCAN). This algorithm composed of three main stages; the first stage is to partition the dataset into k su- per-clusters, the second stage is to find out the sub-clus- ters within each partition (super-cluster), the third stage is to find out the natural number of clusters by merging dense sub-clusters from different partitions (super-clus- ters). Figure 1 depicts these three stages. 3.1 Partitioning Stage The main purpose of this stage is to partition the un- derlying dataset into a finite number of smaller datasets, because most algorithms perform efficiently well with small datasets more than large datasets. So this stage will improve the scalability of th e proposed algorithm. In this stage, discovering varied-shaped clusters is not of high importance, but the most important issue is getting the initial partitions as soon as possible. To fulfill this goal, an algorithm with time complexity O(n) should be used. Figure 1. Main stages of the scalable EDBSCAN k-means algorithm [9] is the best choice for th is stage. In the following, a brief rev iew about k-means is presented. The k-means is classified as a partitioning method that requires only one input parameter, k, which represents the number of clusters. T he main s teps of this algor ithm are as follows: Input: the number of clusters k, and a dat aset contai ning n objects. Output: a set of k clusters which minimizes the square error as in Equation (1). Method: 1) Arbitrary select k centers as the initial solution. 2) Compute membership the objects according to the current solution. 3) Update cluster centers according to new member- ships of the objects. 4) Repeat steps 2 and 3 until converg en ce. The k-means is scalable and efficient in processing large datasets due to its low time complexity (i.e. O(n)). Since the k-means works well with spherical shaped clusters of similar size, we expect that the result of this stage is not always correct. Consider the following ex- ample depicted in Figure 2. It is shown that the k-means produces six clusters. One can easily discover that this result is not really correct, as an actual cluster may be distributed over more than one partition, or some clusters are merged together. The second stage will handle each partition as a new separate dataset. We used a scalable version of the k- means algorithm. This version was previously presented in [9]. It reduces the number of computations in the sec- ond step of the original k-means. It achieved its scalabil- ity through redistributing the objects that became far from their previous centers, while the objects which be- come closer to their centers will not be redistributed. 3.2 EDBSCAN Each Partition This stage applies the enhanced DBSCAN on each parti- tion. The main advantage of this algorithm is that it has the ability to discover varied density clusters. So the proposed Figure 2. Initial partition resulted from k-means algorithm Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets 596 ure 3. algorithm will inherit this important advantage. A brief review about EDBSCAN [5] is presented. The EDBS- CAN algorithm is based on the DBSCAN [2] algorithm, but it surmounts the problem of fixed Eps in DBSCAN. The EDBSCAN needs two input parameters; they are Minpts and Maxpts, Minpts < Maxpts < 20. These two parameters det ermine the minim um and maximum densi ty for core points respectively. Maxpts also determines the Eps for each cluster according to the highest local density of its starting point. Minpts is fixed to 4 as in DBSCAN algorithm. Thus the user will set only one input parameter. The main steps in this algorithm are as follows: Input: Maxpts, and dataset containing n obj ects. Output: actual clusters discovered from t he input dataset. Method: 1) Find the k-nearest neighbors for each object p. (i.e. Nk(p)) and keep them in ascending order from p. 2) Set local density value for each object p as DEN(p,y1,…,yk) which represents the sum of distances among the object p and its k-nearest neighbors. 3) Rearrange the objects in descending order ac- cording to their local densities. 4) ClusId = 1. 5) Starting from the first unclassified object p in the sorted data do the following: a) Eps = distance to maxpts-nieghbor for the object p. b) Assign the object p to the current cluster (ClusId). c) Append its unclassified neighbor qi, wrt. Eps and Minpts to the seed list SLp in ascending order of their dis- tance to p, Continue expanding current cluster until no object can be assigned to it. 6) ClusId = ClusId + 1. Assign the next unclassified object to the current cluster and go to step 5 until all objects are classified. In EDBSCAN there are no border points as in DBSC- AN. So it treats small clusters as noise and discards them from the data. Experimentally, a small cluster is the cluster that has less than 0.006 of the size of the dataset. Figure 3 shows the sub-clusters discovered from each partition resulted from the first stage. This figure shows 16 sub-clusters discovered from the six initial partitions. From Figure 3 one can see that some sub-clusters are merged together to get the natural clusters in the under- lying dataset. To merge sub-clusters, the idea of repre- sentative points proposed by CURE algorithm [10] will be used, but not using the same technique of the CURE. The k-means algorithm will be used for selecting these representative points. Based on these representttatives, two clusters with the most near representatives will be merged in a hierarchical fashion until termination condi- tion is hold. This process will be don e in the third stage. 3.3 Merging Dense Clusters This stage aims to merge the nearest dense sub-clusters detected from applying the EDBSCAN on each partition in the second stage. As it is known there is no border point can be detected by EDBSCAN algorithm, since it uses minimum and maximum density for core points. Therefore every point in the given dataset is a core point, and the maximum density determines the Eps value in each cluster according to the density of region where the initial point resides. Referring to the method presented in [8], one can find that the DBSCAN algorithm can detect large number of border points in each cluster. We need smaller number of points from each cluster as representa- tives to reduce computational complexity of the merging stage. So the merging stage will search for k representa- tives from each sub-cluster using the k-means algorithm. This means, we apply the k-means algorithm on each generated sub-cluster from applying the EDBSCAN on each partition in the second stage. Referring to Figure 3 we have 16 sub-clusters, so the size of clusters is very small compared to the size of the whole dataset. There- fore, the time required to get these representatives using the k-means will be very small and can be neglected. We use these representatives for merging dense sub-clusters. Figure 4 depicts the k representatives for each sub-cluster shown in Fig Figure 3. Result from applying EDBSCAN on each partition Figure 4.The k representatives for each cluster generated from the second stage (k = 8) Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets597 Two dense sub-clusters with the most nearest repre- sentatives will be merged together in a hierarchical form. This idea is similar to that was propo sed by CURE algo- rithm [10]. We need a threshold to stop the merging process. This may be a simple problem because we al- ready have some information about the data from the previous stages; like the number of sub-clusters, Eps in each cluster (density level of each cluster), distances among cluster’s representatives. Intuitively, two sub- clusters in second stage are not allowed to be merged together if they are belonging to the same partitio n (sup er cluster) in the first stage. The algorithm arranges the merging distances in as- cending order. By examining the plot of distances, we can easily select threshold distance to stop the merging process. Figure 5 presents the merging distances plot for the sub-clusters in Figure 3. Since we have 16 sub-clusters from the second stage, we need 15 merging steps for merging all sub-clusters into a single cluster. We search for the gaps in distances plot. From Figure 5 we notice that only one gap between distances 28.7 and 47.6. So any value between these two values will be a threshold distance. The merging process is applied seven times so that the final number of clusters will be 16-7 = 9 clusters as shown in Figure 6. We can get rid of the smallest three clusters as noise, and return the remaining six clusters as a final result. 3.4 Complexity Analysis The execution time of the proposed algorithm is com- posed of three components; the first one is the time for executing the k-means algorithm on the entire dataset which is O(n), where n is the size of the input dataset. The second component is the time for executing the EDBSCAN algorithm on each part resulting from the k- means in the first stage, the EDBSCAN requires O(m2) if it doesn’t use an index structure like R*-tree, or O(m log m) if it uses R*-tree; where m is the size of the input dataset, and m is very smaller than n since m is the size of one part of the original dataset. The third component is the time for getting the k representatives from each sub- cluster, and the merging process, this time is very small and can be neglected, because we apply the k-means al- gorithm on each sub-cluster which is very small com- pared with the size of the original dataset, and the total number of representatives is also very small compared to the original dataset size. Furthermore, we don’t find the distances among representatives belonging to the same part in the initial partition. Hence the entire execution time of the proposed algorithm is O(n + m log m) which is definitely smaller than O(n log n). 4. Experimental Results In this section the performance of the proposed algorithm is evaluated. This algorithm has been implemente d in C++. Figure 5. Merging distances plot Figure 6. Natural number of clusters in dataset We have used m any synthetic data sets to test the pr oposed algorithm. The experiments have been done on seven different datasets containing 2D points. The first dataset has six clusters of different sizes, shapes, and orientation, as well as random noise points and special artifacts such as streaks running across clust ers, this dataset is used as an example in Figure 2. The second dataset has six clusters of different shapes. Moreover, it also contains random noise and special artifacts, such as a collection of points forming horizontal streak, this dataset is shown in Figure 7. The third dataset has eight clusters of different shapes, sizes, and orientation, some of which are inside the space enclosed by other clusters. Moreover, it also contains random noise and special artifacts, such as a collection of points forming vertical streaks. This dataset is shown in Figure 8. The fourt h dataset h as eight clust ers of differe nt shapes, sizes, densities, and orientation, as well as random noise. A particularly challenging feature of this data set is that the clust ers are very c lose to each othe r and the y ha ve different densities, this dataset is shown in Figure 9. The fifth dataset has four clusters of different shape, size, and density. A particularly challenging feature of this data set is that, the clusters are very close to each other and they have different densities, this dataset is shown in Figure 10. The sixth and seventh datasets have clusters of different Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets 598 Figure 7. Clusters obtained from dataset 2 sizes and densities. These datasets are shown in Figures 11 and 12. The last three datasets are presented here to insure that the proposed algorithm is very efficient in discovering varied density clusters without requiring se- parating regions with low density. We evaluate t he performance of the propose d algorithm compared to the original EDBSCAN using different synthetic datasets include noise; the sizes of these datasets are ranging from 3147 to 10000 points in two-dimensions Figure 8. Clusters obtained from dataset 3 (2D), and they contain varied shaped, size, and density clusters. Figure 7 shows the three steps of the proposed algorithm on dataset 2 of size 8000 points. For comparing the proposed algorithm with the original EDBSCAN, the same value for the parameter maxpts in both EDBSCAN and the proposed algorithm is used in order to demonstrate the higher enhancement of the pro- posed algorithm. Figure 8 shows the three steps of the proposed algorith m on the third dataset con taining 10000 points. Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets599 Figure 9. Clusters obtained from dataset 4 However data set 3 has cl usters of va ried sha pes that are nested with each other, the proposed algorithm could discover the right clusters in the data. There is a small mistake in the ring cluster that has two different spherical clusters within it. This cluster has three levels of density in Figure 10. Clusters obtained from dataset 5 its right half. This mistake resulted from the EDBSCAN which considered the small cluster as outlier and removed it from the data, but this problem can easily be resolved by assigning the outlier clusters to the nearest cluster. Figure 9 shows the three steps of the proposed algorithm on Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets 600 Figure 11. Clusters obtained from dataset 6 dataset 4 of size 8000 poin ts. Dataset 5 has four clusters of varied shapes, sizes and densities. The proposed algorithm could successfully discover the correct clusters. Figure 10 shows the result on dataset 5 that has varied-density clusters with no sepa- ration among them. Figure 12. Clusters obtained from dataset 7 From Figures 11 and 12, one can find that the final result is the same as the result from EDBSCAN in step 2 of the proposed algorithm. This is because the clusters in the same part are not allowed to be merged, and each part has clusters that are totally separated from clusters in other parts. Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets601 From the experimental result, we can deduce that the proposed algorithm is able to discover clusters with varied shapes, sizes, and densities efficiently. It can easily be noticed that the result obtained by the proposed algori t hm is not exactly similar to the one of EDBSCAN on the entire dataset. Instead, the proposed algorithm produces additional small clusters that are discarded as noise by EDBSCAN on the entire dataset. This is not an irresolv- able problem. If we want to get the identical result of EDBSCAN, we should remove the small clusters as noise. The results of the original EDBSCAN on the entire data- sets are shown in Figure 13. Figure 13. The results from the original EDBSCAN on the entire datasets The proposed algorithm is more scalable than EDBS- CAN algorithm, the scalability of the proposed algorith m comes from partitioning the orig inal dataset into finite set of smaller datasets in the first stage, and depending on k representatives from each sub-cluster to get the actual clusters in the original dataset. The following Figure 14 depicts the execution time of the proposed algorithm and EDBSCAN without using an index structure like R*-tree or canopies. We use only the fist four datasets because the other datasets are very small. From Figure 14 it is noticed that the proposed algorithm (SEDBSCAN) rea- ches a speed up factor 5.25, 5, 5.33, an d 5.25 for datasets 1, 2, 3, and 4 respectively. 5. Conclusions This paper has introduced a scalable and efficient clus- tering algorithm for discovering clusters with varied shapes, sizes, and densities. The proposed algorithm has exploited all the advantages of previous different algo- rithms (e.g., the scalability of the k-means, and the ability of discovering clusters from uneven datasets of the EDBSCAN, and the idea of multiple representatives taken from the CURE algorithm, and finally the idea of local density taken from the DENCLUE Algorithm) and has overcame their disadvantages. So this algorithm collects ideas from partitioning, hierarchical, and density based methods. Generally speaking, combining all these ideas into the proposed algorithm allows it to be scalable and more efficient in discovering clusters of varied density. The experimental results have given a clear indication on the scalability of the proposed algorithm. Furthermore the proposed algorithm has better performance than EDBSC- AN with speed up factor up to 5 times. Additionally, the algorithm can be partially imp lemented in parallel (in the second stage) which help s in improv ing th e scalab ility b y a significant factor. Figure 14. The execution time comparison Copyright © 2010 SciRes. JSEA Scalable Varied Density Clustering Algorithm for Large Datasets Copyright © 2010 SciRes. JSEA 602 REFERENCES [1] J. MacLennan, Z. Tang and B. Crivat, “Data Mining with SQL Server 2008,” Wiley Publishing, Indiana, 2009. [2] M. Ester, H. P. Kriegel, J. Sander and X. Xu, “A Density Based Algorithm for Discovering Clusters in large Spatial Datasets with Noise,” Proceedings of International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226-231. [3] A. Hinneburg and D. Keim, “An Efficient Approach to Clustering in Large Multimedia databases with Noise,” Proceedings Int ernational Conf e r e n ce on Knowled g e Di s - covery and Data Mining, 1998, pp. 58-65. [4] M. Ankerst, M. Breunig, H. P. Kriegel and J. Sandler, “OPTICS: Ordering Points to Identify the Clustering Structure,” Proceedings of the International Conference on Management of Data (SIGMOD’99), 1999, pp. 49-60. [5] A. Fahim, G. Saake, A. Salem, F. Torkey and M. Rama- dan, “Enhanced Density Based Spatial clustering of Application with Noise,” in Proceedings of the 2009 Inter- national Conference on Data Mining, Las Vegas, July 2009, pp. 517-523. [6] C.-F. Tsai and C.-W. Liu, “KIDBSCAN: A New Efficient Data Clustering Algorithm,” Artificial Intelligence and Soft Computing-ICAISC, Springer, Berlin/Heidelberg, 2006, pp. 702-711. [7] R. Xin and C. H. Duo, “An Improved Clustering Algo- rithm,” International Symposium on Computatio na l Intel l- igence and Design, 2008, pp. 394-397. [8] Y. El-Sonbaty, M. Ismail and M. Farouk, “An Efficient Density Based Clustering Algorithm for Large Data- bases,” Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2004, pp. 673-677. [9] A. Fahim, A. Salem, F. Torkey and M. Ramadan, “An Efficient Enhanced k-Means Clustering Algorithm,” Journal of Zhejiang University Science A, Vol. 7, No. 10, 2006, pp. 1626-1633. [10] S. Guha, R. Rastogi and K. Shim, “CURE: An Efficient Clustering Algorithms for Large Databases,” Procee- dings of ACM SIGMOD International Conference on Management of Data, Seattle, 1998, pp. 73-84. [11] A. K. Jain, M. N. Murty and P. J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, Vol. 31, No. 3, September 1999, pp. 264-323. [12] L. Ertoz, M. Steinbach and V. Kumar, “A New Shared Nearest Neighbor Clustering Algorithm and its Appli- cations,” Worksho p on C lu steri ng High Dime nsion a l Data and its Applications at 2nd SIAM International Con- ference on Data Mining, 2002. [13] M. Emre Celebi, Y. Alp Aslandogan and P. R. Bergs- tresser, “Mining Biomedical Images with Density-Based Clustering,” Proceedings of the International Confer- ence on Information Technology: Coding and Computing, Washington, DC, IEEE Computer Society, Vol. 1, 2005, pp. 163-168. [14] J. Sander, M. Ester, H.-P. Kriegel and X. Xu “Density- Based Clustering in Spatial Databases: The Algorithm GDBSCAN and its Applications,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998, pp. 169-194. |