J. Software Engineering & Applications, 2010, 3: 384-390
doi:10.4236/jsea.2010.34043 Published Online April 2010 (http://www.SciRP.org/journal/jsea)
Copyright © 2010 SciRes JSEA

Exploring Design Level Class Cohesion Metrics

Kuljit Kaur, Hardeep Singh

Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, India.
Email: kuljitchahal@yahoo.com

Received November 17th, 2009; revised December 15th, 2009; accepted January 25th, 2010.

ABSTRACT

In the object oriented paradigm, cohesion of a class refers to the degree to which members of the class are interrelated. Metrics have been defined to measure cohesiveness of a class at both the design and source code levels. In comparison to source code level class cohesion metrics, only a few design level class cohesion metrics have been proposed. Design level class cohesion metrics are based on the assumption that if all the methods of a class have access to similar parameter types then they all process closely related information. A class with a large number of parameter types common to its methods is more cohesive than a class with fewer parameter types common to its methods. In this paper, we review the design level class cohesion metrics with a special focus on metrics which use similarity of parameter types of methods of a class as the basis of its cohesiveness. Three metrics fall in this category: Cohesion among Methods of a Class (CAMC), Normalized Hamming Distance (NHD), and Scaled NHD (SNHD). Keeping in mind the anomalies in the definitions of the existing metrics, a variant of the existing metrics, named NHD Modified (NHDM), is introduced. An automated metric collection tool is used to collect the metric data from an open source software program. The metric data is then subjected to statistical analysis.

Keywords: Design Metrics, Class Cohesion Metrics, Cohesion among Methods of a Class, Normalized Hamming Distance, Scaled NHD

1. Introduction

In the object oriented paradigm, cohesion of a class refers to the degree to which members of the class are interrelated. Chidamber and Kemerer defined the first metric to measure cohesiveness of a class [1]. Since then, several class cohesion metrics have been proposed (discussed in the next section). Empirical studies report that class cohesion metrics are useful to assess software design quality [2,3], to predict fault proneness of classes [4-6], and to identify reusable components [7,8].

Existing class cohesion metrics mainly fall into two categories: metrics which can be computed at design level (high level) and metrics which can be computed one step later, i.e. at source code level (low level). Design level class cohesion metrics use the limited amount of information available about a class at this level, i.e. only the class attributes and method signatures. Method implementation is not completely defined at design level, so some assumptions are made. Different class cohesion metrics defined at design level are based on different assumptions.

1) One school of thought assumes that the types of method parameters match the types of the attributes accessed by the method. It is further assumed that the set of attribute types accessed by a method is the intersection of this method’s parameter types and the set of parameter types of all the methods in the class [9,10].
2) Another school of thought assumes that the set of attribute types accessed by a method is the intersection of the set of this method’s parameter types and the set of its class attribute types [11].

In this paper we review the design level class cohesion metrics based on the first assumption. Keeping in mind the anomalies in the definitions of the existing metrics, a variant of the metrics is introduced.

The paper is organized as follows: Section 2 reviews the related work.
Section 3 explains the existing design level class cohesion metrics and introduces a modified version as well. Section 4 presents the statistical analysis of the data collected from an open source project. Section 5 concludes the paper.

2. Related Work

A number of class cohesion metrics are defined in the low level metrics category [1,12-26]. However, there are only a few proposals for design level class cohesion metrics [9-11].

The metric named Cohesion among Methods of a Class (CAMC) captures information about the parameter types of the methods of a class [9]. A class is cohesive if all methods of the class use the same set of parameter types. Methods which use the same parameter types are assumed to process related information. CAMC metric values lie in the range [0, 1]. Counsell et al. point out some anomalies in the definition of this metric, and propose a new metric named Normalized Hamming Distance (NHD) [10]. It is a normalized metric which measures the average agreement between each pair of methods on their parameter types. A variant of the NHD metric called Scaled NHD (SNHD) is introduced in the same paper. It addresses shortcomings of both CAMC and NHD, as claimed by the authors [10]. This research finds anomalies in the definitions of NHD and SNHD as well, and proposes a modified version of the NHD metric, NHD Modified (NHDM). The NHDM metric gives statistically significant results.

Dallal proposes another metric for measuring cohesion of a class at design level [11]. The Similarity based Class Cohesion (SCC) metric is based on the second assumption discussed above. This metric is not analyzed in this paper, as the automated tool developed for this research does not support collection of this metric.

3. Design Metrics

This section describes the class cohesion metrics computable with information available at design level.
At design level, information regarding the name of the class, its attributes (names and data types), and its method signatures is available. A method signature includes the name of the method and its parameter list, which describes the names of the parameters and their data types. A class does not have a detailed or algorithmic description of its methods available at this level.

3.1 CAMC

The CAMC metric measures the extent of intersection of individual method parameter type lists with the parameter type list of all methods in the class [9]. This metric computes the relatedness among methods of a class based upon the parameter lists of the methods. It is assumed that methods of a class having access to similar parameter types process closely related information. The CAMC metric uses a parameter-occurrence matrix (PO matrix) that has a row for each method and a column for each data type that appears at least once as the type of a parameter in at least one method in the class. The value in row i and column j of the matrix is 1 when the ith method has a parameter of the jth data type and is 0 otherwise. In the original version of the metric [9], the PO matrix has an additional column of all 1s. This column represents the ‘self’ parameter, which corresponds to the type of the class itself and is by default one of the parameters of every method. In this discussion, the original version of the metric is referred to as CAMCs (Cohesion among Methods of a Class with ‘self’ parameter) and the metric definition without the ‘self’ parameter is named CAMC [10].

The CAMC metric is defined as the ratio of the total number of 1s in the PO matrix to the total size of the matrix:

$$\mathrm{CAMC}(C) = \frac{\sigma}{kl}, \quad \text{where } \sigma = \sum_{i=1}^{k} \sum_{j=1}^{l} PO[i][j]$$

Here k is the number of methods (rows) and l is the number of parameter types (columns) of the PO matrix.

CAMC suffers from the following anomalies:

1) CAMC gives false positives: the metric gives a non-zero value for a class with no parameter sharing among its methods.
2) CAMC cannot differentiate between two classes having the same number of 1s but different patterns of 1s in their PO matrices.
3) Smaller classes take higher values of the cohesion metric than larger classes with the same properties.

3.2 NHD

Counsell et al. [10] suggested an alternative to CAMC. It is based on the definition of Hamming distance. NHD measures agreement between rows in the PO matrix. The NHD metric for a class with k methods and l unique parameter types (the set obtained from the union of parameter types received by all its methods) is defined as:

$$\mathrm{NHD} = \frac{2}{lk(k-1)} \sum_{j=1}^{k-1} \sum_{i=j+1}^{k} a(i,j)$$

where a(i,j) is the number of parameter types on which methods i and j agree, that is, the number of columns of the PO matrix in which rows i and j hold the same value. Another easy way to compute NHD is to first find the sum of disagreements between methods over all the parameter types, normalize it, and then subtract it from 1:

$$\mathrm{NHD} = 1 - \frac{2}{lk(k-1)} \sum_{j=1}^{l} c_j (k - c_j)$$

where c_j is the number of 1s in the jth column of the PO matrix.

A variant of NHD with the ‘self’ parameter, NHDs, can be defined for a PO matrix with an additional column of all 1s.

NHD suffers from the following anomalies:

1) The NHD metric also gives false positives. The metric removes the first anomaly of CAMC for a class with k = l = 2, but fails to give the correct answer for higher values of k and l (e.g. when k = l = 3 and there is no parameter sharing among methods, the NHD metric gives a non-zero value).
2) NHD does not give different answers for classes with different properties: the metric fails to distinguish a class with no parameter sharing among its methods from a class with a substantial amount of parameter sharing among its methods.
3) Class size influences the metric value. As the size of the class increases, the value of the NHD metric also increases (even if the PO matrix gets sparser).

3.3 SNHD

SNHD is the Scaled NHD metric, proposed to interpret values of the NHD metric over a more varied range.
Proponents of the NHD metric are of the opinion that the NHD metric can take values at two extremes: the minimum or the maximum. But they admit that it is not clear which of these extremes represents a cohesive class. However, without giving any clear explanation, they state that classes at both extremes may be cohesive. They define these extreme values as NHDmin and NHDmax respectively [10]. The SNHD value indicates how close the NHD value is to its maximum relative to its minimum. SNHD is defined as follows:

$$\mathrm{SNHD} = \begin{cases} 0, & \text{if } NHD_{min} = NHD_{max} \text{ and } kl > 1 \\ 1, & \text{if } kl = 1 \\ 2\left(\dfrac{NHD - NHD_{min}}{NHD_{max} - NHD_{min}}\right) - 1, & \text{otherwise} \end{cases}$$

The SNHD metric values lie in the range [-1, 1]. SNHD = -1 implies that NHD = NHDmin, and SNHD = 1 implies that NHD = NHDmax. NHD is closer to its minimum or maximum value depending upon whether SNHD takes values close to -1 or +1 respectively. A class is considered non-cohesive if the SNHD metric value for the class is 0. SNHDs is defined by considering the ‘self’ parameter.

SNHD suffers from these anomalies:

1) It is difficult to calculate and interpret.
2) False negatives: the SNHD metric gives a value of 0 for a class with a good degree of cohesion.

3.4 NHDM

Keeping in view the anomalies of the cohesion metrics discussed above, this research proposes a variation of the NHD metric. This variant is named the Normalized Hamming Distance Modified (NHDM) metric. The NHD metric counts only those method pairs which disagree in a column of the PO matrix, and treats all other method pairs alike, irrespective of whether they agree on a 0 or on a 1. NHDM additionally counts the method pairs which agree on a 0 as a disagreement. NHDM for a class with k methods and l unique parameter types, over all its methods, is defined as:

$$\mathrm{NHDM} = 1 - \frac{2}{lk(k-1)} \sum_{j=1}^{l} \left( c_j (k - c_j) + \frac{1}{2} z_j (z_j - 1) \right)$$

where c_j is the number of ones and z_j is the number of zeroes in the jth column of the PO matrix for the class.
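The definitions of CAMC, NHD and NHDM given in this section can be compared on a small example. The following is a minimal sketch, not the CohMetric tool used in this study: the function names (`po_matrix`, `camc`, `nhd`, `nhdm`) and the sample method signatures are hypothetical, a(i,j) is taken to be the number of columns on which two rows agree, and the class is assumed to have at least two methods and at least one parameter type.

```python
from itertools import combinations


def po_matrix(signatures):
    """Build the parameter-occurrence (PO) matrix from {method: [parameter types]}.

    Rows follow the insertion order of the methods; columns are the
    distinct parameter types in sorted order.
    """
    types = sorted({t for params in signatures.values() for t in params})
    return [[1 if t in params else 0 for t in types]
            for params in signatures.values()]


def camc(po):
    """CAMC: number of 1s in the PO matrix divided by its size k * l."""
    k, l = len(po), len(po[0])
    return sum(map(sum, po)) / (k * l)


def nhd(po):
    """NHD: normalized count of agreements (on a 0 or a 1) over all row pairs."""
    k, l = len(po), len(po[0])
    agreements = sum(
        sum(1 for a, b in zip(row_i, row_j) if a == b)
        for row_i, row_j in combinations(po, 2)
    )
    return 2 * agreements / (l * k * (k - 1))


def nhdm(po):
    """NHDM: like NHD, but pairs that agree on a 0 also count as disagreement."""
    k, l = len(po), len(po[0])
    penalty = 0.0
    for column in zip(*po):
        c = sum(column)   # number of 1s in this column
        z = k - c         # number of 0s in this column
        penalty += c * (k - c) + z * (z - 1) / 2
    return 1 - 2 * penalty / (l * k * (k - 1))


# Hypothetical class with k = l = 3 and no parameter sharing at all.
no_sharing = po_matrix({
    "setTitle": ["String"],
    "resize": ["Rectangle"],
    "draw": ["Graphics"],
})
print(round(camc(no_sharing), 3), round(nhd(no_sharing), 3), round(nhdm(no_sharing), 3))
```

For this class, CAMC and NHD both evaluate to 1/3 even though no two methods share a parameter type, while NHDM evaluates to 0, which illustrates the false-positive anomaly described for CAMC and NHD.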
Similarly, NHDMs is defined by including the ‘self’ parameter in the PO matrix.

This metric removes the anomalies present in the definitions of the CAMC, NHD, and SNHD metrics. NHDM does not give false positives. It gives different results for classes with different properties. NHDM metric values are independent of the class size.

4. Data Analysis

The cohesion metrics discussed above are collected from an open source software system available at www.sourceforge.net. The software is a Java based charting library, and it consists of 884 classes. For automated collection of the metrics, a tool named CohMetric is developed.

4.1 Descriptive Analysis

Histograms in Figures 1 to 4 show the metric distributions. Table 1 presents the descriptive statistics. It can be observed that the majority of the CAMC metric values lie close to 0 (see Figure 1). On average a class’s cohesion value is 0.21. The NHD metric takes values in a higher range (Figure 2). The average NHD metric value is 0.66. SNHD is most frequently 0. Its values lie more on the left side of 0, which implies that the majority of the classes have NHD closer to NHDmin than to NHDmax. The average SNHD for a class is -0.43 and the standard deviation is also very high (Figure 3). NHDM takes very low values (Figure 4). For the majority of the classes it is 0. Its average value is just 0.05. As stated earlier, this may be because NHDM does not give false positives.

Figure 1. Distribution of CAMC metric
Figure 2. Distribution of NHD metric
Figure 3. Distribution of SNHD metric
Figure 4. Distribution of NHDM metric

Table 1. Descriptive statistics for cohesion metrics

| Metric | Average | Std Dev | Metric | Average | Std Dev |
|--------|---------|---------|--------|---------|---------|
| CAMC   | 0.21    | 0.18    | CAMCs  | 0.48    | 0.21    |
| NHD    | 0.66    | 0.21    | NHDs   | 0.81    | 0.12    |
| SNHD   | -0.43   | 0.51    | SNHDs  | 0.63    | 0.42    |
| NHDM   | 0.05    | 0.16    | NHDMs  | 0.38    | 0.22    |
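The difference between each metric and its ‘self’ variant in Table 1 follows from the extra all-ones column in the PO matrix. The sketch below illustrates this for CAMC and NHDM on a hypothetical class with no parameter sharing; the helper `with_self` and the other function names are assumptions made for illustration and are not taken from the CohMetric tool. It assumes a class with at least two methods.

```python
def with_self(po):
    """Return a copy of the PO matrix with an extra all-ones 'self' column."""
    return [row + [1] for row in po]


def camc(po):
    """CAMC: number of 1s in the PO matrix divided by its size k * l."""
    k, l = len(po), len(po[0])
    return sum(map(sum, po)) / (k * l)


def nhdm(po):
    """NHDM as defined in Section 3.4, computed column by column."""
    k, l = len(po), len(po[0])
    penalty = 0.0
    for column in zip(*po):
        c = sum(column)   # number of 1s in this column
        z = k - c         # number of 0s in this column
        penalty += c * (k - c) + z * (z - 1) / 2
    return 1 - 2 * penalty / (l * k * (k - 1))


# Hypothetical PO matrix: three methods, three parameter types, no sharing.
po = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]]
po_s = with_self(po)

print("CAMC :", round(camc(po), 3), " CAMCs :", round(camc(po_s), 3))
print("NHDM :", round(nhdm(po), 3), " NHDMs :", round(nhdm(po_s), 3))
```

In this example CAMC rises from 1/3 to 0.5 and NHDM from 0 to 0.25 once the ‘self’ column is added, because every pair of methods now agrees on at least one parameter type.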
4.2 Metric Variants

Variants of these cohesion metrics are defined on the basis of the assumption that all the methods of a class by default receive the class type itself (‘self’) as one of the parameter types. CAMCs, NHDs, SNHDs and NHDMs are defined as variants of CAMC, NHD, SNHD, and NHDM respectively. Cohesion metrics which consider the ‘self’ parameter are expected to give higher values, as the class methods agree on at least one parameter type. Table 1 gives a comparison of the averages of the cohesion metrics and their variants. All the metrics which consider the ‘self’ parameter type have higher averages than their counterparts. This is also confirmed by the distributions of these metrics shown in Figures 5 to 8. It is worth noting that SNHDs takes values in the range from 0 to 1 more frequently, in contrast to SNHD, which takes values in the range from 0 to -1. It implies that a class whose NHD value is closer to NHDmax is more cohesive.

Figure 5. Distribution of CAMCs metric
Figure 6. Distribution of NHDs metric
Figure 7. Distribution of SNHDs metric
Figure 8. Distribution of NHDMs metric

4.3 Size Independence

Figures 9 to 12 present the relation between the cohesion metrics and class size (measured in terms of number of methods). The CAMC metric value is higher for small classes and lower for large classes. NHD takes large values for classes with a larger number of methods. This is in line with the earlier findings about these two metrics [10]. As shown in Figure 11, SNHD is close to 1 for some comparatively small classes. For larger classes, SNHD lies in the range [-1, 0]. NHDM takes values near 0 for most of the classes, although small classes have metric values in the higher range. However, if the size of the parameter occurrence (PO) matrix is taken into consideration, it is found not to have a significant correlation with any of the metrics (see Table 2). Here l represents the number of parameter types, k is the number of methods of the class, and lk is the size of the parameter occurrence matrix. This result is unlike the previous studies on these metrics [10,27].

Figure 9. Scatter diagram of CAMC and class size
Figure 10. Scatter diagram of NHD and class size
Figure 11. Scatter diagram of SNHD and class size
Figure 12. Scatter diagram of NHDM and class size

4.4 Metrics Inter-Dependencies

The parametric Pearson’s correlation coefficient between each pair of cohesion metrics is given in Table 3. All the correlation figures are significant at the p = 0.01 level. Metric variants such as CAMCs, NHDs, SNHDs, and NHDMs are moderately correlated with their counterparts. NHD and NHDs have the highest correlation coefficient in this category. NHDM and CAMC are strongly correlated. Similar is the case for their variants NHDMs and CAMCs. SNHD is moderately correlated with NHDMs and CAMCs. Unlike the previous studies, the correlation analysis for this data set does not show any significant correlation between NHD and CAMC [10,27]. However, the scatter plot of values for these two metrics, given in Figure 13, shows a negative trend. CAMC is very low for the classes for which NHD is very high. On average the NHD metric takes values in a higher range. This implies that this metric pair does not have a linear co-variation.

Principal Component Analysis (PCA) is used to identify the metrics measuring orthogonal dimensions. Rotated principal components are obtained using the varimax rotation technique. Three principal components are extracted which capture 93.28% of the data set variance (shown in Table 4).
Metrics with significant loading coefficients in a particular dimension are highlighted in bold. An analysis of the table shows that NHDMs, CAMCs and SNHD contribute significantly to the first dimension, PC1. SNHDs is moderately significant in two dimensions, PC1 and PC2. NHD and NHDs both load significantly on PC2. NHDM and CAMC both load significantly on PC3. It is worth mentioning here that NHDM and NHDMs have the maximum variance among all the metrics in this analysis. So the metrics measuring different dimensions are:

PC1: NHDMs, CAMCs
PC2: NHD, NHDs
PC3: NHDM, CAMC

Table 2. Correlation between cohesion metrics and size

|    | CAMC  | CAMCs | NHD  | NHDs | SNHD  | SNHDs | NHDM  | NHDMs |
|----|-------|-------|------|------|-------|-------|-------|-------|
| l  | -.222 | -.696 | .337 | .003 | -.356 | -.539 | -.128 | -.645 |
| k  | -.307 | -.520 | .350 | .219 | -.071 | -.179 | -.177 | -.429 |
| lk | -.158 | -.357 | .210 | .138 | .067  | -.223 | -.075 | -.298 |

Table 3. Correlation analysis among metrics

|       | CAMC   | CAMCs  | NHD    | NHDs  | SNHD  | SNHDs  | NHDM  |
|-------|--------|--------|--------|-------|-------|--------|-------|
| CAMCs | 0.575  |        |        |       |       |        |       |
| NHD   | -0.267 | -0.542 |        |       |       |        |       |
| NHDs  | -0.372 | -0.024 | 0.623  |       |       |        |       |
| SNHD  | 0.341  | 0.654  | -0.043 | 0.334 |       |        |       |
| SNHDs | -0.253 | 0.347  | 0.271  | 0.678 | 0.478 |        |       |
| NHDM  | 0.854  | 0.466  | 0.122  | 0.107 | 0.403 | -0.043 |       |
| NHDMs | 0.456  | 0.962  | -0.356 | 0.249 | 0.726 | 0.520  | 0.480 |

Figure 13. Scatter diagram showing the correlation between CAMC and NHD metrics

Table 4. Principal Components Matrix

|                    | PC1      | PC2      | PC3      |
|--------------------|----------|----------|----------|
| Eigen Value        | 3.72     | 2.39     | 1.35     |
| Percent            | 46.45    | 29.93    | 16.90    |
| Cumulative percent | 46.45    | 76.38    | 93.28    |
| CAMC               | 0.25     | -0.32    | **0.90** |
| CAMCs              | **0.91** | -0.27    | 0.30     |
| NHD                | -0.39    | **0.89** | 0.11     |
| NHDs               | 0.27     | **0.90** | -0.13    |
| SNHD               | **0.84** | 0.21     | 0.27     |
| SNHDs              | 0.66     | 0.59     | -0.31    |
| NHDM               | 0.24     | 0.15     | **0.95** |
| NHDMs              | **0.95** | -0.01    | 0.25     |

5. Conclusions

Cohesion is one of the important design properties needed to realize a quality software product. Many empirical studies exist which relate the cohesion design property with other properties of interest such as maintainability, reusability, and reliability. Several metrics have been proposed to compute cohesion at class level in object oriented systems.
In this paper, design level cohesion metrics such as CAMC, NHD, and SNHD have been investigated using empirical data. In view of the anomalies present in the existing metrics’ definitions, a modified version of the NHD metric is proposed and is named NHDM (NHD Modified). Statistical analysis of the metrics data shows that CAMC and NHD are influenced by the size of the class (measured in terms of number of methods). None of the studied metrics correlates with the size of the Parameter Occurrence matrix (PO matrix) of the class.

Principal Component Analysis of the data shows that NHDM and CAMC both give similar results, but NHDM has more variation in its values. Similar is the case for NHDMs and CAMCs. SNHD or SNHDs does not contribute significantly to any dimension. NHD and NHDs are not significantly related to any of the other metrics.

REFERENCES

[1] S. Chidamber and C. Kemerer, “Towards a Metrics Suite for Object Oriented Design,” Proceedings of 6th ACM Conference on Object Oriented Programming, Systems, Languages and Applications, Phoenix, Arizona, 1991, pp. 197-211.
[2] L. Briand, J. Wust, J. Daly and D. Porter, “Exploring the Relationships between Design Measures and Software Quality in Object Oriented Systems,” Journal of Systems and Software, Vol. 51, No. 3, 2000, pp. 245-273.
[3] J. Bansiya and C. Davis, “A Hierarchical Model for Object Oriented Quality Assessment,” IEEE Transactions on Software Engineering, Vol. 28, No. 1, 2002, pp. 4-17.
[4] T. Gyimothy, R. Ferenc and I. Siket, “Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction,” IEEE Transactions on Software Engineering, Vol. 31, No. 10, 2005, pp. 897-910.
[5] Z. Zhou and H. Leung, “Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults,” IEEE Transactions on Software Engineering, Vol. 32, No. 10, 2006, pp. 771-789.
[6] A. Marcus and D. Poshyvanyk, “Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented System,” IEEE Transactions on Software Engineering, Vol. 34, No. 2, 2008.
[7] J. Lee, S. Jung, S. Kim, W. Jang and D. Ham, “Component Identification Method with Coupling and Cohesion,” Proceedings of the Eighth Asia-Pacific Software Engineering Conference, December 2001, pp. 79-86.
[8] G. Gui and D. Scott, “Measuring Software Component Reusability by Coupling and Cohesion Metrics,” Journal of Computers, Vol. 4, No. 9, Academy Publishers, 2009, pp. 797-805.
[9] J. Bansiya, L. Etzkorn, C. Davis and W. Li, “A Class Cohesion Metric for Object Oriented Designs,” Journal of Object Oriented Programming, Vol. 11, No. 8, 1999, pp. 47-52.
[10] S. Counsell, S. Swift and J. Crampton, “The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design,” ACM Transactions on Software Engineering and Methodology, Vol. 15, No. 2, 2006, pp. 123-149.
[11] J. Dallal, “A Design-Based Cohesion Metric for Object-Oriented Classes,” Proceedings of the International Conference on Computer and Information Science and Engineering, 2007, pp. 301-306.
[12] W. Li and S. Henry, “Object-Oriented Metrics that Predict Maintainability,” Journal of Systems and Software, Vol. 23, No. 2, 1993, pp. 111-122.
[13] S. Chidamber and C. Kemerer, “A Metrics Suite for Object Oriented Design,” IEEE Transactions on Software Engineering, Vol. 20, 1994, pp. 476-493.
[14] M. Hitz and B. Montazeri, “Measuring Coupling and Cohesion in Object-Oriented Systems,” Proceedings of International Symposium on Applied Corporate Computing, 1995.
[15] J. Bieman and B. Kang, “Cohesion and Reuse in an Object-Oriented System,” Proceedings of the 1995 Symposium on Software Reusability, ACM Press, 1995, pp. 259-262.
[16] B. Henderson-Sellers, L. Constantine and I. Graham, “Coupling and Cohesion (towards a Valid Metrics Suite for Object-Oriented Analysis and Design),” Object Oriented Systems, Vol. 3, 1996, pp. 143-158.
[17] L. Briand, J. Daly and J. Wust, “A Unified Framework for Cohesion Measurement in Object-Oriented Systems,” Empirical Software Engineering, Vol. 3, No. 1, 1998, pp. 65-117.
[18] H. Chae, Y. Kwon and D. Bae, “A Cohesion Measure for Object-Oriented Classes,” Software Practice and Experience, Vol. 30, No. 12, 2000, pp. 1405-1431.
[19] Z. Chen, Y. Zhou and B. Xu, “A Novel Approach to Measuring Class Cohesion Based on Dependence Analysis,” Proceedings of the International Conference on Software Maintenance, 2002, pp. 377-384.
[20] L. Badri and M. Badri, “A Proposal of a New Class Cohesion Criterion: An Empirical Study,” Journal of Object Technology, Vol. 3, No. 4, 2004.
[21] J. Wang, Y. Zhou, L. Wen, Y. Chen, H. Lu and B. Xu, “DMC: A More Precise Cohesion Measure for Classes,” Information and Software Technology, Vol. 47, No. 3, 2005, pp. 176-180.
[22] C. Bonja and E. Kidanmariam, “Metrics for Class Cohesion and Similarity between Methods,” Proceedings of the 44th Annual Southeast Regional Conference, ACM Press, New York, 2006, pp. 91-95.
[23] G. Cox, L. Etzkorn and W. Hughes, “Cohesion Metric for Object-Oriented Systems Based on Semantic Closeness from Disambiguity,” Applied Artificial Intelligence, Vol. 20, No. 5, 2006, pp. 419-436.
[24] L. Fernández and R. Peña, “A Sensitive Metric of Class Cohesion,” International Journal of Information Theories and Applications, Vol. 13, No. 1, 2006, pp. 82-91.
[25] S. Makela and V. Leppanen, “Client Based Object Oriented Cohesion Metrics,” 31st Annual International Computer Software and Applications Conference, Vol. 2, 2007, pp. 743-748.
[26] A. Marcus and D. Poshyvanyk, “The Conceptual Cohesion of Classes,” Proceedings of 21st IEEE International Conference on Software Maintenance, 2005, pp. 133-142.
[27] J. Dallal and L. Briand, “An Object-Oriented High-Level Design-Based Class Cohesion Metric,” Simula Technical Report (2009-1), Version 2, Simula Research Laboratory, 2009.