Journal of Software Engineering and Applications, 2013, 6, 16 doi:10.4236/jsea.2013.67B001 Published Online July 2013 (http://www.scirp.org/journal/jsea) New Topological Approaches for Data Granulation A. S. Salama, O. G. Elbarbary Department of Mathematics, Faculty of Science, Tanta University, Tanta, Egypt. Email: asalama@su.edu.sa, omniaelbarbary@yahoo.com Received March, 2013 ABSTRACT Data granulation is a good tool of decision making in various types of real life applications. The basic ideas of data granulation have appeared in many fields, such as interval analysis, quantization, rough set theory, DempsterShafer theory of belief functions, divide and conquer, cluster analysis, machine learning, databases, information retrieval, and many others. In this paper, we initiate some new topological tools for data granulation using rough set approximations. Moreover, we define some topological measures of data granulation in topological I formation systems. Topological generalizations using δβopen sets and their applications of information granulation are developed. Keywords: Knowledge Granulation; Topological Spaces; Rough Sets; Data Mining; Decision Making; Fuzzy Sets 1. Introduction Granulation of the universe involves the decomposition of the universe into parts. In other words, the grouping individual elements or objects into classes, based on offering information and knowledge [7,14,21,40,44,45]. Elements in a granule are pinched together by indiscerni bility, similarity, proximity or functionality [42,43]. One can thus form a granulated view of the universe. The theory of rough sets can be used for constructing a granulated vision of the universe and for interpreting, representing, and processing concepts in the granulated universe. It offers a more actual model of granular computing. The starting point of the theory of rough sets is the indiscernibility of objects or elements in a universe of concern [1522]. The customary approach for model ing indiscernibility of objects is through an equivalence relation defined based on their attribute values with reference to an information system [16]. Two objects are comparable if they have exactly the same description. The induced granulation is a part of the universe, i.e., A family of pairwise disjoint subsets. It is studied widely in mathematics under the name of quotient set. The notion of indiscernibility can be generalized by any general binary relation. The original rough set theory was based on an equivalent relation on a finite universe U. For practice use, there have been some extensions on it. One extension is to replace the equivalent relation by a arbitrary binary relation ; the other direction is to study rough set via topological method [8,14]. In this work, we construct topology for a family covering rough sets. After that, we study the relationship among upper approximations based on this topological space so that we can study data granulation by method of topology. In [40] Y.Y. Yao addressed four operators on a knowledge base, which are sufficient for generating new knowledge structures. Also, they addressed an axiomatic definition of knowledge granulation in knowledge bases. Rough set theory, proposed by Pawlak in the early 1980s [1720], is an expansion of set theory for the study of intelligent systems characterized by inexact, uncertain or insufficient information. Moreover, this theory may serve as a new mathematical tool to soft computing besides fuzzy set theory [4245], and has been successfully applied in machine learning, information sciences, expert systems, data reduction, and so on [3038,40]. In recent times, lots of researchers are interested to generalize this theory in many fields of applications [16,16,22]. But, partition or equivalence relation is still limiting for many applications. To study this matter, several interesting and having an important effect generalization to equivalence relation have been proposed in the past, such as tolerance relations, similarity relations [51], topological bases and subbases [1,2,6,22,23,2729] and others [3,4,16,20,25,26,33]. Particularly, some researchers have used coverings of the universe of discourse for establishing the generalized rough sets by coverings [11,15]. Others [912,16,27] combined fuzzy sets with rough sets in a successful way by defining rough fuzzy sets and fuzzy rough sets. Furthermore, another group has characterized a measure of the roughness of a fuzzy set making use of the concept of rough fuzzy sets [9,10,14,15,21,41,42]. They also suggested some possi Copyright © 2013 SciRes. JSEA
New Topological Approaches for Data Granulation 2 ble real world applications of these measures in pattern recognition and image analysis problems [13,24,30]. Topology is a significant and interesting area of mathematics, whose study introduces you to new concepts ( semiopen, pre open, δβopen sets and others) and theorems, which are very useful in many applica tions .Topological notions like semiopen, preopen, open sets are as basic to mathematicians of today as sets and functions were to those of last century [14,15,17]. Then, we think the topological structure will be so important base for knowledge extraction and processing. The topology induced by binary relations on the universes of information systems is used to generalize the basic rough set concepts. The suggested topological operations and structure open up the way for applying affluent more of topological facts and methods in the process of granular computing. In particular, the notion of topological membership function is introduced that integrates the concept of rough and fuzzy sets [17,4245]. 2. Essentials of Rough Set Approximations under General Binary Relations For any approximation space (,) UR, where is an equivalence relation, lower and upper approximations of a subset R U, namely ()RX and ()RX are defined as follows: (){ :[]} R RXx UxX , (){ :[]} R RXx UxX . The lower and upper approximations have the following properties: For every , YU from the approximation space (,) UR we have: 1. ()(), 2.()( ), 3.()( ), 4.()()(), RXX RX RURU U RR RXYRXRY 5.()()( ), 6.()()( ), 7.()()( ), 8. ()( ), 9. ()(), 10.( ( ))( ( ))( ), 11.(())(())(), 12.,()() () (). RX YRXRY RX YRXRY RX YRXRY RX RX RX RX RRXRRX RX RRX RRXRX IfXYthenRXR Y andRXR Y The equality in all properties happens when () ()RX RXX. The proof of all these properties can be found in [1720]. Furthermore, for a subset U, a rough membership function is defined as follows: [] () [] R XR X xx , where denotes the cardinality of the set . The rough membership value () X may be interpreted as the conditional probability that an arbitrary element belongs to given that the element belongs to [] . Based on the lower and upper approximations, the universe can be divided into three disjoint regions, the positive , the negative and the boundary , where: U ()POSX ()BND X()NEG X () () () () ()() () POSXRX NEG XUR X BND XR XR X Considering general binary relations in [18,52] is an extension to the classical lower and upper approxima tions of any subset of U. {: } x RxX is the base generated by the general relation defined in [18,52]. The general forms based on are defined as follows: () {:,} x RXBBB X , () {:,} x RXBB BX , where {: xBxB} . For data granulation by any binary relation, in [8] a rough membership function is defined as follows: () () Xx X x . 3. Data Granulation Using Equivalence and General Binary Relations By generalizing equivalence relations to general binary relations, one may obtain a different granulation of the universe. For any kind of relations, a pair of rough set approximation operators, known as lower and upper ap proximation operators can be defined in many ways [1720]. Let be an equivalence relation on finite nonempty universe U. The equivalence class, R RUU {:(,[]) } yU xyR consists of all elements equivalent to , and is also the equivalence class con taining . The relation R induces a partition of the uni verse namely U/{[]: R x xU}UR . The partition is regularly known as the quotient set and provides a granulated view of the universe under the equivalence classes of U. Naturally speaking, the available knowledge only allows us to talk about an equivalence class as a single unit. In other words, under the granulated view, we consider an equivalence class as a whole instead of individuals. The pair /RU (,) UR is referred to as an approximation space, indicating the in Copyright © 2013 SciRes. JSEA
New Topological Approaches for Data Granulation 3 tended application of the partition for approxima tion. Each equivalence class is called a simple granule. /UR A topological space [1,2,25] is a pair (,)X consisting of a set and a family of subset of satisfying the following conditions: (1) ,X , (2) is closed under arbitrary union, (3) is closed under finite intersection. The pair (,)X is called a topological space. The elements of are called points . The subsets of belonging to are called open sets. The complement of the open subsets are called closed sets. The family of all open subsets of is also called a topology for . is called () ={ai}clAFnd F :F Xs closedA  closure of a subset X. Obviously, is the smallest closed subset of ()cl A which contains . Note that is closed iff =() cl A. ()=intA {:ais openGX ndG }GA is called the interior of a subset X. Manifestly, is the union of all open subsets of ()int A which contained in . Make a note of that is open iff =() int A. is called the ()= ()(clA int)AbA  boundary of a subset X. For any subset of the topological space (,)X , , and are closure, interior, and boundary of ()cl A()int A()bA respectively. The subset is exact if ()bA= , otherwise is rough. It is clear that is exact iff . In Pawlak space a subset ()cl A=int()A X has two possibilities either rough or exact. In later years a number of generalizations of open sets have been considered [22,23]. We talk about some of these generalizations concepts in the following defini tions. Let be a finite universe set and is any binary relation defined on , and be the set of all el ements which are in relation to certain elements U R )xU(rR in from right for all U U ; ,) , in symbols where () {}xxR U,xrR {:(, } RyxyRxyU. Let be the general knowledge base (topological base) using all possible intersections of the members of . The component that will be equal to any union of some members of (rR x) must be misplaced. 4. Topological Granulation of Topological Information Systems Let (,) UR be an approximation space where is any binary relation defined on . Then we can define two new approximations as follows: R U ()( ()) XR RX , ()( ()) XR RX . The topological lower and the topological upper ap proximations have the following properties: For every , YU and every approximation space (,) UR we have: 1. () () XX , 2. () ()UU U , 3. () () , 4. ()()() YX Y , 5. ()()() YX Y , 6. ()()() YX Y , 7. ()()() YX Y , 8. () () X , 9. () () X , 10. (()) () X , 11. (()) () X , 12. ,()() fXY thenXY () ().and XY Given that topological lower and topological upper approximations satisfy that: () ()()()RXXXXRXU this enables us to divide the universe into five dis joint regions (granules) as follows: U 1. () ()POS XR X , 2. () ()()POSXXRX , 3. ()() ()BND XXX , 4. ()() ()NEGXRXX , 5. () ()NEG XURX . The following theorems study the properties and rela tionships among the above regions namely boundary, positive and negative regions. Theorem 4.1 let (,, ) ISU A be a topological in formation system and for any subset U we have: 1. ()()BND XX , 2. ()()BND XNEG X , 3. () ()() XBND X, 4. (), () NEG X and ()BND X are dis joint granules of . U Proof: directly. Theorem 4.2 let (,, ) ISU A be a topological in formation system and for any subsets , YU we have: 1. ()BND U , 2. ()( )BNDXBNDUX , 3. (())(BNDBND XBND X) , Copyright © 2013 SciRes. JSEA
New Topological Approaches for Data Granulation 4 ()() (BND XYBND XBNDY) Proof: (1) and (2) is obvious, by definitions. (()) (()()) (() ()) ((() ()) () () ()() ()() BNDBND X BNDXU X XUX UXUX XUX BND XBND XY ) YUXY Theorem 4.3 let (,, ) ISU A a topological infor mation system and for any subset , YU we have: 1. ()UNEG , 2. ()()NEGXUX , 3. ()XNEGX , 4. (())NEGUNEGXNEGX() , ()() (NEGXYNEGXNEGY) ()()(NEGXYNEG XNEGY) Proof: (1), (2), (3) and (4) are obvious. () () (()( (())(()) () () () () (())(()) (() ()) () (). NEGXY UXYU XY UXUY NEGXNEG Y NEGXNEG Y UXUY UXY UXY NEGXY )) Example 4.1 let 1234567 be the universe of 7 patients have data sheets shown in Table 1 with possible dengue symptoms. If some experts give us the general relation {,,,,,,}U uuuuuuu defined among those patients as follows: 1117223 3 36 4455 6677 {( ,),( ,),(,),(,), (,),(,),(,),(,),( ,)}. R uuuuuuuu uu uuuuuu uu Table 1. Patients information system. Conditional Attributes ( C) Decision (D) Patients (U) Temperature Flu Headache Dengue u1 Normal NoNo No u2 High NoNo No u3 Very High NoNo Yes u4 High NoYes Yes u5 Very High NoYes Yes u6 High Yes Yes Yes u7 Very High Yes Yes Yes The topological knowledge base will take the follow ing form: 172364567 {{,},{ },{ ,},{ },{},{ },{ }}uuuuuuuuu . For some patients 237 {, , } uuu the upper and lower approximations based on the topological knowledge base are given by: 12 3 67 (){, ,,,}RX uuuuu , and 27 {,}Ruu . By using the lower and upper approximations, the granules of universe are three disjoint regions as follows: 27 () (){,}POSXRXuu , 136 ()() (){,,}BNDXRXRXu uu , 45 () (){,}NEGXURXuu . According to the topological knowledge base we can easily see that: 1237 (){,,,} uuuu , 237 () {,,} uuu . Then we have the following granules of the universe: 1. (){2,7}POSXu u , 2. (){3}POSXu , 3. (){1}BND Xu , 4. () {6}NEGXu , 5. (){4,5}NEGXu u 5. Conclusions and Application Notes The rough set approach to approximation of sets leads to useful forms of granular computing that are part of com putational intelligence. The basic idea underlying the rough set approach and their topological generalizations to information granulation are to discover to what extent a given set of objects (these objects can be pixels of an image) approximates another set of objects of interest. Objects of definite universe are compared by considering their descriptions. The recent generalization of rough set theory has led to the introduction of topological rough set approaches [2426,35] and a consideration of the affini ties (topological nearness) of objects. REFERENCES [1] D. Andrijevic, Semipreopen sets and Mat. Vesnik, Vol. 38, 1986, pp. 2432. [2] A. S. Mashhour, M. E. Abd ElMonsef and S. N. ElDeeb, “On PreContinuous and Week Precontinuous Map pings,” Proc. Math. & phys. Soc. Egypt, Vol. 53, 1982, pp. 4753. [3] T. Nishino, M. Nagamachi and H. Tanaka, “Variable Precision Bayesian Rough Set Model and Its Application to Human Evaluation Data,” RSFDGrC 2005, LNAI 3641, Springer Verlag, 2005, pp. 294303. [4] T. Nishino, M. Sakawa, K. Kato, M. Nagamachi and H. Tanak, “Probabilistic Rough Set Model and Its Applica tion to Kansei Engineering, Transactions on Rough Sets V (Inter. J. of Rough Set Society)”, LNCS 4100, Springer, Copyright © 2013 SciRes. JSEA
New Topological Approaches for Data Granulation 5 2006, pp. 190206. [5] O. Njasted, “On Some Classes of Nearly Open Sets,” Pro. J. Math. Vol. 15, 1965, pp. 961970. [6] N. Levine, “Semi Open Sets and Semi Continuity Topo logical Spaces, “The American Mathematical Monthly,” Vol. 70, 1963, pp. 2432. doi:10.2307/2312781 [7] J. Y. Liang, J. H. Wang and Y. H. Qian, “A New Measure of Uncertainty Based on Knowledge Granulation for Rough Sets,” Information Sciences, Vol. 179, 2009, pp. 458470. doi:10.1016/j.ins.2008.10.010 [8] E. Lashein,, A. M. Kozae, A. A. Khadra and T. Medhat, “Rough Set Theory for Topological Spaces,” Interna tional Journal of Approximate Reasoning, Vol. 40, 2005, pp. 3543. doi:10.1016/j.ijar.2004.11.007 [9] T.Y. Lin, Granular Computing on Binary Relations I: data mining and neighborhood systems, II: rough set repre sentations and belief functions, In: Rough Setsin Knowl edge Discovery 1, L. Polkowski, A. Skowron (Eds.), Phys.Verlag, Heidelberg, 1998, pp. 10714. [10] T. Y. Lin, Y. Y. Yao and L. A. Zadeh, “Data Mining, Rough Sets and Granular Computing (Studies in Fuzzi ness and Soft Computing),” PhysicaVerlag, Heidel berg ,2002.doi:10.1007/9783790817911 [11] G. L. Liu and Y. Sai, “A Comparison of Two Types of Rough Sets Induced by Coverings,” International Journal of Approximate Reasoning, Vol. 50, 2009, pp. 521528. doi:10.1016/j.ijar.2008.11.001 [12] Y. Leung, M. M. Fischer, W.Z. Wu and J.S. Mi, “A Rough Set Approach for the Discovery of Classification Rules in IntervalValued Information Systems,” Interna tional Journal of Approximate Reasoning, Vol. 47, 2008, pp. 233246. doi:10.1016/j.ijar.2007.05.001 [13] G. L. Liu, “Axiomatic Systems for Rough Sets and Fuzzy Rough Sets,” International Journal of Approximate Rea soning, Vol. 48, 2008, pp. 857867. doi:10.1016/j.ijar.2008.02.001 [14] Z. Pei, D. W. Pei and L. Zheng, “Topology vs General ized Rough Sets,” International Journal of Approximate Reasoning, Vol. 52, No. 2, 2011, pp. 231239. doi:10.1016/j.ijar.2010.07.010 [15] Z. Pei, D. W. Pei and L. Zheng, “Covering Rough Sets Based on Neighborhoods an Approach without Using Neighborhoods,” International Journal of Approximate Reasoning, Vol. 52 , 2011, pp. 461472. doi:10.1016/j.ijar.2010.07.010 [16] L. Polkowski and A. Skowron, “Towards Adaptive Cal culus of Granules,” Proceedings of 1998 IEEE Inter. Conf. on Fuzzy Sys., 1998, pp. 111116. [17] Z. Pawlak and A. Skowron, “Rough Sets and Boolean Reasoning,” Information Sciences, Vol. 177, 2007, pp. 4173. doi:10.1016/j.ins.2006.06.007 [18] Z. Pawlak and A. Skowron, “Rough Sets: Some Exten sions,” Information Sciences, Vol. 177, 2007, pp. 2840. doi:10.1016/j.ins.2006.06.006 [19] Z. Pawlak and A. Skowron, “Rudiments of Rough Sets,” Information Sciences, Vol. 177, No.1, 2007, pp. 327. doi:10.1016/j.ins.2006.06.003 [20] Z. Pawlak, “Rough sets,” International Journal of Com pute r＆ Information Sciences, Vol. 11, 1981, pp. 341356. doi:10.1007/BF01001956 [21] Y. H. Qian, J. Y. Liang and C. Y. Dang, “Knowledge Structure, Knowledge Granulation and Knowledge Dis tance in a knowledge base,” International Journal of Ap proximate Reasoning, Vol. 50 , 2009, pp. 174188. doi:10.1016/j.ijar.2008.08.004 [22] Y. H. Qian, J. Y. Liang, Y. Y. Yao and C. Y. Dang, “MGRS: A MultiGranulation Rough Set,” Information Sciences, Vol. 180, 2010, pp. 949970. doi:10.1016/j.ins.2009.11.023 [23] Q. H. Hu, J. F. Liu and D. R. Yu, “Mixed Feature Selec tion Based on Granulation and Approximation,” Knowl edgebased system, Vol. 21, 2008, pp. 294304. [24] A. S. Salama, “Topologies Induced by Relations with Applications,” Journal of Computer Science, Vol. 4, 2008, pp. 879889. doi:10.3844/jcssp.2008.877.887 [25] A. S. Salama, “Two New Topological Rough Operators,” Journal of Interdisciplinary Math,” Vol. 11, No. 1, New Delhi Taru Publications, INDIA, 2008, pp. 110. [26] A. S. Salama, “Topological Solution for Missing Attrib ute Values in Incomplete Information Tables,” Informa tion Sciences, Vol. 180, 2010, pp. 631639. doi:10.1016/j.ins.2009.11.010 [27] D. J. Spiegelhalter, K. R. Abrams and J. P. Myles, “Bayesian Approaches to Clinical Trials and HealthCare Evaluation,” John Wiley & Sons Ltd, The Atrium, South ern Gate, Chichester, England, 2004. [28] D. Slezak and W. Ziarko, “Attribute Reduction in the Bayesian Version of Variable Precision Rough Set Mod el,” In: Proc. of RSKD, ENTCS, Vol. 82, 2003, pp. 414. [29] D. Slezak, W. Ziarko, “The Investigation of the Bayesian Rough Set Model,” International Journal of Approximate Reasoning, Vol. 40, 2005, pp. 8191. doi:10.1016/j.ijar.2004.11.004 [30] D. Slezak, “The Rough Bayesian Model for Distributed Decision Systems,” RSCT 2004, LNAI 3066, Springer Verlag, 2004, pp. 384393. [31] D. Slezak, “Rough Sets and Bayes factors,” Transactions on Rough Set III, Lecture Notes Computer Science, Vol. 3400, 2005, pp. 202229. doi:10.1007/11427834_10 [32] D. Slezak and W. Ziarko, Bayesian Rough Set Model, In: Proc. of the Int. Workshop on Foundation of Data Mining (FDM 2002), December 9, Maebashi, Japan, 2002, pp. 131135. [33] D. Slezak and W. Ziarko, “Variable Precision Bayesian Rough Set Model,” RSFDGrC 2003, LNAI 2639, Springer Verlag, 2003, pp. 312315. [34] R. R. Yager, “Comparing Approximate Reasoning and Probabilistic Reasoning Using the Dempster–Shafer Framework,” International Journal of Approximate Rea soning, Vol. 50, 2009, pp. 812821. doi:10.1016/j.ijar.2009.03.003 [35] E. A. Rady, A. M. Kozae and M. M. E. Abd ElMonsef, “Generalized Rough Sets,” Chaos, Solitons, & Fractals, Vol. 21, 2004, pp. 4953.doi:10.1016/j.chaos.2003.09.044 Copyright © 2013 SciRes. JSEA
New Topological Approaches for Data Granulation Copyright © 2013 SciRes. JSEA 6 [36] Y. Y. Yao, “Constructive and Algebraic Methods of The ory of Rough Sets,” Information Sciences, Vol. 109, 1998, pp. 2147. doi:10.1016/S00200255(98)000127 [37] Y. Y. Yao, “Relational Interpretations of Neighborhood Operators and Rough Set Approximation Operators,” In formation Sciences, Vol. 111, 1998, pp. 239259. doi:10.1016/S00200255(98)100063 [38] Y. Yang and R. I. John, “Generalizations of Roughness Bounds in Rough Set Operations,” International Journal of Approximate Reasoning, Vol. 48, 2008, pp. 868878. doi:10.1016/j.ijar.2008.02.002 [39] Y. Yao and Y. Zhao, “Attribute Reduction in Deci sionTheoretic Rough Set Models,” Information Sciences, Vol. 178, 2008, pp. 33563373. doi:10.1016/j.ins.2008.05.010 [40] Y. Y. Yao, “Granular Computing using Neighborhood Systems,” in: Advances in Soft Computing: Engineering Design and Manufacturing, R. Roy, T. Furuhashi, and P. K. Chawdhry (Eds.), SpringerVerlag, London, 1999, pp. 539553. [41] A. M. Zahran, “Regularly Open Sets and a Good Exten sion on Fuzzy Topological Spaces,” Fuzzy Sets and Sys tems, Vol. 116, 2000, pp. 353359. doi:10.1016/S01650114(98)001390 [42] L. A. Zadeh, “Fuzzy Sets and Information Granularity,” In: Advances in Fuzzy Set Theory and Applications, Gupta, N., Ragade, R. and Yager, R. (Eds.), North Hol land, Amsterdam, 1979, pp. 318. [43] L. A. Zadeh, “Towards a Theory of Fuzzy Information Granulation and its Centrality in Human Reasoning and Fuzzy Logic,” Fuzzy Sets and Systems, Vol. 19, 1997, pp. 111127. doi:10.1016/S01650114(97)000778 [44] L. A. Zadeh, “Generalized Theory of Uncertainty (GTU)— Principal Concepts and Ideas,” Computational Statistics & Data Analysis, Vol. 51, No. 1, 2006, pp. 1546. doi:10.1016/j.csda.2006.04.029 [45] L. A. Zadeh, “Toward a PerceptionBased Theory of Probabilistic Reasoning with Imprecise Probabilities,” Journal of Statistical Planning and Inference, Vol. 105, No. 1, 2002, pp. 233264. doi:10.1016/S03783758(01)002129
