Soft computing is a combination of methods that complement each other when dealing with ambiguous real-life decision systems. Rough Set Theory (RST) is a soft computing technique that extends the idea of classical sets to deal with incomplete knowledge and provides a mechanism for concept approximation. It uses reducts to isolate the key attributes affecting outcomes in decision systems. The paper summarizes two algorithms for reduct calculation. Moreover, different software packages are available to automate the application of RST. The paper provides a survey of the packages most frequently used to perform data analysis based on Rough Sets. For the benefit of researchers, a comparison based on the functionalities of those packages is also provided.
One common aspect among the fields of machine learning, decision analysis, data mining and pattern recognition is that all of them deal with imprecise or incomplete knowledge. As a result, it is imperative that appropriate data processing tools be employed when researching computational intelligence and reasoning systems [
Since its development Rough Set Theory has been able to devise computationally efficient and mathematically sound techniques for addressing the issues of pattern discovery from databases, formulation of decision rules, reduction of data, principal component analysis, and inference interpretation based on available data [
The rest of the paper is organized as follows: Section 2 presents a brief review of rough sets, reducts and several algorithms used to compute reducts. Section 3 presents a survey of a number of software packages used to automate the application of RST. These include Rosetta, RSES, Rose2, Rough Sets, and WEKA. Section 4 compares the different components of the packages surveyed in Section 3. The conclusion, future work and references are presented at the end.
The Rough Set Theory has had a significant impact in the field of data analysis and as a result has attracted the attention of researchers worldwide. Owing to this research, various extensions to the original theory have been proposed and areas of application continue to widen [
While studying decision systems, all researchers confront the question of dropping some (superfluous) condition attributes without altering the basic properties of the system [
The idea can be more precisely stated as: Let C,
where
Any given information system may have a number of reduct sets. The collection of all reducts is denoted as
For many tasks, such as feature selection, it is necessary to search for the reduct that has the minimum cardinality (
As Komorowski [
attributes may be equal to
of the bottleneck of the rough set methodology. Fortunately, several algorithms have been developed to calculate reducts, particularly in cases where the information system is large and involves a number of attributes [
The Johnson algorithm is a well-known approach to calculating reducts and extracting decision rules from a decision system [
This algorithm treats the attribute that occurs most frequently as the most significant. Although this is not true in all cases, the algorithm generally finds a solution that is optimal or close to it. The application of both algorithms presented here to decision systems can be automated. The software packages used for this purpose are described in the following section.
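The greedy idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used by any of the surveyed packages. It assumes that "occurring most frequently" refers to the attribute's frequency in the discernibility sets of object pairs with different decisions. The function names, the tie-breaking rule and the small `flu_table` are hypothetical and serve only the example. A greedy result of this kind is not guaranteed to be a minimal reduct.

```python
from collections import Counter
from itertools import combinations


def discernibility_sets(rows, condition_attrs, decision_attr):
    """For every pair of objects with different decisions, collect the
    condition attributes on which the two objects differ."""
    sets = []
    for a, b in combinations(rows, 2):
        if a[decision_attr] == b[decision_attr]:
            continue
        differing = frozenset(c for c in condition_attrs if a[c] != b[c])
        if differing:  # pairs that no attribute can separate are skipped
            sets.append(differing)
    return sets


def johnson_reduct(rows, condition_attrs, decision_attr):
    """Greedy sketch: repeatedly pick the attribute that appears in the
    largest number of still-uncovered discernibility sets."""
    remaining = discernibility_sets(rows, condition_attrs, decision_attr)
    reduct = []
    while remaining:
        counts = Counter(attr for s in remaining for attr in s)
        # Ties are broken by the order of condition_attrs (an assumption).
        best = max(condition_attrs, key=lambda c: counts[c])
        reduct.append(best)
        remaining = [s for s in remaining if best not in s]
    return reduct


# Hypothetical toy decision table, used only for illustration.
flu_table = [
    {"headache": "yes", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "high", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "temperature": "high", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
]
conditions = ["headache", "muscle_pain", "temperature"]
reduct = johnson_reduct(flu_table, conditions, "flu")
print(reduct)
```

In this toy table `temperature` is selected first because it distinguishes the most pairs of objects with different decisions; one further attribute is then enough to separate the remaining pair.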
To apply RST to datasets, a number of software systems have been developed by computer scientists across the globe. This development can be attributed to the successful application of rough sets to data mining and knowledge discovery. A brief review of the most commonly used software is presented. Details of the software can be obtained by referring to the existing literature or contacting the respective authors [
Rough Sets is a free package for the R language that facilitates data analysis using techniques put forth by Rough Set and Fuzzy Rough Set Theories. It provides implementations not only of the basic concepts of RST and FRST but also of popular algorithms derived from those theories.
The development of the package involved Lala Septem Riza and Andrzej Janusz as Authors; Dominik Ślęzak, Chris Cornelis, Francisco Herrera, Jose Manuel Benitez and Sebastian Stawicki as Contributors; and Christoph Bergmeir as Maintainer. The functionalities provided by the package include Discretization, Feature selection, Instance selection, Rule induction, and Classification based on nearest neighbors. The main functionalities are summarized in
RSES is a toolset used for the analysis of data using the concepts of Rough Set Theory. It has a user-friendly Graphical User Interface (GUI) that runs under the MS Windows® environment. The interface provides access to the methods provided by the RSESlib library, the core computational kernel of RSES [
As stated on their website, the system was designed and implemented as a result of research on Rough Sets led by Andrzej Skowron (Project Supervisor) and a team involving Jan Bazan, Nguyen Hung Son, Marcin Szczuka, Rafał Latkowski, Nguyen Sinh Hoa, Piotr Synak, Arkadiusz Wojna, Marcin Wojnarski and Jakub Wróblewski. RSESlib is a library that provides functionalities for performing a number of data exploration tasks including:
Decomposition of large data sets into fragments that have the same properties.
Manipulation of data.
Discretization of numerical attributes.
Calculation of reducts.
Generation of decision rules.
Search for hidden patterns in data.
The library has been implemented in C++ and Java. The development took place between 1994 and 2005. The first version of the library, after several extensions, was included in the computational kernel of the Rosetta system.
The Rosetta system (Rough Set Toolkit for Analysis of Data) is a toolkit for analyzing datasets in tabular form using Rough Set Theory [
Rosetta has been developed as a general-purpose tool for modelling based on discernibility and is not geared towards any particular application domain. This is the reason why it has been used by a large community of scientists [
Rosetta has been developed by two groups: the Knowledge Systems Group at the Norwegian University of Science and Technology, Trondheim, Norway, and the Group of Logic, Institute of Mathematics, University of Warsaw, Poland, under the guidance of Jan Komorowski and Andrzej Skowron [
1) Data import/export.
a) Ability to integrate with other DBMS using ODBC.
b) Export of tables, graphs, induced rules, reducts, etc. to a variety of formats, including plain text, XML, Matlab and Prolog.
2) Pre-processing.
a) Discretization of numerical attributes.
b) Completion of missing values in decision tables using different algorithms.
c) Partition of data into training and testing groups using random number generators.
3) Computation.
a) Efficient computation of reducts.
b) Provides support for supervised and unsupervised learning.
c) Generation of IF-THEN rules using reducts.
d) Execution of script files.
e) Support for cross validation.
4) Post processing.
a) Advanced filtering of sets of reducts and rules.
5) Validation and analysis.
a) Application of induced rules on testing data.
b) Generation of confusion matrices and ROC curves.
c) Supports statistical hypothesis testing.
6) Miscellaneous.
a) Performs clustering using tolerance relations.
b) Computes variable precision rough set approximations.
c) Support for random sampling of observations.
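The generation of IF-THEN rules from a reduct, listed under Computation above, can be illustrated with a short Python sketch. It is not Rosetta's procedure. It assumes a reduct is already known and emits one rule for every combination of reduct values whose objects all share the same decision. The function names, the table and the chosen reduct are hypothetical.

```python
from collections import defaultdict


def induce_rules(rows, reduct_attrs, decision_attr):
    """Group objects by their values on the reduct attributes and emit a
    rule for each group whose objects all share one decision value."""
    decisions_by_key = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in reduct_attrs)
        decisions_by_key[key].add(row[decision_attr])
    rules = []
    for key, decisions in decisions_by_key.items():
        if len(decisions) == 1:  # inconsistent groups yield no certain rule
            rules.append((dict(zip(reduct_attrs, key)), next(iter(decisions))))
    return rules


def format_rule(conditions, decision, decision_attr):
    """Render a (conditions, decision) pair as readable IF-THEN text."""
    lhs = " AND ".join(f"{attr} = {value}" for attr, value in conditions.items())
    return f"IF {lhs} THEN {decision_attr} = {decision}"


# Hypothetical toy decision table and reduct, used only for illustration.
flu_table = [
    {"headache": "yes", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "high", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "temperature": "high", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
]
assumed_reduct = ["headache", "temperature"]
rules = induce_rules(flu_table, assumed_reduct, "flu")
for conditions, decision in rules:
    print(format_rule(conditions, decision, "flu"))
```

Practical tools typically shorten and filter such rules further; the sketch keeps every rule in full to stay close to the definition.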
ROSE (Rough Sets Data Explorer) is another software package that implements Rough Set Theory and other techniques for rule discovery [
Rose2 provides a number of tools for knowledge discovery based on rough sets (shown in
reducts of attributes, inducing sets of decision rules from rough approximations of decision classes and using them as classifiers, and evaluating sets of rules on testing data in classification experiments.
The project WEKA, Waikato Environment for Knowledge Analysis [
A comparison of the different components offered by Rough Sets, Rose2, Rosetta, RSES, and WEKA is provided in
Components | Rough Sets | Rose2 | Rosetta | RSES | WEKA |
---|---|---|---|---|---|
Technique | RST, FRST | RST | RST | RST | RST, … |
Programming Language | R | C++ | C++ | Java/C++ | Java |
Operating System | Windows/Linux/Mac | Windows | Windows | Windows/Linux | Windows/Linux/Mac |
User Interface | Script | GUI | GUI | GUI | GUI |
Basic Concepts | Yes | Yes | No | No | No |
Discretization | Yes | Yes | Yes | Yes | No |
Feature Selection | Yes | Yes | No | Yes | Yes |
Instance Selection | Yes | No | Yes | No | Yes |
Missing Value Completion | No | Yes | Yes | Yes | Yes |
Decomposition | No | No | No | No | No |
Rule-Based Classifiers | Yes | Yes | Yes | Yes | Yes |
Nearest Neighbour Based Classifiers | Yes | No | Yes | No | Yes |
Cross Validation | No | Yes | Yes | Yes | Yes |
The components listed include:
Technique used in the package.
Programming language used to develop the package.
The Operating System that the package supports.
The type of user interface provided.
Whether or not the package provides calculation of the basic concepts of rough sets, such as lower and upper approximations, boundary sets, etc.
Whether the package provides the facility of feature/instance selection.
Can the package divide the data into training and test sets as per requirement of the user?
Can the package induce decision rules based on reducts?
Can data be classified on the basis of nearest neighbor based algorithms?
Does the package provide the facility of cross validation to determine the accuracy and reliability of classification?
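The basic concepts named in the list above (lower approximation, upper approximation and boundary region) can be computed directly from their standard definitions. The Python sketch below is illustrative only and does not reproduce the interface of any surveyed package. The function names and the small table are hypothetical, and objects are identified by their row index.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Partition row indices into classes of objects that are
    indiscernible with respect to the given attributes."""
    classes = defaultdict(set)
    for index, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(index)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Return (lower, upper, boundary) of the target set of row indices."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # class lies entirely inside the concept
            lower |= block
        if block & target:    # class overlaps the concept
            upper |= block
    return lower, upper, upper - lower


# Hypothetical toy decision table, used only for illustration.
flu_table = [
    {"headache": "yes", "muscle_pain": "yes", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "flu": "yes"},
]
# Concept to approximate: the objects whose decision is flu = no.
no_flu = {i for i, row in enumerate(flu_table) if row["flu"] == "no"}
lower, upper, boundary = approximations(
    flu_table, ["headache", "muscle_pain"], no_flu
)
print(lower, upper, boundary)
```

With these two attributes only one object can be classified as "no flu" with certainty; every other object falls in the boundary region, which is what makes the concept rough with respect to this attribute set.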
Soft Computing lies at the foundation of computational and conceptual intelligence. It exploits the tolerance of imprecision, uncertainty and partial information to mimic the human mind's ability to reason and make decisions. Rough Set Theory is an adaptable technique that uses approximation sets to represent a vague concept. The calculations of RST can be cumbersome for large datasets, but many existing software packages can be used effectively to automate them. A number of software packages have been briefly presented, together with the main functionalities they provide.
Our future work will explore application of RST on real datasets using some of the software presented and formulation of a step-by-step guide for other researchers to explore and adapt.
Zain Abbas, Aqil Burney (2016) A Survey of Software Packages Used for Rough Set Analysis. Journal of Computer and Communications, 04, 10-18. doi: 10.4236/jcc.2016.49002