Journal of Intelligent Learning Systems and Applications
Vol.09 No.03(2017), Article ID:78687,12 pages
10.4236/jilsa.2017.93004
Classification Based on Invariants of the Data Matrix
Vladimir N. Shats
St. Petersburg, Russia

Copyright © 2017 by author and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/



Received: May 16, 2017; Accepted: August 21, 2017; Published: August 24, 2017
ABSTRACT
The paper proposes a solution to the problem classification by calculating the sequence of matrices of feature indices that approximate invariants of the data matrix. Here the feature index is the index of interval for feature values, and the number of intervals is a parameter. Objects with the equal indices form granules, including information granules, which correspond to the objects of the training sample of a certain class. From the ratios of the information granules lengths, we obtain the frequency intervals of any feature that are the same for the appropriate objects of the control sample. Then, for an arbitrary object, we find object probability estimation in each class and then the class of object that corresponds to the maximum probability. For a sequence of the parameter values, we find a converging sequence of error rates. An additional effect is created by the parameters aimed at increasing the data variety and compressing rare data. The high accuracy and stability of the results obtained using this method have been confirmed for nine data set from the UCI repository. The proposed method has obvious advantages over existing ones due to the algorithm’s simplicity and universality, as well as the accuracy of the solutions.
Keywords:
Artificial Intelligence, Classification Algorithms, Granular Computing

1. Introduction
The classification problem is the central problem in machine learning, and methods for solving it are dealt with in a considerable number of research papers, which is constantly growing. Nevertheless, the analysis of modern methods, which is adequately described in [1] [2] [3] , shows that some essential particulars of this task have not been considered.
The problem regards the set of real objects, the patterns of which are represented by the feature vectors. The composition of features that describes the objects is, to a certain extent, random, and in some cases, the list can be changed or shortened. In addition, feature values contain random errors of measurement or observation. The influence of the inevitable uncertainty in the relationship between the real object and its model (pattern) is further increased, since the given information is divided between the object and its model, and neither of them is fully defined.
However, the existing methods have been unable to consider these factors as they use mathematical tools within the framework of formalization of pure mathematics. Such approaches have another drawback: in solving the problem, one must proceed from the assumption of existence of a metric in the feature space and a probability density function for the objects of each class.
At the same time, all objects have feature values that are very similar or equal, so the object classes differ in the probability density functions for features, but not for objects. Yet only a generalized function can accurately consider the discontinuous densities of the features. Therefore, each of the existing methods is applied in a restricted area whose boundaries can be established, as a rule, only experimentally.
Studies in recent decades regarding the principles of information processing in complex systems open up new possibilities for solving the problem and eliminate these gaps in the theory. Most of these works were triggered by soft computing theory and were based on the concept of an organism as a granular system. It is assumed that the system consists of holons or granules, which simultaneously represent a single entity and its part in the larger system on different hierarchical levels [4] [5] [6] [7] . Objects, subsets of programs or intelligent agents are examples of such entities. Here, the key issue is the operation for forming and separating granules.
Granular computing is a paradigm of research in the field of artificial intelligence. It covers multiple process modeling concepts of information processing in various hierarchical systems, as well as new approaches to learning with fuzzy databases [8] [9] . In this respect, the paradigm has common roots with the methods of machine learning. In the general theory of pattern recognition [10] , the remaining unfinished, objects are “split” into elements of multiple hierarchical levels linked by combinatorial relations, and the simplest standard blocks are at the lowest level. Recently, granulation has been used to solve the problem of classification based on two existing machine-learning methods [11] .
The present work is based on the concept of soft computing L. Zadeh, according to which the human mind considers the “comprehensive inaccuracy of the real world” [12] [13] . The author supposes that flows of inaccurate and partially correct information enters the brain through visual, auditory, tactile and other sensors, and the brain selects only the information that is related to a specific task. This information corresponds to the original information “with a minimum degree of accuracy”.
This concept is reflected in the proposed method, which builds upon a new approach to feature description. The range of each feature value is divided into n equal intervals. Each interval can contain anywhere from zero to several objects forming granules. The length of the granules randomly depends on the ordinal interval, which is called index of the corresponding feature. Then, each object will be uniquely described by its index vector, and the data matrix will be transformed into an index matrix. For each index, we can calculate the ratio between the number of objects from a certain class and the total number of objects, which defines the index frequencies and provides the sampling density estimate. These results allow us to establish the class of any object and the value of the parameter n that provides an acceptable margin of error.
The effectiveness of adopted approach is explained by the fact that a given set of data defines a hierarchically organized system in which there are relations of the whole and part between its elements: features, objects and classes. The mechanism of operation of such a system is determined by frequencies of interaction of granules in accordance with the simplest formulas of probability theory.
Note that the uncertainty of initial data of the problem is taken into account indirectly under all transformations. They lead to random changes in the description of any object, but the relation between the object and its class remains the same. Therefore, we can assume that the granulation is based on an approximate calculation of the data matrix invariants whose role, with known error, is played by the index matrices.
The article summarizes and develops previously completed studies [14] [15] . The method has been cross-checked on nine data sets from the UCI repository [16] .
2. Transformation of Original Information
The article is devoted to the classification problem, in which training and control samples of real objects
are considered. They are described with errors by feature vectors
and a distribution of objects into
disjoint classes. Objects have features of quantitative, categorical or mixed types, and there are no missing data. The task is to find an object classification rule for the training sample and evaluate its applicability to the control sample.
To identify objects and their patterns q, we will use the sequence of numbers
, where t and
are the number of objects in the training sample and the total number of objects in the combined sample, respectively. Then,
and
will represent an object s, its pattern and the
feature, respectively. Let
denote the data matrix for the combined sample, and
be the list of object numbers of class
.
This problem relates to the field of artificial intelligence and has specific relevant peculiarities. Here we are dealing with two entities of the arbitrary object number s: a real-world object
and information representing its pattern
that contains errors. It follows that pattern, specified in the task, is one of the realizations of the random variable (see. Figure 1), and there is a set of patterns of its possible values. Each of the vectors of this set could replace the corres-
Figure 1. Scheme of interrelations between the real object, its class and the set of its patterns.
ponding row of the matrix
. Note that these observations are consistent with the concept of soft computing.
Hence, the set
would be represented by another data matrix, and there are infinite set such matrices. These matrices share a common property: they can be regarded as invariants of the matrix
in relation to the class of objects, although the concept of an invariant is defined for deterministic quantities. Therefore, it is advisable to find invariants that simplify the algorithm for solving the problem.
For the implementation of the relevant mapping, we will consider the given data set as a system that processes multi-level data with a hierarchical structure
. We need to find an approximate mapping of this set into another set such that the result of partitioning it into subsets could be estimated using the frequencies of the features.
Let us make two general remarks concerning the calculation of granules.
1) It should be noted that the frequencies of the features do not depend on the technique used to identify the feature values, such as magnitude, number or any other identifiers. On this basis, we can transform the system of reference for the features value and use any types of features, including mixed, in the calculations.
In particular, for a non-quantitative feature, we should establish relationship of partial order on the set of options for its values under the arbitrary rule of their numbering. The value 


2) The training set is designed to reduce the level of uncertainty of our knowledge of the properties of objects of each class. The uncertainty is measured by the value of information entropy, and it is obvious that increasing the entropy of each feature improves the solution quality. The maximum value of entropy is equal to
To implement the ideas in both of the comments, we will first calculate a matrix that is of a convenient form and has sufficient accuracy to represent information contained in the conditions of the task. To this end, we will sequentially apply randomization and indexing of information.
The randomization procedure is designed to transform the feature values 






We assume that the interval 









Now we can clarify the concept of the index. The value 









In these mappings, the initial data for the problem are significantly transformed and boundaries of not quantitative features are become fuzzy. The ongoing changes could be illustrated by the example of the vector of an object with dissimilar features: 

3. Design Formulas
It follows from the definition of index that 




Then we can partition the combined sample into subsets called granules that contain objects of ordered pairs. Of particular interest is a subset of training sample objects of a certain class, which we will call information granules.
To calculate the composition of the granules, we establish on the set 



As a result of these transformations, the values of features will be measured on a single scale with a division value equal to one index. Therefore, the matrix 
Let 








Since the appearance of each of the 

From (1) it follows that 

The estimation of the class of the object s is found using the obvious formula

The relation 

Figure 2. The scheme for calculating the indices and information granule 

respectively, characterizes the algorithm’s ability to study and classify objects. The quality of teaching as a function of length of the training sample t is estimated by overage error rates 




4. Existence and Accuracy of the Solution
We will consider the issue of convergence for the sequence of error rates. Let the granule 








In this case for 









Classes can differ significantly in the convergence of error rates of learning 



Figure 3. The impact of the variety parameter n on the error rate of training 

Figure 4. The impact of the variety parameter n on the error rate of classification 

Nevertheless, the high accuracy of training does not guarantee acceptable classification accuracy since training is the first step in determining the class. Within the second step, it is necessary to evaluate the accuracies of solutions for the control sample objects based on relations (1) and (2), according to which any object for each n is characterized by the frequency


Here we face a problem of overtraining: if we restrict the value of n, it is possible to obtain a more accurate solution for the control sample, but the reliability of this result will be decreased because simultaneously the accuracy of learning will be lower. It is obvious that the simplicity of the algorithm reduces the computational complexity and severity of this problem, as well as allows us to estimate the effects of the characteristics of the matrix
The impact of the parameter



Now consider the joint impact of n and 









5. The Results of Solving Particular Tasks
The information about data sets from the repository UCI is given in Table 1, where 

The algorithm of the method is based on the data grouping that predetermines the decreasing influence of various kinds of outliers. Calculations have shown that the solution is stable for an infinite set of acceptable solutions: small oscillations of the parameters 


The calculations have confirmed that for sufficiently large


Figure 5. The impact of the length of the training sample t of the data set Car evaluation on the error rate of classification 
Table 1. Parameters of data sets.
aObjects were numbered via a random number generator. b Objects with data errors are not considered.
the lower boundary of the value range of these parameters, under which the error rates are 

The results were verified by 10-fold cross-validation, in which splits of combined sample 





From Table 1 it following that, in most cases, 




This analysis can greatly simplify the process of solving practical problems, in which, instead of a control sample, a test sample is specified, for which the distribution of classes is unknown. Now, we can use tabulated values of 



Another important result was obtained by testing the method experimentally. A direct application of the above algorithm for the data set Adult gives the minimum values of 



To mitigate of noted effects of rare information, we will introduce the additional procedure of compression, or pre-granulation, which is also based on the above considerations about the data reliability. Now, the values 











After the additional transformation, the solution of task Adult has remained stable and errors are now closer to zero. The corresponding table data calculated at
This procedure has proven effective in addressing a number of other data sets, for example, Letter Image Recognition. It can also be observed as a way to reduce the values of 
6. Conclusions
The paper proposes a methodology for solution classification problems, the core of which is an approximate calculation of the invariants of data matrix. The new approach implements the concepts of soft computing and granulation and is biologically inspired. In essence, it reduces to the transformation of all measured scale features, in which the values of features, called indexes, are defined according to a single scale in the new units.
Developed methodology is based on procedures of randomization and indexation of the data set (and, in some cases, also a pre-granulation procedure), which generate an infinite sequence of index matrices. These matrices are invariants of the data matrix in relation to a class of objects. They provide error-free training and allow us to calculate the object class under the simplest formulas of total probability for any single type or mixed types of features.
The proposed method differs from existing ones by the universality and simplicity of the algorithm and, as a rule, almost an order of magnitude higher accuracy.
The obtained results go beyond the problem of classification and have independent significance for the solutions of the problems of data analysis. It can be expected that the method will receive a hardware implementation, and its extension to multi-level data will lead to the development of effective image recognition systems and information retrieval. The application of the method does not require mathematical education, which increases its innovative potential.
Cite this paper
Shats, V.N. (2017) Classification Based on Invariants of the Data Matrix. Journal of Intelligent Learning Systems and Applications, 9, 35-46. https://doi.org/10.4236/jilsa.2017.93004
References
- 1. Bishop, C. (2006) Pattern Recognition and Machine Learning. Springer, Berlin, 738.
- 2. Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, And Prediction. Springer-Verlag, Berlin, 764. https://doi.org/10.1007/978-0-387-84858-7
- 3. Murphy, K. (2012) Machine Learning. A Probabilistic Perspective. MIT Press, Cambridge, London, 1098.
- 4. Zadeh, L. (1979) Fuzzy Sets and Information Granularity. In: Gupta, N., Ragade, R. and Yager, R., Eds., Advances in fuzzy set theory and applications, World Scientific Publishing, Amsterdam, 3-18.
- 5. Calabrese, M., Amato, A., Lecce, V. and Piuri, V. (2010) Holonic-Granularity Holonic Modelling. Journal of Ambient Intelligence and Humanized Computing, 1, 199-209. https://doi.org/10.1007/s12652-010-0013-3
- 6. Lecce, V., Calabrese, M. and Martines C. (2015) Towards Intelligent Sensor Evolution: A Holonic-Based System Architecture. The Sixth International Conference on Sensor Technologies and Applications, SENSORCOMM 2012, Rome, 19-24 August, 2012, 248-254.
- 7. Yao, J., Vasiliacos, V. and Pedrycz, W. (2013) Granular Computing: Perspective and Challenges. IEEE Transactions on Cybernetics, 43, 1977-1989. https://doi.org/10.1109/TSMCC.2012.2236648
- 8. Xu, W. and Li, W. (2016) Granular Computing Approach to Two-Way Learning Based on Formal Concept Analysis in Fuzzy Datasets. IEEE Transactions Cybernetic, 46, 366-378. https://doi.org/10.1109/TCYB.2014.2361772
- 9. Li, J., Mei, C., Xu, W. and Qian, Y. (2015). Concept Learning via Granular Computing: A cognitive Viewpoint. Information Sciences, 298, 447-467. https://doi.org/10.1016/j.ins.2014.12.010
- 10. Grenander, U. (1976) Lectures on Pattern Theory 1: Pattern Synthesis. Springer-Verlag, New York-Heidelberg-Berlin. https://doi.org/10.1007/978-1-4612-6369-2
- 11. Yao, J. and Yao, Y. (2002) A Granular Computing Approach to Machine Learning. Proceedings of the 1st International Conference on Fuzzy Systems and Knowledge Discovery, Singapore, 18-22 November 2002, 732-736.
- 12. Zadeh, L. (1973) Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Transactions on Systems, MAN, and Cybernetics, 1, 28-44. https://doi.org/10.1109/TSMC.1973.5408575
- 13. Kruse, R., Borgelt, C., Klownnn, F., Steinbrecher, M. and Held, P. (2013) Computational Intelligence: A Methodological Introduction. Springer-Verlag, London, 490. https://doi.org/10.1007/978-1-4471-5013-8
- 14. Shats, V.N. (2015) On New Computing Technology in Machine Learning. International Conference on Artificial Neural Networks “Neuroinformatics-2015”, Moscow, 19-23 January 2015, 2, 148-158.
- 15. Shats, V.N. (2016) Invariants of Matrix Data in a Classification Problem. Stochastic Optimization in Informatics, SPbGU, 12, 17-32. http://www.math.spbu.ru/user/gran/optstoch.htm
- 16. Asuncion, A. and Newman, D.J. (2007) UCI Machine Learning Repository. Irvine University of California, Irvine.
- 17. Ashby, W.R. (1957) An Introduction to Cybernetics. Chapman and Hall, London, 294. https://doi.org/10.1063/1.3060436
- 18. Cios, K.J. and Kurgan, L. (2004) CLIP4: Hybrid Inductive Machine Learning Algorithm That Generates Inequality Rules. Informing Science, 163, 37-83. https://doi.org/10.1016/j.ins.2003.03.015







