Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains

doi:10.4236/jcc.2017.57014

Journal of Computer and Communications
Vol.05 No.07(2017), Article ID:76403,17 pages

Ying Dai


Faculty of Software and Information Science, Iwate Prefectural University, Takizawa, Japan

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: April 18, 2017; Accepted: May 21, 2017; Published: May 24, 2017

ABSTRACT

In order to improve the performance of classifiers in subjective domains, this paper defines a metric to measure the quality of the subjectively labelled training data (QoSTD) by means of K-means clustering. The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. Moreover, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. Binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set, using the support vector machine (SVM) and discriminant analysis (DA) models, to verify the effectiveness of the proposed method. The experimental results show that the consistency of the likelihoods of instances with the corresponding observations increases notably for the classes, especially when the training data set has a relatively low QoSTD. The experimental results also indicate how to eliminate mislabelled instances from the training data set in order to retrain the classifiers in subjective domains.

Keywords:

Quality Assessment, Subjective Domain, Multimodal Sensor Data, Label Noise, Likelihood Adjusting, TCM Zheng

1. Introduction

Recently, much research has aimed at predicting the status of individuals in subjective domains, including their emotional states, their health, and their personality, by using a set of training data acquired from a variety of sensors and interpreted or labelled by a first person or a third person [1] [2] [3]. As described by C. E. Brodley in [4], label noise emerges in the training data sets of such domains for several reasons, including data-entry errors, inadequacy of the information used to label each object, and, especially, uncertainty of the states. In [5], a statistical taxonomy of label noise inspired by [6] is summarized. Three kinds of models are defined: the noisy completely at random model, the noisy at random model, and the noisy not at random model. All of these models assume that the true classes exist, and whether a labelling error occurs is indicated by a binary variable. However, in the above domains of emotion, health, or personality, the absolute ground truth is unknown. The subjects, whether first person or third person, provide labels subjectively as their own choices, so that class-uncertain label noise naturally appears, whoever provides the labels, a skilled expert or a nonprofessional. For example, in the domain of Traditional Chinese Medicine (TCM), the status of health is described fundamentally by 13 Zhengs, which are diagnosed by TCM doctors based on information acquired through the five senses. Further, the disease severity of each of these 13 Zhengs is scored according to subjective observation. Accordingly, the scores of the 13 Zhengs labelled by TCM doctors are ambiguous, because the absolute ground truth of the 13 Zhengs is unknown.

Many methods have been proposed to deal with label noise. In the literature, there are three main approaches [5]. The first focuses on building algorithms that are robust to label noise; the second tries to improve the quality of the training data by identifying mislabeled data; the third, label noise-tolerant learning, aims at building a label noise model simultaneously with a classifier, which uncouples both components of the data generation process and improves the results of the classifier. All of these methods rest on the premise that the label errors are independent of the ground truth of the classes, that is, the ground truth of the classes exists although the observed labels are contaminated for some reason. However, this premise does not hold for subjective domains, because the observed labels are also influenced by the uncertainty of the classes.

In order to deal with the label noise caused by subjective labelling, especially in domains where the ground truth is uncertain, we define a metric, QoSTD, that is intended to measure the quality of training data with uncertain labels for classification in subjective domains, so as to predict states such as a person’s emotion and health. QoSTD is an aggregation of two components that reflect the clustering and partitioning ability of the training data set. The training data include the features extracted from the multimodal sensor data of the subjects, the subjective scores of various items in a first-person questionnaire, and the observation scores of classes in a subjective domain, which are provided by third persons. By using this metric, we can analyze the influence of subjectively labelled data on the quality of the training data, and we can estimate the sufficiency of the training data for classification. When the QoSTD for a particular class is less than a predetermined value, the training data for this class cannot deliver the required classification performance.

We trained binary classifiers for the states based on the support vector machine (SVM) model and the discriminant analysis (DA) model, so as to validate the relation between QoSTD and classification performance. Furthermore, the QoSTD is used as a weight on the predicted class scores to adjust the likelihoods of instances that have no absolute ground truth. To evaluate the effectiveness of QoSTD in dealing with the label noise brought by subjective labelling, we used the TCM Zheng training data set from [7] and [8] for the experiments. The experimental results show that the proposed method notably improves the consistency of the likelihoods of instances with the corresponding observations for the classes, especially when the training data set has a relatively low QoSTD. The experimental results also indicate how to eliminate mislabelled instances from the training data set in order to retrain the classifiers in subjective domains.

2. Related Works

The literature contains many studies on classification in the presence of label noise [4]. In [5], a method for identifying and eliminating mislabeled training instances for supervised learning is proposed. The paper focuses on the issue of determining whether or not to apply filtering to a given data set. However, for the work described in the paper, the data were artificially corrupted. Therefore, the application of this method to relatively noise-free data sets should not significantly impact the performance of the final classification procedure. Moreover, the authors indicated that a future direction of this research would be to extend the filter approach to correct labeling errors in training data. However, it is difficult to judge labeling errors in subjective domains, because the absolute ground truth is unknown. In [9], the authors propose to use an unsupervised mixture model into which the supervised information is introduced, so as to compare the supervised information given by the learning data with an unsupervised modelling. For this model, the probability that the jth cluster belongs to the ith class is introduced to measure the consistency between classes and clusters. However, it is not possible to obtain an explicit solution for this probability for the classes of subjective domains. In [10], a self-training semi-supervised support vector machine (SVM) algorithm and its corresponding model selection method are proposed to train a classifier with a small training data set. The model introduces the Fisher ratio, which represents the separability of a corresponding set. This parameter is not available for the classes of subjective domains when the ground truth of the classes is unknown. In [11], the quality of class labels in medical domains is considered. However, the ground truth of the training data used in the experiments is assumed to be certain, and the labels were corrupted artificially to analyze the impact of the injected noise on the classification.

Moreover, reference [12] analyzes a number of pieces of evidence supporting a single subjective hypothesis within a Bayesian framework. Reference [13] introduces an emotion-processing system that is based on fuzzy inference and subjective observation. In [14], to make the annotation more reliable, the proposed method integrates local pairwise comparison labels to minimize a cost that corresponds to the global inconsistency of the ranking order. In [15], the authors construct subjective classification systems to predict the sensation of reality from multimedia experiences based on EEG and peripheral physiological signals such as heart rate and respiration. In [16], the authors propose a machine-learning-based data fusion algorithm that can provide real-time, per-frame training and decision-based cooperative spectrum sensing. For labelled data imbalance, the authors in [17] propose a framework based on the correlations generated between concepts. The general idea is to identify negative data instances that have certain positive correlations with data instances in the target concept to facilitate the classification task. In [18], robust principal component analysis and linear discriminant analysis are used to identify the features, and a support vector machine (SVM) is applied to classify the tumor samples of gene expression data based on the identified features. However, none of these methods consider how to deal with the effects of mislabelled training data on the classification.

On the other hand, various methods have been proposed that utilize TCM to infer the health status of an individual as a means of auto-diagnosis. References [7] and [8] propose methods that use TCM Zheng to infer the health status of individuals by using images of their face and eyes, data on their emotional and physical state, and Zheng scores assigned by different TCM doctors (TCMDs). References [19] and [20] analyze the effect of multimodal sensor data on the Zheng classification. However, none of these papers consider introducing the metric QoSTD as a weight on the predicted class scores of the instances, so as to improve the reliability of classification in the subjective domain.

3. Measuring the Quality of Subjectively Labelled Training Data

Because the target is the classification of states in subjective domains, the data used as training data are generally diverse. For an instance, the data used to extract features or attributes may include data measured by sensors or other equipment, or data from first-person questionnaires; the states of the instance are labelled by third persons, through direct observation of the subject, for supervised learning. Although the kinds of obtained data are heterogeneous, the features extracted from the different modes are all handled in the same way. For example, the histogram, shape, and texture of an image are features of the image mode, and the blood pressure measured by a bio-sensor is a feature of the bio-sensor mode. All of these features are considered to be homogeneous. They are denoted as $a_{s}^{mn}$ and are normalized for each data set. Here, s, m, and n are the indices of the sample, the mode, and a feature of that mode, respectively. The combined features of all training samples yield a matrix A of size $S * M$, where S and M are the number of samples and the total number of features, respectively. On the other hand, the state scores labelled by the third persons for each instance are denoted as $z_{s}^{ij}$, and their values range from 0 to 10. Here, s, i, and j are the indices of the sample, the observer, and the state, respectively.

The eigenfeature vectors of the instances are obtained by calculating the eigenvalues and eigenvectors of $A^{'} * A$, following the method of principal component analysis (PCA) [21]. The eigenvalues are ranked in descending order, and the corresponding top P eigenvectors are selected to form a matrix U of size $M * P$. Then, the matrix of eigenfeatures of the samples is computed by the following equation.

$EF = \{ ef_{s,p} : s \in [1, S], p \in [1, P] \} = A * U$ (1)

where s and p are indices indicating the sample and the eigenfeature, respectively. Thus, the size of $E F$ is $S * P$ .
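Equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `eigenfeatures`, the random placeholder matrix, and the choice P = 2 are all hypothetical, and the input is assumed to be already normalized.

```python
import numpy as np


def eigenfeatures(A, P):
    """Project the S x M feature matrix A onto the top-P eigenvectors of A'A."""
    A = np.asarray(A, dtype=float)
    # A'A is symmetric, so eigh returns real eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1][:P]   # indices of the P largest eigenvalues
    U = eigvecs[:, order]                   # M x P matrix of selected eigenvectors
    return A @ U, U                         # EF has size S x P


rng = np.random.default_rng(0)
A = rng.random((150, 71))                   # placeholder data, S = 150 and M = 71
EF, U = eigenfeatures(A, P=2)
print(EF.shape)                             # (150, 2)
```

Each row of `EF` is the eigenfeature vector of one sample, and the columns of `U` are orthonormal because they are eigenvectors of a symmetric matrix.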

The eigenfeature vector is then used to represent the instance. The samples belonging to a given state and those not belonging to that state are considered to overlap due to the subjectivity of the labelling. Accordingly, a metric called QoSTD is defined to measure how well the training data set can be divided into binary classes. This allows us to explore the influence of the features and the subjectively labelled data on the state that is perceived. QoSTD is calculated based not only on the partition of the training data but also on its clustering ability. We call these two components the partition and the clustering. They determine the performance of the classification with respect to the training data. Let the score of State j for Sample s labelled by Observer i be denoted as $z_{s}^{ij}$. In the training data set, the samples that have scores larger than 0 for State j are considered to be labelled as State j and compose the data set $PZ^{ij} = \{ z_{s}^{ij} : z_{s}^{ij} > 0, s \leq S \}$, and those that have a score of 0 for State j compose the data set $NZ^{ij} = \{ z_{s}^{ij} : z_{s}^{ij} = 0, s \leq S \}$. We use K-means clustering to divide the data set into two groups. The group containing more samples labelled as State j is taken as the positive cluster of State j, denoted as $PC^{ij} = \{ pc_{s}^{ij}, s \leq S \}$, and the other is the negative cluster, denoted as $NC^{ij} = \{ nc_{s}^{ij}, s \leq S \}$. Accordingly, the partition of the data set for State j labelled by Observer i is defined as

$par^{ij} = \frac{\# (PZ^{ij} \cap PC^{ij})}{\# PC^{ij}},$ (2)

and the clustering of the data set for State j labelled by Observer i is defined as

$clu^{ij} = \frac{\# (PZ^{ij} \cap PC^{ij})}{\# PZ^{ij}},$ (3)

where # indicates the number of data points; $\# (PZ^{ij} \cap PC^{ij})$ is the number of samples that are labelled as State j and clustered into the positive cluster of State j. So, the larger the values of $par^{ij}$ and $clu^{ij}$ are, the better the separability of the training data set for State j is. If both values are equal to 1, the training data are completely separable. Accordingly, the quality of the training data set for classifying State j labelled by Observer i is defined as $QoSTD^{ij}$ by the following expression, which is an aggregation of $par^{ij}$ and $clu^{ij}$:

$Q o S T D^{i j} = w_{1} p a r^{i j} + w_{2} c l u^{i j} .$ (4)

Here, $w_{1}$ and $w_{2}$ are the weights of the partition and the clustering, which reflect the importance of the partition and the clustering ability of the training data in the classification. In the case that these two factors are equally important, both are set to 0.5. The value of $QoSTD^{ij}$ is equal to 1 if the training data set is completely separable for State j as labelled by Observer i.
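Equations (2) to (4) can be sketched as follows. This is an illustrative reading rather than the paper's code: the helper `two_means` is a small stand-in for K-means with K = 2 (its deterministic initialisation is an assumption made here so that the example is reproducible), and the function names and the toy data are hypothetical.

```python
import numpy as np


def two_means(X, n_iter=100):
    """Minimal Lloyd's algorithm for K = 2 with a deterministic initialisation."""
    X = np.asarray(X, dtype=float)
    # Initial centres: the point farthest from the mean, then the point farthest from it.
    first = np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))
    second = np.argmax(((X - X[first]) ** 2).sum(axis=1))
    centers = np.stack([X[first], X[second]])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        for k in (0, 1):
            if np.any(new_labels == k):      # keep the old centre if a cluster empties
                centers[k] = X[new_labels == k].mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def qostd(EF, scores, w1=0.5, w2=0.5):
    """Return (par, clu, QoSTD) for one observer and one state."""
    labelled_pos = np.asarray(scores) > 0    # PZ: samples scored above 0
    if not labelled_pos.any():               # no positive labels: metric undefined
        return float("nan"), float("nan"), float("nan")
    cluster = two_means(EF)
    # The positive cluster PC is the one holding more positively labelled samples.
    pos_id = int(labelled_pos[cluster == 1].sum() > labelled_pos[cluster == 0].sum())
    in_pc = cluster == pos_id
    overlap = np.sum(labelled_pos & in_pc)   # number of samples in both PZ and PC
    par = float(overlap / in_pc.sum())
    clu = float(overlap / labelled_pos.sum())
    return par, clu, w1 * par + w2 * clu


EF = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
scores = [0, 0, 1, 3, 2, 5]                  # one positive label falls in the "wrong" group
print(qostd(EF, scores))                     # (1.0, 0.75, 0.875)
```

In this toy case every member of the positive cluster is labelled positive (par = 1), but one positively labelled sample lies in the other cluster, so clu drops to 3/4 and QoSTD to 0.875.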

Figure 1 shows an example of 120 data points of $ef_{s,1}$ and $ef_{s,2}$. Figure 1(a) is the scatter of the instances regarding State j1 labelled by an observer, and Figure 1(b) is the distribution regarding State j2. The dark blue points indicate the positive instances belonging to that state, and the light blue points indicate the negative instances not belonging to it. Figure 1(c) is the clustering of the data points by K-means. The instances of cluster 1 are indicated by light orange points, and the instances of cluster 2 by dark orange points. We assume that the cluster with more positive instances is the positive cluster, and the cluster with more negative instances is the negative cluster. For case (a), cluster 1 is regarded as the negative cluster and cluster 2 as the positive cluster, according to Figure 1(a) and Figure 1(c). The same holds for case (b).

Based on the definition of QoSTD, and combining the clustering results in Figure 1(c), the quality of the distribution of data points in Figure 1(a) is better than that in Figure 1(b) for training the classification model, although the class of positive instances and the class of negative instances overlap in both case (a) and case (b). In fact, the value of $QoSTD^{ij}$ for case (a) is 0.78, and that for case (b) is 0.43. So, we consider that the larger the value of $QoSTD^{ij}$ is, the better the quality of the training data set labelled by Observer i for classifying State j is. When $QoSTD^{ij} = 1$, the data set labelled by Observer i can be divided completely into two classes with a positive or negative State j.

4. Using QoSTD for Classification

As mentioned above, the metric $QoSTD^{ij}$ can be used to judge the quality of training data for classification. Accordingly, the value of $QoSTD^{ij}$ is used as a weight on the predicted class scores of the instances regarding State j.

To calculate $QoSTD^{ij}$, the data modes used as the training set are determined based on the context in which the data were collected and the available computing capacity. Next, the features of matrix A are extracted from the multimodal data set. The eigenfeature matrix EF is obtained by Equation (1). Then, the value of $QoSTD^{ij}$ for State j labelled by Observer i is calculated using Equations (2), (3), and (4).

The following is the scheme that trains classifiers utilizing $Q o S T D^{i j}$ .

Figure 1. Distribution and clustering of samples. (a) Distribution of samples regarding State j1. (b) Distribution of samples regarding State j2. (c) Clustering of samples.

Generally, an existing supervised learning algorithm, for example, discriminant analysis (DA), support vector machine (SVM), or decision tree (DT), can be used to train binary classifiers of State j with the training data labelled by Observer i. Using the trained classification model, the predicted class score for State j is generated in response to instance s, which is denoted as $score_{s}^{ij}$. However, considering that the quality of the training data influences the performance of classification, $QoSTD^{ij}$ is used as a weight on $score_{s}^{ij}$ to adjust the predicted scores. The corresponding computation is as follows.

$score\_r_{s}^{ij} = QoSTD^{ij} * score_{s}^{ij}$ (5)

where $score\_r_{s}^{ij}$ denotes the adjusted score of instance s belonging to State j labelled by Observer i.

Then, the likelihood of the instance belonging to State j is calculated by Equation (6).

$l\_r_{s}^{ij} = \frac{1}{1 + \exp (- a * score\_r_{s}^{ij})}$ (6)

where $l\_r_{s}^{ij}$ indicates the adjusted likelihood of instance s belonging to State j labelled by Observer i, and a is the slope parameter.

For instance s, if the value of $l\_r_{s}^{ij}$ is larger than a threshold T_max, it is assigned to the positive class of State j; if the value is less than a threshold T_min, it is assigned to the negative class of State j; otherwise, whether the instance belongs to State j is uncertain. Then, the uncertain instances are eliminated from the training data set, and the classification model is trained again with the refined training data.
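The score adjustment, the likelihood, and the three-way assignment described above can be sketched as follows. The function names, the example scores, and the QoSTD value of 0.5 are hypothetical; the default thresholds 0.99 and 0.01 and the slope a = 50 are the values reported for the DA model in Section 5.

```python
import numpy as np


def adjusted_likelihood(score, qostd, a):
    """Equations (5) and (6): weight the class score by QoSTD, then apply the sigmoid."""
    score_r = qostd * np.asarray(score, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * score_r))


def assign(likelihood, t_min=0.01, t_max=0.99):
    """Return +1 (positive class), -1 (negative class) or 0 (uncertain) per instance."""
    likelihood = np.asarray(likelihood)
    return np.where(likelihood > t_max, 1, np.where(likelihood < t_min, -1, 0))


scores = np.array([2.0, 0.0, -2.0])          # hypothetical classifier outputs
lik = adjusted_likelihood(scores, qostd=0.5, a=50)
print(assign(lik).tolist())                  # [1, 0, -1]
```

Instances marked 0 are the uncertain ones that would be removed from the training set before the next training round.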

Two measurements, $Con^{ij}$ and $Recall^{ij}$, are introduced to assess the performance of classifying classes that have no absolute ground truth. $Con^{ij}$, defined by Equation (7), reflects the consistency between the labelled scores of the assigned instances from the training data and their likelihoods. Let the labelled scores that are larger than 0 be denoted as $pz_{s}^{ij}$, the likelihoods of the assigned instances be denoted as $l\_ra_{s}^{ij}$, and the number of assigned instances be denoted as $S1^{ij}$. Then,

$Con^{ij} = \frac{\sqrt{\sum_{s=1}^{S1^{ij}} (pz_{s}^{ij} - \overline{pz}_{s}^{ij}) (l\_ra_{s}^{ij} - \overline{l\_ra}_{s}^{ij})}}{\sqrt{\| pz_{s}^{ij} - \overline{pz}_{s}^{ij} \|^{2}} \sqrt{\| l\_ra_{s}^{ij} - \overline{l\_ra}_{s}^{ij} \|^{2}}}$ (7)

On the other hand, $Recall^{ij}$, defined by Equation (8), is the ratio of the number of assigned instances to the number of all instances.

$R e c a l l^{i j} = \frac{S 1^{i j}}{S^{i j}}$ (8)

The larger the values of $Con^{ij}$ and $Recall^{ij}$ are, the better the performance of the classifiers is.
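Equation (7) has the shape of a correlation coefficient between the labelled scores and the likelihoods. The sketch below computes the standard Pearson correlation, which omits the square root over the numerator; that substitution is an assumption made here for illustration and is not stated in the source. The function names and sample values are hypothetical.

```python
import numpy as np


def consistency(pz, lik):
    """Pearson correlation between labelled scores and likelihoods (assumed reading of Eq. 7)."""
    pz = np.asarray(pz, dtype=float)
    lik = np.asarray(lik, dtype=float)
    dp, dl = pz - pz.mean(), lik - lik.mean()   # centre both vectors
    denom = np.linalg.norm(dp) * np.linalg.norm(dl)
    return float(dp @ dl / denom) if denom > 0 else float("nan")


def recall(n_assigned, n_total):
    """Equation (8): fraction of instances that received an assignment."""
    return n_assigned / n_total


print(round(consistency([1, 2, 3, 4], [0.2, 0.4, 0.6, 0.8]), 6))   # 1.0
print(recall(3, 12))                                                # 0.25
```

A value near 1 means that instances with higher labelled scores also receive higher likelihoods; the function returns NaN when either vector is constant, since the correlation is then undefined.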

Let the target value of $Con^{ij}$ be $Con\_Obj^{ij}$, and that of $Recall^{ij}$ be $Recall\_Obj^{ij}$. Then, the whole training procedure is as follows.

Step 1

Constructing the binary classification model;

Step 2

Calculating the adjusted likelihood $l\_r_{s}^{ij}$ of the instances by Equations (5) and (6);

Step 3

If $T\_min \leq l\_r_{s}^{ij} \leq T\_max$, the instance s is not assigned to either class of State j; otherwise, the instance s is assigned to the positive or negative class of State j according to the likelihood;

Step 4

Eliminating the unassigned instances from the training data set;

Step 5

Calculating $Con^{ij}$ and $Recall^{ij}$ by Equations (7) and (8), and repeating the procedure from Step 1 to Step 4 until the maximum number of rounds or $Recall\_Obj^{ij}$ is reached;

Step 6

Finding the maximal value of $Con^{ij}$ and the corresponding round. The binary classification model constructed in this round is used as the final model if this maximal value is larger than $Con\_Obj^{ij}$ and the corresponding $Recall^{ij}$ is larger than $Recall\_Obj^{ij}$; if not, the final classification model cannot be determined, and the training procedure is abandoned.

After the binary classification model is constructed, a new instance is assigned to the positive class of State j if its $l\_r_{s}^{ij}$ is larger than T_max, and to the negative class of State j if its $l\_r_{s}^{ij}$ is less than T_min. Otherwise, the class to which the instance belongs is uncertain.
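The model-selection rule of Step 6 can be sketched as a small function over the per-round measurements. The function name `select_round` and the example history are hypothetical; the defaults 0.7 and 0.1 are the target values used in the experiments of Section 6.3.

```python
def select_round(history, con_obj=0.7, recall_obj=0.1):
    """Pick the training round whose model should be kept.

    history: list of (con, recall) pairs, one per round, first round first.
    Returns the 1-based round number, or None if training should be abandoned.
    """
    if not history:
        return None
    # Round with the maximal consistency value.
    best = max(range(len(history)), key=lambda r: history[r][0])
    con, rec = history[best]
    # Keep that round only if both targets are exceeded.
    if con > con_obj and rec > recall_obj:
        return best + 1
    return None


print(select_round([(0.50, 1.0), (0.84, 0.6), (0.80, 0.3)]))   # 2
print(select_round([(0.50, 1.0), (0.60, 0.5)]))                # None
```

Returning None corresponds to the case in which no final classification model can be determined for that state.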

5. Training Data Set

In this study, the real-world training data set that was used in [7] [8] is utilized for predicting an individual’s health status, represented by the states of the thirteen Zhengs of TCM (Cold syndrome, Pyretic syndrome, Deficiency of vital energy, Qi stagnation, Blood asthenia, Blood stasis, Jinxu, Phlegm retention, Heart syndrome, Lung syndrome, Spleen syndrome, Liver syndrome, Kidney syndrome), so as to validate the effectiveness of the proposed method. This data set contains multimodal sensor data about the health status of various individuals. These data include scores of measured physical states and reports of subjective information obtained by first-person questionnaires; in addition, features are extracted from images of the individual’s tongue, face, and eyes. The corresponding labelled data set comprises the scores of the thirteen Zhengs given by four TCM doctors (TCMDs) who inspected and diagnosed the provided samples. The labelled Zheng scores range from 0 to 10. However, most of these scores are less than 5 because the volunteer subjects were university students and thus generally healthy. The data from the first-person questionnaires contain nine types of feelings and thirteen physical states related to health status, as proposed by the World Health Organization (WHO). The scores of the corresponding items range from 0 to 5. The features that were extracted from the images of the faces and tongues are shown in Figure 2.

The extracted features were combined with the above feelings and physical states to form the matrix A. Each of these items is a mode of the features. The training data set includes five modes: Feelings, Physical States, Eye, Tongue, and Face. The modes and the number of features for each mode are shown in Table 1. The total number of features is 71. There are 150 instances from 32 individuals in the data set, each of which includes 71 features and the corresponding thirteen Zheng scores labelled by the four TCMDs.

The matrix EF of eigenfeature vectors of the instances is obtained by Equation (1), by calculating the eigenvalues and eigenvectors of $A^{'} * A$. Then, the matrix EF is used to train the binary classifiers of TCM Zhengs.

Two kinds of classification models are trained in order to verify the above statement that $QoSTD^{ij}$ can be used as the weight of the predicted class score to improve the performance of the classifiers, especially when the training data are subjectively labelled and the ground truth is uncertain, although any existing supervised classification model could be used. One is an SVM model trained with the MATLAB (Mathworks, Natick, MA, USA) function fitcsvm, with a polynomial kernel of order three. The other is a DA model trained with the MATLAB function fitcdiscr.

Figure 2. Features extracted from face and tongue. (a) Face. (b) Tongue.

Table 1. Modes and the number of features.

Based on the above binary classification models, the class scores of the instances belonging to Zheng j are obtained by using the MATLAB function predict. Then, the class scores are used to calculate the likelihoods of the corresponding instances by Equations (5) and (6). For the SVM model, the value of parameter a in Equation (6) is set to 1/100, and for the DA model, it is set to 50. Moreover, T_max = 0.99 and T_min = 0.01. An instance s is assigned to Zheng j if the calculated likelihood is larger than 0.99; if the likelihood is less than 0.01, the instance s does not belong to Zheng j; otherwise, the assignment of the instance s is uncertain. The training procedure is repeated with the refined training data set until the maximum number of rounds or $Recall\_Obj^{ij}$ is reached.
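As a worked check of these settings, the threshold on the likelihood can be rewritten as a threshold on the raw class score. The rearrangement below is derived here from Equations (5) and (6) and is not stated in this form in the source; it assumes a > 0 and $QoSTD^{ij} > 0$.

$l\_r_{s}^{ij} > T\_max \iff score_{s}^{ij} > \frac{1}{a * QoSTD^{ij}} \ln \frac{T\_max}{1 - T\_max}$

With T_max = 0.99, $\ln \frac{0.99}{0.01} = \ln 99 \approx 4.595$, so that

$score_{s}^{ij} > \frac{4.595}{50 * QoSTD^{ij}} \approx \frac{0.0919}{QoSTD^{ij}}$ for the DA model (a = 50), and $score_{s}^{ij} > \frac{4.595}{0.01 * QoSTD^{ij}} = \frac{459.5}{QoSTD^{ij}}$ for the SVM model (a = 1/100).

By symmetry, T_min = 0.01 gives the same bounds with a negative sign for the negative class. For illustration, with the DA slope and the two QoSTD values quoted for Figure 1 (0.78 and 0.43), the positive bound is about 0.118 and 0.214, respectively: a lower QoSTD widens the band of scores that are treated as uncertain.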

6. Experimental Results and Analysis

6.1. About $QoSTD^{ij}$

Figure 3 shows the values of $Q o S T D^{i j}$ for all thirteen Zhengs, as labelled by four TCMDs.

From Figure 3, we can see that the training data sets labelled by TCMD1 and TCMD4 have relatively high quality values for most of the Zhengs. For TCMD1, eight of the Zhengs have a $QoSTD^{ij}$ value larger than 0.6; for TCMD4, ten of them have a $QoSTD^{ij}$ value larger than 0.6. However, for the data sets labelled by TCMD2 and TCMD3, the values of $QoSTD^{ij}$ are relatively low, and most of the Zhengs have a $QoSTD^{ij}$ value less than 0.6. That is, the quality of the Zheng scores labelled by TCMD2 and TCMD3 is not as good as that of the scores labelled by TCMD1 and TCMD4. We therefore consider that $QoSTD^{ij}$ can be used as a criterion for judging the quality of subjectively labelled training data. If $QoSTD^{ij}$ is less than a threshold, the subsequent learning procedure should be abandoned, so as to ensure the performance of the classification.

6.2. About Adjusting the Predicted Class Scores

Figure 3. Quality of the training data set for all thirteen Zhengs.

As described in Section 4, the predicted class scores of the instances are adjusted by introducing $QoSTD^{ij}$ as the weight of those scores. To examine how adjusting the predicted class scores improves the performance of the classification, another measurement is introduced that reflects the consistency between the labelled scores of the assigned instances from the training data and their likelihoods when the class scores are not adjusted. This measurement, $Con\_ori^{ij}$, is calculated by Equation (9).

$Con\_ori^{ij} = \frac{\sqrt{\sum_{s=1}^{S1^{ij}} (pz_{s}^{ij} - \overline{pz}_{s}^{ij}) (l\_a_{s}^{ij} - \overline{l\_a}_{s}^{ij})}}{\sqrt{\| pz_{s}^{ij} - \overline{pz}_{s}^{ij} \|^{2}} \sqrt{\| l\_a_{s}^{ij} - \overline{l\_a}_{s}^{ij} \|^{2}}}$ (9)

where $l\_a_{s}^{ij}$ indicates the likelihoods of the assigned instances when the class scores are not adjusted by Equation (5).

Table 2 shows the values of $QoSTD^{ij}$ and the first-round results of $Con^{ij}$ and $Con\_ori^{ij}$ for the instances in the above training data set regarding the thirteen Zhengs, obtained with the DA-based and the SVM-based binary classifiers, when the scores of the instances are labelled by the TCM doctor identified as 1 (TCMD1). Table 3 shows the corresponding results when the scores of the instances are labelled by the TCM doctor identified as 3 (TCMD3). In Table 3, the results for Zheng 1 are empty, because no instances were labelled by TCMD3 for Zheng 1 in the training data set.

Table 2. Results based on TCMD1.

Table 3. Results based on TCMD3.

From the results in Table 2 and Table 3, we can see that the values of $Con^{ij}$ increase for almost all of the Zhengs compared with those of $Con\_ori^{ij}$. In the case of TCMD1, compared with the corresponding values of $Con\_ori^{ij}$, the values of $Con^{ij}$ increase for all of the Zhengs with the SVM classification model, and for twelve of the thirteen Zhengs with the DA model. In the case of TCMD3, as in the case of TCMD1, the values of $Con^{ij}$ increase for the Zhengs with both the SVM and the DA models, except for two Zhengs with the DA model. In particular, the rates of increase are relatively notable for almost all of the Zhengs for which $QoSTD^{ij}$ is less than 0.5.

Moreover, most of the values of $Con^{ij}$ are larger with the SVM model than with the DA model; however, most of the rates of increase from $Con\_ori^{ij}$ to $Con^{ij}$ are larger with the DA model than with the SVM model. This means that the DA-based classifiers are more sensitive to $QoSTD^{ij}$ than the SVM-based classifiers, although the SVM-based classifiers seem to have the better classification ability.

Accordingly, we can say that adjusting the predicted class scores with $QoSTD^{ij}$ as the weight improves the performance of the trained classifiers, especially when the classifiers are trained on a data set with low values of $QoSTD^{ij}$, whichever classification model is used to train the classifiers.

6.3. About Re-Training

As described in Section 4, the training procedure is repeated to construct the classifiers with the refined training data set until the maximum number of rounds or $Recall\_Obj^{ij}$ is reached. In our experiments, the maximum number of rounds is set to 20, $Con\_Obj^{ij}$ is set to 0.7, and $Recall\_Obj^{ij}$ is set to 0.1. Table 4 shows the maximal $Con^{ij}$, the corresponding round, and the difference between the maximal and first-round $Con^{ij}$ for the thirteen Zhengs when the scores of the instances are labelled by the TCM doctor identified as 1 (TCMD1). Table 5 shows the corresponding results when the scores of the instances are labelled by the TCM doctor identified as 3 (TCMD3).

From the results in Table 4 and Table 5, we can see that the difference between the maximal and first-round $C o n^{i j}$ is greater than or equal to 0, regardless of the TCM doctor and the model used to train the classifiers. In most cases, these values are greater than 0. Moreover, the difference between the maximal and first-round $C o n^{i j}$ is relatively high in the case of TCMD3, which corresponds to the relatively low $Q o S T D^{i j}$ . So we can deduce that eliminating the unsigned examples from the training data set and re-training the binary classifiers with the refined training data set can improve the performance of the classifiers, especially when $Q o S T D^{i j}$ is relatively low. However, for some of the Zhengs in Table 4 and Table 5, the maximal values of $C o n^{i j}$ do not reach the value of $C o n_{O b j}^{i j}$ , which is set to 0.7, although they rise after re-training. This indicates that the provided training data set cannot make the corresponding classifiers achieve the required performance for these Zhengs. In such cases, constructing the classifiers for these states should be given up.

Table 4. Results based on TCMD1.

Table 5. Results based on TCMD3.

It is also noted that the round of re-training that makes $C o n^{i j}$ maximal differs. For example, for Zheng 2 in the case of TCMD1, the maximal $C o n^{1, 2}$ is 0.84 at the 2nd round with the DA model, and 0.80 at the 9th round with the SVM model. For Zheng 2 in the case of TCMD3, the maximal $C o n^{3, 2}$ is 0.90 at the 4th round with the DA model, and 0.60 at the 6th round with the SVM model. This matches the issue described in [18]: discarding an uncertain instance from the training data set may influence the performance of the classification, because in a small training data set such an instance may be an exception rather than an error. So we cannot say that consecutively re-training the classifiers with the refined training data set must improve the performance of the classification. The proposed solution to this issue is to find the round that makes $C o n^{i j}$ maximal and satisfies the condition regarding $R e c a l l_{O b j}^{i j}$ , and to adopt the classifier trained in that round.
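The round selection described above can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function names are hypothetical, the recall condition is assumed to mean that the recall of a round must be at least the recall target, ties in the consistency value are assumed to be resolved in favour of the earliest round, and the per-round values in the example are invented for illustration.

```python
from typing import List, Optional, Tuple

RoundResult = Tuple[int, float, float]  # (round number, Con, Recall)

CON_OBJ = 0.7     # Con_Obj used in the experiments
RECALL_OBJ = 0.1  # Recall_Obj used in the experiments


def select_best_round(
    history: List[RoundResult], recall_obj: float = RECALL_OBJ
) -> Optional[RoundResult]:
    """Pick the round with maximal Con among rounds meeting the recall condition."""
    # Assumed reading of the recall condition: recall >= recall_obj.
    eligible = [r for r in history if r[2] >= recall_obj]
    if not eligible:
        return None
    # Highest Con wins; on ties, the earliest round is preferred (assumption).
    return max(eligible, key=lambda r: (r[1], -r[0]))


def reaches_target(best: Optional[RoundResult], con_obj: float = CON_OBJ) -> bool:
    """True if the selected round attains the required consistency."""
    return best is not None and best[1] >= con_obj


# Hypothetical per-round values, for illustration only.
history = [(1, 0.62, 0.90), (2, 0.84, 0.50), (3, 0.84, 0.40), (4, 0.95, 0.05)]
best = select_best_round(history)
print(best, reaches_target(best))
```

In this example the fourth round has the highest consistency but is excluded because its recall is below the target, so the second round is adopted. When `reaches_target` is false, the classifier for that state would be given up, as discussed above.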

As a whole, we used a real-world training data set to train the classification models. This training data set involved thirteen Zhengs labelled by TCM doctors, with label noise caused by the fact that the absolute ground truth is unknown, whereas most current research in the literature uses artificially corrupted training data sets. The experiments verified that $Q o S T D^{i j}$ is relevant to the performance of classifying classes without absolute ground truth. There is a high positive correlation between $Q o S T D^{i j}$ and $C o n^{i j}$ , and introducing $Q o S T D^{i j}$ as the weights of the predicted class scores to adjust the likelihoods of the instances improved the performance of the classifiers.
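A correlation of this kind can be checked by computing a correlation coefficient over paired quality and consistency values, one pair per Zheng. The sketch below is only an illustration: the use of the Pearson coefficient is an assumption (the text does not name the correlation measure), `pearson_r` is a hypothetical helper, and the sample values are invented rather than taken from the experiments.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-Zheng values, for illustration only.
qostd_values = [0.35, 0.48, 0.61, 0.72, 0.80]
con_values = [0.55, 0.63, 0.70, 0.78, 0.86]
print(round(pearson_r(qostd_values, con_values), 3))
```

A value close to 1 would indicate a strong positive linear relationship between the two quantities.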

7. Conclusions

This paper defined the $Q o S T D^{i j}$ metric as a way to measure the quality of training data subjectively labelled by observers (i), which was used to improve the prediction of states (j) without absolute ground truth. The $Q o S T D^{i j}$ was used as the weights of the predicted class scores to adjust the likelihoods of the instances. Moreover, two measurements, $C o n^{i j}$ and $R e c a l l^{i j}$ , were defined in order to assess the performance of the classifiers trained on the subjectively labelled data in a more suitable way. The training procedure was repeated with the refined training data set until the objective values of $C o n^{i j}$ and $R e c a l l^{i j}$ were reached.

To verify the effectiveness of the proposed method, a real-world training data set was used to train the classifiers based on the DA and SVM classification models. This training data set involved thirteen Zhengs labelled by TCM doctors, with label noise caused by the fact that the absolute ground truth is unknown. The experimental results showed the effectiveness of the proposed method in improving the performance of the classifiers for instances without absolute ground truth. Furthermore, the proposed method indicated how to eliminate instances with label noise from the training data set.

As future work, we intend to use other training data sets in fields such as emotion and personality to train classifiers based on the proposed method, in order to verify its effectiveness in improving classification in subjective domains.

Acknowledgements

This work was supported by research funds from Iwate Prefectural University. The author would like to thank Prof. Shaozi Li and Prof. Feng Guo at Xiamen University for their cooperation in data collection, system implementation, and experiments.

Cite this paper

Dai, Y. (2017) Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains. Journal of Computer and Communications, 5, 152-168. https://doi.org/10.4236/jcc.2017.57014

References

1. Picard, R.W. (2000) Affective Computing. The MIT Press, Cambridge, MA.

2. Dai, Y., et al. (Ed.) (2010) Kansei Engineering and Soft Computing: Theory and Practice. Engineering Science Reference, IGI Global.

3. Vinciarelli, A. and Mohammadi, G. (2014) A Survey of Personality Computing. IEEE Transactions on Affective Computing, 5, 273-291. https://doi.org/10.1109/TAFFC.2014.2330816

4. Brodley, C.E. and Friedl, M.A. (1999) Identifying Mislabeled Training Data. Journal of Artificial Intelligence Research, 11, 131-167.

5. Frenay, B. and Verleysen, M. (2014) Classification in the Presence of Label Noise: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 25, 845-869. https://doi.org/10.1109/TNNLS.2013.2292894

6. Schafer, J.L. and Graham, J.W. (2002) Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177. https://doi.org/10.1037/1082-989X.7.2.147

7. Guo, F., Dai, Y., Li, S. and Ito, K. (2010) Inferring Individuals’ Sub-Health and Their TCM Syndrome Based on the Diagnosis of TCM Doctors. 2010 IEEE International Conference on Systems Man and Cybernetics (SMC), Istanbul, 10-13 October 2010, 3717-3724.

8. Wang, Y., Dai, Y., Guo, F. and Li, S. (2011) Sensitive-Based Information Selection for Predicting Individual’s Sub-Health on TCM Doctors’ Diagnosis Data. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, 23, 749-760. https://doi.org/10.3156/jsoft.23.749

9. Bouveyron, C. and Girard, S. (2009) Robust Supervised Classification with Mixture Models: Learning from Data with Uncertain Labels. Pattern Recognition, 42, 2649-2658.

10. Li, Y., Guan, C., Li, H. and Chin, Z. (2008) A Self-Training Semi-Supervised SVM Algorithm and Its Application in an EEG-Based Brain Computer Interface Speller System. Pattern Recognition Letters, 29, 1285-1294.

11. Pechenizkiy, M., Tsymbal, A., Puuronen, S. and Pechenizkiy, O. (2006) Class Noise and Supervised Learning in Medical Domains: The Effect of Feature Extraction. 19th IEEE International Symposium on Computer-Based Medical Systems, Salt Lake City, UT, 22-23 June 2006, 708-713. https://doi.org/10.1109/cbms.2006.65

12. Cadesch, P.R. (1986) Subjective Inference with Multiple Evidence. Artificial Intelligence, 28, 333-341.

13. Yanaru, T. (1995) An Emotion Processing System Based on Fuzzy Inference and Subjective Observations. 2nd New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, Dunedin, 20-23 November 1995, 15-20. https://doi.org/10.1109/annes.1995.499429

14. Fu, Y., et al. (2015) Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 563-577. https://doi.org/10.1109/TPAMI.2015.2456887

15. Kroupi, E., Hanhart, P., Lee, J.S., Rerabek, M. and Ebrahimi, T. (2014) Predicting Subjective Sensation of Reality during Multimedia Consumption Based on EEG and Peripheral Physiological Signals. 2014 IEEE International Conference on Multimedia and Expo (ICME), Chengdu, 14-18 July 2014, 1-6. https://doi.org/10.1109/ICME.2014.6890239

16. Mikaeil, A.M., Guo, B. and Wang, Z. (2014) Machine Learning to Data Fusion Approach for Cooperative Spectrum Sensing. 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Shanghai, 13-15 October 2014, 429-434. https://doi.org/10.1109/CyberC.2014.80

17. Tan, Y., et al. (2014) Utilizing Concept Correlations for Effective Imbalanced Data Classification. 2014 IEEE 15th International Conference on Information Reuse and Integration (IRI), Redwood City, CA, 13-15 August 2014, 561-568.

18. Liu, J.-X., Xu, Y., Zheng, C.H., Kong, H. and Lai, Z.H. (2015) RPCA-Based Tumor Classification Using Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12, 964-970. https://doi.org/10.1109/TCBB.2014.2383375

19. Dai, Y. (2013) Evaluating the Effect of Different Mode’s Attributes on the Subjective Classification in the Case of TCM. 2013 5th International Conference on Computational Intelligence, Modelling and Simulation (CIMSim), Seoul, 24-25 September 2013, 171-176. https://doi.org/10.1109/cimsim.2013.35

20. Dai, Y. (2014) Predicting Person’s Zheng States Using the Heterogeneous Sensor Data by the Semi-Subjective Teaching of TCM Doctors. 2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), San Diego, CA, 5-8 October 2014, 636-641. https://doi.org/10.1109/SMC.2014.6973980

21. Sugiyama, K. (1999) An Introduction to Multivariate Data Analysis. Asakura Bookstore. (In Japanese)
