J. Biomedical Science and Engineering, 2008, 1, 104-109
Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise

Relationship between symptoms of traditional Chinese medicine and indicator of western medicine about liver cirrhosis

Yan Wang1, Li-Zhuang Ma3, Ping Liu2 & Xiao-Wei Liao1

1Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China.
2Institute of Liver Diseases, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
3Department of Computer Science & Engineering, Shanghai Jiao Tong University; Center of Traditional Chinese Medicine Information Science and Technology, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
(wangyan8383@sjtu.edu.cn)

ABSTRACT

Traditional Chinese medicine (TCM) is one of the safe and effective methods for treating liver cirrhosis. TCM practitioners assess hepatic function in terms of syndromes, but the process of syndrome differentiation is subjective. At present, most research focuses on the relationship between syndromes and objective Western-medicine indicators such as the Child-Pugh grade. However, a syndrome is itself a synthesis of signs and symptoms, and collecting signs and symptoms is easier than differentiating syndromes. We therefore use data mining to explore the relationship between the signs and symptoms of TCM and objective Western-medicine standards: the Child-Pugh grade, the decompensation or compensation stage, and the active or inactive period. We assess the attributes with the information gain method and measure classification accuracy with five typical classifiers: logistic regression, BayesNet, NaiveBayes, an RBF network, and C4.5. After attribute selection, we obtain the main TCM signs and symptoms related to the stage, the period, and the Child-Pugh grade of liver cirrhosis.
The experimental results show that classification accuracy improves after some symptoms and signs are filtered out.

Keywords: Traditional Chinese medicine, Liver cirrhosis, Attribute selection, Data mining, Classification accuracy

1. INTRODUCTION

Liver cirrhosis is the twelfth leading cause of death by disease, killing about 26,000 people each year. Its cost in human suffering, hospital costs, and lost productivity is also high [1]. After many efforts, researchers have found that the treatment approach of traditional Chinese medicine (TCM) is more effective than other kinds of treatment [2-4]. Chinese medicine is safe and effective because of its prescriptive methodology [5].

In TCM diagnostics, diagnosis is based on disease entities collected by the four conventional examinations: inspection, smelling, inquiry, and palpation [6]. Having collected this information, TCM practitioners make a diagnosis and draw conclusions about the patient's pathological condition in terms of syndromes (called zheng in Chinese). Symptoms, no matter how they are produced, are always a sign that something is out of balance in the body-mind, and the goal of professional Chinese medicine is to bring the entire organism back into a state of healthy, dynamic balance. Because pattern discrimination lets the practitioner see the larger picture, the whole person, treatment based on pattern discrimination allows Chinese doctors to provide safe and effective treatment without side effects [5].

Although TCM has many advantages, the subjectivity of syndrome differentiation limits its development. One cannot apply this prescriptive methodology in a professionally standard, competent way unless one has mastered syndrome differentiation. Moreover, the result of syndrome differentiation is not as objective as laboratory parameters.
A series of previous studies have shown that correlations do exist between objective Western-medicine indicators and TCM syndromes [7, 8, 9]. However, most of this research focuses on the relationship between the syndrome and the indicator [7, 8, 9]. Since a syndrome is a synthesis of signs and symptoms, and syndrome differentiation is subjective, we instead use data mining to explore the relationship between objective Western-medicine standards and the signs and symptoms of TCM. Our aim is to construct a classification model that assigns a new case the corresponding Western indicator based on the patient's TCM signs and symptoms.

Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. The idea is to build computer programs that sift through databases automatically, seeking regularities or patterns. Strong patterns, if found, will likely generalize to make accurate predictions on future data [10].

The large number of TCM symptoms and signs makes it difficult to estimate the parameters of a classifier model. Attribute selection is the process of identifying and removing as much irrelevant and redundant information as possible. This reduces the dimensionality of the data and allows learning algorithms to operate faster and more effectively. The result is a more compact, more easily interpreted representation of the target concept, which is especially valuable in medicine. After attribute selection we obtain the key attributes that influence the degree of liver cirrhosis. In this paper we address attribute selection for classifying liver cirrhosis.

2. MATERIALS AND METHODS

2.1. Dataset Construction

The sample dataset consists of 294 patient cases provided by Dr. Qin Zhang [9].
Dr. Zhang has researched liver cirrhosis for many years at the Shanghai University of Traditional Chinese Medicine.

Several approaches have been introduced to assess hepatic function, such as common biochemical tests and the Child-Pugh score [15]. The severity of cirrhosis is commonly classified by the Child-Pugh score, by the decompensation or compensation stage, and by the active or inactive period.

The Child-Pugh score uses bilirubin, albumin, INR, and the presence and severity of ascites and encephalopathy to classify patients as Child A, B, or C; Child A has a favorable prognosis, while Child C is at high risk of death. The compensation stage corresponds to Child A, and the decompensation stage corresponds to middle and advanced liver cirrhosis. The active period indicates that the patient has evident clinical symptoms of hepatitis, such as jaundice.

To analyze the relationship between the number of attributes and the Western-medicine indicators of liver cirrhosis, we consider three labelings based on different assessment standards. By stage, 111 cases are in the compensation stage and 183 in the decompensation stage; by Child-Pugh grade, 109 cases are Child A, 93 Child B, and 92 Child C; by period, 212 cases are in the active period and 82 in the inactive period.

Apart from age, sex, and avoirdupois exponent, the attributes can be grouped into symptoms, signs, and the results of experimental examination. The main attributes considered are: (i) forty symptoms, such as lassitude and fatigue, night sweats, vexing heat in the five hearts, skin itching, and depression; (ii) twenty-seven signs, such as pale tongue, white, thick and grimy tongue fur, splenomegaly, and hepatomegaly; (iii) the assessment standards of liver cirrhosis; and (iv) age, sex, and avoirdupois exponent.
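The class counts listed above already fix two reference points for the later accuracy results: the accuracy of a trivial predictor that always outputs the most frequent class, and the class entropy H(Y) from Eq. (1) in Section 2.2.1. The sketch below computes both from the reported counts; it is an illustration added for context, not part of the authors' method, and the dictionary name and labels are hypothetical.

```python
import math

# Class counts reported in Section 2.1 (294 cases per labeling).
CLASS_COUNTS = {
    "stage (compensation, decompensation)": [111, 183],
    "Child-Pugh grade (A, B, C)": [109, 93, 92],
    "period (active, inactive)": [212, 82],
}


def majority_baseline(counts):
    """Accuracy (%) of always predicting the most frequent class."""
    return 100.0 * max(counts) / sum(counts)


def class_entropy(counts):
    """H(Y) in bits, with p(y) estimated as count / total (as in Eq. 1)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


for label, counts in CLASS_COUNTS.items():
    assert sum(counts) == 294  # each labeling covers every case
    print(f"{label}: majority baseline {majority_baseline(counts):.2f}%, "
          f"H(Y) = {class_entropy(counts):.3f} bits")
```

For the stage labeling, for example, always answering "decompensation" is correct for 183 of 294 cases, about 62.2%, which is a useful yardstick when reading the accuracies in Table 2. The three-class Child-Pugh labeling has an entropy close to the maximum of log2 3 ≈ 1.585 bits, so it starts with the most uncertainty for the attributes to remove.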
In the patient records, diagnosed by TCM clinical physicians, symptoms are scored on a 4-grade scale: a patient without the symptom scores 1, and a patient with it scores 2 to 4 according to its severity. Signs are scored on a 2-grade scale: 1 if absent and 2 if present. For the liver cirrhosis stage, 1 represents the compensation stage and 2 the decompensation stage; the Child-Pugh grade takes 3 values (1, 2, 3); and for the period, 1 represents the active period and 2 the inactive period [9]. The datasets contain no cases with missing values, so our work does not need to handle missing values.

2.2. Method

2.2.1. Attribute Selection

In practice, adding irrelevant attributes to a dataset often "confuses" machine learning systems. The best way to select relevant attributes is manually, based on a deep understanding of the learning problem and what the attributes actually mean. However, automatic methods can also be useful. Reducing the dimensionality of the data by deleting unsuitable attributes improves the performance of learning algorithms. It also speeds them up, although this may be outweighed by the computation involved in attribute selection. More importantly, dimensionality reduction yields a more compact, more easily interpretable representation of the target concept, focusing the user's attention on the most relevant variables [10].

We use the information gain method to assess the attributes. It evaluates each attribute by measuring its information gain with respect to the class. If X and Y are random variables, with x ∈ X and y ∈ Y, Eqs. (1) and (2) give the entropy of Y before and after observing X.
$$H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y) \tag{1}$$

$$H(Y \mid X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x) \log_2 p(y \mid x) \tag{2}$$

In Eqs. (1) and (2), p(y) is the probability of y in Y, estimated by dividing the number of tuples with value y by |Y|, the total number of tuples in Y, and p(y|x) is the conditional probability of y given x. Logarithms are taken to base 2 because information is measured in bits.

The amount by which the entropy of Y decreases reflects the additional information about Y provided by X and is called the information gain. Information gain is given by [14]

$$\mathrm{gain} = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y) = H(Y) + H(X) - H(X, Y) \tag{3}$$

Some classification algorithms handle only nominal attributes and cannot deal with attributes measured on a numeric scale. To use them on general datasets, numeric attributes must first be "discretized" into a small number of distinct ranges. In our dataset the avoirdupois exponent is numeric, so it must be discretized first. We use the entropy-based discretization method, with entropy defined as in Eq. (1), and use the minimum description length (MDL) principle to stop the discretization procedure. The MDL principle states that the "best" theory is the one that minimizes the number of bits needed to encode the theory together with the data given the theory; it can be taken as an operational definition of Occam's Razor. More formally, if T is a theory inferred from data D, the total description length is

$$DL(T, D) = DL(T) + DL(D \mid T) \tag{4}$$

where all description lengths in Eq. (4) are measured in bits.

After computing the information gain, we rank the attributes by their individual scores. This procedure has low computational complexity.

2.2.2. Cross-Validation

Because the dataset is small, we estimate accuracy with 10 times 10-fold cross-validation.
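The repeated procedure described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes plain (unstratified) folds dealt round-robin after shuffling, whereas many toolkits stratify folds by class, which the paper does not specify. The function names, the `fit_predict` callback, and the toy majority-class classifier are hypothetical.

```python
import random


def repeated_kfold_splits(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` runs of k-fold CV.

    Each run reshuffles the case indices with its own seed and deals them
    round-robin into k partitions whose sizes differ by at most one, so every
    case is tested exactly once per run.
    """
    for r in range(repeats):
        idx = list(range(n))
        random.Random(seed + r).shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


def repeated_cv_accuracy(labels, fit_predict, k=10, repeats=10, seed=0):
    """Mean test accuracy (%) over all k * repeats train/test splits.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the training cases and returns one predicted label per test case.
    """
    scores = []
    for train, test in repeated_kfold_splits(len(labels), k, repeats, seed):
        preds = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        scores.append(100.0 * correct / len(test))
    return sum(scores) / len(scores)


# Usage with the stage labels from Section 2.1 (1 = compensation,
# 2 = decompensation) and a toy majority-class "classifier" as a stand-in.
stage = [1] * 111 + [2] * 183


def majority_fit_predict(train, test):
    train_labels = [stage[i] for i in train]
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test)


splits = list(repeated_kfold_splits(len(stage)))
print(len(splits), "train/test splits")  # 100 splits (10 repeats x 10 folds)
print(f"{repeated_cv_accuracy(stage, majority_fit_predict):.2f}%")
```

The 100 splits correspond to the 100 invocations of the learning algorithm mentioned in the text; swapping in a real classifier only means replacing `majority_fit_predict`.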
In 10-fold cross-validation, the data is split into ten approximately equal partitions, and each in turn is used for testing while the remainder is used for training. That is, nine-tenths of the data is used for training and one-tenth for testing, and the procedure is repeated ten times so that, in the end, every instance has been used exactly once for testing.

To obtain a reliable error estimate, we repeat the whole cross-validation process 10 times and average the results. This involves invoking the learning algorithm 100 times on datasets that are each nine-tenths the size of the original.

2.2.3. Classification Method

The overarching goal of classification is to build a model that can be used for prediction. In our study, the goal is to predict whether a patient is in the decompensation stage, whether he is in the active period, and which Child-Pugh grade he has. We use five typical classifiers (logistic regression, BayesNet, NaiveBayes, an RBF network, and C4.5) to analyze how the number of attributes affects classification accuracy. These five classifiers are widely used in many areas, especially medicine. For example, Yanan Sun et al. show that naive Bayes and BayesNet have good classification capability in a clinical diagnosis model of traditional Chinese medicine [11]; Haibin Qu et al. apply a decision tree to classify 194 patient records, and their results suggest that the decision tree is a promising method for self-extracting diagnostic rules from Chinese medicine patient records [12]; and, using logistic regression, Hua Cong et al. study the development tendencies of lung disease and the physiological functions of the lung as inferred from clinical symptoms [13].

Based on classification accuracy, we filter out some unimportant attributes. The remaining main attributes help us study the relationship between the Western-medicine indicators and the TCM signs and symptoms, and improve prediction accuracy.

3. RESULTS

We compute the information gain of each attribute using, in turn, the decompensation or compensation stage, the Child-Pugh grade, and the active or inactive period as the class label. With the five classifiers (logistic regression, BayesNet, NaiveBayes, RBF network, and C4.5), we construct five classification models for each label.

[Figure 1. Decompensation or compensation stage: classification accuracy (%) versus attribute number (2 to 67) for Logistic, BayesNet, NaiveBayes, RBF, and C4.5.]

[Figure 2. Child-Pugh grade. Figure 3. Active or inactive period. Both plot classification accuracy (%) versus attribute number (2 to 67) for Logistic, BayesNet, NaiveBayes, RBF network, and C4.5.]

Figures 1, 2, and 3 show how the classification accuracy for the stage, the Child-Pugh grade, and the period of liver cirrhosis, respectively, varies with the number of attributes. In all three figures, the accuracy of the classifiers changes with the number of attributes. The logistic algorithm is the most sensitive to the number of attributes. The classification capability of BayesNet is close to that of NaiveBayes, which suggests that the attributes are largely independent of each other given the class. When analyzing the relationship between objective standards (Child-Pugh grade, active or inactive period, decompensation or compensation stage) and TCM signs and symptoms, weak attributes should therefore be filtered out first.

From these results we can also identify the key attributes related to the Western-medicine indicators of liver cirrhosis.
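The ranking step behind these curves, scoring each attribute with Eqs. (1) to (3) and keeping the top k, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes every attribute is already nominal (symptom scores 1 to 4, sign scores 1 to 2), so p(y) and p(y|x) are estimated by counting; the numeric avoirdupois exponent would first need the MDL-based discretization of Section 2.2.1, which the sketch omits. Function names, attribute keys, and the toy records are hypothetical, not data from the study.

```python
import math
from collections import Counter


def entropy(values):
    """H(Y) in bits (Eq. 1), with p(y) estimated from frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(ys, xs):
    """H(Y|X) in bits (Eq. 2): entropy of Y within each X group, weighted by p(x)."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(ys, xs):
    """gain = H(Y) - H(Y|X) (Eq. 3)."""
    return entropy(ys) - conditional_entropy(ys, xs)


def rank_attributes(columns, ys):
    """Sort attribute names by decreasing information gain with the class."""
    scores = {name: info_gain(ys, xs) for name, xs in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def top_k(columns, ys, k):
    """Keep only the k highest-ranked attributes (the filtering step)."""
    ranked, _ = rank_attributes(columns, ys)
    return {name: columns[name] for name in ranked[:k]}


# Toy records (hypothetical values, not from the study's dataset):
# class 1 = compensation, 2 = decompensation.
stage = [1, 1, 1, 2, 2, 2, 2, 1]
columns = {
    "abdominal_distension": [1, 1, 3, 3, 4, 3, 4, 1],
    "yellow_eyes": [1, 1, 1, 2, 2, 2, 2, 1],
    "night_sweat": [1, 2, 1, 2, 1, 2, 1, 2],
}
ranked, scores = rank_attributes(columns, stage)
for name in ranked:
    print(f"{name}: {scores[name]:.3f} bits")
```

In this toy data "yellow_eyes" separates the two classes perfectly (gain of 1 bit), "abdominal_distension" separates them partly, and "night_sweat" carries no information about the class, so a top-2 filter keeps the first two. Sweeping k and re-running cross-validation for each value would trace curves like those in Figures 1 to 3.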
Table 1 shows the main TCM signs and symptoms related to the corresponding indicators. The selected attributes agree with the research of a postdoctoral researcher at the Shanghai University of Traditional Chinese Medicine.

After filtering out some weak attributes, we compare classification accuracy with that on the original dataset; Table 2 shows the results. On the original dataset, the logistic model predicts whether a patient is in the compensation stage with an accuracy of only 58.8435%; after filtering, its accuracy exceeds 80%. For predicting the Child-Pugh grade, BayesNet reaches 54.4218% on the original dataset, and filtering improves this by about 10 percentage points. Table 2 also shows that the logistic model is the best predictor of whether the patient is in the compensation stage; C4.5 is the best predictor of the Child-Pugh grade (before filtering, and tied with BayesNet after filtering); and the logistic model is the most suitable for predicting whether the patient is in the active period. In research on liver cirrhosis, we can therefore choose the best method for predicting each aspect of the patient's condition.

Table 1. The main attributes for the Western-medicine indicators.

Indicator | Main signs and symptoms of TCM
Compensation or decompensation stage | Abdominal distension, lower-limb puffy swelling, yellow body, yellow eyes, yellow urine
Child-Pugh grade | Abdominal distension, scant urine, fatigued and heavy limbs, lower-limb puffy swelling, yellow body, yellow eyes, yellow urine
Active or inactive period | Lassitude and fatigue, vexing heat in the five hearts, abdominal distension, constipation, sloppy stool, fatigued and heavy limbs, distending pain in flanks, yellow body, yellow eyes, yellow urine, dim complexion

Table 2. Classification accuracy (%) before and after filtering attributes.

Classifier | Compensation (before) | Child-Pugh (before) | Active (before) | Compensation (after) | Child-Pugh (after) | Active (after)
Logistic | 58.8435 | 43.8776 | 63.9456 | 80.9524 | 64.2857 | 79.932
BayesNet | 74.4898 | 54.4218 | 71.4286 | 78.2313 | 64.966 | 77.551
NaiveBayes | 73.4694 | 54.7619 | 70.4082 | 77.8912 | 64.6259 | 77.551
RBF | 75.8503 | 54.7619 | 73.8095 | 79.5918 | 63.6054 | 78.9116
C4.5 | 73.4694 | 60.2041 | 69.7279 | 76.8707 | 64.966 | 78.5714

4. CONCLUSION

As Chinese medicine becomes vibrant around the world as an important alternative source of health care, research on modernizing TCM has begun to attract substantial attention from TCM practitioners. However, the subjective indicators of TCM limit its development.

In this paper, we express disease severity objectively in terms of TCM symptoms and signs. Because too many attributes often confuse TCM practitioners, we use data mining to filter out unimportant attributes. The experiments identify the main signs and symptoms associated with the active period, the compensation stage, and the Child-Pugh grade, and show that the compact dataset yields improved classification accuracy. At present, the accuracy is only about 70%. Improving classification accuracy is our next research goal; if it improves, TCM theory can be expressed more objectively.

ACKNOWLEDGEMENT

We gratefully acknowledge all the researchers from the Shanghai University of Traditional Chinese Medicine for the TCM databases and for discussions of TCM topics. This research is partly supported by the China 973 project "Traditional Chinese medicine etiologic study on the theory of insufficiency and damage causing stasis and blockage in liver cirrhosis" (No. 2006CB504801) and by the National Science Fund of China (No. 60521002).

REFERENCES

[1] http://digestive.niddk.nih.gov/diseases/pubs/cirrhosis
[2] Ping Liu. (2002) Contemporary hepatology in traditional Chinese medicine. People's Medical Publishing House, Beijing.
[3] Qin Zhang, Hong Qiu, Lei Wang, et al. (2007) Correlation between syndromes of posthepatitic cirrhosis and biological parameters: a report of 355 cases. Journal of Chinese Integrative Medicine, 5(2), 130-133.
[4] Qin Zhang, Ping Liu, HuiFen Chen, et al. (2003) Multi-analysis characteristics of traditional Chinese medical syndrome of hepatocirrhosis. Chinese Journal of Integrated Traditional and Western Medicine on Liver Diseases, 13(2), 69-72.
[5] Bob Flaws & Philippe Sionneau. (2005) The treatment of modern western medical diseases with Chinese medicine. Blue Poppy Press.
[6] Xuewei Wang, Haibin Qu, Ping Liu, Yiyu Cheng. (2004) A self-learning expert system for diagnosis in traditional Chinese medicine. Expert Systems with Applications, 26, 557-566.
[7] Guanhua He, Lanping Zhu. (2002) Exploration on relationship between cirrhosis of liver's Traditional Chinese Medical Syndrome Differentiation Typing and Child-pugh degree, Complication. Liaoning Journal of Traditional Chinese Medicine, 29(1), 12-12.
[8] Fangshi Zhu, Ping Pu, Haihang Zhu, et al. (1997) Study on correlation between the syndrome type of TCM and the classification of Child-Pugh in patients with cirrhosis. Chinese Journal of Integrated Traditional and Western Medicine on Liver Diseases, 7(4).
[9] Qin Zhang. (2005) Study on disease-pattern-efficacy integration in posthepatitic cirrhosis. Shanghai University of Traditional Chinese Medicine, Shanghai.
[10] Ian H. Witten, Eibe Frank. (2006) Data Mining: Practical Machine Learning Tools and Techniques. China Machine Press, Beijing.
[11] Yanan Sun, Shiyong Ning, Mingyu Lu, et al. (2006) Chinese traditional medical clinical diagnosis for coronary heart disease based on bayes classification. Application Research of Computer, 11, 164-166.
[12] Haibin Qu, Lifeng Mao, Jie Wang. (2005) Method for self-extracting diagnostic rules of blood stasis syndrome based on decision tree. Chinese Journal of Biomedical Engineering, 24(6), 709-727.
[13] Hua Cong, Qiming Zhang. (2002) Logistic regression on the diagnosis and prescription of lung disease. Journal of Shandong University of TCM, 26(5), 322-327.
[14] Mark A. Hall. (1999) Correlation-based feature selection for machine learning. University of Waikato, New Zealand.
[15] G. Ercolani. (2006) Predictive indices of morbidity and mortality after liver resection. Annals of Surgery, 244(4), 635-637.