Relationship between symptoms of traditional Chinese medicine and indicator of western medicine about liver cirrhosis

doi:10.4236/jbise.2008.12017


J. Biomedical Science and Engineering, 2008, 1, 104-109

Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise JBiSE


Yan Wang1, Li-Zhuang Ma3, Ping Liu2 & Xiao-Wei Liao1

1Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China. 2Institute of Liver Diseases, Shanghai University of Traditional Chinese Medicine, Shanghai, China. 3Department of Computer Science & Engineering, Shanghai Jiao Tong University; Center of Traditional Chinese Medicine Information Science and Technology, Shanghai University of Traditional Chinese Medicine, Shanghai, China. (wangyan8383@sjtu.edu.cn)

ABSTRACT

Traditional Chinese medicine (TCM) is one of the safe and effective methods for treating liver cirrhosis. TCM practitioners assess hepatic function in terms of syndromes, but the process of syndrome differentiation is subjective. At present, most research focuses on the relationship between syndromes and objective Western medicine indicators such as the Child-Pugh grade. In fact, a syndrome is a synthesis of signs and symptoms, and collecting signs and symptoms is easier than performing syndrome differentiation. We use data mining to explore the relationship between the signs and symptoms of TCM and objective Western medicine standards: the Child-Pugh grade, the decompensation or compensation stage, and the active or inactive period. We use the information gain method to assess the attributes and five typical classifiers (Logistic, BayesNet, NaiveBayes, RBF and C4.5) to obtain the classification accuracy. After attribute selection, we obtain the main TCM symptoms and signs related to the stage, period and Child-Pugh grade of liver cirrhosis. The experimental results show that classification accuracy improves after filtering out some symptoms and signs.

Keywords: Traditional Chinese medicine, Liver cirrhosis, Attribute selection, Data mining, Classification accuracy

1. INTRODUCTION

Liver cirrhosis is the twelfth leading cause of death by disease, killing about 26,000 people each year, and its cost in terms of human suffering, hospital costs, and lost productivity is high [1]. Many treatments have been tried, and researchers have found that the treatment approach of traditional Chinese medicine (TCM) is more effective than other kinds of treatment [2-4]. Chinese medicine is safe and effective because of its prescriptive methodology [5].

In TCM, diagnosis is based on the clinical information collected by four conventional examinations: inspection, smelling, inquiry and palpation [6]. Having collected all this information, TCM practitioners perform a diagnosis and draw conclusions about the patient's pathological condition in terms of syndromes (called zheng in Chinese). Symptoms, no matter how they are produced, are always a sign that something is out of balance in the body-mind, and the goal of professional Chinese medicine is to bring the entire organism back into a state of healthy, dynamic balance. Because pattern discrimination lets the practitioner see the larger picture, the whole person, treatment based on pattern discrimination allows Chinese doctors to provide safe and effective treatment without side effects [5].

Although TCM has many advantages, the subjectivity of syndrome differentiation limits its development. One cannot apply this prescriptive methodology in a professionally standard, competent way until one has mastered syndrome differentiation. Moreover, the result of syndrome differentiation is not as objective as laboratory parameters. A series of previous studies has shown that some correlations between the objective indicators of Western medicine and TCM syndromes do exist [7, 8, 9].

However, most of this research focuses on the relationship between syndromes and the indicators [7, 8, 9]. Since a syndrome is a synthesis of signs and symptoms, and syndrome differentiation is subjective, we instead use data mining to explore the relationship between objective Western medicine standards and the signs and symptoms of TCM. Our aim is to construct classification models that assign a new case to the corresponding Western medicine indicator based on the patient's TCM signs and symptoms.

Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. The idea is to build computer programs that sift through databases automatically, seeking regularities or patterns. Strong patterns, if found, will likely generalize to make accurate predictions on future data [10].

The large number of TCM symptoms and signs makes it difficult to estimate the parameters of a classifier model. Attribute selection is the process of identifying and removing as much irrelevant and redundant information as possible. This reduces the dimensionality of the data and allows learning algorithms to operate faster and more effectively. The result is a more compact, more easily interpreted representation of the target concept, which is especially valuable in medicine. After attribute selection, we can identify the key attributes that influence the degree of liver cirrhosis. In this paper, we address attribute selection for classifying liver cirrhosis.

2. MATERIALS AND METHODS

2.1. Dataset Construction

The sample dataset is constructed from 294 patient cases provided by Dr. Qin Zhang [9], who has researched liver cirrhosis for many years at Shanghai University of Traditional Chinese Medicine.

Several approaches have been introduced to assess hepatic function, such as common biochemical tests and the Child-Pugh score [15]. The severity of cirrhosis is commonly classified by the Child-Pugh score, by the decompensation or compensation stage, and by the active or inactive period. The Child-Pugh score uses bilirubin, albumin, INR, and the presence and severity of ascites and encephalopathy to classify patients as Child A, B or C; Child A has a favorable prognosis, while Child C carries a high risk of death.
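To make the combination of the five Child-Pugh components concrete, here is a minimal sketch. The point cut-offs below are the commonly published Child-Pugh thresholds (bilirubin in mg/dL, albumin in g/dL); they are not stated in this paper and are an assumption here, as is the function name `child_pugh`.

```python
def child_pugh(bilirubin_mg_dl: float, albumin_g_dl: float, inr: float,
               ascites: int, encephalopathy: int) -> tuple[int, str]:
    """Return (points, class) using commonly published Child-Pugh cut-offs (assumed here).

    ascites:        0 = none, 1 = mild, 2 = moderate or severe
    encephalopathy: 0 = none, 1 = grade I-II, 2 = grade III-IV
    """
    if ascites not in (0, 1, 2) or encephalopathy not in (0, 1, 2):
        raise ValueError("ascites and encephalopathy must be graded 0, 1 or 2")
    points = 0
    # Bilirubin (mg/dL): < 2 -> 1 point, 2-3 -> 2 points, > 3 -> 3 points
    points += 1 if bilirubin_mg_dl < 2 else 2 if bilirubin_mg_dl <= 3 else 3
    # Albumin (g/dL): > 3.5 -> 1, 2.8-3.5 -> 2, < 2.8 -> 3
    points += 1 if albumin_g_dl > 3.5 else 2 if albumin_g_dl >= 2.8 else 3
    # INR: < 1.7 -> 1, 1.7-2.3 -> 2, > 2.3 -> 3
    points += 1 if inr < 1.7 else 2 if inr <= 2.3 else 3
    # Ascites and encephalopathy grades 0-2 map to 1-3 points each
    points += ascites + 1 + encephalopathy + 1
    # Total 5-6 -> Child A, 7-9 -> Child B, 10-15 -> Child C
    grade = "A" if points <= 6 else "B" if points <= 9 else "C"
    return points, grade


print(child_pugh(1.0, 4.0, 1.0, 0, 0))  # (5, 'A')
print(child_pugh(2.5, 3.0, 1.5, 0, 0))  # (7, 'B')
```

Each component contributes 1 to 3 points, so the total always lies between 5 and 15; only the three class labels, not the raw score, enter the classification task in this paper.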

The compensation stage corresponds to Child A, whereas the decompensation stage corresponds to middle and advanced liver cirrhosis. The active period indicates that the patient shows clinical symptoms of hepatitis, such as evident jaundice.

In this paper, to analyze the relationship between the number of attributes and the Western medicine indicators of liver cirrhosis, we consider three labellings of the data based on different assessment standards. By stage, 111 cases are in the compensation stage and 183 in the decompensation stage; by Child-Pugh grade, 109 cases are Child A, 93 Child B and 92 Child C; by period, 212 cases are in the active period and 82 in the inactive period. Apart from age, sex and avoirdupois exponent, all attributes can be grouped into symptoms, signs and the results of experimental examinations. The main attributes considered are: (i) forty symptoms, such as lassitude and fatigue, night sweat, vexing heat in the five hearts, skin itching and depression; (ii) twenty-seven signs, such as pale tongue, white, thick and greasy tongue fur, splenomegaly and hepatomegaly; (iii) the assessment standards of liver cirrhosis; and (iv) age, sex and avoirdupois exponent.
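The class counts above imply a simple reference point: the accuracy of a trivial classifier that always predicts the most frequent class. This baseline is not reported in the paper; the sketch below only derives it arithmetically from the counts, which can help when reading the accuracies in Section 3. The dictionary keys are illustrative names.

```python
# Class counts as reported in Section 2.1 (each labelling covers all 294 cases).
class_counts = {
    "stage": {"compensation": 111, "decompensation": 183},
    "child_pugh": {"A": 109, "B": 93, "C": 92},
    "period": {"active": 212, "inactive": 82},
}


def majority_baseline(counts: dict[str, int]) -> float:
    """Accuracy (%) of always predicting the most frequent class."""
    total = sum(counts.values())
    return 100.0 * max(counts.values()) / total


for task, counts in class_counts.items():
    assert sum(counts.values()) == 294  # sanity check against the dataset size
    print(f"{task}: {majority_baseline(counts):.2f}%")
# stage: 62.24%
# child_pugh: 37.07%
# period: 72.11%
```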

In the patient records diagnosed by TCM clinicians, symptoms are scored on a four-grade scale: a patient without the symptom scores 1 point, and a patient with it scores 2 to 4 depending on its severity. Signs are scored on a two-grade scale: 1 if the sign is absent and 2 if it is present. For the stage label, 1 represents the compensation stage and 2 the decompensation stage; the Child-Pugh grade takes the values 1, 2 and 3; and for the period label, 1 represents the active period and 2 the inactive period [9].

In addition, the datasets contain no cases with missing values, so our work does not need to handle missing values.
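The scoring scheme above can be checked mechanically before any mining. The sketch below validates one case record against it; the attribute names are hypothetical stand-ins for the 40 symptoms and 27 signs, and the dictionary-per-case layout is an assumption. Because a missing key fails the range check, the same test also confirms that a record has no missing values.

```python
# Hypothetical attribute names; the real dataset has 40 symptoms and 27 signs.
SYMPTOMS = {"lassitude_fatigue", "night_sweat", "skin_itching"}
SIGNS = {"pale_tongue", "splenomegaly"}
# stage: 1 = compensation, 2 = decompensation; period: 1 = active, 2 = inactive
LABEL_RANGES = {"stage": {1, 2}, "child_pugh": {1, 2, 3}, "period": {1, 2}}


def validate_case(case: dict[str, int]) -> None:
    """Raise ValueError if a case violates the scoring scheme or has a missing value."""
    for name in SYMPTOMS:
        if case.get(name) not in {1, 2, 3, 4}:  # 1 = absent, 2-4 = severity
            raise ValueError(f"symptom {name!r} must be scored 1-4")
    for name in SIGNS:
        if case.get(name) not in {1, 2}:  # 1 = absent, 2 = present
            raise ValueError(f"sign {name!r} must be scored 1 or 2")
    for label, allowed in LABEL_RANGES.items():
        if case.get(label) not in allowed:
            raise ValueError(f"label {label!r} must be one of {sorted(allowed)}")


case = {"lassitude_fatigue": 3, "night_sweat": 1, "skin_itching": 2,
        "pale_tongue": 2, "splenomegaly": 1,
        "stage": 2, "child_pugh": 3, "period": 1}
validate_case(case)  # passes silently
```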

2.2. Method

2.2.1. Attribute Selection

During the course of data mining, in practice, adding irrelevant attributes to a dataset often "confuses" machine learning systems. The best way to select relevant attributes is manually, based on a deep understanding of the learning problem and what the attributes actually mean. However, automatic methods can also be useful. Reducing the dimensionality of the data by deleting unsuitable attributes improves the performance of learning algorithms. It also speeds them up, although this may be outweighed by the computation involved in attribute selection. More importantly, dimensionality reduction yields a more compact, more easily interpretable representation of the target concept, focusing the user's attention on the most relevant variables [10].

We use the information gain method to assess the attributes. It evaluates each attribute by measuring its information gain with respect to the class. If X and Y are random variables with values x ∈ X and y ∈ Y, equations (1) and (2) give the entropy of Y before and after observing X:

$$H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y) \tag{1}$$

$$H(Y \mid X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x) \log_2 p(y \mid x) \tag{2}$$

Here p(y) is the probability of y in Y, estimated by dividing the number of tuples with value y by |Y|, the total number of tuples in Y; p(x) is the probability of x, estimated in the same way; and p(y | x) is the conditional probability of y given x. Logarithms are taken to base 2 because information is measured in bits.
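A minimal sketch of equations (1) and (2), estimating every probability by relative frequency as described above and treating the symptom and sign scores as nominal values. The function names are illustrative, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(ys):
    """H(Y) = -sum_y p(y) log2 p(y), with p(y) estimated by relative frequency."""
    n = len(ys)
    return -sum((c / n) * log2(c / n) for c in Counter(ys).values())


def conditional_entropy(xs, ys):
    """H(Y|X) = -sum_x p(x) sum_y p(y|x) log2 p(y|x)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)  # class values of rows with X = x
    # entropy(g) is H(Y | X = x); weight it by p(x) = |g| / n
    return sum(len(g) / n * entropy(g) for g in groups.values())


stage = [1, 1, 2, 2]
print(entropy(stage))                            # 1.0 (two equally likely classes)
print(conditional_entropy([1, 1, 2, 2], stage))  # 0.0 (X determines Y)
print(conditional_entropy([1, 2, 1, 2], stage))  # 1.0 (X says nothing about Y)
```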

The amount by which the entropy of Y decreases reflects the additional information about Y provided by X and is called the information gain. Information gain is given by [14]

$$\text{gain} = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y) = H(Y) + H(X) - H(X, Y) \tag{3}$$
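The two outer forms of equation (3) can be cross-checked numerically. This sketch computes the gain once through the joint entropy H(X, Y) and once through H(Y) − H(Y | X); the toy symptom and stage columns are made up for illustration only.

```python
from collections import Counter
from math import isclose, log2


def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(xs, ys):
    """gain = H(Y) + H(X) - H(X, Y); the joint variable is the (x, y) pair."""
    return entropy(ys) + entropy(xs) - entropy(list(zip(xs, ys)))


def info_gain_conditional(xs, ys):
    """The same quantity computed as H(Y) - H(Y|X), for cross-checking."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - h_y_given_x


symptom = [1, 1, 2, 3, 2, 1]  # toy scores on the 1-4 scale
stage = [1, 1, 2, 2, 2, 1]    # toy labels: 1 = compensation, 2 = decompensation
assert isclose(info_gain(symptom, stage), info_gain_conditional(symptom, stage))
print(round(info_gain(symptom, stage), 6))  # 1.0
```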

Some classification algorithms deal with nominal attributes only and cannot handle ones measured on a numeric scale. To use them on general datasets, numeric attributes must first be "discretized" into a small number of distinct ranges.

In our dataset the avoirdupois exponent is numeric, so it must be discretized first. We use the entropy-based method to discretize it: entropy is defined as in equation (1), and the minimum description length (MDL) principle is used to stop the discretization procedure. The MDL principle states that the best theory is the one that minimizes the number of bits needed to encode the theory together with the data given that theory; it can be taken as an operational definition of Occam's Razor. More formally, if T is a theory inferred from data D, the total description length is given by

$$DL(T, D) = DL(T) + DL(D \mid T) \tag{4}$$

Equation (4) measures all description lengths in bits.
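One common concrete form of entropy-based discretization with an MDL stopping rule is the recursive procedure of Fayyad and Irani; whether the authors used exactly this criterion is an assumption. The sketch picks the cut point that minimizes the weighted class entropy and keeps it only if the information gain exceeds the MDL cost of encoding the cut; the input values are hypothetical avoirdupois exponents, not data from this study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy in bits, as in equation (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mdl_cut_points(values, labels):
    """Entropy-based discretization of one numeric attribute, stopped by an MDL test."""
    pairs = sorted(zip(values, labels))
    cuts = []
    _split(pairs, cuts)
    return sorted(cuts)


def _split(pairs, cuts):
    n = len(pairs)
    labels = [y for _, y in pairs]
    best = None  # (weighted class entropy after the cut, cut index)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidate cuts lie only between distinct values
        e = (i * entropy(labels[:i]) + (n - i) * entropy(labels[i:])) / n
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:
        return
    e, i = best
    left, right = labels[:i], labels[i:]
    ent = entropy(labels)
    gain = ent - e
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * ent - k1 * entropy(left) - k2 * entropy(right))
    # Fayyad-Irani MDL test: keep the cut only if its gain pays for encoding it.
    if gain <= (log2(n - 1) + delta) / n:
        return
    cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)


values = [18, 19, 20, 21, 22, 28, 29, 30, 31, 32]  # hypothetical numeric attribute
labels = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(mdl_cut_points(values, labels))  # [25.0]
```

The recursion stops on its own once no further cut pays for itself, which is how the MDL principle replaces an arbitrary choice of the number of intervals.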

After computing the information gain, we rank the attributes by their individual scores. Because each attribute is evaluated independently of the others, this procedure has low computational complexity.
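The ranking step can be sketched as scoring every attribute column against the class and sorting by gain. Each attribute needs a single pass over the data, so the cost grows linearly with the number of attributes times the number of cases. The attribute names and scores below are hypothetical toy data.

```python
from collections import Counter
from math import log2


def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(xs, ys):
    # Equation (3): H(X) + H(Y) - H(X, Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_attributes(data: dict[str, list[int]], labels: list[int]) -> list[tuple[str, float]]:
    """Score each attribute independently and sort by information gain, best first."""
    scores = [(name, info_gain(col, labels)) for name, col in data.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


data = {  # hypothetical columns on the 1-4 symptom scale
    "night_sweat": [1, 2, 1, 2, 1, 2],
    "abdominal_distension": [1, 1, 3, 4, 3, 1],
}
labels = [1, 1, 2, 2, 2, 1]  # toy stage labels
ranking = rank_attributes(data, labels)
print([name for name, _ in ranking])  # ['abdominal_distension', 'night_sweat']
```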

2.2.2. Cross-Validation

Because the dataset is small, we estimate accuracy with 10 times 10-fold cross-validation. In tenfold cross-validation, the data is split into ten approximately equal partitions; each partition in turn is used for testing while the remaining nine-tenths are used for training, so that in the end every instance has been used exactly once for testing.

To obtain a reliable error estimate, we repeat the cross-validation process 10 times and average the results. This involves invoking the learning algorithm 100 times on datasets that are each nine-tenths the size of the original.
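The following sketch generates the 100 train/test splits of 10 times 10-fold cross-validation as index lists. It assumes plain (unstratified) folds and a fixed random seed; stratified folds, which many tools use by default, are omitted for brevity. The function name is hypothetical.

```python
import random


def repeated_kfold(n_instances: int, n_folds: int = 10, n_repeats: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) for n_repeats runs of n_folds-fold cross-validation.

    Each repeat reshuffles the data, so every run uses a different partition.
    """
    rng = random.Random(seed)
    indices = list(range(n_instances))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Striding gives n_folds partitions whose sizes differ by at most one.
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


splits = list(repeated_kfold(294))
print(len(splits))  # 100 calls to the learning algorithm
```

The reported estimate is then the mean of the 100 per-split accuracies; with 294 cases each training set holds 264 or 265 cases.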

2.2.3. Classification Method

The overarching goal of classification is to build a model that can be used for prediction. In our study, the goal is to predict whether a patient is in the decompensation stage, whether the patient is in the active period, and which Child-Pugh grade the patient has. We use five typical classifiers, Logistic, BayesNet, NaiveBayes, RBF and C4.5, to analyze how the number of attributes affects classification accuracy. These five classifiers are widely used in many areas, especially in medicine.
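NaiveBayes is the simplest of the five classifiers to write down, so here is a minimal sketch of a categorical naive Bayes with add-one (Laplace) smoothing that treats the 1-4 symptom scores and 1-2 sign scores as nominal values. It is an illustrative stand-in, not the NaiveBayes implementation the authors used; the class name and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes with Laplace (add-one) smoothing."""

    def fit(self, rows: list[list[int]], labels: list[int]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_attrs = len(rows[0])
        # counts[a][c][v]: training rows of class c whose attribute a has value v
        self.counts = [defaultdict(Counter) for _ in range(n_attrs)]
        self.values = [set() for _ in range(n_attrs)]
        for row, c in zip(rows, labels):
            for a, v in enumerate(row):
                self.counts[a][c][v] += 1
                self.values[a].add(v)
        return self

    def predict(self, row: list[int]) -> int:
        best_class, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            # log P(c) + sum_a log P(x_a | c): attributes assumed independent given c
            score = log(n_c / self.n)
            for a, v in enumerate(row):
                k = len(self.values[a])  # number of distinct values seen for attribute a
                score += log((self.counts[a][c][v] + 1) / (n_c + k))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy rows: [symptom score 1-4, sign score 1-2]; labels: 1 = compensation, 2 = decompensation
rows = [[1, 1], [1, 2], [3, 2], [4, 2], [3, 1], [1, 1]]
labels = [1, 1, 2, 2, 2, 1]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict([4, 2]), model.predict([1, 1]))  # 2 1
```

Working in log space avoids numerical underflow when the product runs over dozens of attributes, as it would with the 67 symptoms and signs here.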

For example, Yanan Sun et al. showed that naive Bayes and Bayesian networks have good classification capability in a TCM clinical diagnosis model [11]. Haibin Qu et al. applied a decision tree to classify 194 patient records, and their results show that the decision tree is likely a promising method for self-extracting diagnostic rules from TCM patient records [12]. Using logistic regression, Hua Cong studied the development tendencies of lung disease and the physiological functions of the lung as inferred from clinical symptoms [13].

Based on the classification accuracy, we filter out unimportant attributes. The remaining main attributes help us study the relationship between the Western medicine indicators and the TCM signs and symptoms, and improve prediction accuracy.
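The filtering step can be pictured as a sweep over nested attribute subsets: take the k attributes with the highest information gain and measure accuracy for each k. The tick marks on the figures in Section 3 suggest k = 2, 7, ..., 67; that step size, the function names and the toy scorer below are assumptions. In a real run, `evaluate` would train a classifier on the subset and return its mean 10 times 10-fold cross-validation accuracy.

```python
from typing import Callable, Sequence


def sweep_top_k(ranked: Sequence[str], evaluate: Callable[[list[str]], float],
                ks: Sequence[int]) -> list[tuple[int, float]]:
    """Evaluate a classifier on the k highest-ranked attributes for each k in ks."""
    results = []
    for k in ks:
        subset = list(ranked[:k])              # top-k attributes by information gain
        results.append((k, evaluate(subset)))  # e.g. mean cross-validated accuracy (%)
    return results


def best_k(results: list[tuple[int, float]]) -> tuple[int, float]:
    # Highest accuracy wins; ties go to the smaller, more compact attribute set.
    return max(results, key=lambda r: (r[1], -r[0]))


ranked = [f"attr{i}" for i in range(67)]  # hypothetical ranking of 67 attributes


def toy_evaluate(subset: list[str]) -> float:
    # Synthetic placeholder score, NOT results from this study: peaks at 12 attributes.
    return 80.0 - abs(len(subset) - 12)


results = sweep_top_k(ranked, toy_evaluate, range(2, 68, 5))
print(best_k(results))  # (12, 80.0)
```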

3. RESULTS

We compute the information gain of each attribute using, in turn, the decompensation or compensation stage, the Child-Pugh grade, and the active or inactive period as the class label. Using the five classifiers, Logistic, BayesNet, NaiveBayes, RBF and C4.5, we construct five classification models.

[Figure 1. Decompensation or compensation stage. Classification accuracy (%) of Logistic, BayesNet, NaiveBayes, RBF and C4.5 against the number of attributes (2 to 67, in steps of 5).]

[Figure 2. Child-Pugh grade. Same axes and classifiers as Figure 1.]

[Figure 3. Active or inactive period. Same axes and classifiers as Figure 1.]

Figures 1, 2 and 3 show how the classification accuracy for the stage, the Child-Pugh grade and the period of liver cirrhosis, respectively, varies with the number of attributes. All three figures show that the accuracy of the classifiers changes with the number of attributes. The logistic algorithm is the most sensitive to the number of attributes. The classification capability of BayesNet is close to that of NaiveBayes, which suggests that the attributes are approximately independent given the class. When analyzing the relationship between objective standards such as the Child-Pugh grade, the active or inactive period, and the decompensation or compensation stage and the TCM signs and symptoms, we should therefore filter out weak attributes first.

From these figures we can also identify the key attributes related to the Western medicine indicators of liver cirrhosis. Table 1 shows the main TCM signs and symptoms related to each indicator. The selected attributes agree with the research of a postdoctoral researcher at Shanghai University of Traditional Chinese Medicine.

After filtering out weak attributes, we compare the classification accuracy with that obtained on the original dataset. Table 2 shows the results.

On the original dataset, the accuracy of predicting whether the patient is in the compensation stage is only 58.8435% with the logistic model; after filtering some attributes, it exceeds 80%. For predicting the patient's Child-Pugh grade, the BayesNet model reaches 54.4218% on the original dataset; after filtering, its accuracy improves by about 10 percentage points.

Table 2 also shows that the logistic model is the best predictor of whether the patient is in the compensation stage. When predicting the patient's Child-Pugh grade, the best predictor is C4.5 (after filtering, BayesNet matches its 64.966%).

The logistic model is the most suitable for predicting whether the patient is in the active period. When studying liver cirrhosis, we can therefore choose the best method for each prediction task.

Table 1. The main attributes associated with the Western medicine indicators.

| Western medicine indicator | Main signs and symptoms of TCM |
|---|---|
| Compensation or decompensation stage | Abdominal distension, puffy swelling of the lower limbs, yellow body, yellow eyes, and yellow urine |
| Child-Pugh grade | Abdominal distension, scant urine, fatigued and heavy limbs, puffy swelling of the lower limbs, yellow body, yellow eyes, and yellow urine |
| Active or inactive period | Lassitude and fatigue, vexing heat in the five hearts, abdominal distension, constipation, sloppy stool, fatigued and heavy limbs, distending pain in flanks, yellow body, yellow eyes, yellow urine, dim complexion |

Table 2. Classification accuracy (%) before and after attribute filtering.

| Classifier | Compensation (before) | Child-Pugh (before) | Active (before) | Compensation (after) | Child-Pugh (after) | Active (after) |
|---|---|---|---|---|---|---|
| Logistic | 58.8435 | 43.8776 | 63.9456 | 80.9524 | 64.2857 | 79.932 |
| BayesNet | 74.4898 | 54.4218 | 71.4286 | 78.2313 | 64.966 | 77.551 |
| NaiveBayes | 73.4694 | 54.7619 | 70.4082 | 77.8912 | 64.6259 | 77.551 |
| RBF | 75.8503 | 54.7619 | 73.8095 | 79.5918 | 63.6054 | 78.9116 |
| C4.5 | 73.4694 | 60.2041 | 69.7279 | 76.8707 | 64.966 | 78.5714 |
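The comparisons drawn from Table 2 can be reproduced directly from its values. The sketch below transcribes the table, computes each classifier's gain in percentage points, and lists the best classifier per task while keeping ties (after filtering, BayesNet and C4.5 tie on the Child-Pugh grade). Variable and function names are illustrative.

```python
# Accuracies (%) transcribed from Table 2: (compensation, child_pugh, active).
before = {
    "Logistic":   (58.8435, 43.8776, 63.9456),
    "BayesNet":   (74.4898, 54.4218, 71.4286),
    "NaiveBayes": (73.4694, 54.7619, 70.4082),
    "RBF":        (75.8503, 54.7619, 73.8095),
    "C4.5":       (73.4694, 60.2041, 69.7279),
}
after = {
    "Logistic":   (80.9524, 64.2857, 79.932),
    "BayesNet":   (78.2313, 64.966, 77.551),
    "NaiveBayes": (77.8912, 64.6259, 77.551),
    "RBF":        (79.5918, 63.6054, 78.9116),
    "C4.5":       (76.8707, 64.966, 78.5714),
}
TASKS = ("compensation", "child_pugh", "active")


def best_classifiers(table: dict[str, tuple[float, ...]], task_idx: int) -> list[str]:
    """All classifiers that reach the top accuracy for a task (ties kept)."""
    top = max(row[task_idx] for row in table.values())
    return sorted(name for name, row in table.items() if row[task_idx] == top)


for t, task in enumerate(TASKS):
    # Improvement in percentage points from filtering, per classifier
    gains = {name: round(after[name][t] - before[name][t], 2) for name in before}
    print(task, best_classifiers(after, t), gains)
```

Every classifier improves on every task after filtering, which is the pattern the text summarizes.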

4. CONCLUSION

As Chinese medicine becomes vibrant around the world as an important alternative source of health care, research on modernizing TCM has begun to attract substantial attention from TCM practitioners. However, the subjective indicators of TCM limit its development.

In this paper, we express disease severity objectively in terms of TCM symptoms and signs. However, too many attributes often confuse TCM practitioners, so we use data mining to filter out unimportant attributes. The experiments identify the main signs and symptoms related to the active period, the compensation stage and the Child-Pugh grade, and show that the compact dataset yields improved classification accuracy. At present, the accuracy is only about 70%. Improving classification accuracy is our research goal; if the accuracy improves, we can express TCM theory objectively.

ACKNOWLEDGEMENT

We gratefully acknowledge all the researchers from the Shanghai University of Traditional Chinese Medicine for the TCM databases and discussion of TCM topics. This research is partly supported by the China 973 project on the traditional Chinese medicine etiologic study of the theory of insufficiency and damage causing stasis and blockage in liver cirrhosis (No. 2006CB504801) and by the National Science Fund of China (No. 60521002).

REFERENCES

[1] http://digestive.niddk.nih.gov/diseases/pubs/cirrhosis.

[2] Ping Liu. (2002) Contemporary hepatology in traditional Chinese medicine. People's Medical Publishing House, Beijing.

[3] Qin Zhang, Hong Qiu, Lei Wang, et al. (2007) Correlation between syndromes of posthepatitic cirrhosis and biological parameters: a report of 355 cases. Journal of Chinese Integrative Medicine, 5(2), 130-133.

[4] Qin Zhang, Ping Liu, HuiFen Chen, et al. (2003) Multi-analysis characteristics of traditional Chinese medical syndrome of hepatocirrhosis. Chinese Journal of Integrated Traditional and Western Medicine on Liver Diseases, 13(2), 69-72.

[5] Bob Flaws & Philippe Sionneau. (2005) The treatment of modern western medical diseases with Chinese medicine. Blue Poppy Press.

[6] Xuewei Wang, Haibin Qu, Ping Liu, Yiyu Cheng. (2004) A self-learning expert system for diagnosis in traditional Chinese medicine. Expert Systems with Applications, 26, 557-566.

[7] Guanhua He, Lanping Zhu. (2002) Exploration on relationship between cirrhosis of liver's Traditional Chinese Medical Syndrome Differentiation Typing and Child-Pugh degree, Complication. Liaoning Journal of Traditional Chinese Medicine, 29(1), 12-12.

[8] Fangshi Zhu, Ping Pu, Haihang Zhu, et al. (1997) Study on correlation between the syndrome type of TCM and the classification of Child-Pugh in patients with cirrhosis. Chinese Journal of Integrated Traditional and Western Medicine on Liver Diseases, 7(4).

[9] Qin Zhang. (2005) Study on disease-pattern-efficacy integration in posthepatitic cirrhosis. Shanghai University of Traditional Chinese Medicine, Shanghai.

[10] Ian H. Witten, Eibe Frank. (2006) Data Mining: Practical Machine Learning Tools and Techniques. China Machine Press, Beijing.

[11] Yanan Sun, Shiyong Ning, Mingyu Lu, et al. (2006) Chinese traditional medical clinical diagnosis for coronary heart disease based on bayes classification. Application Research of Computers, 11, 164-166.

[12] Haibin Qu, Lifeng Mao, Jie Wang. (2005) Method for self-extracting diagnostic rules of blood stasis syndrome based on decision tree. Chinese Journal of Biomedical Engineering, 24(6), 709-727.

[13] Hua Cong, Qiming Zhang. (2002) Logistic regression on the diagnosis and prescription of lung disease. Journal of Shandong University of TCM, 26(5), 322-327.

[14] Mark A. Hall. (1999) Correlation-based feature selection for machine learning. University of Waikato, New Zealand.

[15] G. Ercolani. (2006) Predictive indices of morbidity and mortality after liver resection. Annals of Surgery, 244(4), 635-637.