Engineering, 2009, 1, 211-215 doi:10.4236/eng.2009.13025 Published Online November 2009 (http://www.scirp.org/journal/eng). Copyright © 2009 SciRes. ENGINEERING

Applications of Data Mining Theory in Electrical Engineering

Yagang ZHANG, Jing MA, Jinfang ZHANG, Zengping WANG
Key Laboratory of Power System Protection and Dynamic Security Monitoring and Control under Ministry of Education, North China Electric Power University, Baoding, China
E-mail: yagangzhang@gmail.com
Received January 10, 2009; revised February 21, 2009; accepted February 23, 2009

Abstract

In this paper, we adopt a novel applied approach to fault analysis based on data mining theory. In our research, global information is introduced into the electric power system, and we mainly use the cluster analysis technology of data mining theory to detect faulty components and faulty sections quickly and exactly, and thus accomplish fault analysis. The main technical contributions and innovations of this paper are the introduction of global information into electrical engineering and the development of a new application to fault analysis in electrical engineering. Data mining is defined as the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from large databases. It has been widely used in both academic and applied scientific research in which the data sets are generated by experiments. Data mining theory can contribute a great deal to the study of electrical engineering.

Keywords: Fault Analysis, Data Mining Theory, Classification, Electrical Engineering

1. Introduction

Data mining is the efficient discovery of valuable, non-obvious information from a large collection of data. It is also referred to as exploratory data analysis, which deals with the extraction of knowledge from data.
Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories [1]. Data mining is usually applied to very large databases, where it is normally not possible to comprehend or analyze the data directly because of the complexity and sheer size of the database. It aims at the discovery of useful information from these large databases, and it is also popularly referred to as knowledge discovery in databases (KDD). Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, etc. [2–4]. A common problem in data mining is to find associations among attributes of the data. Data mining tasks fall into the following categories [5]: class description; association analysis; cluster analysis; outlier analysis; evolution analysis.

A fault is defined as a departure from an acceptable range of an observed variable or calculated parameter associated with a piece of equipment; that is, a fault is a process abnormality or symptom. In general, faults are deviations from the normal behavior of the plant or its instrumentation. They may arise in the basic technological equipment or in its measurement and control instruments, and may represent performance deterioration, partial malfunctions or total breakdowns [6]. The analysis procedure locates the process or unit malfunction that caused the symptoms. The goal of fault analysis is to ensure the success of the planned operations by recognizing anomalies of system behavior. As a result of proper process monitoring, downtime is minimized, safety of plant operations is improved, and manufacturing costs are reduced.
Generally speaking, the process of fault analysis can be divided into three main steps: alarm, identification and evaluation.

The electric power system is one of the most complex artificial systems in the world. Its safe, steady, economical and reliable operation plays a very important part in guaranteeing socioeconomic development, and even in safeguarding social stability. To address this difficult problem, a number of methods and technologies that reflect the current level of science and technology have been introduced into this domain. Of course, no matter what kind of new analytical method or technical means we adopt, we must have a clear understanding of the electric power system itself and its complexity, and continuously improve the level of analysis, operation and control [7–11].

When an electric power system moves from its normal state to a fault or abnormal operating state, its electric quantities may change significantly. Relay protection uses these sudden changes in electric quantities to distinguish whether the power system is in a fault or abnormal operating state. By comparing the measured changes in electric quantities with the electric parameters of the normal system, we can detect fault types and fault locations. Furthermore, we can implement selective fault removal. In our research, global information is introduced into the backup protection system. After an accident, using the real-time measurements of phasor measurement units (PMU), we look for the characteristics of marked changes in electrical quantities. We can then carry out a fast and exact analysis of fault components and fault sections and, finally, accomplish fault isolation. Based on statistical theory, we have carried out a large amount of basic research on nonlinear complex systems [12–14].
In this paper, we mainly use the cluster analysis technology of data mining theory to solve the fault detection problem in electrical engineering.

2. Electric Circuit Principle

We consider a circuit with resistors (R), inductors (L), and capacitors (C) [15]. The simplest circuit has one element of each connected in a loop. The part of the circuit containing one element is called a branch. The points where the branches connect are called nodes. In this simplest example, there are three branches and three nodes. See Figure 1.

Figure 1. RLC electric circuit.

We let $i_R$, $i_L$ and $i_C$ be the current in the resistor, inductor and capacitor respectively. Similarly, let $v_R$, $v_L$ and $v_C$ be the voltage drops across the three branches of the circuit. If we think of water flowing through pipes, then the current is like the rate of flow of water, and the voltage is like water pressure. Kirchhoff's current law states that the total current flowing into a node must equal the current flowing out of that node. In the circuit being discussed, this means that $|i_R| = |i_L| = |i_C|$ with the correct choice of signs. We orient the branches in the direction given in Figure 1, so

$$i_R = i_L = i_C = i.$$

Kirchhoff's voltage law states that the sum of the voltage drops around any loop is zero. For the present example, this just means that

$$v_R + v_L + v_C = 0.$$

Next, we need to describe the properties of the elements and the laws that determine how the variables change. A resistor is determined by a relationship between the current $i_R$ and the voltage $v_R$. In the present section, we consider only a linear resistor given by

$$v_R = R\, i_R,$$

where $R > 0$ is a constant. This is Ohm's law. In further discussions, we consider $v_R$ as a nonlinear function of $i_R$, or $i_R$ as a nonlinear function of $v_R$.

An inductor is characterized by giving the time derivative of the current, $\frac{di_L}{dt}$, in terms of the voltage $v_L$. Faraday's law gives

$$L \frac{di_L}{dt} = v_L,$$

where the constant $L > 0$ is called the inductance.
Classically, an inductor was constructed by making a coil of wire. Then, the magnetic field induced by the change of current in the coil creates a voltage drop across the coil.

A capacitor is characterized by giving the time derivative of the voltage, $\frac{dv_C}{dt}$, in terms of the current $i_C$:

$$C \frac{dv_C}{dt} = i_C,$$

where the constant $C > 0$ is called the capacitance.

3. Classification in the Data Mining

Classification is one of the classical topics in the data mining field. Clustering is the process of grouping data objects into a set of disjoint classes, called clusters, so that objects within a class have high similarity to each other, while objects in separate classes are more dissimilar. Clustering is an example of unsupervised classification. "Classification" refers to a procedure that assigns data objects to a set of classes. "Unsupervised" means that clustering does not rely on predefined classes and training examples while classifying the data objects. Theories of classification come from philosophy, mathematics, statistics, psychology, computer science, linguistics, biology, medicine, and other areas.

Cluster analysis encompasses the methods used to:
1) Identify the clusters in the original data;
2) Determine the number of clusters in the original data;
3) Validate the clusters found in the original data.

Cluster analysis has great strength in data analysis and has been applied successfully in research in various fields.

Suppose there are $n$ samples, each with $m$ indexes. The observation data can be expressed as $x_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, m)$. The most commonly used measure of the degree of relationship is distance; $d_{ij}$ usually denotes the distance between samples $x_{(i)}$ and $x_{(j)}$. The distance definitions in common use include:

a. Minkowski distance

$$d_{ij}(q) = \left[ \sum_{t=1}^{m} |x_{it} - x_{jt}|^{q} \right]^{1/q}, \quad (i, j = 1, 2, \ldots, n).$$

b. Lance distance ($x_{ij} > 0$)

$$d_{ij}(L) = \frac{1}{m} \sum_{t=1}^{m} \frac{|x_{it} - x_{jt}|}{x_{it} + x_{jt}}, \quad (i, j = 1, 2, \ldots, n).$$

c. Mahalanobis distance

$$d_{ij}(M) = (x_{(i)} - x_{(j)})' S^{-1} (x_{(i)} - x_{(j)}), \quad (i, j = 1, 2, \ldots, n).$$

Here, $S^{-1}$ is the inverse of the samples' covariance matrix.

d. Oblique space distance

In order to overcome the influence of correlation between variables, one can define the distance in oblique space:

$$d_{ij} = \left[ \frac{1}{m^{2}} \sum_{k=1}^{m} \sum_{l=1}^{m} (x_{ik} - x_{jk})(x_{il} - x_{jl})\, \rho_{kl} \right]^{1/2}, \quad (i, j = 1, 2, \ldots, n).$$

Here, $\rho_{kl}$ is the correlation coefficient between variables $x_k$ and $x_l$.

4. Fault Analysis Based on Data Mining

Now let us consider the IEEE 9-Bus system; Figure 2 is its electric diagram. In this power network, a single-phase-to-ground fault occurs at Bus1. Using the BPA programs, the vector values of the corresponding variables are exported only once in each period. Using these measurement data of the corresponding variables, we can carry out fault analysis of the fault component and the non-fault components (fault section and non-fault sections).

Figure 2. Electric diagram of IEEE 9-Bus system.

4.1. Fault Diagnosis Based on Node Phase Voltage

After computing the IEEE 9-Bus system, we obtain the node phase voltages at five times, $T_{-1}$, $T_0$ (Fault), $T_1$, $T_2$ and $T_3$; see Table 1. Figure 3 is the dendrogram of the cluster analysis based on node phase voltage. The entire cluster analysis process is carried out in order of similarity from high to low (distance from near to far). The order is:

Step 1: BusC combines with BusB and forms the new BusB;
Step 2: Bus3 combines with Bus2 and forms the new Bus2;
Step 3: BusA combines with Bus2 and forms the new Bus2;
Step 4: Bus2 combines with Gen1 and forms the new Gen1;
Step 5: Gen3 combines with Gen2 and forms the new Gen2;
Step 6: Gen2 combines with Gen1 and forms the new Gen1;
Step 7: BusB combines with Bus1 and forms the new Bus1;
Step 8: Bus1 combines with Gen1 and forms the new Gen1.

It is easy to see from Figure 3 that Bus1 differs markedly from the other buses, and the fault characteristic is obvious.
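The merging procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the paper does not state which distance or linkage rule was used, so the code assumes the Minkowski distance with $q = 2$ (Euclidean) and nearest-neighbour (single) linkage. The names `TABLE1`, `minkowski` and `single_linkage` are hypothetical, and the data are the node phase voltages reported in Table 1.

```python
# Node phase voltages from Table 1; columns are T-1, T0 (fault), T1, T2, T3.
TABLE1 = {
    "Gen1": [1.0100, 0.7275, 0.6924, 0.6814, 0.6747],
    "Gen2": [1.0100, 0.8762, 0.8476, 0.8327, 0.8134],
    "Gen3": [1.0100, 0.8449, 0.8071, 0.7909, 0.7710],
    "Bus1": [1.0388, 0.0, 0.0, 0.0, 0.0],
    "Bus2": [1.0430, 0.7622, 0.7350, 0.7217, 0.7049],
    "Bus3": [1.0534, 0.7600, 0.7275, 0.7134, 0.6960],
    "BusA": [1.0319, 0.7540, 0.7248, 0.7114, 0.6944],
    "BusB": [1.0222, 0.2512, 0.2404, 0.2356, 0.2294],
    "BusC": [1.0061, 0.2470, 0.2381, 0.2336, 0.2276],
}


def minkowski(x, y, q=2):
    """Minkowski distance between two samples; q=2 is the Euclidean case."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def single_linkage(samples, q=2):
    """Agglomerative clustering with single (nearest-neighbour) linkage.

    ASSUMPTION: the linkage rule is not given in the paper; single linkage
    is used here purely for illustration.
    Returns the merges in order as (distance, cluster_a, cluster_b).
    """
    clusters = [frozenset([name]) for name in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster distance = distance of the closest pair of members.
                d = min(minkowski(samples[a], samples[b], q)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    for step, (d, a, b) in enumerate(single_linkage(TABLE1), start=1):
        print(f"Step {step}: {sorted(a)} + {sorted(b)} at distance {d:.4f}")
```

Under these assumed choices, Bus1 is absorbed only in the last two merges, first joining the BusB/BusC group and then the remaining nodes, which is the qualitative behaviour the dendrogram is meant to show. A different distance or linkage rule may change the order of the earlier merges.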
These clustering results are fully consistent with the fault location set in advance, so the fault location can be confirmed exactly by cluster analysis based on node phase voltage.

4.2. Fault Diagnosis Based on Node Negative Sequence Voltage

Using the BPA programs, we obtain the node negative sequence voltages at five times, $T_{-1}$, $T_0$ (Fault), $T_1$, $T_2$ and $T_3$; see Table 2. Figure 4 is the dendrogram of the cluster analysis based on node negative sequence voltage.

Let us explain the entire process of cluster analysis in detail. The process is again carried out in order of similarity from high to low (distance from near to far). The order is:

Step 1: BusA combines with Bus2 and forms the new Bus2;
Step 2: Bus3 combines with Bus2 and forms the new Bus2;
Step 3: BusC combines with BusB and forms the new BusB;
Step 4: Bus2 combines with Gen1 and forms the new Gen1;
Step 5: Gen3 combines with Gen2 and forms the new Gen2;
Step 6: Gen2 combines with Gen1 and forms the new Gen1;
Step 7: BusB combines with Bus1 and forms the new Bus1;
Step 8: Bus1 combines with Gen1 and forms the new Gen1.

Throughout the hierarchical clustering process, Bus1 has the lowest similarity to the other nodes (the farthest distance from the other nodes). Figure 4 shows that the difference between Bus1 and the other buses is more distinct under cluster analysis based on node negative sequence voltage. Cluster analysis based on node negative sequence voltage can therefore also identify the fault location effectively.

These instances show that the analysis of the fault component (fault section) can be performed with data mining theory.

Table 1. The node phase voltages at the five times $T_{-1}$, $T_0$ (Fault), $T_1$, $T_2$ and $T_3$.

Bus    T-1     T0 (Fault)  T1      T2      T3
Gen1   1.0100  0.7275      0.6924  0.6814  0.6747
Gen2   1.0100  0.8762      0.8476  0.8327  0.8134
Gen3   1.0100  0.8449      0.8071  0.7909  0.7710
Bus1   1.0388  0           0       0       0
Bus2   1.0430  0.7622      0.7350  0.7217  0.7049
Bus3   1.0534  0.7600      0.7275  0.7134  0.6960
BusA   1.0319  0.7540      0.7248  0.7114  0.6944
BusB   1.0222  0.2512      0.2404  0.2356  0.2294
BusC   1.0061  0.2470      0.2381  0.2336  0.2276

Table 2. The node negative sequence voltages at the five times $T_{-1}$, $T_0$ (Fault), $T_1$, $T_2$ and $T_3$.

Bus    T-1     T0 (Fault)  T1      T2      T3
Gen1   0       0.1330      0.1270  0.1247  0.1227
Gen2   0       0.0556      0.0530  0.0521  0.0512
Gen3   0       0.0742      0.0708  0.0696  0.0684
Bus1   0       0.3408      0.3252  0.3196  0.3142
Bus2   0       0.1058      0.1009  0.0992  0.0975
Bus3   0       0.1168      0.1115  0.1096  0.1077
BusA   0       0.1027      0.0980  0.0963  0.0947
BusB   0       0.2419      0.2309  0.2269  0.2231
BusC   0       0.2287      0.2182  0.2144  0.2108

Figure 3. The dendrogram of cluster analysis based on node phase voltage.

Figure 4. The dendrogram of cluster analysis based on node negative sequence voltage.

5. Conclusions and Discussions

In the control of electric power systems, especially in wide area backup protection, the prerequisite for accurate, fast and reliable performance of a protection device is that the corresponding fault type and fault location can be discriminated quickly and determined exactly. In our research, global information has been introduced into the backup protection system. Based on data mining theory, we mainly use cluster analysis technology to look for the characteristics of marked changes in electrical quantities. We then carry out fast and exact identification of faulty components and faulty sections, and finally accomplish fault analysis. The main technical contributions and innovations of this paper are the introduction of global information into electrical engineering and the development of a new application to fault analysis in electrical engineering.

Data mining is defined as the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from large databases.
It has been widely used in both academic and applied scientific research in which the data sets are generated by experiments. The most important characteristics of data mining theory are its interdisciplinarity and universality. Data mining is closely connected with machine learning, in which scientists develop algorithms and techniques to find and describe potential laws in data. Generally speaking, data mining adds useful techniques to many other fields such as information processing, pattern recognition and artificial intelligence.

6. Acknowledgment

This research was supported partly by the Key Program of National Natural Science Foundation of China (50837002, 50907021) and the Science Foundation for the Doctors of NCEPU.

7. References

[1] Y. Shi, "Dynamic data mining on multi-dimensional data," Ph. D. thesis of State University of New York at Buffalo, 2006.
[2] J. W. Han and M. Kamber, "Data mining: Concepts and techniques," Second Edition, Morgan Kaufmann, Elsevier, San Francisco, 2006.
[3] D. Dursun, F. Christie, M. Charles, and R. Deepa, "Analysis of healthcare coverage: A data mining approach," Expert Systems with Applications, Vol. 36, No. 2, pp. 995–1003, 2009.
[4] Y. J. Kwon, O. A. Omitaomu, and G. N. Wang, "Data mining approaches for modeling complex electronic circuit design activities," Computers & Industrial Engineering, Vol. 54, No. 2, pp. 229–241, 2008.
[5] K. G. Srinivasa, K. R. Venugopal, and L. M. Patnaik, "A self-adaptive migration model genetic algorithm for data mining applications," Information Sciences, Vol. 177, No. 20, pp. 4295–4313, 2007.
[6] J. Cao, "Principal component analysis based fault detection and isolation," Ph. D. thesis of George Mason University of Virginia, 2004.
[7] J. X. Yuan, "Wide area protection and emergency control to prevent large scale blackout," China Electric Power Press, Beijing, 2007.
[8] L. Ye, "Study on sustainable development strategy of electric power in China in 2020," Electric Power, Vol. 36, No. 10, pp. 1–7, 2003.
[9] Y. S. Xue, "Interactions between power market stability and power system stability," Automation of Electric Power Systems, Vol. 26, No. 21–22, pp. 1–6, pp. 1–4, 2002.
[10] Q. X. Yang, "A review of the application of WAMS information in electric power system protective relaying," Modern Electric Power, No. 3, pp. 1, 2006.
[11] J. Yi and X. X. Zhou, "A survey on power system wide-area protection and control," Power System Technology, Vol. 30, pp. 7–13, 2006.
[12] Y. G. Zhang, P. Zhang, and H. F. Shi, "Statistic character in nonlinear systems," Proceedings of the Sixth International Conference on Machine Learning and Cybernetics (ICMLC), Hong Kong, Vol. 5, pp. 2598–2602, August 2007.
[13] Y. G. Zhang, C. J. Wang, and Z. Zhou, "Inherent randomicity in 4-symbolic dynamics," Chaos, Solitons and Fractals, Vol. 28, No. 1, pp. 236–243, 2006.
[14] Y. G. Zhang and C. J. Wang, "Multiformity of inherent randomicity and visitation density in n-symbolic dynamics," Chaos, Solitons and Fractals, Vol. 33, No. 2, pp. 685–694, 2007.
[15] R. C. Robinson, "An introduction to dynamical systems: Continuous and discrete," Pearson Education, New Jersey, 2004.