Journal of Intelligent Learning Systems and Applications, 2010, 2: 1-11
doi:10.4236/jilsa.2010.21001 Published Online February 2010 (http://www.SciRP.org/journal/jilsa)

A New Neural Network Structure: Node-to-Node-Link Neural Network

S. H. Ling
Centre for Health Technologies, University of Technology Sydney, Sydney, Australia
Email: steve.ling@uts.edu.au

Received October 17th, 2009; accepted January 8th, 2010.

ABSTRACT

This paper presents a new neural network structure, namely the node-to-node-link neural network (N-N-LNN), which is trained by a real-coded genetic algorithm (RCGA) with average-bound crossover and wavelet mutation [1]. The N-N-LNN exhibits a node-to-node relationship in the hidden layer, and its network parameters are variable. These characteristics make the network adaptive to changes of the input environment, enabling it to tackle input sets distributed over a large domain: each input data set is effectively handled by a corresponding set of network parameters, and that set of parameters is governed by other nodes. Thanks to these features, the proposed network exhibits better learning and generalization abilities. An industrial application of the proposed network to hand-written graffiti recognition is presented to illustrate its merits.

Keywords: Genetic Algorithm, Hand-Written Graffiti Recognition, Neural Network

1. Introduction

Neural networks can approximate any smooth and continuous nonlinear function in a compact domain to an arbitrary accuracy [2]. Three-layer feed-forward neural networks have been employed in a wide range of applications such as system modelling and control [2], load forecasting [3-5], prediction [6], recognition [1,4,5,7-10], etc. Owing to its specific structure, a neural network can realize a learning process [2]. Learning usually consists of two steps: designing the network structure and defining the learning process. The structure of the neural network affects the non-linearity of its input-output relationship, while the learning algorithm governs the rules for optimizing the connection weights. A typical structure has a fixed set of connection weights after the learning process. However, a fixed set of connection weights may not be suitable for learning the information behind data that are distributed separately over a vast domain.

Traditionally, two major classes of learning rules, namely error correction rules [11] and gradient methods [2], have been used. The error correction rules [11], such as the α-LMS algorithm, perceptron learning rules and May's rule, adjust the network parameters to correct the network output errors corresponding to the given input patterns. Some of the error correction rules are only applicable to linearly separable problems. The gradient rules [2], such as the MRI, MRII and MRIII rules and back-propagation techniques, adjust the network parameters based on gradient information of the cost function. One major weakness of the gradient methods is that the derivative of the cost function is needed, meaning that the cost function has to be continuous and differentiable. Also, the learning process is easily trapped in a local optimum, especially when the problem is multimodal and the learning rules are network-structure dependent. To tackle this problem, global search evolutionary algorithms [12], such as the real-coded genetic algorithm (RCGA) [1,13], are more suitable for searching in a large, complex, non-differentiable and multimodal domain.
Recently, neural and neural-fuzzy networks trained with an RCGA have been reported [5,6,14]. The same GA can be used to train many different networks regardless of whether they are feed-forward, recurrent, wavelet or of other structure types, which generally saves a lot of human effort in developing training algorithms for different types of networks.

In this paper, modifications are made to neural networks such that the parameters of the activation functions in the hidden layer are changed according to the network inputs. To achieve this, node-to-node links are introduced in the hidden layer. The node-to-node links interconnect the hidden nodes with connection weights. The structure of the N-N-LNN is shown in Figure 1. Conceptually, the introduction of the node-to-node links increases the degree of freedom of the network model. It should be noted that the way to realize the node-to-node links is also governed by the tuning algorithm. The resulting neural network is found to have better learning and generalization abilities. The enhancements are due to the fact that the parameters in the activation functions of the hidden nodes are allowed to change in order to cope with changes of the network inputs in different operating sub-domains. As a result, the N-N-LNN effectively has a dedicated neural network to handle the inputs of each operating sub-domain. This characteristic is good for tackling problems with input data sets distributed over a large spatial domain. In this paper, hand-written graffiti recognition, which is a pattern recognition problem with a large number of data sets, is used to show the merits of the proposed network, and the network is found to perform well experimentally.

This paper is organized as follows. The N-N-LNN is presented in Section 2. In Section 3, the training of the parameters of the N-N-LNN using the RCGA [1] is discussed. An application example on a hand-written graffiti recognition system is given in Section 4. A conclusion is drawn in Section 5.

2. Node-to-Node Link Neural Network Model

A neural network with node-to-node relations between the nodes in the hidden layer is shown in Figure 1. An inter-node link with weight $\tilde{m}_i$ is connected from the $(i+d_m)$-th node to the $i$-th node. Similarly, an inter-node link with weight $\tilde{r}_i$ is connected from the $(i-d_r)$-th node to the $i$-th node, $i$ = 1, 2, …, $n_h$. $d_m$ and $d_r$ are the node-to-node distances; for example, if $d_m$ = 2, the link with weight $\tilde{m}_3$ will be connected from node 5 to node 3, and if $d_r$ = 3, the link with weight $\tilde{r}_6$ will be connected from node 3 to node 6. As a result, the total number of node-to-node links is $2n_h$, where $n_h$ is the total number of hidden nodes. An example of the node-to-node link connections is shown in Figure 2. The node-to-node relationship enhances the degree of freedom of the neural network model if it is made adaptive to the changes of the inputs. Consequently, the learning and generalization abilities of the N-N-LNN can be increased.

[Figure 1. Variable node-to-node link neural network]

[Figure 2. Example of node-to-node link connections in the hidden layer (number of hidden nodes = 6, $d_m$ = 2, $d_r$ = 3)]
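To make the wiring of the node-to-node links concrete, the short Python sketch below lists, for each hidden node $i$, the source nodes of its $\tilde{m}$ and $\tilde{r}$ links. The function name is ours, and the modular wrap-around for indices falling outside 1, …, $n_h$ is an assumption that anticipates the index rule given in (6) and (7) below; the example reproduces the configuration of Figure 2.

```python
def node_to_node_sources(n_h, d_m, d_r):
    """For each hidden node i (1-based), return the source nodes of its
    m-link (node i + d_m) and r-link (node i - d_r), wrapped into 1..n_h.
    The wrap-around is an assumption consistent with (6) and (7)."""
    links = {}
    for i in range(1, n_h + 1):
        m_src = (i + d_m - 1) % n_h + 1   # source node of the link with weight m~_i
        r_src = (i - d_r - 1) % n_h + 1   # source node of the link with weight r~_i
        links[i] = (m_src, r_src)
    return links

# Figure 2 configuration: 6 hidden nodes, d_m = 2, d_r = 3.
# Node 3 receives its m-link from node 5; node 6 receives its r-link from node 3.
print(node_to_node_sources(6, 2, 3))
```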
Figure 3 illustrates the inadequacy of a traditional neural network. In this figure, S1 and S2 are two sets of data in a spatial domain. To solve a mapping problem using a neural network, the weights of the network can be trained to minimize the error between the network outputs and the desired values. However, the two data sets are separated too far apart for a single neural network to model. As a result, the neural network may only model the data set S (the average of S1 and S2) after the training, unless we employ a large number of network parameters.

[Figure 3. Diagram showing two sets of data in a spatial domain]

To improve the learning and generalization abilities of the neural network, the proposed N-N-LNN adopts the structure shown in Figure 4. It consists of two units, namely the parameters-set (PS) and the data-processing (DP) neural network. The PS is realized by the node-to-node links, which store the parameters ($m$ and $r$, described later) governing how the DP neural network handles the input data. Referring back to Figure 3, when the input data belong to S1, the PS will provide the parameters (network parameters corresponding to S1) for the DP neural network to handle the S1 data. Similarly, when the input data belong to S2, the DP neural network will obtain another set of parameters to handle them. In other words, it operates like two individual neural networks handling two different sets of input data. Consequently, the proposed N-N-LNN is suitable for handling large numbers of data.

[Figure 4. Proposed structure of the neural network]

Referring to Figure 1, $\mathbf{z}(t) = [z_1(t)\ z_2(t)\ \cdots\ z_{n_{in}}(t)]$ denotes the input vector; $n_{in}$ denotes the number of input nodes; $t$ denotes the current input number, which is a non-zero integer; $w^{(1)}_{ij}$, $i$ = 1, 2, …, $n_h$, $j$ = 1, 2, …, $n_{in}$, denote the connection weights between the $j$-th node of the input layer and the $i$-th node of the hidden layer; $n_h$ denotes the number of hidden nodes; $w^{(2)}_{ki}$, $k$ = 1, 2, …, $n_{out}$, $i$ = 1, 2, …, $n_h$, denote the connection weights between the $i$-th node of the hidden layer and the $k$-th node of the output layer; and $n_{out}$ denotes the number of output nodes. $\tilde{m}_i$ and $\tilde{r}_i$ are the connection weights of the links between the hidden nodes (there are $2n_h$ inter-node links); $d_m$ is the node-to-node distance between the $(i+d_m)$-th node and the $i$-th node, and $d_r$ is the node-to-node distance between the $i$-th node and the $(i-d_r)$-th node. $b_k$ denotes the bias of the output nodes; $tf_1(\cdot)$ and $tf_2(\cdot)$ denote the activation functions of the hidden and output nodes respectively; and $\mathbf{y}(t) = [y_1(t)\ y_2(t)\ \cdots\ y_{n_{out}}(t)]$ denotes the output vector. The input-output relationship of the proposed neural network is governed by the following equation:

$$y_k(t) = tf_2\Big(\sum_{i=1}^{n_h} w^{(2)}_{ki}\, tf_1\big(f_{s_i}(\mathbf{z}(t))\big) + b_k\Big), \quad k = 1, 2, \ldots, n_{out} \qquad (1)$$

Figure 5 shows the proposed neuron at node $i$ of the hidden layer. Its output is given by

$$tf_1\big(f_{s_i}(\mathbf{z}(t))\big) = tf_1\big(\zeta_i(t),\ \tilde{m}_i(t),\ \tilde{r}_i(t)\big), \quad i = 1, 2, \ldots, n_h, \qquad (2)$$

$$\zeta_i(t) = \sum_{j=1}^{n_{in}} w^{(1)}_{ij}\, z_j(t), \qquad (3)$$

[Figure 5. Proposed neuron at node i of the hidden layer]
$$\tilde{m}_i(t) = m_i \sum_{j=1}^{n_{in}} w^{(1)}_{(i+d_m)j}\, z_j(t), \qquad (4)$$

$$\tilde{r}_i(t) = r_i \sum_{j=1}^{n_{in}} w^{(1)}_{(i-d_r)j}\, z_j(t), \qquad (5)$$

where $m_i$ and $r_i$ are parameters to be trained. Referring to Figure 4, these parameters are stored in the PS. The neighbouring-node index wraps around the hidden layer:

$$w^{(1)}_{(i+d_m)j} = \begin{cases} w^{(1)}_{(i+d_m-n_h)j} & \text{for } i+d_m > n_h \\ w^{(1)}_{(i+d_m)j} & \text{otherwise,} \end{cases} \qquad (6)$$

$$w^{(1)}_{(i-d_r)j} = \begin{cases} w^{(1)}_{(i-d_r+n_h)j} & \text{for } i-d_r < 1 \\ w^{(1)}_{(i-d_r)j} & \text{otherwise.} \end{cases} \qquad (7)$$

The activation function of the hidden nodes is

$$tf_1\big(\zeta_i(t), \tilde{m}_i(t), \tilde{r}_i(t)\big) = \frac{2}{1 + e^{-2\,\frac{\zeta_i(t) - \tilde{m}_i(t)}{\tilde{r}_i(t)^2}}} - 1 = \frac{2}{1 + \exp\!\left(-2\,\frac{\sum_{j=1}^{n_{in}} w^{(1)}_{ij} z_j(t) - m_i \sum_{j=1}^{n_{in}} w^{(1)}_{(i+d_m)j} z_j(t)}{\big(r_i \sum_{j=1}^{n_{in}} w^{(1)}_{(i-d_r)j} z_j(t)\big)^2}\right)} - 1, \qquad (8)$$

and $tf_2(\cdot)$ can be any commonly used activation function such as the purely linear, hyperbolic tangent sigmoid, or logarithmic sigmoid function [2,11]. As mentioned earlier, the node-to-node links enhance the degree of freedom of the modelled function. In each neuron of the hidden layer, the input from the lower neighbour's output ($\tilde{m}_i(t)$) influences the bias term, while the input from the upper neighbour's output ($\tilde{r}_i(t)$) influences the sharpness of the edges of the hyper-planes in the search space. It can be seen from (8) that the proposed activation function $tf_1$ is characterized by a varying mean ($\tilde{m}_i(t)$) and a varying standard deviation ($\tilde{r}_i(t)$), whose values change according to changes in the network inputs. Figure 6 shows that the means control the bias, while Figure 7 shows that the standard deviations control the sharpness.

[Figure 6. Samples of the activation function $tf_1$ of the proposed neuron with different $\tilde{m}_i$]

[Figure 7. Samples of the activation function $tf_1$ of the proposed neuron with different $\tilde{r}_i$ ($\tilde{m}_i$ = 0)]

Referring to Figure 3, when the input data belong to S1, the corresponding $\zeta_i(t)$ will drive the other nodes (through $\tilde{m}_i$ and $\tilde{r}_i$) to manipulate the characteristics of the S1 data. Similarly, when the input data belong to S2, the corresponding $\zeta_i(t)$ will drive the other nodes to handle them accordingly. Figure 8 explains the operating principle of the proposed neuron. In this figure, P1, P2 and P3 are three sets of input patterns; $\hat{P}^r_1$, $\hat{P}^r_2$ and $\hat{P}^r_3$ are the inputs from the upper neighbour for the corresponding input patterns, and $\hat{P}^m_1$, $\hat{P}^m_2$ and $\hat{P}^m_3$ are the inputs from the lower neighbour for the corresponding input patterns. When the proposed neuron manipulates the input pattern P1, the shape of the activation function is characterized by $\hat{P}^r_1$ and $\hat{P}^m_1$, and it eventually outputs the pattern P'1. Similarly, when the neuron manipulates the input pattern P2, the shape of the activation function is characterized by $\hat{P}^r_2$ and $\hat{P}^m_2$. So, the activation function is variable and dynamically dependent on the input pattern. Hence, the degree of freedom of the modelled function is increased, and compared with the conventional feed-forward neural network, the N-N-LNN should be able to offer a better performance. In the N-N-LNN, the values of the parameters $w^{(1)}_{ij}$, $w^{(2)}_{ki}$, $m_i$, $r_i$, $b_k$, $d_m$ and $d_r$ are trained by an improved RCGA [1].

[Figure 8. Operating example of the proposed neuron with 3 sets of data patterns for hidden node 5]
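A minimal NumPy sketch of the forward pass described by (1)-(8) is given below. It is illustrative only: the array layout, the function name, the use of a purely linear $tf_2$ (one of the admissible choices named above), and the exact algebraic form of $tf_1$ in (8) (a hyperbolic-tangent-type sigmoid with varying mean and spread, as reconstructed here) are assumptions rather than the author's reference implementation.

```python
import numpy as np

def nnlnn_forward(z, W1, W2, b, m, r, d_m, d_r):
    """Sketch of the N-N-LNN forward pass of (1)-(8).
    z        : input vector, shape (n_in,)
    W1       : input-to-hidden weights w^(1), shape (n_h, n_in)
    W2       : hidden-to-output weights w^(2), shape (n_out, n_h)
    b        : output biases b_k, shape (n_out,)
    m, r     : node-to-node link parameters m_i, r_i, shape (n_h,); r assumed nonzero
    d_m, d_r : node-to-node distances (integers)."""
    n_h = W1.shape[0]
    zeta = W1 @ z                                        # net inputs zeta_i(t), Eq. (3)
    idx = np.arange(n_h)
    m_tilde = m * zeta[(idx + d_m) % n_h]                # varying mean, Eqs. (4), (6)
    r_tilde = r * zeta[(idx - d_r) % n_h]                # varying spread, Eqs. (5), (7)
    # Hidden activation, Eq. (8) (assumed tanh-type form with mean m~ and spread r~):
    hidden = 2.0 / (1.0 + np.exp(-2.0 * (zeta - m_tilde) / r_tilde**2)) - 1.0
    return W2 @ hidden + b                               # output layer, Eq. (1), linear tf_2
```

For each new input, the varying mean and spread are recomputed from the neighbouring nodes' net inputs, which is what makes the activation functions input-dependent.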
3. Network Parameters Tuned by Real-Coded Genetic Algorithm

In this paper, all parameters of the neural network are trained by the improved RCGA with average-bound crossover and wavelet mutation [1]. The RCGA process is as follows. First, a population of chromosomes $P = [P_1, P_2, \ldots, P_{pop\_size}]$ is created, where $pop\_size$ is the number of chromosomes in the population and each chromosome contains a number of genes (variables). Second, the chromosomes are evaluated by a defined fitness function; better chromosomes return higher values in this process, and the form of the fitness function depends on the application. Third, some of the chromosomes are selected to undergo genetic operations for reproduction by the method of normalized geometric ranking. Fourth, the genetic operation of crossover is performed. The crossover operation is mainly for exchanging information between the two parents obtained by the selection operation. In the crossover operation, one of the parameters is the probability of crossover, $p_c$, which gives the expected number of chromosomes that undergo the crossover operation. The crossover operation of [1] proceeds as follows: 1) four chromosomes (instead of two in the conventional RCGA) are generated; 2) the best two offspring in terms of fitness value are selected to replace their parents. This crossover operation is called average-bound crossover (ABX), which combines the average crossover and the bound crossover. The average crossover manipulates the genes of the selected parents and the minimum and maximum possible values of the genes, while the bound crossover is capable of moving the offspring near the domain boundary. With the ABX operation, the offspring spread over the domain so that a higher chance of reaching the global optimum can be obtained.

After the crossover operation, the mutation operation follows. It operates with the probability of mutation, $p_m$, which gives an expected number ($pop\_size \times p_m \times no\_vars$) of genes that undergo mutation. The mutation operation changes the genes of the chromosomes in the population so that the features inherited from their parents can be changed. The mutation operation is called wavelet mutation (WM), as it applies wavelet theory to realize the mutation. A wavelet is a tool for modelling seismic signals by combining dilations and translations of a simple, oscillatory function (the mother wavelet) of finite duration. The wavelet function has two properties: 1) it integrates to zero, and 2) it is square integrable, or equivalently has finite energy. Thanks to these properties, the convergence and the solution stability are improved. After the mutation operation, the new offspring are evaluated using the fitness function, and the new population is reproduced when the new offspring replace the chromosomes with the smallest fitness values. After the operations of selection, crossover and mutation, a new population is generated, and this process repeats iteratively until a defined termination condition is met. One superior characteristic of the RCGA is that detailed information about the nonlinear system to be optimized, e.g. the derivative of the cost function, need not be known. Hence, the RCGA is suitable for handling complex optimization problems.
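The following Python skeleton mirrors the selection-crossover-mutation-replacement cycle described above. It is only a sketch under stated assumptions: the exact average-bound crossover and wavelet mutation equations are defined in [1], so the operator bodies below are simplified stand-ins (a parent-averaging crossover and a zero-mean perturbation whose size contracts with the iteration number), the selection is plain rank-based rather than normalized geometric ranking, and all function and parameter names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def rcga(fitness_fn, lo, hi, pop_size=50, max_iter=15000, p_c=0.8, p_m=0.05):
    """Simplified RCGA cycle (illustrative only; the ABX and WM operators of [1]
    are replaced by plain stand-ins). fitness_fn maps a chromosome to its fitness;
    lo and hi are per-gene lower/upper bound arrays."""
    n_genes = lo.size
    pop = lo + (hi - lo) * rng.random((pop_size, n_genes))    # initial population
    fit = np.array([fitness_fn(c) for c in pop])              # evaluate fitness
    for it in range(max_iter):
        if rng.random() < p_c:                                 # crossover with probability p_c
            i, j = np.argsort(-fit)[:2]                        # rank-based parent selection (simplified)
            child = np.clip(0.5 * (pop[i] + pop[j]), lo, hi)   # stand-in for ABX of [1]
        else:
            child = pop[rng.integers(pop_size)].copy()
        scale = 1.0 - it / max_iter                            # mutation size contracts over the run, as in WM
        mask = rng.random(n_genes) < p_m
        child[mask] += scale * (hi - lo)[mask] * (rng.random(mask.sum()) - 0.5)
        child = np.clip(child, lo, hi)
        child_fit = fitness_fn(child)
        worst = int(np.argmin(fit))                            # offspring replaces the least-fit chromosome
        if child_fit > fit[worst]:
            pop[worst], fit[worst] = child, child_fit
    return pop[int(np.argmax(fit))], float(fit.max())
```

In the graffiti application of Section 4, `fitness_fn` would evaluate (10) with the error of (13) after decoding the chromosome into the N-N-LNN parameters.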
In this paper, the RCGA is employed to optimize a fitness function characterized by the network parameters of the N-N-LNN. The fitness function is a mathematical expression that quantitatively measures the performance of the RCGA tuning process; a larger fitness value indicates better tuning performance. By adjusting the values of the network parameters, the fitness function is maximized (the cost value is minimized) by the RCGA. During the tuning process, offspring with better fitness values evolve, and the mutation operation contracts gradually with respect to the iteration number. After the tuning process, the obtained network parameter values are used by the proposed neural network. As the proposed neural network is a feed-forward one, its outputs are bounded if its inputs are bounded, which is the case for most real-life applications. Consequently, no convergence problem is present for the neural network itself.

The input-output relationship of the proposed N-N-LNN can be described by

$$\mathbf{y}^d(t) = g\big(\mathbf{z}^d(t)\big), \quad t = 1, 2, \ldots, n_d, \qquad (9)$$

where $\mathbf{z}^d(t) = [z^d_1(t)\ z^d_2(t)\ \cdots\ z^d_{n_{in}}(t)]$ and $\mathbf{y}^d(t) = [y^d_1(t)\ y^d_2(t)\ \cdots\ y^d_{n_{out}}(t)]$ are the given inputs and the desired outputs of an unknown nonlinear function $g(\cdot)$ respectively, and $n_d$ denotes the number of input-output data pairs. The fitness function of the RCGA depends on the application. The most common fitness function is given by

$$fitness = \frac{1}{1 + err}, \qquad (10)$$

where $err$ is the error. The objective is to maximize the fitness value of (10) (i.e. minimize $err$) using the RCGA by setting the chromosome to be $\big[w^{(1)}_{ij}\ \ w^{(2)}_{ki}\ \ b_k\ \ m_i\ \ r_i\ \ d_m\ \ d_r\big]$ for all $i$, $j$ and $k$. The range of the fitness value of (10) is (0, 1]. After training, the values of these network parameters are fixed during operation. The total number of tuned parameters ($n_{para}$) of the proposed N-N-LNN is the sum of the number of parameters between the input and hidden layers, the number of parameters between the hidden and output layers (including the output biases), and the number of node-to-node link parameters $m_i$ and $r_i$. Hence,

$$n_{para} = n_{in}\, n_h + (n_h + 1)\, n_{out} + 2 n_h. \qquad (11)$$
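The short sketch below shows one way to lay out such a chromosome, together with the parameter count of (11) and the fitness of (10). The flat gene ordering and the helper names are assumptions (they match the forward-pass sketch in Section 2), and $d_m$ and $d_r$ are treated as fixed integers here rather than decoded from the chromosome.

```python
import numpy as np

def n_para(n_in, n_h, n_out):
    """Number of tuned N-N-LNN parameters as in (11): input-hidden weights,
    hidden-output weights plus output biases, and the m_i, r_i link parameters."""
    return n_in * n_h + (n_h + 1) * n_out + 2 * n_h

def fitness(err):
    """Fitness of (10); maximizing it minimizes err, and its range is (0, 1]."""
    return 1.0 / (1.0 + err)

def decode(chrom, n_in, n_h, n_out):
    """Split a flat chromosome [w^(1), w^(2), b, m, r] into network parameters.
    The ordering is an assumed convention, not prescribed by the paper."""
    i = 0
    W1 = chrom[i:i + n_in * n_h].reshape(n_h, n_in); i += n_in * n_h
    W2 = chrom[i:i + n_h * n_out].reshape(n_out, n_h); i += n_h * n_out
    b = chrom[i:i + n_out]; i += n_out
    m = chrom[i:i + n_h]; i += n_h
    r = chrom[i:i + n_h]
    return W1, W2, b, m, r

# Example: the graffiti recognizer of Section 4 has n_in = 10 and n_out = 16;
# with n_h = 20 this gives 576 tuned parameters.
print(n_para(10, 20, 16), fitness(0.0157))
```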
4. Industrial Application and Results

In this section, an industrial application example is given to illustrate the merits of the proposed network. The application is hand-written graffiti recognition, a pattern recognition problem with a large number of input data sets, which is used to illustrate the superior learning and generalization abilities of the proposed network. In general, neural network approaches are model-free. Different kinds of neural models applied to hand-written recognition systems are reported in [8,10,12,15,16].

4.1 Neural Network Based Hand-Written Graffiti Recognition System

In this example, the digits 0 to 9 and three control characters (backspace, carriage return and space) are recognized by the N-N-LNN. These graffiti are shown in Figure 9. A point in each graffiti is characterized by a number based on its x-y coordinates on a writing area. The size of the writing area is $x_{max}$ by $y_{max}$, and the bottom left corner is set as (0, 0). Ten uniformly sampled points of the graffiti are taken in the following way. First, the input graffiti is divided into 9 uniformly distanced segments characterized by 10 points, including the start and the end points. Each point is labeled as $(x_i, y_i)$, $i$ = 1, 2, …, 10. The first 5 points, $(x_i, y_i)$, $i$ = 1, 3, 5, 7 and 9, taken alternately, are converted to 5 numbers $z_i$ respectively by using the formula $z_i = x_i x_{max} + y_i$. The other 5 points, $(x_i, y_i)$, $i$ = 2, 4, 6, 8 and 10, are converted to 5 numbers respectively by using the formula $z_i = y_i y_{max} + x_i$. These ten numbers, $z_i$, $i$ = 1, 2, …, 10, are used as the inputs of the proposed graffiti recognizer.

[Figure 9. Graffiti digits and characters (with the dot indicating the starting point of the graffiti)]

The hand-written graffiti recognizer shown in Figure 10 is proposed. Its inputs are defined as

$$\bar{\mathbf{z}}(t) = \frac{\mathbf{z}(t)}{\lVert \mathbf{z}(t) \rVert}, \qquad (12)$$

where $\bar{\mathbf{z}}(t) = [\bar{z}_1(t)\ \bar{z}_2(t)\ \cdots\ \bar{z}_{10}(t)]$ denotes the normalized input vector of the proposed graffiti recognizer, $\mathbf{z}(t) = [z_1(t)\ z_2(t)\ \cdots\ z_{10}(t)]$ denotes the ten points in the writing area, and $\lVert \cdot \rVert$ denotes the $l_2$ vector norm. The 16 outputs, $y_k(t)$, $k$ = 1, 2, …, 16, indicate the similarity between the input pattern and the 16 standard patterns respectively. The input-output relationship of the training patterns is defined such that the output $y_i(t) = 1$ and the other outputs are zero when the input vector belongs to pattern $i$, $i$ = 1, 2, …, 16. For example, the desired outputs of the pattern recognition system are [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] for the digit "0(a)", [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0] for the digit "0(b)", and [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1] for the character "space", respectively. After training, a graffiti determiner is used to determine the output of the recognizer: a larger value of $y_j(t)$ implies that the input pattern matches the corresponding graffiti pattern more closely. For instance, a large value of the output corresponding to the digit "0" implies that the input pattern is recognized as "0".

[Figure 10. The proposed hand-written graffiti recognizer: a neural network with adaptive activation functions (NNAAF) followed by a graffiti determiner]
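The sketch below converts a single stroke into the ten normalized inputs of (12). The resampling of the stroke into 10 points spaced uniformly along its length is an assumed implementation of the "9 uniformly distanced segments" described above, and the function name is ours.

```python
import numpy as np

def graffiti_features(stroke, x_max, y_max):
    """Convert one graffiti stroke into the 10 normalized inputs of (12).
    stroke : array of (x, y) points in writing order, shape (N, 2)."""
    stroke = np.asarray(stroke, dtype=float)
    seg = np.linalg.norm(np.diff(stroke, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))         # arc length at each recorded point
    targets = np.linspace(0.0, s[-1], 10)               # 10 uniformly spaced sample positions
    xs = np.interp(targets, s, stroke[:, 0])
    ys = np.interp(targets, s, stroke[:, 1])
    z = np.empty(10)
    z[0::2] = xs[0::2] * x_max + ys[0::2]               # points 1, 3, 5, 7, 9: z_i = x_i*x_max + y_i
    z[1::2] = ys[1::2] * y_max + xs[1::2]               # points 2, 4, 6, 8, 10: z_i = y_i*y_max + x_i
    return z / np.linalg.norm(z)                        # normalization of (12)
```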
4.2 Results and Analysis

To train the neural network of the hand-written graffiti recognition system, a set of training patterns governing the input-output relationship is used: 1600 training patterns (100 patterns for each graffiti) are used in this example. The training patterns consist of the input vectors and their corresponding expected outputs. The fitness function is given by (10), with

$$err = \frac{1}{1600 \times 16} \sum_{t=1}^{1600} \sum_{k=1}^{16} \big(y^d_k(t) - y_k(t)\big)^2, \qquad (13)$$

where $\mathbf{y}^d(t) = [y^d_1(t)\ y^d_2(t)\ \cdots\ y^d_{16}(t)]$ denotes the expected output vector and $\mathbf{y}(t) = [y_1(t)\ y_2(t)\ \cdots\ y_{16}(t)]$ is the actual network output defined as

$$y_k(t) = tf_2\Big(\sum_{i=1}^{n_h} w^{(2)}_{ki}\, tf_1\big(f_{s_i}(\bar{\mathbf{z}}(t))\big) + b_k\Big), \quad k = 1, 2, \ldots, 16, \qquad (14)$$

$$tf_1\big(f_{s_i}(\bar{\mathbf{z}}(t))\big) = tf_1\Big(\sum_{j=1}^{10} w^{(1)}_{ij}\, \bar{z}_j(t),\ \tilde{m}_i(t),\ \tilde{r}_i(t)\Big), \quad i = 1, 2, \ldots, n_h, \qquad (15)$$

where $tf_2(\cdot)$ is a purely linear transfer function in this application.

For comparison purposes, a conventional 3-layer fully connected feed-forward neural network (FFCNN) [11], a fixed-structure network with link switches (FSNLS) [6], and a wavelet neural network (WNN) [14,17,18] (which combines feed-forward neural networks with wavelet theory, providing a multi-resolution approximation for discriminant functions), all trained by the improved RCGA [1], are also used in this example. For all cases, the initial values of the parameters of the neural networks are randomly generated. In this application, the lower and upper bounds of the network parameters of the N-N-LNN are $-4 \le w^{(1)}_{ij}, w^{(2)}_{ki}, b_k, m_i \le 4$, $r_i \ge 0.25$, and $1 \le d_m, d_r \le n_h - 1$. For the FSNLS, WNN and FFCNN, the network parameters range from -4 to 4. The number of iterations used to train the neural networks is 15000. For the RCGA [1], the probability of crossover ($p_c$) and the probability of mutation ($p_m$) are 0.8 and 0.05 respectively; the weights of the average-bound crossover, $w_a$ and $w_b$, are set at 0.5 and 1 respectively; the shape parameter of the wavelet mutation is 2; and the population size is 50. All results are averaged over 20 runs. In order to test the generalization ability of the proposed neural network, a set of testing patterns consisting of 480 input patterns (30 patterns for each graffiti) is used.

Table 1. Training results of the hand-written graffiti recognition

Network  | Measure            | n_h = 18 | n_h = 20 | n_h = 22 | n_h = 24
N-N-LNN  | n_para             | 520      | 576      | 632      | 688
         | Ave. training MSE  | 0.0185   | 0.0157   | 0.0169   | 0.0179
         | Ave. training Acc. | 96.50%   | 97.38%   | 96.85%   | 96.62%
         | Best training MSE  | 0.0168   | 0.0139   | 0.0145   | 0.0143
         | Best training Acc. | 96.88%   | 98.06%   | 97.31%   | 97.38%
FSNLS    | n_para             | 1004     | 1112     | 1220     | 1328
         | Ave. training MSE  | 0.0337   | 0.0328   | 0.0314   | 0.0322
         | Ave. training Acc. | 92.46%   | 92.62%   | 93.25%   | 93.18%
         | Best training MSE  | 0.0309   | 0.0293   | 0.0282   | 0.0288
         | Best training Acc. | 93.40%   | 93.75%   | 94.00%   | 93.86%
WNN      | n_para             | 486      | 540      | 594      | 648
         | Ave. training MSE  | 0.0349   | 0.0321   | 0.0316   | 0.0309
         | Ave. training Acc. | 92.35%   | 92.42%   | 93.10%   | 93.23%
         | Best training MSE  | 0.0315   | 0.0292   | 0.0280   | 0.0278
         | Best training Acc. | 93.31%   | 93.81%   | 94.00%   | 94.13%
FFCNN    | n_para             | 502      | 556      | 610      | 664
         | Ave. training MSE  | 0.0393   | 0.0385   | 0.0380   | 0.0360
         | Ave. training Acc. | 90.17%   | 90.46%   | 90.73%   | 91.50%
         | Best training MSE  | 0.0370   | 0.0388   | 0.0361   | 0.0326
         | Best training Acc. | 90.50%   | 91.69%   | 92.56%   | 93.06%

Table 2. Testing results of the hand-written graffiti recognition

Network  | Measure            | n_h = 18 | n_h = 20 | n_h = 22 | n_h = 24
N-N-LNN  | n_para             | 520      | 576      | 632      | 688
         | Ave. testing MSE   | 0.0228   | 0.0186   | 0.0199   | 0.0211
         | Ave. testing Acc.  | 95.21%   | 96.96%   | 95.49%   | 95.40%
         | Best testing MSE   | 0.0204   | 0.0171   | 0.0185   | 0.0192
         | Best testing Acc.  | 95.93%   | 97.29%   | 96.25%   | 96.05%
FSNLS    | n_para             | 1004     | 1112     | 1220     | 1328
         | Ave. testing MSE   | 0.0363   | 0.0350   | 0.0331   | 0.0349
         | Ave. testing Acc.  | 92.28%   | 92.50%   | 93.21%   | 93.00%
         | Best testing MSE   | 0.0330   | 0.0322   | 0.0310   | 0.0312
         | Best testing Acc.  | 92.92%   | 93.13%   | 93.96%   | 93.75%
WNN      | n_para             | 486      | 540      | 594      | 648
         | Ave. testing MSE   | 0.0365   | 0.0346   | 0.0344   | 0.0329
         | Ave. testing Acc.  | 92.08%   | 92.22%   | 92.71%   | 93.92%
         | Best testing MSE   | 0.0329   | 0.0320   | 0.0322   | 0.0308
         | Best testing Acc.  | 92.59%   | 93.54%   | 93.75%   | 94.38%
FFCNN    | n_para             | 502      | 556      | 610      | 664
         | Ave. testing MSE   | 0.0410   | 0.0404   | 0.0393   | 0.0374
         | Ave. testing Acc.  | 90.07%   | 90.58%   | 90.68%   | 91.25%
         | Best testing MSE   | 0.0404   | 0.0393   | 0.0388   | 0.0361

The average training, best training, average testing and best testing results in terms of mean square error (MSE) and recognition accuracy rate of all approaches are summarized in Table 1 and Table 2. It can be seen from these two tables that the recognition system implemented by the N-N-LNN outperforms those implemented by the FSNLS, WNN and FFCNN. The best results are achieved when the number of hidden nodes ($n_h$) is set at 20 for the N-N-LNN, $n_h$ = 22 for the FSNLS, and $n_h$ = 24 for the WNN and FFCNN. In comparison with the FSNLS, WNN and FFCNN, the average training and testing errors of the N-N-LNN at $n_h$ = 20 are 0.0157 and 0.0186 respectively. They imply 100.0% and 77.96% improvements over the FSNLS at $n_h$ = 22, 96.82% and 76.90% improvements over the WNN at $n_h$ = 24, and 129.3% and 101.1% improvements over the FFCNN at $n_h$ = 24, respectively. In terms of the average testing recognition accuracy rate, the N-N-LNN (96.96%) gives a better result than the FSNLS (93.21%), the WNN (93.92%) and the FFCNN (91.25%).
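The recognition accuracy rates in Tables 1 and 2 follow from applying the graffiti determiner to the network outputs. The snippet below shows the obvious realization assumed here: the determiner picks the largest of the 16 outputs, and the accuracy rate is the fraction of patterns (e.g. the 480 testing graffiti) whose predicted class matches the desired one.

```python
import numpy as np

def recognize(y):
    """Graffiti determiner: return the index (0..15) of the largest output,
    an assumed realization of 'a larger y_j means a closer match'."""
    return int(np.argmax(y))

def accuracy(outputs, labels):
    """Recognition accuracy rate over a set of patterns.
    outputs : network outputs, shape (n_patterns, 16)
    labels  : true class indices, shape (n_patterns,)."""
    return float(np.mean(np.argmax(outputs, axis=1) == labels))
```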
Figure 11 shows selected output values of the N-N-LNN, FSNLS, WNN and FFCNN for the 480 (30 for each digit/character) testing graffiti. In this figure, the x-axis represents the pattern number of the corresponding digit/character: pattern numbers 1 to 30 are for the digit "0(a)", numbers 31 to 60 are for the digit "0(b)", and so on. The y-axis represents the output $y_i$. As mentioned before, the input-output relationship of the patterns is defined such that the output $y_i(t) = 1$ and the other outputs are zero when the input vector belongs to pattern $i$, $i$ = 1, 2, …, 16. For instance, the desired output vector of the pattern recognition system is [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0] for the digit "0(b)". Referring to Figure 11(d), we can see that the output $y_{16}$ of the N-N-LNN for pattern numbers 451 to 480 (the character "space") is nearest to 1 while the other outputs are nearest to zero. This shows that the recognition accuracy achieved by the N-N-LNN is good.

[Figure 11. Output values of the N-N-LNN, FSNLS, WNN, and FFCNN for the 480 (30 for each type) testing graffiti patterns: (a) digit "0(b)"; (b) digit "6"; (c) character "carriage return"; (d) character "space"]

5. Conclusions

A new neural network has been proposed in this paper. The parameters of the proposed neural network are trained by the RCGA. In this topology, the parameters of the activation functions in the hidden nodes are changed according to the input to the network and the outputs of other hidden-layer nodes in the network.
Thanks to this variable property and the node-to-node links in the hidden layer, the learning and generalization abilities of the proposed network are increased. An application to hand-written graffiti recognition has been given to illustrate the merits of the proposed N-N-LNN. The proposed network is effectively an adaptive network: by adaptive, we mean that the network parameters are variable and depend on the input data. For example, when the proposed neurons of the N-N-LNN manipulate an input pattern, the shapes of the activation functions are characterized by the inputs from the upper and lower neighbours' outputs, which depend on the input pattern itself. In other words, the activation functions, and hence the parameters of the N-N-LNN, vary adaptively with the input patterns to produce the outputs. All network parameters of the N-N-LNN depend only on the present input, which means the network is a feed-forward one and causes no stability problem to the network dynamics.

REFERENCES

[1] S. H. Ling and F. H. F. Leung, "An improved genetic algorithm with average-bound crossover and wavelet mutation operations," Soft Computing, Vol. 11, No. 1, pp. 7-31, January 2007.

[2] F. M. Ham and I. Kostanic, "Principles of Neurocomputing for Science & Engineering," McGraw-Hill, 2001.

[3] R. C. Bansal and J. C. Pandey, "Load forecasting using artificial intelligence techniques: A literature survey," International Journal of Computer Applications in Technology, Vol. 22, No. 2/3, pp. 109-119, 2005.

[4] S. H. Ling, F. H. F. Leung, H. K. Lam, and P. K. S. Tam, "Short-term electric load forecasting based on a neural fuzzy network," IEEE Transactions on Industrial Electronics, Vol. 50, No. 6, pp. 1305-1316, December 2003.

[5] S. H. Ling, F. H. F. Leung, L. K. Wong, and H. K. Lam, "Computational intelligence techniques for home electric load forecasting and balancing," International Journal of Computational Intelligence and Applications, Vol. 5, No. 3, pp. 371-391, 2005.

[6] F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam, "Tuning of the structure and parameters of a neural network using an improved genetic algorithm," IEEE Transactions on Neural Networks, Vol. 14, No. 1, pp. 79-88, January 2003.

[7] K. F. Leung, F. H. F. Leung, H. K. Lam, and S. H. Ling, "On interpretation of graffiti digits and commands for eBooks: Neural-fuzzy network and genetic algorithm approach," IEEE Transactions on Industrial Electronics, Vol. 51, No. 2, pp. 464-471, April 2004.

[8] D. R. Lovell, T. Downs, and A. C. Tsoi, "An evaluation of the neocognitron," IEEE Transactions on Neural Networks, Vol. 8, No. 5, pp. 1090-1105, September 1997.

[9] Z. Michalewicz, "Genetic Algorithms + Data Structures = Evolution Programs," 2nd extended edition, Springer-Verlag, 1994.

[10] C. A. Perez, C. A. Salinas, P. A. Estévez, and P. M. Valenzuela, "Genetic design of biologically inspired receptive fields for neural pattern recognition," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 33, No. 2, pp. 258-270, April 2003.

[11] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proceedings of the IEEE, Vol. 78, No. 9, pp. 1415-1442, September 1990.

[12] R. Buse, Z. Q. Liu, and J. Bezdek, "Word recognition using fuzzy logic," IEEE Transactions on Fuzzy Systems, Vol. 10, No. 1, February 2002.

[13] L. Davis, "Handbook of Genetic Algorithms," Van Nostrand Reinhold, New York, 1991.
[14] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423-1447, September 1999.

[15] P. D. Gader and M. A. Khabou, "Automatic feature generation for handwritten digit recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 12, pp. 1256-1261, December 1996.

[16] L. Holmström, P. Koistinen, J. Laaksonen, and E. Oja, "Neural and statistical classifiers - Taxonomy and two case studies," IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 5-17, January 1997.

[17] J. Zhang, G. G. Walter, Y. Miao, and W. W. N. Lee, "Wavelet neural networks for function learning," IEEE Transactions on Signal Processing, Vol. 43, No. 6, pp. 1485-1497, June 1995.

[18] B. Zhang, M. Fu, H. Yan, and M. A. Jabri, "Handwritten digit recognition by adaptive-subspace self-organizing map (ASSOM)," IEEE Transactions on Neural Networks, Vol. 10, No. 4, pp. 939-945, July 1999.

[19] C. K. Ho and M. Sasaki, "Brain-wave bio potentials based mobile robot control: Wavelet-neural network pattern recognition approach," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 1, pp. 322-328, October 2001.

[20] S. Yao, C. J. Wei, and Z. Y. He, "Evolving wavelet neural networks for function approximation," Electronics Letters, Vol. 32, No. 4, pp. 360-361, February 1996.

[21] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, Vol. 3, No. 6, pp. 889-898, November 1992.