Journal of Intelligent Learning Systems and Applications, 2011, 3, 17-25
doi:10.4236/jilsa.2011.31003 Published Online February 2011 (http://www.SciRP.org/journal/jilsa)

Function Approximation Using Robust Radial Basis Function Networks

Oleg Rudenko, Oleksandr Bezsonov
The Department of Computer Engineering and Control, Kharkov National University of Radio Electronics, Kharkiv, Ukraine.
Email: o.bezsonov@gmail.com

Received April 9th, 2010; revised July 20th, 2010; accepted August 4th, 2010.

ABSTRACT

Resistant training in radial basis function (RBF) networks is the topic of this paper. A modification of the Gauss-Newton training algorithm, based on the theory of robust regression, is proposed for dealing with outliers in the framework of function approximation, system identification and control. The modification combines the numerical robustness of a particular class of non-quadratic estimators, known in statistics as M-estimators, with a dead zone. The algorithm is tested on several examples, and the results show that it not only eliminates the influence of outliers but also has a better convergence rate than the standard Gauss-Newton algorithm.

Keywords: Neural Network, Robust Training, Basis Function, Dead Zone

1. Introduction

Function approximation involves estimating (approximating) an underlying relationship from a given finite input-output data set

y = f(x) + \xi,                                                    (1)

where x \in R^{M \times 1} is an input vector, f is an arbitrary nonlinear function, unknown in the general case, and \xi is an unobserved disturbance with unknown characteristics. This has been a fundamental problem for a variety of applications in system identification, pattern classification, data mining and signal reconstruction [1-4].

Feedforward neural networks such as multilayer perceptrons (MLP) have been widely used as an approach to function approximation, since they provide a generic black-box functional representation and have been shown to be capable of approximating any continuous function defined on a compact set in R^N with arbitrary accuracy [5-7]. It has also been proved that a radial basis function (RBF) network can approximate arbitrarily well any multivariate continuous function on a compact domain if a sufficient number of radial basis function units are given [8].

In contrast to MLPs, RBF networks use a localized representation of information. An RBF network requires less computation time for learning and has a more compact topology than an MLP. The network can be configured with one radial basis function centre at each training data point; in that case the complexity of the network is of the same order as the size of the training data set and the network has poor generalization capability.

The RBF decomposition of f(x) is

\hat{f}(x) = \sum_{i=0}^{N} w_i \varphi_i(x, r) = w^T \varphi(x, r),              (2)

where w \in R^{N \times 1} is a vector of linear weights, \varphi \in R^{N \times 1} is a vector of RBFs and r is a distance. An important advantage of the RBF network from the viewpoint of practitioners is, therefore, the clear and understandable interpretation of the functionality of the basis functions.

The traditional RBF basis function is defined by the Euclidean distance r_E = \|x_i - t_j\| and the Gaussian activation function \varphi_j(x_i) = \exp(-0.5\, r_E^2 / \sigma^2), where x_i is the input sample number i, t_j is the center of the j-th radial basis function and \sigma is its standard deviation (radius).
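As a concrete illustration of the decomposition (2) with Euclidean distances and Gaussian units, the following minimal sketch (Python/NumPy; the function and variable names are illustrative assumptions, not part of the original text) evaluates the output of an RBF network for a batch of inputs.

```python
import numpy as np

def rbf_output(X, centers, sigma, w, w0=0.0):
    """Evaluate the RBF decomposition (2) with Gaussian basis functions.

    X       : (n_samples, M) input vectors
    centers : (N, M) centers t_j of the radial basis functions
    sigma   : scalar width (standard deviation) of the Gaussian units
    w       : (N,) vector of linear output weights
    w0      : optional bias weight
    """
    # Squared Euclidean distances r_E^2 between every input and every center
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    # Gaussian activations phi_j(x_i) = exp(-0.5 * r_E^2 / sigma^2)
    phi = np.exp(-0.5 * d2 / sigma ** 2)
    # Network output: weighted sum of the basis functions plus bias
    return w0 + phi @ w

# Example: 3 random centers in R^2, random weights, 5 test points
rng = np.random.default_rng(0)
centers = rng.uniform(-1, 1, size=(3, 2))
w = rng.normal(size=3)
X = rng.uniform(-1, 1, size=(5, 2))
print(rbf_output(X, centers, sigma=0.5, w=w))
```

With one centre placed at every training point, `centers` would simply coincide with the training inputs, which illustrates the growth in network complexity mentioned above.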
If we use the Mahalanobis distance r_M = \sqrt{(x_i - t_j)^T R_j^{-1} (x_i - t_j)}, where R_j = \{r_{jm}\} is a weight matrix, M is the dimension of the input vector x_i and N is the number of neurons, then for the RBF activation function we have

\varphi_j(x) = \exp\left(-(x - t_j)^T R_j^{-1} (x - t_j)\right),                   (3)

where R_j is the covariance matrix. Geometrically, t_j represents the center and R_j the shape of the j-th basis function. A hidden unit can therefore be represented as a hyper-ellipsoid in the input space.

All the network parameters (weights, centers and radii) may be determined using various learning algorithms in order to find the most appropriate parameters for the RBF decomposition. The network iteratively adjusts the parameters of each node by minimizing some cost function, which can be defined as an ensemble average of errors:

F(e(k)) = \frac{1}{k}\sum_{i=1}^{k} \rho(e(i), \theta),                            (4)

where \rho(e(i), \theta) is a scalar loss function; e(i) = y(i) - \hat{f}(i, \theta) represents the residual error between the desired output y(i) and the actual network output \hat{f}(i, \theta); i indicates the index of the series; and \theta comprises all the unknown parameters of the network, i.e. the weights w_0, w_1, ..., w_N, the centers t_{1,1}, ..., t_{N,M} and the elements of the matrices R_1, ..., R_N.

The problem of neural network training (estimating \theta) that approximates the function (1) "well" has essentially been tackled on the basis of the following two different assumptions [9]:

(A1) The noise has some probabilistic and/or statistical properties.
(A2) Regardless of the nature of the disturbance, a noise bound is available, i.e. \xi^2(k) \le \delta^2(k).

Assumption (A1) leads to different stochastic training methods that are based on the minimization of some loss function. Different choices of loss functions arise from various assumptions about the distribution of the measurement noise. The most common loss function is the quadratic one, corresponding to a Gaussian noise model with zero mean and a standard deviation that does not depend on the inputs. The quadratic (Gaussian) loss function is popular because it has nice analytical properties. However, one of its potential difficulties is that it receives large contributions from outliers that have particularly large errors.

The problem in neural network training is that when the training data set contains outliers, traditional supervised learning algorithms usually cannot achieve acceptable performance. Since traditional training algorithms adopt the least-squares cost function (4), they are very sensitive to outliers. Techniques that attempt to solve these problems are referred to as robust statistics [10,11]. In recent years, various robust learning algorithms based on M-estimation have been proposed to overcome the outlier problem [12-17].

The basic idea of M-estimators is to replace the quadratic function in the cost function (4) by a loss function chosen so that the effect of outliers is reduced. Traditional approaches to this problem introduce a robust cost function (4) and then apply a steepest descent method. The idea of such an approach is to identify outliers and then to reduce their effect directly.

Alternative approaches have been formulated in a deterministic framework based on Assumption (A2). In this context the training problem is to find a \theta belonging to the class of models (2) for which the absolute value of the difference between the function (1) and the model is smaller than \delta(k) for all times k.
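The hyper-ellipsoidal unit (3) can be illustrated with a few lines of code. The sketch below (Python/NumPy; the helper name and example values are illustrative assumptions) evaluates the activation for a given center and shape matrix; an anisotropic R stretches the unit along one axis.

```python
import numpy as np

def ellipsoidal_activation(x, t, R):
    """Hyper-ellipsoidal RBF unit, Equation (3):
    phi_j(x) = exp(-(x - t_j)^T R_j^{-1} (x - t_j)).

    x : (M,) input vector
    t : (M,) center of the unit
    R : (M, M) symmetric positive definite covariance-like matrix
        defining the shape and orientation of the ellipsoid
    """
    d = x - t
    # Quadratic form (x - t)^T R^{-1} (x - t), computed via a linear solve
    return np.exp(-d @ np.linalg.solve(R, d))

# Example: an anisotropic unit stretched along the first axis
t = np.array([0.0, 0.0])
R = np.array([[2.0, 0.0],
              [0.0, 0.5]])
print(ellipsoidal_activation(np.array([1.0, 0.2]), t, R))
```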
Three different types of solutions to this problem have mainly been explored in the literature. The first is to formulate the estimation problem in a geometrical setting. Different proposals result from this approach, but Fogel and Huang [18] proposed a minimal-volume recursive algorithm (FHMV) which minimizes the size of an ellipsoid and is attractive for on-line estimation. The second alternative is to derive the estimation algorithm from stability considerations together with the geometrical setting (the ellipsoidal outer-bounding algorithm by Lozano-Leal and Ortega [19]). The third approach is to obtain the estimation (training) algorithm by modifying the exponentially weighted recursive least squares (EW-RLS) algorithm [9].

All these algorithms have a dead zone. The dead zone scheme guarantees convergence of the neural network training algorithm in the presence of noise with \xi^2(k) \le \delta^2(k). It should be noted that this dead zone may serve as a value that limits the accuracy of the obtained solutions, i.e. it determines the acceptable inaccuracy.

The proposed method combines the numerical robustness of a particular class of non-quadratic M-estimators with a dead zone.

2. Robust Gauss-Newton Training Algorithm

The estimate \hat\theta is the solution of the following set of equations:

\frac{\partial F(e(k))}{\partial \theta_j} = \frac{1}{k}\sum_{i=1}^{k} \psi(e(i), \theta)\,\frac{\partial e(i)}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots,        (5)

where \psi(e(i), \theta) = \partial\rho(e(i), \theta)/\partial e(i) = \varphi(e(i), \theta)\, e(i) is the influence function and \varphi(e(i), \theta) is the weight function.

For the quadratic function \rho(e(i)), in the maximum likelihood estimation case (5) has a closed-form solution, the sample mean. The sample mean is substantially affected by the presence of outliers. For many non-quadratic loss functions Equation (5) does not have a closed-form solution, but it can be solved by iterative or recursive methods. The minimization of the criterion (4) can therefore be performed using the recursive Gauss-Newton algorithm

\hat\theta(k) = \hat\theta(k-1) + \frac{P(k-1)\,\nabla\hat f(k)\,\psi(e(k), \theta)}{1 + \psi'(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)},        (6)

P(k) = P(k-1) - \frac{P(k-1)\,\nabla\hat f(k)\,\nabla\hat f^T(k)\,P(k-1)\,\psi'(e(k), \theta)}{1 + \psi'(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)},        (7)

where \nabla\hat f(k) is the vector of partial derivatives of the network output with respect to all adjustable parameters,

\nabla\hat f(k) = \left(\frac{\partial\hat f(k)}{\partial w_0}, \frac{\partial\hat f(k)}{\partial w_1}, \ldots, \frac{\partial\hat f(k)}{\partial w_N}, \frac{\partial\hat f(k)}{\partial t_{1,1}}, \ldots, \frac{\partial\hat f(k)}{\partial t_{N,M}}, \frac{\partial\hat f(k)}{\partial r_{1,1}}, \ldots\right)^T,

with components

\frac{\partial\hat f(k)}{\partial w_0} = 1, \quad \frac{\partial\hat f(k)}{\partial w_i} = \varphi(x, t_i, R_i), \quad \frac{\partial\hat f(k)}{\partial t_i} = w_i e^{-A}\frac{\partial A}{\partial t_i}, \quad \frac{\partial\hat f(k)}{\partial r_{ij}} = w_i e^{-A}\frac{\partial A}{\partial r_{ij}},

where A = (x - t_i)^T R_i^{-1} (x - t_i).

The initial value of the matrix P(0) is chosen as in recursive least squares, i.e. P(0) = \alpha I with \alpha \gg 1, and the initial dimension of the identity matrix I is S \times S, where S = 2 + M + M^2 is the number of adjustable parameters of a network containing one neuron. Because the dimension of P(k) increases after a new n-th neuron is introduced into the network, the elements of the matrix P(k) are reset and initialized again, and S becomes equal to S = 1 + n(1 + M + M^2), where n is the current number of neurons in the network.

The influence function \psi(e) measures the influence of a datum on the value of the parameter estimate. For example, for the least-squares loss \rho(e) = 0.5 e^2 the influence function is \psi(e) = e; that is, the influence of a datum on the estimate increases linearly and without bound with the size of its error, which confirms the non-robustness of the least-squares estimate.

Huber proposed a robust estimator, the so-called M-estimator (M for maximum likelihood). The M-estimator is the solution of (5) in which different non-quadratic loss functions \rho(e(i), \theta) are used.
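A single step of the recursive update (6)-(7) can be written compactly. The sketch below (Python/NumPy) follows the form reconstructed above, with psi and dpsi standing for the influence function \psi(e) and its derivative \psi'(e); the function name and the example values are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def robust_gauss_newton_step(theta, P, grad_f, e, psi, dpsi):
    """One recursive update in the spirit of (6)-(7).

    theta  : (S,) current parameter estimate
    P      : (S, S) current covariance-like matrix
    grad_f : (S,) gradient of the network output w.r.t. theta at step k
    e      : scalar residual e(k) = y(k) - f_hat(k)
    psi    : influence function psi(e) = d rho / d e
    dpsi   : its derivative psi'(e)
    """
    Pg = P @ grad_f
    denom = 1.0 + dpsi(e) * (grad_f @ Pg)            # scalar denominator
    theta_new = theta + Pg * psi(e) / denom           # parameter update, cf. (6)
    P_new = P - dpsi(e) * np.outer(Pg, Pg) / denom    # matrix update, cf. (7)
    return theta_new, P_new

# Example with the quadratic loss rho(e) = 0.5 e^2, i.e. psi(e) = e, psi'(e) = 1,
# which reduces the step to the usual recursive Gauss-Newton / RLS form.
S = 3
theta = np.zeros(S)
P = 100.0 * np.eye(S)          # P(0) = alpha * I with alpha >> 1
grad_f = np.array([1.0, 0.3, -0.2])
theta, P = robust_gauss_newton_step(theta, P, grad_f, e=0.7,
                                    psi=lambda e: e, dpsi=lambda e: 1.0)
print(theta)
```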
Following Huber [10], the distribution of noise contaminated by outliers is expressed by a mixture of two probability density functions:

p(x) = (1 - \varepsilon)\,p_0(x) + \varepsilon\, q(x),                            (8)

where p_0(x) is the density of the basic distribution of the measurement noise, q(x) is the density of the distribution of the outliers, and \varepsilon \in (0, 1) is the probability of occurrence of a large error.

Even if the basic p_0(x) and contaminating q(x) distributions are Gaussian with zero mean and variances \sigma_1^2 and \sigma_2^2, \sigma_2^2 \gg \sigma_1^2, the estimates (6)-(7) that are optimal for the Gaussian distribution, obtained by choosing \rho(e) = 0.5 e^2, will be unstable.

The density p^* for the \varepsilon-contaminated probability distributions (8) which gives the minimum Fisher information contains a central region proportional to (1 - \varepsilon)p_0(x) and tails with exponentially decreasing density. Using these distributions makes it possible to obtain nonlinear robust maximum likelihood estimates that remain workable for almost all noise distributions. The resulting algorithm combines the conventional least mean squares (LMS) stochastic gradient algorithm, applied when |e(k)| \le 3\sigma_1, with the least absolute deviation (LAD) algorithm, applied when |e(k)| > 3\sigma_1, and is called the mixed-norm LMS algorithm [10,20,21].

On the other hand, the choice of a loss function different from the quadratic one ensures the robustness of the estimates, i.e. their workability for almost all noise distributions. Currently there are many such functions \rho(e); however, keeping in mind that the derivative \psi'(e(k)) = \partial\psi(e(k))/\partial e(k) is used in the learning algorithm (6)-(7), it is advisable to choose functions \rho(e(k)) which have nonzero second derivatives. As such functions one can take, for example [22,23],

F_1(e(k)) = \frac{e^2(k)}{2c},                                                   (9)

F_2(e(k)) = c\,\ln\cosh\frac{e(k)}{c},                                          (10)

F_3(e(k)) = \frac{e^2(k)}{c^2 + e^2(k)},                                        (11)

graphs of which are shown in Table 1.

Table 1. Graphs of the functions (9)-(11), their first derivatives (influence functions) \psi(e(k)), second derivatives \psi'(e(k)) and weight functions \varphi(e(k)) for F_1, F_2 and F_3 (graphs not reproduced here).

It should be noted that in the case of the functionals (9) and (10) the problem of selecting (evaluating) the parameter c arises (Table 1 shows the shapes of the functionals with c = 5).

The variances \sigma_1^2 and \sigma_2^2 in (8) are usually unknown; they must be estimated and can then be taken into account in the learning algorithm. If \sigma_1^2 and \sigma_2^2 do not change over time, this estimation can be carried out by stochastic approximation:

\hat\sigma_1^2(k) = \begin{cases} \hat\sigma_1^2(k-1) + \dfrac{1}{l_1(k)}\left(e^2(k) - \hat\sigma_1^2(k-1)\right), & |e(k)| \le 3\hat\sigma_1(k-1), \\ \hat\sigma_1^2(k-1), & \text{otherwise}, \end{cases}

\hat\sigma_2^2(k) = \begin{cases} \hat\sigma_2^2(k-1) + \dfrac{1}{l_2(k)}\left(e^2(k) - \hat\sigma_2^2(k-1)\right), & |e(k)| > 3\hat\sigma_1(k-1), \\ \hat\sigma_2^2(k-1), & \text{otherwise}, \end{cases}        (12)

where l_1(k) = k - l_2(k) and

l_2(k) = \begin{cases} l_2(k-1) + 1, & |e(k)| > 3\hat\sigma_1(k-1), \\ l_2(k-1), & \text{otherwise}, \end{cases} \qquad l_2(0) = 0.

The total variance of the noise, calculated as

\hat\sigma^2(k) = \begin{cases} \hat\sigma_1^2(k), & |e(k)| \le 3\hat\sigma_1(k-1), \\ \hat\sigma_2^2(k), & \text{otherwise}, \end{cases}                   (13)

can be used for normalizing the selected functional:

\rho^*(e(k), \theta) = \rho\!\left(\frac{e^2(k)}{\hat\sigma^2(k)}, \theta\right).                   (14)

It should be noted that 3\hat\sigma(k) can be used as an estimate of the parameter c in the functionals (9) and (10).

3. Modification of the Robust Gauss-Newton Algorithm with Dead Zone

A dead zone, which determines the degree of permissible errors, can be set as follows:

\psi_1^*(e(k), \theta) = \begin{cases} \psi(e(k), \theta), & |e(k)| > \delta, \\ 0, & |e(k)| \le \delta, \end{cases}                        (15)

and

\psi_2^*(e(k), \theta) = \begin{cases} \psi(e(k) - \delta, \theta), & e(k) > \delta, \\ 0, & |e(k)| \le \delta, \\ \psi(e(k) + \delta, \theta), & e(k) < -\delta. \end{cases}                (16)

The forms of the functions (15) and (16) are shown in Table 2 (columns 2 and 3, respectively).

Table 2. Graphs of the derivatives of functions (9)-(11) with dead zones: the columns show F(e(k)), \psi_1^*(e(k), \theta) and \psi_2^*(e(k), \theta) for F_1, F_2 and F_3 (graphs not reproduced here).
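To make the dead-zone definitions concrete, the following sketch (Python; written for the Geman-McClure type functional (11) as reconstructed above, with hypothetical helper names) computes the influence function \psi(e) and its two dead-zoned variants (15) and (16). Inside the zone |e(k)| \le \delta both variants return zero, so such residuals do not update the parameters.

```python
import numpy as np

def psi_gm(e, c=1.0):
    """Influence function psi(e) = d rho / d e for the Geman-McClure type
    functional (11), rho(e) = e^2 / (c^2 + e^2)."""
    return 2.0 * c ** 2 * e / (c ** 2 + e ** 2) ** 2

def psi_deadzone_1(e, delta, psi=psi_gm):
    """Dead zone of type (15): the influence is switched off for |e| <= delta."""
    return psi(e) if abs(e) > delta else 0.0

def psi_deadzone_2(e, delta, psi=psi_gm):
    """Dead zone of type (16): the residual is shifted by +/- delta outside
    the zone, and the influence is zero inside it."""
    if e > delta:
        return psi(e - delta)
    if e < -delta:
        return psi(e + delta)
    return 0.0

for e in (0.05, 0.5, 5.0):
    print(e, psi_deadzone_1(e, delta=0.1), psi_deadzone_2(e, delta=0.1))
```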
In this case, the robust Gauss-Newton algorithm takes the form

\hat\theta(k) = \hat\theta(k-1) + \frac{P(k-1)\,\nabla\hat f(k)\,\psi^*(e(k), \theta)}{1 + \psi^{*\prime}(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)},      (17)

P(k) = P(k-1) - \frac{P(k-1)\,\nabla\hat f(k)\,\nabla\hat f^T(k)\,P(k-1)\,\psi^{*\prime}(e(k), \theta)}{1 + \psi^{*\prime}(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)},      (18)

where, for the dead zone (16),

\psi^{*\prime}(e(k), \theta) = \begin{cases} \psi'(e(k) - \delta, \theta), & e(k) > \delta, \\ 0, & |e(k)| \le \delta, \\ \psi'(e(k) + \delta, \theta), & e(k) < -\delta, \end{cases}

and, for the dead zone (15),

\psi^{*\prime}(e(k), \theta) = \begin{cases} \psi'(e(k), \theta), & |e(k)| > \delta, \\ 0, & \text{otherwise}. \end{cases}

Table 1 (column 3) shows that for the functional (11) there are areas where \psi'(e(k)) < 0. This can lead to instability of the estimates \hat\theta. In this case, in the algorithm (17), (18), instead of \psi^{*\prime}(e(k), \theta) the weighting function \varphi^*(e(k), \theta) should be used, which, as seen from Table 1 (column 4), is always greater than zero. The algorithm (17), (18) then takes the form

\hat\theta(k) = \hat\theta(k-1) + \frac{P(k-1)\,\nabla\hat f(k)\,\psi^*(e(k), \theta)}{1 + \varphi^*(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)},      (19)

P(k) = P(k-1) - \frac{P(k-1)\,\nabla\hat f(k)\,\nabla\hat f^T(k)\,P(k-1)\,\varphi^*(e(k), \theta)}{1 + \varphi^*(e(k), \theta)\,\nabla\hat f^T(k)\,P(k-1)\,\nabla\hat f(k)}.      (20)

4. Experimental Results

Consider using an RBF network to approximate the function [24]

y(k) = \frac{16 x_1 + 8 x_2}{3 + 4 x_1^2 + 4 x_2^2} + 0.725\,\sin(0.2\, x_1 x_2) + \xi(k),        (21)

where x = (x_1, x_2)^T is an input signal that was generated as uniformly distributed random data in the range [-1, 1]. The additive noise \xi(k) is a Gaussian mixture that combines two types of noise, a large portion of normal noise with smaller variance and a smaller portion of noise with higher variance, i.e. \xi(k) = (1 - \varepsilon) q_1(k) + \varepsilon q_2(k), where 0 \le \varepsilon \le 0.2 is a small number denoting the contamination ratio and q_1(k), q_2(k) are normally distributed noises with variances \sigma_1^2 and \sigma_2^2, respectively. 50,000 training data points were used for the investigation of the given function.

The surface described by function (21) without noise is shown in Figure 1(a); Figure 1(b) shows the same surface with the noise \xi(k) (\sigma_1 = 0.6 and \sigma_2 = 12). Figure 2 shows the cross-sections of the function (21) (the dashed line denotes the reconstructed function).

Figure 1. The surface described by function (21): (a) without noise \xi(k); (b) with noise \xi(k).

Figure 2. The cross-sections of the function (21).

The results of the approximation of the function (21) with different values of \varepsilon, \sigma_1^2 and \sigma_2^2 are given in Table 3. The table lists the values of the RMS error, calculated after training the network over 2500 reference values using the formula

\sqrt{\frac{1}{2500}\sum_{i=1}^{2500}\left(y^*(i) - \hat y(i)\right)^2},

where y^*(i) is the reference value of the output signal in the absence of measurement noise and \hat y(i) is the actual network output.

Graphs of the adjustment of the \sigma_1 and \sigma_2 estimates at each step of training the network are shown in Figure 3. Estimates of \sigma_1, \sigma_2 and the number of outliers are given in Table 4.

Figure 3. Results of the estimation of \sigma_1 = 0.6 and \sigma_2 = 12 with \varepsilon = 0.2.

Table 4. Estimates of \sigma_1, \sigma_2 and the number of outliers N.

 Given parameters                                          Estimates
 ε     σ1 (ref)   σ2 (ref)   Real number of outliers      σ1 (est)   σ2 (est)   Estimated number of outliers N
 0.0   0          0          0                            -          -          -
 0.1   0.6        3          5061                         0.6369     4.0902     4758
 0.1   0.6        6          5008                         0.6166     6.7468     4984
 0.1   0.6        12         4991                         0.6073     12.5611    4969
 0.2   0.6        3          10013                        0.7351     4.3658     9957
 0.2   0.6        6          10020                        0.6151     6.8815     9897
 0.2   0.6        12         10111                        0.6220     12.8381    10005
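The contaminated-noise setup and the variance estimation (12) can be reproduced with a short script. The sketch below (Python/NumPy; the initial guesses and the random seed are illustrative assumptions) draws \varepsilon-contaminated noise as in the experiment and runs the recursive estimates of \sigma_1^2 and \sigma_2^2, which are the quantities reported in Table 4.

```python
import numpy as np

rng = np.random.default_rng(1)

# epsilon-contaminated Gaussian noise, as in the experiment:
# xi(k) = (1 - eps)*q1(k) + eps*q2(k), realised here by sampling the
# high-variance component with probability eps
eps, sigma1, sigma2 = 0.1, 0.6, 6.0
n = 50_000
is_outlier = rng.random(n) < eps
xi = np.where(is_outlier,
              rng.normal(0.0, sigma2, n),
              rng.normal(0.0, sigma1, n))

# Recursive estimation of sigma1^2 and sigma2^2 in the spirit of (12):
# a residual is attributed to the outlier component when |e| > 3*sigma1_hat.
s1, s2 = 1.0, 10.0            # initial variance guesses
l1, l2 = 1, 1                 # sample counters for the two components
for e in xi:
    if e ** 2 <= 9.0 * s1:
        l1 += 1
        s1 += (e ** 2 - s1) / l1
    else:
        l2 += 1
        s2 += (e ** 2 - s2) / l2

print("sigma1 ~", np.sqrt(s1), "sigma2 ~", np.sqrt(s2), "outliers ~", l2)
```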
Table 3. The results of the approximation of function (21) (RMS error; DZ = dead zone, "-" = not applicable).

 Given parameters                          F1 (9)                          F2 (10)                         F3 (11), with weight function
 ε     σ1    σ2    Number of outliers      no DZ    DZ (15)   DZ (16)      no DZ    DZ (15)   DZ (16)      no DZ    DZ (15)   DZ (16)
 0.0   0     0     0                       0.6286   -         -            -        -         -            -        -         -
 0.1   0.6   3     5061                    1.5252   2.7339    2.6047       1.5556   2.4468    2.3937       2.0836   2.8747    2.8137
 0.1   0.6   6     5008                    1.6415   2.4909    2.4697       1.6553   2.2052    2.2047       1.8936   2.7882    2.7199
 0.1   0.6   12    4991                    1.9389   1.9634    1.9491       1.7256   1.7386    1.7379       1.6365   2.3665    2.3088
 0.2   0.6   3     10013                   1.6497   2.1061    2.0698       2.3438   3.0111    2.9940       2.9080   2.9365    2.9198
 0.2   0.6   6     10020                   2.0402   2.1209    2.0813       2.2875   2.4361    2.4113       2.2054   2.7103    2.5998
 0.2   0.6   12    10111                   1.9863   2.2117    2.1887       2.3682   2.7750    2.7217       2.5152   2.7012    2.6260

As seen from the simulation results, the algorithm (12) gives reasonably accurate estimates of \sigma_1^2 and \sigma_2^2 (assuming \sigma_2^2 \gg \sigma_1^2); these estimates are used in the normalization of the loss function, which ensures high accuracy of approximation of very noisy nonlinear functions. It should also be noted that the use of dead zones reduced the training time by about 20%.

5. Conclusions

This paper proposes a resistant on-line training algorithm for radial basis function networks, based on the theory of robust regression, for dealing with outliers in the framework of function approximation. The proposed algorithm minimizes an M-estimate cost function instead of the conventional mean square error and represents a modification of the recursive Gauss-Newton algorithm with a dead zone. The dead zone may serve as a value that limits the accuracy of the obtained solutions, and its utilization can decrease the training time of the network.

If the distribution of the noise contaminated by outliers is expressed by a mixture of two Gaussian distributions with unknown variances \sigma_1^2 and \sigma_2^2, \sigma_2^2 \gg \sigma_1^2, these variances can be estimated and taken into account in the training algorithm.

The algorithm is efficient for practical use in the investigation of real nonlinear systems. It is expedient to develop this approach further and to investigate other robust cost functions and training algorithms, such as the Levenberg-Marquardt algorithm.

REFERENCES

[1] B. Kosko, "Neural Networks for Signal Processing," Prentice-Hall Inc., New York, 1992.
[2] S. Haykin, "Neural Networks: A Comprehensive Foundation," 2nd Edition, Prentice Hall Inc., New York, 1999.
[3] C. M. Bishop, "Neural Networks for Pattern Recognition," Clarendon Press, Oxford, 1995.
[4] H. Wang, G. P. Liu, C. J. Harris and M. Brown, "Advanced Adaptive Control," Pergamon, Oxford, 1995.
[5] R. Hecht-Nielsen, "Kolmogorov's Mapping Neural Network Existence Theorem," First IEEE International Conference on Neural Networks, San Diego, Vol. 3, 1987, pp. 11-14.
[6] G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function," Mathematics of Control, Signals and Systems, Vol. 2, No. 4, 1989, pp. 303-314. doi:10.1007/BF02551274
[7] T. Poggio and F. Girosi, "Networks for Approximation and Learning," Proceedings of the IEEE, Vol. 78, No. 9, 1990, pp. 1481-1497. doi:10.1109/5.58326
[8] J. Park and I. W. Sandberg, "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, Vol. 3, No. 2, 1991, pp. 246-257. doi:10.1162/neco.1991.3.2.246
[9] C. C. de Wit and J. Carrillo, "A Modified EW-RLS Algorithm for Systems with Bounded Disturbances," Automatica, Vol. 26, No. 3, 1990, pp. 599-606. doi:10.1016/0005-1098(90)90032-D
[10] P. J. Huber, "Robust Statistics," John Wiley, New York, 1981. doi:10.1002/0471725250
[11] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel, "Robust Statistics: The Approach Based on Influence Functions," John Wiley & Sons Inc., Hoboken, 1986.
[12] C. C. Chuang, J. T. Jeng and P. T. Lin, "Annealing Robust Radial Basis Function Networks for Function Approximation with Outliers," Neurocomputing, Vol. 56, 2004, pp. 123-139. doi:10.1016/S0925-2312(03)00436-3
[13] S.-C. Chan and Y.-X. Zou, "A Recursive Least M-Estimate Algorithm for Robust Filtering in Impulsive Noise: Fast Algorithm and Convergence Performance Analysis," IEEE Transactions on Signal Processing, Vol. 52, No. 4, 2004, pp. 975-991. doi:10.1109/TSP.2004.823496
[14] D. S. Pham and A. M. Zoubir, "A Sequential Algorithm for Robust Parameter Estimation," IEEE Signal Processing Letters, Vol. 12, No. 1, 2005, pp. 21-24. doi:10.1109/LSP.2004.839689
[15] J. Ni and Q. Song, "Pruning Based Robust Backpropagation Training Algorithm for RBF Network Tracking Controller," Journal of Intelligent and Robotic Systems, Vol. 48, No. 3, 2007, pp. 375-396. doi:10.1007/s10846-006-9093-x
[16] G. Deng, "Sequential and Adaptive Learning Algorithms for M-Estimation," EURASIP Journal on Advances in Signal Processing, Vol. 2008, 2008, Article ID 459586.
[17] C.-C. Lee, Y.-C. Chiang, C.-Y. Shih and C.-L. Tsai, "Noisy Time Series Prediction Using M-Estimator Based Robust Radial Basis Function Network with Growing and Pruning Techniques," Expert Systems with Applications, Vol. 36, No. 3, 2008, pp. 4717-4724. doi:10.1016/j.eswa.2008.06.017
[18] E. Fogel and Y. F. Huang, "On the Value of Information in System Identification: Bounded-Noise Case," Automatica, Vol. 18, No. 2, 1982, pp. 229-238. doi:10.1016/0005-1098(82)90110-8
[19] R. Lozano-Leal and R. Ortega, "Reformulation of the Parameter Identification Problem for Systems with Bounded Disturbances," Automatica, Vol. 23, No. 2, 1987, pp. 247-251. doi:10.1016/0005-1098(87)90100-2
[20] J. Chambers and A. Avlonitis, "A Robust Mixed-Norm Adaptive Filter Algorithm," IEEE Signal Processing Letters, Vol. 4, No. 2, 1997, pp. 46-48. doi:10.1109/97.554469
[21] Y. Zou, S. C. Chan and T. S. Ng, "A Recursive Least M-Estimate (RLM) Adaptive Filter for Robust Filtering in Impulse Noise," IEEE Signal Processing Letters, Vol. 7, No. 11, 2000, pp. 324-326. doi:10.1109/97.873571
[22] P. W. Holland and R. E. Welsch, "Robust Regression Using Iteratively Reweighted Least Squares," Communications in Statistics - Theory and Methods, Vol. A6, 1977, pp. 813-827. doi:10.1080/03610927708827533
[23] S. Geman and D. McClure, "Statistical Methods for Tomographic Image Reconstruction," Bulletin of the International Statistical Institute, Vol. LII, No. 4, 1987, pp. 4-5.
[24] K. S. Narendra and K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Transactions on Neural Networks, Vol. 1, No. 1, 1990, pp. 4-26. doi:10.1109/72.80202