J. Intelligent Learning Systems & Applications, 2010, 2, 147-155
doi:10.4236/jilsa.2010.23018 Published Online August 2010 (http://www.SciRP.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA
147
Identification and Prediction of Internet Traffic
Using Artificial Neural Networks
Samira Chabaa1, Abdelouhab Zeroual1, Jilali Antari1,2
1Department of Physics Cadi Ayyad University, Faculty of Sciences Semlalia, Marrakech, Morocco; 2Ibn Zohr University-Agadir,
Polydisciplinaire Faculty of Taroudant, Morocco.
Email: s.chabaa@ucam.ac.ma
Received March 16th, 2010; revised July 16th, 2010; accepted July 25th, 2010.
ABSTRACT
This paper presents the development of an artificial neural network (ANN) model based on the multi-layer perceptron
(MLP) for analyzing internet traffic data over IP networks. We applied the ANN to analyze a time series of measured
data for ne twork response evaluation. Fo r this reason , we used the input and output data of an internet traffic o ver IP
networks to identify the ANN model, and we studied the performance of some training algorithms used to estimate the
weights of the neuron. The compar ison between some training algorithm s demonstrates the efficiency and the accu racy
of the Levenberg-Marquardt (LM) and the Resilient back propagation (Rp) algorithms in term of statistical criteria.
Consequently, the obtained results show that the developed models, using the LM and the Rp algorithms, can success-
fully be used for analyzing internet traffic over IP networks, and can be applied as an excellent and fundamental tool
for the management of the internet traffic at different times.
Keywords: Artificial Neural Network, Multi-Layer Perceptron, Training Algorithms, Internet Traffic
1. Introduction
Recently, much attention has been paid to the topic of
complex networks, which characterize many natural and
artificial systems such as internet, airline transport sys-
tems, power grid infrastructures, and the World Wide
Web [1-3]. Indeed, Traffic modeling is fundamental to
the network performance evaluation and the design of
network control scheme which is crucial for the success
of high-speed networks [4]. This is because network traf-
fic capacity will help each webmaster to optimize their
website, maximize online marketing conversions and
lead campaign tracking [5,6]. Furthermore, monitoring
the efficiency and performance of IP networks based on
accurate and advanced traffic measurements is an impor-
tant topic in which research needs to explore a new
scheme for monitoring network traffic and then find out
its proper approach [7]. So, a traffic model with a simple
expression is significant, which is able to capture the
statistical characteristics of the actual traffic accurately.
Since the 1950s, many models have been developed to
study complex traffic phenomena [5]. The need for ac-
curate traffic parameter prediction has long been recog-
nized in the international scientific literature [8].
The main motivation here is to obtain a better under-
standing of the characteristics of the network traffic. One
of the approaches used for the preventive control is to
predict the near future traffic in the network and then
take appropriate actions such as controlling buffer sizes
[9]. Several works developed in the literature are inter-
ested to resolve the problem of improving the efficiency
and effectiveness of network traffic monitoring by fore-
casting data packet flow in advance. Therefore, an accu-
rate traffic prediction model should have the ability to
capture the prominent traffic characteristics, e.g. short-
and long- range dependence, self-similarity in large-time
scale and multifractal in small-time scale [10]. Several
traffic prediction schemes have been proposed [11,19].
Among the proposed schemes on traffic prediction, neu-
ral network (NN) based schemes brought our attention
since NN has been shown more than acceptable per-
formance with relatively simple architecture in various
fields [19-17]. Neural networks (NNs) have been suc-
cessfully used for modeling complex nonlinear systems
and forecasting signals for a wide range of engineering
applications [20-26]. Indeed, the literature has shown that
neural networks are one of the best alternatives for mod-
eling and predicting traffic parameters possibly because
they can approximate almost any function regardless of
its degree of nonlinearity and without prior knowledge of
its functional form [27]. Several researchers have dem-
Identification and Prediction of Internet Traffic Using Artificial Neural Networks
148
onstrated that the structure of neural network is charac-
terized by a large degree of uncertainty which is pre-
sented when trying to select the optimal network struc-
ture. The most distinguished character of a neural network
over the conventional techniques in modeling nonlinear
systems is learning capability [19]. The neural network
can learn the underlying relationship between input and
output of the system with the provided data [19-26].
Among the various NN-based models, the feed-forward
neural network, also known as the Multi Layer Percep-
tron Type Neural Network (MLPNN), is the most com-
monly used and has been applied to solve many difficult
and diverse problems [27-30].
The aim of this paper is to use artificial neural net-
works (ANN) based on the multi-layer perceptron (MLP)
for identifying and developing a model that is able to
analyze and predict the internet traffic over IP networks
by comparing some training algorithms using statistical
criteria.
2. Artificial Neural Networks
Artificial neural networks (ANN) are an abstract simula-
tion of a real nervous system that consists of a set of
neural units connected to each other via axon connec-
tions which are very similar to the dendrites and the ax-
ons in biological nervous systems [31].
Furthermore, artificial neural networks are a large
class of parallel processing architecture which can mimic
complex and nonlinear processing units called neurons
[32]. An ANN, as function approximator, is useful be-
cause it can approximate a desired behavior without the
need to specify a particular function. This is a big advan-
tage of artificial neural networks compared to multivari-
ate statistics [33]. ANN can be trained to reach, from a
particular input, a specific target output using a suitable
learning method until the network output matches the
target [34]. In addition, neural networks are trained by
experience, when an unknown input is applied to the
network it can generalize from past experiences and
product a new result [35-37].
ANN is constituted by a tree layer: an input layer, an
output layer, and an intermediate hidden layer, with their
corresponding neurons. Each layer is connected to the
next layer with a neuron giving rise to a large number of
connections. This architecture allows ANNs to learn
complicated patterns. Each connection has a weight as-
sociated with it. The hidden layer learns to provide a
representation for the inputs. The output of a neuron in a
hidden or output layer is calculated by applying an acti-
vation function to the weighted sum of the input to that
neuron [20] (Figure 1). ANN model must first be
“trained” by using cases with known outcomes and it will
then adjust its weighting of various input variables over
time to refine the output data [10]. The validation data
Figure 1. Neural network model
are used for evaluating the performance of the ANN
model.
In this work, we used the back-propagation based
Multi Layer Perceptron (MLP) neural network. The multi
layer perceptron is the most frequently used neural net-
work technique, which makes it possible to carry out the
most various applications. The identification of the MLP
neural networks requires two types of stages. The first is
the determination of the network structure. Different net-
works with one layer hidden have been tried, and the
activation function used in this study is the sigmoid func-
tion described as:
1
() 1(
fx exp x
)
(1)
The second stage is the identification of parameters
(learning of the neural networks).
The suite of the used back-propagation neural net-
works are part of the MATLAB neural network toolbox
which assisted in appraising each of the above individual
neural network models for predictive purposes [38,39].
In this study various training algorithms are used.
3. Training Algorithms
The MLP network training can be viewed as a function
approximation problem in which the network parameters
(weights and biases) are adjusted during the training, in
an effort to minimize (optimize) error function between
the network output and the desired output [40]. The issue
of learning algorithm is very important for MLP neural
network [41]. Most of the well known ANN training al-
gorithms are based on true gradient computations. Am-
ong these, the most popular and widely used ANN train-
ing algorithm is the Back Propagation (BP) [42,43]. The
BP method, also known as the error back propagation
algorithm, is based on the error correlation learning rule
[44]. The BP neural networks are trained with different
training algorithms. In this section we describe some of
these algorithms. The BP algorithm uses the gradients of
the activation functions of neurons in order to back-
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks 149
propagate the error that is measured at the output of a
neural network and calculate the gradients of the output
error over each weight in the network. Subsequently,
these gradients are used in updating the ANN weights
[45].
3.1 Gradient Descent Algorithm
The standard training process of the MLP can be realized
by minimizing the error function E defined by:

2
,,
11 1
M
pN p
M
j
pjp
pjp
Eyt
 

p
E (2)
where ,,
M
j
pjp
yt is the squared difference between the
actual output value at the jth output layer neuron for pat-
tern p and the target output value. The scalar p is an in-
dex over input–output pairs. The general purpose of the
training is to search an optimal set of connection weights
in the manner that the errors of the network output can be
minimized [46].
In order to simplify the formulation of the equations,
let w be the n-dimensional weight vector of all connec-
tion weights and biases. Accordingly, the weight update
equation for any training algorithm has the iterative form.
In each iteration, the synaptic weights are modified in the
opposing direction to those of the gradient of the cost
function. The on-line or off-line versions are applied
where we use the instantaneous gradient of the partial
error function Ep, or on the contrary the gradient of the
total error function E respectively.
To calculate the gradient for the two cases, the Error
Back Propagation (EBP) algorithm is applied. The pro-
cedure in the mode off-line sums up as follows

1kk
wkwk d
 (3)
where, is the weight
vector in k iterations, n is the number of synaptic connec-
tions of the network, k is the index of iteration,
 
1,... ... ..,T
n
wkw kwk
k
is the
learning rate which adjusts the size of the step gradient,
and dk is a search direction which satisfies the descent
condition.
The steepest descent direction is based to minimize the
error function, namely
kw
dg
where
  
1
,... ... ... ..,
T
wn
E
gk wk wk

E
is the gra-
dient of the estimated error in w. throughout the training
with the standard steepest descent, the learning rate is
held constant, which makes the algorithm very sensitive
to the proper setting of the learning rate. Indeed, the al-
gorithm may oscillate and become unstable, if the learn-
ing rate is set too high. But, if the learning rate is too
small, the algorithm will take a long time to converge.
3.2 Conjugate Gradient Algorithm
The basic back propagation algorithm adjusts the weights
in the steepest descent direction in which the perform-
ance function decreases most rapidly. Although, the
function decreases most rapidly along the negative of the
gradient, this does not necessarily produce the fastest
convergence [44]. In the conjugate gradient algorithms, a
search is made along the conjugate gradient direction to
determine the step size which will minimize the per-
formance function along that line [41].
The conjugate gradient (CG) methods are a class of
very important methods for minimizing smooth functions,
especially when the dimension is large [47].
The principal advantage of the CG is that they do not
require the storage of any matrices as in Newton’s
method, or as in quasi-Newton methods, and they are
designed to converge faster than the steepest descent
method [46].
There are four types of conjugate gradient algorithms
which can be used for training.
All of the conjugate gradient algorithms start out by
searching in the steepest descent direction (negative of
the gradient) on the first iteration [41,44,46]:
p0 = –gw(0)
where p0 is the initial search gradient and g0 is the initial
gradient.
A line search is then performed to determine the opti-
mal distance to move along the current search direction:

1kk
wkwk d

The next search direction is determined so that it is
conjugated to previous search directions. The general
procedure for determining the new search direction is to
combine the new steepest descent direction with the pre-
vious search direction:
1kw kk
dgk d
 
where k
is a parameter to be determined so that
becomes the k-the conjugate direction.
k
d
The way in which the k
constant is computed dis-
tinguishes the various versions of conjugate gradient,
namely Fletcher-Reeves updates (Cgf), Conjugate gradi-
ent with Polak-Ribiere updates (Cgp), Conjugate gradient
with Powell-Beale restarts (Cgb) and scaled conjugate
gradient algorithm (Scg).
1) Conjugate gradient with Fletcher-Reeves updates
The procedure to evaluate the constant k
with the
Fletcher-Reeves update is [48]


11
T
ww
kT
ww
gkgk
gk gk
k
represents the ratio of the norm squared of the
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks
150
current gradient to the norm squared of the previous gra-
dient.
2) Conjugate gradient with Polak-Ribiere updates
The constant k
is computed by the Polak-Ribiére
update as [49]:

11
T
wk
kT
ww
gky
gk gk

where is the inner product of
the previous change in the gradient with the current gra-
dient divided by the norm squared of the previous gradi-
ent.
 
1
kw w
ygkgk
3) Conjugate gradient with Powell-Beale resta r ts
In conjugate gradient algorithms, the search direction
is periodically reset to the negative of the gradient. The
standard reset point occurs when the number of iterations
is equal to the number of network parameters (weights
and biases), but there are other reset methods that can
improve the efficiency of the training process [50]. This
technique restarts if there is a very little orthogonality left
between the current and the previous gradient:
 
2
10.2
T
ww w
kgk gk
If this condition is satisfied, the search direction is re-
set to the negative of the gradient [41,44,51].
4) Scaled conjugate gradient (Scg)
The scaled conjugate gradient algorithm requires a line
search at any iteration which is computationally expen-
sive since it requires computing the network response for
all training inputs at several times for each search.
The Scg combines the model-trust region approach
(used in the Levenberg-Marquardt algorithm described in
the following section), with the conjugate gradient ap-
proach. This algorithm was designed to avoid the
time-consuming line search. It is developed by Moller
[52], where the constant k
is computed by:
 

2
2
11
T
ww
k
w
w
g
kgkg
gk
 
k
3.3 One Step Secant
Battiti proposes a new memory-less quasi-Newton method
named one step secant (OSS) [53], which is an attempt to
bridge the gap between the conjugate gradient algorithms
and the quasi-Newton (secant) algorithms. This algo-
rithm does not store the complete Hessian matrix. It as-
sumes that at any iteration, the previous Hessian was the
identity matrix. This has the additional advantage that the
new search direction can be calculated without comput-
ing a matrix inverse [41,44].
3.4 Levenberg-Marquardt Algorithm
The Levenberg-Marquardt (LM) algorithm [54,55] is the
most widely used optimization algorithm. It is an itera-
tive technique that locates the minimum of a multivari-
ante function that is expressed as the sum of squares of
non linear real valued functions [56-58]. The LM is the
first algorithm shown to be blend of steepest gradient
descent and Gauss-Newton iterations. Like the quasi-New-
ton methods, the LM algorithm was designed to approach
second-order training speed without having to compute
the Hessian matrix [44]. The LM algorithm provides a
solution for non linear least squares minimization prob-
lems. When the performance function has the form of a
sum of squares, then the Hessian matrix can be approxi-
mated as [44]:
T
JJH
where J is the Jacobian matrix that contains the first
derivates of network errors and the gradient can be com-
puted as:
T
J
w
g
e
where the Jacobian matrix contains the first derivatives
of the network errors with respect to the weights and
biases, and e is a vector of network errors.
The Levenberg–Marquardt (LM) algorithm uses the
approximation to the Hessian matrix in the following
Newton-like update [41,44]:
1
TT
1JJ J
kk
ww I

 

e
where I is the identity matrix and µ is a constant.
µ decreases after each successful step (reduction in
performance function) and increases only when a tenta-
tive step would increase the performance function. In this
way, the performance function will always be reduced at
each iteration of the algorithm [44].
3.5. Resilient back Propagation (Rp) Algorithm
There has been a number of refinements made to the BP
algorithm with arguably the most successful in general
being the Resilient Back Propagation method or Rp
[59-61]. Furthermore, the goal of the algorithm of Rp is
to eliminate the harmful effects of the magnitudes of the
partial derivatives. Therefore, only the sign of the deriva-
tive is used to determine the direction of the weight up-
date. Indeed, the Rp modifies the size of the weight step
that is adaptively taken. The adaptation mechanism in Rp
does not take into account the magnitude of the gradient
(
w
g
k) as seen by a particular weight, but only the sign
of the gradient (positive or negative) [44,61].
The Rp algorithm is based on the modification of each
weight by the update value (or learning parameter) in
such a way as to decrease the overall error. The update
value for each weight and bias is increased whenever the
derivative of the performance function with respect to
that weight has the same sign for two successive iterations.
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks 151
The principle of this method is as follows:

11
kk w
ww signgk
 
k
k


1 1*0
kkww
nifgkgk


1 1*0
kkww
nifgkgk

else where
1k01nn

k
is the update value of the weight, which evolves
according to changes of sign of the difference (1kk
ww
)
of the same weight in k iterations. The update values and
the weights are changed after each iteration.
All update values are initialized to the value D0. The
update value is modified in the following manner: if the
current gradient (

w
g
k
n
) multiplied by the gradient of
the previous step is positive (that is the gradient direction
has remained the same), then the update value is multi-
plied by a value (which is greater than one). Simi-
larly, if the gradient product is negative, the update value
is multiplied by the value n
(which is less than one)
[61].
4. Results and Discussion
In this part, we are interested in appling the MPL neural
networks for developing a model able to identify and
predict the internet traffic. The considered data are com-
posed of 1000 points (Figure 2). The databases were
divided in two parts training (750 points) and testing
(250 points) data as required by the application of MLP.
Additionally, the training data set is used to train the
MLP and must have enough size to be representative for
overall problem. The testing data set should be inde-
pendent of the training set and are used to assess the
classification accuracy of the MLP after the training
process [62,63].
The error analysis was used to check the performance
0100 200 300400500 600700 800 900 1000
0. 05
0. 06
0. 07
0. 08
0. 09
0. 1
0. 11
0. 12
0. 13
Measured Values
(
s
)
Sample sizes
Figure 2. Real data
of the developed model. The accuracy of correlations
relative to the measured values is determinated by vari-
ous statistical means. The criteria exploited in this study
were the Root Mean Square Error (RMSE), the Scatter
Index (SI), the Relative Error and Mean Absolute Per-
centage Error (MAPE) [64-66] given by:

1/2
22
ˆ
R
errorm mm
Eyyy/

(2)

2
1
1ˆ
RMSE =N
mm
myy
N
(3)
m
RMSE
SI y
(4)
1
100 N
m
iˆm
M
APE y
N
y (5)
where and represent respectively real and es-
timated data,
m
yˆm
y
m
y is the mean values of real data and N
represents the sample size. Table 1 shows the obtained
results of each statistical indicator for the different algo-
rithms:
From these results, we conclude that the Leven-
berg-Marquardt (LM) and the Resilient back propagation
(Rp) algorithms give more precision using the statistical
criteria than the other training algorithms. The Gd, Scg,
Cgf, Cgp, Cgb and Oss training algorithm give big values
in term of the used statistical criteria, which prove that
these training algorithms are not significant for predic-
tion purpose. For this reason, we used the LM and the Rp
training algorithms in the next paragraph.
To agreement the efficiency of the developed model
based on the LM and the Rp training algorithms, we draft
in Figure 3 the evolution of measured and predicted time
series of the internet traffic for the two algorithms where
we represent just 100 points. We notice that the two time
series have the same behaviour.
Table 1. Values of different statistical indicators for differ-
ent algorithms
Training
algorithms Rerror RMSE SI MAPE
LM 0.0230 0.0019 0.0222 4.2563
Gd 0.1666 0.0142 0.1642 4.2580
Rp 0.0371 0.0031 0.1327 4.3584
Scg 0.1279 0.0128 0.0357 4.1235
Cgf 0.1448 0.0128 0.1300 4.2528
Cgp 0.1339 0.0118 0.1485 4.2246
Cgb 0.1480 0.0128 0.1480 4.2621
Oss 0.1480 0.0128 0.1480 4.2622
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks
Copyright © 2010 SciRes. JILSA
152
010 2030 40 50 60 70 80 90 100
0. 04
0. 05
0. 06
0. 07
0. 08
0. 09
0. 1
0. 11
0. 12
Sample sizes
Measured values (s)
Measured dat a
Predi cted data
010 20 30 4050 60 70 8090100
0.05
0.06
0.07
0.08
0.09
0.1
0.11
0.12
Sample sizes
Measured values (s)
Meas ured dat a
predic ted dat a
(a) (b)
Figure 3. Comparison between measured and predicted data (a) Rp algorithm, (b) LM
0.040.05 0.060.07 0.08 0.090.10.110.12 0.13 0.14
0
0. 1
0. 2
0. 3
0. 4
0. 5
0. 6
0. 7
0. 8
0. 9
1
Sample sizes(s)
Cumul at ive dist ributi on
meas ured
ANN
0.05 0.06 0.07 0.080.090.1 0.11 0.120.13
0
0. 1
0. 2
0. 3
0. 4
0. 5
0. 6
0. 7
0. 8
0. 9
1
Sample sizes(s)
Cumulative distribution
me as ured
ANN
(a) (b)
Figure 4. Cumulative distribution of measured and predicted data (a) Rp algorithm, (b) LM algorithm
0.05 0.06 0.07 0.080.090.10.11 0.12 0.13
0. 05
0. 06
0. 07
0. 08
0. 09
0. 1
0. 11
0. 12
0. 13
m easured data
predicted data
0.04 0.05 0.06 0.070.080.090.10.11 0.12 0.130.14
0. 05
0. 06
0. 07
0. 08
0. 09
0. 1
0. 11
0. 12
0. 13
measured dat a
predic t ed data
(a) (b)
Figure 5. Scattering diagram of measured and predicted data for (a) Rp algorithm, (b) LM algorithm
Identification and Prediction of Internet Traffic Using Artificial Neural Networks 153
On the other hand, we present in Figure 4 the cumula-
tive distributions of measured and predicted data. Figure
4 demonstrates clearly the similarity between measured
and predicted values. So, the identified ANN model can
be used for predicting data of internet traffic. Further-
more, the scattering diagram (Figure 5) presents a com-
parison between measured and predicted data using ANN
model which constitutes another means to test the per-
formance of the model.
5. Conclusions
In this paper we present an artificial neural network
(ANN) model based on the multi-layer perceptron (MLP)
for analyzing internet traffic over IP networks. We used
the input and output data to describe the ANN model,
and we studied the performance of the training algo-
rithms which are used to estimate the weights of the
neuron. The comparison between some training algo-
rithms demonstrates the efficiency of the Levenberg-
Marquardt (LM) and the Resilient back propagation (Rp)
algorithms using statistical criteria. Consequently, the
obtained model using the LM and the Rp can success-
fully be used as an adequate model for the identification
and the management of internet traffic over IP networks.
In addition, it can be applied as an excellent fundamental
tool to management of the internet traffic at different
times, and as a practical concept to install the computer
material in a high industrial area.
REFERENCES
[1] J.-J. Wu, Z.-Y. Gao and H.-J. Sun, “Statistical Properties
of Individual Choice Behaviors on Urban Traffic Net-
works,” Journal of Transportation Systems Engineering
and Information Technology, Vol. 8, No. 2, 2008, pp.
69-74.
[2] R. Albert and A. L. Barabási, “Statistical Mechanics of
Complex Networks,” Review Modern Physics, Vol. 74,
No. 1, 2002, pp. 47-97.
[3] J.-J. Wu, H.-J. Sun and Z.-Y. Gao, “Cascade and Break-
down in Scale-Free Networks with Community Struc-
ture”, Physical Review E-Statistical, Nonlinear, and Soft
Matter Physics, Vol. 74, No. 6, 2006.
[4] W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wil-
son, “On the Self Similar Nature of Ethernet Traffic,”
Proceedings of ACM Sigcomm, San Francisco, 1993, pp.
183-193.
[5] R. Yunhua, “Evaluation and Estimation of Second-Order
Self-Similar Network Traffic,” Computer Communica-
tions, Vol. 27, No. 9, 2004, pp. 898-904.
[6] B. R. Chang and H. F. Tsai, “Novel Hybrid Approach to
Data-Packet-Flow Prediction for Improving Network
Traffic Analysis,” Applied Soft Computing, Vol. 9, No. 3,
2009, pp. 1177-1183.
[7] B. R. Chang and H. F. Tsai, “Improving Network Traffic
Analysis by Foreseeing Data-Packet-Flow with Hybrid
Fuzzy-Based Model Prediction,” Expert Systems with Ap-
plications, Vol. 36, No. 3, 2009, pp. 6960-6965.
[8] A. Stathopoulos and M. G. Karlaftis, “A Multivariate
State Space Approach for Urban Traffic Flow Modeling
and Prediction,” Transportation Research Part C, Vol. 11,
No. 2, 2003, pp. 121-135.
[9] D.-C. Park, “Prediction of MPEG video traffic over ATM
Networks Using Dynamic Bilinear Recurrent Neural
Network,” Applied Mathematics and Computation, Vol.
205, No. 2, 2008, pp. 648-657.
[10] B. Zhou, D. He, Z. Sun and W. H. Ng, “Network Traffic
Modeling and Prediction with ARIMA/GARTH,” HET-
NETs’ 06 Conference, Ilkley, 11-13 September 2006, pp.
1-10.
[11] A. Taraf, I. Habib and T. Saadawi, “Neural Networks for
ATM Multimedia Traffic Prediction,” Proceedings of the
International Workshop on Applications of Neural Net-
works to Telecommunications, Princeton, 1993, pp. 85-91.
[12] P. Chang and J. Hu, “Optimal Non-Linear Adaptive Pre-
diction And Modeling Of MPEG Video in ATM Net-
works Using Pipelined Recurrent Neural Networks,”
IEEE Journal on Selected Areas in Communications, Vol.
15, No. 6, 1997, pp. 1087-1100.
[13] A. Abdennour, “Evaluation of Neural Network Architec-
tures for MPEG-4 Video Traffic Prediction,” IEEE Trans-
actions on Broadcasting, Vol. 52, No. 2, pp. 184-192,
2006.
[14] A. F. Atiya, M. A. Aly and A. G. Parlos, “Sparse Basis
Selection: New Results and Application to Adaptive Pre-
diction of Video Source Traffic,” IEEE Transactions on
Neural Networks, Vol. 16, No. 5, 2005, pp. 1136-1146.
[15] Z. Fang, Y. Zhou and D. Zou, “Kalman Optimized Model
for MPEG-4 VBR Sources,” IEEE Transactions on Con-
sumer Electronics, Vol. 50, No. 2, 2004, pp. 688-690.
[16] V. Alarcon-Aquino and J. A. Barria, “Multi Resolution
Fir Neural Network-Based Learning Algorithm Applied
to Network Traffic Prediction,” IEEE Transactions on
Systems, Man and Cybernetics-Part C: Applications and
Reviews, Vol. 36, No. 2, 2006, pp. 208-220.
[17] E. S. Yu and C. Y. R. Chen, “Traffic Prediction Using
Neural Networks,” Proceedings of the IEEE Global Tele-
communications Conference (GLOBCOM), Vol. 2, 1993,
pp. 991-995.
[18] H. Lin and Y. Ouyang, “Neural Network Based Traffic
Prediction for Cell Discarding Policy,” Proceedings of
IJCNN’97, Vol. 4, 1997.
[19] A. Tarraf, I. Habib and T Saadawi, “Characterization of
Packetized Voice Traffic In ATM Networks Using Neural
Networks,” Proceeding of the IEEE Global Telecommu-
nications Conference (GLOBCOM), Vol. 2, 1993, pp
996-1000.
[20] W. M. Moh, M.-J. Chen, N.-M. Chu and C.-D. Liao,
“Traffic Prediction and Dynamic Bandwidth Allocation
over ATM: A Neural Network Approach,” Computer
Communications, Vol. 18, No. 8, 1995, pp. 563-571.
[21] C. Looney, “Pattern Recognition Using Neural Net-
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks
154
works,” Oxford Press, Oxford, 1997.
[22] V. Vemuri and R. Rogers, “Artificial Neural Networks:
Forecasting Time Series,” The IEEE Computer Society
Press, Los Alamitos, 1994.
[23] D. C. Park, M. A. El-Sharkawi and R. J. Marks II,
“Adaptively Trained Neural Network,” IEEE Transac-
tions on Neural Networks, Vol. 2, No. 3, 1991, pp.
34-345.
[24] D. C. Park and T. K. Jeong, “Complex Bilinear Recurrent
Neural Network for Equalization of a Satellite Channel,”
IEEE Transactions on Neural Networks, Vol. 13, No. 3,
2002, pp. 711-725.
[25] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas
and M. J. Damborg, “Electronic Load Forecasting Using
an Artificial Neural Network,” IEEE Transactions on
Power Systems, Vol. 6, No. 2, 1991, pp. 442-449.
[26] D.-C. Park, “Structure Optimization of Bi-Linear Recur-
rent Neural Networks and its Application to Ethernet
Network Traffic Prediction,” Information Sciences, 2009,
in press.
[27] E. I. Vlahogianni, M. G. Karlaftis and J. C. Golias, “Opti-
mized and Meta-Optimized Neural Networks for Short-
Term Traffic Flow Prediction: A Genetic Approach,”
Transportation Research Part C, Vol. 13, No. 3, 2005, pp.
211-234.
[28] A. Eswaradass, X.-H. Sun and M. Wu, “A Neural Net-
work Based Predictive Mechanism for Available Band-
width,” Proceeding of 19th IEEE International. Confer-
ence on Parallel and Distributed Proceeding Symposium,
Denver, 2005.
[29] N. Stamatis, D. Parthimos and T. M. Griffith, “Forecast-
ing Chaotic Cardiovascular Time Series with an Adaptive
Slope Multilayer Perceptron Neural Network,” IEEE
Transactions on Biomedical Engineering, Vol. 46, No. 2,
1999, pp. 1441-1453.
[30] H. Yousefi’zadeh, E. A. Jonckheere and J. A. Silvester,
“Utilizing Neural Networks to Reduce Packet Loss in
Self-Similar Tele Traffic,” Proceeding of IEEE Interna-
tional. Conference on Communications, Vol. 3, 2003, pp.
1942-1946.
[31] R. Muñoz, O. Castillo and P. Melin, “Optimization of
Fuzzy Response Integrators in Modular Neural Networks
with Hierarchical Genetic Algorithms: The Case of Face,
Fingerprint and Voice Recognition,” Evolutionary Design
of Intelligent Systems in Modeling, Simulation and Con-
trol, Vol. 257, 2009, pp. 111-129.
[32] Y. C. Lin, J. Zhang and J. Zhong, “Application of Neural
Networks to Predict the Elevated Temperature Flow Be-
havior of a Low Alloy Steel,” Computational Materials
Science, Vol. 43, 2008, pp. 752-758.
[33] R. Wieland and W. Mirschel, “Adaptive Fuzzy Modeling
Versus Artificial Neural Networks,” Environmental Mod-
eling & Software, Vol. 23, No. 2, 2008, pp. 215-224.
[34] H. M. Ertunc and M. Hosoz, “Comparative Analysis of an
Evaporative Condenser Using Artificial Neural Network
and Adaptive Neuro Fuzzy Inference System,” Interna-
tional Journal of Refrigeration, Vol. 31, No. 8, 2009, pp.
1426-1436.
[35] C. L. Zhang, “Generalized Correlation of Refrigerant
Mass Flow Rate Through Adiabatic Capillary Tubes Us-
ing Artificial Neural Network,” International Journal of
Reference, Vol. 28, No. 4, 2005, pp. 506-514.
[36] A. Sencan and S. A. Kalogirou, “A New Approach Using
Artificial Neural Networks for Determination of the
Thermodynamic Properties of Fluid Couples,” Energy
Conversion and Management, Vol. 46, No. 15-16, 2005,
pp. 2405-2418.
[37] H. Esen, M. Inalli, A. Sengur and M. Esen, “Artificial
Neural Networks and Adaptive Neuro-Fuzzy Assessments
for Ground-Coupled Heat Pump System,” Energy and
Buildings, Vol. 40, No. 6, 2008, pp. 1074-1083.
[38] A. Sang and S.-Q. Li, “A Predictability Analysis of Net-
work Traffic,” Computer Networks, Vol. 39, No. 1, 2002,
pp. 329-345.
[39] M. Çınar, M. Engin, E. Z. Engin and Y. Z. Ateşçi, “Early
Prostate Cancer Diagnosis by Using Artificial Neural
Networks And Support Vector Machines,” Expert Sys-
tems with Applications, Vol. 36, No. 3, 2009, pp. 6357-
6361.
[40] R. Pasti and L. N. de Castro, “Bio-Inspired and Gradient-
Based Algorithms to Train Mlps: The Influence of Diver-
sity,” Information Sciences, Vol. 179, No. 10, 2009, pp.
1441-1453.
[41] A. Ebrahimzadeh and A. Khazaee, “Detection of Prema-
ture Ventricular Contractions Using MLP Neural Net-
works: A Comparative Study,” Measurement, Vol. 43, No.
1, 2010, pp. 103-112.
[42] D. E. Rumelhart and J. L. McClelland, “Parallel Distrib-
uted Processing Foundations,” MIT Press, Cambridge,
1986.
[43] Y. Chauvin, D. E. Rumelhart, (Eds.), “Backpropagation:
Theory, Architectures, and Applications,” Lawrence Erl-
baum Associates, Inc., Hillsdale, 1995.
[44] J. Ramesh, P. T. Vanathi and K. Gunavathi, “Fault Clas-
sification in Phase-Locked Loops Using Back Propaga-
tion Neural Networks,” ETRI Journal, Vol. 30, No. 4,
2008, pp. 546-553.
[45] D. Panagiotopoulos, C. Orovas and D. Syndoukas, “A
Heuristically Enhanced Gradient Approximation (HEGA)
Algorithm for Training Neural Networks,” Neurocom-
puting, Vol. 73, No. 7-9, 2010, pp. 1303-1323.
[46] A. E. Kostopoulos and T. Grapsa, “Self-Scaled Conjugate
Gradient Training Algorithms,” Neurocomputing, Vol. 72,
No. 13-15, 2009, pp. 3000-3019.
[47] R. Fletcher and C. M. Reeves, “Function Minimization by
Conjugate Gradient,” Computer Journal, Vol. 7, No. 2,
1964, pp. 149-154.
[48] J. Nocedal and S. J. Wright, “Numerical Optimization”,
Series in Operations Research, Springer Verlag, Heidel-
berg, Berlin, New York, 1999.
[49] E .Polak and G. Ribiéres, “Note sur la convergence de
méthodes de directions conjuguées,” Revue Française
Copyright © 2010 SciRes. JILSA
Identification and Prediction of Internet Traffic Using Artificial Neural Networks
Copyright © 2010 SciRes. JILSA
155
dInformatique et de Recherche Opérationnelle, Vol. 16,
1969, pp. 35-43.
[50] M. J. D. Powell, “Restart Procedures for the Conjugate
Gradient Method,” Mathematical Programming, Vol. 12,
No. 1, 1977, pp. 241-254.
[51] D. Yuhong and Y. Yaxiang, “Convergence Properties of
Beale-Powell Restart Algorithm,” Science in China (Se-
ries A), Vol. 41, No. 11, 1998, pp. 1142-1150.
[52] M. Moller, “A Scaled Conjugate Gradient Algorithm for
Fast Supervised Learning,” Neural Networks, Vol. 6, No.
4, 1993, pp. 525-533.
[53] S. M. A. Burney, T. A. Jhilani and C. Adril, “Levenberg
Mauquardt Algorithm for Karachi Stock Exchange Share
Rates Forecasting,” Proceeding of World Academy of
Science, Engineering, and Technology, Vol. 3, 2005, pp.
171-176.
[54] R. Battiti, “First- and Second-Order Methods for Learn-
ing: Between Steepest Descent and Newton’s Method,”
Neural Computation, Vol. 4, No. 2, 1992, pp. 141-166.
[55] M. T. Hagan and M. B. Menhaj, “Training Feed-Forward
Networks with the Marquardt Algorithm,” IEEE Trans-
action Neural Networks, Vol. 5, No. 6, 1994, pp. 989-
993.
[56] M. I. A. Lourakis and A. A. Argyros, “The Design and
Implementation of a Generic Sparse Buddle Adjustment
Software Package Based on the Levenberg-Marquardt
Algorithmy,” Technical Report FORTH-ICS/TR, No. 340,
Institute of Computer Science, August 2004.
[57] K. Levenberg, “A Method for the Solution of Certain
Non-linear Problems in Least Squares,” Quarterly of Ap-
plied Mathematics, Vol. 2, No. 2, 1944, pp. 164-168.
[58] D. W. Marquardt, “An Algorithm for the Least-Squares
Estimation of Non linear Parameters”, SIAM Journal of
Applied Mathematics, Vol. 11, No. 2, 1963, pp. 431-441.
[59] M. Riedmiller and H. Braun, “A Direct Adaptive Method
for Faster Backpropagation Learning: The RPROP Algo-
rithm,” In: H. Ruspini, Ed., Proceeding of the IEEE in-
ternational conference on neural networks (ICNN), San
Francisco, 1993, pp. 586-591.
[60] M. Riedmiller, “Rprop—Description and Implementation
Details,” Technical Report, University of Karlsruhe,
Karlsruhe, 1994.
[61] N. K. Treadgold and T. D. Gedeon, “The SARPROP
Algorithm: A Simulated Annealing Enhancement to Re-
silient Back Propagation,” Proceedings International
Panel Conference on Soft and Intelligent Computing,
Budapest, 1996, pp. 293-298.
[62] M. M. Mostafa, “Profiling Blood Donors in Egypt: A
Neural Network Analysis,” Expert Systems with Applica-
tions, Vol. 36, No. 3, 2009, pp. 5031-5038.
[63] S. Chabaa, A. Zeroual and J. Antari, “MLP Neural Net-
works for Modeling non Gaussian Signal,” Workshop
STIC Wotic’09 in Agadir, 24-25 December 2009, p. 58
[64] H. Demuth and M. Beale, “Neural Network Toolbox for
Use with MATLAB: Computation, Visualization, Pro-
gramming, User’s Guide, Version 3.0,” The Mathworks
Inc., Asheboro, 2001.
[65] V. Karri, T. Ho and O. Madsen, “Artificial Neural Net-
works and Neuro-Fuzzy Inference Systems as Virtual
Sensors for Hydrogen Safety Prediction,” International
Journal of Hydrogen Energy, Vol. 33, No. 11, 2008, pp.
2857-2867.
[66] J. Antari, R. Iqdour and A. Zeroual, “Forecasting the
Wind Speed Process Using Higher Order Statistics and
Fuzzy Systems,” Review of Renewable Energy, Vol. 9,
No. 4, 2006, pp. 237-251.