Journal of Computer and Communications, 2014, 2, 24-30
Published Online September 2014 in SciRes. http://www.scirp.org/journal/jcc
http://dx.doi.org/10.4236/jcc.2014.211003
How to cite this paper: Velarde-Alvarado, P., Martinez-Pelaez, R., Ruiz-Ibarra, J. and Morales-Rocha, V. (2014) Information
Theory and Data-Mining Techniques for Network Traffic Profiling for Intrusion Detection. Journal of Computer and Com-
munications, 2, 24-30. http://dx.doi.org/10.4236/jcc.2014.211003
Information Theory and Data-Mining
Techniques for Network Traffic Profiling for
Intrusion Detection
Pablo Velarde-Alvarado1, Rafael Martinez-Pelaez1, Joel Ruiz-Ibarra1,
Victor Morales-Rocha2
1Autonomous University of Nayarit, Area of Basic Sciences and Engineering, Ciudad de la Cultura Amado
Nervo , Tepic, Nayarit, Mexico
2Department of Electrical and Computer Engineering, Autonomous University of Ciudad Juarez, Ciudad Juarez,
Mexico
Email: pvelarde@uan.edu.mx, rafael_uvm @ho tmail .co m, joelruizibarra@gmail.com, victor.morales@uacj.mx
Received June 2014
Abstract
In this paper, information theory and data mining techniques to extract knowledge of network
traffic behavior for packet-level and flow-level are proposed, which can be applied for traffic pro-
filing in intrusion detection systems. The empirical analysis of our profiles through the rate of re-
maining features at the packet-level, as well as the three-dimensional spaces of entropy at the
flow-level, provide a fast detection of intrusions caused by port scanning and worm attacks.
Keywords
Intrusion Detection, Traffic Profiling, Entropy, and Network Worms
1. Introduction
Network Intrusion Detection Systems [1], or NIDSs, have become an important component to detecting attacks
against information systems. However, they offer only a limited defense. For instance, a signature-based NIDS
(S-NIDS) monitors packets on the network and compares them against a database of signatures or attributes
from known malicious threats. A weakness of this type of NIDS is that there will always be a lag between a new
threat being discovered and the signature for detecting that threat being applied to the NIDS. During that lag
time the NIDS would be unable to detect the new threat.
A second type of NIDS is the anomaly-based NIDS (A-NIDS), which monitors network traffic and compares
it against an established baseline, [2]. The baseline helps to identify what is normal behavior for that network. If
a deviation from the established baseline reaches a specified threshold, an alarm is generated. Therefore, ano-
maly detection techniques have the potential to detect new and unforeseen types of attacks. Traditional anomaly
based IDSs, employ algorithms that focus primarily on changes in the traffic volume at specific points on the
network, and promptly alert the operator of a sudden increase. However, such systems can be evaded through
sophisticated attacks that focus on compromising significant hosts, causing them a collapse of memory or CPU
P. Velarde-Alvarado et al.
25
and maintaining a level of traffic within the normal threshold.
Recently, a new generation of anomaly based IDSs have emerged, which focus on gaining knowledge in the
structure and composition of the traffic and not just its volume. Such systems are based on the fact that the mali-
cious activities affect the natural randomness of the network, e.g., they change significantly the entropy of the
network [4] [5]. The composition of traffic is related to its probability distribution, and can be characterized by
its entropy; a malicious activity changes that composition and the shape of the distribution and therefore its en-
tropy. By means of entropy measures to a set of traffic features, we can establish the profiles of normal activity
of the network and determine intrusions to the system.
This paper presents an analysis at the packet and the flow level on traces obtained through measurements
conducted in a campus network under real attacks of the Blaster [6] and Sasser [7] worms, as well as a port scan
attack to the proxy server of that network. The captured traces during a week of normal operation, helped to de-
velop a profile of normal behavior that is useful to be compared to attack conditions.
The paper is organized as follows. In Section 2, we present our profiling approach and the context of this pa-
per. Section 3 describes the test environment; Sections 4 and 5 explain the methodology: the rate of remnant
items and spaces of entropy and results. Section 6 gives concluding remarks.
2. Profiling Approach
We propose two methods for the creation of profiles based on entropy. The analysis applies primarily to the
packet-level for the method of the rate of remnant elements and to the flow-level for the spaces of entropy. In-
itially, there is a set of captured traffic traces corresponding to five days in typical working hours in an academic
LAN. The traces have been inspected to be considered free of anomalies, so they may serve as a baseline.
We use traffic features to build the profiles. A traffic feature is a field in a header of a packet (at the packet
level) or a field in a five-tuple (at the flow level), respectively. Four fields will be used: source address (srcI P),
destination address (d stIP), source port (srcPrt), and destination port (dstPrt).
After the feature extraction, an essential part in the builder profile block is the measurement of entropy. For a
discrete set of symbols {
123
,,,,
n
aaa a
} with probabilities
i
p
,
1, 2,,in=
, the entropy of the discrete
distribution of a random variable
X
associated, is a measure of randomness in the set of symbols and represented
as
()( )
22
1log, 0log
MAX
n
ii
i
HXppHX Hn
=
= ≤≤=
(1)
The relative uncertainty (RU) provides a measure of variety or uniformity that is independent of the sample
size. For a random variable XRU is defined as, [3],
( )( )
,0 1
MAX
HX
RURU X
H
= ≤≤
(2)
means that observed values of X are closer to being uniformly distributed, thus less distinguishable
from each other, whereas
( )
0RU X
indicates that the distribution is highly concentrated.
3. Experimental Platform
The worm propagation and port scanning were carried out on academic LAN which is subdivided into four sub-
nets (192.168.1.0, 192.168.2.0, 192.168.4.0, and 10.253.253.0). There are 100 hosts running Windows XP SP2
mainly. One router (192.168.1.1) connects the subnets with 10 Ethernet switches and 18 IEEE 802.11b/g wire-
less access points. The data rate of the core network is 100Mbps. A sector of the network is left vulnerable for
worm propagation, with ten no Windows XP service pack installed at all (192.168.1.104 - 113). In the experi-
ments Blaster and Sasser worms where released in the vulnerable sector. The scanning port attack was observed
on the proxy server (192.168.4.253).
Data-Set and Tools
The benign traffic traces in typical work hours for a period of five days were labeled with a number from one to
five. The anomalous traffic for port scanning attack was labeled as 6-P1. Blaster and Sasser worm attacks were
labeled as 6-P2 and 6-P4, respectively. The data-set was collected by a network sniffer tool based on libpcap
P. Velarde-Alvarado et al.
26
library used by tcpdump, [8]. All traces were cleaned to remove spurious data using plab, a platform for packet
capture and analysis, [9]. Traces were split into segments using tracesplit which is a tool that belongs to Libtrace,
[10]. The traffic-files in ASCII format suitable for MATLAB processing were created with ipsumdump, [11].
The flow generation was done with flow analyzer, a tool based on perl and developed by us.
4. Rate of Remnant Elements
We define a traffic trace χ of a duration tD seconds with a total of N packets, χ is divided into M non-overlapping
slots of
dD
t tM=
seconds each one. The i-th slot has Wi packets for
1, 2,,iM=
. In each i-slot, four fea-
tures are extracted that we associate with a value of r, namely r = 1 for source IP address, r = 2 for destination IP
address, r = 3 for source TCP port, and r = 4 for destination TCP port. Let S be a finite sequence of r = 1 values
or IP source addresses in a slot-i. This sequence with elements in an alphabet set A, is a function from
{ }
1, 2,3,,A
to A for some
0A
. The generated sequence S is denoted by (
123
,,,,
W
i
aaa a
), and the
length of S is Wi. The elements of S belong to an alphabet A with cardinality
nA=
. From A an ordered set
( )( )()( )
{ }
12
, ,,
o ooo
n
A aaa=
is created,
( )
o
A
contains the n-source IP addresses in decreasing order sorted by
frequency. With the associated frequencies of A, we define a probability mass function (pmf)
()
( )
1 ()
1
Pr ,0 rest
j
ro
i jj
X
f jn
j pa
=
=
≤≤
=
(3)
where
123 n
fff f
≥≥≥≥
. Ordered set
( )
o
A
is transferred to an iterative process P to create l subsets of
()
o
A
denoted as
()
,ok
A
,
1kl≤≤
. This family of l subsets is shown in Equat i ons (4)-(6) and holds
()()( )
{ }
, ,1okoko
k
AAa
+
=
( )
()( )( )( )( )
{ }
,1
123
,,,,
o oooo
on
A Aaaaa= =
(4)
()( )( )( )( )
{ }
,1 234
,,,,
o oooo
n
A aaaa=
(5)
()()( )( )()
{ }
,
12
,,, ...,
olo ooo
n
lll
A aaa a
++
=
(6)
When in a k-iteration, the relative uncertainty of a partial pmf reaches a threshold β, i.e .,
( )
,
r
i
RU Xk
β
>
,
we say that the iterative process P reached its latest iteration, and hence, k = l. An estimator of relative uncer-
tainty for a discrete random variable
r
i
X
in the k-iteration is defined in terms of its partial pmf as:
( )( )
( )
( )
( )
( )
( )
( )( )
22
22
log log
ˆ
,ˆlog log
oo
MAX
jj jjj
nn
rj
ijk jk
r
i
papaf f
HPX
RU Xknk nk
H
= =
= ==
−−
∑∑
(7)
Selecting a
1
β
, the resultant subset
( )
,ol
A
is closer to being uniformly distributed. Then, for a given β,
and a number l of iterations carried out, it is possible to calculate the remnant
r
i
R
for a subset
( )
,
ol
A
. Genera-
lizing this for an i-slot and a r-traffic feature we have the rate of remnant elements:
( )
( )
1
when all,1
for 1
o
jj
r
i
n
pn
n
Rnl l
a
= ≥
=
(8)
In other words,
r
i
R
is the cardinality of the subset
( )
,ol
A
. We found that this feature under normal condi-
tions presents regularities that allow creating behavioral traffic profiles. Ta bl e 1 summarizes the
r
i
R
behavior
with β = 0.95 for our data-set.
Through of mean, variance, the intensity factor (
2
σµ
), and maximum value we can define a threshold for
normal behavior of
r
i
R
. For instance, by averaging the means of
r
i
R
and its maximum values during benign
traffic, we can define an average threshold of 28.5 with a maximum of 114.8 units. We denoted these thresholds
for each r by
( )
( )
1
28.5;114.8
i
TR =
,
( )
( )
2
31.7;115.6
i
TR =
,
( )
( )
3
92.0;342.6
i
TR =
, and
( )
( )
4132.3;542.2
i
TR =
.
Figure 1 shows the patterns
1
i
R
and
2
i
R
for benign traffic in Trace 5 and their variation are inside of stan-
dard behavior mentioned above.
P. Velarde-Alvarado et al.
27
Table 1. Values of mean, variance, and intensity factor for the rate of remnants.
Trace
Mean
Variance
Intensity Factor
srcIP
dstIP
srcPrt
dstPrt
srcIP
dstIP
srcPrt
dstPrt
srcIP
dstIP
srcPrt
dstPrt
1
29.1
33.1
103.5
136.5
328.2
377.5
4502
8818
11.29
11.42
43.5
64.61
2
29.5
35.1
88.8
138.6
566.5
688.1
3830
12,918
19.23
19.62
43.16
93.24
3
31.1
33.0
96.2
141.7
497.9
595.2
4598
11,972
15.99
18.06
47.78
84.51
4
31.1
35.9
103.2
141.9
732.8
727.9
6209
18,868
23.59
20.26
60.16
133
5
21.5
21.5
69.0
102.88
277.8
351.1
3208
9191
12.94
16.33
46.5
89.3
6-P1 14.5 1 6.6 76.1 74.5 145.1 166.8 4916 4958 10.0 1 0.05 64.6 66.53
6-P2
27.9
19,618
3107
1005
3464
6.2e07
1.7e06
4.3e04
12.42
3209
549.7
43.24
6-P4
3149
5214
3045
2243
97,058
1.2e06
2.4e05
2.4e05
310.9
237.5
90.14
111.24
Figure 1. Rate of remnants for (a) srcIP and (b) dstIP for standard traffic in Trace 5
in typical work hours. (td = 60 s, and β = 0.95).
An anomaly related with a port scan attack directed to the proxy server was possible to detect it since the first
slot that appeared (i.e . i = 2, 3) in trace 6-P1. The attack was carried out across a large number of TCP packets
with source addresses supplanted. The growth of
1
i
R
is possible to observe in Figure 2(a) and is far away from
( )
( )
1
28.5;114.8
i
TR =
.
r
i
R
patterns during worms attacks are presented in Figure 2(b) and Figure 2(c). There is a significant in-
crease in the amount of remnants for r = 3 during the Blaster and Sasser Worm propagation. It is important to
note that the anomaly detection is done from the earliest slots that the intrusion appears.
5. Three-Dimensional Spaces of Entropy
The construction of a space of entropy is carried out at flow level, and through these spaces is possible to create
profiles of behavior for the traffic of a network. Four three-dimensional spaces are generated for each one of the
features extracted from the flows. We define a traffic trace χ of a duration tD seconds that is divided into M
non-overlapping slots of
dD
t tM=
seconds each one. In an i-slot
i
K
flows are generated with a given in-
ter-flow gap (IFG). All the flows for each slot are stored on indexed text files. The traffic features used in this
technique are the flow’s fields and are identified as r = 1 for source IP address, r = 2 for destination IP address, r
= 3 for source TCP port, and r = 4 for destination TCP port.
Once that flows in an i-slot are generated, they should be clustering according to a r-flow feature. For instance,
with a cluster key or pivot r = 1 the flows are aggregated into those flows that share the same source IP address.
The number of clusters depends on
1r
i=
, where
1r
i=
is the alphabet set of all source IP addresses seen in
the slot i. Thus, each cluster has flows with the same source IP address, but the rest of fields or features (r = 2, 3,
4) have freedom of variation. In this context, we can estimate the entropy for each r = 2, 3, 4 of each cluster. If
we join these three values and associate them with a coordinate, we have a cloud of data points in a 3-D Eucli-
dean space, where the axis are
( )
srcPrt dstPrt dstIP
ˆ ˆ ˆ
,,
HHH
for r = 1. Finally, the
1r
i=
points in the slot i, that is,
( )
srcPrtdstPrtdstIP 1
ˆ ˆ ˆ
,,HHH
,
( )
srcPrtdstPrtdstIP 2
ˆ ˆ ˆ
,,HHH
,…
( )
srcPrt dstPrt dstIP1
ˆ ˆ ˆ
,,
r
i
HHH
=
are plotted in the 3D-space.
When we apply this procedure to the rest of cluster keys and all slots, we get four spaces of entropy.
Figure 3 shows the spaces of entropy for three. First, in Figur e 3(a), we see the point pattern for Trace-1,
which corresponds to normal traffic conditions being typical for Traces 1 - 5. Figure 3(b) and Figure 3(c) show
P. Velarde-Alvarado et al.
28
a marked difference with regard to benign traffic, since the data points move away from positions typically ob-
served.
The characterization of the spaces of entropy represented by the vector
3
r
X
for a cluster key r was rea-
lized applying initially a technique of multivariable analysis, the Principal Component Analysis. PCA provides a
roadmap for how reduce a complex data-set to a lower dimension
,3
rd
d∈≤Z
to reveal the sometimes
hidden, simplified structure that often underlie it. PCA is mathematically defined as an orthogonal linear trans-
formation that transforms the data to a new coordinate system such that the greatest variance by any projection
of the data comes to lie on the first coordinate (called the first principal component, PCA 1), the second greatest
variance on the second coordinate, and so on, [12].
To obtain accurate analysis estimation, tools like Kernel Density Estimation (KDE) may be useful. This anal-
ysis was applied on data-points at the slot level on the PCA 1 by using a Gaussian kernel of 200 points, with a
bandwidth of
15
1.06hJ
σ
=
, given by the Silverman’s criteria, where J is the number of observations and σ is
the standard deviation of the set of observations. KDE shows that the traffic slots have Gaussian bimodality be-
havior in its pdf. Each mode was labeled as principal mode and far mode respectively, as shown in Figure 4.
Figure 4(a) shows that the densities of PCA 1 in benign traffic traces present a clear regularity in their shapes.
This pattern of behavior changes drastically for anomalous traffic traces, see Figure 4(b) and F ig ure 4(c).
Figure 2. Rate of remnant for: (a) srcIP, under port scan attack using
spoofed IP addresses which is observable in slots i = 2 and i = 3 on Trace
6-P1 (b) srcPrt during Blaster Worm on Trace 6-P2. (c) srcPrt during
Sasser Worm on Trace 6-P4. All cases td = 60 s and β = 0.95.
Figure 3. Spaces of entropy for three srcIP cluster keys: (a) Traffic Trace-1 (benign
traffic) in typical working hours. (b) Anomalous traffic, Trace 6-P2 (Blaster Worm) dur-
ing 38 min period. (c) Anomalous traffic, Trace 6-P4 (Sasser Worm) during 38 min pe-
riod.
P. Velarde-Alvarado et al.
29
This procedure was applied to i-slot level for a given r-, feature. The transformed data points are denoted by
r
i
Z
. Densities of
r
i
Z
under normal conditions presents bimodality, the second mode (far mode) is situated on
the left side of the main mode, at an average of minus five units. On the other hand, the empirically calculated
average variance of
r
i
Z
for benign traffic is three units.
During analysis of the trace P1 was discovered that slots two and three had a variance of 0.32 and 0.74 units,
respectively. These values are anomalous with regard to the threshold of three units. Thus, an anomaly (Figure
5(a)) was early detected in Trace 6-P1. The analysis of these traffic slots with Wireshark, was able to identify a
port scan attack.
In another case, Figure 5(b) shows the first three anomalous slots, where far mode moves away from the typ-
ical value. The far mode displace to 9, 11 and 13 units, representing an anomaly with regard to the threshold
of 5. This anomalous behavior in the far mode was caused by the spread of the Blaster worm.
Similarly, the Sasser worm attack caused anomalous values in the variance of the transformed data-points,
these anomalous values were close to 0.5 units.
Figure 4. Kernel density estimation for PCA1 for: (a) Traces 1-5 (Benign Traffic), (b)
Trace 6-P2 (Blaster attack), and (c) Trace 6-P4 (Sasser attack).
Figure 5. Anomaly Detection on PCA1 (a) Anomaly caused by port scan detected in first slot in
Trace 6-P1, (b) Anomaly caused by the Blaster worm that deviate the far mode out of the typical
position.
P. Velarde-Alvarado et al.
30
In all three cases studied, the attack leads to changes in traffic patterns that are detected from the first slots
containing the malicious traffic.
6. Conclusions and Future Work
The generation of behavioral profiles based on entropy offers an effective support for the Intrusion Detection
Systems. The results of this study in a campus network show that under the Blaster and Sasser worm attacks as
well as the port scanning, an A-NIDS employing profiles generated by the Rate of remnant elements or Three-
Dimensional Spaces of Entropy methodologies can provide a rapid response detecting deviations from an estab-
lished baseline in the early slots that the attack appears.
As a future work, we will investigate the effect on variation of the slot duration td, smaller values of slot dura-
tion represent faster response times, but also represent a smaller data set where to obtain representative traffic
features, finding the optimum value is an important objective design.
Acknowledgements
This research was supported by the National Counsel of Science and Technology (CONACyT) of Mexico as
part of the project SEP-2011 Ciencia Basica No. 167859.
References
[1] Fung, C. and Boutaba, R. (2013) Intrusion Detection Networks: A Key to Collaborative Security. Auerbach Publica-
tions.
[2] Kruegel, C., Valeur, F. and Vigna, G. (2005) Intrusion Detection and Correlation. Advances in Information Security.
Springer.
[3] Xu, K., Zhang, Z. and Bhattacharyya, S. (2005) Profiling Internet Backbone Traffic: Behavior Models and Applica-
tions. SIGCOMM, 2005, 22-26 .
[4] Nucci, A. and Bannerman, S. (2007) Controlled Chaos. IEEE Spectrum, 44, 42-48.
http://dx.doi.org/10.1109/MSPEC.2007.4390022
[5] V elarde-Alvarado, P., Vargas-Rosales, C., Torres-Roman, D. and Munoz-Rodriguez, D. (2008 ) Entropy Based Analy-
sis of Worm Attacks in a Local Network. Research in Computing Science, 34, 225-235.
[6] Copley, D., Has sell , R., Jack, B., Lynn, K., Permeh, R. and Soeder, D. (2003) ANALYSIS: Blaster Worm. eEye Digi-
tal Security Research. http://research.eeye.com/html/advisories/published/AL20030811.html
[7] Ukai, Y. and Soeder, D. (2004) ANALYSIS: Sasser. eEye Digital Security Research.
http://research.eeye.com/html/advisories/published/AD20040501.html
[8] Jacobson, V., Leres, C. and McCanne, S. Tcpdump/libpcap. http://www.tcpdump.org/
[9] A. Peppo, plab. Tool for Traffic Traces. http://www.grid.unina.it/software/Plab/
[10] Trac Project. Libtrace. h ttp://www.wand.net.nz/trac/libtrace
[11] E. Kohler, ipsumdump. Traffic tool. http://www.cs.ucla.edu/~kohler/ipsumdump
[12] Jolliffe, I.T. (2002) Principal Component Analysis, Series: Springer Series in Statistics. 2nd Edition, Springer, XXIX,
487 pp. 28.