Information Theory and Data-Mining Techniques for Network Traffic Profiling for Intrusion Detection

doi:10.4236/jcc.2014.211003

Paper Menu >>

Journal Menu >>

Journal of Computer and Communications, 2014, 2, 24-30

Published Online September 2014 in SciRes. http://www.scirp.org/journal/jcc

http://dx.doi.org/10.4236/jcc.2014.211003

How to cite this paper: Velarde-Alvarado, P., Martinez-Pelaez, R., Ruiz-Ibarra, J. and Morales-Rocha, V. (2014) Information

Theory and Data-Mining Techniques for Network Traffic Profiling for Intrusion Detection. Journal of Computer and Com-

munications, 2, 24-30. http://dx.doi.org/10.4236/jcc.2014.211003

Information Theory and Data-Mining

Techniques for Network Traffic Profiling for

Intrusion Detection

Pablo Velarde-Alvarado1, Rafael Martinez-Pelaez1, Joel Ruiz-Ibarra1,

Victor Morales-Rocha2

1Autonomous University of Nayarit, Area of Basic Sciences and Engineering, Ciudad de la Cultura “Amado

Nervo ”, Tepic, Nayarit, Mexico

2Department of Electrical and Computer Engineering, Autonomous University of Ciudad Juarez, Ciudad Juarez,

Mexico

Email: pvelarde@uan.edu.mx, rafael_uvm @ho tmail .co m, joelruizibarra@gmail.com, victor.morales@uacj.mx

Received June 2014

Abstract

In this paper, information theory and data mining techniques to extract knowledge of network

traffic behavior for packet-level and flow-level are proposed, which can be applied for traffic pro-

filing in intrusion detection systems. The empirical analysis of our profiles through the rate of re-

maining features at the packet-level, as well as the three-dimensional spaces of entropy at the

flow-level, provide a fast detection of intrusions caused by port scanning and worm attacks.

Keywords

Intrusion Detection, Traffic Profiling, Entropy, and Network Worms

1. Introduction

Network Intrusion Detection Systems [1], or NIDSs, have become an important component to detecting attacks

against information systems. However, they offer only a limited defense. For instance, a signature-based NIDS

(S-NIDS) monitors packets on the network and compares them against a database of signatures or attributes

from known malicious threats. A weakness of this type of NIDS is that there will always be a lag between a new

threat being discovered and the signature for detecting that threat being applied to the NIDS. During that lag

time the NIDS would be unable to detect the new threat.

A second type of NIDS is the anomaly-based NIDS (A-NIDS), which monitors network traffic and compares

it against an established baseline, [2]. The baseline helps to identify what is normal behavior for that network. If

a deviation from the established baseline reaches a specified threshold, an alarm is generated. Therefore, ano-

maly detection techniques have the potential to detect new and unforeseen types of attacks. Traditional anomaly

based IDSs, employ algorithms that focus primarily on changes in the traffic volume at specific points on the

network, and promptly alert the operator of a sudden increase. However, such systems can be evaded through

sophisticated attacks that focus on compromising significant hosts, causing them a collapse of memory or CPU

P. Velarde-Alvarado et al.

and maintaining a level of traffic within the normal threshold.

Recently, a new generation of anomaly based IDSs have emerged, which focus on gaining knowledge in the

structure and composition of the traffic and not just its volume. Such systems are based on the fact that the mali-

cious activities affect the natural randomness of the network, e.g., they change significantly the entropy of the

network [4] [5]. The composition of traffic is related to its probability distribution, and can be characterized by

its entropy; a malicious activity changes that composition and the shape of the distribution and therefore its en-

tropy. By means of entropy measures to a set of traffic features, we can establish the profiles of normal activity

of the network and determine intrusions to the system.

This paper presents an analysis at the packet and the flow level on traces obtained through measurements

conducted in a campus network under real attacks of the Blaster [6] and Sasser [7] worms, as well as a port scan

attack to the proxy server of that network. The captured traces during a week of normal operation, helped to de-

velop a profile of normal behavior that is useful to be compared to attack conditions.

The paper is organized as follows. In Section 2, we present our profiling approach and the context of this pa-

per. Section 3 describes the test environment; Sections 4 and 5 explain the methodology: the rate of remnant

items and spaces of entropy and results. Section 6 gives concluding remarks.

2. Profiling Approach

We propose two methods for the creation of profiles based on entropy. The analysis applies primarily to the

packet-level for the method of the rate of remnant elements and to the flow-level for the spaces of entropy. In-

itially, there is a set of captured traffic traces corresponding to five days in typical working hours in an academic

LAN. The traces have been inspected to be considered free of anomalies, so they may serve as a baseline.

We use traffic features to build the profiles. A traffic feature is a field in a header of a packet (at the packet

level) or a field in a five-tuple (at the flow level), respectively. Four fields will be used: source address (srcI P),

destination address (d stIP), source port (srcPrt), and destination port (dstPrt).

After the feature extraction, an essential part in the builder profile block is the measurement of entropy. For a

discrete set of symbols {

123

,,,,

aaa a

} with probabilities

1, 2,,in=

, the entropy of the discrete

distribution of a random variable

associated, is a measure of randomness in the set of symbols and represented

()( )

1log, 0log

MAX

HXppHX Hn

= ≤≤=

∑

(1)

The relative uncertainty (RU) provides a measure of variety or uniformity that is independent of the sample

size. For a random variable XRU is defined as, [3],

( )( )

,0 1

MAX

RURU X

= ≤≤

(2)

( )

1RU X≈

means that observed values of X are closer to being uniformly distributed, thus less distinguishable

from each other, whereas

( )

0RU X≈

indicates that the distribution is highly concentrated.

3. Experimental Platform

The worm propagation and port scanning were carried out on academic LAN which is subdivided into four sub-

nets (192.168.1.0, 192.168.2.0, 192.168.4.0, and 10.253.253.0). There are 100 hosts running Windows XP SP2

mainly. One router (192.168.1.1) connects the subnets with 10 Ethernet switches and 18 IEEE 802.11b/g wire-

less access points. The data rate of the core network is 100Mbps. A sector of the network is left vulnerable for

worm propagation, with ten no Windows XP service pack installed at all (192.168.1.104 - 113). In the experi-

ments Blaster and Sasser worms where released in the vulnerable sector. The scanning port attack was observed

on the proxy server (192.168.4.253).

Data-Set and Tools

The benign traffic traces in typical work hours for a period of five days were labeled with a number from one to

five. The anomalous traffic for port scanning attack was labeled as 6-P1. Blaster and Sasser worm attacks were

labeled as 6-P2 and 6-P4, respectively. The data-set was collected by a network sniffer tool based on libpcap

P. Velarde-Alvarado et al.

library used by tcpdump, [8]. All traces were cleaned to remove spurious data using plab, a platform for packet

capture and analysis, [9]. Traces were split into segments using tracesplit which is a tool that belongs to Libtrace,

[10]. The traffic-files in ASCII format suitable for MATLAB processing were created with ipsumdump, [11].

The flow generation was done with flow analyzer, a tool based on perl and developed by us.

4. Rate of Remnant Elements

We define a traffic trace χ of a duration tD seconds with a total of N packets, χ is divided into M non-overlapping

slots of

t tM=

seconds each one. The i-th slot has Wi packets for

1, 2,,iM=

. In each i-slot, four fea-

tures are extracted that we associate with a value of r, namely r = 1 for source IP address, r = 2 for destination IP

address, r = 3 for source TCP port, and r = 4 for destination TCP port. Let S be a finite sequence of r = 1 values

or IP source addresses in a slot-i. This sequence with elements in an alphabet set A, is a function from

{ }

1, 2,3,,A

to A for some

0A≥

. The generated sequence S is denoted by (

123

,,,,

aaa a

), and the

length of S is Wi. The elements of S belong to an alphabet A with cardinality

nA=

. From A an ordered set

( )( )()( )

{ }

, ,,

o ooo

A aaa=

is created,

( )

contains the n-source IP addresses in decreasing order sorted by

frequency. With the associated frequencies of A, we define a probability mass function (pmf)

()

( )

1 ()

Pr ,0 rest

i jj

f jn

j pa

≤≤







(3)

where

123 n

fff f

≥≥≥≥

. Ordered set

( )

is transferred to an iterative process P to create l subsets of

()

denoted as

()

,ok

1kl≤≤

. This family of l subsets is shown in Equat i ons (4)-(6) and holds

()()( )

{ }

, ,1okoko

AAa

( )

()( )( )( )( )

{ }

123

,,,,

o oooo

A Aaaaa= =

(4)

()( )( )( )( )

{ }

,1 234

,,,,

o oooo

A aaaa=

(5)

()()( )( )()

{ }

,,, ...,

olo ooo

lll

A aaa a

(6)

When in a k-iteration, the relative uncertainty of a partial pmf reaches a threshold β, i.e .,

( )

RU Xk

we say that the iterative process P reached its latest iteration, and hence, k = l. An estimator of relative uncer-

tainty for a discrete random variable

in the k-iteration is defined in terms of its partial pmf as:

( )( )

( )

( )( )

log log

,ˆlog log

MAX

jj jjj

ijk jk

papaf f

HPX

RU Xknk nk

= =

= ==

−−

∑∑

(7)

Selecting a

≈

, the resultant subset

( )

,ol

is closer to being uniformly distributed. Then, for a given β,

and a number l of iterations carried out, it is possible to calculate the remnant

for a subset

( )

. Genera-

lizing this for an i-slot and a r-traffic feature we have the rate of remnant elements:

( )

when all,1

for 1

Rnl l

= ≥

=−≥











(8)

In other words,

is the cardinality of the subset

( )

,ol

. We found that this feature under normal condi-

tions presents regularities that allow creating behavioral traffic profiles. Ta bl e 1 summarizes the

behavior

with β = 0.95 for our data-set.

Through of mean, variance, the intensity factor (

σµ

), and maximum value we can define a threshold for

normal behavior of

. For instance, by averaging the means of

and its maximum values during benign

traffic, we can define an average threshold of 28.5 with a maximum of 114.8 units. We denoted these thresholds

for each r by

( )

28.5;114.8

TR =

( )

31.7;115.6

TR =

( )

92.0;342.6

TR =

, and

( )

4132.3;542.2

TR =

Figure 1 shows the patterns

and

for benign traffic in Trace 5 and their variation are inside of stan-

dard behavior mentioned above.

P. Velarde-Alvarado et al.

Table 1. Values of mean, variance, and intensity factor for the rate of remnants.

Trace

Mean

Variance

Intensity Factor

srcIP

dstIP

srcPrt

dstPrt

srcIP

dstIP

srcPrt

dstPrt

srcIP

dstIP

srcPrt

dstPrt

29.1

33.1

103.5

136.5

328.2

377.5

4502

8818

11.29

11.42

43.5

64.61

29.5

35.1

88.8

138.6

566.5

688.1

3830

12,918

19.23

19.62

43.16

93.24

31.1

33.0

96.2

141.7

497.9

595.2

4598

11,972

15.99

18.06

47.78

84.51

31.1

35.9

103.2

141.9

732.8

727.9

6209

18,868

23.59

20.26

60.16

133

21.5

69.0

102.88

277.8

351.1

3208

9191

12.94

16.33

46.5

89.3

6-P1 14.5 1 6.6 76.1 74.5 145.1 166.8 4916 4958 10.0 1 0.05 64.6 66.53

6-P2

27.9

19,618

3107

1005

3464

6.2e07

1.7e06

4.3e04

12.42

3209

549.7

43.24

6-P4

3149

5214

3045

2243

97,058

1.2e06

2.4e05

310.9

237.5

90.14

111.24

Figure 1. Rate of remnants for (a) srcIP and (b) dstIP for standard traffic in Trace 5

in typical work hours. (td = 60 s, and β = 0.95).

An anomaly related with a port scan attack directed to the proxy server was possible to detect it since the first

slot that appeared (i.e . i = 2, 3) in trace 6-P1. The attack was carried out across a large number of TCP packets

with source addresses supplanted. The growth of

is possible to observe in Figure 2(a) and is far away from

( )

28.5;114.8

TR =

patterns during worms attacks are presented in Figure 2(b) and Figure 2(c). There is a significant in-

crease in the amount of remnants for r = 3 during the Blaster and Sasser Worm propagation. It is important to

note that the anomaly detection is done from the earliest slots that the intrusion appears.

5. Three-Dimensional Spaces of Entropy

The construction of a space of entropy is carried out at flow level, and through these spaces is possible to create

profiles of behavior for the traffic of a network. Four three-dimensional spaces are generated for each one of the

features extracted from the flows. We define a traffic trace χ of a duration tD seconds that is divided into M

non-overlapping slots of

t tM=

seconds each one. In an i-slot

flows are generated with a given in-

ter-flow gap (IFG). All the flows for each slot are stored on indexed text files. The traffic features used in this

technique are the flow’s fields and are identified as r = 1 for source IP address, r = 2 for destination IP address, r

= 3 for source TCP port, and r = 4 for destination TCP port.

Once that flows in an i-slot are generated, they should be clustering according to a r-flow feature. For instance,

with a cluster key or pivot r = 1 the flows are aggregated into those flows that share the same source IP address.

The number of clusters depends on



, where



is the alphabet set of all source IP addresses seen in

the slot i. Thus, each cluster has flows with the same source IP address, but the rest of fields or features (r = 2, 3,

4) have freedom of variation. In this context, we can estimate the entropy for each r = 2, 3, 4 of each cluster. If

we join these three values and associate them with a coordinate, we have a cloud of data points in a 3-D Eucli-

dean space, where the axis are

( )

srcPrt dstPrt dstIP

ˆ ˆ ˆ

HHH

for r = 1. Finally, the



points in the slot i, that is,

( )

srcPrtdstPrtdstIP 1

ˆ ˆ ˆ

,,HHH

( )

srcPrtdstPrtdstIP 2

ˆ ˆ ˆ

,,HHH

,…

( )

srcPrt dstPrt dstIP1

ˆ ˆ ˆ

HHH



are plotted in the 3D-space.

When we apply this procedure to the rest of cluster keys and all slots, we get four spaces of entropy.

Figure 3 shows the spaces of entropy for three. First, in Figur e 3(a), we see the point pattern for Trace-1,

which corresponds to normal traffic conditions being typical for Traces 1 - 5. Figure 3(b) and Figure 3(c) show

P. Velarde-Alvarado et al.

a marked difference with regard to benign traffic, since the data points move away from positions typically ob-

served.

The characterization of the spaces of entropy represented by the vector

∈X

for a cluster key r was rea-

lized applying initially a technique of multivariable analysis, the Principal Component Analysis. PCA provides a

roadmap for how reduce a complex data-set to a lower dimension

d∈≤Z

to reveal the sometimes

hidden, simplified structure that often underlie it. PCA is mathematically defined as an orthogonal linear trans-

formation that transforms the data to a new coordinate system such that the greatest variance by any projection

of the data comes to lie on the first coordinate (called the first principal component, PCA 1), the second greatest

variance on the second coordinate, and so on, [12].

To obtain accurate analysis estimation, tools like Kernel Density Estimation (KDE) may be useful. This anal-

ysis was applied on data-points at the slot level on the PCA 1 by using a Gaussian kernel of 200 points, with a

bandwidth of

1.06hJ

, given by the Silverman’s criteria, where J is the number of observations and σ is

the standard deviation of the set of observations. KDE shows that the traffic slots have Gaussian bimodality be-

havior in its pdf. Each mode was labeled as principal mode and far mode respectively, as shown in Figure 4.

Figure 4(a) shows that the densities of PCA 1 in benign traffic traces present a clear regularity in their shapes.

This pattern of behavior changes drastically for anomalous traffic traces, see Figure 4(b) and F ig ure 4(c).

Figure 2. Rate of remnant for: (a) srcIP, under port scan attack using

spoofed IP addresses which is observable in slots i = 2 and i = 3 on Trace

6-P1 (b) srcPrt during Blaster Worm on Trace 6-P2. (c) srcPrt during

Sasser Worm on Trace 6-P4. All cases td = 60 s and β = 0.95.

Figure 3. Spaces of entropy for three srcIP cluster keys: (a) Traffic Trace-1 (benign

traffic) in typical working hours. (b) Anomalous traffic, Trace 6-P2 (Blaster Worm) dur-

ing 38 min period. (c) Anomalous traffic, Trace 6-P4 (Sasser Worm) during 38 min pe-

riod.

P. Velarde-Alvarado et al.

This procedure was applied to i-slot level for a given r-, feature. The transformed data points are denoted by

. Densities of

under normal conditions presents bimodality, the second mode (far mode) is situated on

the left side of the main mode, at an average of minus five units. On the other hand, the empirically calculated

average variance of

for benign traffic is three units.

During analysis of the trace P1 was discovered that slots two and three had a variance of 0.32 and 0.74 units,

respectively. These values are anomalous with regard to the threshold of three units. Thus, an anomaly (Figure

5(a)) was early detected in Trace 6-P1. The analysis of these traffic slots with Wireshark, was able to identify a

port scan attack.

In another case, Figure 5(b) shows the first three anomalous slots, where far mode moves away from the typ-

ical value. The far mode displace to −9, −11 and −13 units, representing an anomaly with regard to the threshold

of −5. This anomalous behavior in the far mode was caused by the spread of the Blaster worm.

Similarly, the Sasser worm attack caused anomalous values in the variance of the transformed data-points,

these anomalous values were close to 0.5 units.

Figure 4. Kernel density estimation for PCA1 for: (a) Traces 1-5 (Benign Traffic), (b)

Trace 6-P2 (Blaster attack), and (c) Trace 6-P4 (Sasser attack).

Figure 5. Anomaly Detection on PCA1 (a) Anomaly caused by port scan detected in first slot in

Trace 6-P1, (b) Anomaly caused by the Blaster worm that deviate the far mode out of the typical

position.

P. Velarde-Alvarado et al.

In all three cases studied, the attack leads to changes in traffic patterns that are detected from the first slots

containing the malicious traffic.

6. Conclusions and Future Work

The generation of behavioral profiles based on entropy offers an effective support for the Intrusion Detection

Systems. The results of this study in a campus network show that under the Blaster and Sasser worm attacks as

well as the port scanning, an A-NIDS employing profiles generated by the Rate of remnant elements or Three-

Dimensional Spaces of Entropy methodologies can provide a rapid response detecting deviations from an estab-

lished baseline in the early slots that the attack appears.

As a future work, we will investigate the effect on variation of the slot duration td, smaller values of slot dura-

tion represent faster response times, but also represent a smaller data set where to obtain representative traffic

features, finding the optimum value is an important objective design.

Acknowledgements

This research was supported by the National Counsel of Science and Technology (CONACyT) of Mexico as

part of the project SEP-2011 Ciencia Basica No. 167859.

References

[1] Fung, C. and Boutaba, R. (2013) Intrusion Detection Networks: A Key to Collaborative Security. Auerbach Publica-

tions.

[2] Kruegel, C., Valeur, F. and Vigna, G. (2005) Intrusion Detection and Correlation. Advances in Information Security.

Springer.

[3] Xu, K., Zhang, Z. and Bhattacharyya, S. (2005) Profiling Internet Backbone Traffic: Behavior Models and Applica-

tions. SIGCOMM, 2005, 22-26 .

[4] Nucci, A. and Bannerman, S. (2007) Controlled Chaos. IEEE Spectrum, 44, 42-48.

http://dx.doi.org/10.1109/MSPEC.2007.4390022

[5] V elarde-Alvarado, P., Vargas-Rosales, C., Torres-Roman, D. and Munoz-Rodriguez, D. (2008 ) Entropy Based Analy-

sis of Worm Attacks in a Local Network. Research in Computing Science, 34, 225-235.

[6] Copley, D., Has sell , R., Jack, B., Lynn, K., Permeh, R. and Soeder, D. (2003) ANALYSIS: Blaster Worm. eEye Digi-

tal Security Research. http://research.eeye.com/html/advisories/published/AL20030811.html

[7] Ukai, Y. and Soeder, D. (2004) ANALYSIS: Sasser. eEye Digital Security Research.

http://research.eeye.com/html/advisories/published/AD20040501.html

[8] Jacobson, V., Leres, C. and McCanne, S. Tcpdump/libpcap. http://www.tcpdump.org/

[9] A. Peppo, plab. Tool for Traffic Traces. http://www.grid.unina.it/software/Plab/

[10] Trac Project. Libtrace. h ttp://www.wand.net.nz/trac/libtrace

[11] E. Kohler, ipsumdump. Traffic tool. http://www.cs.ucla.edu/~kohler/ipsumdump

[12] Jolliffe, I.T. (2002) Principal Component Analysis, Series: Springer Series in Statistics. 2nd Edition, Springer, XXIX,

487 pp. 28.