Journal of Information Security, 2011, 2, 158-168
doi:10.4236/jis.2011.24016 Published Online October 2011 (http://www.SciRP.org/journal/jis)
Anomalous Network Packet Detection Using
Data Stream Mining
Zachary Miller, William Deitrick, Wei Hu*
Department of Computer Science, Houghton College, Houghton, USA
E-mail: *wei.hu@houghton.edu
Received July 4, 2011; revised July 29, 2011; accepted August 8, 2011
Abstract
In recent years, significant research has been devoted to the development of Intrusion Detection Systems
(IDS) able to detect anomalous computer network traffic indicative of malicious activity. While signature-
based IDS have proven effective in discovering known attacks, anomaly-based IDS hold the even greater
promise of being able to automatically detect previously undocumented threats. Traditional IDS are gener-
ally trained in batch mode, and therefore cannot adapt to evolving network data streams in real time. To re-
solve this limitation, data stream mining techniques can be utilized to create a new type of IDS able to dy-
namically model a stream of network traffic. In this paper, we present two methods for anomalous network
packet detection based on the data stream mining paradigm. The first of these is an adapted version of the
DenStream algorithm for stream clustering specifically tailored to evaluate network traffic. In this algorithm,
individual packets are treated as points and are flagged as normal or abnormal according to whether they fall into
either normal or outlier clusters. The second algorithm utilizes a histogram to create a model of the evolving
network traffic to which incoming traffic can be compared using Pearson correlation. Both of these algo-
rithms were tested using the first week of data from the DARPA’99 dataset with Generic HTTP, Shell-code
and Polymorphic attacks inserted. We were able to achieve reasonably high detection rates with moderately
low false positive percentages for different types of attacks, though detection rates varied between the two
algorithms. Overall, the histogram-based detection algorithm achieved slightly superior results, but required
more parameters than the clustering-based algorithm. As a result of its fewer parameter requirements, the
clustering approach can be more easily generalized to different types of network traffic streams.
Keywords: Anomaly Detection, Clustering, Data Stream Mining, Intrusion Detection System, Histogram,
Payload
1. Introduction
Since the 1990s, internet usage has become an integral
part of our daily lives. As a result, computer networks
have experienced an increased number of sophisticated
malware attacks. Whereas attackers previously attempted
to gain access to restricted resources to demonstrate their
skill, a new wave of internet-based attacks has shifted the
focus primarily towards criminal motives. Due to the
availability of software tools designed to exploit vulner-
abilities, attackers can create viruses with greater struc-
tural complexity and damaging capability using less so-
phisticated skills. The security challenges resulting from
an increasing number of devices connected to the inter-
net have prompted a significant amount of research de-
voted to network security.
1.1. Intrusion Detection Systems
One notable topic of network security research is the
development of Intrusion Detection Systems (IDS),
which attempt to detect threats to a network or host
through signature-based or anomaly-based methods. To
detect intrusions, signature-based IDS generate “signa-
tures” based on characteristics of previously known attacks.
This allows the systems to focus on detecting known attack patterns independently of ordinary network traffic. Signature-based
detection is the most common form of intrusion detection
because it is simple to implement once a set of signatures
has been created. Although this approach is effective in
finding known threats to a network, it is unable to iden-
tify new threats until a new signature is made. To gener-
ate an accurate signature, a human expert is generally
needed because this cannot easily be done automatically.
Since the detection of new threats in a signature-based
system is impossible without the aid of a new signature,
an alternative method has been proposed.
In contrast to the signature-based approach, anomaly-
based IDS adaptively detect new attacks by first gener-
ating a “normal” pattern of network traffic. These sys-
tems then find anomalies by comparing incoming pack-
ets with the “normal” model. Anything that is considered
statistically deviant is classified as anomalous. This allows the systems to automatically detect new attacks, though risking possible misclassification of normal behavior (false positives). In addition to the potential for false
positives, anomaly-based systems also fall prey to
“mimicry attacks”, which attempt to evade the IDS by
imitating normal network traffic. One such attack is
known as a Polymorphic Blending Attack (PBA), in
which the attacker uses byte padding and substitution to
avoid detection [1]. Recent research has focused on in-
creasing the efficiency, robustness, and detection rates of
these systems while lowering their often high false-posi-
tive rates.
One of the first well-developed anomaly-based sys-
tems is NIDES [2], which builds a model of normal be-
havior by monitoring the four-tuple header of packets.
The four-tuple contains the source and destination IP
addresses and port numbers of packet headers [3]. An-
other system, proposed by Mahoney and Chan [4], consisted of two different programs, PHAD and ALAD.
Whereas PHAD monitors the data contained in the
header fields of individual packets, ALAD looks at dis-
tinct TCP connections consisting of multiple packets [3,
4]. To detect anomalies, PHAD and ALAD use port
numbers, TCP flags, and keywords found in the payload.
Yet another approach, known as NETAD [5], monitors the first 48 bytes of each IP packet. Using the information recovered from the packet’s header, NETAD creates a different model for each individual network protocol [6].
Two recently developed network anomaly-based in-
trusion detection systems are PAYL and McPAD [6,7].
Both used n-grams, sequences of n consecutive bytes in a packet’s payload, as features to represent packets.
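As an illustration, here is a minimal sketch of n-gram feature extraction from a raw payload; the function name and example payload are ours, not part of [6,7]:

    from collections import Counter

    def ngram_counts(payload: bytes, n: int) -> Counter:
        # Slide a window of n bytes over the payload and count each sequence.
        return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

    # A 1-gram model has at most 256 features; a 2-gram model has 256^2 = 65,536.
    features = ngram_counts(b"GET /index.html HTTP/1.1", 2)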
To perform anomaly detection, PAYL utilizes 1-grams.
This system first generates a histogram for normal traffic,
and then a new histogram for each packet’s payload. The
two histograms are compared using the simplified Ma-
halanobis distance. If the distance is above a certain
threshold, the new packet is flagged as anomalous [7].
Despite this approach’s effectiveness, it suffers from a
high false positive rate. To combat this, an extension of
PAYL was proposed to use n-grams, creating a more
precise detection model [8].
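The comparison step can be sketched as follows; this is our illustrative reading of the simplified Mahalanobis distance used in [7], and the smoothing constant alpha is an assumption:

    import numpy as np

    def simplified_mahalanobis(x, mean, std, alpha=0.001):
        # Sum of per-byte deviations from the normal model, each scaled by that
        # byte's standard deviation; alpha guards against division by zero.
        return float(np.sum(np.abs(x - mean) / (std + alpha)))

A packet whose distance to the normal model exceeds a tuned threshold would be flagged as anomalous.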
McPAD further develops the effectiveness of the n-
gram version of PAYL by using 2-nu-grams, sequences
of two bytes separated by a gap of size nu. The 2-gram
contains the correlation between two bytes, a feature that
1-gram lacks. By combining the 2-gram with nu, McPAD
is able to analyze structural information from higher n-
grams while keeping the number of features the same as
a 2-gram. By varying the value of nu, McPAD builds
multiple one-class support vector machine (SVM) classi-
fiers to detect anomalies as an ensemble. These classifi-
ers are first trained on anomaly-free data, then tested with
mixed normal and abnormal packets [6]. Using this ap-
proach, McPAD has successfully detected multiple virus
types while maintaining a low false positive rate.
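A hedged sketch of 2-nu-gram extraction (our own helper, not McPAD’s code) shows why the feature space stays at 256 × 256 for any gap size:

    from collections import Counter

    def two_nu_grams(payload: bytes, nu: int) -> Counter:
        # Count pairs of bytes separated by a gap of nu bytes;
        # nu = 0 reduces to an ordinary 2-gram.
        return Counter((payload[i], payload[i + nu + 1])
                       for i in range(len(payload) - nu - 1))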
1.2. Stream Data Mining
Although PAYL and McPAD have been able to achieve
desirable results, they are not designed to deal with gradual or abrupt changes in data flows. Also, because of the Internet’s high speed, systems such as PAYL and McPAD cannot efficiently store and evaluate the large
amount of traffic generated in real-time. To counter these
issues, we propose the application of data stream mining
techniques to anomaly detection.
Stream mining differs from the traditional batch set-
ting in a number of ways. First and foremost, because
data streams are of extremely large or even infinite size,
individual objects within the stream may only be analyzed once, not repeatedly as is possible in batch mode
[9]. The continuous nature of data streams also places
significant time constraints on stream mining solutions.
For a stream mining approach to be practical and effec-
tive, it must be able to process incoming information as
quickly as it arrives [10].
One of the salient features of any data stream mining
algorithm is the ability to detect fluctuations within a
continuous stream of data over an unknown length of
time. This dynamic tendency of streaming data is called
“concept drift” when a change occurs gradually, or “con-
cept shift” when it occurs more quickly [10]. To deal with
this characteristic of streaming data, many stream mining
algorithms employ a window of time intervals to tempo-
rarily hold the most recent data points in a stream [10,11].
The three types of windows typically implemented are
landmark window, sliding window and damped window
[11].
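For instance, a sliding window can be kept with a fixed-length buffer, while a damped window instead down-weights old points; the snippet below is only a schematic illustration, and the decay value is arbitrary:

    from collections import deque

    window = deque(maxlen=1000)  # sliding window: only the newest 1000 points survive

    def damped_weight(age: float, lam: float = 0.01) -> float:
        # Damped window: older points fade exponentially rather than being dropped.
        return 2.0 ** (-lam * age)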
2. Materials and Methods
2.1. Data
Two publicly available datasets were used to evaluate the
anomaly detection algorithms proposed in this study.
These were the DARPA’99 intrusion detection evaluation dataset (http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/1999data.html) and the attack dataset provided by [6].
The DARPA’99 dataset was used to provide a sam-
pling of normal network traffic. This dataset simulates
network communication from a fictitious United States
Air Force base [12], and provides both attack-free and
attack-containing network traces. Data samples for this
study were obtained from HTTP requests found in out-
side tcpdump data for each day from week one of the
DARPA’99 dataset. This first week of data is provided
for training purposes and contains no anomalous network
traffic. Using Jpcap, a free Java-based library for network packet manipulation (http://netresearch.ics.uci.edu/kfujii/Jpcap/doc/), the numeric character values for all HTTP packet payloads with lengths of at least 1400 characters were extracted for each day. The resulting dataset
provided a total of 5594 packets representing normal
network traffic divided by days as is shown in Table 1.
The anomalous data used in this study were compiled by [6], and are freely available online (http://roberto.perdisci.com/projects/mcpad). We chose to analyze the
algorithms’ performance in the detection of three out of
the four attack types provided: Generic HTTP Attacks,
Shell-code attacks, and CLET Shell-code attacks. The authors of [6] obtained 63 of the attacks included in their Generic
HTTP dataset from [13]. These attacks include a variety
of HTTP attacks collected in a live environment from
test web servers, as well as various archives and data-
bases. The attacks fall into several categories, including
buffer overflow, URL decoding error, and input valida-
tion error, and were directed against numerous web serv-
ers such as Microsoft IIS, Apache, Active Perl ISAPI,
CERN 3.0A, etc. [6] further supplements these attacks,
bolstering the dataset to include a total of 66 HTTP threats.

Table 1. Packets extracted from DARPA’99 week 1.

Day         Number of Packets
Monday      688
Tuesday     968
Wednesday   860
Thursday    2308
Friday      770
Total       5594

The Shell-code attack dataset includes 11 shell-code attacks (attacks with packets containing executable
code in their payload), which are also included in the
Generic HTTP attack dataset. Finally, the CLET attacks
were generated by [6] using the CLET polymorphic shell-
code engine [14]. This created 96 polymorphic shell-code
attacks containing ciphered data meant to evade pattern-
matching based detection techniques.
Following the same procedure used to process the
DARPA’99 week one data, the numeric character values
contained in all HTTP packet payloads from each of the
three attack datasets with lengths of at least 1400 char-
acters were extracted using Jpcap. This provided a total
of 843 attack packets, with varying numbers of packets
from each attack type as is detailed in Table 2.
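The extraction itself was performed with Jpcap in Java; the sketch below reproduces the same filtering logic in Python, using scapy only as a stand-in, with the port number and length cutoff mirroring the procedure described above:

    from scapy.all import rdpcap, TCP, Raw

    def long_http_payloads(pcap_path: str, min_len: int = 1400):
        # Yield HTTP (port 80) TCP payloads of at least min_len bytes.
        for pkt in rdpcap(pcap_path):
            if pkt.haslayer(TCP) and pkt.haslayer(Raw):
                if 80 in (pkt[TCP].sport, pkt[TCP].dport):
                    payload = bytes(pkt[Raw].load)
                    if len(payload) >= min_len:
                        yield payload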
The payload information extracted from the DARPA
and attack datasets was used to create training and testing
datasets for our anomaly detection systems. For each of
the five days in the DARPA dataset, 20% of the day’s
packets were extracted to be used for training, and the
remaining 80% of the day’s data were set aside to be
used for testing. To simulate the network traffic in real
time, anomalous packets were then sporadically inserted
into both the training and testing data after an initial in-
terval consisting of only normal traffic (50 packets for
training data and 200 packets for testing data). In this way,
different datasets were created with each attack type for
all five days of DARPA’99 week one (See Figure 1).
The total number of abnormal packets inserted into both
the training and testing data was no more than 10% of all
normal data for the given day with payload length 1400
characters or more. In some cases, as shown in Table 3
and Table 4, the number of abnormal packets inserted
into data was less than 10% because there was not
enough attack data available for that day. For the training
data, 20% of the abnormal data was mixed with 20% of
the normal data for each day. Likewise, 80% of the ab-
normal data was mixed with 80% of the normal data se-
lected from each day for testing. For each packet, 256 1-
gram and 65,536 2-gram features were extracted to pro-
duce separate representations of the training and testing
datasets detailed in Table 3 and Table 4. The datasets
were stored in the ARFF file format used by the open source machine learning software WEKA [15].
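A rough sketch of this stream construction is given below; the helper name, the random seed, and the insertion strategy are our assumptions, chosen only to mirror the split described above:

    import random

    def build_stream(normal, attacks, clean_prefix, seed=42):
        # Keep an initial all-normal interval, then sprinkle attack packets
        # sporadically through the remainder of the day's traffic.
        random.seed(seed)
        head, tail = list(normal[:clean_prefix]), list(normal[clean_prefix:])
        for pkt in attacks:
            tail.insert(random.randrange(len(tail) + 1), pkt)
        return head + tail

    # train = build_stream(day_normal[:cut], day_attacks[:a_cut], clean_prefix=50)
    # test  = build_stream(day_normal[cut:], day_attacks[a_cut:], clean_prefix=200)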
Table 2. Packets extracted from McPAD attack datasets.

Attack Type        Number of Packets
Generic HTTP       122
Shell-code         73
CLET Shell-code    648
Total              843
Figure 1. Testing and training data stream diagram.
Table 3. Training dataset.

              CLET    Generic  Shell-code
Mon  Norm.     138      138       138
     Abnorm.    14       14        14
     Total     152      152       152
Tue  Norm.     194      194       194
     Abnorm.    19       19        15
     Total     213      213       209
Wed  Norm.     172      172       172
     Abnorm.    17       17        15
     Total     189      189       187
Thu  Norm.     462      462       462
     Abnorm.    46       24        11
     Total     508      486       473
Fri  Norm.     154      154       154
     Abnorm.    15       15        15
     Total     169      169       169

Table 4. Testing dataset.

              CLET    Generic  Shell-code
Mon  Norm.     550      550       550
     Abnorm.    55       55        55
     Total     605      605       605
Tue  Norm.     774      774       774
     Abnorm.    78       78        58
     Total     852      852       832
Wed  Norm.     688      688       688
     Abnorm.    68       68        54
     Total     756      756       742
Thu  Norm.    1846     1846      1846
     Abnorm.   185       98        58
     Total    2031     1944      1904
Fri  Norm.     616      616       616
     Abnorm.    62       62        50
     Total     678      678       666
2.2. Clustering-Based Anomaly Detection
Clustering algorithms are commonly used for anomaly
detection, and are generally created for the batch envi-
ronment [16]. However, some batch clustering algo-
rithms, such as DBSCAN, can be modified to process
stream data.
2.2.1. DBSCAN
DBSCAN is a density-based clustering algorithm devel-
oped for the batch setting. The algorithm takes two user-defined parameters, epsilon (ε) and minimum points, and relies on the concepts of ε-neighborhood and core-objects. An ε-neighborhood is defined by DBSCAN as the set of points whose distance to a given point is no greater than the user-defined parameter ε. More specifically, given a point p and a dataset D, the ε-neighborhood of p is equal to:

N_ε(p) = {q ∈ D | dist(p, q) ≤ ε},  (1)

where dist(p, q) is the Euclidean distance between points p and q [17].
A core-object is defined as a set of points within an ε-neighborhood that contains more points than the minimum points parameter. If p is part of a core-object, DBSCAN will expand the cluster around p.
The basic structure of the algorithm is as follows:
1) DBSCAN takes the ε and minimum points parameters and then chooses a point p that has not been visited.
2) DBSCAN calculates N_ε(p). If the size of N_ε(p) is greater than minimum points, DBSCAN expands a cluster around p. Otherwise, the point is considered noise.
3) DBSCAN iterates to a new unvisited point and repeats the process [18].
Although DBSCAN was originally developed for a
batch environment, it has provided an inspiration for
stream clustering algorithms.
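For reference, the batch algorithm is available off the shelf; a minimal sketch with scikit-learn (the data and parameter values are arbitrary) is:

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.random.rand(500, 256)  # e.g., 1-gram feature vectors for 500 packets
    labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)
    # Points labeled -1 belong to no sufficiently dense neighborhood: noise.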
2.2.2. DenStream
DenStream is a stream clustering algorithm based on DBSCAN with a damped window model. It expands the concept of an ε-neighborhood in DBSCAN with a fading function to maintain up-to-date information about the data stream. The fading function is defined as:

f(t) = 2^(−λt),  (2)

where λ > 0 represents the decay factor and t represents the time.
DenStream also modifies the core-object concept of DBSCAN, creating a core-micro-cluster with three additional attributes: radius, center and weight. The radius must be less than or equal to ε, and the weight of a cluster must be greater than the user-defined parameter µ [11]. The weight w, center c and radius r of a core-micro-cluster are more formally defined at time t, for a set of close points p_1, ..., p_n with time-stamps T_1, ..., T_n, as:

w = Σ_{i=1..n} f(t − T_i),  (3)

c = (Σ_{i=1..n} f(t − T_i) p_i) / w,  (4)

r = (Σ_{i=1..n} f(t − T_i) dist(p_i, c)) / w,  (5)

where dist(p_i, c) is the Euclidean distance between the point p_i and the center c.
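A direct transcription of Equations (2)-(5) into code (our own helper; λ and the timestamps are supplied by the caller):

    import numpy as np

    def micro_cluster_stats(points, timestamps, t, lam):
        # f(t - T_i) = 2 ** (-lam * (t - T_i)), Equation (2)
        points = np.asarray(points, dtype=float)
        f = 2.0 ** (-lam * (t - np.asarray(timestamps, dtype=float)))
        w = f.sum()                                             # Equation (3)
        c = (f[:, None] * points).sum(axis=0) / w               # Equation (4)
        r = (f * np.linalg.norm(points - c, axis=1)).sum() / w  # Equation (5)
        return w, c, r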
Because DenStream operates in a stream environment, the core-micro-clusters need to change dynamically as time passes. To facilitate this, a potential core-micro-cluster or p-micro-cluster is introduced. P-micro-clusters are similar to core-micro-clusters, except that their center and radius values are based on the weighted sum and weighted squared sum of the points (CF1 and CF2). Also, the weight must be greater than or equal to βµ, where β (0 < β ≤ 1) defines the threshold between p-micro-clusters and outliers (described in the next paragraph). CF1 and CF2 are calculated using the formulas:

CF1 = Σ_{i=1..n} f(t − T_i) p_i,  (6)

CF2 = Σ_{i=1..n} f(t − T_i) p_i²,  (7)

This changes the center and radius values to be [11,19]:

c = CF1 / w,  (8)

r = sqrt(|CF2| / w − (|CF1| / w)²).  (9)
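The practical benefit of CF1 and CF2 is that a p-micro-cluster can absorb points and decay without storing the points themselves; the class below is a minimal sketch under these definitions (the design is ours):

    import numpy as np

    class PMicroCluster:
        def __init__(self, dim, lam):
            self.lam, self.w = lam, 0.0
            self.cf1 = np.zeros(dim)  # weighted linear sum, Equation (6)
            self.cf2 = np.zeros(dim)  # weighted squared sum, Equation (7)

        def decay(self, dt):
            # The fading function scales w, CF1 and CF2 uniformly.
            factor = 2.0 ** (-self.lam * dt)
            self.w *= factor; self.cf1 *= factor; self.cf2 *= factor

        def insert(self, p):
            p = np.asarray(p, dtype=float)
            self.w += 1.0; self.cf1 += p; self.cf2 += p * p

        def center(self):
            return self.cf1 / self.w  # Equation (8)

        def radius(self):
            # Equation (9), using vector norms for |CF1| and |CF2|
            a = np.linalg.norm(self.cf2) / self.w
            b = np.linalg.norm(self.cf1) / self.w
            return np.sqrt(abs(a - b * b))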
Although the p-micro-cluster permits the model to be updated dynamically, it generally will not provide a representative view of a data stream as new points appear. To handle this concept drift, DenStream also introduces the outlier-micro-cluster (or o-micro-cluster) and an outlier-buffer that temporarily stores o-micro-clusters and allows them to become p-micro-clusters. The operation of DenStream is as follows:
Initial Step: run DBSCAN on a set of initial points to generate starting p-micro-clusters.
Online Steps, when a new point p arrives in the stream:
1) The algorithm attempts to merge p with the closest p-micro-cluster. If the radius of the resulting micro-cluster is less than or equal to the value of ε, the point is merged.
2) If the point is not merged into a p-micro-cluster, the algorithm tries to merge p with an existing o-micro-cluster. If the radius is less than ε, it is merged with the o-micro-cluster. Then, if the o-micro-cluster now has a weight large enough to become its own p-micro-cluster, it is removed from the outlier-buffer and added to the model as a p-micro-cluster.
3) If the point cannot be merged into an existing o-micro-cluster, a new o-micro-cluster is created and placed in the outlier-buffer.
After the merging phase of the DenStream algorithm, the lower weight limit is calculated for all o-micro-clusters in the outlier buffer. This is done using the formula:

ξ(t_c, t_0) = (2^(−λ(t_c − t_0 + T_p)) − 1) / (2^(−λT_p) − 1),  (10)

where λ > 0 represents the decay factor, t_c and t_0 represent the current and starting times for the o-micro-cluster, and T_p is the predetermined time-period.
4) If the weight of a particular cluster is less than the lower weight limit, the o-micro-cluster can be removed from the outlier buffer.
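Equation (10) translates directly into code; a one-line sketch (argument names ours):

    def lower_weight_limit(t_c, t_0, lam, T_p):
        # Minimum weight an o-micro-cluster of this age should have reached;
        # clusters below it are likely true outliers and can be dropped.
        return (2.0 ** (-lam * (t_c - t_0 + T_p)) - 1.0) / (2.0 ** (-lam * T_p) - 1.0)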
2.2.3. Our DenStream-Based Detection System
To detect anomalous packets, DenStream was modified to create the DenStreamDetection algorithm, which treats incoming packets as points to be clustered. When a packet is merged with a p-micro-cluster, it is classified as normal. Otherwise, it is sent to the outlier-buffer and classified as anomalous. The ability of o-micro-clusters to be promoted to p-micro-clusters was removed because the majority of the packets clustered to the outlier-buffer are abnormal packets. If one of these o-micro-clusters became a p-micro-cluster, the model would be tainted and therefore unable to differentiate between abnormal and normal packets.
The basic structure of DenStreamDetection is shown in Algorithm 1.

Algorithm 1. DenStreamDetection algorithm.

Algorithm: DenStreamDetection (iniP, minP, ε)
Parameters: iniP: number of initial packets
            minP: number of minimum packets in initial p-micro-clusters
            ε: distance threshold
Input: File containing normal and abnormal packets in n-gram form
Output: Predictions of normal and abnormal packets
1. Initialize DenStream using iniP and minP to build p-micro-clusters.
2. As each packet p comes in:
3.   Try to merge p into its nearest p-micro-cluster cp.
4.   if radius(cp) ≤ ε then
5.     Merge p into cp;
6.     Classify p as normal packet;
7.   else
8.     Try to merge p into nearest o-micro-cluster c0;
9.     if radius(c0) ≤ ε then
10.      Merge p into c0;
11.      Classify p as abnormal packet;
12.    else
13.      Create a new o-micro-cluster and insert it into the outlier-buffer;
14.      Classify p as abnormal packet;
15.    end if
16. end if
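The classification loop of Algorithm 1 might look as follows in Python; nearest(), radius_if_merged() and new_o_micro_cluster() are hypothetical helpers standing in for the micro-cluster bookkeeping:

    def denstream_detection(packets, p_clusters, o_clusters, eps):
        predictions = []
        for p in packets:
            cp = nearest(p_clusters, p)
            if cp is not None and cp.radius_if_merged(p) <= eps:
                cp.insert(p)                       # steps 4-6: normal
                predictions.append("normal")
                continue
            co = nearest(o_clusters, p)
            if co is not None and co.radius_if_merged(p) <= eps:
                co.insert(p)                       # steps 9-11; promotion disabled
            else:
                o_clusters.append(new_o_micro_cluster(p))  # steps 12-14
            predictions.append("abnormal")
        return predictions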
2.2.4. Creation of the Detection Model
The anomaly detection model was created in two steps. The first step used the training data to find a range for the parameters in DenStreamDetection such as ε and minimum points. Using 50 initial points, multiple DenStreamDetection models for each day were created to find a range of optimal parameters that could be used in the testing step. We found that ε had a larger impact on the predictions than the minimum points. During the first step, different parameter ranges were identified based on day and abnormal packet type.
The second step used the testing data to make a prediction model, which was evaluated with the sensitivity and false positive rates defined in Section 2.4. A false positive is a normal packet classified as abnormal. The parameters used in this step were 200 initial packets, 10 minimum points and a range of ε values specific to each day and attack type determined from the first step. Using these parameters, models were generated with a range of false positive and sensitivity rates to demonstrate overall performance.
2.3. Histogram-Based Anomaly Detection
Another approach to the detection of anomalous network packets has involved the use of histograms to maintain statistical information about network packet payloads. PAYL is an example of such a system [7], in which a model is created for known normal packet payloads and then compared with incoming packet payloads to determine whether or not the newly arriving packets are anomalous. Due to the evolutionary nature of streaming data, it is important that any abnormal packet detection method is able to update its normal model as concept drift occurs in the incoming data stream. With this in mind, we present a histogram-based classification method capable of modeling dynamic network traffic in real time.
2.3.1. Algorithm Description
The histogram-based detection algorithm provides a simple method for classification of network traffic. The algorithm, summarized in Algorithm 2, creates a histogram encompassing a “normal” model of the network packets expected to be encountered. This histogram is generated by counting the frequency of n-gram features found within packet payloads. To begin classification of a stream of packets, the algorithm first requires x initial normal packets to construct the normal model histogram.
This histogram contains frequency counts from all initial
packets for each possible n-gram attribute. Since we are
attempting to model normal traffic, it is imperative that
no abnormal packets are included when this model is
created or else the model will be contaminated and de-
tection rates will decrease. To effectively reflect the evo-
lutionary nature of network traffic, the same fading func-
tion with decay factor λ used in DenStream is applied to
the histogram after each new packet is processed. This
helps to reduce the impact of outdated stream data. After
the initial histogram has been built, the algorithm can
begin to classify the subsequent packets.
In order to classify an incoming network packet, the algorithm builds a histogram from the newly arrived packet’s payload. The histogram generated from the new packet is then compared with the normal model histogram (to which the fading function has been applied as each new packet comes in) by computing the Pearson correlation value between the two histograms. If the computed Pearson correlation value is above a user-defined threshold t, the packet is classified as normal; otherwise, the packet is classified as abnormal.
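The decision rule reduces to a single correlation test; a sketch using SciPy (the threshold value is supplied by the user):

    from scipy.stats import pearsonr

    def classify_packet(packet_hist, normal_hist, t):
        # Correlate the packet's n-gram histogram with the faded normal model.
        r, _ = pearsonr(packet_hist, normal_hist)
        return "normal" if r > t else "abnormal"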
To account for the possibility of concept drift and shift
occurring in data flows, the normal histogram model may
need to be rebuilt using packets that have arrived since the initialization of the normal histogram model. This
allows the normal model to stay current, modeling pack-
ets most recently classified as normal. In order to facili-
tate this rebuilding process, the algorithm maintains two
queues of user-defined size containing information from
previously processed normal packets. One of these
queues, of size q, stores the histogram data for the pre-
vious packets, while the other, with size w, stores Pear-
son correlation values computed between the packets and
the normal histogram model. Note that only data for
packets classified as normal are included in these two
queues; any packets classified as abnormal are not taken
into account. If the normal histogram is to be rebuilt, a
set of user-specified conditions must be met, giving the
user control of the rebuilding process. When the model is
rebuilt too often, the algorithm’s efficiency will decrease
significantly; however, if it is not rebuilt enough, accu-
racy will diminish. To determine when rebuilding the
normal model is necessary, the algorithm calculates the
number of Pearson correlation values in the stored queue
that are below the user-defined threshold h. If this count reaches a certain value r, the normal model is rebuilt using packets stored in the histogram data queue, and the queue containing previous Pearson correlation values is emptied.
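A compact sketch of this rebuild rule (the data structures and names are ours; the two queues are those described above):

    import numpy as np
    from collections import deque

    def maybe_rebuild(normal_hist, corr_log: deque, hist_queue: deque, h, r):
        # Rebuild from the stored packet histograms once at least r of the
        # logged Pearson values fall below the threshold h.
        if sum(1 for v in corr_log if v < h) >= r:
            normal_hist = np.sum(np.asarray(hist_queue, dtype=float), axis=0)
            corr_log.clear()
        return normal_hist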
2.3.2. Critical Parameters
Though the histogram-based algorithm requires several parameters, it is important to note that these are not equally important. Rather, two parameters in particular have the greatest effect on the algorithm’s ability to detect anomalous packets.
The first of these most critical parameters is q, the size of the queue of previously processed instances used to rebuild the normal histogram model. If this value is too small, the normal histogram model generated when the model is rebuilt will not take into account a sufficient number of previously processed packets. This results in an insufficiently robust model, causing both undesirable sensitivity values and false positive rates.
While q has a noticeable influence on the effectiveness of the histogram detection algorithm, t, which defines the Pearson correlation threshold between instances classified as normal and abnormal, is undoubtedly the most important parameter. This is understandable, as t directly controls the classification of each individual instance as it is processed by the algorithm. Furthermore, t also plays a role in controlling the rebuilding of the normal histogram model. Because parameter h specifies an interval above t, t is directly related to the frequency at which the normal model histogram is rebuilt. Thus, the value of t is closely connected to the core functionality of the histogram-based detection algorithm.
Algorithm 2. Histogram-based detection algorithm.

Algorithm: HistogramDetection (x, w, q, t, r, h, λ)
Parameters: x: number of initial normal packets
            w: size of Pearson correlation queue
            q: size of rebuild queue
            t: classification threshold
            r: rebuild count
            h: rebuild threshold
            λ: decay factor
Input: File containing normal and abnormal packets in n-gram form
Output: Predictions of normal and abnormal packets
1. Initialize histogram g using x initial packets
2. Initialize rebuild queue b of size q
3. Initialize Pearson correlation log l of size w
4. As each new packet p arrives:
5.   decay(g)
6.   Create histogram c from packet p
7.   if pearsonCorrelation(g, c) > t then
8.     Classify p as normal
9.     b.addFirst(p)
10.    if b.size() > q then
11.      b.removeLast()
12.    end if
13.    l.addFirst(pearsonCorrelation(g, c))
14.    if l.size() > w then
15.      l.removeLast()
16.    end if
17.    Calculate the number n of values in l below h
18.    if n ≥ r then
19.      rebuild(g)
20.      l.clear()
21.    end if
22. else
23.   Classify p as abnormal
24. end if
2.4. Performance Metrics
To evaluate the performance of the anomaly detection models, the sensitivity and false positive rates were calculated using the following formulas:

sensitivity = TP / (TP + FN),  (11)

false positive rate = FP / (TN + FP),  (12)
where TP/FN stand for the number of correctly/incorrectly classified abnormal packets, and TN/FP are the number of correctly/incorrectly classified normal packets. Sensitivity measured how well the model detected abnormal packets, and the false positive rate indicated the percentage of false alarms generated.
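In code, Equations (11) and (12) are one-liners:

    def sensitivity(tp: int, fn: int) -> float:
        # Equation (11): fraction of abnormal packets that were caught.
        return tp / (tp + fn)

    def false_positive_rate(fp: int, tn: int) -> float:
        # Equation (12): fraction of normal packets raised as false alarms.
        return fp / (tn + fp)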
3. Results and Discussion
3.1. Density-Based Detection Results
After tuning the DenStreamDetection-based system on packets using 2-gram features, we discovered a range of ε values for each day that could be used to evaluate the model. For every day except Tuesday, the false positive rate was kept between 0% and 10% so that an appropriate detection rate could be found. Tuesday, however, needed the false positive limit to be heavily relaxed in order to achieve a moderate sensitivity. When testing the detection system, the false positive and sensitivity rates for the highest, middle and lowest ε values were generated for both 1-gram and 2-gram feature representations. These are displayed in Table 5. The results with the highest sensitivity for each virus type were then averaged to find the best overall performance.
In general, the DenStreamDetection-based system was able to correctly detect most Shell-code attacks, achieving on average 91% sensitivity with a 14% false positive rate. Similarly, Generic HTTP attacks produced 78% average sensitivity and a 13% false positive rate. CLET attacks, however, had a similar false positive rate of 14%, but a substantially lower average sensitivity of 65%. This disparity was likely due to the polymorphic nature of CLET attacks, which are designed to mimic normal network traffic.
Table 5. DenStreamDetection system results, 2-gram (1-gram).

            CLET              Generic HTTP      Shell-code
Day   ε     FP      Sens      FP      Sens      FP      Sens
Mon   30    7(20)   75(78)    7(20)   78(95)    7(20)   98(100)
      45    7(9)    67(67)    7(10)   76(76)    7(10)   96(95)
      60    6(6)    49(49)    6(7)    73(70)    6(7)    93(93)
Tue   65    35(35)  62(62)    33(35)  74(74)    35(35)  86(86)
      80    34(33)  56(56)    33(34)  72(72)    34(34)  81(81)
      100   33(33)  51(50)    32(33)  69(69)    33(33)  76(79)
Wed   95    11(11)  38(37)    11(11)  63(62)    11(11)  81(81)
      110   9(10)   29(29)    10(10)  60(60)    10(10)  78(78)
      125   8(8)    25(25)    8(8)    54(56)    8(8)    70(72)
Thu   5     6(4)    96(98)    6(3)    99(100)   6(3)    98(100)
      30    3(2)    84(84)    4(1)    90(90)    4(1)    93(93)
      55    0(2)    76(78)    0(1)    81(81)    0(1)    81(82)
Fri   55    9(9)    56(58)    9(9)    76(77)    9(9)    92(94)
      65    8(9)    53(53)    8(9)    74(74)    8(9)    92(92)
      75    8(8)    50(49)    8(8)    71(71)    8(8)    88(88)
Best Avg.   14(16)  65(67)    13(16)  78(82)    14(16)  91(92)
Thursday exhibited the highest detection rates (up to 99%) while keeping the false positive rates below 6%. Also, Thursday experienced both the lowest ε values and the largest ε range to achieve its results.
The models utilizing both 1-gram and 2-gram features produced similar results. Using the 1-gram representation for the same ε values, the system experienced slightly better detection rates at the expense of higher false positive rates. Also, because the 1-gram representation has a much smaller feature space than 2-gram, the total run-time of 1-gram was significantly less.
3.2. Histogram-Based Detection Results
3.2.1. Optimal Parameters
The histogram-based algorithm was applied to anomalous packet detection in two steps: training and testing. In the training step, favorable parameters for the algorithm were approximated by performing several experiments on the training data. Since the training data in-
cluded 50 initial normal packets, this value was used for
x during the training step. Most critically, appropriate
values of t were ascertained for the different attack types
on each day, as this parameter has the greatest effect on
the performance of the algorithm. Suitable values were
also obtained for all other parameters during the training
phase. The optimum value of q was found to be 200, as
this allowed the algorithm to maintain a fairly accurate
model of normal traffic while minimizing the time needed
for this model to be rebuilt. Also, 30 was generally used for w, with r valued at 10 and h at 0.2. These parameters effectively limited the frequency of rebuilding the normal histogram model while still allowing the algorithm to handle concept drift in the data. A λ value of 0.01 was found to work sufficiently well, as this decay factor helped to maintain an up-to-date model of normal traffic.
Once appropriate parameters were identified, the testing phase began. In this step, the value 200 was assigned to x since the testing data contained 200 initial normal packets. With the rest of the algorithm’s parameters remaining static, the algorithm was tested using varying values of t in order to gauge sensitivity values at different false positive rates. This produced the results summarized in Table 6, which displays a sampling of t values used for different days and attack types with the resulting sensitivities and false positive rates.
3.2.2. Results Achieved
As can be seen from Table 6, we were able to attain fairly consistent results across all days’ testing data for each attack type using optimal parameters. The algorithm performed best at detecting Shell-code attacks, achieving an average of 97% sensitivity and a 1% false positive rate across all Shell-code attack testing datasets. Performance was slightly less desirable in the detection of Generic HTTP attacks, but was nevertheless acceptable, with an average detection rate of 84% and a 3% false positive rate.
Table 6. Histogram-based detection system results (2-gram).

            CLET                  Generic HTTP          Shell-code
Day   t        FP   Sens     t        FP   Sens     t        FP   Sens
Mon   0.0025    0    25      0.0025    0    76      0.0005    0    58
      0.0060    7    31      0.0070    5    76      0.0008    1    70
      0.0080    8    60      0.0090    9    76      0.0500    3    96
Tue   0.00005   1    27      0.0002    1    27      0.00001   1    31
      0.0025    7    37      0.0010    2    71      0.0010    2    79
      0.0035    9    40      0.0015    2    83      0.0015    2    97
Wed   0.0020    1    29      0.0004    0    38      0.0005    0    43
      0.0040    2    35      0.0010    0    66      0.0010    0    78
      0.0045    3    44      0.0030    1    81      0.0020    1    96
Thu   0.0050    0    69      0.0005    0    78      0.0005    0    97
      0.1505    3    98      0.0010    0    88      0.0755    2    98
      0.4000    3    100     0.0855    2    100     0.1000    3    100
Fri   0.0005    0    26      0.0005    0    39      0.0010    0    60
      0.0020    2    27      0.0010    0    61      0.0015    1    82
      0.0065    13   63      0.0015    1    79      0.0020    1    96
Best Avg.       7    61               3    84                1    97
Be
s prov mst ifficult fothe histogm
hm to detect. In order foreaonable etection
tes to be ob, fpive rates generally hd t
be pushed much higher than necessary for the other two
attack types, to an average of 7% with optimal parame-
ters. Despite this fact, sensitivity remained comparatively
low, at an average of 61%. This difficulty detecting
CLET attacks was likely due to their polymorphic nature,
which also proved troublesome to the DenStream Detec-
tion system.
While results were relatively consistent across each day’s worth of training data, those achieved using Thurs-
day’s testing data were markedly superior. Using optimal
parameters, the histogram-based detection algorithm was
able to achieve perfect detection on all three types of
attacks, each with a false positive rate of 3% or less.
There are several possible reasons for this exceptional
performance related to the nature of Thursday’s testing
data as discussed in Section 3.3. Overall, the relatively
high t values used indicate greater consistency in Thurs-
day’s network traffic. Due to these elevated t values, the
algorithm was able to more effectively identify abnormal
network packets.
The number of parameters required by the histogram-
based detection algorithm necessitated fairly specific
tuning for our different datasets in order to perform op-
timally. This was demonstrated by the algorithm’s per-
formance on 1-gram features when the same parameters
used for 2-gram were applied. While the clustering-based
algorithm was able to achieve comparable results from
both 1-gram and 2-gram with the same parameters, the
histogram-based algorithm performed poorly on 1-gram
when applied with the same parameters as 2-gram. As a
result, 1-gram results have not been reported for the his-
togram-based algorithm.
3.3. Concept Shift
The performance of our anomaly detection systems is heavily influenced by the evolutionary nature of network traffic. To demonstrate this challenge, we calculated the Pearson correlation between segments of ten packets for each day. By measuring the correlation between two consecutive segments, changes in the stream can be visualized (Figure 2), which offers a possible explanation of the results observed in Tables 5 and 6.
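A sketch of this measurement (segment size fixed at ten packets, as above; the helper name is ours):

    import numpy as np
    from scipy.stats import pearsonr

    def segment_correlations(histograms, seg=10):
        # Sum packet histograms over consecutive ten-packet segments, then
        # correlate each segment with the next; dips mark drift or shift.
        sums = [np.sum(histograms[i:i + seg], axis=0)
                for i in range(0, len(histograms) - seg + 1, seg)]
        return [pearsonr(a, b)[0] for a, b in zip(sums, sums[1:])]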
Monday, Wednesday, and Friday exhibit continuous
concept shift, particularly during the training phase, as
the calculated Pearson correlation values oscillate regu-
larly. Therefore, a robust model for normal traffic was
built in each case that responded accurately to the evolv-
ing data stream. As a result, greater sensitivity was achieved on each of these three days.
Figure 2. Pearson correlation curves for DARPA’99 week 1.
Tuesday demonstrates a very distinct pattern, as it starts with slight shift but remains relatively stable throughout the stream. In contrast to Monday, Tuesday experiences only two major shifts, both of which occur during the training phase. Because the shift was confined to the training phase, Tuesday’s models had a less accurate view of the changing stream environment, which may explain the decreased performance noted previously.
The high sensitivity values for Thursday can also be explained through concept shift. During the training phase, consistent concept shift occurred, allowing Thursday’s model to effectively capture the changing pattern of normal network traffic. Following the training step, the remainder of Thursday’s data was relatively stable. Therefore, Thursday likely experienced consistent concept drift, which led to superior detection results.
4. Conclusions
In this paper, two data stream mining techniques were applied to the problem of anomaly detection. First, a stream clustering algorithm was used to detect abnormal packets. Using 1-gram and 2-gram features, this approach achieved moderate success with Generic HTTP and Shell-code attacks but had a higher average false positive rate. Second, a stream adaptation of the relative frequency histogram approach found in [7] was created using Pearson correlation to detect anomalies.
Though the histogram-based approach achieved moderately better results, it required more fine-tuning because of the number of parameters used. In contrast,
generalization of the clustering algorithm was easier to achieve since it uses fewer parameters. This was evidenced by the ability of the clustering algorithm to perform effectively on both 1-gram and 2-gram features with the same parameters, while the histogram algorithm required specific parameter tuning for each feature type.
Lastly, to better explain the performance differences between certain days, we analyzed the Pearson correlation between consecutive segments of 10 packets. By plotting these values on a graph, concept drift and shift were visualized, and clear variations were observed between days. The location and frequency of concept shift and drift in the data streams, especially within the training phase, provided an account for the observed changes in performance.

5. Acknowledgements
We would like to thank the Summer Research Institute at Houghton College for providing funding for our research.

6. References
[1] R. Perdisci, G. Gu and W. Lee, “Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems,” ICDM’06: Proceedings of the Sixth International Conference on Data Mining, Hong Kong, 18-22 December 2006, pp. 488-498. doi:10.1109/ICDM.2006.165
[2] D. Anderson, T. Lunt, H. Javits and A. Tamaru, “NIDES: Detecting Unusual Program Behavior Using the Statistical Component of the Next Generation Intrusion Detection Expert System,” Technical Report SRI-CSL-95-06, Computer Science Laboratory, SRI International, Menlo Park, May 1995.
[3] R. Perdisci, “Statistical Pattern Recognition Techniques for Intrusion Detection in Computer Networks, Challenges and Solutions,” University of Cagliari, Italy, 2006.
[4] M. Mahoney and P. Chan, “Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks,” ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, July 2002, pp. 376-385.
[5] M. Mahoney, “Network Traffic Anomaly Detection Based on Packet Bytes,” ACM-SAC, Melbourne, 2003, pp. 346-350.
[6] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto and W. Lee, “McPAD: A Multiple Classifier System for Accurate Payload-Based Anomaly Detection,” Computer Networks, Special Issue on Traffic Classification and Its Applications to Modern Networks, Vol. 53, No. 6, 2009, pp. 864-881.
[7] K. Wang and S. Stolfo, “Anomalous Payload-Based Network Intrusion Detection,” Recent Advances in Intrusion Detection (RAID), Vol. 3224, 2004, pp. 203-222. doi:10.1007/978-3-540-30143-1_11
[8] K. Wang, “Network Payload-Based Anomaly Detection and Content-Based Alert Correlation,” Columbia University, New York, 2006.
[9] J. Tang, “An Algorithm for Streaming Clustering,” MSc. Thesis, Uppsala University, Uppsala, 2011.
[10] A. Bifet, G. Holmes, R. Kirkby and B. Pfahringer, “MOA: Massive Online Analysis,” Journal of Machine Learning Research, Vol. 11, 2010, pp. 1601-1604.
[11] F. Cao, M. Ester, W. Qian and A. Zhou, “Density-Based Clustering over an Evolving Data Stream with Noise,” SIAM Conference on Data Mining, Bethesda, 2006.
[12] R. Lippmann, J. Haines, D. Fried, J. Korba and K. Das, “The 1999 DARPA Off-Line Intrusion Detection Evaluation,” Computer Networks, Vol. 34, No. 4, 2000, pp. 579-595. doi:10.1016/S1389-1286(00)00139-0
[13] K. L. Ingham and H. Inoue, “Comparing Anomaly Detection Techniques for HTTP,” Recent Advances in Intrusion Detection (RAID), 2007.
[14] T. Detristan, T. Ulenspiegel, Y. Malcom and M. Underduk, “Polymorphic Shellcode Engine Using Spectrum Analysis,” Phrack, Vol. 11, No. 61, 2003.
[15] I. H. Witten and E. Frank, “Data Mining: Practical Machine Learning Tools and Techniques,” 2nd Edition, Morgan Kaufmann Publishers, Waltham, 2005.
[16] L. Portnoy, E. Eskin and S. Stolfo, “Intrusion Detection with Unlabeled Data Using Clustering,” Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), Philadelphia, 2001, pp. 333-342.
[17] M. Ester, H. Kriegel, J. Sander and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” International Conference on Knowledge Discovery in Databases and Data Mining (KDD-96), Portland, August 1996, pp. 226-231.
[18] K. Mumtaz and K. Duraiswamy, “An Analysis on Density Based Clustering of Multi Dimensional Spatial Data,” Indian Journal of Computer Science and Engineering, Vol. 1, No. 1, 2010, pp. 8-12.
[19] A. Forestiero, C. Pizzuti and G. Spezzano, “FlockStream: A Bio-Inspired Algorithm for Clustering Evolving Data Streams,” ICTAI’09: Proceedings of the 2009 21st IEEE International Conference on Tools with Artificial Intelligence, Washington DC, 2009, pp. 1-8.