Due to the ever growing number of cyber attacks, especially of the online systems, development and operation of adaptive Intrusion Detection Systems (IDSs) is badly needed so as to protect these systems. It remains as a goal of paramount importance to achieve and a serious challenge to address. Different selection methods have been developed and implemented in Genetic Algorithms (GAs) to enhance the rate of detection of the IDSs. In this respect, the present study employed the eXtended Classifier System (XCS) for detection of intrusions by matching the incoming environmental message (packet) with a classifiers pool to determine whether the incoming message is a normal request or an intrusion. Fuzzy Clustering by Local Approximation Membership (FLAME) represents the new selection method used in GAs. In this study, Genetic Algorithm with FLAME selection (FGA) was used as a production engine for the XCS. For comparison purposes, different selection methods were compared with FLAME selection and all experiments and evaluations were performed by using the KDD’99 dataset.
Internet users increased greatly during the last decade. This increase gives the attackers motivation for initiating harm actions and making attacks. The Intrusion Detection Systems (IDSs), however, act as a defense line against the attack attempts. The weakness analysis tools, as vulnerability assessment tools, should be carefully evaluated and frequently developed to establish a solution package as is the case in other computing security applications. Such tools are concerned with keeping systems confidential, available, reliable, integral, authenticated, and impregnable against any attempts of attack. Thus, intrusion detection (ID) can be defined as “the process of monitoring the events occurring in a computer system or network and noticeably different from normal system activities and thus detectable” [
There is a necessity for developing adaptive methods and techniques that work well entire systems to identify attacks trials. Currently, different optimizing algorithms, artificial intelligence (AI), and hybrid algorithms are employed to improve the intrusion detection possibilities. For example, Genetic Algorithms [
In FLAME-ID, the XCS applied a modification on the classifier generator, GA, by applying the Fuzzy Clustering by Local Approximation Membership (FLAME) as a new selection method; FGA selection. FGA differs than previous selection methods that have been implemented in [
In sum, this study applied a new approach, that is, FGA, as a classifier generator in the eXtended Classifier System (XCS) for ID. Afterwards, the study assessed ability of FALME selection to enhance the attack detection rate (DR) relative to the existing and commonly-used selection methods.
The remainder of this paper is organized as follows. Section 2 reviews related previous works. Section 3 describes the dataset that is commonly used in training the IDS and testing its performance. The research background is presented in Section 4. Thereafter, Section 5 discusses the proposed method for improving ID and Section 6 discusses evaluation of performance of the proposed method. Lastly, the experimental results are given in Section 7.
Detecting intrusion is a tempting area to conduct a set of researches since 1982 [
Researchers in [
In matter of feature filtration, Alsharafat in [
A method for attack sequence detection in the Cloud environment using Hidden Markov Model was developed by researchers in [
In [
Danane and Parvat in [
The KDD’99 dataset is a benchmark dataset that is frequently used in evaluating different IDSs like in [
・ Basic Features
These features are derived from network packet headers, regardless of the payload.
・ Content Features
These features are used to assess the payload.
・ Time-based Traffic Features
Attack Type | Number of Records | % of the KDD’99 Dataset |
---|---|---|
Normal | 97,227 | 19.69 |
DoS | 391,458 | 79.24 |
Probe | 4107 | 0.83 |
R2L | 1126 | 0.23 |
U2R | 52 | 0.01 |
These are features that are used to capture second temporal window properties.
・ Host-based Traffic Features
These are features that utilize a historical window estimated over the number of connections.
Both of the time-based and host-based traffics are considered as traffic features in the KDD’99 dataset.
Likewise, attacks fall into four main categories [
・ DoS attacks
These attacks focus on making all resources of the attacked system too busy and unable to accept legal request of resources.
・ R2L attacks
This sort of attacks focuses on exploiting the vulnerabilities of the attacked system by unauthorized access from a remote machine.
・ U2R attacks
These attacks entail access to local user accounts and their related privileges using the weakness traps of the attacked system.
・ Probe attacks
This sort of attacks concentrates on monitoring networks and their related information to discover vulnerabilities and related trapdoors.
In the present study, the KDD’99 dataset was partitioned into two sub-sets: a training sub-set and a testing sub-set. The training sub-set consists of 10% of five million records that were distributed as shown in
The Learning Classifier System (LCS) is classified as a machine-learning paradigm that uses simple string rules, i.e., classifiers, to guide its performance in unknown environments. The LCSs have gained interest of researchers since 1995 [
The LCSs are often utilized to raise the DRs in the unknown environments. They employ two operations; evolution and learning, and the different problems require different learning styles to identify useful classifiers and find better rules. Learning is concerned with the reply from the environment so as to decide on whether or not the chosen rule leads to correct detection. Therefore, any learning process achieved by taking feedback from environment in positive or negative way. In both cases, new rules must be created and evaluated to optimize DR. This step describe as an evolutionary process is realized via evolutionary algorithms as the GA [
The LCSs are of three types [
・ Zeroth level classifier systems (ZCSs);
・ Anticipatory learning classifier systems (ALCSs);
・ eXtended Classifier Systems (XCSs).
The XCS is a rule-based classifier system that was introduced by Wilson in 1995 [
・ The condition part that can be represented using real code, integer code, or binary code, as is the case in the current study, for example, {0, 1, #}, where # denotes a non-significant value.
・ The action part, which represents the result of the chosen rule to be applied in an environment. In general, the action can be either an attack or a normal action.
The XCS is based on two elements: reinforcement learning (RL) and GA. Reinforcement learning is concerned with gaining feedback from an environment which can be a reward or a penalty. Thus, the RL is designed to determine how the rule, that is, the classifier, will be useful in other situations [
・ Detector
The detector receives input from the environment that represents a network traffic record and converts it into a binary code.
・ Match set [M]
The match set [M] is a pool of classifiers that are matched with a condition part of the incoming message from the detector.
・ Prediction Array
This array is produced from [M] according to the average fitness-weighted prediction rules in [A].
・ Action set [A]
This set represents a pool of classifiers derived from [M] that support the action which will be chosen.
・ Effectors
The effectors are an output unit that will fire the action of the classifier selected from [A] to the environment where the result can be either a normal action or an attack.
Functioning of the XCS can be summarized in the following steps [
Step 1: Environment input is received via a detector that is concerned with the packet coming from the network.
Step 2: The received input is matched with a set of the existing system population (Pop) classifiers. The matching process is involved in matching the received input with the Pop. So, the matched populations will be placed in a separate set called the matching set [M]. The contents of [M] represent the entire knowledge of the XCS that is concerned with deciding what to do with the incoming input.
Step 3: A prediction array is generated. This array is used to select the appropriate action to be executed. The predictive ability of the action of every classifier in [M] is evaluated by taking a weighted average of the predictions of all the classifiers in [M], where the weight of every value of classifier represents the fitness, F.
Step 4: Producing an action set [A]. Next, a subset of [M] which supports the selected action is produced. This set is called the action set [A].
Step 5: Action execution and action set updating. The most appropriate classifier picked up from [A] will be fired to the environment by an effector. Then, feedback will be received from the environment in the form of reward if the taken action is correct. Otherwise, the system will get penalty. In each cycle, the XCS updates only the classifiers in [A] based on the feedback received, whether reward or penalty. Therefore, every classifier maintains a prediction about the feedback according to its response. Afterwards, the predictions are adjusted accordingly.
The related previous studies provided different AI and hybrid techniques for developing IDSs. Every technique has its own pros and cons. Hence, these techniques have different detection powers in terms of the DR, FAR, or Accuracy measures of performance. The IDS proposed in the current study combines machine learning (ML), represented by the XCS, and an evolutionary algorithm, represented by the FLAME_GA (FGA), which applies a new selection method called FLAME selection. In the traditional GAs, a set of operations is implemented in every generation. These operations are selection, crossover, mutation, and replacement. The method of FLAME selection is a new selection method that was adopted in the present study to provide an enhancement for the GA so as to improve the DR of the proposed IDS.
Fuzzy clustering by Local Approximation of Memberships (FLAME) determines clusters according to fitness of the classifier. This selection method depends on the neighborhood relationships between parents that were applied to the neighboring memberships in fuzzy membership of populations [
1) Extracting the information structure. This is usually realized by
a) Creating a graph to connect each parent to the best neighbor according to the k-Nearest Neighbors (KNN) algorithm.
b) Calculating the density for each parent according to its closeness to its k nearest neighbor.
c) Categorizing each parent into either of three classes; the inner, outer, or rest classes.
As such, the FLAME identifies the parents having archetypical features, which known as a Cluster-Supporting Objects (CSOs). Also, CSO represent an inner class.
2) Assigning a fuzzy membership vector to each parent. Achievement of this step entails
a) Provide initial values of fuzzy membership function:
i) Each parent in the inner CSO is fully assigned to one class to be associated with one class.
ii) All parents in the outlier class have fixed and full memberships to this class.
iii) The rest class contains parents with equal membership.
b) Updating the fuzzy membership value for all parents in all three classes by applying a linear aggregation of the fuzzy memberships of the nearest neighbors.
3) Assignment of every parent to the cluster that has the highest membership value on a one-to-one object-cluster basis according to the following:
a) Passing on the highest membership value, based on a one-to-one parent-class basis, to all parent in the class.
b) Every parent will be assigned to specific class if that class has higher membership value based on a one-to-multiple basis.
After implementing FLAME, all parents in the GA belong to one of the following classes:
• Inner class: where all parents have superior density between neighbors.
• Outer class: where parents have less density value compared with neighbors and a predetermined threshold.
• Rest class: where any parents not belong to inner or outer will be assigned to this class.
Researchers in different studies like [
・ Detection Rate (DR)
The DR can be defined as “the ratio between the number of correctly detected attacks and the total number of attacks” [
DR = TP TP + FP (1)
・ False Alarm Rate (FAR)
The FAR can be defined as “the number of ‘normal’ patterns classified as attacks (False Positive) divided by the total number of ‘normal’ patterns” [
FAR = FP FP + TN (2)
The parameters appearing in Equation (1) and Equation (2) are explained in
This study proposed a new selection method called FLAME selection that has been implemented within a GA, which served as the classifier generator in the XCS. The initial results show that the FLAME selection method produced classifiers with higher fitness than traditional selection methods such as the Ranking, Roulette Wheel, and Tournament methods.
As regards ID, the experimental results showed that the XCS with FLAME selection is an effective method that can be used to improve the DR and reduce the
Parameter | Definition |
---|---|
True Positive Rate (TP) | Attack occurs and an alarm is raised |
False Positive Rate (FP) | No attack occurs but alarm is raised |
True Negative Rate (TN) | No attack occurs and no alarm is raised |
False Negative Rate (FN) | Attack occurs but no alarm is raised |
FAR. Having a low FAR and high DR is an advantage for any IDS. Furthermore, parent (classifier) selection in the GA is a critical decision that must be soundly taken in the early stages of classifier generation for detecting the network attacks. Quite often, several modifications and trials will need to be conducted to gain high DR and several comparisons should be made to investigate and evidence gain of superior DR by the inner class than the other classes. In this regard, the FLAME selection method aims at selecting the best set of parents to breed a new generation with high potentiality to increase the DR and reduce the FAR.
In the future, further enhancements will be needed to improve the DR for all types of network attacks.
The author declares no conflicts of interest regarding the publication of this paper.
Alsharafat, W. (2019) A New FLAME Selection Method for Intrusion Detection (FLAME-ID). Communications and Network, 11, 11-20. https://doi.org/10.4236/cn.2019.111002