Anomaly detection has become very important in networks because Internet use keeps growing, and the security of the network and its users is a main concern of every network administrator. As Internet use increases, the chance of a threat or attack in the network grows day by day, and so does the volume of network traffic. It is very difficult to analyse all of this traffic to find anomalies, and sampling provides a way to detect anomalies from a smaller amount of traffic data. In this paper, we propose a port scan detection approach called CPST, which uses the connection status and the connection pattern to decide whether a particular source is a scanner or a benign host. We also show that this approach works efficiently under different sampling methods.
Traffic analysis is essential for network security, especially for intrusion detection systems. Port scan detection is one type of anomaly detection and is generally carried out in networks for security purposes. When an intruder or attacker wants to carry out a harmful activity in the network, they first try to survey the entire network: for example, which operating systems are in use, which ports are open or accessible, or which services are running on a particular host. Intrusion detection techniques are therefore needed that identify a scanner at an early stage, using sampled as well as non-sampled data, and raise an alert to the network administrator.
Networks are growing larger day by day, and link speeds are also increasing. This results in a huge amount of traffic data, which is very difficult to analyse with limited resources such as CPU and memory. To cope with increasing link speeds, sampled traffic data are therefore used as input for various anomaly and scan detection tasks, such as detecting denial of service attacks or port scan attacks. Sampling does distort traffic statistics such as the mean rate and the flow size distribution, but it remains very useful for analysing network traffic to detect attacks. Therefore, various sampling methods like packet sampling, such as Cisco NetFlow [
IP packet attributes used by NetFlow:
・ IP source address;
・ IP destination address;
・ Source port number;
・ Destination port number;
・ Protocol type.
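The five attributes above act as the key that groups packets into flows. The sketch below is a minimal illustration of that grouping, assuming packets are available as simple records; the `Packet` type, `flow_key` and `aggregate_flows` are hypothetical names, not part of NetFlow. Real NetFlow deployments also key on further fields (such as type of service and input interface) and expire flows with timeouts, which this sketch omits.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record: the five key fields plus a size in bytes.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    length: int


def flow_key(p: Packet) -> tuple:
    """Return the five-tuple (src IP, dst IP, src port, dst port, protocol)."""
    return (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)


def aggregate_flows(packets):
    """Map each five-tuple to [packet_count, byte_count]."""
    flows = defaultdict(lambda: [0, 0])
    for p in packets:
        stats = flows[flow_key(p)]
        stats[0] += 1          # one more packet in this flow
        stats[1] += p.length   # accumulate bytes
    return dict(flows)


if __name__ == "__main__":
    pkts = [
        Packet("10.0.0.1", "10.0.0.9", 40000, 80, "TCP", 60),
        Packet("10.0.0.1", "10.0.0.9", 40000, 80, "TCP", 1500),
        Packet("10.0.0.1", "10.0.0.9", 40001, 22, "TCP", 60),
    ]
    for key, (n, size) in aggregate_flows(pkts).items():
        print(key, n, size)
```

Here the first two packets share a key and form one flow, while the third, with a different source and destination port, starts a second flow.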
In the literature, various port scan detection techniques have been developed, such as TRW [
The main idea behind CPST is to combine connection status with connection pattern in order to achieve a low false positive rate and high detection efficacy. We pursue the problem within the general framework of port scan detection, using both the connection status and the connection pattern in sampled data. In the connection status approach, a decision is made on the basis of whether a connection is established or fails. In the connection pattern approach, a decision is made on the basis of the pattern of the connections: for example, the ratio between the number of destination IPs and the number of destination ports is calculated, and these values are used to decide whether a particular source is a scanner or a benign host.
The remainder of this paper is organized as follows. Section 2 explains port scan detection in detail. Section 3 describes two existing scan detection algorithms (TRW and TAPS) and the sampling techniques used in the network. Section 4 details the CPST approach with its mathematical analysis. Section 5 compares the performance of CPST with TRW and TAPS under two sampling techniques, and Section 6 concludes the paper.
In [
A large number of port scan detection techniques have been proposed in the literature. They are broadly divided into two categories, namely "single source scan detection" and "distributed scan detection", and further into subcategories such as threshold-based, algorithm-based, soft-computing-based or rule-based techniques [
The main idea behind TRW [
For a given source r, let $Y_i$ be a random variable that represents the outcome of the first connection attempt by r to the i-th distinct local host, where $Y_i = 0$ if the connection attempt succeeds and $Y_i = 1$ if it fails.
With these two hypotheses, four outcomes are possible when a decision is made, as shown in
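As a reference point, the sequential hypothesis test behind TRW can be summarised as follows. The notation ($\theta_0$, $\theta_1$, $\Lambda$, $\eta_0$, $\eta_1$, $\alpha$, $\beta$) is taken from the original TRW formulation of Jung et al. and is an assumption here, since this section's own equations are not reproduced. It assumes that the outcomes $Y_i$ are independent given the hypothesis, with $H_0$ meaning the source is benign and $H_1$ meaning it is a scanner; $\alpha$ is the target upper bound on the false positive probability and $\beta$ the target lower bound on the detection probability.

```latex
\begin{aligned}
&\Pr[Y_i = 0 \mid H_0] = \theta_0, \qquad \Pr[Y_i = 1 \mid H_0] = 1 - \theta_0,\\
&\Pr[Y_i = 0 \mid H_1] = \theta_1, \qquad \Pr[Y_i = 1 \mid H_1] = 1 - \theta_1, \qquad \theta_0 > \theta_1,\\
&\Lambda(Y) = \frac{\Pr[Y \mid H_1]}{\Pr[Y \mid H_0]} = \prod_{i=1}^{n} \frac{\Pr[Y_i \mid H_1]}{\Pr[Y_i \mid H_0]},\\
&\text{decide } H_1 \text{ if } \Lambda(Y) \ge \eta_1, \quad \text{decide } H_0 \text{ if } \Lambda(Y) \le \eta_0, \quad \text{otherwise keep observing},\\
&\eta_1 = \frac{\beta}{\alpha}, \qquad \eta_0 = \frac{1 - \beta}{1 - \alpha}.
\end{aligned}
```

Because $\theta_0 > \theta_1$, each failed attempt multiplies $\Lambda$ by $(1-\theta_1)/(1-\theta_0) > 1$ and pushes the source toward $H_1$, while each successful attempt multiplies it by $\theta_1/\theta_0 < 1$ and pushes it toward $H_0$.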
TAPS [
In [
In [
In this section, two sampling techniques are described: random packet sampling and random flow sampling.
Packet sampling techniques are currently being standardized by the Packet Sampling (PSAMP) Working Group of the Internet Engineering Task Force (IETF) [
| Sr. No. | Actual source | Algorithm outcome | Decision |
|---|---|---|---|
| 1 | Scanner (under $H_1$) | Under $H_1$ | True Detection |
| 2 | Scanner (under $H_1$) | Under $H_0$ | False Negative |
| 3 | Benign Host (under $H_0$) | Under $H_1$ | False Positive |
| 4 | Benign Host (under $H_0$) | Under $H_0$ | Normal |
Packet sampling techniques are mainly of two types: (1) systematic packet sampling and (2) random packet sampling. Systematic packet sampling selects packets according to a deterministic function, for example one packet out of every N. In random packet sampling, packets are selected according to a random process.
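The two packet sampling types above, together with the random flow sampling named at the start of this section, can be sketched as below. This is a minimal illustration only: the function names and the parameters `n` (sampling interval) and `p` (selection probability) are hypothetical, and flows are assumed to be given as a mapping from flow key to the packets of that flow.

```python
import random


def systematic_sample(packets, n):
    """Deterministic count-based sampling: keep one packet out of every n,
    starting with the first."""
    return packets[::n]


def random_packet_sample(packets, p, rng=None):
    """Keep each packet independently with probability p."""
    rng = rng or random.Random()
    return [pkt for pkt in packets if rng.random() < p]


def random_flow_sample(flows, p, rng=None):
    """Keep each whole flow (all of its packets) with probability p.

    `flows` maps a flow key to the list of packets in that flow.
    """
    rng = rng or random.Random()
    return {key: pkts for key, pkts in flows.items() if rng.random() < p}


if __name__ == "__main__":
    rng = random.Random(42)
    print(systematic_sample(list(range(10)), 3))
    print(len(random_packet_sample(list(range(1000)), 0.1, rng)))
```

The design difference matters for scan detection: packet sampling can drop individual packets of a connection attempt, whereas flow sampling either keeps a flow entirely or drops it entirely.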
A flow in RTFM [
One of the main characteristics of a scanner is that most of the time it does not make a successful connection with the server or destination, i.e. it does not complete the TCP three-way handshake. The second characteristic is that, for a particular source, the ratio between the number of distinct destination IPs and the number of distinct destination ports exceeds a particular value k. So, if the destination IP/port ratio or the destination port/IP ratio is greater than k, the source is treated as a scanner, and if it is less than k, the source is declared a benign host.
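The pattern criterion above can be sketched as follows. This is a minimal illustration, assuming connections are available as `(src_ip, dst_ip, dst_port)` tuples; the function names are hypothetical, and because the text does not fix a value for k here, the `k=5` used in the example is an arbitrary placeholder. Taking the larger of the two ratios covers both cases the text mentions: many ports on one host, and one port on many hosts.

```python
from collections import defaultdict


def pattern_counts(connections):
    """Map each source IP to (distinct destination IPs, distinct destination ports).

    `connections` is an iterable of (src_ip, dst_ip, dst_port) tuples.
    """
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src, dst, dport in connections:
        ips[src].add(dst)
        ports[src].add(dport)
    return {src: (len(ips[src]), len(ports[src])) for src in ips}


def pattern_says_scanner(n_ips, n_ports, k):
    """Flag a source if its IP/port or port/IP ratio exceeds k."""
    ratio = max(n_ips / n_ports, n_ports / n_ips)
    return ratio > k


if __name__ == "__main__":
    conns = [("s", "h", port) for port in range(1, 101)]            # one host, many ports
    conns += [("b", "w1", 80), ("b", "w2", 443), ("b", "w3", 80)]   # ordinary client
    for src, (n_ips, n_ports) in pattern_counts(conns).items():
        print(src, n_ips, n_ports, pattern_says_scanner(n_ips, n_ports, k=5))
```

A pattern test alone can also flag a busy benign client, for example one that contacts many servers on the same port, which is one reason to pair it with a connection status check.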
The novel feature of our proposed two-pass port scan detection technique, CPST, is that it uses both the connection status and the connection pattern approaches to detect scanners. In the connection status approach, a decision is made on the basis of whether a connection is established or fails. In the connection pattern approach, a decision is made on the basis of the pattern of the connections, for example the ratio between the number of destination IPs and the number of destination ports of a source, which decides whether that source is a scanner or a benign host. CPST performs two levels of detection. In the first level, a scanner is detected from the destination IP/port pattern (or port/IP pattern) of a particular source.
In the second level, the connection status is checked and a decision is made for a source according to that status. CPST is based on sequential and pattern inference testing. Sequential inference testing observes the connection status of each source IP in a flow to check whether the connection failed or succeeded. For a particular IP, a failed connection makes it more likely that the source is a scanner, while a successful connection makes it more likely that the source is a benign host. Similarly, pattern inference testing observes the connection pattern of each source to check whether it is a scanner or a benign host (see
Suppose that whenever a remote or local source r makes a connection attempt to a local destination, an event E is generated. The result of that event is either a "success" or a "failure", depending on the connection status. A connection attempt from a source is directed either to an inactive host or service, or to an active host or service. If the source is a scanner, it will try to connect to different ports on the same destination IP, or to the same port on different destination IP addresses.
In CPST, the sequential hypothesis testing technique is used. The access-pattern metric for a scanner is as follows:
The indicator random variable is defined as follows:
Four events are possible for the random variable $Y_i$, with the following probabilities [
where $H_0$ is the hypothesis that the source is a benign host and $H_1$ is the hypothesis that it is a scanner. The observation that a connection attempt from a benign source is more likely to succeed than one from a malicious source implies the condition:
Whenever an event occurs, the sequential hypothesis test updates the likelihood ratio of the source (flow: srcip), which is defined similarly to the TRWSYN and TAPS cases as follows:
$Y_i$ takes the value 1 or 0 depending on the conditions above.
if the likelihood ratio reaches the upper threshold then
  H1 (scanner): update Ss (source), remove that source from Sn (list of sources under test) and add it to SCn (list of scanners)
else if the likelihood ratio falls to the lower threshold then
  H0 (benign host): update Ss (source), remove that source from Sn (list of sources under test) and add it to BHn (list of benign hosts)
else
  Continue with more observations (result pending).
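The decision rule above, for the second (connection status) level, can be sketched as a TRW-style sequential test with the three lists Sn, SCn and BHn. This is a minimal sketch under stated assumptions: `THETA0`, `THETA1`, `ALPHA` and `BETA` are placeholder values, the thresholds use the standard sequential-test forms $\eta_1 = \beta/\alpha$ and $\eta_0 = (1-\beta)/(1-\alpha)$ rather than values from this paper, and the `Detector` class is a hypothetical name. For brevity, every observed outcome is counted, whereas TRW counts only the first attempt to each distinct destination.

```python
from dataclasses import dataclass, field

# Placeholder parameters (assumptions, not values from the paper):
# THETA0 / THETA1 = probability that a connection attempt succeeds under
# H0 (benign) / H1 (scanner); ALPHA / BETA = target false positive and
# detection probabilities.
THETA0, THETA1 = 0.8, 0.2
ALPHA, BETA = 0.01, 0.99
ETA1 = BETA / ALPHA              # upper threshold: decide H1 (scanner)
ETA0 = (1 - BETA) / (1 - ALPHA)  # lower threshold: decide H0 (benign)


@dataclass
class Detector:
    ratio: dict = field(default_factory=dict)   # likelihood ratio per source
    pending: set = field(default_factory=set)   # Sn: sources under test
    scanners: set = field(default_factory=set)  # SCn: sources decided as scanners
    benign: set = field(default_factory=set)    # BHn: sources decided as benign

    def observe(self, src, success):
        """Update the source's likelihood ratio with one connection outcome."""
        if src in self.scanners or src in self.benign:
            return  # decision already made for this source
        self.pending.add(src)
        lam = self.ratio.get(src, 1.0)
        if success:  # Y_i = 0
            lam *= THETA1 / THETA0
        else:        # Y_i = 1
            lam *= (1 - THETA1) / (1 - THETA0)
        self.ratio[src] = lam
        if lam >= ETA1:
            self.pending.discard(src)
            self.scanners.add(src)
        elif lam <= ETA0:
            self.pending.discard(src)
            self.benign.add(src)
        # Otherwise the source stays in `pending`: result pending.
```

With these placeholder values, a source is declared a scanner after four consecutive failed attempts and benign after four consecutive successes; a mix of outcomes keeps it pending longer. How CPST combines this step with the first-level pattern test is defined by the paper's own equations, not by this sketch.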
$P_F$ is the probability of false positives and $P_D$ is the probability of detection for port scan detection [
The performance of existing techniques is evaluated mainly in terms of two metrics: detection rate and false positive rate. We evaluate and analyse the performance of CPST against the existing algorithms (TRW and TAPS) using these metrics.
The detection (success) rate is defined as the ratio of the number of detected scanners in a dataset to the total number of scanners, as shown in Equation (2). In the ideal case, the detection rate equals 1.
The false positive rate is defined as the ratio of the number of benign hosts falsely detected as scanners to the total number of scanners present in the dataset, as shown in Equation (3). A false positive occurs when a benign host is classified as a scanner. In the ideal case, the false positive rate equals 0.
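The two metrics described above can be computed as in the sketch below, since Equations (2) and (3) are not reproduced here. The function name `rates` is hypothetical, and the inputs are assumed to be sets of source addresses. The false positive rate follows the wording of the text and divides by the number of scanners; note that the more common convention divides by the number of benign hosts instead.

```python
def rates(detected, scanners):
    """Return (detection rate, false positive rate) as worded in the text.

    detected: sources flagged as scanners by the algorithm.
    scanners: sources that really are scanners (ground truth).
    """
    detected, scanners = set(detected), set(scanners)
    true_detections = detected & scanners  # scanners correctly flagged
    false_positives = detected - scanners  # benign hosts flagged as scanners
    detection = len(true_detections) / len(scanners)
    # Denominator follows the text: total number of scanners in the dataset.
    false_positive = len(false_positives) / len(scanners)
    return detection, false_positive


if __name__ == "__main__":
    print(rates({"a", "b", "c", "x"}, {"a", "b", "c", "d"}))
```

In this example, three of the four real scanners are found and one benign host is flagged, giving a detection rate of 0.75 and a false positive rate of 0.25 under the text's definition.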
DARPA dataset [
not at its minimum value. It increases briefly at low sampling rates, but as the sampling rate increases, the ratio decreases monotonically and reaches nearly zero at higher sampling intervals. CPST exhibits a lower false positive rate than TRW with packet sampling, a slightly higher one than TRW with flow sampling, and a lower one than TAPS under both packet and flow sampling. In
In this paper, we present a two-pass port scan detection technique called CPST, which uses the fundamental concepts of connection status and connection pattern to detect scanners or malicious hosts in the network. We compare the performance of CPST with the existing TRW and TAPS scan detection techniques on the DARPA dataset under packet sampling and flow sampling. The results show that CPST generally achieves a better success rate and false positive rate, and it gives a better detection rate than the existing techniques at high sampling rates. CPST exhibits a lower false positive rate than TRW with packet sampling and TAPS with both sampling methods (packet sampling and flow sampling), and a slightly higher one than TRW with flow sampling. The proposed scheme exploits the access pattern and connection status of a particular source in network flows. Its success rate is about 61%, and its false positive rate is below 2% at higher sampling intervals.
Sunil Kumar, Kamlesh Dutta, Ankit Asati (2015) Two Pass Port Scan Detection Technique Based on Connection Pattern and Status on Sampled Data. Journal of Computer and Communications, 3, 1-8. doi: 10.4236/jcc.2015.39001