Anomaly-based approaches to network intrusion detection suffer from difficulties in evaluation, comparison and deployment, which originate from the scarcity of adequate publicly available network trace datasets. Moreover, the datasets that are publicly available are either outdated or generated in a controlled environment. Due to the ubiquity of cloud computing environments in commercial and government Internet services, there is a need to assess the impact of network attacks on cloud data centers. To the best of our knowledge, there is no publicly available dataset which captures the normal and anomalous network traces in the interactions between cloud users and cloud data centers. In this paper, we present an experimental platform designed to represent a practical interaction between cloud users and cloud services, and we collect the network traces resulting from this interaction to conduct anomaly detection. We use the Amazon Web Services (AWS) platform to conduct our experiments.
Intrusion detection has long been an active research topic. Anomaly detection in particular is of high interest, since it can detect many novel attacks. However, such systems have rarely been deployed properly in the real world because of their complexity: they require continuous testing, evaluation and careful tuning prior to deployment [
Another challenge is comparing IDS systems against one another. The lack of appropriate public datasets severely hampers the evaluation of IDSs, affecting anomaly-based detectors most of all. Many existing datasets (e.g., KDD and DARPA) [
Cloud security issues have recently gained traction in the research community, where the focus has primarily been on protecting servers at cloud providers (securing the low-level operating systems or virtual machine implementations). Unsecured cloud servers have been shown to be crippled by novel denial-of-service attacks. Most existing work on network traffic generation has not focused on applicability to network security or the evaluation of anomaly-based techniques. Sommer and Paxson [
DHS PREDICT is a distributed repository of many hosts and providers at major universities and other institutions. Its datasets mainly include Domain Name System (DNS) data, Internet traffic flows, Border Gateway Protocol (BGP) data, Internet topology data, Intrusion Detection System (IDS) and firewall data, and botnet behavior. Access to these datasets is restricted to verified accounts at approved locations. Despite the major contributions by DARPA (Lincoln Laboratory) [
Cloud computing, still a nascent technology, is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Supplying these services at scale is achieved by expanding the hardware that makes up the data center. The services are delivered in various forms (private, public or hybrid clouds) by cloud service providers (CSPs) [
AWS is located in nine geographical regions: US East (Northern Virginia), US West (Northern California), US West (Oregon), AWS GovCloud (US), Sao Paulo (Brazil), Ireland, Singapore, Tokyo and Sydney [
Amazon Web Services (AWS) is an evolving and comprehensive cloud computing platform provided by Amazon.com. AWS was first launched in 2006 to provide online services for websites; such web services are also known as remote or cloud services. AWS is distributed geographically into regions to ensure robustness and minimize the impact of outages. AWS offers many services such as cloud drive, cloud search, etc. [
PROVIDER | REGION & SUBREGION |
---|---|
AWS | US East (N. Virginia) |
AWS | US West (N. California) |
AWS | US West (Oregon) |
AWS | GovCloud (Oregon) |
AWS | South America (Sao Paulo) |
AWS | EU (Ireland) |
AWS | Asia Pacific (Singapore) |
AWS | Asia Pacific (Tokyo) |
AWS | Asia Pacific (Sydney) |
Google | Central US (Council Bluffs, IA) |
Google | Central US (Pryor Creek, OK) |
Google | Europe |
Microsoft | Azure North-central US (Chicago, IL) |
Microsoft | Azure South-central US (San Antonio, TX) |
Microsoft | Azure West US (California) |
Microsoft | Azure East US (Boydton, Virginia) |
Microsoft | Azure East Asia (Hong Kong, China) |
Microsoft | Azure South East Asia (Singapore) |
Microsoft | Azure Northern Europe (Dublin, Ireland) |
Microsoft | Azure West Europe (Amsterdam, Netherlands) |
An example is the online storage service Dropbox, which uses the IaaS (Infrastructure as a Service) offering of AWS. Once a file is added to Dropbox, it is encrypted and transferred to Amazon S3 in various data centers across the USA; the download process works the same way in reverse. All AWS offerings are billed according to usage, with rates varying from service to service.
Users access cloud computing through networked client devices such as smartphones, desktop computers, laptops and tablets. These users fall into two categories: mobile cloud users and stationary cloud users. Mobile cloud users are clients on mobile devices such as smartphones and tablets which use the resources of the cloud provider. Stationary users access the cloud from fixed machines such as desktop computers, including for research purposes. Two main testbeds of stationary cloud users are used for research: PlanetLab and Emulab. For our experiments we use PlanetLab nodes to mimic stationary cloud users. PlanetLab is a group of computers available as a testbed for computer networking and distributed systems research, and it is a powerful tool for performing large-scale Internet studies. Its strength lies in the fact that it runs over the common routes of the Internet and spans nodes across the world, making it far more realistic than a simulation. PlanetLab nodes utilize virtualization software, allowing applications to have full access to the system kernel [
During the last decade, anomaly detection has attracted the attention of many researchers seeking to overcome the weakness of signature-based IDSs in detecting novel attacks. Because testing and evaluating a novel network concept or solution on a real network is often impractical, most researchers rely on captured network traffic data to evaluate the performance of their proposals. A real network traffic dataset is, however, a scarce commodity in the network research community. Various network problems have been analyzed, evaluated and validated based on captured network traffic, so it is important to maintain the completeness and quality of network traffic datasets. Well-known examples of network datasets include KDD CUP-99 [
Intrusion detection systems (IDS) are classified as anomaly-based or signature-based. Signature-based detection involves searching the traces (packets and bytes) for known malicious traffic. The advantage of this technique is that when the network behavior to be identified is known, signatures are easy to develop and understand.
One of the highest priorities of this work is to generate realistic background traffic. The main concern is to accurately reproduce the quantity and time distribution of flows for the HTTP protocol, since the majority of web traffic is HTTP-based. To achieve this, we generate a series of time instances at which web requests are sent from the PlanetLab nodes to the EC2 server, as shown in
To model HTTP requests, several approaches are available. The majority of the work in the literature is based on well-known statistical distributions. These probability distributions are analytically well described and have the advantage of being compact and easy to evaluate. We generate series of random time instances using a mean (μ) and standard deviation (σ), following the normal and uniform distributions separately. These instances are used to generate series of web requests from the PlanetLab nodes to the EC2 server. Algorithm 1 explains in detail the mechanism for generating the web requests from normal nodes.
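As an illustration, the sketch below (our own code, not the paper's) generates such a series of request-time offsets under either distribution; the function name and parameters are assumptions made for this example.

```python
import random

def generate_time_instances(n, mu, sigma, lower, upper, dist="normal", seed=None):
    """Generate n monotonically increasing time offsets (seconds) at which
    web requests should be issued. Inter-arrival gaps are drawn from either
    a normal(mu, sigma) or a uniform(lower, upper) distribution."""
    rng = random.Random(seed)
    t, instances = 0.0, []
    for _ in range(n):
        if dist == "normal":
            gap = max(0.0, rng.gauss(mu, sigma))  # clamp negative draws to zero
        else:
            gap = rng.uniform(lower, upper)
        t += gap
        instances.append(round(t, 3))
    return instances
```

The resulting list can be written to a file and replayed by each PlanetLab node, with the distribution choice and (μ, σ) controlling how bursty the background traffic looks.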
The algorithm (Algorithm 1) is executed on the PlanetLab nodes that act as the clients generating the web requests. For our experiments we chose 4 nodes for generating normal instances. Initially the start time (t1) of the node is recorded and a time sequence (t) is read from the file. The thread then sleeps for a short interval, after which the current system time (t2) is read and the difference (d) between the current time (t2) and the start time (t1) is computed. The difference (d) is compared with the time sequence (t): if (d) exceeds (t), it is time to start the web request (t3). The node issues a request to the EC2 server, which responds; a TCP handshake takes place between the node and the server, and the file is downloaded onto the node. The end time (t4) is recorded once the web request-response completes. The web response time (WRT) is the difference between the end time (t4) and the request start time (t3). In this way the WRT is calculated for one sequence. If (d) is less than (t), the thread sleeps and the comparison is repeated; once a request completes, the next time sequence is read from the file.
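The steps above can be sketched as follows; `issue_request` is a placeholder for the actual HTTP GET and file download against the EC2 server, and all names are illustrative rather than taken from the paper's implementation.

```python
import time

def run_request_schedule(time_sequences, issue_request, poll_interval=0.01):
    """Replay Algorithm 1: wait until each scheduled offset has elapsed since
    the node start time, issue a web request, and record the web response
    time (WRT) for each sequence."""
    wrts = []
    t1 = time.monotonic()                 # node start time
    for t in time_sequences:              # next scheduled offset read from file
        while True:
            t2 = time.monotonic()         # current system time
            d = t2 - t1                   # elapsed time since start
            if d >= t:                    # time to fire the request
                t3 = time.monotonic()
                issue_request()           # TCP handshake + download happen here
                t4 = time.monotonic()
                wrts.append(t4 - t3)      # WRT = end time - request start time
                break
            time.sleep(poll_interval)     # not yet time: sleep, then re-check
    return wrts
```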
Since the proposed dataset is intended for network security and intrusion detection purposes [
PlanetLab Node | IP Address |
---|---|
pl2.eecs.utk.edu | 160.36.57.173 |
pli1-pa-6.hpl.hp.com | 204.123.28.57 |
planetlab2.unl.edu | 129.93.229.139 |
planetlab2.cesnet.cz | 195.113.161.83 |
it would not be complete without a diverse set of attack scenarios. Attack traffic represents an attack scenario in an unambiguous manner. In the simplest case humans can carry out these attacks; in the ideal case autonomous agents can be used to carry them out automatically. Today, cloud computing systems provide a wide variety of services and interfaces to customers, and the various threats to these services are explained in this paper. Our aim here is to mimic the actions of malicious hackers by performing multi-stage attack scenarios, each carefully crafted toward achieving a predefined set of goals [
• DDoS (Distributed Denial of Service)
• Man-in-the-middle attack (ARP spoofing)
• Port scan
A distributed denial-of-service (DDoS) attack is one in which a multitude of compromised systems attack a single target at the same time with a flood of messages, thereby causing denial of service for users of the targeted system [
The testbed network architecture for DDoS is shown in
When launching an Amazon EC2 instance we need to specify its security group. The security group acts as a firewall, allowing us to choose which protocols and ports are open to computers over the Internet. We can either use the default security group and customize it, or create our own. A security group can be configured programmatically or through the Amazon EC2 management console [
SSH was used so that remote hosts could communicate with the EC2 server; HTTP was used so that the website would be accessible; RDP was used so that the server could be administered from our system; and DNS was allowed so that the server could be reached by name. All other protocols were blocked.
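The default-deny behaviour of such a security group can be sketched as a simple rule lookup. This is an illustration of the policy described above, not the AWS API; the rule set mirrors the protocols and ports we left open.

```python
# Each rule opens one (protocol, port) pair to inbound traffic; anything
# not explicitly listed is blocked, as in an EC2 security group.
ALLOWED_RULES = {
    ("tcp", 22),    # SSH  - remote administration of the EC2 server
    ("tcp", 80),    # HTTP - the hosted website
    ("tcp", 3389),  # RDP  - administering the server from our system
    ("udp", 53),    # DNS  - name resolution for the server
}

def is_allowed(protocol, port, rules=ALLOWED_RULES):
    """Return True if an inbound (protocol, port) pair matches an open rule."""
    return (protocol, port) in rules
```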
Load balancing is when processes and communications are distributed evenly across a network. When it is difficult to predict the number of requests that will be issued to a server, we use load balancing. Busy web sites typically employ more than one web server in a load-balancing scheme: if one server is saturated with requests, further requests are forwarded to another server with more capacity [

PlanetLab Node | IP Address |
---|---|
peeramide.irisa.fr | 131.254.208.10 |
planetlab01.tkn.tu-berlin.de | 130.149.49.136 |
planetlab2.tsuniv.edu | 206.23.240.29 |
planetlab1.cs.uoregon.edu | 195.113.161.83 |

Protocol | Port |
---|---|
SSH | 22 |
HTTP | 80 |
RDP | 3389 |
DNS | 53 |
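The forwarding policy described above (send each request to the server with the most spare capacity) can be sketched as follows; the function names and the load representation are our own illustrative assumptions.

```python
def pick_server(active_requests):
    """Pick the server with the most spare capacity, i.e. the fewest
    in-flight requests. `active_requests` maps server name -> current load."""
    return min(active_requests, key=active_requests.get)

def dispatch(request_ids, servers):
    """Assign each incoming request to the least-loaded server and
    return the final per-server load."""
    load = {s: 0 for s in servers}
    for _ in request_ids:
        load[pick_server(load)] += 1
    return load
```

With two equally capable servers this degenerates to round-robin; real load balancers also weight servers by capacity and health.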
This attack is designed to perform a stealthy, low-bandwidth distributed denial-of-service attack without flooding the network. We use “slowloris” [
The attack slowly overwhelms the server, eventually bringing the service down completely. When the attack stops, the server recovers automatically.
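To illustrate how slowloris holds connection slots open, the sketch below generates the byte chunks such a client sends: an HTTP request header that is never terminated, trickled out one bogus header line at a time. This is a simplified illustration of the technique (it only builds data and contacts no server), not the slowloris tool itself.

```python
def slowloris_headers(host, keepalive_count):
    """Yield the byte chunks a slowloris client sends. The request header is
    never terminated with the final blank line (CRLF CRLF), so the server
    keeps the connection slot occupied waiting for the rest."""
    yield (f"GET / HTTP/1.1\r\n"
           f"Host: {host}\r\n"
           f"User-Agent: Mozilla/5.0\r\n").encode()
    for i in range(keepalive_count):
        # each partial header line resets the server's request timeout
        yield f"X-a: {i}\r\n".encode()
```

Opening many such never-completed connections exhausts the server's worker pool with almost no bandwidth, which is why the attack is stealthy compared with a flood.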
ARP spoofing is a technique in which the attacker sends spoofed ARP messages onto the LAN (local area network). The attacker machine sits anonymously between the host and the gateway and captures the traffic flowing in both directions, using IP forwarding to relay it. Many of today's networks are built on what is called the eggshell principle: hard on the outside and soft on the inside. This means that if an attacker gains access to a host on the inside, she can use the compromised host as a pivot to attack systems not previously accessible via the Internet, such as a local intranet server or a domain controller. In our case we host two machines in the same virtual private cloud (VPC) in Amazon EC2: one machine acts as the host and the second as the attacker. The attacker machine captures the traffic between the host and the gateway, as shown in the
For this experiment we use the following: two Windows Server 2008 EC2 instances in the same subnet, Wireshark, and Ettercap-NG. We rent two instances with the configuration shown in
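To make the spoofing step concrete, the sketch below assembles the raw bytes of the unsolicited ARP reply that a tool such as Ettercap-NG emits: it tells the victim that the spoofed IP (e.g. the gateway) lives at the attacker's MAC address. The MAC and IP values used in the test are illustrative placeholders, not the experiment's actual addresses, and the code only builds the frame; it sends nothing.

```python
import struct

def arp_reply_frame(attacker_mac, victim_mac, spoofed_ip, victim_ip):
    """Build an Ethernet II frame carrying a forged ARP reply (opcode 2).
    MACs are 6-byte and IPs 4-byte `bytes` objects."""
    eth = victim_mac + attacker_mac + struct.pack("!H", 0x0806)  # EtherType = ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)  # Ethernet/IPv4, opcode 2 = reply
    arp += attacker_mac + spoofed_ip                 # "sender" = attacker MAC, spoofed IP
    arp += victim_mac + victim_ip                    # target = the victim
    return eth + arp
```

After poisoning both the host and the gateway this way, the attacker enables IP forwarding so traffic keeps flowing while being captured.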
Port scanning is a technique in which the open ports of a server or website are probed. Attackers use it as a means to compromise the services running on a system. We use Nmap, which reports the open ports and the services running on the server. For our experiment we use Nmap to detect open ports, as well as operating system details, by launching a stealth scan as shown in
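Conceptually, port probing can be illustrated with a minimal TCP connect() scan. This full-handshake probe is simpler and noisier than the SYN ("stealth") scan Nmap performs in our experiment, which never completes the handshake; the sketch is ours, not Nmap's implementation.

```python
import socket

def connect_scan(host, ports, timeout=0.5):
    """Report a port as open if the TCP three-way handshake completes."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports
```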
Operating System | IP Address | Type |
---|---|---|
Windows Server 2008 | 107.23.207.107 | Victim |
Windows Server 2008 | 107.23.207.115 | Attacker |
root@saikiran:~#sudo nmap -v -O --osscan-guess 72.44.46.206
Starting Nmap 4.52 (http://insecure.org) at 2014-11-01 18:10 UTC
Initiating Ping Scan at 18:10
Scanning 72.44.46.206 [2 ports]
Completed Ping Scan at 18:10, 0.07 s elapsed (1 total hosts)
Initiating Parallel DNS resolution of 1 host. at 18:10
Completed Parallel DNS resolution of 1 host. at 18:10, 0.08 s elapsed
Initiating SYN Stealth Scan at 18:10
Scanning ec2-72-44-46-206.compute-1.amazonaws.com (72.44.46.206) [1714 ports]
Discovered open port 3389/tcp on 72.44.46.206
Discovered open port 80/tcp on 72.44.46.206
Completed SYN Stealth Scan at 18:10, 34.46 s elapsed (1714 total ports)
Initiating OS detection (try #1) against ec2-72-44-46-206.compute-1.amazonaws.com (72.44.46.206)
Retrying OS detection (try #2) against ec2-72-44-46-206.compute-1.amazonaws.com (72.44.46.206)
Host ec2-72-44-46-206.compute-1.amazonaws.com (72.44.46.206) appears to be up ... good.
Interesting ports on ec2-72-44-46-206.compute-1.amazonaws.com (72.44.46.206):
Not shown: 1700 filtered ports
PORT STATE SERVICE
80/tcp open http
3389/tcp open ms-term-serv
6000/tcp closed X11
6001/tcp closed X11:1
6002/tcp closed X11:2
6003/tcp closed X11:3
6004/tcp closed X11:4
6005/tcp closed X11:5
6006/tcp closed X11:6
6007/tcp closed X11:7
6008/tcp closed X11:8
6009/tcp closed X11:9
6017/tcp closed xmail-ctrl
6050/tcp closed arcserve
Device type: general purpose
Running (JUST GUESSING): Microsoft Windows Vista|2008 (98%)
Aggressive OS guesses: Microsoft Windows Vista (98%), Microsoft Windows Server 2008 Beta 3 (96%), Microsoft Windows Vista Home Basic (91%)
No exact OS matches for host (test conditions non-ideal).
Uptime: 61.417 days (since Mon Sep 1 08:10:23 2014)
TCP Sequence Prediction: Difficulty = 262 (Good luck!)
IP ID Sequence Generation: Incremental
OS detection performed.
Nmap done: 1 IP address (1 host up) scanned in 38.886 seconds
Raw packets sent: 5194 (232.752 KB) | Rcvd: 36 (2092 B)
For proper scanning and pinging we had to allow ICMP traffic through EC2; this rule was added to the EC2 security groups. The first session is initiated from the PlanetLab node: Nmap is launched at the “tsuniv.edu” (206.23.240.29) PlanetLab node. It performs a stealth scan of the EC2 server, which yields an approximation of the operating system and the number of open ports, as shown above. The traffic is captured using the Wireshark software [
Our experiments to implement the framework for generating and collecting network traces involve real-world instances and systems. This usually raises an ethical debate, since scanning remote network devices can sometimes have adverse effects. At the same time, developing a robust framework for network traces without collecting data from the real world is very difficult: simulation tools and experiments in a controlled lab environment cannot replicate the randomness of real-world network traffic. A recent journal article that discusses the ethics of security vulnerability research [
Cloud computing offers many services to its clients, including software, infrastructure, etc., but these services pose significant security risks to customer applications and data beyond what is expected in a traditional on-premises architecture. These risks can be well understood only if we have access to network traces in the cloud. Most network trace datasets are proprietary and cannot be shared for privacy reasons; others are heavily anonymized, do not reflect current trends, and lack certain statistical properties. Moreover, the datasets that are publicly available are either outdated or generated in a controlled environment [
Due to the ubiquity of cloud computing environments in commercial and government Internet services, there is a need to assess the impact of network attacks on cloud data centers. We present a systematic approach to the design and development of an experimental platform that represents a practical interaction between cloud users and cloud services, and we collect the network traffic traces resulting from this interaction to conduct anomaly detection. Our results show statistical differences between normal and anomalous network traffic traces, which anomaly detection systems can exploit to detect and isolate adversaries in cloud data centers. In the future, we plan to feed the captured traffic into intrusion detection systems (IDSs) for a better understanding of anomalies and to reduce false positives.
This work was partially supported by Department of Homeland Security (DHS) SLA grant 2014-ST-062- 000059 and Office of the Assistant Secretary of Defense for Research and Engineering (OASD(R&E)) under agreement number FAB750-15-2-0120.
Sai Kiran Mukkavilli, Sachin Shetty, Liang Hong (2016) Generation of Labelled Datasets to Quantify the Impact of Security Threats to Cloud Data Centers. Journal of Information Security, 7, 172-184. doi: 10.4236/jis.2016.73013