International Journal of Communications, Network and System Sciences
Vol.10 No.07(2017), Article ID:78063,31 pages
10.4236/ijcns.2017.107008

On a Generic Security Game Model

Vivek Shandilya1, Sajjan Shiva2

1Department of Computing Sciences, Jacksonville University, Jacksonville, USA

2Department of Computer Science, University of Memphis, Memphis, USA

Copyright © 2017 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

http://creativecommons.org/licenses/by-nc/4.0/

Received: June 24, 2017; Accepted: July 25, 2017; Published: July 28, 2017

ABSTRACT

To protect systems exposed to the Internet against attacks, a security system capable of engaging with the attacker is needed. There have been attempts to model the engagements/interactions between users, both benign and malicious, and network administrators as games. Building on such works, we present a game model which is generic enough to capture various modes of such interactions. The model facilitates stochastic games with imperfect information. The information is imperfect due to erroneous sensors leading to incorrect perception of the current state by the players. To model this error in perception distributed over multiple states, we use Euclidean distances between the outputs of the sensors. We build a 5-state game to represent the interaction of the administrator with the user. The states correspond to 1) the user being out of the system on the Internet and, after logging in to the system, 2) having low privileges, 3) having high privileges, 4) successfully attacking, and 5) getting trapped in a honeypot by the administrator. Each state has its own action set. We present the game with a distinct perceived action set corresponding to each distinct information set of these states. A numerical simulation of an example game is presented to show the evaluation of rewards to the players and the preferred strategies. We also present the conditions for formulating the strategies when dealing with more than one attacker and making collaborations.

Keywords:

Security System Architecture, Stochastic Game, General Sum Game, Imperfect Information, Numerical Simulation

1. Introduction

Ensuring the security of a system delivering important services, with exposure to the Internet, has many challenges. When such a system is distributed among many stakeholders, the task of securing it becomes more complex. A typical system delivering one or more services operates through the coordination of processes and data on many different systems exposed to the Internet with different degrees of security. Each stakeholder has his own computational space. Thus each stakeholder providing a service to other stakeholders has to ensure the security of the legitimate users’ processes and data while also dealing with malicious users appropriately [1] . In recent times, many successful attacks have been new, persistent, and evolving over time. There have been attempts to employ game theory to model and address such attacks [2] . These attacks are accomplished in multiple steps. When an attacker begins to attack, at each step he is breaching the security of the computational space of his service provider and other related stakeholders. The administrators and security systems must defend the system from such attackers. To do so effectively, it is important to detect each anomalous behavior at the earliest and respond appropriately to minimize the damage. But the information received from the detection of an anomalous behavior may not be sufficient to classify the user as an attacker or as a legitimate user who made an innocent fumble or has an unusual workload. In such cases the security system must engage with the user to get sufficient information at the earliest to deal with him appropriately. Since such engagement by the administrator or the security system is an expensive overhead, only the detection of an anomalous behavior should trigger it. Thus, we need an event-driven engagement (the event being the detection of anomalous behavior) and a security system facilitating it.

The engagement of the suspected user and the administrator/security system is essentially an interaction between two decision making entities. We observe that game theory provides the relevant framework for such a mechanism design. Hence, we present here an architecture for such a security system based on game theory.

Applying game theory to computer/cyber/network security has been an active research area since 2005 [3] . The crucial part of these attempts is to build appropriate game models to address the security situations. Most of the efforts have addressed a very specific attack by devising a suitable game model and executing a game. Roy et al. [4] provide a survey of the works modeling the interaction between a user and a network administrator, and a classification of these works based on the game models used. Since our architecture is aimed at addressing different attacks, it must have game models appropriate to each attack. We observe that a generic game model, abstracted enough to address the typical scenarios, would greatly facilitate deriving the specific game models. We present a generic game model, a game, and a numerical simulation to validate our analyses of its utility.

As one of the early attempts, [3] modeled the interaction between an attacker and the administrator as a stochastic game with 14 states considering 3 types of attacks. Their game assumed perfect information. [5] presented a two-state, imperfect-information, zero-sum, stochastic game with a numerical simulation showing the advantage of considering the imperfect information. The main motivation for considering the imperfection in the information was the errors in the players’ sensors. The error in the sensor makes the player believe that he may be in states other than the state he really is in. We make two extensions to this work. The simulation method used to study and explore the value of the game and the rewards for the strategies is an extension of [6] , which has only 5 strategic options.

Our model has the same structure as [5] with one extension. When there are more than two states in the game, the sensor error could make the player mistakenly believe he is in any of the states other than the one he really is in. The error in perceiving the current state gets distributed over the other states, depending on the sensor-reading error needed to misread each of them as the current state. Based on this extension of the game model, a game with five states is designed. The error in the perception of the current state gets distributed over the four states other than the real state. Thus, the probability of a player being deluded into believing he is in any of the other states is proportional to the distance between the sensor readings of the state-defining variables of those states and that of the current real state.

The second extension is in the game. Since the sets of actions available at each state are distinct in the game considered here, the extended action sets will be different from the original action sets. For the administrator, the sensor error is the cause of perceiving the current state as an information set instead of a single state. The user, who is assumed in this work to have an error-free sensor, also faces deception when the game enters the Trap state. There, the administrator’s ploy in the honeypot makes the user unsure of his current state and causes an extension of his perceived action set. This increase in the size of the perceived action set at the honeypot makes the user’s choice of action less obvious, due to the increased apparent choices. Following these interactions, the administrator will be able to determine the motive of the user under suspicion. This whole interaction based on the administrator’s ploy is abstracted into one single state.

We analyze the game for its solution, discussing the preferred strategies. A numerical simulation of the game, implemented in the programming language C, validating the analysis is presented. In evaluating the simulation, we discern that a strategy has to precipitate into game-theoretic actions, which in turn precipitate into computer/network administrator actions. We present a gradation of aggression the players can assert and study the game’s dynamics.

The main motivation for this work is to devise a game model and a game generic enough to incorporate real security games, needed to build a security system. [7] presents the architecture to construct a security system based on the game-theory approach. The generic game model and the game here are useful to derive different game models and instantiate games appropriate for different security situations. The major contributions of this work are summarized below.

1) We present a security system architecture, its components and the detailed operational flow.

2) We extend the previously proposed game model by devising a method for modeling the distribution of sensor error over multiple states.

3) We devise a game which is representative of the security-related interactions. The game has 5 states, each having a distinct action set, resulting in a distinct perceived/extended action set corresponding to each distinct information set of states.

4) We present the expression for the generalized Imperfect Information Factor in Appendix B.

5) We present the numerical simulations studying the dynamics of such a game when played with malicious users of different natures, and picking the preferred strategy profile.

6) We present the framework to actively interact with suspicious players in a honeypot.

7) We present the conditions for interacting with a group of attackers and for collaboration.

2. Related Works

A survey of applications of game theory for network security based on the targets in a network is at [8] . A survey of different network-related attributions to game-theoretic actions with equilibrium calculation is at [9] . A simulation of a security game with 5 strategic options for the administrator and the adversary is at [6] . A dynamic altering of the properties of a computing platform based on game theory to counter attacks is studied with simulations at [10] . A zero-sum game model is presented at [11] . A game-theoretic framework with classification of the attackers based on skill and motivation, with a solution based on solving the Markov chain, is at [12] . A game-theoretic attempt to secure infrastructure resources is at [13] . Computation of 2-player Nash equilibrium is shown at [14] to be complete in the class PPAD introduced at [15] . An algorithm for general-sum stochastic games in two scenarios is at [16] . Considering the behavioral probability weighting of players on their equilibrium strategies in three types of security games is presented in [17] . In that work, the weighting of players results in capturing situations where different objectives are needed for attacks to succeed, as noted by the Total Effort, Weakest Link and Best Shot games. We here use the different behaviors (as explained in Section 7.1) of players as a parameter which they vary to achieve an optimal result against the adversary.

3. Game Inspired Security Architecture

Here we present the architecture. We refer to the security architecture as Game Inspired Defense Architecture (GIDA) (Figure 1).

Figure 1. GIDA.

・ Target System: This is the computational space in the CBS which is being protected by the GIDA security system. As is prevalent, such a system would be a network of many systems. All interaction with other blocks happens through the Sensors & Actuators (S-A) block. It has a two-way data link and a two-way control-signal link with the (S-A) block. The target system is the CBS of the organization which would be using GIDA as a real-time security system to protect itself from attacks and failure. It would typically contain the usual subsystems that make up any present-day industrial business system, like database servers, web servers, confidential files, etc.

・ Internet: This block is the external world with which the target system communicates during its operation. The system has no control over the form and content of what happens there and should only deal with what comes from it. The data coming in from and going out to the Internet passes through the (S-A).

Thus there is a data link with the (S-A) block.

The Internet is functionally connected in layers. Typically, the immediate external surface would be that of the Internet Service Provider for the company; then those surfaces provided by the clients and those interacting within that surface, and so on.

・ Sensors: This block is the channel through which all the interactions happen between the Internet and the target system. The sensors and actuators are becoming functionally indistinguishable as distinct systems, as the interactivity of system components matures with technology. For example, even though a firewall keeps a gate to disallow some traffic as per the prescribed policies, it also gives the defense system a count of the attempted traffic and of the traffic that does go through. In the first function it acts as an actuator, in the second as a sensor. But the predominantly sensing components are put into Sensors. An example of sensors operating at the application layer is at [18] .

・ Actuators: This block has the actuators.

The anomalies can happen high up at the application layer or at lower layers like the network layer, transport layer, data link layer and so on. With the convergence in datacom technologies, the distinction between these layers is also not clear in the present times. So, whatever sensors and actuators operate within a given functionality space are contained in this block.

Similarly, with the maturing of interface technology, the same devices/systems sometimes perform both sensing and actuating functions, so that classification is also not a hard one. But the systems which are able to sense anomalies, at whatever layer, are termed sensors, and the systems which are able to act upon the Target System to affect its behavior are considered actuators.

There is typically some inbuilt set of actions based on the standard prescribed policies, which allows the actuators to take actions. Responses to some obvious anomalies are also covered by them. But when an anomaly is observed which is not obvious, the readings/values/data related to that event are sent to the GIDA Control Unit (GCU). The GCU transfers these observations to the Knowledge Management System (KMS) and, on receiving an action plan based on an appropriate game model, directs/sends control signals to the actuators to take actions to deal with the scenario. Subsequently, the sensor functionality of the systems in this block keeps sending any required information back to the GCU.

In the case of an observed anomaly which has a corresponding prescribed action for the sensor/actuator system to counter, it shall be acted upon as prescribed. Nevertheless, the event is reported to the GIDA Control Unit at the earliest, for it to be logged and documented in the KMS for further analysis and forensic activities, to learn about the system and scenarios.

This block has both a two-way data link and a two-way control-signal link with the Target System block, and a data link with the Internet block.

・ KMS: This is the knowledge center where the information related to the observed anomalies is sent in, and a defense action plan is obtained as the output to the GCU block. To identify attacks effectively, up-to-date knowledge is needed about attacks, risks, vulnerabilities, etc. There are many online government databases maintaining such knowledge bases. This block keeps itself updated from those knowledge bases and reports its own findings to them.

There is a two-way data link and a two-way control link to the GCU block. There is also a two-way data link with the External Defense Knowledge Systems block. The KMS takes as input the observations of anomalies submitted through the GCU. These inputs are analyzed to identify the attack/attacks as per AVOIDIT, and to pick a suitable model as per ADAPT. This game model is then given as output to the GCU. The resulting plan can be as simple as a one-time action of shutting down a web server or restarting a database, or a sequence of actions in time based on the game model.

While identifying an attack, to take advantage of up-to-date online resources, the block holding information related to attacks, vulnerabilities, risks, etc. communicates with the online resources to get information about the latest attacks.

・ Honeypot: This block is the trap to which a suspected user is sent, to understand his intentions, when his activity raises enough alarm through the sensors to make it unsafe to let him interact with the main system.

It has a two-way data link and a two-way control link to the (S-A) block. This block virtually mimics the main target system in its constitution. Even though it does not contain any real confidential information or high-value assets, it poses to have them.

When the anomalous behavior of a user is ambiguous enough to resist classification as either an attacker or a normal user, further inquiry is needed to learn more about the intention of the user. If it seems unsafe to wait and watch the user operate on the real system, it is a convenient strategy to lead him to the Honeypot, where he is under the illusion of being in the real system and operates according to his plans. This facilitates GIDA observing the user’s behavior through the same (S-A) block, which is also connected to the Honeypot. Here, not only are the actions observed; an instigating interaction is possible, leading to a classification of the user, possibly with optimum time and effort. Such a classification helps in the appropriate treatment of the user.

・ External Defense Knowledge Systems: This block is the set of online knowledge bases related to attacks, vulnerabilities, risks, etc. There is a two-way data link with the KMS block.

・ GCU: This block, as shown in Figure 2, is the central control unit for the whole defense operation of GIDA. It is connected to the (S-A), which notifies it when any anomaly is observed. This starts the involvement of this unit. The control agent translates and facilitates the communication between the Sensors and the KMS in one direction, and from the KMS to the Actuators in the other, as shown in Figure 3. As and when required, it sends control signals to the (S-A) to abort a traffic flow or operation, divert it to the Honeypot, or communicate with the KMS to decide what to do.

Figure 2. GIDA control unit.

Figure 3. GIDA control agent.

This block controls and co-ordinates the defense/security activity of GIDA. It gets its inputs about the observed anomalies from the sensors in the (S-A) block, submits them to the KMS, and gets the plan of action as per a game model as the countermeasure. It then uses a look-up table to send directions for such actions to be performed by the actuators in the (S-A) block.

This block consists of mainly three sub-blocks: a GIDA Control Agent and two other blocks with look-up tables for translating the two inputs into two outputs. The first is a look-up table mapping observations of anomaly/anomalies from the sensors to the inputs to be submitted to the KMS. The second is a look-up table mapping the output of the KMS, which is an action plan based on a game model, to the action commands to be given to the actuators in the Actuators block. The control agent box has four constituents, as shown in Figure 3.
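To make the two translations concrete, a minimal sketch of such look-up tables is below; the patterns, queries and commands are hypothetical placeholders, not GIDA-defined values.

```c
/* Hypothetical entries for the GCU's two look-up tables: sensor
 * observations -> KMS submissions, and KMS action plans -> actuator
 * commands. A real deployment would populate these from policy. */
typedef struct { const char *sensor_pattern; const char *kms_query;    } obs_map_t;
typedef struct { const char *game_action;    const char *actuator_cmd; } act_map_t;

static const obs_map_t OBS_TABLE[] = {
    { "ftp_request_spike",   "classify:possible_exfiltration" },
    { "login_failure_burst", "classify:possible_brute_force"  },
};

static const act_map_t ACT_TABLE[] = {
    { "Defend_LPS", "firewall:throttle_user"      },
    { "Trap_LPS",   "router:redirect_to_honeypot" },
};
```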

3.1. Operational Flow

The typical operational flows are as follows:

1) When there is no attack nor any problem in the operation of the target system, the target system simply interacts with its clients over the Internet. Both the interactions and the operations are observed by the systems in the (S-A). The most likely GIDA event on such a day would be an occasional update of the knowledge base in the KMS, related to some attack, from an online resource.

2) When an anomaly observed by a sensor is obvious in nature and already has a prescribed action to be taken by a co-operating actuator, such an action is taken immediately and the event is reported at the earliest to the GIDA Control Unit. Information received in this manner is submitted to the KMS for documentation and forensic purposes.

3) When an anomaly is observed by some sensors and the observed pattern does not have any obvious (reflex) action plan for the sensors/actuators, they report those observations to the GCU. These inputs are then submitted appropriately to the KMS by the GCU. The information is analyzed and the attack/attacks identified using the attack identification system. That identification is used to pick the appropriate game model as per ADAPT. That game model is given as the output of the KMS to the GCU. The GCU then takes the prescribed sequence of actions as per that game model.

Once an observed anomalous behavior is not obvious enough for the actuator/sensor systems to act upon by themselves, the anomaly is referred to the GCU. Then, through the KMS, every response is formulated as a game. As everywhere in the game theory literature, where everything from the trivial interaction to the most sophisticated one is modeled as a game, we model all the responses unequivocally as games.

4) When the information from the observed anomaly is sufficient to classify the user as not safe to operate in the main system, but insufficient to determine whether he is really malicious and to identify his intentions, he is directed to the honeypot and step two or three is followed as if he were on the main system.
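A minimal sketch of the event-driven dispatch implied by cases 1)-4) follows; every function and event name here is a hypothetical stand-in for the corresponding GIDA component, not an interface the paper defines.

```c
#include <stdio.h>

/* Hypothetical event classes raised by the Sensors block. */
typedef enum { EV_NONE, EV_KNOWN_ANOMALY, EV_UNKNOWN_ANOMALY, EV_UNSAFE_USER } event_t;

/* Stubs standing in for the real GIDA components. */
static void actuator_prescribed_action(void)  { puts("actuator: prescribed reflex action"); }
static void kms_log_event(event_t e)          { printf("KMS: event %d logged for forensics\n", e); }
static int  kms_pick_game_model(event_t e)    { (void)e; return 1; /* id via AVOIDIT + ADAPT */ }
static void gcu_execute_game_model(int model) { printf("GCU: executing game model %d\n", model); }
static void divert_to_honeypot(void)          { puts("GCU: user diverted to honeypot"); }

/* One iteration of the GCU loop; case numbers match the text above. */
static void gcu_handle(event_t ev) {
    switch (ev) {
    case EV_NONE:            /* 1) normal operation: nothing for GIDA to do */
        break;
    case EV_KNOWN_ANOMALY:   /* 2) reflex action, then report for documentation */
        actuator_prescribed_action();
        kms_log_event(ev);
        break;
    case EV_UNKNOWN_ANOMALY: /* 3) no reflex plan: ask the KMS for a game model */
        kms_log_event(ev);
        gcu_execute_game_model(kms_pick_game_model(ev));
        break;
    case EV_UNSAFE_USER:     /* 4) unsafe but unclassified: play the game in the honeypot */
        divert_to_honeypot();
        gcu_execute_game_model(kms_pick_game_model(ev));
        break;
    }
}

int main(void) {
    gcu_handle(EV_KNOWN_ANOMALY);
    gcu_handle(EV_UNSAFE_USER);
    return 0;
}
```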

3.2. Game Model Repository

The success of a security system based on game-theoretic ideas lies in the appropriateness of the game model to represent the current security situation and the effectiveness of the prescribed strategies. Not only do different attacks warrant different games played with different appropriate game models; different scenarios with the same kind of attack may need different game models with different considerations of rewards and transition functions. A generic game model from which such game models can be derived is therefore crucial, as it gives the general structure with proven boundary conditions. Figure 4 shows this connection.

Functionally, GIDA is designed as an event-driven system as shown in Figure 5. Computationally, it is an implementation of a Büchi automaton. An overview of the event flow can be summarized as in this figure. A set of anomalies is detected in the operation of the system to be protected (during operations by a user/some users). This triggers the operation of GIDA. This information is then used to identify the possible attack/attacks going on, using the taxonomy AVOIDIT. This gives the attack components and the relevant game models. Using both of these in ADAPT, the relevant game model is obtained.

Figure 6 gives a succinct event-flow diagram of the events in terms of interactions between a user and an administrator. From this figure and the previous one, we can see that though we need different game models with different characteristics, there is a commonality in the security games and the modeling requirements. Until now, most of the effort has focused on using game theory to address particular attacks. But to systematize this as a paradigm for security solutions, a generic game model becomes a necessity.

Figure 4. Generic game model’s utility.

Figure 5. Internal flow diagram.

Figure 6. Security event flow.

4. Game Model

The model considers that a player $k$, $k = 0, 1, \ldots, K$ (one administrator and $K$ users), observes the game's true state using an imperfect sensor or a set of imperfect sensors. That means player $k$ may view the present state $\xi_j$ to be any state in the information set $I_{\xi_j}^k = \{\xi_{j_1}, \xi_{j_2}, \ldots, \xi_{j_p}\}$, with $\xi_j$ being an element of $I_{\xi_j}^k$. The perceived action set at this state may be expanded, i.e., the player may decide to take an action which is allowed at some $\xi_{j_i} \neq \xi_j$ where $\xi_{j_i} \in I_{\xi_j}^k$. When the true state is $\xi_j$, let player $k$'s extended action set be $B_{\xi_j}^k = \bigcup_{\xi \in I_{\xi_j}^k} A_{\xi}^k$, where $A_{\xi_j}^k$ denotes the allowed action set of player $k$ at state $\xi_j$. If player $k$ takes an action $\alpha^k \in B_{\xi_j}^k$ when the true state is $\xi_j$ but $\alpha^k \notin A_{\xi_j}^k$, then in terms of the influence on state transition probabilities, $\alpha^k$ is considered equivalent to player $k$ taking no action at state $\xi_j$. However, its influence on player $k$'s payoff may not be equivalent to taking no action, depending upon the cost of the attempted execution of $\alpha^k$. Formally, the model is represented by a tuple $(S, E^1, E^2, A^1, A^2, Q, R^1, R^2, \beta)$ whose elements are defined below.

1) $S = \{\xi_1, \xi_2, \ldots, \xi_N\}$ is the set of states.

2) $E^k = \{E_{\xi_1}^k, E_{\xi_2}^k, \ldots, E_{\xi_N}^k\}$, $k = 0, 1$, where the $j$th set, $0 < j \le N$, $E_{\xi_j}^k = \{p_{ji}^k \mid 1 \le i \le m_j,\ \sum_{i=1}^{m_j} p_{ji}^k = 1,\ p_{ji}^k > 0\}$, represents the error probabilities of the $k$th player's sensor at the true state $\xi_j$ over the corresponding information set $I_{\xi_j}^k$. $I^k = \{I_{\xi_1}^k, I_{\xi_2}^k, \ldots, I_{\xi_N}^k\}$, $k = 0, 1$, where $I_{\xi_j}^k$ represents the information set of player $k$ when the true state is $\xi_j$, i.e.,

$$I_{\xi_j}^k = \{\xi_{j_1}, \xi_{j_2}, \ldots, \xi_{j_i}, \ldots, \xi_{j_{m_j}}\}$$

where $m_j = |I_{\xi_j}^k|$ and $\xi_{j_i} \in S$, with $m_j \le N$ being an integer indicating the number of states that have a possibility of being considered the current state at state $\xi_j$, with the condition that $\xi_j \in I_{\xi_j}^k$.

3) $A^k = \{A_{\xi_1}^k, A_{\xi_2}^k, \ldots, A_{\xi_N}^k\}$, $k = 0, 1$, is the action set of player $k$, where $A_{\xi_j}^k = \{\alpha_{j1}^k, \alpha_{j2}^k, \ldots, \alpha_{jM_k}^k\}$ is the action set of player $k$ at state $\xi_j$. Let

$$B^k = \{B_{\xi_1}^k, B_{\xi_2}^k, \ldots, B_{\xi_N}^k\}, \quad k = 0, 1$$

where $B_{\xi_j}^k$ represents the action set of player $k$ at $I_{\xi_j}^k$. That means $B_{\xi_j}^k = \bigcup_{\xi \in I_{\xi_j}^k} A_{\xi}^k$. By introducing different action sets at each state, we may get a distinct $B_{\xi_j}^k$ for each distinct $I_{\xi_j}^k$. Let $T_{\xi_j}^k = |B_{\xi_j}^k|$.

4) The state transition probabilities are represented by the function $Q : S \times B^1 \times B^2 \times S \to [0, 1]$, which maps a pair of states and a pair of actions to a real number between 0 and 1. The model assumes that for any state $\xi_j$, if player $k$ takes an action $\alpha^k \in B_{\xi_j}^k$ which does not belong to $A_{\xi_j}^k$, then

$$Q(\xi_{j_1}, \alpha^k, \alpha^l, \xi_{j_2}) = Q(\xi_{j_1}, \text{Normal\_Operation}, \alpha^l, \xi_{j_2})$$

where $l$ represents the other player.

5) The reward (the expected quantified value of the outcome) of player $k$ is determined by the function $R^k : S \times B^1 \times B^2 \to \mathbb{R}$, which maps a state and a pair of actions to a real number.

6) $\beta$, $0 < \beta < 1$, is the discount factor for discounting the future rewards in this infinite-horizon game.
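As a data layout, the tuple can be carried in plain arrays. The sketch below is one possible encoding for the two-player case (the field names are ours, not the paper's); the extended action sets $B^k$ need no storage of their own, since they are derived as unions of the $A^k$ over each information set.

```c
#define N_STATES  5   /* N in the model; 5 in the game of Section 5 */
#define N_PLAYERS 2   /* k = 0 (administrator), k = 1 (user)        */
#define MAX_ACT   5   /* largest per-state action set               */

typedef struct {
    /* E^k: p[k][j][i] = probability that player k perceives state i
     * when the true state is j (each row sums to 1). */
    double p[N_PLAYERS][N_STATES][N_STATES];
    /* A^k: number of actions of player k at each state. */
    int    n_act[N_PLAYERS][N_STATES];
    /* Q: transition probability from state s under action pair (a0, a1)
     * to state s2; actions outside A^k_s are treated as Normal_Operation. */
    double Q[N_STATES][MAX_ACT][MAX_ACT][N_STATES];
    /* R^k: reward of player k at state s for the action pair (a0, a1). */
    double R[N_PLAYERS][N_STATES][MAX_ACT][MAX_ACT];
    double beta;  /* discount factor, 0 < beta < 1 */
} game_model;
```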

4.1. The Error Distribution

As $E^k$ represents the set of error probabilities of player $k$, let us consider the set of error probabilities $E_{\xi_j}^k$ with the current state being $\xi_j$. Let the error of player $k$'s sensor at the state $\xi_j$ be $\gamma_j^k$, $0 \le \gamma_j^k < 1$. The error $\gamma_j^k$ is always less than 1 because the real state $\xi_j$ is always taken as an element of the information set $I_{\xi_j}^k$ at $\xi_j$. Then at the current state $\xi_j$, let the probabilities with which the administrator perceives the current state to be $\xi_1$, $\xi_2$, $\xi_3$ and $\xi_4$ be $p_{j1}^k$, $p_{j2}^k$, $p_{j3}^k$ and $p_{j4}^k$ respectively. The error at state $\xi_j$ is then $\gamma_j^k = (\sum_{i=1}^{N} p_{ji}^k) - p_{jj}^k = 1 - p_{jj}^k$. For $1 \le i, j \le N$, let $\omega_{ji}^k$ be the set of sensor inputs to player $k$ indicating the current state to be $\xi_i$ while the real current state is $\xi_j$. In practice, the sensor can be a device or a collection of devices which collects values of some parameters of the system. All such parameters can be considered to form an orthogonal basis of a vector space, where some closed volume is taken to be associated with each state. All the points in that closed region get mapped to one state; all of them have values of the parameters which lead the player to perceive the current state to be that particular state. (Some of the parameters may yield discontinuous intervals of values associated with a single state, making a set of disconnected regions correspond to a single state. In such cases more involved mechanisms have to be devised, which is apt for future inquiry.)

Let the current real state be $\xi_j$, $1 \le j \le N$. At this state, let the sensor inputs at two different instances be $\omega_{ji}^k$ and $\omega_{jh}^k$, $1 \le i, h \le N$, due to erroneous sensors. These lead to perceptions of the current state as $\xi_i$ and $\xi_h$ respectively. Depending on the nature of the system, consider some representative statistical measure of central tendency (mean, mode, median and so on) of $\omega_{ji}^k$ and $\omega_{jh}^k$. Let $Ed_{jih}^k$ be the Euclidean distance between those measures. In this work, larger errors are assumed to be less probable than smaller errors in sensor operations; that is, $Ed_{jji}^k > Ed_{jjh}^k \Rightarrow p_{ji}^k < p_{jh}^k$. In the following game with 5 states, the sensor error is distributed over the three states other than the real current state. This means, for example, that if the sensor error at $\xi_1$ is 0.3, then the probability with which the administrator perceives the current state to be not $\xi_1$ is 0.3. This could result in perceiving the current state to be $\xi_2$, $\xi_3$ and $\xi_4$ with probabilities of 0.15, 0.1 and 0.05 respectively. From the sensor relation, we have the sensor error at state $\xi_j$, $\gamma_j^k = \sum_{i=1, i \ne j}^{N} p_{ji}^k$, related to the probabilities of the virtual states.

The sensor outputs values of the parameters which define states. A particular range of values of a particular set of parameters corresponds to a state. In fact, the general way to define it is a set of values, covering the range, for each of the parameters, corresponding to a state. If there are parameters whose values do not affect the decision on a particular state, then their ranges can accommodate any value.

The states of the system are formulated using the following:

・ values (ranges of values, each represented by an MCT, a measure of central tendency) corresponding to a set of parameters (e.g. download bandwidth used, upload bandwidth used, number of FTP requests by a user, …);

・ a vector space with these parameters constituting the basis (orthogonal/orthonormal, depending on the parameters' correlation).

Thus we can formulate the probabilities of an observed instant being in the defined states as follows.

$$S \longleftrightarrow S_O$$

where

$$S_O = \{S_{O_0}, S_{O_1}, S_{O_2}, \ldots, S_{O_N}\},$$

$$S_{O_i} = \{so_{i0}, so_{i1}, so_{i2}, \ldots, so_{iF_i}\}$$

$so_{ij} = (so_{ij}(0), so_{ij}(1), so_{ij}(2), \ldots, so_{ij}(g))$ is the $j$th $g$-dimensional sequence corresponding to the $i$th state, $\xi_i$.

$so_{ij}(l)$ = an MCT, a representative value of the expected range of values for the $l$th parameter in the $j$th behavior of $\xi_i$; $l \in \mathbb{N}$, $0 \le l \le g$.

$g$ = the number of parameters observed by the sensors; $0 \le j \le F_i$, where $F_i + 1$ disjoint ranges correspond to state $\xi_i$.

When an anomaly is detected in terms of sensor values which do not exactly fit into any particular state, and there is uncertainty about which state the current values belong to, imperfect information must be considered.

・ Current observed values of the parameters forming the anomaly: $CV = (cv_0, cv_1, cv_2, \ldots, cv_g)$ gives the current values of the $g$ parameters.

・ Consider the minimum of the Euclidean distances to the elements in each state, $(\Delta_{0 f_0}, \Delta_{1 f_1}, \Delta_{2 f_2}, \ldots, \Delta_{N f_N})$, where $f_i$, $0 \le f_i \le F_i$, indexes the element $so_{i f_i}$ of the $i$th state that has the minimum distance from $CV$:

$$\Delta_{i f_i} = \min_{0 \le h \le F_i} \sqrt{\sum_{j=0}^{g} \left(cv_j - so_{ih}(j)\right)^2} \quad (1)$$

The error in the perception of states is given by

$$p_{ji} = \frac{\sum_{l=0}^{N} \Delta_{l f_l} - \Delta_{i f_i}}{\sum_{l=0}^{N} \Delta_{l f_l}} \quad (2)$$

And the error distribution is given by:

・ the error at state $\xi_j$,

$$\gamma_j^k = \left(\sum_{i=1}^{N} p_{ji}^k\right) - p_{jj}^k = 1 - p_{jj}^k \quad (3)$$

・ the assumption that larger errors are less probable than smaller errors.

A simple illustration of its application is given below.

Example:

Suppose an office has workers' network traffic profiles as below. (For other kinds of parameters, with crucial readings transmitted across a channel, edit distance can also be a useful measure for finding errors.)

Low_Privilege State

8AM - 6PM = {Download ≤ 1 Mbps, Upload ≤ 150 kbps}

This behavior is low-privilege: a worker working at his computer in the office.

High_Privilege State

8AM - 6PM = {Download ≤ 2 Mbps, Upload ≤ 150 kbps}

This behavior is high-privilege: a manager working at his computer.

Sensor Readings: On some day, at 3:30 PM, there is 1.7 Mbps download and 150 kbps upload. With mean downloads of 1 Mbps (Low_Privilege) and 2 Mbps (High_Privilege), the distances are 0.7 and 0.3, and Equation (2) gives:

1) $\frac{(0.7 + 0.3) - 0.7}{(0.7 + 0.3)} = 0.3$ probability of being in the Low_Privilege state (mean download: 1 Mbps).

2) $\frac{(0.7 + 0.3) - 0.3}{(0.7 + 0.3)} = 0.7$ probability of being in the High_Privilege state (mean download: 2 Mbps).
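A minimal sketch of Equations (1)-(3) applied to this example, assuming the mean (MCT) vectors (1 Mbps, 150 kbps) and (2 Mbps, 150 kbps) for the two states; compiled with `-lm`, it prints the 0.3/0.7 split derived above.

```c
#include <math.h>
#include <stdio.h>

#define N_PARAMS 2  /* download (Mbps), upload (Mbps) */
#define N_ST     2  /* Low_Privilege, High_Privilege  */

/* Equation (1): Euclidean distance from the current reading cv to a
 * state's representative (MCT) vector so. */
static double dist(const double cv[], const double so[]) {
    double s = 0.0;
    for (int j = 0; j < N_PARAMS; j++)
        s += (cv[j] - so[j]) * (cv[j] - so[j]);
    return sqrt(s);
}

int main(void) {
    double so[N_ST][N_PARAMS] = { {1.0, 0.150},    /* Low_Privilege mean  */
                                  {2.0, 0.150} };  /* High_Privilege mean */
    double cv[N_PARAMS] = {1.7, 0.150};            /* the 3:30 PM reading */

    double d[N_ST], sum = 0.0;
    for (int i = 0; i < N_ST; i++) { d[i] = dist(cv, so[i]); sum += d[i]; }

    /* Equation (2): p_i = (sum - d_i) / sum. With the true state being
     * Low_Privilege, Equation (3) gives the error gamma = 1 - p_low. */
    for (int i = 0; i < N_ST; i++)
        printf("P(state %d) = %.2f\n", i, (sum - d[i]) / sum);
    return 0;
}
```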

4.2. Reward and Transition Functions


In a given state of the game, the players' rewards are decided by the actions they select, not by the resulting state transition that happens depending on such a selection. The rewards are affected by:

・ the cost of attempting/executing an action; let the unit vector representing this be $a_1$;

・ the desirability of the possible state change brought by the selection of an action; let the unit vector representing this be $a_2$;

・ the amount of information being divulged to the other players regarding the intention made explicit by the selection of an action; let the unit vector representing this be $a_3$.

Depending upon the system, these aspects may mutually affect each other or be independent of each other. The reward is located in the vector space constructed with these unit vectors as basis. For $i, j \in \{1, 2, 3\}$, $i \ne j$: in a system where each of the aspects $a_1$, $a_2$ and $a_3$ fully affects the others (collinear), $a_i \cdot a_j = 1$; where they are mutually independent (orthogonal), $a_i \cdot a_j = 0$; and where they partially affect each other, $0 < a_i \cdot a_j < 1$. This is illustrated in Appendix A.
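One way to realize this is to collect the pairwise dot products $a_i \cdot a_j$ in a Gram matrix $G$ and evaluate a reward vector $r = (c_1, c_2, c_3)$ through the quadratic form $r^{\mathrm{T}} G r$. The sketch below follows that reading; the matrix entries are illustrative assumptions, not values from the paper.

```c
#include <stdio.h>

/* Squared magnitude of a reward vector c in a basis whose Gram matrix is
 * G[i][j] = a_i . a_j: 1 on the diagonal; off-diagonal 0 = independent
 * (orthogonal) aspects, 1 = collinear, in between = partial influence. */
static double reward_sq(const double c[3], const double G[3][3]) {
    double s = 0.0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            s += c[i] * G[i][j] * c[j];
    return s;
}

int main(void) {
    double G[3][3] = { {1.0, 0.2, 0.0},    /* cost vs desirability: partial     */
                       {0.2, 1.0, 0.5},    /* desirability vs divulged info     */
                       {0.0, 0.5, 1.0} };  /* cost vs divulged info: orthogonal */
    double c[3] = { -10.0, 100.0, -10.0 }; /* components along a_1, a_2, a_3 */
    printf("squared reward magnitude = %.1f\n", reward_sq(c, G));
    return 0;
}
```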

The transition depends on the actions selected by the players in the current state and on the stochasticity involved. The stochasticity is neither arbitrary nor related to the instability of the system per se; it provides for the possibility of the state transition being propelled by the actions of players with conflicting interests.

4.3. Equilibrium

Given the information set associated with each state $j$, $0 \le j \le 4$, the corresponding mixed strategy is given as

$$\pi^k(\xi_j) = [\pi^k(\xi_i, \alpha_{i1}^k), \pi^k(\xi_i, \alpha_{i2}^k), \ldots, \pi^k(\xi_i, \alpha_{il}^k)],$$

where $\xi_i \in I_{\xi_j}^k$, $l = |A_{\xi_i}^k|$ and $\alpha_i^k \in B_{\xi_j}^k$. Thus $|\pi^k(\xi_j)| = T_{\xi_j}^k$. We are considering general-sum games. When the game starts, say at $\xi_j$, with the input of state $\xi_j$ and each player's selected action $(\alpha_j^0, \alpha_j^1, \alpha_j^2, \ldots, \alpha_j^K)$, we calculate the reward at state $\xi_j$ of each player, $r_j^k$, $0 \le k \le K$, as per the reward function $R^k$. The net reward incurred by player $k$ is

$$\nu(\pi^0, \pi^1, \pi^2, \ldots, \pi^K)^k = \sum_{j=0}^{\log_\beta y} \beta^j q_j \, \nu(\pi^0, \pi^1, \pi^2, \ldots, \pi^K)^k(\xi_j),$$

where $q_j$ is the probability of reaching the state $\xi_j \in S$ at the $j$th step during the game's execution: $q_j = \prod_{i=0}^{j} q'_i$, with $q'_0 = 1$ since the game does start at the state with $j = 0$, and $q'_j$ the probability of the game reaching $\xi_j$ from $\xi_{j-1}$. $y$ is the effective discounting threshold deciding the horizon, beyond which further rewards add no significant value. Since the game is an imperfect-information game, $B_{\xi_1}^0$ gives the perceived action set. Here, if the states $\xi_1$, $\xi_2$, $\xi_3$ are perceived to be the current state with probabilities of, say, 0.1, 0.3 and 0.6 respectively, the probabilities associated with the actions of each state get multiplied by these corresponding probabilities. With this extended action set, the selection is done by each player, and their rewards are calculated for this step.

We know that the actions belonging to $\xi_2$ and $\xi_3$ cannot be executed, and their selection is equated to selecting the normal operation of $\xi_1$ for the effect on state transitions. Based on these probabilities, the effective action selection is evaluated for the player. Thus the Nash equilibrium is the strategy profile

$$(\pi^{*0}, \pi^{*1}, \pi^{*2}, \ldots, \pi^{*K}) \;\big|\; c^0 \,\&\, c^1 \,\&\, \cdots \,\&\, c^k \,\&\, \cdots \,\&\, c^K$$

where $c^k$: $\nu(\pi^{*0}, \pi^{*1}, \pi^{*2}, \ldots, \pi_i^k, \ldots, \pi^{*K})^k \le \nu(\pi^{*0}, \pi^{*1}, \pi^{*2}, \ldots, \pi^{*k}, \ldots, \pi^{*K})^k$, $\forall i$, $\forall c^k \mid 0 \le k \le K$.
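The horizon arithmetic in the net-reward sum can be made concrete as below: the loop accumulates $\beta^j q_j v_j$ while $\beta^j$ stays above the threshold $y$, i.e. for $j \le \log_\beta y$. The per-step values and reachability probabilities here are illustrative placeholders.

```c
#include <stdio.h>

/* Net discounted reward: sum over steps j of beta^j * q_j * v_j, where
 * q_j is the running product of the step probabilities q'_0 ... q'_j and
 * the loop stops once beta^j falls to the horizon threshold y. */
static double net_reward(const double v[], const double qstep[], int n,
                         double beta, double y) {
    double total = 0.0, disc = 1.0, q = 1.0;
    for (int j = 0; j < n && disc > y; j++) {
        q *= qstep[j];            /* probability of the play reaching step j */
        total += disc * q * v[j];
        disc *= beta;
    }
    return total;
}

int main(void) {
    double v[]     = { 0.0, 10.0, 100.0, -100.0 }; /* per-step state values      */
    double qstep[] = { 1.0, 0.9,  0.5,    0.7   }; /* q'_0 = 1: game starts here */
    printf("net reward = %.3f\n", net_reward(v, qstep, 4, 0.8, 0.05));
    return 0;
}
```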

5. The Game

In this section we present the game with five states, based on the model in Section 4, to address the security situations. In general, a user's presence in a network system can be classified as having low or high privileges. The exact privileges may differ depending on the networks and situations, but this distinction provides distinct premises for the access, possible actions and responsibilities of the user. Thus the activities, both malicious and benign, security-related or not, can be respectively categorized and represented by formulating two different states. When a malicious user successfully attacks, and when a suspicious user gets trapped in a honeypot by the administrator for investigation, the two resulting situations are formulated as two other states. When the user logs out of the system, he is considered to be out there on the Internet, in another state. Since security-oriented activity happens only after the user is at least in the low-privilege state, the game explicitly begins at the Low_Privilege_State, as shown in Figure 7. Thus these 5 states provide a relevant representation of the general conditions of user activity and hence of the user-administrator interaction.

$S$ = {Internet_State ($\xi_0$), Low_Privilege_State ($\xi_1$), High_Privilege_State ($\xi_2$), Attack_State ($\xi_3$), Trap_State ($\xi_4$)}

$\xi_0$ = Internet_State: In this state, the user is out in the world, connected to the Internet. He has to pass the Sign_In authentication successfully to get into the system. For this, the user must take the Sign-In-Request action and the administrator must grant it by taking the Sign_In action for the user. Normal operation in this state by the user means that he is not sending a legitimate request for Sign_In and, just like millions on the net, is simply out there at some IP address. Normal operation in this state by the administrator means that he is not signing in this user by granting him permission to be in the system. Cracking efforts by a user to break into the system are also captured by the user's normal operation; blocking, refusal to let him in, or the system simply being unavailable are likewise captured by the administrator's normal operation. This is to emphasize that the main focus is the interaction with a user while he is already in the system. The game reaches this state:

・ when the administrator successfully signs out a user from any of the other 4 states, in response to any choice of action by the user at that state, including a Sign_Out_Request:

-when the user, having completed his legitimate jobs in the system, goes away;

-when the malicious user, having completed his malicious tasks, escapes out of the system into the Internet;

Figure 7. The game: States & transition.

・ when a person connected to the Internet tries to sign in to the system with a Sign_In_Request but the administrator refuses to let him in;

・ when a person connected to the Internet tries to sign in to the system with a Sign_In_Request and the administrator allows it, but the operation fails (due to a bad connection, a system problem, etc., captured by the stochasticity) and the user remains outside the system on the Internet.

Thus the action sets of the administrator and the user are as below.

$A_{\xi_0}^0$ = {Normal_Operation_IS_A ($\alpha_{00}^0$), Sign-In_IS ($\alpha_{01}^0$)}

$A_{\xi_0}^1$ = {Normal_Operation_IS_U ($\alpha_{00}^1$), Sign-In_Request_IS ($\alpha_{01}^1$)}

$\xi_1$ = Low_Privilege_State: In this state the user is in the system, having passed the initial authentication. This is the state in which the administrator can take notice of the user's activity. Any anomaly in the user's behavior starts attracting the administrator's attention/concern. The game can reach this state:

・ when the user is already in the system after logging in, having passed the initial authentication successfully. The administrator can observe the user's behavior, though he is not particularly concerned to begin with;

・ when the transition from $\xi_1$ to $\xi_2$ does not succeed, either due to the transition probability (the system's stochasticity) or because the administrator decides not to promote the user to $\xi_2$;

・ when the user in $\xi_2$ acts to return to low privilege, the administrator demotes him, and the transition succeeds;

・ when the administrator decides at $\xi_4$ that a user does not have any malicious intention and lets him back into the system;

・ when the user who successfully attacked the system at $\xi_3$ returns to legitimate, low-key behavior within the system, either to continue to be so or to elope later. This happens when the user takes the action Return_LPS, the administrator takes the action Normal_Operation_AS_A ($\alpha_{30}^0$), and the transition succeeds as per the occurrence ratio.

$A_{\xi_1}^0$ = {Normal_Operation_LPS_A ($\alpha_{10}^0$), Sign-Out_LPS ($\alpha_{11}^0$), Promote ($\alpha_{12}^0$), Defend_LPS ($\alpha_{13}^0$), Trap_LPS ($\alpha_{14}^0$)}

$A_{\xi_1}^1$ = {Normal_Operation_LPS_U ($\alpha_{10}^1$), Sign-Out_Request_LPS ($\alpha_{11}^1$), Privilege_Request ($\alpha_{12}^1$), Attack_LPS ($\alpha_{13}^1$), Resist_LPS ($\alpha_{14}^1$)}

Here we note that Normal_Operation is an abstraction of the actions that are typical of a state and have no security consequences. Since these differ across states for both players, we denote each with a different notation to distinguish them.

$\xi_2$ = High_Privilege_State: In this state the user has gained more privileges than in $\xi_1$. For example, a user who entered to check his bank account balance and has gone into the (higher-privilege) editing mode for account information, like the passcode, usually has to go through more authentication. The game reaches this state:

・ when the user successfully receives the privileges from the administrator to move up from the Low_Privilege_State. The user should choose the action Privilege_Request and the administrator should choose the action Promote; depending on the game's stochasticity, if the transition succeeds, $\xi_2$ is reached;

・ when the user continues to operate with the privileges granted by the administrator, neither attacking nor being trapped by the administrator. That is, when both players choose Normal_Operation, the game may remain here, depending on the occurrence ratio. (Alternatively, when the transitions from $\xi_2$ fail to reach the three other states, the game remains in this state, as shown in Figure 7.)

・ when the user, after successfully attacking the system in $\xi_3$ and completing the intended damage, returns to operating as a high-privilege user. The user should choose Return_HPS, the administrator Normal_Operation_AS_A, and the transition should succeed.

$A_{\xi_2}^0$ = {Normal_Operation_HPS_A ($\alpha_{20}^0$), Sign-Out_HPS ($\alpha_{21}^0$), Demote ($\alpha_{22}^0$), Defend_HPS ($\alpha_{23}^0$), Trap_HPS ($\alpha_{24}^0$)}

$A_{\xi_2}^1$ = {Normal_Operation_HPS_U ($\alpha_{20}^1$), Sign-Out_Request_HPS ($\alpha_{21}^1$), Privilege_Relinquish ($\alpha_{22}^1$), Attack_HPS ($\alpha_{23}^1$), Resist_HPS ($\alpha_{24}^1$)}


$\xi_3$ = Attack_State: In this state the user has already attacked the system successfully. He is either accessing others' confidential data, damaging the system, or doing some other harm. Here he is either performing apparently normal operations which are actually attacking activity, eloping along with the confidential data, or returning to legitimate behavior with high or low privileges in $\xi_2$ or $\xi_1$. He would be concerned with wiping out, as far as possible, any trace of his attack before he goes away. In this state the administrator can revive the system, which is like restarting a file server, for example, if the file server is found compromised. This can be accompanied by sending the suspected user either out of the system or to the honeypot, the latter taking the game to $\xi_4$ with the action Trap_AS. The game reaches this state:

・ when the user successfully attacks from $\xi_1$;

・ when the user successfully attacks from $\xi_2$;

・ when the user succeeds in behaving inconspicuously with Normal_Operation, continuing to damage the system. (This is typical of a situation where the traces are not getting wiped out even though the user is trying to get back to the guise of a normal user.)

・ when the transitions to the other states fail, as per the stochasticity of the system.

$A_{\xi_3}^0$ = {Normal_Operation_AS_A ($\alpha_{30}^0$), Sign-Out_AS ($\alpha_{31}^0$), Revive_LPS ($\alpha_{32}^0$), Revive_HPS ($\alpha_{33}^0$), Trap_AS ($\alpha_{34}^0$)}

$A_{\xi_3}^1$ = {Normal_Operation_AS_U ($\alpha_{30}^1$), Sign-Out_Request_AS ($\alpha_{31}^1$), Increase_Attack ($\alpha_{32}^1$), Return_LPS ($\alpha_{33}^1$), Return_HPS ($\alpha_{34}^1$)}

$\xi_4$ = Trap_State: In this state the administrator, having suspected the user in $\xi_1$, $\xi_2$ or $\xi_3$ of being an attacker, has successfully sent him to the honeypot. In this state the user may either enact Normal_Operation actions or, suspecting that he has been trapped, take the Sign-Out-Request action, which means either trying to close all his operations and log out at the earliest, or engaging with the administrator's ploy while behaving like a benign user. The user can also confirm his malicious intention by enacting it during the inquiry. The game reaches this state:

・ when the administrator successfully traps into the Honeypot a user suspected of some malicious intention;

・ when the user with no malicious intention continues to perform Normal_Operation, oblivious of being trapped, and the administrator continues to observe him at the Honeypot; alternatively, when a user with malicious intention becomes suspicious of being trapped and starts behaving;

・ when the user who originally had malicious intention tries to attack the false targets provided by the administrator, and tries to escape by choosing the Sign-Out-Request action but does not succeed;

・ when the administrator identifies the user who has successfully attacked the system and decides to trap him in the honeypot while the user is trying to return to $\xi_1$ or $\xi_2$.

In each of these actions the user engages with the administrator's ploy. In the end, the administrator may either declare the user innocent and let him back into the system through $\xi_1$, or catch him and take further punitive actions. The fact that the user has already created suspicion makes the administrator let him back into $\xi_1$ to earn his higher privileges again, rather than letting him directly enter $\xi_2$.

$A_{\xi_4}^0$ = {Normal_Operation_TS_A ($\alpha_{40}^0$), Sign-Out_TS ($\alpha_{41}^0$), Test_TS ($\alpha_{42}^0$), Judge_TS ($\alpha_{43}^0$), Allow_TS ($\alpha_{44}^0$)}

$A_{\xi_4}^1$ = {Normal_Operation_TS_U ($\alpha_{40}^1$), Sign-Out_Request_TS ($\alpha_{41}^1$), Commit ($\alpha_{42}^1$), Behave ($\alpha_{43}^1$)}

Consequently, the action spaces of the players are summarized as below:

User_Action_Space $= A^1 = A_{\xi_0}^1 \cup A_{\xi_1}^1 \cup A_{\xi_2}^1 \cup A_{\xi_3}^1 \cup A_{\xi_4}^1$

Administrator_Action_Space $= A^0 = A_{\xi_0}^0 \cup A_{\xi_1}^0 \cup A_{\xi_2}^0 \cup A_{\xi_3}^0 \cup A_{\xi_4}^0$

With the above information about the states and actions, the other 6 components of the 9-tuple, $E^1, E^2, Q, R^1, R^2, \beta$, are suitably designed to reflect the system in question. We present a numerical simulation with one such typical set of values in Section 7.

6. Game Analysis

In the general case, each of the five states being perceived with a non-zero probability at each state would make the player's whole action space the extended action set at each state. But for the game to go into the fifth state, the administrator has to explicitly take the user out of the main system and put him into the honeypot. Since neither the stochasticity of the game nor the user can accidentally take the game to this state, the administrator cannot mistakenly believe he is in it while being in the other states. Thus $p_{ji}^0 = 0$ for $j \ne i$ when $j = 4$ or $i = 4$, while $0 \le p_{ji}^0$ and $\sum_{i=1}^{N} p_{ji}^0 = 1$, $1 \le i, j \le N$. Thus $I_{\xi_j}^0 = \{\xi_1, \xi_2, \xi_3\}$, $1 \le j \le 3$. Hence, the extended action set corresponding to the information set, for $j \ne 4$, would be

$$B_{\xi_j}^0 = \{\alpha_{00}^0, \alpha_{01}^0, \alpha_{10}^0, \alpha_{11}^0, \alpha_{12}^0, \alpha_{13}^0, \alpha_{14}^0, \alpha_{20}^0, \alpha_{21}^0, \alpha_{22}^0, \alpha_{23}^0, \alpha_{24}^0, \alpha_{30}^0, \alpha_{31}^0, \alpha_{32}^0, \alpha_{33}^0, \alpha_{34}^0, \alpha_{40}^0, \alpha_{41}^0, \alpha_{42}^0, \alpha_{43}^0, \alpha_{44}^0\}$$

In the Trap state, the administrator's responses try to delude the user into believing his current state is his preferred state. We consider the user's sensor to be perfect in general, which is the worst-case scenario. Thus the user's view of the states can be defined as $p_{ji}^1 = 0$ if $j \ne i$, $j \ne 4$, and $p_{jj}^1 = 1$ if $j \ne 4$. Since the user can also have a honeypot detection system, $p_{44}^1 > 0$. The administrator's ploy at $\xi_4$ need not be only a passive, reactive one; a motivation for a proactive one is suggested with the carrot-and-stick approach in [5] . [19] and [20] discuss scenarios where the inquiry into the nature and motivation of the attacker is treated.

Based on the above structure, the game was simulated and evaluated for rewards, to find the preferred strategy profiles.

7. Simulation

The game starts only after a sensor reports an anomalous action by a user. Thus there is suspicion in the admin and a plausibility of malice in the user. At this stage it is both an incomplete- and imperfect-information game, with the admin being sure of neither the user's nature nor the current state of the game. The rewards for each feature affecting the game are symbolically distributed over the discrete space defined by $\{-1000, -100, -10, 0, 10, 100, 1000\}$. The logarithmically varied values are chosen to represent how the expense of actions and the potential outcomes of actions differ. Though this has to be fine-tuned for different attacks based on their specifics, the setup gives us a general idea. Taking a Normal_Operation action is a low-expense action for both the administrator and the user to attempt and execute in the context of the game. Thus it leads to a 0 payoff for both of them in $\xi_1$ and $\xi_2$. Similarly, the sign-out request from the user and the sign-out from the admin lead, with high probability, to the exit of the user from the system. This does not progress the interaction further, but it is a logical, non-antagonizing move. That way, the payoff to both is 10, which is not nothing, but not much.

When the user chooses to attack and the admin chooses the action Defend, the attack is least likely to succeed. This also reveals the user's intention to attack to the admin. Thus this leads to −100 for the user and 100 for the admin. Similarly, when the admin chooses the action Trap and the user chooses Resist, the admin's action is least likely to succeed. This also makes the user aware that the admin is trying to trap him, and he may thus start behaving, leaving the administrator no chance of discovering his real motive, malicious or not. This hence results in −100 for the admin and 100 for the user. These figures also account for the expense of resources needed to take these actions and the actions' impact on the game.

The aggressive actions are the most expensive actions for the players. The exact defensive counter-actions result in the least likelihood of the opponents' aggressive actions succeeding. These defensive actions are the second most expensive actions to attempt/execute. Aggression also yields a higher payoff to the player than defense, if it succeeds. When one player takes an aggressive action, the other player taking Normal_Operation or general state-transiting actions suffers a greater loss, with a higher gain to the aggressor. The game dynamics at state $\xi_1$ are explained in Appendix A.
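Collecting the payoff pairs that the last three paragraphs pin down at $\xi_1$ gives the fragment below (user payoff first, admin payoff second); the remaining action pairs are tuned per attack, so they are not listed.

```c
/* Payoff entries at state xi_1 fixed by the text above; the rest of the
 * 5 x 5 bimatrix is tuned to the specific attack being modeled. */
typedef struct {
    const char *user_act, *admin_act;
    int user_payoff, admin_payoff;
} payoff_t;

static const payoff_t XI1_PAYOFFS[] = {
    { "Normal_Operation_LPS_U", "Normal_Operation_LPS_A",    0,    0 },
    { "Sign-Out_Request_LPS",   "Sign-Out_LPS",             10,   10 },
    { "Attack_LPS",             "Defend_LPS",             -100,  100 },
    { "Resist_LPS",             "Trap_LPS",                100, -100 },
};
```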

For computational reasons we restricted the simulation with some constraints. The game was programmed in C and run on Debian 7.6 GNU/Linux on an Intel Core i5 machine with 4 GB of RAM.

1) We ran the simulation with 100 strategies, each varying in a gradation of aggression.

2) We find an approximate equilibrium, as explained in Appendix C.

3) We ran it for three types of users/attackers and delineate the findings. We simulated the games with 100 strategies for each player, playing each 100 times. For better visualization, we plot the rewards for 10 strategies (whose results are representative of those over the larger range), covering aggression in steps of 0.1 from 0 to 0.9, in Figures 8-10.

TL = Top Left, TR = Top Right, BL = Bottom Left, BR = Bottom Right, AR = Admin Reward, UR = User Reward, ARS = Admin Reward Values over 100 executions with mid-level (0.5) aggression for both in the strategy profile, URS = User Reward Values over 100 executions with mid-level (0.5) aggression for both in the strategy profile.

The simulation was conducted with three user profiles in terms of skill, sophistication and effectiveness: Below_Average, Average and Above_Average users. When, in the Low_Privilege_State, the Average user attacks and the administrator defends, playing the perfectly antagonistic response, there is a 0.5 probability that the attack succeeds. In this way, he is on par with the administrator in terms of imposing his will on the system's operation. Numerically, the Below_Average user would have a probability < 0.5 of succeeding, and the Above_Average user > 0.5, in such a situation. As this example illustrates the essential characteristics, all other transitions are similarly poised.
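The user profiles thus reduce to a single success probability in each antagonistic encounter; a minimal sketch of such a draw, using C's rand() as a placeholder for the game's stochasticity:

```c
#include <stdio.h>
#include <stdlib.h>

/* Attack_LPS vs. Defend_LPS at Low_Privilege_State: the attack succeeds
 * with probability equal to the user's skill (0.5 = Average,
 * < 0.5 = Below_Average, > 0.5 = Above_Average). */
static int attack_succeeds(double skill) {
    return rand() < skill * ((double)RAND_MAX + 1.0);
}

int main(void) {
    srand(7);  /* fixed seed for a reproducible draw */
    printf("average user, one trial: %s\n",
           attack_succeeds(0.5) ? "attack succeeds" : "defense holds");
    return 0;
}
```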

7.1. Results

Since the game is stochastic, the same strategy profile can produce different rewards for the players. So we ran the game for each strategy profile 100 times and picked the average reward. We evaluate a preferred strategy in terms of the discretized gradation of aggression, as discussed in Appendix C. Figures 8-10 show the rewards of the administrator and the users, and the spread of rewards during multiple executions for a typical strategy profile.
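The averaging step is straightforward; in the sketch below, play_once is a stand-in for one full stochastic execution of the game under a strategy profile, here faked with noise so the fragment runs on its own.

```c
#include <stdio.h>
#include <stdlib.h>

#define RUNS 100

/* Stand-in for one stochastic execution of the game under the profile
 * (user_aggr, admin_aggr), indices into the 0.0-0.9 aggression grid. */
static double play_once(int user_aggr, int admin_aggr) {
    return 10.0 * (admin_aggr - user_aggr) + (rand() % 21 - 10); /* fake noise */
}

/* Average reward of a profile over RUNS executions, as done for each
 * of the 100 x 100 strategy profiles in the simulation. */
static double avg_reward(int user_aggr, int admin_aggr) {
    double sum = 0.0;
    for (int r = 0; r < RUNS; r++)
        sum += play_once(user_aggr, admin_aggr);
    return sum / RUNS;
}

int main(void) {
    printf("avg reward at profile (5,5): %.2f\n", avg_reward(5, 5));
    return 0;
}
```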

Figure 8. Below-average-user: TL-AR, TR-UR, BL-ARS, BR-URS.

Figure 9. Average-user: TL-AR, TR-UR, BL-ARS, BR-URS.

Figure 10. Above-Average-User: TL-AR, TR-UR, BL-ARS, BR-URS.

We found many $\epsilon$-NOSPs with $\epsilon = 0.01$, and a different prescribed one in each case. The prescription is based on the least motivation for the attacker to deviate; hence the chance of the game going as expected is higher.

7.1.1. Below_Average User (BAU)

$(7, 2)$ is the prescribed $\epsilon$-NOSP. It recommends that the user and administrator be least aggressive. The game starts with some anomaly by the user, and thus, if the user is perceived as a BAU, both players are better off curtailing their aggression.

7.1.2. Average User (AU)

$(8, 6)$ is the prescribed $\epsilon$-NOSP. It recommends that the user be less aggressive, as he is more effective, while the administrator should be quite aggressive.

7.1.3. Above_Average User (AAU)

$(4, 1)$ is the prescribed $\epsilon$-NOSP. It recommends that the user and administrator be least aggressive.

Among the possible strategies of the players, graded on aggression, this gives a preference for achieving the optimal reward. Based on these results, we can infer that during the interaction, due to the combination of factors like capacity for domination and secrecy of intent, the players are better off taking the most action when it is most advantageous to them.

Conjecture 1: The aggression needed by the administrator to achieve equilibrium follows a normal distribution over the power of the user, with the mean where the powers of both are equal, while interacting with the same reward functions and discount factor.

Conjecture 2: The aggression needed by the administrator to achieve equilibrium would vary from a linear to a normal distribution as the depth of the game tree increases, for the same reward functions and user. The depth of the game tree is a function of the discount factor and the threshold discount.

8. n-Player Game

The game model presented in Section 4 considers $n$ players, $n \in \mathbb{Z}$, $0 < n < \infty$, so that the interactions between many users and administrators can be modeled. As a practical matter, considering a single administrator is convenient and sufficient for most cases, so the interaction between many users and one administrator is the interesting case to investigate. In such a situation, the users can be disparate, each pursuing his own disjoint motivations, or they can work as a group. The administrator, in turn, can interact separately with many users, or interact with a group of users through actions that collectively affect all of them. In the light of the game model and the game presented in Sections 4 and 5 respectively, we derive the conditions under which group behavior, and the administrator's reaction to a group, are optimal.

For each state $j$, let $r_{j\max}^k$ be the reward associated with the preferred action choice of user $k$. During the course of the game, co-operation pays if

$$\sum_{j=\xi_i}^{j=\xi_f} r_{j\max}^k < \sum_{j=\xi_i}^{j=\xi_f} r_{j,co}^k, \quad \forall k \in S_k,$$

with $S_k = \{k_1, k_2, \ldots, k_a\}$, where $r_{j,co}^k$ is the reward of the $k$th player at the $j$th state when taking co-operative actions, and $a$ is the number of users whose motivations can be pursued in co-operation. Two trees of game progress are considered here. In the first case, each player takes actions to fully maximize his own total reward during the sequence of game play; that is, he tries to maximize the probability of the states which maximize his reward. In doing so, two such players may drive the game progression such that the final sequence of play gives each of them a total reward far below his maximum possible value. That reward can be lower than the one from a compromised game transition sequence in which both have a higher probability of making a higher reward. Recognizing this point marks the threshold to co-operate.

Secondly, when there are two or more similarly engaged users, responding individually can be expensive for the administrator. Also, a response to one user may require an action whose effects on the system conflict with the interaction with another user. A motivating example: a user accessing a particular web server is behaving suspiciously, so the administrator may want to restart the web server and disconnect the suspicious user. If, simultaneously, another user is acting suspiciously with a database server running on the same machine, which has no other active processes at this time, then depending on the extent of the threat posed by both users, it may be optimal for the administrator to reboot the operating system itself to address both users. The condition is

$$\sum_{g=1}^{n} r_{j\max} < \sum_{g=1}^{n} r_{j,common}$$

where $r_{j\max}$ and $r_{j,common}$ are the rewards associated, respectively, with the actions taken in each individual game towards its optimal reward, played with the different users, and with the actions taken to collectively/commonly affect all the users concerned. Under this condition the administrator is better off playing a single game that treats all the users as a collective adversary.
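Both thresholds above are simple aggregate comparisons over estimated per-state rewards, so they can be checked directly once those rewards are estimated (e.g., from repeated game runs). The Python sketch below is illustrative only; the function names and reward arrays are hypothetical inputs, not quantities defined by the model.

```python
def users_should_cooperate(individual, cooperative):
    """Users' threshold: every user k in the candidate group S_k must
    expect a larger summed reward over the states xi_i..xi_f when
    co-operating than when pursuing his own optimum alone.

    individual[k][j] / cooperative[k][j]: reward of user k at state j."""
    return all(sum(ind) < sum(coop)
               for ind, coop in zip(individual, cooperative))

def admin_should_collectivize(per_user_optimal, common_response):
    """Administrator's threshold: the summed reward of n separate
    per-user optimal responses falls below that of one common response
    affecting all users (e.g., rebooting the shared machine)."""
    return sum(per_user_optimal) < sum(common_response)

# Hypothetical estimates for two suspicious users over three states:
indiv = [[2.0, 1.5, 1.0], [1.8, 1.2, 0.9]]
coop  = [[2.5, 2.0, 1.4], [2.2, 1.6, 1.1]]
print(users_should_cooperate(indiv, coop))                # True
print(admin_should_collectivize([3.1, 2.7], [3.4, 3.0]))  # True
```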

In particular, when many players are involved, the decision process becomes decentralized. If this is true of the defending side, then taking optimal actions such that overall security is best served becomes a challenge. This problem was addressed in [21], where it was shown that the individual optimum for each defender in his own game to safeguard his own asset may not serve the global good when the assets defended by different defenders are interconnected and interdependent.

9. On Engaging Suspicious Users

The only way the game enters the Trapped State $\xi_4$ is from the other states. In those states, the administrator can become suspicious about a particular user. After sufficient observation of such behavior, the administrator can become sure that it is not safe to have that user operating freely in the network, and for that reason sends him to a honeypot; the game then enters the Trapped State $\xi_4$. The administrator will then hold a belief about the user's motive, based on the user's actions at one or more of the other states ($\xi_1$, $\xi_2$, $\xi_3$) before the game entered $\xi_4$. For example, at a site offering audio streaming but no downloads, attempting a download (successfully or not) could be reason to put a user in the Trapped State; the administrator then holds the belief that the user wants to steal the audio files. The administrator does not yet know conclusively either the true nature or the motivation of the user. By nature, the user may be not an attacker, a naive attacker, or an advanced attacker; his motivation is what exactly he intends to do in the network. A normal user might have accidentally performed an action that was not allowed. This uncertainty on the administrator's side entails incomplete information in the game play. All further interactions of the administrator's ploy with the user are abstracted to take place in the single state $\xi_4$ of the main game in Figure 1.

Here, the prime motivation of the administrator is to learn the "original" motivation of the user with the minimum number of interactions, so as to identify the user's nature and motivation and thus determine the suitable response. Based on his current belief about the user's nature and motivation, the administrator creates the situation in state $\xi_4$: he provides some actions as choices for the user during each interaction step, such that the user's actions give the maximum information about his identity. The administrator has a finite set of such actions to offer the user in order to study his response. Offering them all at once is not advisable, as it confuses the user into changing his original plan, or makes him suspicious enough to start behaving like a good user altogether and perhaps even try to log out inconspicuously. So the actions should be provided as options at each step of this engagement, to facilitate optimal user identification. Such a ploy can be characterized by the following structure.

Ploy: Formal description: The administrator has a finite set of actions to offer to the user. Let this set be

$$A_{ploy} = \{ \alpha_{ploy_j}^{0}, \alpha_{ploy_j}^{1}, \alpha_{ploy_j}^{2}, \ldots, \alpha_{ploy_j}^{x} \} \quad (4)$$

where $x$ is the total number of action-options that the administrator can offer at any general step $j$ in the interaction. Based on the belief about the motivation of the user at the time of entering $\xi_4$, the administrator associates a probability distribution with the set $A_{ploy}$ to represent the user's expected preference: $p_{\alpha_{ploy_j}^{i}}$ is the probability with which the user is expected to prefer the action $\alpha_{ploy_j}^{i}$ at step $j$ while facing the ploy. This implies

$$\sum_{i=1}^{x} p_{\alpha_{ploy_j}^{i}} = 1 \quad (5)$$

Now the actions in the set $A_{ploy}$ are sorted in descending order of their associated probabilities.

$$A_{ploy}^{sorted} = \{ \alpha_{ploy}^{i_0}, \alpha_{ploy}^{i_1}, \ldots, \alpha_{ploy}^{i_x} \} \quad (6)$$

where

$$p_{\alpha_{ploy}^{i_k}} \geq p_{\alpha_{ploy}^{i_j}}, \quad \forall\, k \leq j. \quad (7)$$

Let the action set of the user at step 1 in $\xi_4$ be

$$A_{ploy_1}^{1} = \{ \alpha_{ploy_1}^{0}, \alpha_{ploy_1}^{1}, \ldots, \alpha_{ploy_1}^{y} \}. \quad (8)$$

And in general at any given step j the user action set would be

$$A_{ploy_j}^{1} = \{ \alpha_{ploy_j}^{0}, \alpha_{ploy_j}^{1}, \ldots, \alpha_{ploy_j}^{y} \}. \quad (9)$$

But this is essentially an action set similar in appearance to the one in $\xi_1$, $\xi_2$, or $\xi_3$ from which the user landed in $\xi_4$; the effect of the actions, however, is different. As is true for any mixed strategy game, there exists a Nash strategy profile for the user and the administrator here too. This implies that there is an action, guided by the user's Nash strategy as identified/expected by the administrator, that the user is better off taking at this step. Let $\alpha_{ploy_1}^{nash}$ be that action at step 1 in $\xi_4$. The administrator has to add or remove one or more actions at each step of the interaction, to make the action prescribed by the user's Nash strategy equiprobable with respect to the user's current action set. This leads us to the following theorem.

Theorem 1: In an incomplete information game, when the action prescribed by the Nash strategy for a user is equiprobable with respect to the other possible actions at a given state, the action selection is solely influenced by the original motivation of the player21. The player's selecting the prescribed action is a sufficient condition for the confirmation of the expected motivation (the belief about the user's payoff).

Proof:

$$\frac{p^{1}_{\alpha_{ploy_j}^{nash}}}{\sum_{i=1}^{y} p^{1}_{\alpha_{ploy_j}^{i}}} = p^{1}_{\alpha_{ploy_j}^{nash}} \approx \frac{1}{j_y} \quad (10)$$

where $j_y$ is the number of actions in the current action set at step $j$. The main rationale is that the situation artificially created at the Trapped State should neither dissuade nor encourage the user from taking the action $\alpha_{ploy_j}^{nash}$: the administrator cannot infer well if he makes the user take the action the administrator chose for him. From the point of view of information available to the user, he has as much information as he can have while still facing a genuine choice; the average information/Shannon entropy given to him is at its maximum. As established in Section 3.5, we have

$$\left( I_{ploy_j}^{1} \right)_{\max} = \sum_{i=1}^{y} p^{1}_{\alpha_{ploy_j}^{i}} \log\!\left( 1 / p^{1}_{\alpha_{ploy_j}^{i}} \right) \quad (11)$$

where $I_{ploy_j}^{1}$ is the average information over the action set, attained when

$$p^{1}_{\alpha_{ploy_j}^{i}} = \frac{1}{j_y}, \quad \forall\, i, \; 1 \leq i \leq y.$$

21And not by any reactionary behavior towards other players.

Even though all the individual actions may not be equiprobable for the user's selection, the main concern is the action prescribed by the Nash strategy and its probability of being selected. From Equations (3.3) and (3.4), the administrator has ensured that, if the user takes the Nash action, he does so in a situation of maximum average information. This provides sufficient confirmation to justify the previous belief. However, if the user takes a different action, that also provides the direction for updating the belief. In either case, the administrator is not influencing the user's decisions, and is thus learning the user's original intention most effectively. Equation (3.7) gives the ideal situation, but for practical purposes the minimum of the absolute value of the difference between its LHS and RHS is to be considered. When the difference is positive, the administrator is dissuading the user from the expected behavior; when it is negative, he is encouraging it. Both disturb the objective observation. Hence attaining the minimum absolute difference, as shown above, is the best the administrator can do.
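A minimal sketch of the ploy mechanics follows, assuming the administrator maintains belief-derived preference weights for the actions he can offer. It computes the Shannon entropy of Equation (11), measures the gap of the Nash-prescribed action from equiprobability per Equation (10), and greedily adds or removes candidate actions to shrink that gap. All names and weights here are hypothetical illustrations, not part of the model's definitions.

```python
import math

def entropy(probs):
    """Average information (Shannon entropy) over an action set."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def nash_gap(weights, nash_action):
    """|p_nash - 1/j_y|: distance of the Nash-prescribed action's
    selection probability from the uniform value over the set."""
    total = sum(weights.values())
    return abs(weights[nash_action] / total - 1.0 / len(weights))

def adjust_action_set(offered, candidates, nash_action):
    """Greedily add (or remove) one action at a time whenever doing so
    moves the Nash action closer to equiprobability.

    offered / candidates: belief-derived preference weights,
    mapping action name -> expected user preference."""
    actions = dict(offered)
    improved = True
    while improved:
        improved = False
        gap = nash_gap(actions, nash_action)
        for name, w in candidates.items():        # try additions
            if name not in actions:
                trial = dict(actions); trial[name] = w
                if nash_gap(trial, nash_action) < gap:
                    actions = trial
                    gap, improved = nash_gap(trial, nash_action), True
        for name in list(actions):                # try removals
            if name != nash_action and len(actions) > 2:
                trial = {k: v for k, v in actions.items() if k != name}
                if nash_gap(trial, nash_action) < gap:
                    actions = trial
                    gap, improved = nash_gap(trial, nash_action), True
    return actions

offered = {'download': 0.5, 'stream': 0.2, 'browse': 0.1}
candidates = {'share': 0.15, 'rate': 0.1, 'logout': 0.05}
print(adjust_action_set(offered, candidates, nash_action='download'))
```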

10. Conclusion

The Trap State provides a learning opportunity for the administrator while gaming with the user. The apposite game model would be a Bayesian game in which the belief about the user is updated as the game progresses; the Bayesian Nash Equilibrium then gives the useful solution concept to optimize the interaction, and exploring it within this model is the next step. The present model has static reward functions, which generate the rewards in a three-dimensional vector space; the next step is to make them dynamic and learned during game plays, which would make the model and the resulting games truly generic enough to capture many more scenarios. An implementation countering a real-life attack remains to be done, to evaluate the dynamics and the scope for improvements. A suitable method to compute the equilibria for the general case, and for security games in particular, is our planned future work.

Acknowledgements

I want to acknowledge Anupama for the times during which this work was done.

Cite this paper

Shandilya, V. and Shiva, S. (2017) On a Generic Security Game Model. Int. J. Communications, Network and System Sciences, 10, 142-172. https://doi.org/10.4236/ijcns.2017.107008

References

1. Shandilya, V. and Shiva, S. (2013) Security in the Cloud Based Systems: A Structure and Its Breaches. ICITST, London.

2. Zhang, M., Zheng, Z. and Shroff, N.B. (2015) A Game Theoretic Model for Defending against Stealthy Attacks with Limited Resources. Springer International Publishing, Cham, 93-112.

3. Lye, K.-W. and Wing, J.M. (2005) Game Strategies in Network Security. International Journal of Information Security, 4, 71-86. https://doi.org/10.1007/s10207-004-0060-x

4. Roy, S., Ellis, C., Shiva, S., Dasgupta, D., Shandilya, V. and Wu, Q. (2010) A Survey of Game Theory as Applied to Network Security. 43rd Hawaii International Conference on System Sciences, Honolulu, 5-8 January 2010, 1-10. https://doi.org/10.1109/HICSS.2010.35

5. Shiva, S., Roy, S., Bedi, H., Dasgupta, D. and Wu, Q. (2010) A Stochastic Game with Imperfect Information for Cyber Security. 5th International Conference on i-Warfare & Security (ICIW), Dayton, 8-9 April 2010, 308-319.

6. Nochenson, A. and Heimann, C.F.L. (2012) Simulation and Game-Theoretic Analysis of an Attacker-Defender Game. In: Grossklags, J. and Walrand, J.C., Eds., GameSec, Lecture Notes in Computer Science, Springer, Berlin, 138-151.

7. Simmons, C., Shiva, S., Bedi, H. and Shandilya, V. (2013) A Game Inspired Attack-Defense and Performance Metric Taxonomy. In: Janczewski, L.J., Wolfe, H.B. and Shenoi, S.-J., Eds., 28th Security and Privacy Protection in Information Processing Systems (SEC), Springer, Auckland.

8. Manshaei, M., Zhu, Q., Alpcan, T., Basar, T. and Hubaux, J.-P. (2013) Game Theory Meets Network Security and Privacy. ACM Computing Surveys, 45, Article No. 25. https://doi.org/10.1145/2480741.2480742

9. Lu, W., Xu, S. and Yi, X. (2013) Optimizing Active Cyber Defense. In: Das, S.K., Nita-Rotaru, C. and Kantarcioglu, M., Eds., GameSec, LNCS 8252, Springer, Berlin, 206-225. https://doi.org/10.1007/978-3-319-02786-9_13

10. Carter, K.M., Okhravi, H. and Riordan, J. (2014) Quantitative Analysis of Active Cyber Defenses Based on Temporal Platform Diversity. arXiv, Cornell University, New York.

11. Khouzani, M.H.R., Sarkar, S. and Altman, E. (2011) A Dynamic Game Solution to Malware Attack. 2011 Proceedings IEEE INFOCOM, Shanghai, 10-15 April 2011, 2138-2146. https://doi.org/10.1109/INFCOM.2011.5935025

12. Moayedi, B.Z. and Azgomi, M.A. (2012) A Game Theoretic Framework for Evaluation of the Impacts of Hackers' Diversity on Security Measures. Reliability Engineering and System Safety, 99, 45-54. https://doi.org/10.1016/j.ress.2011.11.001

13. Ma, C.Y.T., Rao, N.S.V. and Yau, D.K.Y. (2011) A Game Theoretic Study of Attack and Defense in Cyber-Physical Systems. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Shanghai, 10-15 April 2011, 708-713. https://doi.org/10.1109/INFCOMW.2011.5928904

14. Chen, X., Deng, X. and Teng, S.-H. (2009) Settling the Complexity of Computing Two-Player Nash Equilibria. Journal of the ACM, 56, 14:1-14:57. https://doi.org/10.1145/1516512.1516516

15. Papadimitriou, C.H. (1994) On the Complexity of the Parity Argument and Other Inefficient Proofs of Existence. Journal of Computer and System Sciences, 48, 498-532. https://doi.org/10.1016/S0022-0000(05)80063-7

16. Prasad, H., Prashanth, L. and Bhatnagar, S. (2014) Algorithms for Nash Equilibria in General-Sum Stochastic Games. arXiv, Cornell University, New York.

17. Hota, A.R. and Sundaram, S. (2016) Interdependent Security Games on Networks under Behavioral Probability Weighting. IEEE Transactions on Control of Network Systems, No. 99, 1. https://doi.org/10.1109/TCNS.2016.2600484

18. Shiva, S., Dharam, R. and Shandilya, V. (2011) Runtime Monitors as Sensors of Security Systems. IASTED, Dallas.

19. Lisy, V., Pibil, R., Stiborek, J., Bosansky, B. and Pechoucek, M. (2012) Game-Theoretic Approach to Adversarial Plan Recognition. ECAI'12 Proceedings of the 20th European Conference on Artificial Intelligence, Montpellier, 27-31 August 2012, 546-551.

20. Dritsoula, L., Loiseau, P. and Musacchio, J. (2012) Computing the Nash Equilibria of Intruder Classification Games. GameSec, 7638, 78-97.

21. Hota, A.R., Clements, A.A., Sundaram, S. and Bagchi, S. (2016) Optimal and Game-Theoretic Deployment of Security Investments in Interdependent Assets. International Conference on Decision and Game Theory for Security, Springer, New York, 101-113. https://doi.org/10.1007/978-3-319-47413-7_6

22. Filar, J. and Vrieze, K. (1997) Competitive Markov Decision Processes. Springer-Verlag, New York.

Appendix

A. Example

This can be illustrated with an example. In the game in Section 5, an attacker attacking in $\xi_1$ may affect the eventual success of his attacking plans by revealing his attacking intention: if he does not succeed with the present attempt, the revealed intention may later affect the administrator's reaction to promoting him to $\xi_2$ when he chooses $\alpha_{12}^{1}$. Thus information revealed to the other player about one's intention can affect the potential of all subsequent actions to propel the state transitions. Even if by chance, while the player is attacking, the game lands in $\xi_2$ instead of $\xi_3$, he will now be under the serious gaze of the administrator. With this premise, he may be better off logging out of the system with a sign-out action than getting stuck in the Honeypot by the administrator. Thus the desirability of landing in $\xi_2$ after an attack action $\alpha_{13}^{1}$ may not be the same as that of simply landing there with $\alpha_{12}^{1}$.

B. Generalized Imperfect Information Factor

The model for the 2-state game in [5], with the following extension, can be used to calculate a Nash equilibrium for an N-state game with $N < \infty$. Following [22], the stochasticity of the game provides a probability $P^m(i,j)$ with which the state $\xi_j$ can occur from $\xi_i$ in the $m$th iteration. The occurrence ratios $r_0, r_1, \ldots, r_4$ corresponding to the states $\xi_0, \xi_1, \ldots, \xi_4$ are given by

$$r_i = \lim_{m \to \infty} \frac{P(1,i) + P^2(1,i) + \cdots + P^m(1,i)}{m}, \quad \text{for each state } i, \quad \text{with} \quad \sum_i r_i = 1,$$

and $N = 5$ for the game in Section 5. But the perceived occurrence ratios are different, given by $r_i' = \sum_j p_{ji}^k r_j$. We note that the apparent strategy is different from the truly executing strategy: $\pi_k' = IIF \cdot \pi_k$, where $IIF$ is called the Imperfect Information Factor. $IIF$ is defined as a matrix with entries $z_{ij}$, $i$ indexing the rows and $j$ the columns, with

$$z_{ij} = \frac{p_{ji}^k r_j}{r_i}.$$

This gives the generalized $IIF$ for $N$ states. For the game in Section 5 with 5 states, the $IIF$ is evaluated as shown below.

$$IIF = [z_{ij}]_{5 \times 5} = \begin{bmatrix} \frac{p_{00}^k r_0}{r_0} & \frac{p_{01}^k r_1}{r_0} & \frac{p_{02}^k r_2}{r_0} & \frac{p_{03}^k r_3}{r_0} & \frac{p_{04}^k r_4}{r_0} \\ \frac{p_{10}^k r_0}{r_1} & \frac{p_{11}^k r_1}{r_1} & \frac{p_{12}^k r_2}{r_1} & \frac{p_{13}^k r_3}{r_1} & \frac{p_{14}^k r_4}{r_1} \\ \frac{p_{20}^k r_0}{r_2} & \frac{p_{21}^k r_1}{r_2} & \frac{p_{22}^k r_2}{r_2} & \frac{p_{23}^k r_3}{r_2} & \frac{p_{24}^k r_4}{r_2} \\ \frac{p_{30}^k r_0}{r_3} & \frac{p_{31}^k r_1}{r_3} & \frac{p_{32}^k r_2}{r_3} & \frac{p_{33}^k r_3}{r_3} & \frac{p_{34}^k r_4}{r_3} \\ \frac{p_{40}^k r_0}{r_4} & \frac{p_{41}^k r_1}{r_4} & \frac{p_{42}^k r_2}{r_4} & \frac{p_{43}^k r_3}{r_4} & \frac{p_{44}^k r_4}{r_4} \end{bmatrix}$$
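Numerically, the occurrence ratios are the Cesàro averages of a row of successive powers of the state transition matrix, and the IIF follows elementwise. The sketch below illustrates the computation with a hypothetical 5-state transition matrix and perception-error matrix; neither is taken from the example game.

```python
import numpy as np

# Hypothetical 5-state transition matrix P (rows sum to 1) and
# perception matrix perc, where perc[j, i] is the probability that
# player k perceives actual state j as state i.
P = np.array([[0.50, 0.30, 0.10, 0.05, 0.05],
              [0.20, 0.40, 0.20, 0.10, 0.10],
              [0.10, 0.20, 0.40, 0.20, 0.10],
              [0.05, 0.15, 0.20, 0.50, 0.10],
              [0.30, 0.20, 0.10, 0.10, 0.30]])
perc = 0.8 * np.eye(5) + 0.05 * (np.ones((5, 5)) - np.eye(5))

def occurrence_ratios(P, start=0, m=5000):
    """Cesaro average of row `start` over P, P^2, ..., P^m."""
    acc = np.zeros(P.shape[0])
    Pk = np.eye(P.shape[0])
    for _ in range(m):
        Pk = Pk @ P
        acc += Pk[start]
    return acc / m

r = occurrence_ratios(P)            # occurrence ratios r_i
r_perceived = perc.T @ r            # r'_i = sum_j p_ji * r_j

# IIF entries z_ij = p_ji * r_j / r_i, as defined above.
IIF = (perc.T * r[None, :]) / r[:, None]
print(r, IIF.shape)
```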

C. ϵ -Neighborhood Optimal Strategy Profile

The problem of finding an exact Nash Equilibrium efficiently in the general case is challenging; many progressive efforts are reported, as discussed in Section 2, and we plan to build on those results in our future work. Here we calculate an ϵ-Approximate Equilibrium (AE) for an example of the stochastic game presented in Section 5. Traditionally, an ϵ-AE of a stochastic game is defined as a strategy profile at which no player can improve his expected average reward by more than ϵ through a unilateral strategy deviation. We introduce a variation, the ϵ-Neighborhood Optimal Strategy Profile (ϵ-NOSP), with $0 \leq |\epsilon| < 1$. The traditional definition of the ϵ-AE depends on the motivation to deviate from the prescribed strategy profile being no greater than ϵ for each of the players; when the game is stochastic, the reward is valued as the average of the rewards obtained by executing the game with the same strategy profile some number of times. We propose the variation for two reasons.

1) Since the distribution of the rewards was sometimes bi-modal, the average may not be a comprehensive measure of the distribution.

2) There is a non-trivial probability that a strategy with a lower average reward yields a player a higher reward than one with a higher average reward, depending on their standard deviations.

Without assuming any distribution for the rewards over multiple game runs for a strategy profile, we calculate the ϵ-NOSP $\nu(\pi_*^0, \pi_*^1, \pi_*^2, \ldots, \pi_i^k, \ldots, \pi_*^K)^k$ with

$$\epsilon \geq \frac{\nu(\pi_*^0, \pi_*^1, \pi_*^2, \ldots, \pi_*^k, \ldots, \pi_*^K)^k_{\max} - \nu(\pi_*^0, \pi_*^1, \pi_*^2, \ldots, \pi_i^k, \ldots, \pi_*^K)^k}{\sigma\!\left(\nu(\pi_*^0, \pi_*^1, \pi_*^2, \ldots, \pi_i^k, \ldots, \pi_*^K)^k_{\max}\right) + \sigma\!\left(\nu(\pi_*^0, \pi_*^1, \pi_*^2, \ldots, \pi_i^k, \ldots, \pi_*^K)^k\right)}$$

where σ is the standard deviation. This indicates the overlap of the reward spectra given by the compared strategy profiles for player k.
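Under one reading of this criterion, checking a candidate profile for player k amounts to comparing the mean gain of the best unilateral deviation against the combined spread of the two reward samples. The sketch below assumes per-run reward lists collected from repeated executions; the function name and the numbers are hypothetical.

```python
import statistics

def within_eps_nosp(candidate_runs, best_deviation_runs, eps):
    """One reading of the overlap criterion: the mean advantage of the
    best unilateral deviation, normalized by the summed standard
    deviations of the two reward samples, must not exceed eps."""
    gain = (statistics.mean(best_deviation_runs)
            - statistics.mean(candidate_runs))
    spread = (statistics.stdev(best_deviation_runs)
              + statistics.stdev(candidate_runs))
    return gain <= eps * spread

# Hypothetical per-run rewards for player k under two profiles:
candidate = [4.9, 5.2, 5.0, 4.8, 5.1]
deviation = [5.0, 5.3, 4.9, 5.2, 5.1]
print(within_eps_nosp(candidate, deviation, eps=0.01))  # False here
```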

NOTES

1Since the sensor of the user is considered to be error-free, we are considering here the worst-case scenario for the administrator. Our work can be extended to the case where the user, too, has erroneous sensors.

2Though there is a possibility of getting some control-signal information, such as requesting the ISP to provide some traffic-related information, it is not represented here with a control-signal link, as it is not a regular event and is not quantitatively salient.

3Since this is done periodically, and not necessarily during the engagement with an attacker, the links are shown in red.

4These observations collected by the sensors must be useful to the Attack Identification System for analysis. So the GIDA Control Unit will have to feed them in an appropriate way.

5With the maturing of the model, the KMS can be designed to take inputs which are as close as possible to the sensor inputs.

6The GIDA Action Agent uses the first lookup table for this.

7A typical example would be when two different sensors detect slight anomalies, neither individually large enough to warrant a standard reflex action as per the sensor/actuator policies.

8The GIDA Action Agent uses the second lookup table for this.

9The system will have accepting conditions, rather than an accept state as in a general NFA, and is thus facilitated to run continuously.