**Applied Mathematics**

Vol.05 No.10(2014), Article ID:46520,10 pages

10.4236/am.2014.510141

Modeling and Design of Real-Time Pricing Systems Based on Markov Decision Processes

Koichi Kobayashi^{1*}, Ichiro Maruta^{2}, Kazunori Sakurama^{3}, Shun-ichi Azuma^{2}

^{1}School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan

^{2}Graduate School of Informatics, Kyoto University, Kyoto, Japan

^{3}Graduate School of Engineering, Tottori University, Tottori, Japan

Email: ^{*}k-kobaya@jaist.ac.jp

Copyright © 2014 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

Received 2 April 2014; revised 2 May 2014; accepted 9 May 2014

ABSTRACT

A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated, and is reduced to a quadratic programming problem. Finally, a numerical simulation is presented.

**Keywords:**

Markov decision process, Optimal control, Real-time pricing system

1. Introduction

In recent years, there has been growing interest in energy and the environment. For problems on energy and the environment, such as energy saving, several approaches have been studied (see, e.g., [1] [2] ). In this paper, we focus on real-time pricing systems of electricity. A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve (see, e.g., [3] - [6] ). In general, a real-time pricing system consists of one controller deciding the price at each time and multiple electric customers such as commercial facilities and homes. If electricity conservation is needed, then the price is set to a high value. Since the economic load becomes high, customers conserve electricity. Thus, electricity conservation is achieved. In the existing methods, the price at each time is given by a simple function of power consumptions, voltage deviations, and so on (see, e.g., [6] ). In order to realize more precise pricing, it is necessary to use a mathematical model of customers.

In this paper, using a Markov decision process (MDP), we propose a mathematical model of real-time pricing systems. Since in many cases the status of electricity conservation of customers is discrete and stochastic, it is appropriate to use an MDP. Then, a set of electricity customers is modeled by a multi-agent MDP. Furthermore, we consider the finite-time optimal control problem. By appropriately setting the cost function, customers are induced to conserve electricity actively. This problem can also be used for the model predictive control method, which is a control method in which the finite-time optimal control problem is solved at each time. In addition, the finite-time optimal control problem can be reduced to a quadratic programming problem. The proposed approach provides us with a basis for real-time pricing systems.

This paper is organized as follows. In Section 2, the outline of real-time pricing systems is explained. In Section 3, a model of electricity customers is derived. In Section 4, the optimal control problem is formulated, and its solution method is derived. In Section 5, a numerical simulation is shown. In Section 6, we conclude this paper.

Notation: Let $\mathbb{R}$ denote the set of real numbers. Let $I_n$, $0_{m \times n}$ denote the $n \times n$ identity matrix and the $m \times n$ zero matrix, respectively. For simplicity, we sometimes use the symbol $0$ instead of $0_{m \times n}$, and the symbol $I$ instead of $I_n$. For a random variable $x$ and an event $A$, let $E[x \mid A]$ denote the conditional expected value of $x$ under the event $A$.

2. Outline of Real-Time Pricing Systems

In this section, we explain the outline of real-time pricing systems studied in this paper.

Figure 1 shows an illustration of the real-time pricing systems studied in this paper. This system consists of one controller and multiple electric customers such as commercial facilities and homes. We suppose that each customer can monitor the status of electricity conservation of the other customers. In other words, the status of some customer affects that of other customers. For example, in commercial facilities, we suppose that the status of rival commercial facilities can be checked through lighting, blogs, Twitter, and so on. Depending on power consumption, i.e., the status of electricity conservation, the controller determines the price at each time. If electricity conservation is needed, then the price is set to a high value. Since the economic load becomes high, customers conserve electricity. Thus, electricity conservation is achieved.

In this paper, the status of electricity conservation of each customer is modeled by a Markov decision process (MDP). Then a set of customers is modeled by a multi-agent MDP (MA-MDP). Furthermore, by using the obtained MA-MDP model, we consider the optimal control problem and its solution method.

Figure 1. Illustration of real-time pricing systems.

3. Model of Customers

First, consider modeling the dynamics of each customer by a one-dimensional MDP. The value of the state $x(t)$ is randomly chosen among the finite set $\mathcal{M} := \{1, 2, \ldots, M\}$. The element $i$ of $\mathcal{M}$ expresses the status of electricity conservation: "$1$" implies the status that a customer conserves electricity maximally, and "$M$" implies the status that a customer does not conserve electricity. Then the MDP of a customer is given by

$$\pi(t+1) = P(t)\,\pi(t), \qquad (1)$$

where the transition probability matrix $P(t)$ is the control input, and corresponds to the price. The vector $\pi(t) := [\,\pi_1(t) \ \pi_2(t) \ \cdots \ \pi_M(t)\,]^T$ denotes the probability distribution, that is, $\pi_i(t)$ implies the probability that the state is $i$ at time $t$. In addition, the initial probability distribution must satisfy the following condition:

$$\sum_{i=1}^{M} \pi_i(0) = 1, \qquad \pi_i(0) \geq 0, \quad i = 1, 2, \ldots, M.$$

The transition probability matrix is given by

$$P(t) = [\,p_{ij}(t)\,], \qquad p_{ij}(t) = \Pr\{\, x(t+1) = i \mid x(t) = j \,\}.$$

The control input is determined under the condition for each element:

$$0 \leq p_{ij}(t) \leq 1, \qquad (2)$$

and the condition for each column:

$$\sum_{i=1}^{M} p_{ij}(t) = 1, \quad j = 1, 2, \ldots, M. \qquad (3)$$
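A minimal Python sketch of one step of the update (1) under conditions (2) and (3) follows; the transition matrix and the distribution below are arbitrary illustrative choices (assumptions, not values from the paper), with the matrix taken column-stochastic as required by the column condition (3):

```python
import numpy as np

# Illustrative transition probability matrix for M = 3 states (an assumption,
# not a value from the paper). Entries lie in [0, 1] (condition (2)) and each
# column sums to 1 (condition (3)).
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
assert np.all((P >= 0.0) & (P <= 1.0))   # condition (2)
assert np.allclose(P.sum(axis=0), 1.0)   # condition (3)

# Illustrative probability distribution pi(0).
pi = np.array([0.1, 0.3, 0.6])
assert np.isclose(pi.sum(), 1.0)

# One step of the MDP (1): pi(t+1) = P(t) pi(t).
pi_next = P @ pi
print(pi_next)
```

Because each column of the transition matrix sums to one, the update preserves total probability, so $\pi(t)$ remains a probability distribution at every step.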

Next, consider modeling the dynamics of a set of customers by an MA-MDP. The number of customers is given by $N$. For the $l$-th customer, the state is given by $x^l(t)$, and from (1), the MDP model is given by

$$\pi^l(t+1) = P^l(t)\,\pi^l(t), \quad l = 1, 2, \ldots, N.$$

Then, we suppose that the MA-MDP model expressing the dynamics of a set of customers is given by

$$\pi^l(t+1) = a_{ll}\,P^l(t)\,\pi^l(t) + \sum_{m=1,\, m \neq l}^{N} a_{lm}\,\pi^m(t), \qquad (4)$$

where $a_{lm}$ expresses the effect of couplings between customers, and is a constant satisfying the following condition:

$$a_{lm} \geq 0, \qquad \sum_{m=1}^{N} a_{lm} = 1. \qquad (5)$$

Condition (5) guarantees that $\pi^l(t+1)$ remains a probability distribution, since the right-hand side of (4) is a convex combination of probability distributions. For simplicity of discussion, coupling terms are given by $a_{lm}\,\pi^m(t)$, but $a_{lm}$ may be replaced with matrices satisfying some condition corresponding to (5).
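Assuming the coupling in (4) takes a convex-combination form (an assumption of this sketch: each customer mixes its own MDP step with the other customers' current distributions using weights $a_{lm}$ satisfying (5)), one step of the MA-MDP can be sketched as follows; all numerical values are illustrative:

```python
import numpy as np

M, N = 3, 2  # illustrative sizes: 3 states, 2 customers (assumptions)

# Column-stochastic transition matrices, one per customer (arbitrary values).
P = [np.array([[0.8, 0.3, 0.1],
               [0.1, 0.5, 0.3],
               [0.1, 0.2, 0.6]]),
     np.array([[0.6, 0.2, 0.2],
               [0.3, 0.6, 0.2],
               [0.1, 0.2, 0.6]])]

# Coupling weights a[l, m]: nonnegative with unit row sums, as in condition (5).
a = np.array([[0.9, 0.1],
              [0.2, 0.8]])
assert np.all(a >= 0) and np.allclose(a.sum(axis=1), 1.0)

pi = [np.array([0.2, 0.3, 0.5]),
      np.array([0.5, 0.3, 0.2])]

# One step of the MA-MDP: each customer mixes its own MDP step with the
# other customers' current distributions.
pi_next = []
for l in range(N):
    new = a[l, l] * (P[l] @ pi[l])
    for m in range(N):
        if m != l:
            new = new + a[l, m] * pi[m]
    pi_next.append(new)

for v in pi_next:
    print(v, v.sum())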

4. Optimal Control

4.1. Problem Formulation

Consider the following problem.

Problem 1. Suppose that for the MA-MDP model (4) expressing the dynamics of customers, the initial state $\pi^l(0) = \pi^l_0$, $l = 1, 2, \ldots, N$, the desired state $\pi^l_d$, and the prediction horizon $T$ are given. Then, find a control input sequence $P^l(0), P^l(1), \ldots, P^l(T-1)$, $l = 1, 2, \ldots, N$, minimizing the following cost function

$$J = \sum_{l=1}^{N} \sum_{t=0}^{T} \left\{ q_t\, E\!\left[\, x^l(t) \mid \pi^l(0) = \pi^l_0 \,\right] + r_t \left\| \pi^l(t) - \pi^l_d \right\|^2 \right\} \qquad (6)$$

subject to the following constraint:

$$C\!\left( \pi^1(t), \ldots, \pi^N(t), P^1(t), \ldots, P^N(t) \right) \leq d, \qquad (7)$$

where $C$ is a given linear function, $d$ is a given vector, and $q_t \geq 0$, $r_t \geq 0$ are given weights.

Hereafter, for simplicity of notation, the condition $\pi^l(0) = \pi^l_0$ in the cost function (6) is omitted.

By using the constraint (7), input constraints such as upper and lower bounds on the elements of $P^l(t)$ can be imposed. In addition, by adjusting $C$ and $d$, several specifications, e.g., that the state must converge to a neighborhood of the desired state, can be considered.

4.2. Solution Method

We derive a solution method for Problem 1. First, consider the MDP model (1). The MDP model is a class of nonlinear systems, since the control input $P(t)$ is multiplied by the probability distribution $\pi(t)$. However, in this case, it can be transformed into a linear system. The MDP model (1) can be rewritten as

$$\pi_i(t+1) = \sum_{j=1}^{M} w_{ij}(t),$$

where

$$w_{ij}(t) := p_{ij}(t)\,\pi_j(t).$$

By the property of the probability distribution, the relation $\sum_{i=1}^{M} w_{ij}(t) = \pi_j(t)$ holds. From this fact, the MDP model (1) can be equivalently transformed into the following linear system:

$$\pi(t+1) = B\,w(t), \qquad (8)$$

where $w(t) := [\,w_{11}(t) \ \cdots \ w_{M1}(t) \ w_{12}(t) \ \cdots \ w_{MM}(t)\,]^T$ is regarded as a new control input, $B := [\,I_M \ I_M \ \cdots \ I_M\,]$, and $w(t)$ must satisfy $w_{ij}(t) \geq 0$ and $\sum_{i=1}^{M} w_{ij}(t) = \pi_j(t)$.

Next, by using the linear system (8), consider representing the MA-MDP model (4) as a linear system. The linear system for the $l$-th customer is denoted by

$$\pi^l(t+1) = B\,w^l(t).$$

Then, the MA-MDP model (4) can be equivalently transformed into the following linear system:

$$\bar{\pi}(t+1) = \bar{A}\,\bar{\pi}(t) + \bar{B}\,\bar{w}(t), \qquad (9)$$

where

$$\bar{\pi}(t) := \begin{bmatrix} \pi^1(t) \\ \vdots \\ \pi^N(t) \end{bmatrix}, \qquad \bar{w}(t) := \begin{bmatrix} w^1(t) \\ \vdots \\ w^N(t) \end{bmatrix},$$

$\bar{A}$ is the block matrix whose $(l, m)$-block is $a_{lm} I_M$ for $m \neq l$ and the zero matrix for $m = l$, and $\bar{B}$ is the block-diagonal matrix with blocks $a_{11}B, \ldots, a_{NN}B$.

Finally, consider the cost function (6). Define

$$c := [\,1 \ 2 \ \cdots \ M\,].$$

Then we can obtain

$$E\!\left[\, x^l(t) \,\right] = \sum_{i=1}^{M} i\,\pi^l_i(t) = c\,\pi^l(t).$$

Therefore, the cost function (6) can be rewritten as

$$J = \sum_{l=1}^{N} \sum_{t=0}^{T} \left\{ q_t\, c\, \pi^l(t) + r_t \left\| \pi^l(t) - \pi^l_d \right\|^2 \right\}. \qquad (10)$$

From the above discussion, Problem 1 is equivalent to the following problem.

Problem 2. Find the new control inputs $w^l_{ij}(t)$, $t = 0, 1, \ldots, T-1$, $l = 1, 2, \ldots, N$, minimizing the cost function (10) subject to the linear system (9), the constraint (7), and the constraints $w^l_{ij}(t) \geq 0$ and $\sum_{i=1}^{M} w^l_{ij}(t) = \pi^l_j(t)$.

Problem 2 is reduced to a quadratic programming (QP) problem, and can be solved by a suitable solver such as MATLAB and IBM ILOG CPLEX. In addition, if $r_t = 0$ holds, then Problem 2 is reduced to a linear programming (LP) problem (we remark that the term for $t = 0$ in the cost function (10) is a constant). See [7] for further details.
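The reduction can be illustrated with a minimal one-step sketch using `scipy.optimize.linprog`. The substitution $w_{ij}(t) = p_{ij}(t)\pi_j(t)$, the standard LP reformulation of controlled Markov chains (cf. [7]), makes both the dynamics and the constraints linear in the new variables; the sizes and the initial distribution below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

M = 3                                     # illustrative number of states
c = np.arange(1, M + 1, dtype=float)      # state labels: 1 = maximal conservation
pi0 = np.array([0.2, 0.3, 0.5])           # illustrative initial distribution

# New decision variables w_ij = p_ij * pi_j, stacked column by column, so that
# pi(1) = B w with B = [I I ... I]  (i.e., pi_i(1) = sum_j w_ij).
B = np.tile(np.eye(M), (1, M))
# Consistency constraint (I kron 1^T) w = pi(0), i.e. sum_i w_ij = pi_j(0).
A_eq = np.kron(np.eye(M), np.ones((1, M)))

# Minimize the one-step expected state value E[x(1)] = c^T pi(1) = (c^T B) w.
res = linprog(c @ B, A_eq=A_eq, b_eq=pi0, bounds=[(0, None)] * (M * M))
pi1 = B @ res.x
print(pi1)   # optimal next distribution
```

Since the expected state value is minimized, the optimizer moves all probability mass toward state 1, the status of maximal conservation; the transition probabilities can be recovered as $p_{ij} = w_{ij}/\pi_j(0)$ whenever $\pi_j(0) > 0$.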

5. Numerical Example

Since it is difficult to use data in real systems, we present an artificial example. The state of each consumer is chosen among the finite set defined in Section 3. The number of consumers is given by $N = 5$. The coefficient matrices in the linear system for each consumer and the parameters in (9) are given appropriately. The prediction horizon $T$, the desired state, the weights $q_t$, $r_t$, and the initial state are also given, where $r_t = 0$ holds. From $r_t = 0$, Problem 2 is reduced to an LP problem. In addition, an input constraint of the form (7) is imposed.
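Since the concrete parameter values are not reproduced here, the following closed-loop sketch uses stand-in values (the matrix `P_high`, the coupling weights, and all sizes are assumptions for illustration only) to show how the coupled distributions drift toward the conserving state under a fixed conservation-inducing input:

```python
import numpy as np

M, N, T = 3, 5, 30  # stand-in sizes; not the paper's actual values

# Stand-in column-stochastic matrix for a fixed high price: under it,
# probability mass drifts toward state 1 (maximal conservation).
P_high = np.array([[0.9, 0.5, 0.2],
                   [0.1, 0.4, 0.5],
                   [0.0, 0.1, 0.3]])

# All customers start from the uniform distribution (rows = customers).
pi = np.full((N, M), 1.0 / M)

for t in range(T):
    own = pi @ P_high.T                            # each customer's own MDP step
    others = pi.sum(axis=0, keepdims=True) - pi    # sum over the other customers
    # Coupling weights: a_ll = 0.8, a_lm = 0.05 (m != l); 0.8 + 4*0.05 = 1,
    # consistent with condition (5).
    pi = 0.8 * own + 0.05 * others

print(pi[:, 0])  # probability of the maximally conserving state per customer
```

With these stand-in values, the probability of the maximally conserving state settles above 0.8 for every customer, qualitatively matching the behavior reported below for Case (i).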

In this numerical example, we consider the following two cases:

• The price for each customer is the same (i.e., $P^1(t) = P^2(t) = \cdots = P^5(t)$ holds).

• The price for each customer is different.

Case (i) is the conventional case in real-time pricing systems. In Case (ii), we suppose that the difference in the price is covered by using local currencies such as the Eco-point system [8] . The Eco-money system [9] in Japan was introduced to stimulate the economy and raise awareness of global warming. In the Eco-point system, points, which correspond to money in a local currency, are given for products that are effective from the viewpoints of electricity conservation and the environment. Such a system for energy management systems has been discussed in [10] .

Next, we present the computational results. First, the computational result in Case (i) is explained. Figures 2-6 show the probability distribution for each customer. From these figures, we see that $\pi_1^l(t)$ increases and $\pi_M^l(t)$ decreases. Thus, the state converges to $1$, which corresponds to the status that a customer conserves electricity maximally, with a certain probability. Furthermore, the optimal value of the cost function and the optimal control input are obtained.

Figure 2. π^{1}(t) in Case (i).

Figure 3. π^{2}(t) in Case (i).

Figure 4. π^{3}(t) in Case (i).

Figure 5. π^{4}(t) in Case (i).

Next, the computational result in Case (ii) is explained. Figures 7-11 show the probability distribution for each customer. Comparing Figures 2-6 with Figures 7-11, we see that the transient responses of the probability distributions are improved in Case (ii). In particular, for the 5th customer, the steady state of $\pi^5(t)$ is also improved (see

Figure 6. π^{5}(t) in Case (i).

Figure 7. π^{1}(t) in Case (ii).

Figure 8. π^{2}(t) in Case (ii).

Figure 6 and Figure 11). Furthermore, the optimal value of the cost function is 84.5057, and we see that the optimal value of the cost function is improved compared with Case (i). The optimal control input is also obtained.

Figure 9. π^{3}(t) in Case (ii).

Figure 10. π^{4}(t) in Case (ii).

From these values, we see that in the steady state, the prices for the customers are widely different from each other. Thus, in the system considered here, it is appropriate to use a local currency.

6. Conclusions

In this paper, we have proposed a modeling method and an optimal control method for real-time pricing systems using an MDP-based approach. In many cases, the status of electricity conservation of customers is discrete and

Figure 11. π^{5}(t) in Case (ii).

stochastic, and the use of the MDP model is effective. A real-time pricing system is modeled by multi-agent MDPs, and the optimal control problem is reduced to a QP problem. Furthermore, a numerical simulation has been shown. The proposed method provides us with a new method in real-time pricing of electricity.

There are several open problems. First, it is important to develop an identification method for the MA-MDP model based on existing results for MDPs (see, e.g., [11] ). Since the effect of couplings between customers was simplified, it is also important to model it more precisely. Next, the optimal control problem is reduced to a QP problem or an LP problem. These problems can be solved faster than a combinatorial optimization problem such as a mixed integer programming problem. However, for large-scale systems, the computation time for solving the optimal control problem will be long. Hence, it is important to develop a distributed algorithm.

Acknowledgements

This research was partly supported by JST, CREST.

References

- Camacho, E.F., Samad, T., Garcia-Sanz, M. and Hiskens, I. (2011) Control for Renewable Energy and Smart Grids. In: Samad, T. and Annaswamy, A.M., Eds., The Impact of Control Technology, IEEE Control Systems Society, New York.
- Ruihua, Z., Yumei, D. and Yuhong, L. (2010) New Challenges to Power System Planning and Operation of Smart Grid Development in China. Proceedings of the 2010 International Conference on Power System Technology, Hangzhou, 24-28 October 2010, 1-8.
- Borenstein, S., Jaske, M. and Rosenfeld, A. (2002) Dynamic Pricing, Advanced Metering, and Demand Response in Electricity Markets. Center for the Study of Energy Markets, University of California, Berkeley.
- Roozbehani, M., Dahleh, M. and Mitter, S. (2010) On the Stability of Wholesale Electricity Markets under Real-Time Pricing. Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, 15-17 December 2010, 1911-1918.
- Samadi, P., Mohsenian-Rad, A.-H., Schober, R., Wong, V.W.S. and Jatskevich, J. (2010) Optimal Real-Time Pricing Algorithm Based on Utility Maximization for Smart Grid. Proceedings of the 1st IEEE International Conference on Smart Grid Communications, Gaithersburg, 4-6 October 2010, 415-420.
- Vivekananthan, C., Mishra, Y. and Ledwich, G. (2013) A Novel Real Time Pricing Scheme for Demand Response in Residential Distribution Systems. Proceedings of the 38th Annual Conference of the IEEE Industrial Electronics Society, Montreal, 25-28 October 2012, 1954-1959.
- Bello, D. and Riano, G. (2006) Linear Programming Solvers for Markov Decision Processes. Proceedings of the 2006 IEEE Systems and Information Engineering Design Symposium, Charlottesville, 28 April 2006, 90-95. http://dx.doi.org/10.1109/SIEDS.2006.278719
- Eco-Point System for Housing. http://www.vec.gr.jp/english/topics/100217_1.htm
- Kyoto Eco Money. http://www.city.kyoto.jp/koho/eng/topics/2012_8/index.html
- Sawashima, K., Kubota, Y., Lu, H., Takemae, T., Yoshida, K. and Wan, Y. (2011) Socio-Personal Energy Management System. Keio ALPS2011 Group K Final Report. http://lab.sdm.keio.ac.jp/alps2011k/FinalReport-ALPS2011-K.pdf
- Rust, J. (1994) Structural Estimation of Markov Decision Processes. In: Handbook of Econometrics, Elsevier, Amsterdam, Vol. IV, Chapter 51, 3081-3143.

NOTES

^{*}Corresponding author.