Applied Mathematics, 2013, 4, 1490-1494
Published Online November 2013 (http://www.scirp.org/journal/am)
http://dx.doi.org/10.4236/am.2013.411201
Discrete-Time Hybrid Decision Processes:
The Discounted Case
Buheeerdun Yang1, Pingjun Hou1, Masayuki Kageyama2
1Graduate School of Science, Chiba University, Chiba, Japan
2Nagoya City University, Nagoya, Japan
Email: kageyama@sda.nagoya-cu.ac.jp
Received August 28, 2013; revised September 28, 2013; accepted October 5, 2013
Copyright © 2013 Buheeerdun Yang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
This paper is a sequel to Kageyama et al. [1], in which a Markov-type hybrid process was constructed and the corresponding discounted total reward was characterized by a recursive equation. The objective of this paper is to formulate a hybrid decision process and to give the existence and characterization of optimal policies.
Keywords: Hybrid Decision Process; Discounted Reward Criteria; Optimality Equation; Chance Space; Fixed Point Theorem
1. Introduction

Credibility theory, developed by Liu [2], is useful in dealing with the uncertainty of human thinking. In the real world, we often encounter complex problems involving human thinking that cannot be treated by probability theory alone. To deal with such problems, Li and Liu [3] introduced a more flexible uncertainty theory, called chance theory, which is a hybrid of probability and credibility. Recently, Kageyama et al. [1] gave a method of constructing a Markov-type hybrid process from a stochastic kernel and a credibilistic kernel. Anticipating much wider applications of hybrid processes in the near future, it is meaningful to consider the case where the behavior of the hybrid processes given in Kageyama et al. [1] may be influenced by a suitable choice of decisions or actions. The objective of this paper is to formulate a hybrid decision process, referring to the modeling of stochastic control systems known as Markov decision processes (cf. [4,5]), and to give the existence and characterization of optimal policies.
In the remainder of this section, we establish the notation that will be used throughout the paper and recall the chance measure and hybrid variables, whose expected values are defined below. For any non-empty set $X$, a function $g: X \to [0, 0.5]$ is said to satisfy condition K if

$$\sup_{x \in X} g(x) = 0.5, \tag{1.1}$$

and

$$\sup_{x \in X,\, x \neq x^*} g(x) = 0.5 \quad \text{if } g(x^*) = 0.5. \tag{1.2}$$

The set of such functions will be denoted by $\mathbb{K}(X)$.
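For a finite $X$, condition K can be checked mechanically. The following small Python sketch is our illustration (the function name and tolerance are ours, not the paper's); for finite $X$, (1.2) amounts to the value 0.5 being attained at two or more points.

```python
import numpy as np

def satisfies_condition_K(g, tol=1e-12):
    """Check (1.1): sup g = 0.5, and (1.2): removing any maximizer x* with
    g(x*) = 0.5 still leaves the sup over the remaining points at 0.5."""
    g = np.asarray(g, dtype=float)
    if abs(g.max() - 0.5) > tol:                      # (1.1) fails
        return False
    return int(np.sum(np.abs(g - 0.5) <= tol)) >= 2   # (1.2) for finite X

# e.g. satisfies_condition_K([0.5, 0.5, 0.2]) -> True
```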
A Borel set is a Borel subset of a metric space. Let $X$ and $Y$ be arbitrary Borel sets. We denote by $\mathcal{B}(X)$ and $\mathcal{P}(X)$ the set of Borel subsets and the power set of $X$, respectively. Let $\mathbb{P}(X)$ be the set of probability measures on $X$. A subset $\Lambda \subset X \times Y$ is called an event if $\Lambda(x) \in \mathcal{B}(Y)$ for all $x \in X$, where

$$\Lambda(x) := \{\, y \in Y : (x, y) \in \Lambda \,\}. \tag{1.3}$$

The set of all events will be denoted by $\mathcal{P}(X \times Y)$.
For any $(g, p) \in \mathbb{K}(X) \times \mathbb{P}(Y)$, a function $\pi_{g,p}$ on $\mathcal{P}(X \times Y)$ is defined by, for each $\Lambda \in \mathcal{P}(X \times Y)$,

$$\pi_{g,p}(\Lambda) := \sup_{x \in X} \{\, g(x) \wedge p(\Lambda(x)) \,\}, \tag{1.4}$$

where $a \wedge b := \min\{a, b\}$ for any real numbers $a, b$. For any $(g, p) \in \mathbb{K}(X) \times \mathbb{P}(Y)$, the chance measure $\mathrm{Ch}_{g,p}$ on $\mathcal{P}(X \times Y)$ is given (cf. [3,6]) as follows: for any $\Lambda \in \mathcal{P}(X \times Y)$,

$$\mathrm{Ch}_{g,p}(\Lambda) := \begin{cases} \pi_{g,p}(\Lambda), & \text{if } \pi_{g,p}(\Lambda) < 0.5, \\ 1 - \pi_{g,p}(\Lambda^c), & \text{if } \pi_{g,p}(\Lambda) \geq 0.5, \end{cases} \tag{1.5}$$
where $\Lambda^c$ is the complement of a set $\Lambda$. The triplet $(X \times Y, \mathcal{P}(X \times Y), \mathrm{Ch}_{g,p})$ is a chance space constructed from $(g, p) \in \mathbb{K}(X) \times \mathbb{P}(Y)$.

A function $r: X \times Y \to [0, \infty)$ is called a hybrid variable if $\{\, (x, y) \in X \times Y : r(x, y) \geq a \,\} \in \mathcal{P}(X \times Y)$ for any $a \in \mathbb{R}$. We denote by $\mathbb{K}(Y \mid X)$ the set of all functions $q(y \mid x)$ on $X \times Y$ such that $q(\cdot \mid x) \in \mathbb{K}(Y)$ for each $x \in X$; a function $q \in \mathbb{K}(Y \mid X)$ is called a credibilistic kernel ([7]). Let us denote by $\mathbb{P}(Y \mid X)$ the set of all stochastic kernels on $Y$ given $X$ ([8]).

The expected value $E(r \mid g, p)$ of a hybrid variable $r: X \times Y \to [0, \infty)$ is defined by the Choquet integral:

$$E(r \mid g, p) := \int_0^\infty \mathrm{Ch}_{g,p}(\{ r \geq t \})\, \mathrm{d}t.$$

In Section 2, we define a hybrid decision process by using a credibilistic kernel and a stochastic kernel; in Section 3, this process is analyzed to show the existence and functional characterization of optimal policies. An example is given in Section 4.
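As a concrete illustration of these preliminaries (entirely ours; the paper works with these objects abstractly), the following Python sketch evaluates (1.4), (1.5) and the Choquet-integral expected value for finite $X$ and $Y$, encoding an event $\Lambda$ as a boolean $|X| \times |Y|$ matrix and discretizing the integral by a Riemann sum.

```python
import numpy as np

def pi_gp(g, p, event):
    """(1.4): pi_{g,p}(Lambda) = sup_x min(g(x), p(Lambda(x))).
    event[i, j] is True iff (x_i, y_j) lies in Lambda."""
    section_prob = event @ p                  # p(Lambda(x_i)) for each i
    return float(np.max(np.minimum(g, section_prob)))

def chance(g, p, event):
    """(1.5): the chance measure Ch_{g,p}(Lambda)."""
    v = pi_gp(g, p, event)
    return v if v < 0.5 else 1.0 - pi_gp(g, p, ~event)

def expected_value(g, p, r, n=4000):
    """E(r | g, p) = int_0^inf Ch_{g,p}({r >= t}) dt, via a Riemann sum."""
    ts = np.linspace(0.0, r.max(), n, endpoint=False)
    return sum(chance(g, p, r >= t) for t in ts) * (r.max() / n)

g = np.array([0.5, 0.5, 0.2])                 # satisfies condition K
p = np.array([0.3, 0.7])                      # probability measure on Y
r = np.array([[1.0, 2.0], [0.5, 3.0], [2.0, 2.5]])
print(expected_value(g, p, r))
```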
2. Hybrid Decision Processes

The state and the parameter of the dynamic system under consideration will be denoted respectively by points in a Borel set $X$ and a finite set $\Theta := \{\theta_1, \theta_2, \ldots, \theta_l\}$. Let $A := \{a_1, a_2, \ldots, a_k\}$ be a finite action space. Let $S := \Theta \times X$, and let $\mathbb{K}(\Theta) := \{\, g = (g_1, g_2, \ldots, g_l) : g \text{ satisfies condition K} \,\}$, where we identify $g \in \mathbb{K}(\Theta)$ with the vector $(g(\theta_1), \ldots, g(\theta_l))$.
The discrete-time Markov-type hybrid decision process with the state space $\mathbf{X} := \mathbb{K}(\Theta) \times \mathbb{P}(X)$ is a five-tuple $(\mathbf{X}, A, q, p, r)$, where $q = q(\cdot \mid \theta, a) \in \mathbb{K}(\Theta \mid \Theta \times A)$ is a credibilistic kernel, $p = p(\cdot \mid x, a) \in \mathbb{P}(X \mid X \times A)$ is a stochastic kernel, and $r: S \times A \to [0, \infty)$ is a reward function.

For any initial state $(g_0, p_0) \in \mathbf{X}$, if the action $a \in A$ is chosen, then two things happen.

1) The state $(g_0, p_0)$ of the system moves to the new state $(g_1, p_1) \in \mathbf{X}$ according to the following state equations:

$$g_1(\theta) = \max_{\theta' \in \Theta} \{\, g_0(\theta') \wedge q(\theta \mid \theta', a) \,\} =: T_q(g_0, a)(\theta) \quad \text{for any } \theta \in \Theta; \tag{2.1}$$

$$p_1(B) = \int_X p(B \mid x, a)\, p_0(\mathrm{d}x) =: T_p(p_0, a)(B) \quad \text{for any } B \in \mathcal{B}(X). \tag{2.2}$$

2) The expected value $\bar{r}((g_0, p_0), a)$ of the reward function $r$ occurs, where

$$\bar{r}((g_0, p_0), a) := \int_0^\infty \mathrm{Ch}_{g_0,p_0}(\{\, (\theta, x) \in S : r(\theta, x, a) \geq t \,\})\, \mathrm{d}t. \tag{2.3}$$

Once the state has moved to the new state, a new action is chosen and the process is repeated.
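For computation, $\Theta$ is finite and $X$ can be discretized, so (2.1) becomes a max-min composition and (2.2) a matrix-vector product. A minimal sketch under those assumptions (the array conventions are ours):

```python
import numpy as np

def T_q(g, q_a):
    """(2.1) for finite Theta: (T_q g)(theta_j) = max_i min(g[i], q_a[i, j]),
    where q_a[i, j] = q(theta_j | theta_i, a)."""
    return np.max(np.minimum(g[:, None], q_a), axis=0)

def T_p(p, P_a):
    """(2.2) for a discretized X: p1[j] = sum_i p[i] * P_a[i, j],
    where P_a[i, j] = p(x_j | x_i, a)."""
    return p @ P_a
```

Note that, given the actions, the state $(g_t, p_t)$ evolves deterministically under (2.1)-(2.2); the randomness and fuzziness of the model enter only through the expected reward (2.3).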
The metric $\rho$ on the state space $\mathbf{X}$ is defined by, for any states $(g, p), (g', p') \in \mathbf{X}$,

$$\rho((g, p), (g', p')) := \max\{\, \| g - g' \|,\ \| p - p' \|_{TV} \,\}, \tag{2.4}$$

where $\| p - p' \|_{TV}$ is the total variation metric defined by

$$\| p - p' \|_{TV} := 2 \sup_{B \in \mathcal{B}(X)} | p(B) - p'(B) |.$$

We denote by $\mathcal{B}(\mathbf{X})$ the Borel subsets of $\mathbf{X}$ generated by the $\rho$-metric topology.
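For finite $\Theta$ and a discretized $X$, the metric (2.4) is easy to evaluate, since for discrete distributions $2 \sup_B |p(B) - p'(B)|$ equals the $L_1$ distance. A sketch, assuming (our reading, natural for a finite $\Theta$) that $\| g - g' \|$ is the maximum distance over $\Theta$:

```python
import numpy as np

def rho(g, p, g2, p2):
    """(2.4): rho((g, p), (g', p')) = max{ ||g - g'||, ||p - p'||_TV }."""
    tv = float(np.abs(p - p2).sum())   # = 2 sup_B |p(B) - p'(B)| for finite X
    return max(float(np.max(np.abs(g - g2))), tv)
```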
A stationary policy $f: \mathbf{X} \to A$ is measurable if, for any action $a \in A$, $B_a \in \mathcal{B}(\mathbf{X})$, where

$$B_a := \{\, (g, p) \in \mathbf{X} : f(g, p) = a \,\}. \tag{2.5}$$

We denote by $F$ the set of all measurable stationary policies.
For any initial state $(g_0, p_0) = (g, p) \in \mathbf{X}$, under a stationary policy $f \in F$ we define the total discounted reward function by

$$\varphi_f(g, p) := \sum_{t=0}^{\infty} \beta^t\, \bar{r}\big( (g_t, p_t), f(g_t, p_t) \big), \tag{2.6}$$

where $0 < \beta < 1$ is the discount factor and

$$(g_0, p_0) = (g, p), \qquad g_t = T_q\big( g_{t-1}, f(g_{t-1}, p_{t-1}) \big), \qquad p_t = T_p\big( p_{t-1}, f(g_{t-1}, p_{t-1}) \big), \quad t \geq 1. \tag{2.7}$$
The value function $\varphi(g, p)$ on $\mathbf{X}$ is given by

$$\varphi(g, p) := \sup_{f \in F} \varphi_f(g, p). \tag{2.8}$$

A policy $f^* \in F$ is called optimal if, for any $(g, p) \in \mathbf{X}$,

$$\varphi_{f^*}(g, p) = \varphi(g, p). \tag{2.9}$$
3. Analysis

In this section, we utilize the method of dynamic programming (cf. [4,5,9]) to derive the discounted optimality equation, from which the existence of optimal policies is shown. First, we show the measurability of the total discounted reward function and of the value function.

Lemma 3.1. For any stationary policy $f \in F$, the function $\bar{r}\big( (g_t, p_t), f(g_t, p_t) \big)$ is a measurable function of the initial state $(g_0, p_0) = (g, p) \in \mathbf{X}$.
Proof. For the case of $t = 1$, it suffices to prove that

$$\Lambda_d := \{\, (g, p) \in \mathbf{X} : \bar{r}\big( (g_1, p_1), f(g_1, p_1) \big) \leq d \,\} \in \mathcal{B}(\mathbf{X}) \quad \text{for any } d \in \mathbb{R},$$

where $g_1$ and $p_1$ are given by (2.1) and (2.2). From (1.4) and (1.5), for any $a \in A$, $\mathrm{Ch}_{g,p}(\{ (\theta, x) \in S : r(\theta, x, a) \geq t \})$ is measurable in $(g, p) \in \mathbf{X}$, so that, by (2.3), $\bar{r}((g, p), a)$ is measurable in $(g, p) \in \mathbf{X}$.

The set $\Lambda_d$ can be rewritten as

$$\Lambda_d = \bigcup_{a \in A} \big[\, \{ (g, p) : \bar{r}((g_1, p_1), a) \leq d \} \cap \{ (g, p) : (g_1, p_1) \in B_a \} \,\big],$$

where $B_a$ is given in (2.5). Since $A$ is finite, together with (2.1), (2.2) and (2.5), $\Lambda_d \in \mathcal{B}(\mathbf{X})$ follows.

For the cases $t = 0$ and $t \geq 2$, the measurability of $\bar{r}((g_t, p_t), f(g_t, p_t))$ can be proved, by induction, similarly to the above. This completes the proof.
Theorem 3.1. For any stationary policy $f \in F$, the discounted total reward function $\varphi_f(g, p)$ is measurable on the state space $\mathbf{X}$.

Proof. The assertion follows immediately from Lemma 3.1 and the definition (2.6) of $\varphi_f(g, p)$.
Let $M(\mathbf{X})$ denote the class of all bounded measurable functions on $\mathbf{X}$. For $\phi, \psi \in M(\mathbf{X})$, we define the metric $d$ on $M(\mathbf{X})$ by

$$d(\phi, \psi) := \sup_{(g, p) \in \mathbf{X}} | \phi(g, p) - \psi(g, p) |. \tag{3.1}$$

Then, it is clear that the space $(M(\mathbf{X}), d)$ is complete.
For any policy $f \in F$, $(g, p) \in \mathbf{X}$ and $h \in M(\mathbf{X})$, we define the operator $U_f$ on $M(\mathbf{X})$ as follows:

$$(U_f h)(g, p) := \bar{r}\big( (g, p), f(g, p) \big) + \beta\, h\big( T_q(g, f(g, p)),\, T_p(p, f(g, p)) \big). \tag{3.2}$$
Lemma 3.2. The operator $U_f$ is a contraction on the space $M(\mathbf{X})$.

Proof. For any state $(g, p) \in \mathbf{X}$ and any $h, h' \in M(\mathbf{X})$, we have

$$\begin{aligned} | (U_f h)(g, p) - (U_f h')(g, p) | &= \beta\, \big| h\big( T_q(g, f(g, p)), T_p(p, f(g, p)) \big) - h'\big( T_q(g, f(g, p)), T_p(p, f(g, p)) \big) \big| \\ &\leq \beta \sup_{(g', p') \in \mathbf{X}} | h(g', p') - h'(g', p') | = \beta\, d(h, h'). \end{aligned}$$

Thus, we have $d(U_f h, U_f h') \leq \beta\, d(h, h')$. This completes the proof.
Theorem 3.2. The discounted total reward $\varphi_f$ is the unique fixed point of the operator $U_f$, i.e.,

$$U_f \varphi_f = \varphi_f. \tag{3.3}$$

Proof. As $r$ is a non-negative and bounded function, there exists an $M > 0$ such that $0 \leq r \leq M$. So, we have that $0 \leq \varphi_f \leq M / (1 - \beta)$ for all $(g, p) \in \mathbf{X}$, which shows $\varphi_f \in M(\mathbf{X})$ from Theorem 3.1. Since

$$\begin{aligned} \varphi_f(g, p) &= \bar{r}\big( (g_0, p_0), f(g_0, p_0) \big) + \sum_{t=1}^{\infty} \beta^t\, \bar{r}\big( (g_t, p_t), f(g_t, p_t) \big) \\ &= \bar{r}\big( (g_0, p_0), f(g_0, p_0) \big) + \beta\, \varphi_f(g_1, p_1), \end{aligned}$$

we have $U_f \varphi_f = \varphi_f$. From the Banach fixed point theorem, the conclusion of the theorem follows.
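Computationally, Theorem 3.2 justifies evaluating $\varphi_f$ by fixed-point iteration of $U_f$, with geometric convergence guaranteed by Lemma 3.2. Since a policy induces a deterministic successor map on a finite discretization of $\mathbf{X}$, a sketch can be as simple as the following (the arrays `rbar_f` and `next_idx` are our stand-ins for (2.3) and the transition (2.1)-(2.2) under $f$):

```python
import numpy as np

def evaluate_policy(rbar_f, next_idx, beta, tol=1e-10):
    """Iterate h <- U_f h (3.2) on a finite state discretization:
    rbar_f[i] is the one-step reward in state i under f, and next_idx[i]
    is the index of the (deterministic) successor state."""
    h = np.zeros_like(rbar_f)
    while True:
        h_new = rbar_f + beta * h[next_idx]
        if np.max(np.abs(h_new - h)) < tol:   # close to the fixed point phi_f
            return h_new
        h = h_new
```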
In order to describe the optimality equation with respect to the value function $\varphi(g, p)$, we define the operator $U$ on $M(\mathbf{X})$ as follows:

$$(U h)(g, p) := \max_{a \in A} \big\{\, \bar{r}((g, p), a) + \beta\, h\big( T_q(g, a), T_p(p, a) \big) \,\big\}, \quad (g, p) \in \mathbf{X},\ h \in M(\mathbf{X}). \tag{3.4}$$

Then, we have the following.
Lemma 3.3. The operator $U$ is a contraction with modulus $\beta$ on the space $M(\mathbf{X})$.

Proof. From the definitions of $M(\mathbf{X})$ and $U$, it is obvious that $U$ maps $M(\mathbf{X})$ into itself. For any $(g, p) \in \mathbf{X}$ and any $h, h' \in M(\mathbf{X})$,

$$\begin{aligned} | (U h)(g, p) - (U h')(g, p) | &\leq \beta \max_{a \in A} \big| h\big( T_q(g, a), T_p(p, a) \big) - h'\big( T_q(g, a), T_p(p, a) \big) \big| \\ &\leq \beta \sup_{(g', p') \in \mathbf{X}} | h(g', p') - h'(g', p') | = \beta\, d(h, h'). \end{aligned}$$

Thus, we have $d(U h, U h') \leq \beta\, d(h, h')$, which completes the proof.
Lemma 3.4. The value function $\varphi$ is a bounded and measurable function, i.e., $\varphi \in M(\mathbf{X})$.

Proof. For any $f \in F$, we observe that $\varphi_f \in M(\mathbf{X})$ from Theorem 3.1. By Lemma 3.3, it holds that $U^n \varphi_f \to \varphi$ as $n \to \infty$ (cf. [4,5]). Since each $U^n \varphi_f$ is measurable, $\varphi$ is measurable. Noting that $0 \leq \varphi \leq M / (1 - \beta)$, we have $\varphi \in M(\mathbf{X})$, as described.

Here, we can state the main result, which shows the existence of optimal policies.
Theorem 3.3. It holds that:

1) The value function $\varphi$ is the unique fixed point of the operator $U$.

2) An optimal stationary policy $f^*$ exists.

3) Any policy $f \in F$ such that $U_f \varphi = U \varphi$ is an optimal policy.

Proof. For 1), we have, for any $f \in F$ and $(g_0, p_0) = (g, p) \in \mathbf{X}$,

$$\begin{aligned} \varphi_f(g, p) &= \sum_{t=0}^{\infty} \beta^t\, \bar{r}\big( (g_t, p_t), f(g_t, p_t) \big) \\ &= \bar{r}\big( (g_0, p_0), f(g_0, p_0) \big) + \beta\, \varphi_f(g_1, p_1) \\ &\leq \max_{a \in A} \big\{\, \bar{r}((g_0, p_0), a) + \beta\, \varphi\big( T_q(g_0, a), T_p(p_0, a) \big) \,\big\} = (U \varphi)(g_0, p_0). \end{aligned}$$

Therefore,

$$\varphi(g, p) = \sup_{f \in F} \varphi_f(g, p) \leq (U \varphi)(g, p).$$

Let us prove that $\varphi \geq U \varphi$. Because $A$ is finite, there exists an $f^* \in F$ which satisfies

$$(U \varphi)(g, p) = \bar{r}\big( (g, p), f^*(g, p) \big) + \beta\, \varphi(g_1, p_1), \quad (g, p) \in \mathbf{X},$$

where $(g_1, p_1) = \big( T_q(g, f^*(g, p)),\, T_p(p, f^*(g, p)) \big)$. From $\varphi \leq U \varphi$, we get

$$(U \varphi)(g, p) \leq \bar{r}\big( (g, p), f^*(g, p) \big) + \beta\, (U \varphi)(g_1, p_1), \quad (g, p) \in \mathbf{X}.$$

Repeating the above inequality, we have

$$(U \varphi)(g, p) \leq \sum_{t=0}^{n} \beta^t\, \bar{r}\big( (g_t, p_t), f^*(g_t, p_t) \big) + \beta^{n+1}\, \varphi(g_{n+1}, p_{n+1}).$$

As $\beta^{n+1} \varphi(g_{n+1}, p_{n+1}) \to 0$ as $n \to \infty$, we have that $U \varphi \leq \varphi_{f^*} \leq \varphi$. Thus, $\varphi = U \varphi$ follows.

Assertions 2) and 3) have already been shown in the proof of 1).
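Theorem 3.3 licenses the standard value-iteration scheme: iterate $U$ from $h \equiv 0$ to approximate $\varphi$ (part 1), then read off an optimal stationary policy greedily (part 3). A finite-discretization Python sketch, with assumed arrays `rbar[i, a]` and `succ[i, a]` standing in for (2.3) and (2.1)-(2.2):

```python
import numpy as np

def value_iteration(rbar, succ, beta, tol=1e-10):
    """Iterate (Uh)(i) = max_a { rbar[i, a] + beta * h[succ[i, a]] } (3.4);
    returns the approximate value function and a greedy optimal policy."""
    h = np.zeros(rbar.shape[0])
    while True:
        q = rbar + beta * h[succ]            # succ has shape (n_states, n_actions)
        h_new = q.max(axis=1)
        if np.max(np.abs(h_new - h)) < tol:
            return h_new, q.argmax(axis=1)   # phi and an f with U_f phi = U phi
        h = h_new
```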
4. The Controlled Floating Exchange Rate System

In this section, we give an example of an application of our model.

For simplicity, let us denote by $N(\mu, \sigma^2)$ a normal distribution with mean $\mu$ and variance $\sigma^2$. Let $X_t \in X := \mathbb{R}$ $(t \geq 0)$ be the amount of wealth at time $t$, whose controlled system equation is defined by

$$X_{t+1} = X_t + a_t Z_t, \tag{4.1}$$

where $\{Z_t\}$ is a sequence of i.i.d. income random variables with the normal distribution $N(1, 1)$, and $a_t$ is an action at time $t$, selected from the action space $A := \{a_0, a_1, a_2, \ldots, a_l\}$ with $0 < a_0 < a_1 < \cdots < a_l \leq 1$.
Let $\Theta := \{0.7, 0.8, 0.86, 0.9, 1.0\}$ be the set of possible exchange rates. Then, the real income is defined by

$$r(\theta, x, a) := \theta\, d(x, a), \quad (\theta, x) \in S,\ a \in A, \tag{4.2}$$

where $d(x, a)$ is the reward obtained when the action $a \in A$ is taken at the state $x$ under the exchange rate $\theta$.

We assume that $X_0$ is distributed with the normal distribution $N(\mu_0, \sigma_0^2)$. Then, by (4.1), $X_t$ is distributed with the normal distribution

$$N\Big( \mu_0 + \sum_{k=0}^{t-1} a_k,\ \sigma_0^2 + \sum_{k=0}^{t-1} a_k^2 \Big).$$

So, we can restrict the state space of the corresponding hybrid decision process to the following:

$$\mathbf{X} := \big\{\, (g, N(\mu, \sigma^2)) : g \in \mathbb{K}(\Theta),\ \mu \in \mathbb{R},\ \sigma^2 \geq 0 \,\big\}, \tag{4.3}$$

where $N(\mu, 0)$ denotes the one-point distribution with

$$N(\mu, 0)(\{x\}) = \begin{cases} 1 & \text{if } x = \mu, \\ 0 & \text{otherwise.} \end{cases}$$
The credibilistic kernel is given as follows:

$$\big( q(\theta_j \mid \theta_i) \big)_{i,j} = \begin{pmatrix} 0.5 & 0.5 & 0 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0 & 0 & 0.5 \end{pmatrix}. \tag{4.4}$$

Here, $q$ is assumed to be independent of $a \in A$. The corresponding operator $T_q$ has a fixed point $g^* = (0.5, 0.5, 0.5, 0.5, 0.5)$, i.e., $T_q g^* = g^*$ (cf. [7]).
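This fixed point is easy to verify numerically with the max-min update sketched in Section 2 (our code, not the paper's):

```python
import numpy as np

q = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.0, 0.5]])    # the kernel (4.4)

g_star = np.full(5, 0.5)
T_q_g = np.max(np.minimum(g_star[:, None], q), axis=0)   # (2.1), action-free
assert np.allclose(T_q_g, g_star)            # T_q g* = g*
```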
So, putting $\varphi_0(\mu, \sigma^2) := \varphi(g^*, N(\mu, \sigma^2))$, we have the following optimality equation by Theorem 3.3:

$$\varphi_0(\mu, \sigma^2) = \max_{a \in A} \big\{\, \bar{r}\big( (g^*, N(\mu, \sigma^2)), a \big) + \beta\, \varphi_0(\mu + a, \sigma^2 + a^2) \,\big\} \quad \text{for all } (\mu, \sigma^2), \tag{4.5}$$

where $\bar{r}((g, p), a)$ is defined by (2.3).
It is shown in [7] that, for any $g \in \mathbb{K}(\Theta)$, $T_q^n g \to g^*$ as $n \to \infty$, and that there exists an $n_0 \geq 1$ such that $T_q^{n_0} g = g^*$ for all $g \in \mathbb{K}(\Theta)$. Letting $G_0 := \{g^*\}$, define the sequence $\{G_k\}$ by

$$G_k := T_q^{-1}(G_{k-1}) = \{\, g \in \mathbb{K}(\Theta) : T_q g \in G_{k-1} \,\}, \quad k \geq 1.$$

Then, it clearly holds that $\bigcup_{k=0}^{n_0} G_k = \mathbb{K}(\Theta)$. Putting $\varphi_k(\mu, \sigma^2, g) := \varphi(g, N(\mu, \sigma^2))$ for $g \in G_k$, from Theorem 3.3 we have that

$$\varphi_1(\mu, \sigma^2, g) = \max_{a \in A} \big\{\, \bar{r}\big( (g, N(\mu, \sigma^2)), a \big) + \beta\, \varphi_0(\mu + a, \sigma^2 + a^2) \,\big\} \quad \text{for } g \in G_1, \tag{4.6}$$

and

$$\varphi_k(\mu, \sigma^2, g) = \max_{a \in A} \big\{\, \bar{r}\big( (g, N(\mu, \sigma^2)), a \big) + \beta\, \varphi_{k-1}(\mu + a, \sigma^2 + a^2, T_q g) \,\big\} \quad \text{for } g \in G_k\ (k \geq 2). \tag{4.7}$$

Noting that if $g \in G_k$ then $T_q g \in G_{k-1}$, and that $\bigcup_{k=0}^{n_0} G_k = \mathbb{K}(\Theta)$, the value function $\varphi(g, N(\mu, \sigma^2))$ can be obtained recursively by (4.5)-(4.7).
REFERENCES

[1] M. Kageyama, B. Yang and P. Hou, "Discrete-Time Hybrid Processes and Discounted Total Expected Values," Fuzzy Optimization and Decision Making, Vol. 10, No. 4, 2011, pp. 341-355. http://dx.doi.org/10.1007/s10700-011-9109-2

[2] B. Liu, "Uncertainty Theory," 2nd Edition, Springer-Verlag, Berlin, 2007.

[3] X. Li and B. Liu, "Chance Measure for Hybrid Events with Fuzziness and Randomness," Soft Computing, Vol. 13, No. 2, 2009, pp. 105-115. http://dx.doi.org/10.1007/s00500-008-0308-x

[4] O. Hernandez-Lerma and J. B. Lasserre, "Discrete-Time Markov Control Processes: Basic Optimality Criteria," Springer, New York, 1996. http://dx.doi.org/10.1007/978-1-4612-0729-0

[5] M. Puterman, "Markov Decision Processes," John Wiley and Sons, Inc., New York, 1994. http://dx.doi.org/10.1002/9780470316887

[6] B. Liu, "Uncertainty Theory," 3rd Edition, UTLAB, 2009.

[7] M. Kageyama and K. Iwamura, "Discrete-Time Credibilistic Processes: Construction and Convergences," Information Sciences, Vol. 179, No. 24, 2009, pp. 4277-4283.

[8] J. L. Doob, "Stochastic Processes," John Wiley, New York, 1953.

[9] R. E. Bellman, "Dynamic Programming," Princeton University Press, Princeton, 1957.