Autonomous Adaptive Agent with Intrinsic
Motivation for Sustainable HAI*
Takayuki Nozawa1,#, Toshiyuki Kondo2
1Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; 2Institute of Symbiotic Science and Technology,
Tokyo University of Agriculture and Technology, Tokyo, Japan; # corresponding author.
Received March 25th, 2010; revised September 8th, 2010; accepted September 13th, 2010.
For most applications of human-agent interaction (HAI) research, maintaining the user’s interest and continuation of interaction are the issues of primary importance. To achieve sustainable HAI, we proposed a new model of intrinsically motivated adaptive agent, which learns about the human partner and behaves to satisfy its intrinsic motivation. Simulation of interaction with several types of other agents demonstrated how the model seeks new relationships with the partner and avoids situations which are not learnable. To investigate effectiveness of the model, we conducted a comparative HAI experiment with a simple interaction setting. The results showed that the model was effective in inducing subjective impressions of higher enjoyability, charm, and sustainability. Information theoretic analysis of the interaction suggested that a balanced information transfer between the agent and human partner would be important. The participants’ brain activity measured by functional near-infrared spectroscopy (fNIRS) indicated higher variability of activity at the dorsolateral prefrontal cortex during the interaction with the proposed agent. These results suggest that the intrinsically motivated adaptive agent successfully maintained the participants’ interest, by affecting their attention level.
Keywords: Human-Agent Interaction (HAI), Intrinsic Motivation, Reinforcement Learning, Functional Near-Infrared
Spectroscopy (fNIRS)
1. Introduction
Research on human-agent interaction (HAI) and hu-
man-robot interaction (HRI) has recently been growing
and producing a wide range of applications, such as en-
tertainment, therapeutic use, media of communication,
and other kinds of assistance for intellectual activities [1].
For most of these applications, maintaining the user’s
interest and continuation of interaction are the issues of
primary importance.
Among the various factors which affect the impression
of HAI, Nakata et al. focused on the predictability of the
behavior of agents. They experimentally studied how
different degrees of randomness in the behavior affect
the impression about the agents, and showed that maxi-
mum human interest is achieved by interaction with the
agent of intermediate informational transmission effi-
ciency [2,3]. Similarly, Kondo et al. investigated the re-
lationship between the predictability and sustainability of
the interaction, and showed that moderate degree of pre-
dictability can contribute to the sustainability [4].
However, humans get bored even with agents of moderate predictability, once they fix their mental model of the agents as such. To achieve more sustainable HAI, it will be useful to take notice of our own motivation in HAI, as well as in interaction with other humans or animals. We are generally willing to continue interaction with others when it satisfies our intrinsic motivation for curiosity, exploration, manipulation, achievement, etc. [5]. Therefore, one promising approach to more “natural” and sustainable HAI would be to endow the agent with intrinsic motivation.
Intrinsic motivation has recently been utilized in de-
velopmental robotics (also known as epigenetic robotics
or ontogenetic robotics) to learn progressively from sim-
pler to more complex situations, avoiding situations in
which nothing can be learned [6,7]. However, effective-
ness of intrinsically motivated agent on HAI is not cer-
tain because, unlike the usual cases in developmental
robotics, the environment (human partner) can be highly
dynamic. Indeed, the effectiveness of intrinsically moti-
vated agent for sustainable HAI is yet to be explored.
In this study, therefore, we proposed a new model of
adaptive interaction agent which learns about the human
partner and behaves to satisfy its intrinsic motivation.
The model’s dynamical properties were analyzed by
simulating interaction with several types of other agents.
To investigate the model’s effectiveness for enjoyable
and sustainable HAI, we implemented the agent for a
simple interaction setting and conducted a comparative
experiment. In addition to investigating subjective im-
pressions of the agents and the interaction log, we meas-
ured the activities of the prefrontal brain region of the
participants during interaction by functional near-infrared
spectroscopy (fNIRS), to study the effect of different
types of agents on their cognitive states.
The rest of the paper is organized as follows. The
model of intrinsically motivated adaptive agent is defined
in Section 2. In Section 3, its dynamical properties are
described by simulation. Section 4 explains the setting of
HAI experiment. Section 5 gives the experimental results.
Finally, Section 6 concludes the paper.
2. Model of Intrinsically Motivated Adaptive Agent
2.1. Adaptivity and Reinforcement Learning
We focus on discrete, turn-taking type of interaction, which means that an interaction is described from the agent’s viewpoint as a sequence

$$\cdots \to s_t \to a_t \to s_{t+1} \to a_{t+1} \to \cdots \tag{1}$$

where $s_t \in S$ denotes sensory input from the human partner to the agent at time $t$, and $a_t \in A$ denotes action of the agent at $t$.
An agent driven by certain motivation system—whether
intrinsic or extrinsic—must be able to adapt to the envi-
ronment (partner), to satisfy its motivation. We use the TD
learning [8], which is a standard method in reinforcement
learning (RL), to model the adaptive agent.
In the framework of RL, each input $s_t$ is accompanied by a reward $r_t$. When a new input $s_{t+1}$ is obtained, the agent updates the value of the preceding input $s_t$ based on the stored reward $r_t$ and the current value $V(s_{t+1})$, as

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha \delta_t. \tag{2}$$

$\alpha$ is the learning rate parameter of the value function, and $\gamma$ is the discount rate of a future reward in the present value. Values of these parameters should be determined empirically, taking the nature of the problem to be learned into account. In the initial state, without any a priori knowledge, one can set $V(s) = 0$ for all $s \in S$.
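The TD update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `td_update`, the three-element input set, and the values `alpha=0.1` and `gamma=0.9` are hypothetical and are not taken from Table 1.

```python
def td_update(V, s_prev, r_prev, s_new, alpha, gamma):
    """Apply the TD rule (2) once and return the TD error."""
    delta = r_prev + gamma * V[s_new] - V[s_prev]  # TD error for the preceding input
    V[s_prev] += alpha * delta                      # move V(s_t) toward the TD target
    return delta


# Hypothetical setting: three inputs, illustrative alpha and gamma.
V = {s: 0.0 for s in (1, 2, 3)}  # V(s) = 0 for all s (no a priori knowledge)
first_delta = td_update(V, s_prev=1, r_prev=1.0, s_new=2, alpha=0.1, gamma=0.9)
second_delta = td_update(V, s_prev=2, r_prev=0.0, s_new=1, alpha=0.1, gamma=0.9)
```

Note that the value learned for input 1 in the first call already propagates back to input 2 in the second call through the discounted term.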
Based on the value function $V$, the agent selects an action which is likely to maximize expected values for the next moment. The method of action selection is given in Section 2.2.
2.2. Internal Model
In many simple learning problems, the reward $r_t$ is directly associated with $s_t$. In the intrinsically motivated agent, however, $r_t$ is derived from the agent’s internal model of the environment (human partner).
The role of the internal model is to predict what will be the next input $s_{t+1}$ given a context $(s_t, a_t)$, and how likely it is that a contextual situation $(s_t, a_t)$ itself will take place.
Extension of the following discussion to a context $(s_t, s_{t-1}, \ldots, s_{t-L+1}; a_t, a_{t-1}, \ldots, a_{t-L+1})$ with longer history length $L$ is straightforward. However, a longer history requires an exponentially longer period of interaction to obtain a stable internal model.
For some specific problems, one could construct parametric internal models, which can be useful to save computational resources. Here, however, we adopt a more extensive approach and define the internal model as a combination of the transition probability distribution $P_T(s_{t+1} \mid s_t, a_t)$ and the context probability distribution $P_C(s_t, a_t)$.
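The internal model is updated based on the interaction history (1). As mentioned earlier, the human partner can be highly dynamic. Therefore, the update of the internal model should incorporate decay of the memory. When the current context $(s_t, a_t)$ and the new input $s_{t+1}$ are given, the transition probability distribution $P_T$ is updated by

$$P_T(s \mid s_t, a_t) \leftarrow \begin{cases} (1-\alpha_T)\, P_T(s \mid s_t, a_t) + \alpha_T & \text{if } s = s_{t+1}, \\ (1-\alpha_T)\, P_T(s \mid s_t, a_t) & \text{otherwise.} \end{cases} \tag{3}$$

Similarly, the context probability distribution $P_C$ is updated by

$$P_C(s, a) \leftarrow \begin{cases} (1-\alpha_C)\, P_C(s, a) + \alpha_C & \text{if } (s, a) = (s_t, a_t), \\ (1-\alpha_C)\, P_C(s, a) & \text{otherwise.} \end{cases} \tag{4}$$

$\alpha_T$ and $\alpha_C$ are learning rates of the internal model.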
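A minimal sketch of such a decaying update of the internal model is given below. It assumes that both distributions are exponentially weighted frequency estimates that start from the uniform distribution. The names `init_model`, `update_model`, `alpha_T`, `alpha_C` and the numerical values are hypothetical.

```python
def init_model(S, A):
    """Uniform transition and context distributions (no a priori knowledge)."""
    P_T = {(s, a): {s2: 1.0 / len(S) for s2 in S} for s in S for a in A}
    P_C = {(s, a): 1.0 / (len(S) * len(A)) for s in S for a in A}
    return P_T, P_C


def update_model(P_T, P_C, s, a, s_next, alpha_T, alpha_C):
    """Decay all entries, then add weight to the observed outcome and context."""
    row = P_T[(s, a)]  # only the row of the current context is touched
    for s2 in row:
        row[s2] = (1 - alpha_T) * row[s2] + (alpha_T if s2 == s_next else 0.0)
    for ctx in P_C:
        P_C[ctx] = (1 - alpha_C) * P_C[ctx] + (alpha_C if ctx == (s, a) else 0.0)


S = A = (1, 2, 3)
P_T, P_C = init_model(S, A)
update_model(P_T, P_C, s=1, a=2, s_next=3, alpha_T=0.1, alpha_C=0.05)
```

Because each update is a convex combination, both distributions stay normalized without an explicit renormalization step.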
Appropriate values of these parameters should be determined based on the sizes of input/output sets and the dynamic property of the partner. If the rates are too large, the agent loses the memory of interaction history too quickly, and thus fails to obtain an acceptable internal model. Due to the exponential nature of the update rules, possible criteria for the upper bounds of the parameters could be $(1-\alpha_T)^{|S|} \geq 1/2$ and $(1-\alpha_C)^{|S \times A|} \geq 1/2$. If the rates are too small, on the other hand, the agent cannot catch up with the dynamic change of the human partner. Therefore, the lower bounds depend on the property of the partner, and these parameters should be determined empirically. The important point is avoiding extremely small values and thus giving the agent some opportunities to take initiative in the interaction (generally, this does not require fine tuning).
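As a worked illustration of the upper bounds, assume that the criteria take the form $(1-\alpha)^{n} \geq 1/2$, where $n$ is the number of entries that compete within the distribution ($n = |S|$ for the transition probability and $n = |S \times A|$ for the context probability). This reading is an assumption, and the set sizes below are those of the simulation setting with three inputs and three actions.

```latex
(1-\alpha)^{n} \geq \tfrac{1}{2}
\;\Longleftrightarrow\;
\alpha \leq 1 - 2^{-1/n},
\qquad
\alpha_T \leq 1 - 2^{-1/3} \approx 0.206 \quad (|S| = 3),
\qquad
\alpha_C \leq 1 - 2^{-1/9} \approx 0.074 \quad (|S \times A| = 9).
```

Under this reading, the context learning rate has the tighter bound simply because the context distribution is spread over more entries.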
In the initial state, when no a priori knowledge is available, $P_T(s' \mid s, a) = 1/|S|$ for all $(s, a, s') \in S \times A \times S$ and $P_C(s, a) = 1/|S \times A|$ for all $(s, a) \in S \times A$.
The transition probability $P_T$ is also used to derive action values for the action selection, as

$$V(a_t = a \mid s_t) = \sum_{s \in S} P_T(s_{t+1} = s \mid s_t, a_t = a)\, V(s). \tag{5}$$
We utilize the Boltzmann action selection method, so the probability of the agent selecting action $a_t = a$ given $s_t$ is

$$p(a_t = a \mid s_t) = \frac{e^{V(a_t = a \mid s_t)/\tau}}{\sum_{a' \in A} e^{V(a_t = a' \mid s_t)/\tau}}, \tag{6}$$

where $\tau$ is the temperature parameter, which determines the balance between the maximization of the expected value based on the current value function and the exploration for refinement of the value function.
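The action selection can be sketched as follows, assuming that action values are expectations of the state values under the transition model and that actions are drawn from a Boltzmann (softmax) distribution with a temperature parameter. The function names, the toy model and the temperature `tau=0.5` are hypothetical.

```python
import math
import random


def action_values(V, P_T, s, A):
    """Expected next-input value of each action under the transition model."""
    return {a: sum(p * V[s2] for s2, p in P_T[(s, a)].items()) for a in A}


def boltzmann(values, tau):
    """Softmax over action values; subtracting the maximum avoids overflow."""
    top = max(values.values())
    weights = {a: math.exp((v - top) / tau) for a, v in values.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


def sample_action(probs, rng):
    """Draw one action according to the Boltzmann probabilities."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]


# Toy model: in context s=1, action 2 tends to elicit the valuable input 2.
S = A = (1, 2, 3)
V = {1: 0.0, 2: 1.0, 3: 0.0}
P_T = {(s, a): {s2: 1.0 / 3.0 for s2 in S} for s in S for a in A}
P_T[(1, 2)] = {1: 0.1, 2: 0.8, 3: 0.1}
probs = boltzmann(action_values(V, P_T, s=1, A=A), tau=0.5)
chosen = sample_action(probs, random.Random(0))
```

A lower temperature concentrates the probability on the best action, while a higher temperature makes the selection closer to uniform.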
2.3. Intrinsic Motivation for Information Transfer
In considering the proper expression of reward for the intrinsically motivated adaptive agent, we directed our attention to the transfer entropy [9], which is an information-theoretic measure quantifying the causal interaction between two systems, excluding the shared information due to common history. The transfer entropy can be utilized to characterize autonomous systems [10].
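From the probability distribution $p(s_t, a_t, s_{t+1})$ of triadic interaction sequence $(s_t, a_t, s_{t+1})$, transfer entropy from the agent (A) to the human partner (H) is given as

$$TE_{A \to H} = \sum_{s_t, a_t, s_{t+1}} p(s_t, a_t, s_{t+1}) \log \frac{p(s_{t+1} \mid s_t, a_t)}{p(s_{t+1} \mid s_t)} = H(s_{t+1} \mid s_t) - H(s_{t+1} \mid s_t, a_t) = MI(s_{t+1}; a_t \mid s_t). \tag{7}$$

Here

$$H(s_{t+1} \mid s_t) = -\sum_{s_t, s_{t+1}} p(s_t, s_{t+1}) \log p(s_{t+1} \mid s_t)$$

and

$$H(s_{t+1} \mid s_t, a_t) = -\sum_{s_t, a_t, s_{t+1}} p(s_t, a_t, s_{t+1}) \log p(s_{t+1} \mid s_t, a_t)$$

are the conditional entropies, and $MI(s_{t+1}; a_t \mid s_t)$ is the conditional mutual information. The more the agent’s action $a_t$ has influence on the human’s response $s_{t+1}$ given the same $s_t$, the larger $TE_{A \to H}$ becomes (in other words, the more information the agent can transfer to the human partner).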
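To make the definition concrete, the following sketch computes the transfer entropy from the agent to the partner for a given joint distribution over $(s_t, a_t, s_{t+1})$, written as the conditional mutual information between $a_t$ and $s_{t+1}$ given $s_t$. The function name and the two toy distributions are hypothetical, and base-2 logarithms are used so that the result is in bits.

```python
import math
from collections import defaultdict


def transfer_entropy(p_joint):
    """TE from agent to partner in bits; p_joint maps (s, a, s_next) to probability."""
    p_sa, p_ss, p_s = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s, a, s2), p in p_joint.items():
        p_sa[(s, a)] += p   # p(s_t, a_t)
        p_ss[(s, s2)] += p  # p(s_t, s_{t+1})
        p_s[s] += p         # p(s_t)
    te = 0.0
    for (s, a, s2), p in p_joint.items():
        if p > 0.0:
            with_action = p / p_sa[(s, a)]        # p(s_{t+1} | s_t, a_t)
            without_action = p_ss[(s, s2)] / p_s[s]  # p(s_{t+1} | s_t)
            te += p * math.log2(with_action / without_action)
    return te


# Partner that always echoes the agent's action: one bit is transferred.
echo = {(s, a, a): 0.25 for s in (0, 1) for a in (0, 1)}
te_echo = transfer_entropy(echo)

# Partner that responds uniformly at random: nothing is transferred.
noise = {(s, a, s2): 0.125 for s in (0, 1) for a in (0, 1) for s2 in (0, 1)}
te_noise = transfer_entropy(noise)
```

The two extremes correspond to a fully controllable partner and a partner on which the agent has no influence.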
Our intrinsically motivated adaptive agent tries to maximize this measure of influence $TE_{A \to H}$ on the human partner. This leads to the following reward function

$$r_{t+1} = \log \frac{P_T(s_{t+1} \mid s_t, a_t)}{P_{CT}(s_{t+1} \mid s_t)}, \qquad P_{CT}(s_{t+1} \mid s_t) = \frac{\sum_{a \in A} P_C(s_t, a)\, P_T(s_{t+1} \mid s_t, a)}{\sum_{a \in A} P_C(s_t, a)}. \tag{8}$$

Note that the reward is expressed in terms of the internal model $(P_T, P_C)$. Maximization of the reward leads to the maximization of $TE_{A \to H}$ (note the correspondence between the reward term and the term in Equation (7), given that the internal model is sufficiently precise).
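By replacing all the probability terms in Equation (7) with those of the internal model, one can also obtain the agent’s subjective transfer entropy

$$TE^{\mathrm{subj}}_{A \to H} = \sum_{s_t, a_t, s_{t+1}} P_C(s_t, a_t)\, P_T(s_{t+1} \mid s_t, a_t) \log \frac{P_T(s_{t+1} \mid s_t, a_t)}{P_{CT}(s_{t+1} \mid s_t)}, \tag{9}$$

where $P_{CT}(s_{t+1} \mid s_t) = \sum_{a} P_C(s_t, a)\, P_T(s_{t+1} \mid s_t, a) / \sum_{a} P_C(s_t, a)$ is the probability of $s_{t+1}$ given $s_t$ under the internal model.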
As the agent cannot directly access the probability distribution $p(s_t, a_t, s_{t+1})$ in Equation (7), the subjective transfer entropy provides a dynamic estimate, from the viewpoint of the agent, of how much it is controlling the environment (partner). When the internal model is well adapted, the subjective transfer entropy gives a good estimate of $TE_{A \to H}$.
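The intrinsic reward and the subjective transfer entropy can both be computed from the internal model alone. The sketch below assumes that the reward is the base-2 log-ratio between the transition probability of the observed response and its action-averaged counterpart, with the average weighted by the context probability. The function names and the toy model (a partner believed to echo the agent’s action with probability 0.9) are hypothetical.

```python
import math


def marginal_next(P_T, P_C, s, s_next, A):
    """Probability of s_next given s, averaging P_T over actions weighted by P_C."""
    z = sum(P_C[(s, a)] for a in A)
    return sum(P_C[(s, a)] * P_T[(s, a)][s_next] for a in A) / z


def intrinsic_reward(P_T, P_C, s, a, s_next, A):
    """Log-ratio reward in bits: positive when the action made s_next more likely."""
    return math.log2(P_T[(s, a)][s_next] / marginal_next(P_T, P_C, s, s_next, A))


def subjective_te(P_T, P_C, S, A):
    """Expected reward under the internal model (subjective transfer entropy)."""
    te = 0.0
    for s in S:
        for a in A:
            for s2 in S:
                p = P_C[(s, a)] * P_T[(s, a)][s2]
                if p > 0.0:
                    te += p * intrinsic_reward(P_T, P_C, s, a, s2, A)
    return te


# Toy internal model: the partner is believed to echo the action with p = 0.9.
S = A = (0, 1)
P_C = {(s, a): 0.25 for s in S for a in A}
P_T = {(s, a): {s2: (0.9 if s2 == a else 0.1) for s2 in S} for s in S for a in A}
r_expected = intrinsic_reward(P_T, P_C, s=0, a=1, s_next=1, A=A)
r_unexpected = intrinsic_reward(P_T, P_C, s=0, a=1, s_next=0, A=A)
te_subj = subjective_te(P_T, P_C, S, A)
```

In this toy model the response that the action is believed to induce yields a positive reward, while the unexpected response yields a negative one.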
Let us describe through several possible situations how the reward (8) controls the behavior of the agent. First, when the agent finds an action $a_t$ which can effectively induce an otherwise rare response $s_{t+1}$ given $s_t$, the transition probability exceeds the action-averaged prediction, $P_T(s_{t+1} \mid s_t, a_t) > P_{CT}(s_{t+1} \mid s_t)$, and the reward is high. However, when the agent repeats the same pattern of interaction sequence $(s_t, a_t, s_{t+1})$ for its high value, both the numerator and denominator terms of (8) come close to 1 due to the update rules (3) and (4), so the reward and the value decrease, which corresponds to the loss of interest, making the agent stop the repetition. On the other hand, when $a_t$ is followed by an unexpected $s_{t+1}$ given $s_t$, that is, $P_T(s_{t+1} \mid s_t, a_t) < P_{CT}(s_{t+1} \mid s_t)$, the reward becomes negative, and such cases occur more often in the situations where the agent cannot influence the human response. Taking these considerations together, one can expect that the reward makes the agent pursue an intermediate level of novelty, avoiding situations in which nothing can be learned, in a similar way as the intelligent adaptive curiosity proposed by Oudeyer et al. [7].
Copyright © 2010 SciRes. JILSA
[Figure 1(a) plot: probability of actions 1, 2, and 3 versus time [step]]
2.4. Algorithm
Combining the components described above, the opera-
tion of the intrinsically motivated adaptive agent can be
described in the following procedural form:
1. Initialize $(s_0, a_0, r_0)$, the value function $V$, and the internal model $(P_T, P_C)$;
2. Starting from $t = 0$, repeat
   (a) With the given context $(s_t, a_t)$, obtain new input $s_{t+1}$ from the environment (partner);
   (b) Update the value function $V(s_t)$ by the TD learning method (2);
   (c) Update the internal model $(P_T, P_C)$ by the rules (3) and (4);
   (d) Evaluate the reward $r_{t+1}$ by (8), and store it for the update of value function (2) in the next time step;
   (e) Select the action $a_{t+1}$ using the rules (5) and (6);
   (f) $t \leftarrow t + 1$;
[Figure 1(b) plot: subjective transfer entropy [bit] versus time [step]]
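The procedure above can be sketched end to end as a small Python class. This is a sketch under stated assumptions rather than the authors’ implementation: the class and method names, all default parameter values, the log-ratio form of the reward, and the random partner used to drive the loop are hypothetical.

```python
import math
import random


class IntrinsicAgent:
    """Sketch of the procedure in Section 2.4; parameter defaults are illustrative."""

    def __init__(self, S, A, alpha=0.1, gamma=0.9, alpha_T=0.1, alpha_C=0.05,
                 tau=0.2, seed=0):
        self.S, self.A = list(S), list(A)
        self.alpha, self.gamma = alpha, gamma
        self.alpha_T, self.alpha_C, self.tau = alpha_T, alpha_C, tau
        self.rng = random.Random(seed)
        self.V = {s: 0.0 for s in self.S}
        self.P_T = {(s, a): {s2: 1.0 / len(self.S) for s2 in self.S}
                    for s in self.S for a in self.A}
        n_ctx = len(self.S) * len(self.A)
        self.P_C = {(s, a): 1.0 / n_ctx for s in self.S for a in self.A}
        # (s_0, a_0, r_0): arbitrary initial context with zero reward
        self.s, self.a, self.r = self.S[0], self.A[0], 0.0

    def marginal(self, s, s_next):
        """Probability of s_next given s, averaged over actions with P_C weights."""
        z = sum(self.P_C[(s, a)] for a in self.A)
        return sum(self.P_C[(s, a)] * self.P_T[(s, a)][s_next] for a in self.A) / z

    def action_probs(self, s):
        """Boltzmann probabilities over model-based action values."""
        q = {a: sum(p * self.V[s2] for s2, p in self.P_T[(s, a)].items())
             for a in self.A}
        top = max(q.values())
        w = {a: math.exp((v - top) / self.tau) for a, v in q.items()}
        z = sum(w.values())
        return {a: x / z for a, x in w.items()}

    def step(self, s_next):
        """Process a new input from the partner and return the next action."""
        s, a = self.s, self.a
        # TD update of V(s_t) with the stored reward r_t
        self.V[s] += self.alpha * (self.r + self.gamma * self.V[s_next] - self.V[s])
        # Decaying update of the transition and context distributions
        row = self.P_T[(s, a)]
        for s2 in row:
            row[s2] = (1 - self.alpha_T) * row[s2] + (self.alpha_T if s2 == s_next else 0.0)
        for ctx in self.P_C:
            self.P_C[ctx] = ((1 - self.alpha_C) * self.P_C[ctx]
                             + (self.alpha_C if ctx == (s, a) else 0.0))
        # Intrinsic reward r_{t+1}, stored for the next TD update
        self.r = math.log2(row[s_next] / self.marginal(s, s_next))
        # Select a_{t+1} and advance the context
        probs = self.action_probs(s_next)
        self.a = self.rng.choices(self.A, weights=[probs[x] for x in self.A])[0]
        self.s = s_next
        return self.a


# Drive the agent with a hypothetical partner that responds uniformly at random.
partner = random.Random(1)
agent = IntrinsicAgent(S=(1, 2, 3), A=(1, 2, 3))
for _ in range(400):
    agent.step(partner.choice((1, 2, 3)))
final_probs = agent.action_probs(agent.s)
```

Replacing the random partner with any function of the agent’s last action gives a simple way to explore how the action probabilities respond to a controllable partner.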
3. Simulation
In this section, we describe the behavior of the intrinsi-
cally motivated adaptive agent by simulating its interac-
tion with several types of other agents.
In the following, both the intrinsically motivated agent and the other agent accept three types of inputs and take three types of actions; that is, $S = A = \{1, 2, 3\}$ for both agents. We used the parameter values shown in Table 1 for the intrinsically motivated agent, unless stated otherwise. We utilize the transition of the action probabilities given by Equation (6) and the subjective transfer entropy given by Equation (9) to characterize the interactions.
Figure 1. Transition of the action probabilities (a) and the subjective transfer entropy (b) of the intrinsically motivated adaptive agent, which is interacting with the random action agent.
3.1. Interaction with a Random Action Agent
Figure 1 shows the transition of action probabilities and the subjective transfer entropy of the intrinsically motivated agent interacting with an agent which selects its action randomly with equal probability 1/3, regardless of the input $s_t$. In this case, the intrinsically motivated agent cannot control the other agent, so the action probabilities fluctuate around the uniform value of 1/3, and the subjective transfer entropy is nearly zero.
3.2. Interaction with a Partially Regular Agent
Table 1. Parameter values used for the adaptive agents in Sections 3 and 4.
Parameter Value
Learning rate of value function $\alpha$
Discount rate of future reward $\gamma$
Learning rate of transition probability $\alpha_T$
Learning rate of context probability $\alpha_C$
Temperature for action selection $\tau$
Next, we study the interaction with a partially regular agent, which chooses its response $a_t$ to an input $s_t$ by the following response probability matrix

$$p(a_t = j \mid s_t = i) = p_{ij}. \tag{10}$$

Note that the inputs $s_t = 2$ and $s_t = 3$ to this agent each elicit a particular response with higher probability, while the input $s_t = 1$ exerts no control on the response.
[Figure 2(a) plot: probability of actions 1, 2, and 3 versus time [step]]
Figure 2 shows the transition of action probabilities
and the subjective transfer entropy of the intrinsically
motivated agent interacting with the agent defined by
Equation (10). The intrinsically motivated agent learns to
avoid the action 1 (Figure 2(a)) and to achieve higher
degree of control on the partner (Figure 2(b)). This re-
sult shows that the agent is actually capable of avoiding
situations where nothing can be learned. Figure 2(a) also
shows that the intrinsically motivated agent keeps trying
to find a new controllable relationship by altering its ac-
tion between 2 and 3, rather than adhering to one control
[Figure 2(b) plot: subjective transfer entropy [bit] versus time [step]]
3.3. Interaction with a Fixed-Reward Adaptive Agent
Here, the intrinsically motivated agent (agent 1) interacts
with another adaptive agent (agent 2), which has the
same rules for the update of value function (2) and of the
internal model (3), (4), and uses the same method (5), (6)
for action selection, with the same parameter values in
Table 1, but its reward is directly associated with the
input by
r (11)
Figure 3 shows the transition of action probabilities and the subjective transfer entropy of both agents. As in the interaction with the partially regular agent, the intrinsically motivated agent achieves control over the partner (Figure 3(c)) by altering its action strategy (Figure 3(a)) and thus affecting that of the partner (Figure 3(b)). The fixed-reward agent, on the other hand, does not exert much influence on the intrinsically motivated agent.
Figure 2. Transition of the action probabilities (a) and the subjective transfer entropy (b) of the intrinsically motivated adaptive agent in interaction with the partially regular agent.
3.4. Interaction between Two Intrinsically Moti-
vated Agents
Finally, we show the interaction of two intrinsically motivated agents. Figure 4 shows the transition of action probabilities and the subjective transfer entropy of the two interacting intrinsically motivated agents. In this case, the agents compete with each other for control and keep changing their strategies (Figure 4(a), (b)), so the subjective transfer entropy does not reach the level achieved in the interactions with more static agents (cf. Figure 2(b) and Figure 3(c)).
As discussed in Section 2.2, the time scale in which the intrinsically motivated agent changes its strategy depends on the learning rates of the internal model; therefore, the agent can be made slower to lose its interest in the established relationship by decreasing the ratio of the learning rate of the context probability $\alpha_C$ to that of the transition probability $\alpha_T$. Figure 5 shows the transition of action probabilities and the subjective transfer entropy of the two interacting intrinsically motivated agents, whose $\alpha_C$ values were set to 0.01. This result indicates that the learning rates control the time scale of the agent’s dynamics, with decreased values inducing slower transition of both the action probability distribution and the subjective transfer entropy.
In summary, the intrinsically motivated agent demonstrated its capabilities to pursue learnable and controllable situations, and to avoid a fixed relationship with the partner by altering its action strategy. These features are expected to induce the impression of sometimes unpredictable yet coherent and understandable behavior, and thus would be effective in achieving more natural and sustainable HAI.
Figure 3. Transition of the action probabilities of the intrinsically motivated agent (agent 1; a), that of the fixed-reward adaptive agent (agent 2; b), and the subjective transfer entropy (c) of both agents.
Figure 4. Transition of the action probabilities of the two intrinsically motivated agents ((a) agent 1, (b) agent 2), and their subjective transfer entropy (c).
[Figure 5(a) plot: probability of actions 1, 2, and 3 versus time [step]]
4. Experiment
To assess the effectiveness of our model of intrinsically
motivated adaptive agent in HAI, we conducted a com-
parative experiment of three types of agents in a simple
interaction design.
4.1. Interaction Design
We used a virtual agent, rather than a real robot agent, to
prevent physically induced artifacts on the fNIRS meas-
urement by minimizing the participants’ movements and
changes in posture during interaction with the agent [11].
A CG image of AIBO, Sony’s four-legged robot, was
presented on a 14.1 inch LCD display that was placed 70
cm in front of the participant sitting on a chair. Using a
computer mouse, the participant clicked or dragged on the agent image. Based on the mouse-button pressing time, the agent distinguished each mouse input either as a click or as a drag (thus $S = \{\text{click}, \text{drag}\}$), with the threshold of 350 ms.
[Figure 5(b) plot: probability of actions 1, 2, and 3 versus time [step]]
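The mapping from mouse-button pressing time to the two-element input set can be written as a one-line rule. The sketch below is only an illustration: the function and constant names are hypothetical, and assigning a press of exactly 350 ms to the drag category is an assumption, since the handling of the boundary is not specified.

```python
CLICK_DRAG_THRESHOLD_MS = 350  # threshold on the mouse-button pressing time


def classify_mouse_input(press_duration_ms):
    """Map a button-press duration to the agent's input set {click, drag}."""
    # Assumption: a press of exactly the threshold duration counts as a drag.
    return "click" if press_duration_ms < CLICK_DRAG_THRESHOLD_MS else "drag"


short_press = classify_mouse_input(120)
long_press = classify_mouse_input(800)
```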
The agent, in return, showed one of four actions (movies) $A = \{M_1, M_2, M_3, M_4\}$. Action $M_1$ was to move the agent’s head upward, then downward, and upward back to the neutral position. $M_2$ was to move its head upward and then shake it left and right (once each side), then back to the neutral position. $M_3$ was to move its head up-right, back to neutral, up-left, and back to neutral again. $M_4$ was to move its head downward, wiggle it forward and backward three times in quick succession, then move back to neutral. The meanings of the actions were left to the interpretation of each participant. Each action took 1.5 s, and the agent did not accept new mouse input until the action ended.
[Figure 5(c) plot: subjective transfer entropy [bit] of the two agents versus time [step]]
4.2. Agents in Comparison
The following three types of agents were compared in the
interaction experiment:
Type I was the intrinsically motivated adaptive agent
defined in Section 2.
Type F was an adaptive agent which, like the fixed re-
ward adaptive agent used in Section 3.3, had the same
rules (2-6) for learning, but the reward was extrinsically
given by either of the fixed functions
r (12)
Figure 5. Transition of the action probabilities of the two intrinsically motivated agents ((a) agent 1, (b) agent 2), and their subjective transfer entropy (c). The learning rate of the context probability was $\alpha_C = 0.01$ for both agents.
Figure 6. The arrangement of fNIRS optodes on the
Type R was an agent which selected its action randomly with equal probability 1/4, regardless of the input $s_t$.
The parameter values in Table 1 were used for the
adaptive agents of type I and F. These parameter values
were determined based on some simulations and a pre-
liminary experiment.
4.3. Participants and Procedures
Twenty-four healthy graduate or undergraduate students (23 males and 1 female, all right-handed, mean age 21.7 ± S.D. 1.7 years) participated in the experiment. The experiment was explained to all participants before they gave written informed consent. This study was approved by the ethics committee of the Tokyo University of Agriculture and Technology.
Before the experiment, the participants were familiarized with the operation by a 5 min practice session with an agent that showed all four actions in a fixed order when the mouse was clicked, and in the reverse order when dragged.
The participants were instructed to freely set and
change their aims of interaction. Each participant had
interaction sessions with all the three types of agents.
They were divided into four groups (six participants each), by the two orderings of agent types, (R, I, F) or (F, I, R), and by the two kinds of reward, F1 or F2 in Equation (12), for the type F agent.
Each interaction session consisted of 1 min fixation
phase, 15 min interaction phase, and again 1 min fixation
phase. During the fixation phases, the participants were
instructed to fixate their attention on a cross shown at the
center of the display. Each interaction session was fol-
lowed by 10 min rest period, during which they were
asked to complete the questionnaire.
4.4. Questionnaire
After each session, the participants were asked to de-
scribe their impression about the agent they interacted
with. After the second and third sessions, they were also
asked to answer a Likert scale questionnaire comparing
the last two agents they interacted with. The question-
naire consisted of 16 items with eight viewpoints, each
item with seven rating levels from “strongly disagree” (−3) to “strongly agree” (+3). The eight viewpoints were:
1. enjoyable,
2. charming,
3. lively,
4. soothing,
5. consistent,
6. obedient,
7. insightful to your intention, and
8. desirable for longer period of interaction.
For each of the eight viewpoints (adjectives), two items
—“The latter felt more {adjective} than the former.” and
“The latter felt less {adjective} than the former.” —were
presented. This was to balance the influence of posi-
tive/negative expressions, and to check the consistency
of each participant’s ratings. The items were arranged in
a randomized order.
4.5. Interaction Log
In regard to the actions of participants, timing and types of mouse operations were recorded. For the agents, timing and types of movie actions were recorded. For the adaptive agents of type I and F, the sequence of rewards $r_t$, the estimated value function $V$, and the internal model $(P_T, P_C)$ were also logged.
From these data, we examined statistics of human/agent actions, information theoretic measures of interaction, such as mutual information, distinguishability, controllability (dyadic) [3] and transfer entropy (triadic), and dynamics of these measures.
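As an illustration of how the dyadic and triadic measures could be estimated from such a log, the sketch below counts pair and triple frequencies over one session and plugs them into the definitions of mutual information and transfer entropy. The function names and the toy log are hypothetical, and a simple plug-in (frequency) estimator with base-2 logarithms is assumed.

```python
import math
from collections import Counter


def triads(s_seq, a_seq):
    """Triples (s_t, a_t, s_{t+1}) from aligned logs of inputs and actions."""
    return [(s_seq[t], a_seq[t], s_seq[t + 1]) for t in range(len(s_seq) - 1)]


def mutual_information(pairs):
    """Plug-in estimate of MI in bits from a list of (x, y) pairs."""
    n = len(pairs)
    c_xy = Counter(pairs)
    c_x = Counter(x for x, _ in pairs)
    c_y = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (c_x[x] * c_y[y]))
               for (x, y), c in c_xy.items())


def transfer_entropy(triples):
    """Plug-in estimate of TE from agent to human in bits."""
    n = len(triples)
    c_sas = Counter(triples)
    c_sa = Counter((s, a) for s, a, _ in triples)
    c_ss = Counter((s, s2) for s, _, s2 in triples)
    c_s = Counter(s for s, _, _ in triples)
    return sum(c / n * math.log2((c / c_sa[(s, a)]) / (c_ss[(s, s2)] / c_s[s]))
               for (s, a, s2), c in c_sas.items())


# Toy log in which the human input always repeats the agent's previous action.
a_seq = [0, 1, 1, 0] * 2
s_seq = [0] + a_seq  # s_{t+1} = a_t
log = triads(s_seq, a_seq)
te_agent_to_human = transfer_entropy(log)
mi_action_response = mutual_information([(a, s2) for _, a, s2 in log])
mi_input_action = mutual_information([(s, a) for s, a, _ in log])
```

For short sessions a plug-in estimator of this kind is biased upward, so comparisons across sessions of similar length are more meaningful than the absolute values.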
4.6. fNIRS Measurement
The prefrontal region of the human brain plays significant roles in attention control, working memory, executive function, etc. [12], which will be important for sustainable HAI. Therefore, we measured the activity in the prefrontal brain region of twelve participants during the interaction sessions by functional near-infrared spectroscopy (fNIRS).
Like functional magnetic resonance imaging (fMRI),
fNIRS assesses brain activities based on hemodynamic
responses. It enables us to measure the changes in con-
centration of oxygenated, deoxygenated, and total hemo-
globin (oxy-Hb, deoxy-Hb, and total-Hb) within cortical
tissue. In the analysis, we focused on the oxy-Hb, as it is
suggested to be the most sensitive indicator of activ-
ity-dependent hemodynamic changes [13].
We used a multi-channel fNIRS instrument (FOIRE-3000, Shimadzu Co., Japan). Eight source and seven detector optodes were placed on the prefrontal regions, covering Fp1, Fp2 and Fz positions of the international 10-20 system, with a total of 22 channels (Figure 6). The data were acquired at a sampling period of 70 ms. To reduce instrumental and physiological noise, the signals were band-pass-filtered with a 4th-order Chebyshev type II filter with cut-off frequencies of 0.002 and 0.7 Hz, pass-band ripple 5 dB.
To avoid physically induced artifacts as much as possible, the participants were asked to assume a preferred posture and retain it for the entire duration of the experiment. To avoid the displacement of optodes, the participants kept the fNIRS optodes on throughout all the three interaction sessions.
5. Results
For three participants (one with NIRS measurement), over 95 percent of their mouse actions were of a single type (either click or drag) in at least one session. Therefore, they were judged to have conducted the interaction improperly, and were excluded from the following analyses.
5.1. Subjective Impressions
Figure 7 shows the distribution of the comparative ratings between type I and the other two types, with respect to the eight viewpoints given above. The viewpoints for which the Wilcoxon signed rank test (null hypothesis: the rating is symmetric about 0) indicated a significant difference at the 0.05 level are marked with “*”, and those significant at the stricter level are marked with “**”. This result shows that the intrinsically motivated adaptive agent gave impressions of higher enjoyability, charm, and sustainability than the other two types of agents (viewpoints 1, 2, and 8, for both type I vs. R and type I vs. F).
Figure 7. Boxplot of the ratings on comparative impressions with respect to eight viewpoints. (a) the agent type I vs. R. (b) the agent type I vs. F. See Section 4.4 for the contents of the viewpoints.
5.2. Statistics of Actions
The average number of interactions in a session was 383.9. There were no significant differences in the number among the three agent types. Regarding human actions, the average number of clicks was 256.9, and that of drags was 127.0. The ratio of click to drag did not show significant differences among the agent types, either.
Figure 8 shows the difference of average entropy of agent actions for the agent types. The frequency distributions of the agent actions were more biased in the interactions with type F agent than with the other two types.
Figure 8. Entropy of agent actions, which was calculated for each interaction session and then averaged for each agent type over participants. Error bars indicate the standard error of the mean.
5.3. Characteristics of Information Transfer
To compare the features of interaction with the different types of agents from the information theoretic viewpoint, first, we computed the frequency distributions of dyadic pairs $(s_t, a_t)$, $(a_t, s_{t+1})$ and triadic interactions $(a_{t-1}, s_t, a_t)$, $(s_t, a_t, s_{t+1})$ over each interaction session. Using these, we calculated mutual information and transfer entropy and compared them among the agent types (Figure 9; results of the distinguishability and controllability were omitted because they were qualitatively equivalent to those of the mutual information).
Figure 9(a) and (b) show that type I agent caused an intermediate level of information transfer from the participant to the agent, meaning that the predictability of its actions for humans was also intermediate between the other two types of agents. Multiple pairwise comparisons (Wilcoxon signed rank test with Holm’s multiple test correction) showed significant differences in $MI(s_t, a_t)$ and in $TE_{H \to A}$ between all pairs of agent types.
For the information transfer from agent to human, the mutual information and transfer entropy exhibited different characteristics (Figure 9(c) and (d)). Multiple pairwise comparisons showed significantly larger $MI(a_t, s_{t+1})$ with type F than with the other two types (the difference between types I and R was not significant, $p = 0.243$), but no significant differences were found in $TE_{A \to H}$ between any pairs of agent types. We also note that $TE_{A \to H}$ was significantly larger than $TE_{H \to A}$ for type I agent (Wilcoxon signed rank test).
Figure 9. Static estimation of (a) mutual information $MI(s_t, a_t)$, (b) transfer entropy $TE_{H \to A}$, (c) mutual information $MI(a_t, s_{t+1})$, and (d) transfer entropy $TE_{A \to H}$. Error bars indicate the standard error of the mean.
To capture differences in dynamic aspects of the interaction more accurately, we evaluated the transition of subjective transfer entropy for all sessions. In addition to type I and F agents, the internal models updated by the rules (3) and (4) with the values of the learning rates $(\alpha_T, \alpha_C)$ in Table 1 were also hypothesized in type R agent and in the participants as well, and they were used to calculate the subjective transfer entropy by Equation (9).
Figure 10 shows the differences of the time-averaged subjective transfer entropy $TE^{\mathrm{subj}}_{H \to A}$ and $TE^{\mathrm{subj}}_{A \to H}$ for the three types of agents. $TE^{\mathrm{subj}}_{H \to A}$ in Figure 10(a) showed a similar result to Figure 9(b). $TE^{\mathrm{subj}}_{A \to H}$ in Figure 10(b), on the other hand, manifested significant differences between all agent types. This indicates that once $s_t$ was given, the action $a_t$ of type I agent had more influence on the human response $s_{t+1}$ than that of type F agent. A possible interpretation for the highest $TE^{\mathrm{subj}}_{A \to H}$ in the interaction with type R agent is that as the participants could not find any strategy in the agent, they invented an imaginary relationship and played their subservient roles.
These results suggest that the intrinsically motivated adaptive agent induced better subjective impressions by achieving a balanced information transfer with the human partners.
Figure 10. Time-averaged subjective transfer entropy from the participants to the agents (a), and from the agents to the participants (b). Error bars indicate the standard error of the mean.
5.4. Variability of Activity in Prefrontal Cortex
We evaluated the variability of activity in the prefrontal region by the standard deviation of oxy-Hb signals from each of the 22 channels. By multiple pairwise comparisons (Wilcoxon signed rank test with Holm’s multiple test correction), the variability showed significantly higher values with type I agent than with types F and R (the difference between types F and R was not significant, $p = 0.375$; see also Figure 11) at a lower left channel (channel 22 in Figure 6). The channel overlapped with the dorsolateral prefrontal cortex (DLPFC), which is involved in controlling and sustaining attention [12]. Therefore, this higher variability suggests that the intrinsically motivated adaptive agent successfully kept affecting the participants’ attention level.
Figure 11. Variability of fNIRS oxy-Hb signal from a lower left channel. Error bars indicate the standard error of the mean.
6. Conclusions
To achieve sustainable HAI, we proposed a new model
of intrinsically motivated adaptive agent, which tries to
maximize its influence on the human partner. The simu-
lation demonstrated how the model tries to keep satisfy-
ing its motivation by pursuing new relationships with the
partner and by avoiding situations where nothing can be
learned. To assess the effectiveness of the intrinsically
motivated adaptive agent, we conducted a comparative
HAI experiment with three types of agents. The results
showed that the model was effective in inducing subjective impressions of higher enjoyability, charm, and sustainability. Information theoretic analysis of the interaction suggested that a balanced information transfer between the agent and human partner would be important
for sustainable HAI. The participants’ brain activity
measured by fNIRS indicated higher variability of activ-
ity at the left DLPFC during interaction with the pro-
posed agent, suggesting that the model kept affecting the
participants’ attention level.
Unlike the models of intrinsically motivated learner in
developmental robotics [6,7], our model did not incorpo-
rate the extension of dimensions in input-action space
and the internal model space. Such a developmental as-
pect will be effective for longer term sustainable HAI,
though experimental assessment of its effectiveness
would become more qualitative.
