J. Intelligent Learning Systems & Applications, 2010, 2, 55-56
doi:10.4236/jilsa.2010.22008 Published Online May 2010 (http://www.SciRP.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA
Editorial: Special Section on Reinforcement Learning
and Approximate Dynamic Programming
Approximate dynamic programming (ADP) aims to compute
near-optimal solutions to Markov decision problems
(MDPs) with large or continuous spaces. In recent years,
research on ADP has increasingly been brought together
with work in the reinforcement learning (RL) community [1-4].
RL is a machine learning framework for solving sequential
decision-making problems that can also be modeled
within the MDP formalism. The common objective of RL
and ADP is to develop efficient algorithms for sequential
decision making under uncertain and complex conditions.
RL and ADP therefore have many potential applications
in real-world problems such as autonomous robots,
intelligent control, resource allocation, and network
routing.
This special section of JILSA focuses on key research
problems emerging at the junction of RL and ADP. After
a rigorous reviewing process, three papers were accepted
for publication in this special section.
The first paper, by M. A. Wiering [5], focuses on the
application of reinforcement learning with value function
approximation to game playing. In the paper, three
different schemes were studied for learning to play
Backgammon with temporal difference learning: 1) self-play,
2) playing against an expert program, and 3) observing
experts play against each other. Extensive experimental
results using temporal difference methods with neural
networks were provided to compare the three schemes.
They illustrate that a drawback of learning from experts
is that the learning program has few chances for
exploration. The results also indicate that observing an
expert play is the worst method and that learning by
playing against an expert seems to be the best strategy.
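As a rough illustration of the temporal difference update underlying such experiments, the following minimal sketch applies TD(0) to a small random-walk chain. It is not the method of [5]: the chain task, the one-hot (effectively tabular) linear value function used in place of a neural network, and all names and parameter values are assumptions made only for this example.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values on a random-walk chain with TD(0).

    Hypothetical toy task: states 0..num_states-1, each step moves left or
    right with equal probability. Leaving the chain on the left ends the
    episode with reward 0; leaving it on the right ends it with reward 1.
    """
    rng = random.Random(seed)
    # One-hot features: weights[s] is the value estimate of state s.
    weights = [0.0] * num_states
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, weights[next_state], False
            # TD error: one-step bootstrapped target minus current estimate.
            td_error = reward + gamma * next_value - weights[state]
            weights[state] += alpha * td_error
            if done:
                break
            state = next_state
    return weights


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

In this toy setting the estimates should increase from the left end of the chain to the right end, since states closer to the rewarding exit are worth more.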
The second paper, by J. H. Zaragoza and E. F.
Morales [6], proposed a relational reinforcement learning
approach with continuous actions, called TS-RRLCA,
which is based on the combination of behavioral cloning
and locally weighted regression. The TS-RRLCA approach
includes two main stages to learn continuous-action
policies for robots in partially known environments.
In the first stage, a relational representation of robot
states and actions is developed, and the rQ-learning
algorithm is applied with behavioral cloning so that
optimized control policies with discrete actions can be
obtained efficiently. In the second stage, the learned
policy is transformed into a relational policy with
continuous actions through a Locally Weighted Regression
(LWR) process. The proposed method was successfully
applied to a simulated and a real service robot for
navigation and following tasks under different conditions.
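The general idea of locally weighted regression can be sketched as follows. This is a generic one-dimensional version with a Gaussian kernel, not the TS-RRLCA procedure of [6]; the function name, the bandwidth value, and the sample data are hypothetical and chosen only for illustration.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` from a line fitted with weights centred on `query`.

    Assumes at least one sample lies close enough to `query` for the
    kernel weights not to underflow to zero.
    """
    # Gaussian kernel weights: samples near the query dominate the fit.
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    # Closed-form weighted least squares for a straight line.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return mean_y + slope * (query - mean_x)


if __name__ == "__main__":
    # Hypothetical samples: a state feature (x) paired with a demonstrated
    # action value (y); the query asks for an action at an unseen state.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    print(lwr_predict(xs, ys, 1.5))
```

Because a new local fit is computed for every query, the prediction varies smoothly between the stored samples, which is what makes this kind of regression a natural way to turn a set of discrete choices into a continuous output.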
The combination of reinforcement learning or ap-
proximate dynamic programming with learning from
demonstration is studied in the third paper [7]. A learning
strategy that integrates learning, generalization, and
exploration into a unified architecture was proposed to
generate a control field for a mobile robot in an unknown
and uncertain environment. Simulation results were
provided to evaluate the performance of the proposed
method.
Although RL and ADP provide efficient ways of developing
machine intelligence in a trial-and-error manner, the
incorporation of human intelligence is important for
their successful application. In this special section on
RL and ADP, all three papers studied the relationship
between machine intelligence and human intelligence from
different aspects. The results of the first paper
demonstrate that an expert program for game playing can
be very helpful in developing computer programs using
RL [5]. The use of relational RL to incorporate human
examples was investigated in the second paper [6]. In the
third paper [7], learning from human demonstration was
employed to generate an initial control field for an
autonomous mobile robot. The results in this special
section should therefore be good references for future
research on related topics.
Finally, I would like to thank all of the authors and
reviewers who have contributed to this special section.
Xin Xu
[1] F. Y. Wang, H. G. Zhang and D. R. Liu, “Adaptive Dy-
namic Programming: An Introduction,” IEEE Computa-
tional Intelligence Magazine, May 2009, pp. 39-47.
[2] W. B. Powell, “Approximate Dynamic Programming:
Solving the Curses of Dimensionality,” Wiley, Princeton,
NJ, 2007.
[3] R. S. Sutton and A. G. Barto, “Reinforcement Learning:
An Introduction,” MIT Press, Cambridge, MA, 1998.
[4] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-Dynamic
Programming,” Athena Scientific, Belmont, MA, 1996.
[5] M. A. Wiering, “Self-play and Using an Expert to Learn
to Play Backgammon with Temporal Difference Learn-
ing,” Journal of Intelligent Learning Systems and Appli-
cations, Vol. 2, 2010, pp. 55-66.
[6] J. H. Zaragoza and E. F. Morales, “Relational Rein-
forcement Learning with Continuous Actions by Com-
bining Behavioural Cloning and Locally Weighted Re-
gression,” Journal of Intelligent Learning Systems and
Applications, Vol. 2, 2010, pp. 67-77.
[7] D. Goswami and P. Jiang, “Experience Based Learning
Controller,” Journal of Intelligent Learning Systems and
Applications, Vol. 2, 2010, pp. 78-83.