Editorial: Special Section on Reinforcement Learning and Approximate Dynamic Programming

doi:10.4236/jilsa.2010.22008


J. Intelligent Learning Systems & Applications, 2010, 2, 55-56

Published Online May 2010 (http://www.SciRP.org/journal/jilsa)


Approximate dynamic programming (ADP) aims to compute near-optimal solutions to Markov decision problems (MDPs) with large or continuous state spaces. In recent years, research on ADP has increasingly converged with work in the reinforcement learning (RL) community [1-4]. RL is a machine learning framework for solving sequential decision-making problems, which can also be modeled within the MDP formalism. The common objective of RL and ADP is to develop efficient algorithms for sequential decision making under complex, uncertain conditions. RL and ADP therefore have many potential real-world applications, such as autonomous robots, intelligent control, resource allocation, and network routing.

This special section of JILSA focuses on key research problems emerging at the junction of RL and ADP. After a rigorous review process, three papers were accepted for publication in this special section.

The first paper, by M. A. Wiering [5], focuses on applying reinforcement learning with value function approximation to game playing. The paper studies three different schemes for learning to play backgammon with temporal difference (TD) learning: 1) self-play, 2) playing against an expert program, and 3) observing experts play against each other. Extensive experimental results using TD methods with neural networks are provided to compare the three schemes. The paper illustrates that a drawback of learning from experts is that the learning program has few opportunities for exploration. The results also indicate that observing an expert play is the worst method, whereas playing against an expert appears to be the best strategy.
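To make the TD idea concrete, here is a minimal sketch of TD(0) with a linear value-function approximator on a toy five-state random walk. It is not Wiering's setup: the task, the one-hot features, the step size `ALPHA`, and the function names are illustrative assumptions, and the paper itself used neural networks on backgammon. At each step the weights move along the feature vector in proportion to the TD error δ = r + γV(s') − V(s).

```python
import random

N_STATES = 5      # toy task (assumption): non-terminal states 1..5, terminals 0 and 6
ALPHA = 0.005     # step size (illustrative choice)
GAMMA = 1.0       # undiscounted episodic task


def features(s):
    """One-hot feature vector for a non-terminal state s in 1..N_STATES."""
    phi = [0.0] * N_STATES
    phi[s - 1] = 1.0
    return phi


def value(w, s):
    """Linear value estimate w . phi(s); terminal states have value 0."""
    if s == 0 or s == N_STATES + 1:
        return 0.0
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def td0_random_walk(episodes=20000, seed=0):
    """Learn weights for the random walk with TD(0); reward 1 only on exiting right."""
    rng = random.Random(seed)
    w = [0.5] * N_STATES                   # initial weights
    for _ in range(episodes):
        s = (N_STATES + 1) // 2             # start in the middle state
        while 0 < s < N_STATES + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == N_STATES + 1 else 0.0
            # TD error: delta = r + gamma * V(s') - V(s)
            delta = r + GAMMA * value(w, s_next) - value(w, s)
            # For a linear V, the gradient w.r.t. w is the feature vector phi(s)
            phi = features(s)
            w = [wi + ALPHA * delta * xi for wi, xi in zip(w, phi)]
            s = s_next
    return w


if __name__ == "__main__":
    w = td0_random_walk()
    for s in range(1, N_STATES + 1):
        print(f"V({s}) approx {w[s - 1]:.3f}  (true {s / (N_STATES + 1):.3f})")
```

With one-hot features this reduces to tabular TD(0), and the estimates should approach the true values s/6. Replacing `features` and `value` with a neural network turns the same error signal into a nonlinear approximator of the kind used for backgammon, at the cost of the convergence guarantees that hold in the linear case.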

The second paper, by J. H. Zaragoza and E. F. Morales [6], proposes a relational reinforcement learning approach with continuous actions, called TS-RRLCA, which combines behavioral cloning and locally weighted regression. TS-RRLCA has two main stages for learning continuous-action policies for robots in partially known environments. In the first stage, a relational representation of robot states and actions is developed, and the rQ-learning algorithm is applied together with behavioral cloning so that optimized control policies with discrete actions can be obtained efficiently. In the second stage, the learned policy is transformed into a relational policy with continuous actions through a locally weighted regression (LWR) process. The proposed method was successfully applied to both a simulated and a real service robot for navigation and following tasks under different conditions.
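The second stage of TS-RRLCA turns a policy with discrete actions into one with continuous actions through LWR. As a rough illustration of LWR itself, the sketch below fits a Gaussian-weighted linear model around each query point. The one-dimensional state (a bearing to the target), the steering values, the bandwidth, and the function name `lwr_predict` are hypothetical, and the paper's relational representation is not modeled.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=1.0):
    """Locally weighted linear regression (1-D) evaluated at one query point.

    Fits y ~ a + b*x by weighted least squares, where each sample is
    weighted by a Gaussian kernel centred on x_query.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(weights)
    if sw == 0.0:
        raise ValueError("query is too far from all samples for this bandwidth")
    # Weighted means of x and y
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    # Weighted covariance and variance around those means
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    if sxx < 1e-12:
        # Degenerate neighbourhood: fall back to the weighted mean
        return my
    b = sxy / sxx
    a = my - b * mx
    return a + b * x_query


if __name__ == "__main__":
    # Hypothetical data: bearing to target (rad) -> discrete steering command (deg)
    bearings = [-1.0, -0.5, 0.0, 0.5, 1.0]
    steering = [-30.0, -30.0, 0.0, 30.0, 30.0]
    for q in (-0.75, -0.25, 0.0, 0.25, 0.75):
        print(q, round(lwr_predict(bearings, steering, q, bandwidth=0.5), 2))
```

Because the kernel weights change smoothly with the query, the predicted command varies continuously between the discrete values in the data; the bandwidth controls how local each fit is.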

The third paper [7] studies the combination of reinforcement learning or approximate dynamic programming with learning from demonstration. It proposes a learning strategy that integrates learning, generalization, and exploration into a unified architecture in order to generate a control field for a mobile robot in an unknown and uncertain environment. Simulation results are provided to evaluate the performance of the proposed method.

Although RL and ADP provide efficient ways to develop machine intelligence through trial and error, incorporating human intelligence is important for applying RL and ADP successfully. All three papers in this special section study the relationship between machine intelligence and human intelligence from different perspectives. The results of the first paper [5] demonstrate that an expert game-playing program can be very helpful in developing computer programs with RL. The second paper [6] investigates the use of relational RL to incorporate human examples. In the third paper [7], learning from human demonstration is employed to generate an initial control field for an autonomous mobile robot. Taken together, the results in this special section should serve as useful references for future research on related topics.

Finally, I would like to thank all of the authors and reviewers who contributed to this special section.

Xin Xu

Editor-in-Chief,

JILSA

REFERENCES

[1] F. Y. Wang, H. G. Zhang and D. R. Liu, “Adaptive Dynamic Programming: An Introduction,” IEEE Computational Intelligence Magazine, May 2009, pp. 39-47.

[2] W. B. Powell, “Approximate Dynamic Programming: Solving the Curses of Dimensionality,” Wiley, Princeton, NJ, 2007.

[3] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, Cambridge, MA, 1998.

[4] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-Dynamic Programming,” Athena Scientific, Belmont, MA, 1996.

[5] M. A. Wiering, “Self-play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning,” Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 55-66.

[6] J. H. Zaragoza and E. F. Morales, “Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression,” Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 67-77.

[7] D. Goswami and P. Jiang, “Experience Based Learning Controller,” Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 78-83.