J. Intelligent Learning Systems & Applications, 2010, 2, 55-56
doi:10.4236/jilsa.2010.22008 Published Online May 2010 (http://www.SciRP.org/journal/jilsa)

Editorial: Special Section on Reinforcement Learning and Approximate Dynamic Programming

Approximate dynamic programming (ADP) aims to compute near-optimal solutions to Markov decision problems (MDPs) with large or continuous state spaces. In recent years, research on ADP has been brought together with that of the reinforcement learning (RL) community [1-4]. RL is a machine learning framework for solving sequential decision-making problems that can likewise be modeled with the MDP formalism. The common objective of RL and ADP is to develop efficient algorithms for sequential decision making under uncertain and complex conditions. There are therefore many potential applications of RL and ADP to real-world problems such as autonomous robots, intelligent control, resource allocation, and network routing. This special section of JILSA focuses on key research problems emerging at the junction of RL and ADP. After a rigorous review process, three papers were accepted for publication in this special section.

The first paper, by M. A. Wiering [5], focuses on the application of reinforcement learning with value-function approximation to game playing. Three training schemes for learning to play Backgammon with temporal difference learning are studied: 1) self-play, 2) playing against an expert program, and 3) viewing experts play against each other. Extensive experimental results using temporal difference methods with neural networks are provided to compare the three learning schemes. They illustrate that the drawback of learning from experts is that the learning program has few chances for exploration; observing an expert play turns out to be the worst method, while learning by playing against an expert seems to be the best strategy. A minimal illustrative sketch of a temporal difference update is given below, after the paper overviews.

The second paper, by J. H. Zaragoza and E. F. Morales [6], proposes a relational reinforcement learning approach with continuous actions, called TS-RRLCA, based on the combination of behavioural cloning and locally weighted regression. TS-RRLCA learns continuous-action policies for robots in partially known environments in two main stages. In the first stage, a relational representation of robot states and actions is developed, and the rQ-learning algorithm is applied together with behavioural cloning so that optimized control policies with discrete actions can be obtained efficiently. In the second stage, the learned policy is transformed into a relational policy with continuous actions through a Locally Weighted Regression (LWR) process; a small LWR sketch also follows below. The proposed method was successfully applied to a simulated and a real service robot for navigation and following tasks under different conditions.

The combination of reinforcement learning or approximate dynamic programming with learning from demonstration is studied in the third paper [7]. A learning strategy is proposed to generate a control field for a mobile robot in an unknown and uncertain environment, integrating learning, generalization, and exploration into a unified architecture. Simulation results are provided to evaluate the performance of the proposed method.
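For readers unfamiliar with temporal difference learning, the following is a minimal sketch of a TD(0) update with a linear value-function approximator. It is not taken from [5], which uses neural networks, TD-learning variants, and Backgammon positions; the feature encoding, learning rate, and toy random-walk environment below are assumptions chosen purely for illustration.

```python
import numpy as np

# Minimal TD(0) sketch with a linear value-function approximator.
# Illustrative only; not the setup used in [5].

class LinearTDLearner:
    def __init__(self, n_features, alpha=0.1, gamma=1.0):
        self.w = np.zeros(n_features)  # weights of the linear value function
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor

    def value(self, x):
        return float(self.w @ x)

    def update(self, x, reward, x_next, terminal):
        # TD(0) target: r + gamma * V(s'); terminal states have value 0.
        target = reward + (0.0 if terminal else self.gamma * self.value(x_next))
        td_error = target - self.value(x)
        self.w += self.alpha * td_error * x  # gradient step on the squared TD error
        return td_error


def one_hot(i, n):
    x = np.zeros(n)
    x[i] = 1.0
    return x


if __name__ == "__main__":
    # Toy 5-state random walk: start in the middle, +1 reward for exiting right.
    n_states = 5
    learner = LinearTDLearner(n_states)
    rng = np.random.default_rng(0)
    for _ in range(2000):
        s = n_states // 2
        while True:
            s_next = s + rng.choice([-1, 1])
            terminal = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            x_next = np.zeros(n_states) if terminal else one_hot(s_next, n_states)
            learner.update(one_hot(s, n_states), reward, x_next, terminal)
            if terminal:
                break
            s = s_next
    print(np.round(learner.w, 2))  # learned state values, roughly 1/6 ... 5/6
```

In the setting of [5], the linear approximator would be replaced by a neural network, and the positions used for updates would come from self-play, play against an expert program, or observed expert games.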
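Similarly, the locally weighted regression step in the second stage of TS-RRLCA can be illustrated with a generic LWR sketch. This is not the authors' implementation; the Gaussian kernel, bandwidth, and the toy data mapping a one-dimensional state to a discrete action are assumptions for illustration only.

```python
import numpy as np

# Generic locally weighted regression (LWR) sketch: given example states and
# the discrete actions chosen there, predict a smoothed, continuous action for
# a query state. Illustrative only; not the TS-RRLCA implementation of [6].

def lwr_predict(X, y, x_query, bandwidth=0.5):
    """Locally weighted linear regression evaluated at a single query point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Gaussian kernel weights based on distance to the query state.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Augment with a bias term and solve the weighted least-squares problem.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.diag(w)
    beta, *_ = np.linalg.lstsq(Xa.T @ W @ Xa, Xa.T @ W @ y, rcond=None)
    return float(np.append(x_query, 1.0) @ beta)


if __name__ == "__main__":
    # Toy data: 1-D "state" and a discrete steering action in {-1, 0, +1}.
    states = [[0.0], [0.5], [1.0], [1.5], [2.0]]
    actions = [-1.0, -1.0, 0.0, 1.0, 1.0]
    # LWR yields an intermediate, continuous action between the discrete ones.
    print(lwr_predict(states, actions, np.array([0.75])))
```

In [6], regression of this kind is applied to the discrete-action policy learned by rQ-learning with behavioural cloning, yielding a relational policy with continuous actions.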
Although RL and ADP provide efficient ways to develop machine intelligence in a trial-and-error manner, the incorporation of human intelligence is important for successful applications of RL and ADP. All three papers in this special section study the relationship between machine intelligence and human intelligence from different perspectives. The results of the first paper demonstrate that an expert program for game playing can be very helpful for developing computer players with RL [5]. The use of relational RL to incorporate human examples is investigated in the second paper [6]. In the third paper [7], learning from human demonstration is employed to generate an initial control field for an autonomous mobile robot. The results in this special section should therefore serve as good references for future research on related topics.

Finally, I would like to thank all of the authors and reviewers who have contributed to this special section.

Xin Xu
Editor-in-Chief, JILSA

REFERENCES

[1] F. Y. Wang, H. G. Zhang and D. R. Liu, "Adaptive Dynamic Programming: An Introduction," IEEE Computational Intelligence Magazine, May 2009, pp. 39-47.
[2] W. B. Powell, "Approximate Dynamic Programming: Solving the Curses of Dimensionality," Wiley, Princeton, NJ, 2007.
[3] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction," MIT Press, Cambridge, MA, 1998.
[4] D. P. Bertsekas and J. N. Tsitsiklis, "Neuro-Dynamic Programming," Athena Scientific, Belmont, MA, 1996.
[5] M. A. Wiering, "Self-Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning," Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 55-66.
[6] J. H. Zaragoza and E. F. Morales, "Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression," Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 67-77.
[7] D. Goswami and P. Jiang, "Experience Based Learning Controller," Journal of Intelligent Learning Systems and Applications, Vol. 2, 2010, pp. 78-83.

Copyright © 2010 SciRes. JILSA