Hybrid least-squares methods for reinforcement learning
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), ISSN 0302-9743, Vol. 2718, pp. 471–480, 2003
Metrics Details
- Usage: 1
  - Abstract Views: 1
- Captures: 2
  - Readers: 2
Conference Paper Description
The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in the context of reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, this paper proposes a new hybrid learning method for LSPI. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of a cart-pole system illustrate the effectiveness of the presented method.
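The description above refers to the standard LSPI loop: LSTD-Q policy evaluation alternating with greedy policy improvement over a linear architecture such as Gaussian RBF features. The record includes no code, so the following is only a minimal sketch of that loop under stated assumptions: the discount factor, the RBF centers and width, the 4-dimensional cart-pole state, and all function names are illustrative, and the paper's actual contribution (simulation-guided, orthogonal least-squares selection of the RBF centers) is not reproduced here.

```python
import numpy as np

# --- Illustrative constants; none of these values come from the paper. ---
GAMMA = 0.95                                   # discount factor (assumed)
N_ACTIONS = 2                                  # e.g. push cart left / right
rng = np.random.default_rng(0)
CENTERS = rng.uniform(-1.0, 1.0, size=(9, 4))  # RBF centers over a 4-D state (assumed)
SIGMA = 0.5                                    # RBF width (assumed)

def state_features(s):
    """Gaussian RBF features of the state plus a constant bias term."""
    d2 = np.sum((CENTERS - s) ** 2, axis=1)
    return np.concatenate(([1.0], np.exp(-d2 / (2.0 * SIGMA ** 2))))

def phi(s, a):
    """Block state-action features: state features copied into action a's block."""
    fs = state_features(s)
    out = np.zeros(N_ACTIONS * fs.size)
    out[a * fs.size:(a + 1) * fs.size] = fs
    return out

def greedy_action(s, w):
    """Policy improvement: pick the action maximizing the linear Q-estimate."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))

def lstdq(samples, w):
    """One LSTD-Q solve: evaluate the greedy policy induced by w.

    samples: list of (s, a, r, s_next, done) transitions from a simulator.
    """
    k = N_ACTIONS * (CENTERS.shape[0] + 1)
    A = 1e-6 * np.eye(k)   # small ridge term keeps A invertible
    b = np.zeros(k)
    for (s, a, r, s_next, done) in samples:
        f = phi(s, a)
        f_next = np.zeros(k) if done else phi(s_next, greedy_action(s_next, w))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, n_iters=20, tol=1e-4):
    """Policy iteration: repeat LSTD-Q until the weight vector converges."""
    w = np.zeros(N_ACTIONS * (CENTERS.shape[0] + 1))
    for _ in range(n_iters):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

In this sketch the transitions can be collected once, offline, and reused across all iterations, which is the sample-efficiency property that motivates LSPI; the basis-function choice hard-coded in CENTERS is precisely the "feature configuration" step the paper aims to automate.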
Bibliographic Details
- http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=7044229319&origin=inward
- https://dx.doi.org/10.1007/3-540-45034-3_47
- http://link.springer.com/10.1007/3-540-45034-3_47
- http://link.springer.com/content/pdf/10.1007/3-540-45034-3_47.pdf
- https://link.springer.com/chapter/10.1007/3-540-45034-3_47
- https://scholarsmine.mst.edu/engman_syseng_facwork/1127
- https://scholarsmine.mst.edu/cgi/viewcontent.cgi?article=2127&context=engman_syseng_facwork
- http://www.springerlink.com/index/10.1007/3-540-45034-3_47
- http://www.springerlink.com/index/pdf/10.1007/3-540-45034-3_47
Publisher: Springer Science and Business Media LLC