
Hybrid least-squares methods for reinforcement learning

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), ISSN: 0302-9743, Vol: 2718, Page: 471-480
2003
  • Citations: 0
  • Usage: 1
  • Captures: 2
  • Mentions: 0
  • Social Media: 0

Conference Paper Description

The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in reinforcement learning. LSPI is a promising algorithm that uses a linear approximator architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, this paper proposes a new hybrid learning method for LSPI. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on learning control of the Cart-Pole system illustrate the effectiveness of the presented method.
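
To make the linear-architecture idea concrete, the following is a minimal sketch of standard LSPI with RBF basis functions, written in Python with NumPy. It is not the hybrid method proposed in the paper: the RBF centers and width are simply supplied by hand, which is exactly the choice the paper aims to automate. All function names (`rbf_features`, `greedy_action`, `lstdq`, `lspi`), the sample layout `(state, action, reward, next_state, done)`, and the small ridge term are assumptions made for illustration.

```python
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """Bias plus Gaussian RBF activations, placed in the block belonging to `action`."""
    state = np.atleast_1d(np.asarray(state, dtype=float))
    sq_dists = np.sum((centers - state) ** 2, axis=1)
    block = np.concatenate(([1.0], np.exp(-sq_dists / (2.0 * width ** 2))))
    phi = np.zeros(n_actions * block.size)
    phi[action * block.size:(action + 1) * block.size] = block
    return phi

def greedy_action(state, weights, centers, width, n_actions):
    """Action maximizing the linear estimate Q(s, a) = phi(s, a) . w."""
    q = [rbf_features(state, a, centers, width, n_actions) @ weights
         for a in range(n_actions)]
    return int(np.argmax(q))

def lstdq(samples, weights, centers, width, n_actions, gamma=0.95, ridge=1e-6):
    """One policy-evaluation step: solve A w = b for the policy greedy in `weights`."""
    k = n_actions * (len(centers) + 1)
    A = ridge * np.eye(k)  # assumed regularizer, keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next, done in samples:
        phi = rbf_features(s, a, centers, width, n_actions)
        if done:
            phi_next = np.zeros(k)  # no bootstrapping from terminal states
        else:
            a_next = greedy_action(s_next, weights, centers, width, n_actions)
            phi_next = rbf_features(s_next, a_next, centers, width, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def lspi(samples, centers, width, n_actions, gamma=0.95, max_iter=20, tol=1e-6):
    """Repeat LSTDQ on a fixed sample set until the weights stop changing."""
    weights = np.zeros(n_actions * (len(centers) + 1))
    for _ in range(max_iter):
        new_weights = lstdq(samples, weights, centers, width, n_actions, gamma)
        if np.linalg.norm(new_weights - weights) < tol:
            return new_weights
        weights = new_weights
    return weights
```

In this sketch both the quality of the learned policy and the conditioning of the matrix `A` depend on where the centers sit and on which samples were collected, which mirrors the two difficulties the abstract names.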
