Hybrid least-squares methods for reinforcement learning
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), ISSN 0302-9743, Vol. 2718, pp. 471–480, 2003
Metrics Details
- Usage: 1
  - Abstract Views: 1
- Captures: 2
  - Readers: 2
Conference Paper Description
The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in the context of reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, this paper proposes a new hybrid learning method for LSPI. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of a cart-pole system illustrate the effectiveness of the presented method.
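The description above refers to the standard LSPI loop: LSTD-Q policy evaluation alternating with greedy policy improvement over a linear architecture such as Gaussian RBF features. The record includes no code, so the following is only a minimal sketch of that loop under stated assumptions: the discount factor, the RBF centers and width, the 4-dimensional cart-pole state, and all function names are illustrative, and the paper's actual contribution (simulation-guided, orthogonal least-squares selection of the RBF centers) is not reproduced here.

```python
import numpy as np

# --- Illustrative constants; none of these values come from the paper. ---
GAMMA = 0.95                                   # discount factor (assumed)
N_ACTIONS = 2                                  # e.g. push cart left / right
rng = np.random.default_rng(0)
CENTERS = rng.uniform(-1.0, 1.0, size=(9, 4))  # RBF centers over a 4-D state (assumed)
SIGMA = 0.5                                    # RBF width (assumed)

def state_features(s):
    """Gaussian RBF features of the state plus a constant bias term."""
    d2 = np.sum((CENTERS - s) ** 2, axis=1)
    return np.concatenate(([1.0], np.exp(-d2 / (2.0 * SIGMA ** 2))))

def phi(s, a):
    """Block state-action features: state features copied into action a's block."""
    fs = state_features(s)
    out = np.zeros(N_ACTIONS * fs.size)
    out[a * fs.size:(a + 1) * fs.size] = fs
    return out

def greedy_action(s, w):
    """Policy improvement: pick the action maximizing the linear Q-estimate."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))

def lstdq(samples, w):
    """One LSTD-Q solve: evaluate the greedy policy induced by w.

    samples: list of (s, a, r, s_next, done) transitions from a simulator.
    """
    k = N_ACTIONS * (CENTERS.shape[0] + 1)
    A = 1e-6 * np.eye(k)   # small ridge term keeps A invertible
    b = np.zeros(k)
    for (s, a, r, s_next, done) in samples:
        f = phi(s, a)
        f_next = np.zeros(k) if done else phi(s_next, greedy_action(s_next, w))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, n_iters=20, tol=1e-4):
    """Policy iteration: repeat LSTD-Q until the weight vector converges."""
    w = np.zeros(N_ACTIONS * (CENTERS.shape[0] + 1))
    for _ in range(n_iters):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

In this sketch the transitions can be collected once, offline, and reused across all iterations, which is the sample-efficiency property that motivates LSPI; the basis-function choice hard-coded in CENTERS is precisely the "feature configuration" step the paper aims to automate.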
Bibliographic Details
- http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=7044229319&origin=inward
- https://dx.doi.org/10.1007/3-540-45034-3_47
- http://link.springer.com/10.1007/3-540-45034-3_47
- http://link.springer.com/content/pdf/10.1007/3-540-45034-3_47.pdf
- https://link.springer.com/chapter/10.1007/3-540-45034-3_47
- https://scholarsmine.mst.edu/engman_syseng_facwork/1127
- https://scholarsmine.mst.edu/cgi/viewcontent.cgi?article=2127&context=engman_syseng_facwork
- http://www.springerlink.com/index/10.1007/3-540-45034-3_47
- http://www.springerlink.com/index/pdf/10.1007/3-540-45034-3_47
Publisher: Springer Science and Business Media LLC