Reinforcement learning in continuous time and space: Interference and not ill-conditioning is the main problem when using distributed function approximators
journal contribution
posted on 2023-06-07, 23:02, authored by Bartholomew Baddeley
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and in such cases RL techniques require function approximators for learning value functions and policies. Local linear models have often been preferred over distributed nonlinear models for function approximation in RL. We suggest that one reason for the difficulties encountered when using distributed architectures in RL is the problem of negative interference, whereby learning new data disrupts previously learned mappings. The continuous temporal difference (TD) learning algorithm TD(lambda) was used to learn a value function in a limited-torque pendulum swing-up task using a multilayer perceptron (MLP) network. Three different approaches to learning in the MLP networks were examined: 1) simple gradient descent; 2) vario-eta; and 3) a pseudopattern rehearsal strategy that attempts to reduce the effects of interference. Our results show that MLP networks can be used for value-function approximation in this task but require long training times. We also found that vario-eta destabilized learning and caused the learning process to fail to converge. Finally, we showed that the pseudopattern rehearsal strategy drastically improved the speed of learning. These results indicate that interference is a greater problem than ill-conditioning for this task.
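The pseudopattern rehearsal idea mentioned in the abstract can be illustrated with a minimal sketch: before updating on new TD targets, the current network is probed at randomly drawn states and its own outputs are replayed alongside the new data, so the update is less likely to overwrite previously learned parts of the value map. The sketch below is illustrative only; the toy one-hidden-layer network, the state dimensions, the uniform probe distribution, and all names (forward, sgd_step, rehearsal_step) are assumptions and do not come from the paper.

```python
# Illustrative sketch of pseudopattern rehearsal for a value-function MLP.
# Network size, state range, and all names are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer MLP: 2-D state (e.g. pendulum angle, velocity) -> value.
W1 = rng.normal(scale=0.1, size=(2, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1))
b2 = np.zeros(1)

def forward(x):
    """Return value estimates and hidden activations for a batch of states."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h

def sgd_step(x, target, lr=0.01):
    """One plain gradient-descent step on the mean squared error to the targets."""
    global W1, b1, W2, b2
    v, h = forward(x)
    err = v - target                          # (N, 1)
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)        # backprop through tanh
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

def rehearsal_step(new_x, new_target, n_pseudo=64, lr=0.01):
    """Mix freshly generated pseudopatterns (random states labelled by the
    network's own current outputs) with the new TD targets, so the update
    also preserves the existing mapping and reduces negative interference."""
    pseudo_x = rng.uniform(-1.0, 1.0, size=(n_pseudo, 2))   # random probe states
    pseudo_y, _ = forward(pseudo_x)                          # current net's answers
    xs = np.vstack([new_x, pseudo_x])
    ys = np.vstack([new_target, pseudo_y])
    sgd_step(xs, ys, lr)
```

In an actual TD(lambda) implementation the targets would be bootstrapped value estimates along the pendulum trajectory; here they are simply whatever batch of targets the caller supplies.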
History
Publication status
Published
Journal
IEEE Transactions on Systems, Man, and Cybernetics, Part B