TY - JOUR
T1 - Improving interactive reinforcement learning
T2 - What makes a good teacher?
AU - Cruz, Francisco
AU - Magg, Sven
AU - Nagai, Yukie
AU - Wermter, Stefan
N1 - Publisher Copyright:
© 2018 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2018/7/3
Y1 - 2018/7/3
AB - Interactive reinforcement learning (IRL) has become an important apprenticeship approach for speeding up convergence in classic reinforcement learning (RL) problems. One variant of IRL is policy shaping, which uses a parent-like trainer to propose the next action to be performed, thereby reducing the search space through advice. On some occasions, the trainer may be another artificial agent which was itself trained using RL methods before becoming an advisor for other learner-agents. In this work, we analyse the internal representations and characteristics of artificial agents to determine which agents may outperform others as trainer-agents. Using a polymath agent as an advisor, compared to a specialist agent, leads to a larger reward, faster convergence of the reward signal, and more stable behaviour in terms of the state visit frequency of the learner-agents. Moreover, we analyse system interaction parameters to determine how influential they are in the apprenticeship process; the consistency of feedback proves much more relevant when dealing with different learner obedience parameters.
KW - artificial trainer-agent
KW - cleaning scenario
KW - Interactive reinforcement learning
KW - policy shaping
UR - https://www.scopus.com/pages/publications/85042943876
U2 - 10.1080/09540091.2018.1443318
DO - 10.1080/09540091.2018.1443318
M3 - Article
AN - SCOPUS:85042943876
SN - 0954-0091
VL - 30
SP - 306
EP - 325
JO - Connection Science
JF - Connection Science
IS - 3
ER -