TY - GEN
T1 - Action Selection Methods in a Robotic Reinforcement Learning Scenario
AU - Cruz, Francisco
AU - Wuppen, Peter
AU - Fazrie, Alvin
AU - Weber, Cornelius
AU - Wermter, Stefan
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/23
Y1 - 2019/1/23
N2 - Reinforcement learning allows an agent to learn a new task while autonomously exploring its environment. For this aim, the agent chooses an action to perform among the available ones for a certain state. Nonetheless, a common problem for a reinforcement learning agent is to find a proper balance between exploration and exploitation of actions in order to achieve an optimal behavior. This paper compares multiple approaches to the exploration/exploitation dilemma in reinforcement learning and, moreover, it implements an exemplary reinforcement learning task within the domain of domestic robotics to show the performance of different exploration policies on it. We perform the domestic task using ε-greedy, softmax, VDBE, and VDBE-Softmax with online and offline temporal-difference learning. The obtained results show that the agent is able to collect larger rewards more quickly by using the VDBE-Softmax exploration strategy with both Q-learning and SARSA.
AB - Reinforcement learning allows an agent to learn a new task while autonomously exploring its environment. For this aim, the agent chooses an action to perform among the available ones for a certain state. Nonetheless, a common problem for a reinforcement learning agent is to find a proper balance between exploration and exploitation of actions in order to achieve an optimal behavior. This paper compares multiple approaches to the exploration/exploitation dilemma in reinforcement learning and, moreover, it implements an exemplary reinforcement learning task within the domain of domestic robotics to show the performance of different exploration policies on it. We perform the domestic task using ε-greedy, softmax, VDBE, and VDBE-Softmax with online and offline temporal-difference learning. The obtained results show that the agent is able to collect larger rewards more quickly by using the VDBE-Softmax exploration strategy with both Q-learning and SARSA.
UR - https://www.scopus.com/pages/publications/85062506336
U2 - 10.1109/LA-CCI.2018.8625243
DO - 10.1109/LA-CCI.2018.8625243
M3 - Conference contribution
AN - SCOPUS:85062506336
T3 - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
BT - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Latin American Conference on Computational Intelligence, LA-CCI 2018
Y2 - 6 November 2018 through 9 November 2018
ER -