TY - GEN
T1 - Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning
AU - Cruz, Francisco
AU - Parisi, German I.
AU - Wermter, Stefan
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/10
AB - Interactive reinforcement learning (IRL) extends traditional reinforcement learning (RL) by allowing an agent to interact with parent-like trainers during a task. In this paper, we present an IRL approach that uses dynamic audio-visual input, in the form of vocal commands and hand gestures, as feedback. Our architecture integrates multi-modal information to provide robust commands from multiple sensory cues, along with a confidence value indicating the trustworthiness of the feedback. The integration process also handles the case in which the two modalities convey incongruent information. Additionally, we modulate the influence of sensory-driven feedback in the IRL task using goal-oriented knowledge in the form of contextual affordances. We implement a neural network architecture to predict the effect of actions performed with different objects, allowing the agent to avoid failed states, i.e., states from which it is not possible to accomplish the task. In our experimental setup, we explore the interplay of multi-modal feedback and task-specific affordances in a robot cleaning scenario. We compare the learning performance of the agent under four conditions: traditional RL, multi-modal IRL, and each of these two setups combined with contextual affordances. Our experiments show that the best performance is obtained with audio-visual feedback and affordance-modulated IRL. These results demonstrate the importance of integrating multi-modal sensory processing with goal-oriented knowledge in IRL tasks.
UR - https://www.scopus.com/pages/publications/85056510907
DO - 10.1109/IJCNN.2018.8489237
M3 - Conference contribution
AN - SCOPUS:85056510907
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2018 International Joint Conference on Neural Networks, IJCNN 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 International Joint Conference on Neural Networks, IJCNN 2018
Y2 - 8 July 2018 through 13 July 2018
ER -