TY - GEN
T1 - Boosting Reinforcement Learning Algorithms in Continuous Robotic Reaching Tasks Using Adaptive Potential Functions
AU - Chen, Yifei
AU - Schomaker, Lambert
AU - Cruz, Francisco
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In reinforcement learning, reward shaping is an efficient way to augment the reward signal so as to guide the learning process of an agent. A well-known reward shaping framework is potential-based reward shaping (PBRS), which uses a so-called potential function to guarantee policy invariance after reward shaping, preventing undesirable behavior. Unlike many works that use a predefined potential function, [3] proposed a novel adaptive potential function (APF) method that learns the potential function from the agent’s training history, concurrently with reinforcement learning (RL) training. However, the APF method was only deployed and evaluated in small discrete environments. This paper bridges the gap by adapting the APF method to robotics, a typical continuous scenario. We combine the APF method with the Deep Deterministic Policy Gradient (DDPG) algorithm to form a new APF-DDPG algorithm. To evaluate our method, we deploy APF-DDPG to control a Baxter robot in a series of reaching tasks, both in simulation and in the real world. The experimental results show that the APF-DDPG algorithm significantly outperforms the baseline DDPG algorithm. The code is available at https://github.com/yfchenShirley/APF_DDPG.
AB - In reinforcement learning, reward shaping is an efficient way to augment the reward signal so as to guide the learning process of an agent. A well-known reward shaping framework is potential-based reward shaping (PBRS), which uses a so-called potential function to guarantee policy invariance after reward shaping, preventing undesirable behavior. Unlike many works that use a predefined potential function, [3] proposed a novel adaptive potential function (APF) method that learns the potential function from the agent’s training history, concurrently with reinforcement learning (RL) training. However, the APF method was only deployed and evaluated in small discrete environments. This paper bridges the gap by adapting the APF method to robotics, a typical continuous scenario. We combine the APF method with the Deep Deterministic Policy Gradient (DDPG) algorithm to form a new APF-DDPG algorithm. To evaluate our method, we deploy APF-DDPG to control a Baxter robot in a series of reaching tasks, both in simulation and in the real world. The experimental results show that the APF-DDPG algorithm significantly outperforms the baseline DDPG algorithm. The code is available at https://github.com/yfchenShirley/APF_DDPG.
KW - Reinforcement learning
KW - Reward shaping
KW - Robot tasks
UR - https://www.scopus.com/pages/publications/85210894563
U2 - 10.1007/978-981-96-0351-0_5
DO - 10.1007/978-981-96-0351-0_5
M3 - Conference contribution
AN - SCOPUS:85210894563
SN - 9789819603503
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 52
EP - 64
BT - AI 2024
A2 - Gong, Mingming
A2 - Song, Yiliao
A2 - Koh, Yun Sing
A2 - Xiang, Wei
A2 - Wang, Derui
PB - Springer Science and Business Media Deutschland GmbH
T2 - 37th Australasian Joint Conference on Artificial Intelligence, AJCAI 2024
Y2 - 25 November 2024 through 29 November 2024
ER -