Improving Proximal Policy Optimization Algorithm in Interactive Multi-Agent Systems

Research output: Chapter in book/report/conference proceeding, Conference contribution, Peer-reviewed

Abstract

Proximal Policy Optimization (PPO), a leading reinforcement learning (RL) algorithm, has proven efficient across a wide range of problems. Compared with other reinforcement learning algorithms, it offers greater stability and reliability. However, as an on-policy algorithm, it suffers from sample inefficiency and relatively slow training. In this paper, we use two methods, parameter sharing and trajectory sharing, to speed up PPO training. We also introduce an adaptive blending method that prevents unnecessary updates during parameter sharing. For trajectory sharing, we introduce a possibility-for-selection technique, combined with a thresholding method, to balance exploitation and exploration. Experiments in a multi-agent environment show that both methods converge significantly faster than standard PPO.
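
The following is not the authors' implementation; it is a minimal sketch under stated assumptions. `ppo_clipped_objective` is the standard clipped surrogate objective from the original PPO formulation, evaluated for a single sample, which is where PPO's stability comes from. `blend_parameters` and `select_shared_trajectories` are hypothetical stand-ins for the adaptive blending and possibility-for-selection/thresholding ideas named in the abstract. The blending factor `alpha`, the skip tolerance `min_change`, the selection probability `p_select` and the return threshold `return_threshold` are all assumptions, because the abstract does not say how the paper defines them.

```python
import random


def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s); clipping it to [1 - eps, 1 + eps]
    limits how far a single update can move the policy.
    """
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def blend_parameters(own, shared, alpha, min_change=1e-3):
    """HYPOTHETICAL parameter blending: own <- (1 - alpha) * own + alpha * shared.

    The update is skipped (returns a copy of own and False) when the largest
    per-parameter change is below min_change, as an illustrative stand-in
    for "preventing unnecessary updates".
    """
    blended = [(1.0 - alpha) * o + alpha * s for o, s in zip(own, shared)]
    largest_change = max(abs(b - o) for b, o in zip(blended, own))
    if largest_change < min_change:
        return list(own), False
    return blended, True


def select_shared_trajectories(trajectories, p_select, return_threshold, rng):
    """HYPOTHETICAL filter for trajectories shared by other agents.

    A trajectory is kept only if its return reaches return_threshold
    (thresholding) and a random draw falls below p_select (selection
    probability), so some qualifying trajectories are still skipped.
    """
    selected = []
    for traj in trajectories:
        if traj["return"] >= return_threshold and rng.random() < p_select:
            selected.append(traj)
    return selected


if __name__ == "__main__":
    print(ppo_clipped_objective(1.5, 2.0))  # ratio clipped to 1.2, so 1.2 * 2.0
    print(blend_parameters([0.0, 0.0], [1.0, 1.0], alpha=0.5))
    rng = random.Random(0)
    trajs = [{"return": 5.0}, {"return": -1.0}]
    print(select_shared_trajectories(trajs, p_select=1.0, return_threshold=0.0, rng=rng))
```

Clipping the probability ratio is what keeps PPO's updates conservative. The two hypothetical helpers only show where a blending factor and a selection threshold could plug into a shared-training loop; their exact form in the paper may differ.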

Original language: English
Host publication title: 2024 IEEE International Conference on Development and Learning, ICDL 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350348552
DOI
Status: Published - 2024
Event: 2024 IEEE International Conference on Development and Learning, ICDL 2024 - Austin, United States
Duration: 20 May 2024 to 23 May 2024

Publication series

Name: 2024 IEEE International Conference on Development and Learning, ICDL 2024

Conference

Conference: 2024 IEEE International Conference on Development and Learning, ICDL 2024
Country/Territory: United States
City: Austin
Period: 20/05/24 to 23/05/24
