Open Access

ARTICLE


Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning

by Yameng Yin1, Lieping Zhang2,*, Xiaoxu Shi1, Yilin Wang3, Jiansheng Peng4, Jianchu Zou4

1 Key Laboratory of Advanced Manufacturing and Automation Technology, Guilin University of Technology, Education Department of Guangxi Zhuang Autonomous Region, Guilin, 541006, China
2 Guangxi Key Laboratory of Special Engineering Equipment and Control, Guilin University of Aerospace Technology, Guilin, 541004, China
3 Guilin Mingfu Robot Technology Company Limited, Guilin, 541199, China
4 Key Laboratory of AI and Information Processing, Education Department of Guangxi Zhuang Autonomous Region, Hechi University, Yizhou, 546300, China

* Corresponding Author: Lieping Zhang

Computers, Materials & Continua 2024, 81(2), 2769-2790. https://doi.org/10.32604/cmc.2024.056791

Abstract

By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to improve the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network replaces the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round (episode) reward from the trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples by reward value so that high-quality data are used more frequently. Finally, simulation experiments verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps, with performance improved by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
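The three mechanisms described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the averaging window K, the uniform per-step split of the episode reward, the rank-based sampling probabilities, and all function names are introduced here for clarity, and the DDQN online-network action selection is omitted for brevity.

import numpy as np
from collections import deque

K = 5  # assumed size of the averaging window over past target networks

def averaged_target(next_states, rewards, dones, target_nets, gamma=0.99):
    # Average the Q-values of the last K stored target networks on the current
    # batch of next states, then bootstrap the target from that average.
    # (The DDQN online-network action selection is omitted for brevity.)
    q_stack = np.stack([net(next_states) for net in target_nets])  # (K, batch, actions)
    avg_q = q_stack.mean(axis=0)
    return rewards + gamma * (1.0 - dones) * avg_q.max(axis=1)

def redistribute_rewards(step_rewards, episode_return, beta=0.1):
    # Reward redistribution: give every step a share of the whole episode's
    # (round) return; the uniform split and beta are illustrative assumptions.
    bonus = beta * episode_return / len(step_rewards)
    return [r + bonus for r in step_rewards]

def reward_prioritized_sample(buffer, batch_size, rng):
    # Reward-prioritized experience selection: rank transitions by reward and
    # draw high-reward samples with higher probability.
    rewards = np.array([t[2] for t in buffer])        # transition = (s, a, r, s', done)
    ranks = rewards.argsort().argsort() + 1           # 1 = lowest reward
    probs = ranks / ranks.sum()
    idx = rng.choice(len(buffer), size=batch_size, p=probs, replace=False)
    return [buffer[i] for i in idx]

# Toy usage with a linear stand-in for the Q network.
rng = np.random.default_rng(0)
target_nets = deque([(lambda s, w=rng.normal(size=(4, 3)): s @ w) for _ in range(K)], maxlen=K)
buffer = [(rng.normal(size=4), int(rng.integers(3)), float(r), rng.normal(size=4), False)
          for r in rng.normal(size=100)]
batch = reward_prioritized_sample(buffer, batch_size=8, rng=rng)
next_states = np.stack([t[3] for t in batch])
rewards = np.array([t[2] for t in batch])
dones = np.array([float(t[4]) for t in batch])
targets = averaged_target(next_states, rewards, dones, target_nets)  # shape (8,)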

Keywords


Cite This Article

APA Style
Yin, Y., Zhang, L., Shi, X., Wang, Y., Peng, J. et al. (2024). Improved double deep Q network algorithm based on average Q-value estimation and reward redistribution for robot path planning. Computers, Materials & Continua, 81(2), 2769-2790. https://doi.org/10.32604/cmc.2024.056791
Vancouver Style
Yin Y, Zhang L, Shi X, Wang Y, Peng J, Zou J. Improved double deep Q network algorithm based on average Q-value estimation and reward redistribution for robot path planning. Comput Mater Contin. 2024;81(2):2769-2790. https://doi.org/10.32604/cmc.2024.056791
IEEE Style
Y. Yin, L. Zhang, X. Shi, Y. Wang, J. Peng, and J. Zou, “Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning,” Comput. Mater. Contin., vol. 81, no. 2, pp. 2769-2790, 2024. https://doi.org/10.32604/cmc.2024.056791



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.