Open Access
ARTICLE
Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning
1 Key Laboratory of Advanced Manufacturing and Automation Technology, Guilin University of Technology, Education Department of Guangxi Zhuang Autonomous Region, Guilin, 541006, China
2 Guangxi Key Laboratory of Special Engineering Equipment and Control, Guilin University of Aerospace Technology, Guilin, 541004, China
3 Guilin Mingfu Robot Technology Company Limited, Guilin, 541199, China
4 Key Laboratory of AI and Information Processing, Education Department of Guangxi Zhuang Autonomous Region, Hechi University, Yizhou, 546300, China
* Corresponding Author: Lieping Zhang. Email:
Computers, Materials & Continua 2024, 81(2), 2769-2790. https://doi.org/10.32604/cmc.2024.056791
Received 31 July 2024; Accepted 30 September 2024; Issue published 18 November 2024
Abstract
By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied to the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to improve the precision of the target Q-value, the average of multiple Q-values previously learned by the target Q network replaces the single Q-value from the current target Q network. Second, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the episode reward derived from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples by reward value so that high-quality data are used more frequently. Finally, simulation experiments verify the effectiveness of the proposed algorithm in both a fixed-position scenario and random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps, with performance improvements of 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves the efficiency and generalization of path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
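To make the averaged target idea concrete, the following is a minimal sketch (not the authors' implementation) of how a TD target could be computed in a tabular setting: the online Q-table selects the greedy next action as in standard DDQN, while the evaluation value is the mean over the last k snapshots of the target Q-table. The class name `AveragedDDQNTarget`, the snapshot count `k`, and the tabular representation are all illustrative assumptions.

```python
import numpy as np
from collections import deque


class AveragedDDQNTarget:
    """Sketch of a DDQN TD target that averages the last k target-network
    Q-value snapshots instead of using only the current target network.
    Tabular Q representation is assumed for illustration only."""

    def __init__(self, k, gamma):
        self.snapshots = deque(maxlen=k)  # most recent k target Q-tables
        self.gamma = gamma

    def record(self, target_q_table):
        # Store a snapshot of the target network's Q-values.
        self.snapshots.append(np.asarray(target_q_table, dtype=float))

    def td_target(self, reward, next_state, online_q_table, done):
        if done:
            return reward
        # DDQN decoupling: the online network selects the action...
        a_star = int(np.argmax(online_q_table[next_state]))
        # ...but the action is evaluated by the *average* of the stored
        # target-network snapshots, reducing single-estimate noise.
        avg_q = np.mean([q[next_state, a_star] for q in self.snapshots])
        return reward + self.gamma * avg_q
```

With k = 1 this reduces to the ordinary DDQN target; larger k trades responsiveness for a lower-variance (and, empirically, less overestimated) target value.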
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.