Open Access

ARTICLE

A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

Han Li1, Tianding Chen2, *, Hualiang Teng3, Yingtao Jiang4

1 College of Mathematics, Physics and Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang, 325035, China.
2 School of Physics and Information Engineering, Minnan Normal University, Zhangzhou, Fujian, 363000, China.
3 Department of Civil and Environmental Engineering, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas, NV, 454015, USA.
4 Department of Electrical and Computer Engineering, Howard R. Hughes College of Engineering, University of Nevada, Las Vegas, NV, 454015, USA.

* Corresponding Author: Tianding Chen. Email: email.

Computer Modeling in Engineering & Sciences 2019, 118(2), 253-274. https://doi.org/10.31614/cmes.2019.05807

Abstract

In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, cannot quickly build a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment whose configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, so that the set of all states already explored can continue to be exploited in the context of such a graph. We prove that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus there is no need to face the exploration vs. exploitation tradeoff that all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by the modified Floyd-Warshall algorithm proposed in this paper. The experimental results confirm that the proposed graph-based reinforcement learning algorithm significantly outperforms both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it the algorithm of choice in applications involving dynamic environments.
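The abstract frames learning as all-pairs shortest-path search over the graph of explored states. As a rough illustration only, the Python sketch below runs the standard (unmodified) Floyd-Warshall algorithm on a tiny hand-built maze graph and reads off a path; the function names, the toy maze, and the unit edge costs are illustrative assumptions, and the paper's modification for dynamic environments is not reproduced here.

    from math import inf

    def floyd_warshall(n, edges):
        """Standard Floyd-Warshall all-pairs shortest paths on a directed graph.

        n     -- number of states (graph vertices)
        edges -- iterable of (u, v, w) directed edges with weight w
        Returns (dist, nxt): dist[u][v] is the shortest-path cost and
        nxt[u][v] is the next vertex on that path (for reading off a route).
        """
        dist = [[inf] * n for _ in range(n)]
        nxt = [[None] * n for _ in range(n)]
        for u in range(n):
            dist[u][u] = 0.0
            nxt[u][u] = u
        for u, v, w in edges:
            dist[u][v] = w
            nxt[u][v] = v
        for k in range(n):          # k is the intermediate vertex
            for i in range(n):
                for j in range(n):
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
                        nxt[i][j] = nxt[i][k]
        return dist, nxt

    def extract_path(nxt, start, goal):
        """Follow the next-hop table from start to goal."""
        if nxt[start][goal] is None:
            return []               # goal unreachable from start
        path = [start]
        while path[-1] != goal:
            path.append(nxt[path[-1]][goal])
        return path

    # Toy 2x2 grid maze: states 0..3, unit-cost moves between open neighbours.
    edges = [(0, 1, 1), (1, 0, 1), (1, 3, 1), (3, 1, 1), (0, 2, 1), (2, 0, 1)]
    dist, nxt = floyd_warshall(4, edges)
    print(extract_path(nxt, 2, 3))  # -> [2, 0, 1, 3]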

Keywords


Cite This Article

APA Style
Li, H., Chen, T., Teng, H., & Jiang, Y. (2019). A graph-based reinforcement learning method with converged state exploration and exploitation. Computer Modeling in Engineering & Sciences, 118(2), 253-274. https://doi.org/10.31614/cmes.2019.05807
Vancouver Style
Li H, Chen T, Teng H, Jiang Y. A graph-based reinforcement learning method with converged state exploration and exploitation. Comput Model Eng Sci. 2019;118(2):253-274. https://doi.org/10.31614/cmes.2019.05807
IEEE Style
H. Li, T. Chen, H. Teng, and Y. Jiang, “A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation,” Comput. Model. Eng. Sci., vol. 118, no. 2, pp. 253-274, 2019. https://doi.org/10.31614/cmes.2019.05807



Copyright © 2019 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.