A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation
Han Li1, Tianding Chen2,*, Hualiang Teng3, Yingtao Jiang4
CMES-Computer Modeling in Engineering & Sciences, Vol.118, No.2, pp. 253-274, 2019, DOI:10.31614/cmes.2019.05807
Abstract In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with the dilemma of choosing between two competing tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, whose configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, and …
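To make the exploration-exploitation dilemma mentioned in the abstract concrete, the following is a minimal, generic sketch of a tabular Q-learning agent with epsilon-greedy action selection on a toy chain environment. It is not the paper's graph-based method; the environment (`ChainEnv`), the hyperparameters (`epsilon`, `alpha`, `gamma`), and all function names are illustrative assumptions only.

```python
# Minimal illustration of the exploration-exploitation trade-off in
# value-based RL: epsilon-greedy tabular Q-learning on a toy chain MDP.
# NOT the paper's method; all names and parameters are assumptions.
import random

class ChainEnv:
    """Toy chain of states: move right to reach the terminal reward."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, action 1 = right
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def epsilon_greedy(q_row, epsilon):
    # Exploration: random action with probability epsilon.
    # Exploitation: greedy action w.r.t. the current Q estimates.
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.2):
    env = ChainEnv()
    q_table = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = epsilon_greedy(q_table[s], epsilon)
            s_next, r, done = env.step(a)
            # One-step Q-learning update toward the bootstrapped target.
            target = r + gamma * max(q_table[s_next])
            q_table[s][a] += alpha * (target - q_table[s][a])
            s = s_next
    return q_table

if __name__ == "__main__":
    for i, row in enumerate(train()):
        print(f"state {i}: Q(left)={row[0]:.2f}  Q(right)={row[1]:.2f}")
```

With a fixed epsilon, the agent keeps taking occasional random actions even after the value estimates have converged, which is the trade-off the paper's graph-based formulation is motivated by; the sketch does not reflect how the proposed method resolves it.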