TY  - EJOU
AU  - Wei, Yifei
AU  - Qu, Yinxiang
AU  - Zhao, Min
AU  - Zhang, Lianping
AU  - Yu, F. Richard

TI  - Resource Allocation and Power Control Policy for Device-to-Device Communication Using Multi-Agent Reinforcement Learning
T2  - Computers, Materials & Continua

PY  - 2020
VL  - 63
IS  - 3
SN  - 1546-2226

AB  - Device-to-Device (D2D) communication is a promising technology that can 
reduce the burden on cellular networks while increasing network capacity. In this paper, we
focus on channel resource allocation and power control to improve system resource
utilization and network throughput. First, we treat each D2D pair as an independent agent
that makes decisions based on its locally observed channel state information.
A multi-agent Reinforcement Learning (RL) algorithm is proposed for this multi-user
system. We assume that the D2D pairs do not possess any information on the availability
and quality of the resource block to be selected, so the problem is modeled as a stochastic
non-cooperative game. Hence, each agent becomes a player, and the players make decisions
together to achieve global optimization. Thereby, a multi-agent Q-learning algorithm
based on game theory is established. Second, to accelerate the convergence
of multi-agent Q-learning, we consider a power allocation strategy based on the Fuzzy
C-means (FCM) algorithm. The strategy first groups the D2D users by FCM, treats
each group as an agent, and then runs the multi-agent Q-learning algorithm to determine
the power for each group of D2D users. Simulation results show that the multi-agent
Q-learning algorithm improves the system throughput. In particular,
FCM greatly speeds up the convergence of the multi-agent Q-learning algorithm while
improving system throughput.
KW  - D2D communication
KW  - resource allocation
KW  - power control
KW  - multi-agent
KW  - Q-learning
KW  - fuzzy C-means

DO  - 10.32604/cmc.2020.09130