Deep Reinforcement Learning Empowered Edge Collaborative Caching Scheme for Internet of Vehicles

Xin Liu; Siya Xu; Chao Yang; Zhili Wang; Hao Zhang; Jingye Chi; Qinghan Li

doi:10.32604/csse.2022.022103

Loading [MathJax]/jax/output/CommonHTML/jax.js

[BACK]

Computer Systems Science & Engineering DOI:10.32604/csse.2022.022103
Article

Deep Reinforcement Learning Empowered Edge Collaborative Caching Scheme for Internet of Vehicles

Xin Liu1, Siya Xu1, Chao Yang2, Zhili Wang1,*, Hao Zhang3, Jingye Chi1 and Qinghan Li4

1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
2Information and Communication Branch, State Grid Liaoning Electric Power Co., Ltd., Shenyang, 110000, China
3China Global Energy Interconnection Research Institute, Nanjing, 210000, China
4Syracuse University, New York, 13244, USA
*Corresponding Author: Zhili Wang. Email: zlwang@bupt.edu.cn
Received: 27 July 2021; Accepted: 30 August 2021

Abstract: With the development of internet of vehicles, the traditional centralized content caching mode transmits content through the core network, which causes a large delay and cannot meet the demands for delay-sensitive services. To solve these problems, on basis of vehicle caching network, we propose an edge collaborative caching scheme. Road side unit (RSU) and mobile edge computing (MEC) are used to collect vehicle information, predict and cache popular content, thereby provide low-latency content delivery services. However, the storage capacity of a single RSU severely limits the edge caching performance and cannot handle intensive content requests at the same time. Through content sharing, collaborative caching can relieve the storage burden on caching servers. Therefore, we integrate RSU and collaborative caching to build a MEC-assisted vehicle edge collaborative caching (MVECC) scheme, so as to realize the collaborative caching among cloud, edge and vehicle. MVECC uses deep reinforcement learning to predict what needs to be cached on RSU, which enables RSUs to cache more popular content. In addition, MVECC also introduces a mobility-aware caching replacement scheme at the edge network to reduce redundant cache and improving cache efficiency, which allows RSU to dynamically replace the cached content in response to the mobility of vehicles. The simulation results show that the proposed MVECC scheme can improve cache performance in terms of energy cost and content hit rate.

Keywords: Internet of vehicles; vehicle caching network; collaborative caching; caching replacement; deep reinforcement learning

1 Introduction

With the development of internet of vehicles, the content demand of mobile vehicles has increased rapidly. Artificial intelligence-driven vehicles need to constantly learn their surrounding environment and make instant decisions. Vehicles can be regarded as mobile devices to collect and process environmental data and support various information services. The development of Internet of Vehicles (IoV) and wireless technology provides a way to improve the driving experience [1]. Due to high latency and network energy consumption [2], traditional mobile cloud computing is not suitable for latency-sensitive applications [3,4], which affects the experience of vehicle users.

To solve the problems of high latency and energy consumption in IoV, road side units (RSU) are used to help collect vehicle data and provide vehicle-based information services, mobile edge computing (MEC) is used to calculate content popularity [5–7]. Caching popular content in RSU can bring valuable data, thereby reducing access latency and network energy consumption. However, the caching performance is limited by the RSU storage capacity. The storage capacity of single RSU is not enough to support the intensive content requests in IoV, which leads to an unbalanced and redundant cache in adjacent RSUs [8,9]. In addition, due to the high-speed mobility, the vehicle will quickly pass through its coverage region when sending a request to RSU, which means that the RSU cached content is easily out of date [10,11]. In order to improve the caching efficiency, the caching scheme should respond to the vehicle mobility and enable the RSU to selectively replace part of the cached content.

Therefore, an intelligent collaborative caching mechanism is needed to improve caching efficiency. Considering that collaborative caching can enable the cached content to be shared between RSUs, the implementation of collaborative caching at the edge of IoV can alleviate caching load of a single RSU. In addition, vehicle-to-vehicle (V2V) communication can further shorten the transmission distance and reduce the transmission delay. Therefore, we integrate MEC, collaborative caching and V2V caching to realize collaborative caching among cloud, RSU and vehicle.

However, the complex and dynamically changing IoV network environment makes it difficult to determine what content should be cached on RSU. Moreover, the caching optimization problem is a long-term mixed integer linear programming (LT-MILP), which has been proven to be NP-hard [12]. Through neural networks to continuously interact with the environment, deep reinforcement learning (DRL) has excellent performance in dealing with complex dynamic environments.

For this reason, we propose a MEC-assisted vehicle edge collaborative caching (MVECC) scheme based on the vehicle caching network. MVECC uses DRL to predict content popularity and cache popular content in RSUs. Each RSU shares the cached content to achieve collaborative caching. In MVECC, RSUs collect and upload the learning data to macro base station (MBS). Then MBS acts as a DRL agent to determine content caching and delivery schemes. In addition, we have also designed a mobility-aware caching replacement algorithm, RSU can selectively replace and update the cached content according to vehicle mobility. Our contributions can be summarized as follows:

• We build a MVECC scheme. In this scheme, we make full use of the caching capability of RSU and smart vehicles and design two content caching modes, including RSU caching and assisted caching vehicle (ACV) caching. In addition, MVECC includes four content delivery modes, namely content center server (CCS) delivery, RSU direct delivery, RSU collaborative indirect delivery and ACV assisted delivery. MVECC can maximize performance through the coordination of multiple caching and delivery modes.

• We propose a content caching algorithm based on deep deterministic policy gradient (DDPG) and a mobility-aware caching replacement algorithm. Through the learning process of DRL, MVECC continuously adapt to the dynamic changes of IoV network and determine caching scheme based on content popularity. We use Markov chains to simulate state changes. The content transmission energy consumption is regarded as a criterion for supervising the learning process. Agent decide the caching and delivery scheme based on historical experience. At the same time, caching replacement enables RSUs to replace the cached content in response to vehicle mobility, which ensures that RSUs do not store outdated and unpopular content.

2 Related Works

In order to solve the caching optimization problem, many excellent works have studied mobile edge computing and caching methods in wireless networks. In terms of caching allocation, Huang et al. [13,14] considered the best combination of multiple transmissions. Campolo et al. [15] evaluated a solution that can reduce V2R link gap and improve the utilization of wireless channel. Kwon et al. [16] encoded the network and realizes data transmission between multiple RSUs. Mei et al. [17] optimized the radio resources and coding scheme of V2V communication to reduce transmission delay.

In addition, based on the similarity and population of the user community in the V2V communication scenario, Zhao et al. [18] proposed an effective caching scheme to improve the content hit rate. Ding et al. [19] proposed a cache-assisted data transmission scheme, which uses a large number of cache servers deployed on the roadside to support vehicle applications. To solve the data loss caused by vehicle mobility, Abdelhamid et al. [20,21] proposed a vehicle-assisted content caching scheme to distribute content at each exit and entrance of road segment.

Above works solves the cache optimization problem in IoV with the help of edge caching, the cache efficiency of these schemes in processing intensive service requests is still limited by the storage capacity of RSU. Specifically, adjacent RSUs tend to cache the same content, resulting in redundant caching. Through the collaboration between RSUs, MBS can make cache decisions for each RSU from a global perspective and effectively avoid redundant cache.

Considering the randomness of content popularity and IoV network state, it is very complicated to make effective caching decisions in IoV network. As an intelligent learning solution, DRL can effectively overcome these problems and find an optimal caching scheme. There have been some works focusing on DRL-based caching solutions. To solve the problem of continuously acting variables in Markov decision process (MDP) model, Li et al. [22] used a deterministic strategy gradient learning algorithm to provide the optimal resource pricing strategy. Sadeghi et al. [23] introduced the concept of global content popularity and proposed a novel RL-based content caching scheme to implement the best caching strategy.

images

For reader convenience, the above references are presented in Tab. 1. Although several excellent caching schemes have been proposed to improve resource utilization and vehicle user experience. But most of these studies ignore the high-speed mobility of vehicle. After the vehicle moves, the RSU passing by the vehicle still caches the requested content, which may cause a waste of storage resources. To minimize the energy cost of vehicle caching network, this article introduce an artificial intelligence-assisted caching scheme, which allows multiple caching and delivery modes to coexist, and obtains the best caching and delivery by applying DRL. The simulation results show that we have proposed an effective and feasible reference scheme for vehicle caching network.

3 System Model

3.1 Network Model

In this section, we propose a DRL-based MVECC scheme. As shown in Fig. 1, the MVECC scheme includes a content center server (CCS), a macro base station (MBS), multiple RSUs and assisted caching vehicles (ACVs). We define RSU and ACV as caching nodes, suppose set $K={0,1,2,…,K}$ , $E={1,2,…,E}$ denote RSU and ACV respectively (where k = 0 denotes CCS), $M={1,2,…,M}$ represents mobile request vehicle (MRV), which sends out a content request. Let $F={1,…,F}$ . represent all content set, CCS stores all content files and can always satisfy any content request from MRV. RSUs are connected by optical fiber to realize the collaborative caching between cloud, edge and vehicles.

images

Figure 1: System model

The proposed MVECC scheme uses DRL agents to monitor environment and extract vehicle and content features to estimate vehicle mobility and content popularity. By using Deep Deterministic Policy Gradient (DDPG) algorithm and caching replacement scheme, content can be cached and delivered more efficiently at edge network. DDPG algorithm are calculated and executed on MBS, the content is cached on RSU and ACV. Fig. 1 shows the proposed scheme, where on the left is a DRL module, responsible for determining the best caching and delivery decision. On the right is a MVECC network, where the content is cached from CCS to RSU and ACV, and finally delivered to MRV.

In MVECC, MBS is acted as a DRL agent to monitor the network environment, DDPG are calculated and executed on RSU. In content caching process, caching nodes caches content from CCS according to content popularity; in content delivery process, MRV requests content, MBS selects the optimal delivery path to complete content delivery with lower system energy consumption. Specifically, if the requested content is cached in MRV itself, the content can be obtained without any system energy consumption. Otherwise, the content will be obtained from the communication range, such as obtaining content from ACV or RSU in communication region of MRV. When neither RSU nor ACV within MRV communication range has cached the required content, it will consider forwarding the cached content from adjacent RSU to implement the delivery process. When the cache nodes cannot deliver the cached content, the content will be obtained from CCS.

3.2 Content Caching and Delivery Model

Considering that the content popularity and location of MRV/ACV are time-varying and uncertain, the time-varying scale of content popularity update is much larger than MRV/ACV location changes. Therefore, the content caching process and delivery process should be constructed based on different time scales. As shown in Fig. 2, the large time scale is content caching process. Let $X={1,…,X}$ be the set of content caching time slot, $tx$ represents the $xth$ caching time slot, the content caching decision is made at the beginning of caching time slot. The small time scale is content delivery process, divide each $tx$ time slot into $y$ small time slots. Let $Y={1,…,Y}$ be the set of content delivery time slot, $txy$ represents the $yth$ delivery time slot in $xth$ caching time slot, content delivery decision is decided at delivery time slot.

images

Figure 2: Content caching and delivery time slot

We define caching matrix $α(tx)∈{0,1}$ to represent the content cache state on caching nodes at $tx$ time slot. $α(tx)[e][f]=1$ denotes content $f$ is cached on $ACVe$ , $α(tx)[k][f]=1$ denotes content $f$ is cached on $RSUk$ . Define the delivery matrix $β(txy)∈{0,1}$ , which represents the state of the caching nodes processing MRV content requests at $txy$ time slot. $β(txy)[m]=e$ and $β(txy)[m]=k$ denotes at $txy$ time slot, the content requested by $MRVm$ is delivered by $ACVe$ and $RSUk$ , respectively. Therefore, at $tx$ and $txy$ time slot, making content caching and delivery decisions is equivalent to update $α(tx)$ and $β(txy)$ , respectively.

To improve content delivery efficiency, we also propose a RSU collaborative content delivery model. Specifically, RSUs are linked by optical fiber links, they share the cached content with each other and perform content delivery process in a collaborative manner. For example, as shown in Fig. 1, $RSU4$ cannot perform the content delivery process because the content $f$ requested by MRV is not cached. Based on the proposed collaborative model, when $RSU2$ has cached content $f$ , then $f$ can be forwarded from $RSU2$ to $RSU4$ . At last, $f$ is delivered to MRV by $RSU4$ . By using the collaborative caching model, although additional content forwarding costs will be incurred, the number of times that MRV obtains content through backhaul link will be greatly reduced, which can effectively reduce content delivery energy consumption.

In summary, based on our proposed RSU collaborative content delivery model, there are four content delivery models for MRV content requests:

1. RSU direct delivery: as shown in step 1 of Fig. 1, the directly connected $RSU1$ has cached the requested content $f$ , $RSU1$ directly delivers content $f$ to MRV;

2. RSU indirect delivery: as shown in steps 2–3 of Fig. 1, the directly connected $RSU2$ has not cached content $f$ , while $RSU3$ caches content $f$ . At this time slot, $RSU2$ can get content by forwarding content $f$ from $RSU3$ , and then, deliver the content $f$ to MRV;

3. ACV assisted delivery: as shown in steps 4 of Fig. 1, in the communication range of MRV, ACV has cached content $f$ , ACV assisted delivers content $f$ to MRV;

4. CCS delivery: all caching nodes (RSU and ACV) cannot delivery content $f$ to MRV, the content $f$ is delivered by CCV to MRV.

3.3 Content Popularity Model

Most of the current work assumes that the popularity of content follows the Zipf distribution in mobile social networks. Since the MRV may pass through multiple RSU regions during the content delivery process, this assumption may not always be applicable in IoV network, and distribution information of content popularity cannot correctly reflect the real-time content requirements of vehicles.

To provide a reasonable content popularity model, we define a global content popularity based on content request probability. RSU collects the content request information of all MRV and ACV in the region, calculates and updates the content popularity at the beginning of $tx$ time slot. Let $pf$ denote the popularity of content $f$ , $sf$ denote the size of content $f$ , $df$ denote the maximum acceptable delay to obtain content $f$ . The popularity of content $f$ at $tx$ time slot is:

$pf(tx)=∑m=1Mqm,fxM,f∈F$ (1)

where $qm,fx∈{0,1}$ denotes $MRVm$ requested state for content $f$ at $tx$ time slot. $qm,fx=1$ denotes $MRVm$ request content $f$ at $tx$ time slot. According to content popularity $pf$ , MBS updates $α(tx)$ and makes content caching decisions. And then, a caching request is sent to CCS, the requested content will be download and cached on RSU or ACV.

3.4 Communication Model

In this section, we use the communication connection state between caching nodes (RSU, ACV) and MRV, as well as the communication connection state between cache nodes $RSUk$ and $RSUk′$ to express the path that cache nodes obtain content.

Let $lm,i,f(txy),i∈E∪K$ represent connection state between $MRVm$ and caching node $i$ at $txy$ time slot, $lm,i,f(txy)=1$ represent $MRVm$ get content $f$ from caching node $i$ . Let $hm,k,k′,f(txy)$ represent the connection state between caching node $RSUk$ and $RSUk′$ at $txy$ time slot. $hm,k,k′,f(txy)=1$ and $k,k′∈K$ represent connected, that is a collaborative delivery content process. $hm,k,k′,f(txy)=0$ and $k,k′∈K$ represent unconnected, that is delivery content without collaborative process.

3.5 Channel Transmission Model

In this section, based on the real mobile network scenario, we analyze the link state between MRV and caching nodes and build an energy consumption model.

The signal-to-noise ratio (SNR) between $MRVm$ and caching node $i$ at $txy$ time slot can be formulated as:

$SNRm,i(txy)=pm,i(txy)gm,i(txy)ξm,i(txy)dm,i(txy)kσm,i(txy)2,m∈M∪E,i∈E∪K$ (2)

where $pm,i(txy)$ and $dm,i(txy)$ represent transmission power and distance between $MRVm$ and caching node $i$ , respectively. $gm,i(txy)$ is the antenna gain, $ξm,i(txy)$ and $k$ are path loss at and path loss exponent, respectively. $σm,i(txy)$ is the additive white Gaussian noise.

Let $W0mbs$ , $Wkmbs$ and $Wembs$ represent the available bandwidth resources between CCS and MRV, RSU and ACV respectively. $wm,i(txy)m∈M,i∈E∪K$ represents the bandwidth resource allocated by caching node $i$ to $MRVm$ at $txy$ time slot. $wk,k′(txy)k,k′∈K$ represents the bandwidth resources between $RSUk$ and $RSUk′$ in collaborative delivery process.

According to Shannon's formula, the data transmission rate when $MRVm$ get content from caching node $i$ can be formulated as:

$rm,i,f(txy)=lm,i,f(txy)wm,i(txy)log2(1+SNRm,i(txy)),m∈M,i∈E∪K,f∈F$ (3)

4 Problem Formulation

In this section, we transform the joint optimization problem of content caching and delivery into MDP. The behavior of MRV request content is modeled as Markov chain, in which each vehicle changes state with probability. The basic elements of characterizing MDP are: state set $S$ , action set $A$ , and reward $R$ . The basic elements are defined below.

4.1 System State

RSU gets all MRVs content request information at $tx$ time slot, including MRV location $MRVm(Xm,Ym)$ , content popularity $pf$ , content size $sf$ and maximum access delay $df$ . Therefore, the system state includes the following parts:

$f(txy)$ : content requested by MRV at $txy$ time slot, $f(txy)∈F$ .

$z(txy)$ : location of MRV and ACV at $txy$ time slot.

$p(tx)$ : content popularity at $tx$ time slot.

$o(tx)$ : content delivery deadline, $o(tx)≤df$

$α(tx)$ : content cache state at $tx$ time slot.

The system state set $stxy$ can be formulated as:

$stxy={[f(txy)],[z(txy)],[p(tx)],[o(tx)],[α(tx)]}$ (4)

4.2 System Action

At content caching time slot, after receiving the content request from MRV, MBS calculates content popularity, and then decides which content to cache in caching node. At content delivery time slot, MBS selects the best content delivery path to reduce the content delivery cost. Therefore, the action space contains the following parts:

$α(tx)$ : content cache action at $tx$ time slot

$l(txy)$ : MRV get content $f$ from caching node

$h(txy)$ : RSU collaborative action with RSU' when delivering content

The system action space $atxy$ can be formulated as:

$atxy={[α(tx)],[l(txy)],[h(txy)],}$ (5)

4.3 Content Caching and Delivery Energy Consumption

4.3.1 Content Caching Energy Consumption

On the basis of the previous section, we can calculate that the delay of $RSUk$ get content $f$ from CCS at $tx$ time slot is:

$tk,0,f(txy)=sfrk,f∈F,k∈K$ (6)

the delay of $MCEe$ get content $f$ from CCS is:

$te,0,f(txy)=Sfre,f∈F,e∈E$ (7)

The updated content size on caching node $i$ at $tx$ time slot can be expressed as:

$ΔSi(tx)=∑f=1F(sf−ξ{αi(tx)[i][f]&αi(tx−1)[i][f]}),i∈E∪K$ (8)

where $ξ$ is the same content size on the caching node at $tx$ and $tx−1$ time slot, $sf$ is the content size, $r$ is the data transmission rate.

The content caching cost is the energy consumption cost when transmitting updated content from backhaul link. It can be formulated as the product of transmission time and transmission power.

Therefore, according to Eqs. (6)–(8), the energy consumption of caching node $RSUk$ and $ACVe$ are as follows:

$c1(tx)=λ1∑f=1F(sf−ξ{αk(tx)[k][f]&αk(tx−1)[k][f]})ripk$ (9)

$c2(tx)=λ2∑f=1F(sf−ξ{αe(tx)[e][f]&αe(tx−1)[e][f]})rm,i,f(txy)pe$ (10)

where $pk$ is the transmission power between $RSUk$ and CCS, $pe$ is the transmission power between $ACVe$ and CCS, $r$ is the data transmission rate, $λ1,λ2$ represent the weight factors on different cost.

4.3.2 Content Delivery Energy Consumption

On the basis of the previous section, we can calculate that the delay of $MRVm$ get content $f$ from caching node $i$ at $tx$ time slot is:

$tm,i,f(txy)=sfrm,i,f(txy)$ (11)

If the content is delivered by collaborative forwarding, the delay of forwarding content $f$ is:

$tk,k′,m,f(txy)=sfrk,k′(txy)$ (12)

where $sf$ is the content size, $r$ is the data transmission rate.

Therefore, according to Eqs. (11)–(12), the total energy consumption of content delivery process at $txy$ time slot is:

$c3(txy)=λ3∑i=1E+Ksfrm,i,f(txy)pm,i+λ4∑k=1Ksfrk,k′(txy)pk,k′$ (13)

where $pm,i$ is the transmission power between $MRVm$ and caching node $i$ , $pk,k′$ is the transmission power between $RSUk$ and $RSUk′$ , $λ3,λ4$ represent the weight factors on different cost functions.

4.3.3 Penalty Cost

If the whole content is not obtained before the delivery deadline, the penalty cost will be:

$c4(txy)=λ5pf(txy)dfs.t.tm,i,f(txy)+tk,k′,m,f(txy)>df$ (14)

where $pf$ is content popularity, $df$ is the maximum access delay, $λ5$ represent the weight factors.

4.3.4 Cost Function

According to Eqs. (6)–(14), the total system cost function at $txy$ time slot can be formulated as:

$c(txy)=∑k∈Kc1(tx)+∑e∈Ec2(txy)+∑m∈Mc3(txy)+c4(txy)=λ1∑k=1K∑f=1F(sf−ξ{αk(tx)[k][f] & αk(tx−1)[k][f]})pkri+λ2∑e=1E∑f=1F(sf−ξ{αe(tx)[e][f] & αe(tx−1)[e][f]})perm,i,f(txy)+λ3∑m=1M∑i=1E+Ksfrm,i,f(txy)pm,i+λ4∑m=1M∑k=1Ksfrk,k′(txy)pk,k′+λ5pf(txy)df$ (15)

where $λ1−λ5$ represent the weight factors on different cost functions.

4.3.5 Reward

After taking action $atxy$ , the system will get rewarded $rtxy$ . In the DRL method, actions affect not only the immediate reward, but also the next situation and all subsequent rewards. In order to get more rewards, DRL agent must like the actions that have been tried in the past and are considered to produce rewards effectively. This feature needs to be reflected in the total system energy consumption cost. The purpose of our problem is to minimize the energy consumption cost, the reward can be formulated as

$rtxy=−c(txy)$ (16)

5 MVECC Caching Scheme

5.1 Principle of DDPG Algorithm

For some actions, it may be a continuous value, or a very high-dimensional discrete value, so that the spatial dimension of the action is great. If the random policy is used, it is possible to study the probability of all possible actions like DQN, and calculate the value of each possible action, the sample amount of that requires is very large. Although at the same state, the transition probability is different, but the maximum probability is unique. Determinative strategies take only the maximum probability, the strategy is: $πθ(s)=a$ .

Therefore, based on the deterministic Actor-Critic model, DDPG provide an accurate estimate decisive policy function and value function. Similar to the Actor-Critic model, the actor network uses policy functions, responsible for generating action and interacts with the environment. The critic network uses value function, responsible for assessing the actor's performance and guides the actor's next action. This combination can be used to implement joint optimization of the proposed content caching and delivery issues. The schematic diagram of DDPG is shown in Fig. 3. There are three modules: 1) actor network; 2) critic network; 3) experience replay buffer.

images

Figure 3: Algorithm flow chart

5.2 DDPG Based Mobile Edge Collaborative Caching Algorithm

We develop DDPG based solutions to jointly optimize caching and delivery strategies and minimize energy consumption costs, the algorithm is shown in Tab. 2. Network parameters are initialized at the beginning of the training process. The training dataset is collected by MBS. Then, vehicle information is calculated according to different models, and the parameters are input into the training network in the form of state and action matrix. Finally, after continuous training, the model will converge, and the output data is the optimal cache and delivery decision.

images

5.3 Mobile-Aware Caching Replacement Scheme

RSU should selectively replace outdated content based on the mobility of MRV. For example, when MRV moves from RSU1 to RSU2, RSU1 still stores the content requested by MRV, which causes a waste of storage resources. Fig. 4 shows the detailed process of three caching replacement rounds. MRV1, MRV2, and MRV3 request content $a$ , $b$ , and $c$ respectively. The table shows the caching status of RSU in each round.

images

Figure 4: Caching replacement process

In the first caching round, MRV1 passes through the coverage area of RSU1 and requests content $a$ . Assuming that RSU1 has cached content $a$ , RSU1 directly delivers content $a$ to MRV1.

In the second caching round, MRV1 leaves the coverage area of RSU1. At the same time, MRV2 will pass the coverage area of RSU1 and request content $b$ . Content $a$ that has been cached on RSU1 will not be delivered again. Therefore, in the second caching round, RSU1 replaces content $a$ with content $b$ , and RSU2 still caches content $a$ .

At the beginning of the third caching round, the cache status of each RSU is: RSU1 cache content $b$ , RSU1 cache content $a$ , and RSU3 cache content $a$ . MRV1 leaves the coverage area of RSU2, MRV2 will pass the coverage area of RSU2, MRV3 will pass the coverage area of RSU1, and request content $c$ . At this time, the cached content $b$ on RSU1 and the cached content $a$ on RSU2 will no longer be delivered. Therefore, in the third caching round, RSU1 replaces content $c$ with content $b$ , RSU2 replaces content $b$ with content $a$ , RSU3 still caches content $a$ . We found that during the first caching round, RSU1 has cached content $b$ . Therefore, before RSU1 replaces content $c$ with content $b$ , RSU2 can first obtain content $b$ from RSU1.

The flow of caching algorithm is shown in Fig. 5. In DDPG based collaborative caching algorithm, the input dataset is the state information of IoV environment, and the output dataset is content caching and delivery decisions. According to the decisions and node cache state, the cache replacement scheme determines the contents that need to be replaced.

images

Figure 5: Flow chart of collaborative caching scheme

6 Simulation Results and Discussions

6.1 Parameter Settings

In this section, we use Python to build a simulation environment for the MVECC scheme. Using the TensorFlow platform to implement a DDPG-based collaborative caching solution (DDPG-Caching). Tab. 3 summarizes the main parameters used in the simulation.

images

As a reference, we implemented a random caching scheme and a deep Q-network (DQN)-based caching scheme (DQN-Caching) [24] as benchmark caching scheme. In the random caching scheme, the system randomly makes content caching and delivery decisions. In the DQN-caching scheme, there is no RSU collaborative caching, and only the RSU or ACV is responsible for delivery. We take energy consumption and content hit rate as performance measures. The energy consumption is generated by the content transmission, it can be calculated by Eqs. (9)–(13). We take the ratio of content hit number to the total requests number as content hit rate, which measures the caching efficiency of RSUs. The content hit rate can be formulated as:

$htf=∑m=1Mam,fxM,f∈F$ (17)

where $am,fx∈{0,1}$ denotes content hit state, $M$ is the total requests number.

6.2 System Performance Analysis

images

Figure 6: Comparison of energy consumption

Fig. 6 shows the comparison of system energy consumption costs when MRV = 200 and ACV = 50. As the number of training increases, we can draw the following observations from Fig. 6. First of all, the random cache scheme has the highest average system energy cost, the energy cost does not decrease with the increase of the number of trainings. This is because the random content caching and delivery decisions are not optimal system decisions. Therefore, random cache has a higher energy cost and does not have convergence. Secondly, because the collaboration of RSU and the assisted caching of ACV provide a more effective caching and delivery mode, the MVECC scheme can effectively reduce system costs. Third, compared with other benchmark schemes, through the DDPG reinforcement learning algorithm, our proposed MVECC scheme successfully solves the complex problem of high-dimensional action space in the joint optimization. So that MVECC can quickly find the best caching and delivery solution.

Next, in order to verify the performance of our proposed collaborative caching scheme, we compared the DQN-Caching algorithm with the proposed DDPG-Caching algorithm in terms of content hit rate and energy consumption.

images

Figure 7: Content hit rate under different ACV numbers

As shown in Fig. 7, compared with the non-collaborative algorithm, the proposed MVECC scheme increases the content hit rate by about 15%. As the number of ACVs increases, MVECC can provide MRV with more opportunities for V2V content delivery. In addition, through the caching and delivery decision made by DRL agent, RSU can cache more valuable content and deliver the content with a higher hit rate. Since DQN does not have strong learning ability when dealing with high-latitude spatial problems, the more ACVs there are, the higher the content hit rate of DRL based-MVECC will be.

images

Figure 8: System energy consumption under different ACV and MRV numbers

Then, we use the overall system energy consumption as a performance indicator. As shown in Fig. 8, our proposed MVECC scheme has the lowest system energy consumption. The reason is that in the MVECC scheme, RSUs can respond to MRV content requests on the edge side, which can reduce system latency by reducing transmission distance and sharing cached content. When there are more MRV numbers, the number of requests will increase accordingly, so the system energy consumption will also increase. The increase in ACV will provide more V2V opportunities, so the system energy consumption will continue to decrease. Since there is no learning process for random caching, the content caching and delivery decisions executed each time are random rather than optimal, which results in a large system energy consumption. However, for DRL, when the number of MRV and ACV increases, there will be a disaster of dimensionality. Therefore, compared with the DQN caching algorithm, the DRL agent can avoid the dimensionality disaster through the DDPG algorithm and determine the best caching and delivery strategy for MVECC. Therefore, the optimal caching and delivery decisions made by the DRL agent in the MVECC scheme can still minimize the energy consumption.

images

Figure 9: System energy consumption under different cache capacity

Finally, we compared the system energy consumption under different cache capacities. As shown in Fig. 9, as the cache capacity increases, the system energy consumption continues to decrease. This is because more content can be cached on the edge side, MRV content requests will be responded to more on the edge side. The number of content requests forwarded by the link to the CCS will be greatly reduced, thereby effectively reducing the high energy consumption costs on the backhaul link. In addition, the DRL agent selects the optimal content caching and delivery decision for the MVECC scheme, so that the caching node can cache the content that is more likely to be delivered under the limited cache capacity. RSU and ACV have more opportunities for content delivery, so that the system has the lowest energy consumption.

7 Conclusion

In this article, we take the vehicle caching network as an example to study the collaborative caching problem in the internet of vehicles. We propose a novel MEC-assisted vehicle edge collaborative caching (MVECC) scheme to improve cache efficiency. With the help of the roadside unit (RSU) and assisted caching vehicle, MVECC has two content caching modes and four content delivery modes coexist. MVECC also uses DRL to actively cache popular content on RSU, so as to reduce cache energy consumption. In addition, based on the vehicle mobility, we designed a mobility-aware cache replacement strategy to dynamically update the cached content on RSU. The simulation results show that the method is better than the random caching scheme and the DQN caching scheme in terms of content hit rate and system energy consumption. We do not consider the impact of link state changes on the transmission process, in our future work, we intend to study this problem and design a caching scheme based on federated learning to better adapt to user behavior.

Acknowledgement: Thanks for the technical support provided by the State Key Laboratory of Networking and Switching Technology.

Funding Statement: This work is supported by the Science and Technology Project of State Grid Corporation of China: Research and Application of Key Technologies in Virtual Operation of Information and Communication Resources.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. Y. Kim, N. An, J. Park and H. Lim, “Mobility support for vehicular cloud radio-access-networks with edge computing,” in 2018 IEEE 7th Int. Conf. on Cloud Networking (CloudNetTokyo, pp. 1–4, 2018. [Google Scholar]

2. Y. Yan, Y. Dai, Z. Zhou, W. Jiang and S. Guo, “Edge computing-based tasks offloading and block caching for mobile blockchain,” Computers Materials & Continua, vol. 62, no. 2, pp. 905–915, 2020. [Google Scholar]

3. D. Zhu, Y. Sun, X. Li and R. Qu, “Massive files prefetching model based on LSTM neural network with cache transaction strategy,” Computers Materials & Continua, vol. 63, no. 2, pp. 979–993, 2020. [Google Scholar]

4. M. Yu, R. Li and Y. Chen, “A cache replacement policy based on multi-factors for named data networking,” Computers Materials & Continua, vol. 65, no. 1, pp. 321–336, 2020. [Google Scholar]

5. G. Qiao, S. Leng, S. Maharjan, Y. Zhang and N. Ansari, “Deep reinforcement learning for cooperative content caching in vehicular edge computing and networks,” IEEE Internet of Things Journal, vol. 7, no. 1, pp. 247–257, 2020. [Google Scholar]

6. Z. Li, W. Wei, T. Zhang, M. Wang, S. Hou et al., “Online multi-expert learning for visual tracking,” IEEE Transactions on Image Processing, vol. 29, pp. 934–946, 2020. [Google Scholar]

7. Q. Deng, Z. Li, J. Chen, F. Zeng, H. Wang et al., “Dynamic spectrum sharing for hybrid access in OFDMA-based cognitive femtocell networks,” IEEE Transactions on Vehicular Technology, vol. 67, no. 11, pp. 10830–10840, 2018. [Google Scholar]

8. M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2856–2867, 2014. [Google Scholar]

9. S. Wang, X. Zhang, Y. Zhang, L. Wang, J. Yang et al., “A survey on mobile edge networks: Convergence of computing, caching and communications,” IEEE Access, vol. 5, pp. 6757–6779, 2017. [Google Scholar]

10. X. Wang, S. Leng and K. Yang, “Social-aware edge caching in fog radio access networks,” IEEE Access, vol. 5, pp. 8492–8501, 2017. [Google Scholar]

11. L. Yao, A. Chen, J. Deng, J. Wang and G. Wu, “A cooperative caching scheme based on mobility prediction in vehicular content centric networks,” IEEE Transactions on Vehicular Technology, vol. 67, no. 6, pp. 5435–5444, 2018. [Google Scholar]

12. H. Wang, Y. Li, X. Zhao and F. Yang, “An algorithm based on markov chain to improve edge cache hit ratio for blockchain-enabled IoT,” China Communications, vol. 17, no. 9, pp. 66–76, 2020. [Google Scholar]

13. J. Huang, C. Zhang and J. Zhang, “A multi-queue approach of energy efficient task scheduling for sensor hubs,” Chinese Journal of Electronics, vol. 29, no. 2, pp. 242–247, 2020. [Google Scholar]

14. J. Huang, J. Liang and S. Ali, “A simulation-based optimization approach for reliability-aware service composition in edge computing,” IEEE Access, vol. 8, pp. 50355–50366, 2020. [Google Scholar]

15. C. Campolo and A. Molinaro, “Improving V2R connectivity to provide ITS applications in IEEE 802.11p/WAVE VANETs,” in 2011 18th Int. Conf. on Telecommunications, Ayia Napa, pp. 476–481, 2011. [Google Scholar]

16. J. Kwon and H. Park, “Reliable data dissemination strategy based on systematic network coding in V2I networks,” in 2019 Int. Conf. on Information and Communication Technology Convergence (ICTCJeju Island, Korea (Southpp. 744–746, 2019. [Google Scholar]

17. J. Mei, K. Zheng, L. Zhao, Y. Teng and X. Wang, “A latency and reliability guaranteed resource allocation scheme for LTE V2V communication systems,” IEEE Transactions on Wireless Communications, vol. 17, no. 6, pp. 3850–3860, 2018. [Google Scholar]

18. W. Zhao, Y. Qin, D. Gao, C. Foh and H. Chao, “An efficient cache strategy in information centric networking vehicle-to-vehicle scenario,” IEEE Access, vol. 5, pp. 12657–12667, 2017. [Google Scholar]

19. R. Ding, T. Wang, L. Song, Z. Han and J. Wu, “Roadside-unit caching in vehicular ad hoc networks for efficient popular content delivery,” in 2015 IEEE Wireless Communications and Networking Conf. (WCNCNew Orleans, LA, USA, pp. 1207–1212, 2015. [Google Scholar]

20. S. Abdelhamid, H. S. Hassanein and G. Takahara, “On-road caching assistance for ubiquitous vehicle-based information service,” IEEE Transactions on Vehicular Technology, vol. 64, no. 12, pp. 5477–5492, 2015. [Google Scholar]

21. B. Hu, L. Fang, X. Cheng and L. Yang, “Vehicle-to-vehicle distributed storage in vehicular networks,” in 2018 IEEE Int. Conf. on Communications (ICCKansas City, MO, USA, pp. 1–6, 2018. [Google Scholar]

22. Z. Li, T. Chu, I. Kolmanovsky, Y. Xiang and X. Yin, “Cloud resource allocation for cloud-based automotive applications,” Mechatronics, vol. 50, no. 4, pp. 356–365, 2018. [Google Scholar]

23. A. Sadeghi, F. Sheikholeslam and G. B. Giannakis, “Optimal and scalable caching for 5G using reinforcement learning of space-time popularities,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 180–190, 2018. [Google Scholar]

24. J. Tang, H. Tang, X. Zhang, K. Cumanan, G. Chen et al., “Energy minimization in D2D-assisted cache-enabled internet of things: A deep reinforcement learning approach,” IEEE Transactions on Industrial Informatics, vol. 16, no. 8, pp. 5412–5423, 2020. [Google Scholar]

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.