Intelligent Automation & Soft Computing
DOI:10.32604/iasc.2022.023055
Article

Deep Reinforcement Extreme Learning Machines for Secured Routing in Internet of Things (IoT) Applications

K. Lavanya1,*, K. Vimala Devi2 and B. R. Tapas Bapu3

1Velammal Engineering College, Chennai, 600066, India
2Vellore Institute of Technology, Vellore, 632014, India
3S A Engineering College, Chennai, 600077, India
*Corresponding Author: K. Lavanya. Email: lavanya201180@gmail.com
Received: 26 August 2021; Accepted: 17 January 2022

Abstract: Multipath TCP (SMPTCP) has gained attention as a valuable approach for IoT systems. SMPTCP was introduced as an evolution of the Transmission Control Protocol (TCP) that passes packets simultaneously across several routes to fully exploit the network resources of multi-homed devices and other network services. Current multipath networking algorithms and simulation strategies face sub-flow irregularity issues caused by network heterogeneity, and routing configuration issues cannot be fixed adequately. To overcome these issues, this paper proposes a novel deep reinforcement learning-based extreme learning machines (DRLELM) approach to examine the complexities between routes, pathways, sub-flows, and SMPTCP connections in different topologies. DRLELM is used to estimate the throughput of the network. The extreme learning machine (ELM) reduces run-time overhead in multipath networks through faster convergence. In addition, the novel multipath TCP routing protocol integrates a logistic chaotic algorithm for secured data transmission. The results show that the proposed framework outperforms other existing algorithms.

Keywords: Reinforcement learning; extreme learning machine; multipath TCP; IoT; logistic chaotic algorithms

1  Introduction

Advances in wireless data transmission are pushing networks toward multi-path operation. Even so, conventional TCP (Transmission Control Protocol) is inherently designed for a homogeneous network and cannot use several paths at the same time. Multipath TCP (SMPTCP), introduced by the Internet Engineering Task Force, aims to improve throughput and transfer latency by using several alternative routes. Because SMPTCP can make full use of network resources among multi-homed users, it has received great attention as a powerful platform in cellular communications as well as data centres [1–4].

Random-based methods such as Equal-Cost Multi-Path (ECMP) [5] are widely used routing schemes in SMPTCP. However, the sub-flows of an SMPTCP connection are coupled, and SMPTCP assumes that the paths used by different sub-flows are distinct. Random-based methods are not explicitly designed for SMPTCP and may end up using the same routes for different sub-flows, which leads to avoidable delays and consumes a significant amount of available resources.

Besides, because conventional routing methods follow rather rigid service-provider policies, connections are usually directed along the same routes even though several routes are available in the network. As a result, these connections cannot fully benefit from the multiple sub-flows supported by SMPTCP, and the resulting transmission quality remains low [6,7].

SMPTCP has several benefits for simultaneous transmission. However, with the exponential growth of the Internet of Things (IoT), ever more devices exchange information over the Internet, which increases the volume of transferred data. Cyber threats are common in dynamic real-world settings. The IoT ecosystem is highly susceptible to distributed denial of service (DDoS) threats, which also pose a significant risk to privacy [8,9]. In wireless networks, changes in connection speed have a significant impact on SMPTCP reliability, causing out-of-order packet arrival and buffer congestion, particularly for time-sensitive video services [10].

Standard data communication systems that rely on fixed theoretical equations will no longer satisfy the complexity and precision requirements of the upcoming IoT. To make the best use of advanced communication standards, researchers have reported in-depth studies on the application of neural networks, the development of new communication systems, the protection of privacy, and the implementation of multipath internet protocol features. This demonstrates that data communication protocols and architectures with intelligent learning and adaptive functions need to be researched in order to evaluate multipath reliability and handle transmission quality efficiently. Hence, this paper proposes a hybrid DRLELM for better path prediction and efficient video transmission over the network. In addition, the integration of a logistic chaotic encryption algorithm makes the scheme more resistant to IoT attacks. Extensive experiments have been conducted using medical video datasets, and parameters such as accuracy, precision, recall, and throughput have been calculated and compared with other existing protocols.

The rest of this paper is organized as follows. Section 2 discusses related work on the SMPTCP routing protocol. Section 3 presents the background and a preliminary overview of deep reinforcement learning and extreme learning machines. The proposed DRL-ELM framework is explained in Section 4. The experiments, dataset descriptions, results, and comparative analysis are presented in Section 5. Lastly, the article is concluded in Section 6.

2  Related Works

Network conditions vary in real environments, and knowledge of the actual network status still lags behind because it relies on the receiver's assessment. SMPTCP control strategies and bandwidth estimation methods based on standard static mathematical formulas cannot meet the complexity and consistency requirements. To overcome these problems, studies in this field have attempted to build efficient route management frameworks that can readily regulate the use of routes.

Ferlin et al. [11] implemented a practical shared bottleneck detection (SBD) protocol for SMPTCP. Through comprehensive emulations, they showed that SMPTCP-SBD outperforms all currently deployed congestion-coupled SMPTCP systems by explicitly identifying bottlenecks. In the non-shared bottleneck scenario, they observed throughput improvements of 40 percent for two sub-flows, and the gains rise as the number of sub-flows grows, exceeding 100 percent for five sub-flows. In the shared bottleneck scenario, SMPTCP-SBD remains fair to TCP. The emulation findings are complemented by further results confirming that the scheme is safe to deploy. However, the oscillation that increases noise when multiple sub-flows are present needs to be considered, since it can mislead the network.

Dong et al. [12] developed mVeno, a technique that makes full use of the congestion information of all sub-flows belonging to a TCP connection to adjust the transmission rate of every sub-flow. Specifically, mVeno modifies the additive increase phase of Veno so that the sub-flows are effectively coupled by increasing a fraction of the window size on each received acknowledgement (ACK). The weighting factor that tunes the congestion window of each sub-flow is calculated by distinguishing packet losses caused by random link errors from those caused by network congestion. They also deployed mVeno on a Linux system and performed detailed tests both in a test bed and in a real wide area network (WAN) to verify its efficacy. However, the framework does not automatically discriminate the traffic load when packet errors occur, so it cannot select paths efficiently in wireless networks with unpredictable packet loss.

Kaiping et al. [13] surveyed the technical details of the SMPTCP standard and summarized the current state of the related issues, including simulation and actual implementation, out-of-order control, coupled congestion control, energy consumption management, security, and other aspects such as route discovery, accessibility, Quality of Service (QoS), and the data centre environment.

Cao et al. [14] suggested a quality of experience (QoE)-driven, energy-aware multipath TCP (SMPTCP-QE), a content delivery solution for SMPTCP-based mobile phones. The key components of SMPTCP-QE are, first, a rate-aware, energy-efficient sub-flow control approach that trades off throughput efficiency against resource utilization on mobile phones and, second, a bandwidth-aware congestion window fast recovery scheme that keeps the sender from slowing down excessively and lets it use wireless resources quickly.

Chung et al. [15] developed a path management method for SMPTCP based on Deep Learning. It manages the use of paths across multiple links according to a decision made by machine learning algorithms. They use different quality metrics, including signal strength, data rate, TCP throughput, the number of access points (APs), and round trip time (RTT), to find a reliable route. However, the proposed scheme does not readily identify and control routes whose effects are short-lived.

Li et al. [16] developed a recursive tile coding approach for state consolidation and a feature analysis method for Q-learning that can effectively extract the best outcome. Because of the asynchronous design of smart congestion control (SmartCC), the model training and execution phases are decoupled, so the training process does not introduce unnecessary pauses or latency into the decision-making phase of SMPTCP congestion management. Comprehensive performance studies demonstrate that SmartCC substantially increases aggregate throughput and outperforms state-of-the-art frameworks on several quality metrics. When a learning-based congestion control model is deployed in a real network, however, a range of practical problems emerge, including the exploration approach at the crowd sourcing level, the cost of model testing, and the guarantee of real-time decision-making.

Xu et al. [17] proposed a Deep Reinforcement Learning (DRL) based control framework for multipath TCP congestion control. DRL-congestion control (DRL-CC) uses a single agent to automatically and jointly control all active SMPTCP flows on the end host to maximize the overall utility. The novelty of this architecture is the use of a flexible Recurrent Neural Network (RNN), long short-term memory (LSTM), within the DRL framework to learn a representation of all active flows and their dynamics. Such congestion control protocols may not perform well in a complicated and highly dynamic system in which several variables, including random failures, a wide range of RTTs, and lossy connections, affect performance, because it is difficult to specify the optimal, or even a near-optimal, rule for every situation that could occur at runtime.

Hesmans et al. [18] discussed how SMPTCP data routing protocols and congestion control algorithms take only system throughput into account and disregard the differences in transmission priority caused by the diversity of information types and long run-time delays. Motivated by these factors, this work proposes a novel DRL-ELM method that uses extreme learning machines to identify the comparatively prioritized routing path, ensure that more information is transmitted over a good-quality path, increase network throughput, and reduce transmission delays, providing an efficient communication framework for IoT applications.

3  Preliminary Works

3.1 Deep Reinforcement Learning Scheme

Reinforcement learning is a type of machine learning technique. It learns from errors, learns how to reach the goal through repeated attempts, and eventually discovers the right way to solve the problem. The reinforcement learning model used here is the deep Q-network (DQN), a combination of Q-learning and neural network modeling. The distinction between DQN and Q-learning is that DQN uses neural networks to store state and action information, while its main update rule is still based on Q-learning. The essence of Q-learning is to estimate the Q-value and use it to refine the control policy. The Q-value is a function of each SMPTCP state, action, and reward. The goal is to optimize path selection to increase transmission quality, and the model requires the state, action, and reward for decision making.
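As a minimal sketch of the Q-learning update that the deep Q-network builds on, the following tabular example treats each candidate path as an action. The state encoding, the reward value, and the hyperparameters `alpha` and `gamma` are illustrative assumptions rather than values from this work; a DQN would replace the table with a neural network.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table

# Hypothetical setup: 2 network states, 3 candidate paths (actions).
q = [[0.0, 0.0, 0.0] for _ in range(2)]

# Sending on path 1 in state 0 yielded reward 1.0 and led to state 1.
q_update(q, state=0, action=1, reward=1.0, next_state=1)

# Greedy choice: the path with the largest Q-value in state 0.
best_path = max(range(3), key=lambda a: q[0][a])
```

After this single update only the entry for path 1 in state 0 has grown, so the greedy choice in state 0 is path 1.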

3.2 Extreme Learning Machines

Extreme learning machines (ELM) are a recent type of feed-forward neural network with a single hidden layer. The approach relies entirely on forward propagation and has two facets: a global approximation over an infinite input set and a local approximation over a finite set. ELM is used here to speed up the learning process of the deep RL module and thus reduce the run time of the transmission.

Consider the training set $P = \{(s_k, t_k) \mid s_k \in Q^n,\ t_k \in Q^l,\ k = 1, 2, \ldots, L\}$, and let the activation function be f and the total number of hidden nodes be N. The complete ELM process consists of the following three steps.

1)   Randomly assign the input weights α and the hidden node biases β.

2)   Compute the output matrix H of the hidden units.

3)   Evaluate the output weight T = H′Y,

where H′ is the Moore-Penrose generalized inverse of the hidden-layer output matrix H and Y is the matrix of training targets.
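The three ELM steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: `alpha` and `beta` are read as random input weights and hidden biases, `tanh` stands in for the unspecified activation function f, and the toy data are invented for the example.

```python
import numpy as np

def elm_train(X, Y, n_hidden, rng):
    """Train a single-hidden-layer ELM; returns (input weights, biases, output weights)."""
    alpha = rng.standard_normal((X.shape[1], n_hidden))  # step 1: random input weights
    beta = rng.standard_normal(n_hidden)                  # step 1: random hidden biases
    H = np.tanh(X @ alpha + beta)                         # step 2: hidden-layer output matrix
    T = np.linalg.pinv(H) @ Y                             # step 3: Moore-Penrose solution
    return alpha, beta, T

def elm_predict(X, alpha, beta, T):
    """Forward pass through the fixed hidden layer and the learned output weights."""
    return np.tanh(X @ alpha + beta) @ T

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))   # toy inputs (assumed path features)
Y = (X[:, :1] + X[:, 1:]) / 2          # toy targets, shape (50, 1)
alpha, beta, T = elm_train(X, Y, n_hidden=20, rng=rng)
train_error = float(np.mean((elm_predict(X, alpha, beta, T) - Y) ** 2))
```

Because only the output weights are solved for, in closed form, there is no iterative back-propagation, which is where the fast convergence of ELM comes from.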

4  Proposed Method

To maximize the average performance, we propose a new DRL-based routing protocol that ranks paths by Q-value and selects the best path with the aid of a deep Q-network. Our approach is to send data over the specific path learned by the DRL agent in order to improve the overall throughput of SMPTCP. Route selection is a typical Markov decision process problem, so we propose a data scheduling model based on deep reinforcement learning and ELM to solve it. Fig. 1 shows the proposed architecture of the DRL-ELM model for an SMPTCP system.


Figure 1: Schematic of proposed DRLELM model

In the DRL strategy, the actor network and the critic network are first built as deep neural networks for data mapping. The critic network is responsible for correlating and collecting the data to be transmitted from the environment, and this is updated as the active state. The end value is calculated as the maximum value accumulated over the number of epochs. Once the values are updated in the actor and critic networks, the two are compared by the deep neural model using gradient descent. The algorithm follows a greedy policy for the cost calculation. The proposed algorithm gathers data during transmissions and eventually learns the utility distribution of each process.

The key concept underlying our proposed algorithm is to transmit information over the preferred route, that is, the one with the greater Q-value under a resource constraint, and to ensure that the packets reach the recipient in order. Each path chosen by the DRL algorithm is then allocated a feature within the set. The feature is selected if its associated input is greater than 0; otherwise, it is discarded. Given these parameters, the selected features form part of the DRL-ELM optimal solution. The ELM for path optimization is shown in Fig. 2.


Figure 2: Flow diagram of ELM path selection process
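The selection rule described in this section, choosing the route with the greatest Q-value subject to a resource constraint, might be sketched as below. The path names, Q-values, and the use of available bandwidth as the constrained resource are hypothetical choices made for illustration.

```python
def select_path(q_values, available_bw, required_bw):
    """Return the path with the highest Q-value among those meeting the bandwidth constraint."""
    feasible = [p for p in q_values if available_bw[p] >= required_bw]
    if not feasible:
        return None  # no path satisfies the constraint
    return max(feasible, key=lambda p: q_values[p])

# Hypothetical Q-values and available bandwidths (Mbit/s) for three paths.
q_values = {"wifi": 0.82, "lte": 0.91, "eth": 0.40}
available_bw = {"wifi": 20.0, "lte": 3.0, "eth": 50.0}

chosen = select_path(q_values, available_bw, required_bw=5.0)
none_fit = select_path(q_values, available_bw, required_bw=100.0)
```

Here the path with the largest Q-value (`lte`) is skipped because it cannot meet the 5 Mbit/s requirement, so the next-best feasible path is chosen instead.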

4.1 Logistic Chaotic Secure Encryption

After the best of the available paths has been determined, a high-randomness security algorithm is applied to secure data transmission over the selected path. Logistic encryption systems are highly sensitive to initial conditions and unpredictable, which suits secured data transmission. The logistic maps are first generated with basic permutation parameters. The bifurcation parameter varies from 0 to 4 as the iteration proceeds. The logistic chaos maps produce highly random outcomes, so the logistic map output is a highly non-linear pseudo-random sequence suitable for image encryption.

The chaotic logistic maps are basically expressed as

$$x_{m+1} = \mu x_m (1 - x_m) \tag{1}$$

Here μ is the bifurcation parameter of the logistic map, with 0 ≤ μ ≤ 4. The initial condition is chosen as x0 ∈ (0, 1), and as the iteration continues for m steps, the range 3.5699456 < μ ≤ 4 defines the chaotic regime of the system. The proposed logistic map is expressed as

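$$
\begin{cases}
a_{m+1} = C\,(a_m - |a_m|) - C f_m \\
b_{m+1} = f_m h^{2\alpha} + h^{\alpha} C\,[(2 - 2a_m) f_m - 2 a_m d_m] \\
e_{m+1} = d_m h^{2\alpha} + h^{2\alpha} C\,[2(1 - a_m) d_m - 2 a_m f_m - a_m]
\end{cases} \tag{2}
$$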

Here C denotes the control parameter, α is the dissipation constant, and a_m and d_m are complex conjugate coordinates. The model is highly sensitive: a small change in the data generates different pseudo-random sequences. As the iteration proceeds, the model exhibits the non-linearity required for image encryption. The encrypted medical images are transmitted over the path selected by the proposed DRLELM. Fig. 3 shows the chaotic behaviors for different initial conditions.


Figure 3: Chaotic behaviors for different initial conditions
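The sensitivity to initial conditions of the basic logistic map in Eq. (1) can be illustrated with a few lines of code. The values μ = 3.99 and the two starting points below are assumptions picked for the demonstration; they are not parameters reported in this work.

```python
def logistic_sequence(x0, mu, n):
    """Iterate x_{m+1} = mu * x_m * (1 - x_m) and return the n values that follow x0."""
    xs = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs

# Two initial conditions that differ by 1e-6, with mu inside (3.5699456, 4].
seq_a = logistic_sequence(0.400000, 3.99, 60)
seq_b = logistic_sequence(0.400001, 3.99, 60)

# Largest pointwise distance between the two trajectories.
max_gap = max(abs(a - b) for a, b in zip(seq_a, seq_b))
```

Although the starting points are almost identical, the trajectories separate after a few dozen iterations, which is the property that makes the map useful as a key stream generator.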

The overall working mechanism of the proposed logistic map-based data security algorithm is as follows:

Step 1: The input medical videos are converted into frames, and each frame is arranged into a matrix.

Step 2: Using the logistic map Eqs. (1) and (2), pseudo-random sequences are generated and then scaled to 256 levels.

Step 3: The intermediate key is generated by performing permutations between the image matrix (Step 1) and the chaotic logistic maps.

Step 4: The key generated in Step 3 is then used to encrypt every frame of the video through the diffusion process.

Step 5: Transmit the data over the path selected by the Deep Extreme Reinforcement Learning Algorithm.
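The steps above can be sketched for a single flattened frame as follows. This is a simplified illustration, not the authors' scheme: it uses only the basic map of Eq. (1), derives the permutation by sorting the chaotic sequence, and uses XOR as the diffusion operation. The secret values `x0` and `mu` and the toy frame are hypothetical.

```python
def logistic_sequence(x0, mu, n):
    """Iterate the logistic map of Eq. (1) n times starting from x0."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs

def _key_material(n, x0, mu):
    """Derive a position permutation and a byte key stream from one chaotic sequence."""
    chaos = logistic_sequence(x0, mu, n)
    order = sorted(range(n), key=lambda i: chaos[i])  # permutation from chaotic ordering
    key = [int(c * 256) % 256 for c in chaos]         # sequence scaled to 0..255
    return order, key

def encrypt_frame(pixels, x0=0.37, mu=3.99):
    """Permute pixel positions, then XOR each pixel with a key byte (diffusion)."""
    order, key = _key_material(len(pixels), x0, mu)
    return [pixels[order[i]] ^ key[i] for i in range(len(pixels))]

def decrypt_frame(cipher, x0=0.37, mu=3.99):
    """Invert encrypt_frame using the same (x0, mu) secret."""
    order, key = _key_material(len(cipher), x0, mu)
    plain = [0] * len(cipher)
    for i in range(len(cipher)):
        plain[order[i]] = cipher[i] ^ key[i]
    return plain

frame = [10, 200, 33, 47, 255, 0, 128, 64]  # toy flattened 8-bit frame
cipher = encrypt_frame(frame)
restored = decrypt_frame(cipher)
```

The receiver only needs the same initial condition and bifurcation parameter to regenerate the permutation and key stream, so decryption restores the original frame exactly.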

5  Experimental Results and Discussion

The core components of our experimental setup are the following:

i)   A wired server: a Raspberry Pi Model 3 running the Linux kernel, version 2.6.15. The database is linked to the server over an Ethernet connection;

ii)   Two handheld servers, i.e., two Android devices acting as video chat and voice call servers. We also implemented deep learning in application-level SMPTCP route control to take advantage of its functionality and the ease of obtaining a range of details from cellular communication technologies;

iii)   Real-time medical image datasets, which were used to test the proposed DRLELM framework.

The implemented DRLELM, the k-Means algorithm [19], and the k-NN algorithm [20] are integrated into the regularization technique. In the simulation study, the characteristics of LTE and Wi-Fi specified by the International Telecommunication Union (ITU) were used, including bandwidth, delay, and packet drop rates. To keep the study consistent, a wireless routing module was set up for each smartphone, and the two Wi-Fi networks used the same bandwidth. We drew random values within the parameter ranges to simulate different route environments. The performance of the algorithm was evaluated using the following metrics:

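$$\text{Accuracy} = \frac{\text{No. of paths detected}}{\text{Total no. of paths available}} \times 100 \tag{3}$$

$$\text{Sensitivity} = \frac{TP}{TP + TN} \times 100 \tag{4}$$

$$\text{Specificity} = \frac{TN}{TP + TN} \times 100 \tag{5}$$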

where TP and TN denote the true positive and true negative counts used to assess the performance of the system. The performance metrics of the proposed algorithm and of the other machine learning algorithms are discussed next.
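As a small worked sketch, the metrics can be computed exactly as written in Eqs. (3)–(5), which use TP + TN in both denominators. The function names and the counts below are hypothetical and are not results from this work.

```python
def path_accuracy(paths_detected, paths_available):
    """Eq. (3): percentage of available paths that were detected."""
    return paths_detected / paths_available * 100

def sensitivity(tp, tn):
    """Eq. (4) as written in this section: TP / (TP + TN) * 100."""
    return tp / (tp + tn) * 100

def specificity(tp, tn):
    """Eq. (5) as written in this section: TN / (TP + TN) * 100."""
    return tn / (tp + tn) * 100

# Hypothetical counts for illustration only.
acc = path_accuracy(paths_detected=19, paths_available=20)
sens = sensitivity(tp=45, tn=5)
spec = specificity(tp=45, tn=5)
```

With these definitions the two rates always sum to 100, so they describe the split between true positives and true negatives among the correct detections.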

Fig. 4 shows the path detection accuracy of the different algorithms: the proposed deep learning algorithm achieves 99.5%, compared with 94% for reinforcement learning, 90% for KNN, 89% for Random Forest, and 84% for the decision tree. Integrating the deep extreme learning algorithm with the reinforcement logic has increased the efficiency of detecting suitable paths for video transmission by 15%–20% over the existing reinforcement algorithm. Figs. 5 and 6 show that the proposed algorithm exhibits higher sensitivity and specificity than the existing machine learning algorithms. We have also compared other parameters, namely network throughput and response time, across the different machine learning algorithms, as shown in Figs. 7–9.


Figure 4: Comparative analysis for the accuracy of detection between the proposed algorithm and other existing algorithms


Figure 5: Comparative analysis for the sensitivity between the proposed algorithm and other existing algorithms


Figure 6: Comparative analysis for the specificity between the proposed algorithm and other existing algorithms


Figure 7: Comparative analysis of the throughput of the proposed algorithm and other existing algorithms for 10 video samples/s


Figure 8: Comparative analysis of the throughput of the proposed algorithm and other existing algorithms for an increased rate of 20 video samples/s


Figure 9: Comparative analysis of DRLELM response time with other algorithms for 10 video samples/s

Fig. 7 shows that throughput is maintained at 98% with the proposed deep learning model, whereas the other algorithms maintain throughput between 90% and 94%.

Figs. 7 and 8 show the throughput, and Fig. 9 shows the response time, of the proposed algorithm and the other machine learning algorithms. Figs. 7 and 8 show that integrating DRLELM keeps the throughput of the proposed algorithm at 98%, whereas the other machine learning algorithms experience a drastic reduction in throughput as the number of video samples increases. Fig. 9 shows that the proposed DRLELM outperforms the other algorithms in terms of response time. Thus, including the DRLELM methodology maintains high throughput even as the number of video samples increases.

Security Analysis:

It is of utmost importance to validate the sensitivity of the generated key in terms of its resistance to mild and strong attacks. Important parameters for testing the performance of the encryption key include NPCR and UACI. For different encrypted images, the NPCR and UACI metrics are calculated according to Eqs. (6) and (7):

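$$\text{NPCR} = \frac{\sum_{i,j} E(i,j)}{L} \times 100 \tag{6}$$

$$\text{UACI} = \frac{1}{L} \sum_{i,j} \frac{|f(i,j) - f'(i,j)|}{256} \times 100 \tag{7}$$

where

$$E(i,j) = \begin{cases} 1, & f(i,j) \neq f'(i,j) \\ 0, & f(i,j) = f'(i,j) \end{cases} \tag{8}$$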

After extensive experiments, the number of pixels change rate (NPCR) and the unified average changing intensity (UACI) on the real-time medical image datasets are found to be 99.65% and 33.67%, respectively, which shows that the scheme can withstand attacks such as brute-force attacks in an IoT environment.
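A minimal sketch of how NPCR and UACI in Eqs. (6) and (7) can be computed for two encrypted images is given below. It assumes that the two inputs are equally sized 8-bit cipher images and that L is the total number of pixels; the divisor 256 follows Eq. (7) as written, and the toy images are illustrative only.

```python
def npcr_uaci(img_a, img_b):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit images (nested lists)."""
    total = 0
    changed = 0
    intensity = 0.0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            changed += 1 if pa != pb else 0   # indicator E(i, j) of Eq. (8)
            intensity += abs(pa - pb) / 256   # normalised difference of Eq. (7)
    return changed / total * 100, intensity / total * 100

# Toy 2x2 cipher images; two of the four pixels differ, each by 128.
c1 = [[0, 128], [255, 64]]
c2 = [[0, 0], [127, 64]]
npcr, uaci = npcr_uaci(c1, c2)
```

In practice the two images are usually the encryptions of two plain images that differ in a single pixel, so high NPCR and UACI values indicate that a tiny input change spreads across the whole cipher image.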

6  Conclusion

In this article, we presented a DRLELM approach with an SMPTCP data scheduler that identifies the best path selection strategy across the accessible and distinct devices. The proposed DRL-ELM method gathers information on the environment for each path and transmits the data over the path chosen by the reinforcement learning model. The ELM speeds up path selection through faster convergence without compromising run time. The throughput attained is around 15%–20% higher than that of existing models for SMPTCP applications. Future work includes deployment in complex working environments and rigorous testing.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. F. Song, M. Zhu, Y. Zhou, I. You and H. Zhang, “Smart collaborative tracking for ubiquitous power iot in edge-cloud interplay domain,” IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6046–6055, 2020.
  2. J. Wu, B. Cheng and M. Wang, “Improving multipath video transmission with raptor codes in heterogeneous wireless networks,” IEEE Transactions on Multimedia, vol. 20, no. 2, pp. 457–472, 2018.
  3. F. Song, Z. Ai, Y. Zhou, I. You, K. R. Choo et al., “Smart collaborative automation for receive buffer control in multipath industrial networks,” IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 1385–1394, 2020.
  4. F. Alan, C. Raiciu, M. Handley and B. Olivier, “TCP extensions for multipath operation with multiple addresses: Draft-ietf-MPTCP-multiaddressed-03,” Internet Draft; Draft-Ietf-Mptcp multiaddressed, vol. 07, pp. 1–68, 2011.
  5. R. Ji, Y. Cao, X. Fan, Y. Jiang, G. Lei et al., “Multipath TCP-based iot communication evaluation: From the perspective of multipath management with machine learning,” Sensors, vol. 20, no. 6573, pp. 1–14, 2020.
  6. S. K. Saha, S. Aggarwal, D. Koutsonikolas and J. Widmer, “AMuSe: An agile multipath-tcp scheduler for dual-band 802.11 ad/wireless lans,” in Int. Conf. on Mobile Computing and Networking, New Delhi, pp. 705–707, 2018.
  7. B. Y. L. Kimura, D. C. S. F. Lima, L. A. Villas and A. A. F. Loureiro, “Interpath contention in multipath TCP disjoint paths,” IEEE/ACM Transactions on Networking, vol. 27, no. 4, pp. 1387–1400, 2019.
  8. D. A. F. Saraiva, V. R. Q. Leithardt, D. de Paula, A. S. Mendes, G. V. Gonz et al., “PRICES: Comparison of symmetric key algorithms for IOT devices,” Sensors, vol. 19, no. 4312, pp. 1–23, 2019.
  9. X. Yin, J. Liu, X. Cheng, B. Zeng and X. Xiong, “A Low-complexity design for the terminal device of the urbanIOT-oriented heterogeneous network with ultra-high-speed OFDM processing,” Sustainable Cities and Society, vol. 61, 2020. https://doi.org/10.1016/j.scs.2020.102323.
  10. M. Z. Shafiq, F. Le, M. Srivatsa and A. X. Liu, “Cross-path inference attacks on multipath TCP,” Proceedings of the ACM Workshop on Hot Topics in Networks, pp. 1–7, 2013. https://doi.org/10.1145/2535771.2535782.
  11. S. Ferlin, Ö. Alay, T. Dreibholz, D. A. Hayes and M. Welzl, “Revisiting congestion control for multipath TCP with shared bottleneck detection,” in IEEE Int. Conf. on Computer Communications, San Francisco, CA, USA, pp. 1–9, 2016.
  12. P. Dong, J. Wang, J. Huang, H. Wang and G. Min, “Performance enhancement of multipath TCP for wireless communications with multiple radio interfaces,” IEEE Transactions on Communications, vol. 64, no. 8, pp. 3456–3466, 2016.
  13. X. Kaiping, C. Ke, N. Dan, Z. Hong and H. Peilin, “Survey of mpTCP-based multipath transmission optimization,” Journal of Computer Research and Development, vol. 53, no. 11, pp. 2512–2529, 2016.
  14. Y. Cao, S. Chen, Q. Liu, Y. Zuo, H. Wang et al., “QoE-Driven energy-aware multipath content delivery approach for SMPTCP-based mobile phones,” China Communications, vol. 14, no. 2, pp. 90–103, 2017.
  15. J. Chung, D. Han, J. Kim and C. Kim, “Machine learning based path management for mobile devices over MPTCP,” in IEEE Int. Conf. on Big Data and Smart Computing (BigComp), Jeju, Korea (South), pp. 206–209, 2017.
  16. W. Li, H. Zhang, S. Gao, C. Xue, X. Wang et al., “SmartCC: A reinforcement learning approach for multipath TCP congestion control in heterogeneous networks,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 11, pp. 2621–2633, 2019.
  17. Z. Xu, J. Tang, C. Yin, Y. Wang and G. Xue, “Experience-driven congestion control: When multi-path TCP meets deep reinforcement learning,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1325–1336, 2019.
  18. B. Hesmans, G. Detal, S. Barre, R. Bauduin and O. Bonaventure, “SMAPP: Towards smart multipath TCP-enabled applications,” in ACM Conf. on Emerging Networking Experiments and Technologies, Heidelberg, Germany, pp. 1–7, 2015.
  19. J. A. Hartigan and M. A. Wong, “Algorithm as 136: A k-means clustering algorithm,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 100–108, 1979.
  20. M. Manjusha and R. Harikumar, “Performance analysis of knn classifier and k-means clustering for robust classification of epilepsy from EEG signals,” in Int. Conf. on Wireless Communications, Signal Processing and Networking, Chennai, India, pp. 2412–2416, 2016.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.