Computers, Materials & Continua DOI:10.32604/cmc.2022.026477
Article
Optimized Artificial Neural Network Techniques to Improve Cybersecurity of Higher Education Institution
1Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
2Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
3Centre of Artificial Intelligence for Precision Medicines, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
4Mathematics Department, Faculty of Science, Al-Azhar University, Naser City, 11884, Cairo, Egypt
5Mathematics Department, College of Science, University of Bisha, Bisha, Saudi Arabia
6Mathematics Department, Faculty of Science, Damanhour University, Damanhour, Egypt
*Corresponding Author: Mahmoud Ragab. Email: mragab@kau.edu.sa
Received: 28 December 2021; Accepted: 22 February 2022
Abstract: Education plays an important part in economic growth and the improvement of human welfare. The education sector has changed considerably in recent years, and Information and Communication Technology (ICT) is now an integral part of it. Almost every activity in universities and colleges, from counselling to admissions and fee deposits, has been automated. Attendance records, quizzes, evaluations, and mark and grade submissions all rely on ICT. Therefore, cybersecurity is essential in higher education institutions (HEIs). In this view, this study develops an Automated Outlier Detection for CyberSecurity in Higher Education Institutions (AOD-CSHEI) technique. The AOD-CSHEI technique aims to determine the presence of intrusions or attacks in HEIs. It first performs data pre-processing in two stages, namely data conversion and class labelling. In addition, the Adaptive Synthetic (ADASYN) technique is exploited for the removal of outliers in the data. Besides, the sparrow search algorithm (SSA) with a deep neural network (DNN) model is used for classifying the data according to the presence or absence of intrusions in the HEI network. Finally, the SSA is utilized to effectively adjust the hyperparameters of the DNN. To demonstrate the performance of the AOD-CSHEI technique, a set of simulations was carried out on three benchmark datasets, and the results showed that the AOD-CSHEI technique outperforms the compared methods, with a highest accuracy of 0.9997.
Keywords: Higher education institutions; intrusion detection system; artificial intelligence; deep neural network; hyperparameter tuning; deep learning
The network environment of educational institutions is difficult to control, with different types of users such as residents, researchers, students, and faculty [1]. There have been several incidents in which information held by educational institutions was the target of hacking attempts, and institutions have taken various measures to control suspicious traffic. Novel attacks exploit computer vulnerabilities for which no fix currently exists. They are hard to identify through either reactive or proactive security methods. There are two techniques for detecting attacks: anomaly detection and signature-based detection. Signature-based detection depends on matching attack patterns against signatures saved in a repository [2]. This technique is not effective against attacks for which no signature is available. In anomaly detection, a profile of standard behavior is maintained, and any deviation from this profile is reported. Anomaly detection can identify novel attacks; however, it leads to a higher number of false positives. Both approaches require substantial human effort to update the signature repository and the standard profiles, which is a time-consuming and expensive procedure [3]. The update speed is also slow compared with the speed at which new intrusions appear. Discovering novel attacks requires defenders to be constantly on guard, which is impractical for automatically interfaced systems. Some form of automated defense is therefore needed to prevent such attacks. Automated signature generation and attack detection schemes help intrusion detection systems (IDS) to capture and report these attacks. No single approach can resolve this issue; an integration of methods, such as signature generation algorithms, honeypots, IDS, analysis, and tracking, is required.
Network intrusion detection systems (NIDS) have advanced rapidly in industry and academia in response to the growing number of cyberattacks against commercial enterprises and governments worldwide. The yearly cost of cybercrime keeps rising [4]. The most disruptive cybercrimes result from denial of service, web-based attacks, and malicious insiders. Organizations can lose intellectual property when malicious software creeps into the network, which might also disrupt a country's critical national infrastructure. Organizations deploy antivirus software, firewalls, and NIDS to secure computer systems against unauthorized access [5]. One way to respond to cyberattacks rapidly is to identify the attack method early using a NIDS. The NIDS is developed to detect malicious activity, including distributed denial of service (DDoS), virus, and worm attacks. The crucial factors for a NIDS are reliability, detection speed, and accuracy. To fulfill these requirements, researchers have explored the use of machine learning (ML) and deep learning (DL) methods [6]. Both techniques fall under artificial intelligence (AI) and aim at learning useful information from big data. They have received much recognition in the field of network security [7] in the past few years because of the development of graphics processing units (GPUs). Both are effective tools for learning important features from network traffic and predicting normal and abnormal events on the basis of the learned patterns [8]. ML-based IDS rely heavily on feature engineering to learn important information from network traffic. Meanwhile, DL-based IDS do not depend on feature engineering and are good at learning complicated features automatically from raw data because of their deep structure.
Vinayakumar et al. [9] describe how sequential data modelling is a relevant task in cybersecurity, since sequences carry temporal features either implicitly or explicitly. The recurrent neural network (RNN) is a type of artificial neural network (ANN) that has emerged as a principled, powerful method for learning dynamic temporal behavior in large-scale sequence data of arbitrary length. Moreover, the stacked RNN (S-RNN) is able to learn complicated temporal behavior quickly, including sparse representations. Agarwal et al. [10] discussed certain factors that make it difficult for an IDS to detect and monitor web-based attacks. The study also presents a comprehensive review of existing detection systems developed specifically for monitoring web traffic, identifies different dimensions for comparing IDS from distinct perspectives based on functionality and design, and presents a conceptual architecture of a web-based IDS with a prevention method to offer systematic guidelines for system performance.
Zhou et al. [11] present an IDS method based on ensemble learning and feature selection. Initially, a heuristic method named correlation-based feature selection (CFS)-bat algorithm (BA) is presented for dimensionality reduction; it chooses the optimal feature subset on the basis of correlations among the features. Next, an ensemble model is presented that integrates the C4.5, random forest (RF), and Forest by Penalizing Attributes (Forest PA) algorithms. Akashdeep et al. [12] developed a smart technique that implements feature ranking based on information gain and correlation. Feature reduction is then performed by combining the ranks obtained from information gain and correlation, using a method for identifying useful and useless features. The reduced feature set is then fed to a feed forward neural network (FFNN) model for training and testing on the KDD99 dataset.
Jin et al. [13] designed an IDS called SwiftIDS, which is able to analyse huge volumes of traffic in high-speed networks in a timely manner while keeping acceptable detection performance. SwiftIDS accomplishes this aim with two techniques. One is that the light gradient boosting machine (LightGBM) is adopted as the detection algorithm for handling the huge traffic data. Li et al. [14] present an effective DL method, an autoencoder (AE)-based IDS with random forest (RF) feature selection. This approach constructs the training set through feature selection (FS) and feature grouping. Once training is complete, the method predicts the results with the AE, which significantly decreases the detection time and effectively improves the predictive performance.
This study presents a novel automated outlier detection technique for cybersecurity in higher education institutions (HEIs), named the AOD-CSHEI technique. The AOD-CSHEI technique first executes data pre-processing in two stages, namely data conversion and class labelling. Also, the Adaptive Synthetic (ADASYN) technique is exploited for the removal of outliers in the data. Further, the sparrow search algorithm (SSA) with a DNN model is used for classifying the data according to the presence or absence of intrusions in the HEI network. Lastly, the SSA is utilized to effectively adjust the hyperparameters of the DNN. To demonstrate the improved outcomes of the AOD-CSHEI technique, a wide-ranging experimental analysis is carried out using three benchmark datasets.
The remaining sections of the paper are organized as follows. Section 2 elaborates the proposed model, Section 3 offers the performance validation, and Section 4 draws the conclusion.
2 The Proposed AOD-CSHEI Technique
This study presents a new AOD-CSHEI technique to identify the presence of intrusions or attacks in HEIs; the overall process is given in Fig. 1. The AOD-CSHEI technique performs different subprocesses, namely pre-processing, ADASYN-based outlier detection, DNN-based classification, and SSA-based hyperparameter tuning. At the initial stage, the input data is pre-processed in two stages, namely data conversion and class labelling. The DNN model is then used to classify the data according to the presence or absence of intrusions in the HEI network, and the SSA is utilized to effectively adjust the hyperparameters of the DNN model.
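The two pre-processing stages can be illustrated with a minimal Python sketch. The paper does not specify how data conversion is carried out, so the integer encoding of categorical fields, the function name `convert_and_label`, and the record layout (features followed by a textual label) are all assumptions made here for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def convert_and_label(
    records: Sequence[Sequence[object]],
    categorical_cols: Sequence[int],
    normal_label: str = "normal",
) -> Tuple[List[List[float]], List[int]]:
    """Turn (feature..., label) records into numeric rows and 0/1 class labels."""
    # One code table per categorical column, filled in order of first appearance.
    codes: Dict[int, Dict[object, int]] = {c: {} for c in categorical_cols}
    features: List[List[float]] = []
    labels: List[int] = []
    for record in records:
        *raw, label = record
        row = []
        for j, value in enumerate(raw):
            if j in codes:
                # Data conversion: map each category to an integer code.
                table = codes[j]
                row.append(float(table.setdefault(value, len(table))))
            else:
                row.append(float(value))
        features.append(row)
        # Class labelling: 0 for normal traffic, 1 for any attack type.
        labels.append(0 if label == normal_label else 1)
    return features, labels


# Hypothetical records: duration, protocol, source bytes, label.
sample = [
    (0, "tcp", 181.0, "normal"),
    (2, "udp", 0.0, "neptune"),
    (1, "tcp", 5.0, "smurf"),
]
X, y = convert_and_label(sample, categorical_cols=[1])
```

Collapsing every attack type into a single positive class matches the binary "presence or absence of intrusions" framing used in the text; a multi-class variant would keep one code per attack family instead.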
2.1 ADASYN Based Outlier Detection
During the outlier removal process, the ADASYN technique receives the pre-processed data as input to eradicate the outliers that exist in it. The fundamental concept of the ADASYN technique is to determine the weight distribution of the minority samples according to the degree of learning difficulty of each minority sample [15]. For binary classification problems, the dataset
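As a rough illustration of the weighting idea, the sketch below follows ADASYN as it is commonly described: each minority sample is weighted by the share of majority samples among its k nearest neighbours, and synthetic minority samples are then interpolated in proportion to that weight. It is not taken from the paper; the function name, the parameters `k` and `beta`, and the choice to interpolate only towards minority-class neighbours are assumptions.

```python
import math
import random
from typing import List, Sequence


def adasyn(
    minority: Sequence[Sequence[float]],
    majority: Sequence[Sequence[float]],
    k: int = 5,
    beta: float = 1.0,
    seed: int = 0,
) -> List[List[float]]:
    """Return synthetic minority samples, denser where learning is harder."""
    rng = random.Random(seed)
    # Number of synthetic samples needed for the requested balance level.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    # Difficulty of each minority sample: share of majority points among
    # its k nearest neighbours (the sample itself is excluded).
    ratios = []
    for p in minority:
        others = [item for item in labelled if item[0] is not p]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        ratios.append(sum(label for _, label in nearest) / k)
    norm = sum(ratios)
    if norm > 0:
        weights = [r / norm for r in ratios]
    else:
        weights = [1.0 / len(minority)] * len(minority)
    synthetic: List[List[float]] = []
    for p, w in zip(minority, weights):
        # Interpolate towards randomly chosen minority-class neighbours.
        mates = sorted((q for q in minority if q is not p),
                       key=lambda q: math.dist(p, q))[:k]
        for _ in range(round(w * total)):
            q = rng.choice(mates)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic
```

Because every synthetic point is a convex combination of two minority samples, the generated data stays inside the region already covered by the minority class.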
2.2 DNN Based Classification Model
At this stage, the DNN model is executed to determine the presence of intrusions or attacks in the HEIs. The DNN is a network model based on the DL approach. It is extensively applied in image classification, computational biology, and signal prediction because of its simple structure and ease of understanding. The internal architecture of the DNN comprises input, hidden, and output layers, and each layer is fully connected. The input layer has m neurons, and w and b denote the weights and biases, respectively [17]. The gradient backpropagation method is employed for updating the parameters of the DNN, namely the bias b and weight w of all the connected layers. During network training, there is an unavoidable error between the network output and the label of the input sample. Before the DNN begins training, several hyperparameters need to be fixed, namely the network model parameters (the number of neurons in the input, hidden, and output layers, and the activation function), the number of epochs, momentum, batch size, and initial learning rate.
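A minimal sketch of such a fully connected network trained by gradient backpropagation is given below. The architecture is not specified in the paper, so the single tanh hidden layer, the sigmoid output, the cross-entropy loss, and the class name `TinyDNN` are assumptions; momentum and mini-batches are left out to keep the example short (each sample is one update step).

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyDNN:
    """Fully connected net: input -> tanh hidden layer -> sigmoid output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x: Sequence[float]) -> float:
        return self.forward(x)[1]

    def loss(self, xs, ts) -> float:
        # Mean binary cross-entropy between outputs and labels.
        eps = 1e-12
        total = 0.0
        for x, t in zip(xs, ts):
            y = self.predict(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(xs)

    def train(self, xs, ts, epochs: int = 500, lr: float = 0.1) -> None:
        for _ in range(epochs):
            for x, t in zip(xs, ts):
                h, y = self.forward(x)
                delta_out = y - t  # gradient at the output pre-activation
                for j, hj in enumerate(h):
                    # Backpropagate through w2[j] before it is updated.
                    delta_h = delta_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * delta_out * hj
                    self.b1[j] -= lr * delta_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out
```

The hidden-layer size, the number of epochs, and the learning rate passed to `TinyDNN` and `train` are exactly the kind of values that the hyperparameter tuning stage would search over.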
2.3 SSA Based Hyperparameter Tuning Process
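To boost the efficacy of the DNN, the SSA is applied to tune the hyperparameters of the DNN. Sparrows are common gregarious birds that live close to humans. In the algorithm, virtual sparrows are used to search for food sources [18]. The positions of the sparrows are represented as follows:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$$
In which $n$ indicates the number of sparrows and $d$ denotes the dimension of the parameters that must be tuned. The fitness values of all sparrows are represented as follows:
$$F_X = \begin{bmatrix} f([x_{1,1} \; x_{1,2} \; \cdots \; x_{1,d}]) \\ f([x_{2,1} \; x_{2,2} \; \cdots \; x_{2,d}]) \\ \vdots \\ f([x_{n,1} \; x_{n,2} \; \cdots \; x_{n,d}]) \end{bmatrix}$$
While the value in each row of $F_X$ represents the fitness of the corresponding sparrow. The position of a producer is updated as follows:
$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$$
In the equation, $t$ denotes the current iteration, $j = 1, 2, \ldots, d$, $X_{i,j}^{t}$ is the value of the $j$-th dimension of the $i$-th sparrow at iteration $t$, $iter_{max}$ is the maximum number of iterations, $\alpha \in (0, 1]$ is a random number, $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$ are the alarm value and the safety threshold, $Q$ is a normally distributed random number, and $L$ is a $1 \times d$ matrix in which every element is 1.
1. When $R_2 < ST$, there are no predators around, and the producer enters a wide search mode.
2. When $R_2 \geq ST$, some sparrows have discovered a predator, and all sparrows need to fly quickly to safe areas.
The scroungers also need to follow rules (1) and (2). If they win the competition for food, they obtain the producer's food immediately; otherwise, they continue to follow rules (1) and (2):
$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$
In the equation, $X_P$ is the best position occupied by the producers, $X_{worst}$ is the current global worst position, and $A$ is a $1 \times d$ matrix whose elements are randomly assigned 1 or $-1$, with $A^{+} = A^{T}(AA^{T})^{-1}$. The sparrows that become aware of danger update their positions as follows:
$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases}$$
In which $X_{best}$ is the current global best position, $\beta$ is a step size control parameter drawn from a normal distribution with mean 0 and variance 1, $K \in [-1, 1]$ is a random number, $f_i$ is the fitness of the present sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, and $\varepsilon$ is a small constant that avoids division by zero.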
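A minimal Python sketch of an SSA loop is shown below, assuming the commonly published producer, scrounger, and danger-aware update rules of the sparrow search algorithm. The function name `sparrow_search`, the default fractions of producers and danger-aware sparrows, the box clipping, and the use of an absolute value in the last denominator (to rule out a zero division) are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sparrow_search(
    objective: Callable[[Sequence[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n: int = 20,
    iters: int = 50,
    producer_frac: float = 0.2,
    aware_frac: float = 0.1,
    st: float = 0.8,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` inside the box [lower, upper]."""
    rng = random.Random(seed)
    d = len(lower)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
           for _ in range(n)]
    fit = [objective(x) for x in pop]
    n_prod = max(1, int(producer_frac * n))
    n_aware = max(1, int(aware_frac * n))
    best_f = min(fit)
    best_x = list(pop[fit.index(best_f)])

    for _ in range(iters):
        # Rank sparrows: index 0 is the current best, index n-1 the worst.
        order = sorted(range(n), key=lambda i: fit[i])
        pop = [pop[i] for i in order]
        worst_x = list(pop[-1])

        # Producers: wide search while the alarm value is below the threshold.
        r2 = rng.random()
        for i in range(n_prod):
            if r2 < st:
                alpha = 1.0 - rng.random()  # uniform in (0, 1]
                scale = math.exp(-(i + 1) / (alpha * iters))
                pop[i] = clip([v * scale for v in pop[i]])
            else:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([v + q for v in pop[i]])
        prod_x = min(pop[:n_prod], key=objective)

        # Scroungers: the worst half flies elsewhere, the rest follow
        # the best producer.
        for i in range(n_prod, n):
            if i + 1 > n / 2:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([q * math.exp((w - v) / (i + 1) ** 2)
                               for v, w in zip(pop[i], worst_x)])
            else:
                signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
                step = sum(abs(v - p) * s
                           for v, p, s in zip(pop[i], prod_x, signs)) / d
                pop[i] = clip([p + step for p in prod_x])

        fit = [objective(x) for x in pop]
        if min(fit) < best_f:
            best_f = min(fit)
            best_x = list(pop[fit.index(best_f)])

        # Danger-aware sparrows: move relative to the best or worst position.
        worst_f = max(fit)
        worst_now = list(pop[fit.index(worst_f)])
        for i in rng.sample(range(n), n_aware):
            if fit[i] > best_f:
                beta = rng.gauss(0.0, 1.0)
                pop[i] = clip([b + beta * abs(v - b)
                               for v, b in zip(pop[i], best_x)])
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = abs(fit[i] - worst_f) + 1e-12  # assumption: abs()
                pop[i] = clip([v + k * abs(v - w) / denom
                               for v, w in zip(pop[i], worst_now)])
            fit[i] = objective(pop[i])
            if fit[i] < best_f:
                best_f, best_x = fit[i], list(pop[i])

    return best_x, best_f
```

For hyperparameter tuning, `objective` would map a position vector (for example, learning rate and hidden-layer size) to the validation error of a DNN trained with those settings; any simple test function can stand in for it when trying the loop out.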
Opposition-based learning (OBL) is a powerful mechanism utilized in optimization to increase the convergence speed of different metaheuristic approaches [19]. An effective OBL scheme evaluates the current population together with its opposite population in the same iteration in order to identify the better candidate solutions for a given problem. The idea of OBL has been applied effectively, and the concept of an opposite value must be defined in order to describe OBL.
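The opposite value can be sketched in a few lines of Python, assuming the usual definition in which the opposite of x in the interval [lb, ub] is lb + ub - x, applied per dimension. The helper names `opposite` and `opposition_refine` are hypothetical.

```python
from typing import Callable, List, Sequence


def opposite(x: Sequence[float], lower: Sequence[float],
             upper: Sequence[float]) -> List[float]:
    """Mirror a candidate across the centre of the search box."""
    return [lo + hi - v for v, lo, hi in zip(x, lower, upper)]


def opposition_refine(
    pop: Sequence[Sequence[float]],
    objective: Callable[[Sequence[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[List[float]]:
    """Keep the best len(pop) candidates from the population and its opposites."""
    candidates = [list(x) for x in pop]
    candidates += [opposite(x, lower, upper) for x in pop]
    return sorted(candidates, key=objective)[:len(pop)]


# Example: minimising (x - 9)^2 on [0, 10]; the opposite of 1.0 is 9.0.
refined = opposition_refine([[1.0], [6.0]], lambda x: (x[0] - 9.0) ** 2,
                            [0.0], [10.0])
```

Applying this refinement to the initial population, or once per iteration, costs one extra objective evaluation per candidate.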
In this section, the experimental results of the AOD-CSHEI technique are analysed using three benchmark datasets [20]. A comparative analysis is made with the decision tree (DT), logistic regression (LR), Naïve Bayes (NB), ANN, support vector machine (SVM), AdaBoost, and LightGBM techniques.
Tab. 1 provides a detailed comparative study of the AOD-CSHEI technique with existing techniques on the NSL-KDD test dataset. Fig. 3 offers the accuracy analysis of the AOD-CSHEI technique and existing techniques on the training and testing sets of the NSL-KDD dataset.
The outcomes show that the NB model performed worst, with the lowest accuracy values. The ANN, DT, and LR models obtained slightly better accuracy, followed by the AdaBoost model with moderately higher accuracy values. Though the LightGBM and SVM techniques reached reasonable accuracy values, the presented AOD-CSHEI technique achieved the maximum training and testing accuracy of 0.9936 and 0.9152, respectively.
Next, the training time (TRT) and testing time (TST) analysis of the AOD-CSHEI approach on the NSL-KDD dataset is shown in Fig. 4. The figure shows that the SVM method had the worst outcome, with the maximum values of TRT and TST. The AdaBoost model obtained slightly reduced TRT and TST, followed by the LR, NB, and ANN models with somewhat lower values. Although the DT and LightGBM models resulted in reasonable values of TRT and TST, the presented AOD-CSHEI technique reached an effective outcome with low TRT and TST values of 3.04 min and 0.67 min, respectively.
Fig. 5 demonstrates the receiver operating characteristic (ROC) analysis of the AOD-CSHEI technique on the NSL-KDD dataset. The figure shows that the AOD-CSHEI technique reached an enhanced outcome with a ROC score of 99.9714.
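The measures used in this comparison can be computed as in the sketch below. It assumes that the reported ROC score is the area under the ROC curve (expressed there as a percentage) and that TRT and TST are wall-clock times in minutes; the function names are hypothetical and the numbers in the example are toy values, not results from the paper.

```python
import time
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Run fn(*args) and return its result with the elapsed time in minutes."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) / 60.0


# Toy labels (1 = attack) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [int(s >= 0.5) for s in scores]
acc = accuracy(labels, predictions)
auc = roc_auc(labels, scores)
```

The pairwise form of the AUC is quadratic in the number of records, which is fine for a small check; a rank-based computation is preferable on datasets of the size used here.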
Tab. 2 offers a detailed comparative study of the AOD-CSHEI technique with existing techniques on the UNSW-NB15 test dataset. Fig. 6 provides the accuracy analysis of the AOD-CSHEI approach and existing methods on the training and testing sets of the UNSW-NB15 dataset. The results show that the NB model performed worst, with the lowest accuracy values. The ANN, DT, and LR approaches reached somewhat higher accuracy, and the AdaBoost model resulted in moderately increased accuracy values. Although the LightGBM and SVM techniques reached reasonable accuracy values, the presented AOD-CSHEI technique achieved the maximum training and testing accuracy of 0.8918 and 0.8852, respectively.
Next, the TRT and TST analysis of the AOD-CSHEI technique on the UNSW-NB15 dataset is exhibited in Fig. 7. The figure shows that the SVM algorithm had the worst outcome, with the highest values of TRT and TST. The AdaBoost model obtained slightly decreased TRT and TST, followed by the LR, NB, and ANN models with somewhat lower values. Although the DT and LightGBM models resulted in reasonable values of TRT and TST, the presented AOD-CSHEI technique reached an effective outcome with low TRT and TST values of 0.54 min and 0.36 min, respectively.
Fig. 8 showcases the receiver operating characteristic (ROC) curve analysis of the AOD-CSHEI technique on the UNSW-NB15 dataset. The figure shows that the AOD-CSHEI approach attained an improved outcome with a ROC score of 96.7291.
Tab. 3 gives a detailed comparative study of the AOD-CSHEI method with existing techniques on the CICIDS2017 test dataset. Fig. 9 offers the accuracy analysis of the AOD-CSHEI technique and existing techniques on the training and testing sets of the CICIDS2017 dataset. The outcomes show that the NB technique performed worst, with the lowest accuracy values. The ANN, DT, and LR models obtained slightly increased accuracy, and the AdaBoost approach resulted in moderately enhanced accuracy values. Though the LightGBM and SVM techniques reached reasonable accuracy values, the presented AOD-CSHEI technique achieved the maximum training and testing accuracy of 0.9997 and 0.9991, respectively.
Afterward, the TRT and TST analysis of the AOD-CSHEI technique on the CICIDS2017 dataset is depicted in Fig. 10. The figure shows that the SVM model had the worst outcome, with the maximum values of TRT and TST. The AdaBoost model obtained slightly reduced TRT and TST, followed by the LR, NB, and ANN models with somewhat decreased values. Although the DT and LightGBM models resulted in reasonable values of TRT and TST, the presented AOD-CSHEI technique reached an effective outcome with low TRT and TST values of 2.56 min and 0.14 min, respectively.
Fig. 11 exhibits the ROC analysis of the AOD-CSHEI approach on the CICIDS2017 dataset. The figure shows that the AOD-CSHEI technique attained an improved outcome with a ROC score of 99.9904. The above result analysis indicates that the AOD-CSHEI technique outperforms the recent approaches it was compared with.
This study has presented a new AOD-CSHEI technique to identify the presence of intrusions or attacks in HEIs. The AOD-CSHEI technique performs different subprocesses, namely pre-processing, ADASYN-based outlier detection, DNN-based classification, and SSA-based hyperparameter tuning. In this work, the SSA with a DNN model is used for classifying the data according to the presence or absence of intrusions in the HEI network, and the SSA is utilized to effectively adjust the hyperparameters of the DNN model. To demonstrate the efficacy of the AOD-CSHEI technique, a set of simulations was carried out on three benchmark datasets, and the results showed that the AOD-CSHEI technique outperforms the compared methods. Therefore, the AOD-CSHEI technique can be utilized as an effective tool for cybersecurity in HEIs. In the future, the AOD-CSHEI technique can be deployed in the online learning process of HEIs.
Acknowledgement: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number (IFPRC-154-611-2020) and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
Funding Statement: This project was supported financially by Institution Fund projects under grant no. (IFPRC-154-611-2020).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
1. C. E. Bondoc and T. G. Malawit, “Cybersecurity for higher education institutions: Adopting regulatory framework,” Global Journal of Engineering and Technology Advance, vol. 2, no. 3, pp. 016–021, 2020.
2. J. B. Ulven and G. Wangen, “A systematic review of cybersecurity risks in higher education,” Future Internet, vol. 13, no. 2, pp. 39, 2021.
3. A. Aliyu, L. Maglaras, Y. He, I. Yevseyeva, E. Boiten et al., “A holistic cybersecurity maturity assessment framework for higher education institutions in the United Kingdom,” Applied Sciences, vol. 10, no. 10, pp. 3660, 2020.
4. E. Kim and R. Beuran, “On designing a cybersecurity educational program for higher education,” in Proc. of the 10th Int. Conf. on Education Technology and Computers -ICETC ‘18, Tokyo, Japan, pp. 195–200, 2018.
5. C.-W. Liu, P. Huang and H. C. Lucas, “Centralized IT decision making and cybersecurity breaches: Evidence from U.S. higher education institution,” Journal of Management Information Systems, vol. 37, no. 3, pp. 758–787, 2020, https://doi.org/10.2139/ssrn.2850178.
6. T. Crick, J. H. Davenport, A. Irons, S. Pearce and T. Prickett, “Maintaining the focus on cybersecurity in UK higher education,” ITNOW, vol. 61, no. 4, pp. 46–47, 2019.
7. M. Y. Alghamdi and Y. A. Younis, “The use of computer games for teaching and learning cybersecurity in higher education institutions,” Journal of Engineering Research, vol. 9, no. 3A, pp. 143–152, 2021.
8. F. B. Schneider, “Cybersecurity education in universities,” IEEE Security & Privacy, vol. 11, no. 4, pp. 3–4, 2013.
9. R. Vinayakumar, K. P. Soman and P. Poornachandran, “Evaluation of recurrent neural network and its variants for intrusion detection system (IDS),” International Journal of Information System Modeling and Design, vol. 8, no. 3, pp. 43–63, 2017.
10. N. Agarwal and S. Z. Hussain, “A closer look at intrusion detection system for web applications,” Security and Communication Networks, vol. 2018, pp. 1–27, 2018.
11. Y. Zhou, G. Cheng, S. Jiang and M. Dai, “Building an efficient intrusion detection system based on feature selection and ensemble classifier,” Computer Networks, vol. 174, pp. 107247, 2020.
12. I. Manzoor and N. Kumar, “A feature reduced intrusion detection system using ANN classifier,” Expert Systems with Applications, vol. 88, pp. 249–257, 2017.
13. D. Jin, Y. Lu, J. Qin, Z. Cheng and Z. Mao, “SwiftIDS: Real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism,” Computers & Security, vol. 97, pp. 101984, 2020.
14. X. Li, W. Chen, Q. Zhang and L. Wu, “Building auto-encoder intrusion detection system based on random forest feature selection,” Computers & Security, vol. 95, pp. 101851, 2020.
15. A. Alhudhaif, “A novel multi-class imbalanced EEG signals classification based on the adaptive synthetic sampling (ADASYN) approach,” PeerJ Computer Science, vol. 7, pp. e523, 2021.
16. A. Amin, S. Anwar, A. Adnan, M. Nawaz, N. Howard et al., “Comparing oversampling techniques to handle the class imbalance problem: A customer churn prediction case study,” IEEE Access, vol. 4, pp. 7940–7957, 2016.
17. Z. Zhu, X. Cui, K. Zhang, B. Ai, B. Shi et al., “DNN-Based seabed classification using differently weighted MBES multifeatures,” Marine Geology, vol. 438, pp. 106519, 2021.
18. J. Xue and B. Shen, “A novel swarm intelligence optimization approach: Sparrow search algorithm,” Systems Science & Control Engineering, vol. 8, no. 1, pp. 22–34, 2020.
19. J. Adaikalaraj and T. Vengattaraman, “An efficient load scheduling technique using oppositional sparrow search algorithm for cloud computing environment,” Advances in Mathematics: Scientific Journal, vol. 10, no. 1, pp. 423–432, 2021.
20. J. Liu, Y. Gao and F. Hu, “A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM,” Computers & Security, vol. 106, pp. 102289, 2021.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.