[BACK]
Computer Systems Science & Engineering
DOI:10.32604/csse.2021.014384
images
Article

Anomaly Detection in ICS Datasets with Machine Learning Algorithms

Sinil Mubarak1, Mohamed Hadi Habaebi1,*, Md Rafiqul Islam1, Farah Diyana Abdul Rahman and Mohammad Tahir2

1International Islamic University Malaysia, Jalan Gombak, 53100, Malaysia
2Sunway University, Selangor, 47500, Malaysia
*Corresponding Author: Mohamed Hadi Habaebi. Email: habaebi@iium.edu.my
Received: 17 September 2020; Accepted: 14 December 2020

Abstract: An Intrusion Detection System (IDS) provides a front-line defense mechanism for the Industrial Control System (ICS) dedicated to keeping the process operations running continuously for 24 hours in a day and 7 days in a week. A well-known ICS is the Supervisory Control and Data Acquisition (SCADA) system. It supervises the physical process from sensor data and performs remote monitoring control and diagnostic functions in critical infrastructures. The ICS cyber threats are growing at an alarming rate on industrial automation applications. Detection techniques with machine learning algorithms on public datasets, suitable for intrusion detection of cyber-attacks in SCADA systems, as the first line of defense, have been detailed. The machine learning algorithms have been performed with labeled output for prediction classification. The activity traffic between ICS components is analyzed and packet inspection of the dataset is performed for the ICS network. The features of flow-based network traffic are extracted for behavior analysis with port-wise profiling based on the data baseline, and anomaly detection classification and prediction using machine learning algorithms are performed.

Keywords: Industrial control system; SCADA; intrusion detection system; machine learning; anomaly detection

1  Introduction

Control system is defined as the hardware and software component of an Industrial Automation and Control System (IACS). The key components of the industrial control system (ICS) include Supervisory Control and Data Acquisition (SCADA), Human Machine Interface (HMI), Programmable Logic Controllers (PLC), Remote Terminal Unit (RTU), and Distributed Control System (DCS). A SCADA system helps to collect data from field sensors that enable us to control the system through a human-machine interface (HMI) software.

Cybersecurity solutions for Information Technology (IT) are well established and secured but less work has been done on cybersecurity for operational technology (OT) (Sentryo, 2019). In the IT sector, the confidentiality of information has the highest priority, whereas, in OT-ICS, the highest priority is for the availability of information. The OT systems lack the cybersecurity culture and with increased digitization, more cyber-attacks surface. IT risks are sources of fraud, financial losses, privacy, and data leaks, wherein OT risks are sources of health, safety, and environmental casualties. At present, OT networks have an inconsistent deployment of security policies and standards wherein IT networks have strong security policies [1]. Applications and protocols in the OT domain are customized in SCADA, HMI, and DCS, whereas for the IT domain it is already standardized in email, internet, video, etc.

Intrusion detection systems have proved to be a reliable security process for anomaly detection in traditional IT, which identifies all inbound and outbound network traffic for security breach and check the traffic for matching signatures. Then, it signals an alarm when the matching is not found. Network-based IDS (NIDS) scans entire networks and detects malicious traffic activity, whereas Host-based IDS (HIDS) scans for a specific host and monitors each system event.

Intrusion detection systems can work conjointly with IT security systems, but unfortunately, IT systems do not meet the industrial requirements. However, the ICS cyber threats are growing at an alarming rate on industrial automation applications. The continuity of services with the safe operation is of great importance since many ICSs are in a position where a failure can result in a threat to human lives, environmental safety, or production output.

Some of the main challenges faced by OT ICS are [2] the lack of asset visibility for brownfield control systems, ongoing modifications, and upgradations in process plants. Multiple Original Equipment Manufacturers (OEM) in single plant operation are using different communication protocols. ICS vendors are not familiar with IT cybersecurity protocols or technology, and they do not have hands-on experience with ICS devices due to a shortage of experienced cybersecurity personnel.

Furthermore, many universities have difficulties to build their own OT ICS Cyber Range lab facilities dedicated to industrial use-case scenarios to carry out the research activities due to financial constraints. Currently, many researchers utilize publicly available ICS datasets for analysis of detection techniques with machine learning algorithms, as the industrial entities are reluctant to disclose the operational datasets to the public due to the sensitivity and criticality of industrial assets.

In recent years, cyber-attacks on industrial control systems had been increased many-fold due to the digitization of the industrial sector. The prime examples of notable recent industrial control system cyber-attack incidents include- Stuxnet attack on Iran nuclear facility, the Duqu & Flame attack on Iran offshore facility, the Havex remote access trojan, the Shamoon attack on Saudi Aramco, the Petya Ransomware attack in India, and the Triton-Triconex Safety Instrumented System attack on Saudi Aramco [2].

Little research had been carried to identify the advantages of using machine learning in ICS SCADA systems with real network traffic data testbed simulation and its behavior analysis for anomaly detection. The architecture of a typical modern SCADA reference model is shown in Fig. 1 consists of the following layers [3].

images

Figure 1: Typical SCADA reference model [3]

The root causes of cyber vulnerabilities in ICS SCADA systems are due to poorly secured legacy systems, delayed patch updates of software vulnerabilities, lack of cyber-security situational awareness, remote access for maintenance, large deployment areas, distributed operating mode, growing interconnectivity, and lack of built-in security with SCADA protocols.

The contribution of this paper is to highlight the machine learning techniques, for attack detection with SCADA public dataset and introduce innovative data profiling with flow-based behavior analysis using packet inspection of network traffic data. The dataset is processed and profiled for modeling for the abnormal prediction detection with the anomaly-based machine learning algorithms for intrusion detection in ICS systems.

The rest of the paper is subdivided as follows. Section 2 deals with the literature review of Machine learning techniques for ICS SCADA intrusion detection systems and different types of public SCADA datasets. It also explicates the ML-based SCADA IDS steps and the performance metrics criteria for the evaluation of algorithms. Section 3 provides ML analysis of the public dataset and its performance evaluation comparison along with validation. Section 4 illustrates the network traffic analysis by data profiling and the baseline is determined by the traffic flow activities of the network with packet inspection. The feature extracted processed dataset is modeled to predict the abnormality classification in the network traffic data with the comparison of different machine learning algorithms and the paper concludes with future works for behavior analysis with multiple port-based protocol analysis and multiple anomaly criteria with hybrid machine learning algorithms. The paper concludes in section 5.

2  Related Work

This section discusses the Machine learning applications which are predominantly deployed across various industries and their applications due to their computing power, data collection, and storage capabilities. An intrusion detection system (IDS) integrated with machine learning (supervised and unsupervised techniques) can improve the detection rates of attacks for SCADA systems [4].

Machine learning algorithms are widely implemented in the intrusion detection system (IDS) to overcome the high false positives issue in prediction. Different machine learning techniques- such as supervised and unsupervised, which uses statistical techniques to learn, classify and predict the outcome methods, can be analyzed as mentioned in Tabs. 1 and 2 [5].

Table 1: Machine learning—Supervised methods

images

Table 2: Machine learning—Unsupervised methods

images

In supervised methods, the pre-labeled dataset feature is required (classification/regression) whereas unsupervised methods do not need pre-labeled data (dimensional reduction, clustering) for analysis. Clustering is mainly applied for forensic analysis, regression for network packet parameters prediction, and comparison with the normal ones, whereas classification is applied to identify different classes of network attacks such as scanning and spoofing [6]. An anomaly detection method for deception attacks in the industrial control system is introduced by investigating the behavior of normal, attack-free activities [7]. Different existing intrusion detection systems using artificial neural networks (ANN) for detecting malicious network activity for different datasets have been reviewed [8]. A novel dataset focused on IoT combined network, power features and attacks utilizing WEKA application to train, test, and cross-validate the dataset for classification of detection with Naive Bayes (NB), support vector machines (SVMs), multilayer perceptron (MLP), Random Forest (RF), ZeroR ML classifiers has been introduced [9].

2.1 Public SCADA Datasets

A framework for security testbed for Modbus/TCP-based has been introduced in [9] for SCADA security evaluation and testing environment. The analysis of machine learning algorithms for SCADA systems can be performed with industrial public datasets, mathematical modeling of the system, and using ICS cyber test kit with OT network traffic simulation. The building of detection models using SCADA data is performed using manual definition- which is time-consuming and expensive or using the machine learning strategies that automatically build a detection model based on the training data set [10].

In [11], SCADA dataset features were extracted from captured network traffic and the performance of different supervised ML algorithms such as Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbor (KNN) compared. In [12], the multi-class power system datasets include 37 scenarios of both normal and attack instances, and the three ML techniques of KNN, NB, RF methods were analyzed.

In [13], a testbed was designed for supervised ML approach for anomaly detection of energy monitoring-based water supply system, and the three different datasets obtained from the testbed were analyzed with Random forest, KNN, and SVM algorithms. A real-time dataset which includes normal traffic along with 35 types of cyber-attacks, is utilized to train and test the ML classifiers intrusion detection system but shows a high false positive rate for the algorithms [14].

To overcome the present IDS drawbacks, the implementation of Machine learning techniques with IDS integrated along with real operational technology traffic data has become a vital and innovative concept. Furthermore, a cyber-physical ICS testbed can provide a hands-on simulation platform with real-time network data with various types of cyber-attacks for security evaluation and testing environment for research purposes.

2.2 Machine Learning Algorithms–ICS SCADA

Machine Learning (ML) based intrusion detection in SCADA systems follow the steps below as represented in Fig. 2 [15].

images

Figure 2: ML-based SCADA IDS steps

In the data cleaning and mining stage, missing values in the SCADA dataset are corrected, split randomly into training and test sets for better results. Then data normalization is followed where the improper features are replaced with mean and normalized values. Then the dataset features are extracted, and the model is built based on the machine learning algorithms to detect anomalies in the dataset. Finally, the IDS performance can be analyzed with parameters, such as Accuracy, precision, sensitivity (recall), receiver operating curve (ROC), F-score, etc.

The common IDS, like Snort and Bro, alerts can be classified as mentioned in Tab. 3, with True Negative (TN) for no attack-no alert, False Positive (FP) for no attack-alert, False Negative (FN) for attack-no alert, and True Positive (TP) for attack-alert.

Table 3: IDS evaluation matrix

images

3  Methodology and Analysis with Public Datasets

This section describes machine learning analysis of public datasets. SCADA datasets with attack vectors are used for the evaluation of different machine learning algorithms’ performances. Most of the datasets such as KDD 1999, DARPA, Gao’s dataset is outdated and are associated with information technology systems, which are also unsuitable for SCADA IDS research. An improved Cyber-physical SCADA dataset from Mississippi state university’s in-house SCADA lab which contains both normal and attacks traffic is used to evaluate the ML algorithms performance for SCADA IDS. The dataset contains network traffic data with 274,628 instances having normal activity along with 35 cyber-attacks class subtypes of data flow.

Each instance of Modbus RTU packets contains 20 features with Man in the middle (MITM) attacks, 214580 normal instances (78%), and 60048 (22%) attack instances are represented in below Tab. 4.

Table 4: Modbus RTU packets instance features

images

The dataset is randomly split into 80% training sets for modeling and the rest 20% test sets for ML algorithm evaluation. The dataset includes 274628 instances with a training set of 219,702 and a test set with 54,926 observation instances.

3.1 Machine Learning with R-Studio

The binary classification is evaluated with the programming tool R Studio, which is an open-source environment for statistical computing for data analysis. The dataset (in .csv format) fetched by the R-studio program, is corrected and split into training and test sets. The data features are normalized and modeled with a training set for ML algorithms. The supervised methods (logistic regression/KNN) are used to model and train the dataset and binary classification of pre-labeled output label feature number#18 as shown in Fig. 3, is predicted and compared with the test dataset and its performance is evaluated with the confusion matrix.

images

Figure 3: Modbus RTU packets instance [16]

Logistic regression function is used to model with the training set and probability for binary attribute classification is predicted on the test set which detects the normal/attack status. The logistic regression (LR) confusion matrix evaluation is performed with the R-studio platform for the test dataset and provides a prediction accuracy of 99.99%, for total observations of 54926 instances, as mentioned in below Tab. 5.

Table 5: Logistic regression confusion matrix evaluation

images

The data analysis with the KNN ML technique mentioned in Tab. 6, shows an accuracy parameter of 83.72% for the total observations of 7500 instances.

Table 6: KNN confusion matrix evaluation

images

3.2 Machine Learning Algorithms-WEKA Platform

This section describes the machine learning analysis of public datasets. The WEKA platform has pre-processing tools called filters for attribute selection, normalization purposes, and classifier models for predicting nominal or numeric quantities, such as support vector machines, logistic regression, BayesNet, decision trees–J48 method, Meta-classifiers: bagging, boosting, and voting stacking algorithms. The dataset in ARFF format is fetched by the application, which is then pre-processed. And relevant features are filtered based on a ranking method for training the dataset with base learner Decision tree-J48 classifier ML algorithm. The classification performance is evaluated with another BayesNet ML classifier. The hybrid (J48 and BayesNet) ML classifier is applied with the base learner–J48 decision tree and meta classifier–BayesNetwork to obtain the best prediction capabilities. Once the dataset is loaded with all features, pre-processed, and the feature extraction method is built, the dataset attributes are ranked based on the information gain parameter. Then the decision tree-J48 classifier is used to train the feature filtered dataset, as the base learner, while the BayesNet classifier is used for the hybrid model for classification performance evaluation as shown in Fig. 4.

images

Figure 4: Hybrid ML algorithm classifier

The following five feature attributes are extracted, filtered, and ranked based on the information gain ranking attribute selection criteria method, as mentioned in Tab. 7.

Table 7: Feature extraction

images

The ML performance metrics of hybrid classifier algorithm with instances–25000 and 274628, 5-fold cross-validation, for the five attributes are evaluated in Tab. 8.

Table 8: Hybrid classifier algorithm for 25000 instances

images

The ML classification performance of different algorithms has been compared in Tab. 9.

Table 9: ML classification algorithms performance

images

The machine learning algorithms performances evaluated are benchmarked in Tab. 10, with the following results on SCADA and KDD datasets.

Table 10: Benchmark ML performance on datasets

images

4  Methodology and Results with ICS Network Dataset

The goal of this section is to analyze the network traffic data which is encapsulated in network packets as a .pcap file format. The .pcap file is taken from Wireshark and converted into .CSV file with the Spyder platform for machine learning prediction analysis.

The Wireshark is used to capture, analyze signals, and data traffic over the communication channel. Such a channel varies from a local computer bus to a satellite link, that provides a means of communication using a standard communication protocol (networked or point-to-point). The network traffic data (pcap file) is used for prediction analysis and convert the .pcap file to .csv format for ease of use and analysis.

The activity between components is a set of traffic between two components/devices. The traffic dataset is imported with Spyder python and the initial observation is as in Tab. 11.

Table 11: Dataset with column description

images

The dataset has 86799 instances with 06 data columns, without any null values and ‘NA’ character values, as highlighted in Fig. 5.

images

Figure 5: Dataset with data column and datatype

4.1 Profiling of Network Traffic Data

The packet inspection of network data traffic is performed with both pre-processing and post-processing techniques. The traffic flow-based intrusion detection serves as an anomaly-based intrusion detection system where the baseline is determined by the flow of the network. Pre-processing of network traffic data is performed with the Spyder platform [17]. The Scientific Python Development Environment (Spyder) is an integrated development environment (IDE) that has libraries: such as Regular expression to filter and remove the unwanted expressions: =, [ ] < > from the dataset.

The traffic flows are a port-wise set of packets and have different protocols such as TCP, UDP, ICMP has different flow properties. The Transmission Control Protocol (TCP) operates at transport layer of OSI layer for services/applications with dedicated destination port: HTTP (80), FTP (20, 21), Telnet (23), SMTP (25), DNS (53), HTTPS (443), Modbus (502), ISO-on-TCP–Siemens S7 Communication (S7comm) Protocol (RFC 1006)—(102), whereas the User Datagram Protocol (UDP) includes: SNMP (161), Syslog (514) [18].

The pre-processing of data is based on the communication protocols which is “TCP” and is assigned to discrete output value: 1, for further analysis and classification. The dataset is post-processed with a column named Info, which is then further extracted and assigned to each separate column for prediction and classification analysis. This profiling is done with functions with the Spyder python platform.

The feature extraction of the dataset is obtained by splitting the Info column of network traffic data which is a critical part of data analysis and each of the relevant features from Info columns such as source port, destination port, Ack, Seq, Len Packet, Window is extracted by filtering unwanted characters, which is vital for training and testing is shown in Tab. 12.

Table 12: Post-processed dataset with feature extraction

images

The behavior analysis of traffic data is performed with packet inspection, the data flow is analyzed, and classification output is identified based on the column Info data baseline, as shown in Fig. 6. In normal scenario result, an output of ‘1’ is assigned for classification, whereas in anomaly scenario result, an output of ‘0’ is assigned based on keywords “Dup ACK,” “Previous segment not captured.”

[TCP Dup ACK 400#3] 43624 > 80 [ACK] Seq=1 Ack=310012 Win=65535 Len=0

TCP Dup ACK 86798#1 32586 502 ACK Seq=1141 Ack=2186 Win=64768 Len=0

images

Figure 6: Profiled classification result

The output has 44814 counts for result value ‘0’ where-as the result value ‘1’ has 41220 counts. The network traffic dataset having labels and relevant features are trained and modeled with different machine learning algorithms and result classification is predicted with machine learning analysis.

4.2 Results of Anomaly Prediction with Machine Learning Algorithms

The dataset which is processed for model evaluation as mentioned in Fig. 7, is split into 80% train data and 20% test data and the training data is utilized to model with independent variables for different Machine Learning Algorithms (MLA) and accuracy classification is predicted for test and train data with confusion matrix parameters, mentioned in Fig. 8.

images

Figure 7: A processed dataset for model evaluation

images

Figure 8: MLA train and test dataset accuracies

The logistic regression model is evaluated, and the predicted binary outputs are represented as a probability function that is converted to discrete ‘0’ and ‘1’. The training accuracy is at 65.23% where-as maximum test accuracy is at 65.15%.

K-Nearest Neighbor (KNN) model is applied with nearest neighbors algorithm and the K-value which is the threshold point at which the performance of train/test accuracy start to dip or decrease is determined for train and test dataset. K-value is the odd increment value. Each instance and the training dataset has a K-value of 9 for the 5th element while the test dataset has a K-value of 7 for the 4th element as identified in below Fig. 9. The training accuracy is at 71.43% where-as maximum test accuracy is at 69.75%.

images

Figure 9: KNN K-value for train and test dataset

The Naïve Bayes model has two options for independent variables. The Gaussian method has more accuracy for continuous variables, whereas the Multinomial method has higher accuracy for categorical discrete independent variables. The training accuracy is at 61.69% where-as maximum test accuracy is at 61.30%.

The decision tree model has two options—the entropy method for information gain where the root node is identified, and the other option is the Gini method for the impurity measurement. The Decision tree with entropy method exhibits the highest train accuracy parameter with 96.18 whereas test accuracy Random Forest model averages the multiple decision trees and provides better accuracy. The training accuracy is at 61.69% where-as maximum test accuracy is at 61.30%.

Artificial Neural Network (ANN) is a hybrid model with the black box technique, where each layer can have 100 networks. The training accuracy is at 65.94% where-as maximum test accuracy is at 65.45%. Support Vector Machine (SVM) uses a hyperplane (linear boundary) method and has different kernel types-rbf, poly, sigmoid which can reduce overfitting.

The comparison of machine learning algorithms performance is mentioned in Tab. 13.

Table 13: MLA train and test dataset accuracy performances

images

5  Conclusion and Future Works

The signature-based detection for ICS OT cyber-attacks using R-studio and WEKA platform utilizing public datasets has been analyzed. It is noted that the public datasets are not accurate and do not suit industrial use-case scenarios. An innovative behavior analysis of network traffic in ICS with a baseline model is performed with the Spyder python platform. The flow-based network traffic data is profiled with single communication protocol-based behavior analysis for the normal scenario of the ICS network traffic with packet inspection and classification is predicted with different machine learning algorithms for anomaly detection. The Cyber Security Management Systems (CSMS) provide well-established methods with high accuracy for protecting the control system assets from cyber-attacks which includes the development of the basic cybersecurity policies, and its compliance with ISA/IEC 62443 standards.

In future work, the real-time ICS network traffic data will be extracted and a completely generic anomaly detection system without the need for prior knowledge of variables will be proposed to be developed, as in [19,20]. The Packet Capture (PCAP) files for real-time network data analysis can be performed with industrial sensors to obtain the relevant metadata from the OT network. The behavior analysis with multiple port-based protocol analysis and multiple anomaly criteria with hybrid machine learning algorithms using real-time industrial control system integrating cyber-attack test cases with portable ICS cyber kit will be implemented in future works. Advanced cyber-attacks such as reconnaissance, interruption (DoS), interception (MITM), firmware analysis can also be simulated with penetration test tools.

Funding Statement: This work was conducted at the IoT and wireless communication protocols laboratory, International Islamic University Malaysia and is partially sponsored by the Publication-Research initiative grant scheme no. P-RIGS18-003-0003.

Conflicts of Interest: The authors of this article declare no conflict of interest.

References

 1.  ISA. (2013). “Security for industrial automation and control systems, Part 3-3: System Security Requirements and Security Levels,”. [Google Scholar]

 2.  D. McMillen. (2016). “Security attacks on industrial control systems,” . [Online]. Available: https://securityintelligence.com/attacks-targeting-industrial-control-systems-ics-up-110-percent/. [Google Scholar]

 3.  L. Van. (2015). “Sequential detection and isolation of cyber-physical attacks on SCADA systems,” Ph.D. Thesis. University of Technology of Troyes. [Google Scholar]

 4.  M. Keshk, N. Moustafa, E. Sitnikova and G. Creech. (2017). “Privacy preservation intrusion detection technique for SCADA systems,” in Military Communications and Information Systems Conf. (MilCIS). Canberra, Australia, 1–6, pp. [Google Scholar]

 5.  L. A. Maglaras and J. Jiang. (2018). “Intrusion detection in SCADA systems using machine learning techniques,” Ph.D. Thesis. University of Huddersfield, UK. [Google Scholar]

 6.  Q. Qassim. (2019). “An anomaly detection technique for deception attacks in industrial control systems,” in IEEE 5th Intl. Conf. on Big Data Security on Cloud. Washington, DC, USA, 267–272. [Google Scholar]

 7.  R. L. Perez, F. Adamsky, S. Ridha and E. Thomas. (2018). “Machine learning for reliable network attack detection in scada systems,” in 17th IEEE Int. Conf. on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom-18). New York, USA. [Google Scholar]

 8.  I. Solomon, A. Jatain and B. Shalini. (2019). “Neural network-based intrusion detection: State of the art.” India: International Conf. on Sustainable Computing in Science, Technology and Management (SUSCOM-2019Amity University Rajasthan. [Google Scholar]

 9.  J. Foley, N. Moradpoor and H. Ochen, “Employing a machine learning approach to detect combined internet of things attacks against two objective functions using a novel dataset,” Security and Communication Networks, vol. 2020, no. 2, pp. 1–17, 2020. [Google Scholar]

10. A. Almalawi, X. Yu, Z. Tari, A. Fahad and I. Khalil, “An unsupervised anomaly-based detection approach for integrity attacks on scada systems,” Computers & Security, vol. 46, pp. 94–110, 2014. [Google Scholar]

11. N. Moradpoor and A. Hall. (2020). “Insider threat detection using supervised machine learning algorithms on an extremely imbalanced dataset,” International Journal of Cyber Warfare and Terrorism, vol. 10, no. 2, pp. 1–26. [Google Scholar]

12. M. A. Teixeira, T. Salman, M. Zolanvari, R. Jain, N. Meskin et al., “SCADA system testbed for cybersecurity research using machine learning approach,” Future Internet, vol. 10, no. 76, pp. 1–15, 2018. [Google Scholar]

13. A. Robles-Durazno, N. Moradpoor, J. McWhinnie and G. Russell. (2018). “A supervised energy monitoring-based machine learning approach for anomaly detection in a clean water supply system,” in Proceedings of the IEEE Int. Conf. on Cyber Security and Protection of Digital Services, Glasgow, UK. [Google Scholar]

14. I. Turnipseed. (2015). “A new SCADA dataset for intrusion detection research,” Ph.D. Thesis. Mississippi State University, USA. [Google Scholar]

15. A. Polyakov. (2018). “Machine learning for cybersecurity 101, Dzone, AI Zone, . [Online]. Available: https://dzone.com/articles/machine-learning-for-cybersecurity-101. [Google Scholar]

16. R. Vinayakumar. (2019). “Deep learning approach for intelligent intrusion detection system,” IEEE Access, vol. 7, pp. 41525–41550. [Google Scholar]

17. P. Singh. (2018). “Machine learning with PySpark,” Apress, Springer Nature Publishing Co., NY, USA. [Google Scholar]

18. A. Almehmadi. (2018). “SCADA networks anomaly-based intrusion detection system,” in SIN’18: Proceedings of the 11th Int. Conf. on Security of Information and Networks, Cardiff, UK, pp. 1–4. [Google Scholar]

19. R. Malaiya, D. Kwon, C. Suh, H. Kim and J. Kim. (2019). “An empirical evaluation of deep learning for network anomaly detection,” IEEE Access, vol. 7, pp. 140806–140817. [Google Scholar]

20. S. D. Anton, L. Ahrens, D. Fraunholz and H. D. Schotten. (2018). “Time is of the essence: Machine learning-based intrusion detection in industrial time series data,” IEEE Int. Conf. on Data Mining Workshops (ICDMWpp. 1–6. [Google Scholar]

images This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.