In the global scenario, one of the important goals for sustainable development in the industrial field is to innovate new technologies and invest in building infrastructure. Developed and developing countries alike focus on building resilient infrastructure and promoting sustainable development by fostering innovation. At this juncture, cloud computing has become an important information and communication technology model influencing the sustainable development of industries in developing countries. As part of the innovation happening in the industrial sector, a new concept termed 'smart manufacturing' has emerged, which employs the benefits of emerging technologies such as the Internet of Things and cloud computing. Cloud services deliver on-demand access to computing, storage, and infrastructural platforms to industrial users over the Internet. In the recent era of information technology, the number of business and individual users of cloud services has increased, and ever larger volumes of data are processed and stored in the cloud. As a consequence, data breaches in cloud services are also increasing day by day. Owing to various security vulnerabilities in the cloud architecture, the cloud environment has become non-resilient. To restore the normal behavior of the cloud, detect deviations, and achieve higher resilience, anomaly detection becomes essential. Deep learning based anomaly detection mechanisms use various monitoring metrics to characterize the normal behavior of cloud services and identify abnormal events. This paper focuses on designing an intelligent deep learning based approach for detecting cloud anomalies in real time to make the cloud more resilient. The deep learning models are trained using features extracted from the system-level and network-level performance metrics observed in the Transmission Control Protocol (TCP) traces of the simulation.
The experimental results of the proposed approach demonstrate superior performance, with a higher detection rate and a lower false alarm rate than the Support Vector Machine (SVM).
Cloud computing has been in the limelight for around two decades, and it offers many features for improving business efficiency and cost savings, with advantages over conventional computing mechanisms. As per a recent survey by the International Data Group, 69% of business firms already utilize cloud services, and 18% of the remainder plan to implement cloud services at some point in their business operations in the near future. At the same time, a report from Dell Inc. says that firms that have adopted modern big data computing, cloud services, and security grow revenue 53% faster than their competitors. These reports show that business firms and their leaders are reaping the benefits of cloud services in their operations. They use these state-of-the-art technologies to run their operations efficiently, provide better service to their customers, and in parallel achieve high profit margins. Gartner predicts exponential growth of the cloud services industry by 2022: the worldwide public cloud services market was projected to grow 17.5 percent in 2019 to total $214.3 billion, up from $182.4 billion in 2018, according to Gartner, Inc. The world economy relies greatly on the manufacturing industries for employment and wealth creation. Cloud computing has become the optimal solution for industry to implement automation processes by adopting machine-to-machine communication. For storing and managing the ever increasing production and other data, the use of cloud services also becomes essential. The various services offered to the end users of the cloud in the manufacturing sector are presented in
The cloud computing environment faces a number of security challenges, and most of them can be mitigated to a certain extent using current anomaly-based Intrusion Detection Systems (IDS) [
Most of the earlier approaches for anomaly detection in the cloud environment utilized machine learning techniques. These techniques have the ability to improve their performance over time by updating their knowledge of the patterns observed in the input data. Whenever a new pattern is observed in the input data, the machine learning model parameters are updated to detect similar anomalies in future traffic flows [
The cloud faces different types of security issues and challenges, which affect the growth of the cloud services utilization rate. The overall aim is to develop resilient cloud services for the manufacturing sector by implementing an intelligent anomaly detection system as an integrated service within the cloud environment. The objectives of the proposed research work are to identify service misuse at the client as anomalies and to classify the network traffic behavior into two categories, Normal or Abnormal, using a deep learning model on network flow data. The efficiency of the proposed method is analyzed on a dataset created in the cloud simulation. The dataset comprises vectors of features extracted from the simulated cloud network with Virtual Machine (VM) migration. The simulation includes a traffic generator that synthesizes both normal traffic and network-based attacks of different categories and intensities. The performance metrics used for studying the efficiency of the anomaly detection in identifying network-level attacks are precision, recall, accuracy, and F1-score. For effective implementation of the anomaly detection in the cloud environment, it is designed as a service that can be offered along with infrastructure to the clients. The elasticity of the cloud under various simulated network-level attacks, and anomaly detection as a service, can be evaluated by including VM migration.
The proposed approach includes techniques for efficiently processing, analyzing, and evaluating real-time data to detect anomalous patterns. The challenge is to develop a scalable, fault-tolerant anomaly detection method and embed it within a resilient framework that provides prompt warnings under adverse conditions during VM migration. The proposed approach utilizes deep learning architectures to characterize the normal behavior of the cloud environment and detect anomalies in the cloud network. The focus extends to developing methods for capturing and analyzing real-time network flow data in the cloud environment. Further, optimal features are extracted from the captured data, and abnormal traffic patterns are discriminated from normal traffic patterns using a trained Auto-Encoder.
In general, the architecture of an IDS is complex, involving a variety of concepts and techniques that change with the environment. The working principle of an IDS relies on two main approaches for detecting anomalies, which differ in their analysis methods and processing techniques. The first approach utilizes the signatures of abnormal traffic, and the second tracks deviations from normal traffic. Signature-based detection is considered better in terms of a lower false alarm rate, but it suffers from an inability to detect newer types of attacks. Anomaly-based detection methods, in contrast, are able to detect an attack for which no signature is available. This paper focuses on anomaly-based detection techniques.
The efficiency of the detection process depends largely on the quality of the features extracted or engineered from the sessions/flows in the network. Domain knowledge is essential for good feature engineering, whereas deep learning algorithms have the ability to learn features automatically. These algorithms work in an end-to-end manner and are currently gaining importance in IDS research. They can analyze raw data, learn features, and separate normal from abnormal traffic. Convolutional Neural Network (CNN) based detection algorithms have been proposed in the literature using the Network Security Laboratory (NSL)-Knowledge Discovery in Databases (KDD) and University of New South Wales (UNSW)-NB15 datasets [
The sparse Auto-Encoder (AE) imposes a sparsity constraint on the auto-encoder to enhance its ability to detect new patterns [
Adversarial learning methodologies can increase detection accuracy on small or imbalanced datasets. One recent work used a GAN for data augmentation [
A deep learning framework was developed for anomaly detection in cloud workloads, where usage patterns are analyzed to identify failures in the cloud caused by contention for resources. The resource utilization and performance of the working system are reviewed at regular intervals to model normal and abnormal behaviors in the cloud network. A hybrid deep neural architecture is used in the first stage to forecast near-future resource utilization and performance measures of the cloud service. In the second stage of the analysis, the hybrid model is used to classify the cloud behavior as either normal or abnormal. The proposed anomaly detector is evaluated in a virtual environment using Docker containers. The hybrid model is constructed by combining bidirectional Long Short-Term Memory (LSTM) and LSTM architectures [
Using the auto-scaling characteristics of the cloud, an online malware detection approach was proposed in [
Automatic detection and classification of traffic patterns as anomalous is a challenging task that has been handled using different approaches and techniques in the literature. Conventional machine learning techniques are suboptimal here: they cannot extract or uncover the complex patterns in high-dimensional, voluminous data. This is the reason for employing deep learning techniques and algorithms to detect anomalies in the cloud network. Based on the taxonomy presented in
In a recent work [
Training a deep learning model requires a large volume of training samples, and these data are preprocessed to resolve the problem of high dimensionality using dimensionality reduction, clustering, and sampling techniques. The data are then discriminated using a deep classifier. Constructing a deep classifier requires training and test datasets; additionally, a validation dataset is used to tune the model hyperparameters. The model parameters are tuned using the training dataset, and the test data are used to evaluate the performance of the trained model [
In the cloud network, traffic originates from heterogeneous sources and varies rapidly due to the elasticity of the services offered to clients. The large volume of normal traffic, contrasted with the low volume of abnormal traffic, poses certain challenges. Some conventional IDSs follow a signature-based approach to detecting attacks. In contrast to signature-based techniques, anomaly-based techniques have been used in cloud computing at various levels and are highly capable of finding new attacks.
The inbound and outbound network traffic is continuously monitored by the IDS, which upon further analysis raises an alarm when an anomaly is detected. Based on the detection approach used, an IDS can be classified as either signature-based or anomaly-based. In signature-based detection, predefined rules are matched against the patterns extracted from the current network traffic data, and traffic is classified as an intrusion attack when it matches a known attack pattern. This approach yields high accuracy in detecting known categories of attacks and generates a low false alarm rate, but it cannot detect new categories of attacks, as the predefined rules do not match the patterns observed in new attacks. Anomaly-based intrusion detection, in contrast, has the ability to detect even new categories of attacks, and its accuracy is higher than that of the signature-based approach as per the theoretical analyses in the literature. The main drawback of anomaly-based detection is its high false alarm rate [
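The contrast between the two approaches can be illustrated with a minimal sketch (the rule names, packet rates, and threshold below are illustrative assumptions, not values from the experiments):

```python
import statistics

# signature-based: match traffic against a rule base of known attack patterns
KNOWN_SIGNATURES = {"SYN_FLOOD", "PORT_SCAN"}   # hypothetical rule names

def signature_detect(event_type):
    # detects only attacks whose signature is already in the rule base
    return event_type in KNOWN_SIGNATURES

# anomaly-based: flag any deviation from a learned profile of normal traffic
normal_rates = [100, 104, 98, 101, 99, 103, 97, 102]   # packets/s, illustrative
mu = statistics.mean(normal_rates)
sigma = statistics.stdev(normal_rates)

def anomaly_detect(rate, k=3.0):
    # z-score test: more than k standard deviations from normal is anomalous
    return abs(rate - mu) > k * sigma

print(signature_detect("ZERO_DAY_XYZ"))  # False: an unseen attack slips through
print(anomaly_detect(450))               # True: a large traffic deviation is flagged
```

The sketch shows why the signature approach misses novel attacks (the rule base has no matching entry) while the anomaly approach catches any sufficiently large deviation, at the cost of potentially flagging legitimate but unusual traffic.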
The analysis of network traffic provides insights into the behavior and performance of the cloud environment. Given the revenue growth of cloud service providers, it becomes essential to develop cloud traffic monitoring and analysis methods that increase the availability and security of the cloud environment. Monitoring and analyzing such a huge volume of network traffic data is a challenging task, and the traditional methods used for monitoring and analyzing network traffic are not suited to the cloud: cloud traffic patterns differ greatly from those observed in a corporate distributed network.
Different types of anomaly detection techniques have been implemented in the literature, but the majority of them have not analyzed the impact of the elasticity of the cloud during VM migration. The proposed methodology focuses on evaluating the performance of anomaly detection under such challenging conditions. The support for service migration and for migrating VMs to other physical nodes in the cloud network exploits the elastic property of the cloud for dynamic movement of cloud resources. This live migration is essential for effective online management of cloud resources and for balancing the computing load across the physical nodes of the cloud. The anomaly detection module might mistake a live migration for an anomaly (a false positive), or an anomalous event occurring during the migration may be masked (a false negative).
Network packets are used as the source of data for detecting anomalies, and hence U2R and R2L attacks can be detected effectively. As the packet headers contain IP addresses, the attack source can be located precisely, and the information extracted from the packets can be analyzed in real time. However, a single packet does not reveal much about the context, making attacks such as distributed DoS difficult to detect. The detection process includes parsing packets and analyzing the packet payloads.
Efficient training of deep learning models requires a large volume of balanced anomaly data, obtained here from the traces collected during the simulation of the cloud environment. The traces of TCP streams obtained from the simulated cloud network include both genuine and attack traffic. The effects of VM migration are also reflected in the traces: the incoming traffic to a physical node changes as a VM migrates to a different physical machine. Volume-based attacks in particular are well characterized by the TCP stream traces, as they consume more bandwidth than normal traffic. The data for training the deep learning model are collected at various levels of the cloud, including the traffic, hypervisor, and physical machine levels. A feature vector is extracted from each one-second bin of the traffic traces. The system-level performance metrics considered are CPU utilization rate, memory utilization, network bandwidth consumption, number of I/O operations, and number of processes waiting in the execution queue. The network-level performance metrics included in the feature vector are the number of lost packets, the traffic volume on the network port, and the overall load rate of the network.
The acquired dataset is highly imbalanced, and an efficient model cannot be built from it directly; hence the dataset must be balanced before training the model. The aim is to use GAN architectures to discover the patterns in the data that make synthetic samples realistic, as uncovering such patterns is not feasible with other methods. The generative networks help to balance the dataset by synthesizing new samples for the minority classes. GANs have shown impressive results compared to other generative architectures, including variational AEs and restricted Boltzmann machines. Synthesizing new samples is a complex task when the dimensionality of the data is high [
The flow of input data through layers in the GAN network is presented in
To overcome the vanishing gradient problem the following alternate cost function can be used:
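A commonly used form (assuming the standard non-saturating generator loss from the original GAN formulation) is:

$$\mathcal{L}_G = -\,\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big],$$

which, unlike minimizing $\log(1 - D(G(z)))$, provides strong gradients for the generator early in training, when the discriminator easily rejects the synthetic samples.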
In [
Rather than adding noise, a new cost function based on the Wasserstein distance was proposed in [
When the Lipschitz function is derived from
The loss function can be formulated as finding the Wasserstein distance between
In order to maintain the
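Assuming the standard WGAN formulation of Arjovsky et al., the objective under the 1-Lipschitz constraint can be written as:

$$\min_{G}\;\max_{\|f\|_{L} \le 1}\; \mathbb{E}_{x \sim P_r}\big[f(x)\big] \;-\; \mathbb{E}_{z \sim p_z}\big[f(G(z))\big],$$

where $f$ is the critic and $P_r$ is the real data distribution; clipping the critic weights to $[-c, c]$ is the original mechanism used to enforce the Lipschitz constraint.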
Anomalous traffic is detected based on the compact representation of the feature vectors obtained using a deep AE. The AE learns to map the given input data to a compact representation in two unsupervised training steps. The AE can be classified as a generative model, and it has the ability to learn and extract the similarities and correlations in the input data [
The first part of the above equation makes the model sensitive to the given input data, and the second term prevents the model from overfitting the training data. The trade-off between these two objectives is achieved by tuning the alpha scaling parameter. The AE model can be considered a non-linear generalization of principal component analysis, with the ability to learn non-linear relationships between the given input and the expected output. It helps to separate normal data from anomalous data by transforming the input data onto new axes. The AE consists of two neural networks, an encoder and a decoder: the encoder compresses the data points into a lower-dimensional representation, and the decoder attempts to reconstruct the original input points from the latent representation generated by the encoder. The AE parameters are tuned by minimizing the reconstruction error, i.e., the difference between the input data points and the reconstructed data points. The AE is trained in an unsupervised mode with features extracted from normal traffic data, both under normal circumstances and during VM migration. The trained AE is capable of reconstructing normal traffic data points but fails to reconstruct anomalous ones; the reconstruction error is therefore used as the anomaly score. The performance of the AE trained in this unsupervised strategy is compared with a binary classifier, an SVM trained with a Radial Basis Function (RBF) kernel.
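The reconstruction-error scoring can be illustrated with a minimal linear auto-encoder on synthetic data (the data, dimensions, and training loop are illustrative assumptions, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy "normal" traffic: 8-dimensional features lying on a 2-D subspace
basis = rng.normal(size=(2, 8))
normal = rng.normal(size=(500, 2)) @ basis
anomalous = rng.normal(size=(50, 8)) * 3.0      # off-subspace points

# linear AE: encoder W_e (8 -> 2), decoder W_d (2 -> 8), plain gradient descent
W_e = rng.normal(scale=0.1, size=(8, 2))
W_d = rng.normal(scale=0.1, size=(2, 8))
lr = 0.01
for _ in range(2000):
    z = normal @ W_e                    # encode to the latent representation
    err = z @ W_d - normal              # reconstruction error on normal data
    grad_d = z.T @ err / len(normal)
    grad_e = normal.T @ (err @ W_d.T) / len(normal)
    W_d -= lr * grad_d
    W_e -= lr * grad_e

def score(x):
    # anomaly score = per-sample mean squared reconstruction error
    return np.mean((x - (x @ W_e) @ W_d) ** 2, axis=1)

# normal traffic reconstructs well; anomalies yield much higher scores
print(score(normal).mean(), score(anomalous).mean())
```

Thresholding the score then separates the two classes: points whose reconstruction error exceeds a threshold fitted on normal data are flagged as anomalous.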
The non-linear kernel represents the similarity between two vectors as a function of the squared norm of their distance: when the two vectors are close together the kernel value approaches one, and it decays toward zero as they move apart.
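A small sketch of this behavior (the gamma value and vectors are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

a = np.array([1.0, 2.0])
print(rbf_kernel(a, a))          # identical vectors: similarity is exactly 1.0
print(rbf_kernel(a, a + 5.0))    # distant vectors: similarity is near zero
```

The gamma parameter controls how quickly the similarity decays with distance and is typically tuned together with the SVM regularization constant.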
The simulation of the cloud environment is accomplished in CloudSim 5.0, and the metrics listed in Section 3.1 are extracted from the normal network traffic. For creating VM migration within the simulated cloud environment, the list of over-utilized hosts is first collected and the over-utilized VMs to be migrated are backed up; they are then mapped to new suitable hosts using a migration map. VM migration is considered essential to test the resilience of the cloud services when they are exposed to a variety of attacks. The following attacks were introduced into the simulated cloud network: Net Scan (NS) and DoS. The network traces and the metrics of the host machines are collected at multiple time instances of the simulation; the required features are then extracted from the traces and labeled as normal or abnormal. Two feature sets were extracted, one under VM migration and another without migration. Anomalous traffic simulating the above attacks was injected both during migration and during normal periods. The model was tested on network-level attacks because more end users accessing the services of the cloud increases the attack surface, and more computational power is allocated in the form of VMs on demand. The anomalous traffic is generated by injecting the attacks into the legitimate traffic at irregular intervals. The VMs are migrated live among the nodes during the simulation, whether during normal or anomalous traffic periods. The network traces are collected for every 1-second bin using a packet analyzer script embedded within the simulated network. The harmonic mean
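The evaluation metrics follow the usual confusion-matrix definitions, with the F-score as the harmonic mean of precision and recall; a small sketch with illustrative counts (not the experiment's actual confusion matrix):

```python
from math import sqrt

# illustrative confusion-matrix counts (TP, FP, TN, FN are assumptions)
tp, fp, tn, fn = 90, 10, 880, 20

precision = tp / (tp + fp)                          # fraction of alarms that are real
recall = tp / (tp + fn)                             # true positive rate
accuracy = (tp + tn) / (tp + fp + tn + fn)          # overall correct fraction
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
specificity = tn / (tn + fp)                        # true negative rate
g_mean = sqrt(recall * specificity)                 # geometric mean (G-mean)

print(round(f1, 4), round(g_mean, 4))
```

The G-mean is useful under class imbalance because it penalizes a detector that achieves high accuracy by ignoring the minority (attack) class.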
As described earlier, the data augmentation task is implemented using a WGAN to balance the volume of normal and anomalous traffic. The threshold value c, one of the important hyperparameters, is fixed using a Bayesian optimization technique. The
The weight clipping method acts as a weight regularizer, but it reduces the performance of the model by limiting its capacity to learn complex functions. Hence, instead of weight clipping, the gradient penalty technique was adopted in the experiments. The plot in
Gradient penalty was calculated based on the following procedure:
1. Create combined data by weighting the synthetic and real data using epsilon and fusing them together.
2. Compute the discriminator's output for the fused data.
3. Estimate the gradient of this output with respect to the fused input data.
4. Compute the penalty from the magnitude of the estimated gradient.
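The procedure above can be sketched with a toy numerical example (the linear critic, data shapes, and finite-difference gradient are illustrative assumptions; in practice the gradient comes from the framework's automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                    # weights of a toy linear critic

def critic(x):
    # hypothetical critic f(x) = w . x; in practice this is a deep network
    return x @ w

def gradient_penalty(real, fake, eps=1e-6):
    # 1. fuse real and synthetic samples with a random epsilon weighting
    alpha = rng.uniform(size=(real.shape[0], 1))
    mixed = alpha * real + (1.0 - alpha) * fake
    # 2./3. critic output and its gradient w.r.t. the fused input,
    #       estimated here by central finite differences
    grads = np.zeros_like(mixed)
    for j in range(mixed.shape[1]):
        step = np.zeros(mixed.shape[1])
        step[j] = eps
        grads[:, j] = (critic(mixed + step) - critic(mixed - step)) / (2 * eps)
    # 4. penalize deviation of the gradient magnitude (L2 norm) from 1
    return np.mean((np.linalg.norm(grads, axis=1) - 1.0) ** 2)

real = rng.normal(size=(8, 5))
fake = rng.normal(size=(8, 5))
gp = gradient_penalty(real, fake)
# for a linear critic the gradient is w everywhere, so gp = (||w|| - 1)^2
```

The penalty is added to the critic loss with a weighting coefficient, softly enforcing the 1-Lipschitz constraint without clipping the weights.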
The performance of the AE- and SVM-based classification was analyzed using Receiver Operating Characteristic (ROC) curves shown in
The results presented in
Anomalous traffic density | VM migration | Recall | Precision | Accuracy | F-score | G-mean
---|---|---|---|---|---|---
High | Yes | 0.9248 | 0.8795 | 1.00 | 0.9365 | 0.9378
High | No | 0.9895 | 0.9912 | 1.00 | 0.9923 | 0.9934
Low | Yes | 0.8806 | 0.8216 | 0.9856 | 0.8950 | 0.8978
Low | No | 0.9694 | 0.9585 | 0.9842 | 0.9612 | 0.9742
For analyzing the performance of the VMs deployed with certain workloads, the memory, CPU utilization, and cap values are tabulated in
The proposed hybrid approach includes a deep architecture for synthetic sample generation and a conventional machine learning algorithm, the SVM, for detecting anomalies. The selection of important hyperparameters and the architecture of the deep network introduce certain computational complexity, as more layers are used to construct the deep model. In general, the linear SVM detects anomalies with a complexity of
Cap | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | AVG
---|---|---|---|---|---|---|---|---|---|---|---
 | 6.25 | 6.22 | 6.22 | 6.22 | 6.22 | 6.3 | 6.28 | 6.2 | 6.22 | 6.22 | 6.235
 | 12.5 | 12.55 | 12.57 | 12.35 | 12.57 | 12.47 | 12.43 | 12.6 | 12.45 | 12.53 | 12.502
 | 18.55 | 18.85 | 18.77 | 18.65 | 18.85 | 18.57 | 18.85 | 18.65 | 18.8 | 18.85 | 18.739
 | 24.82 | 24.82 | 24.82 | 24.85 | 24.55 | 24.85 | 24.85 | 24.85 | 24.82 | 24.82 | 24.805
 | 24.85 | 24.82 | 24.82 | 24.85 | 24.85 | 24.82 | 24.85 | 24.85 | 24.85 | 24.82 | 24.838
 | 24.85 | 24.85 | 24.85 | 24.85 | 24.85 | 24.85 | 24.85 | 25.82 | 24.85 | 25.82 | 24.85
 | 25.82 | 25.82 | 24.85 | 24.85 | 24.85 | 24.85 | 24.85 | 25.82 | 24.85 | 25.82 | 25.238
 | 24.82 | 24.85 | 24.85 | 24.82 | 24.85 | 24.85 | 24.852 | 24.85 | 24.85 | 25.82 | 24.941
This paper explored the effect of VM migration on the performance of anomaly detection and proposed features and a robust classification approach to manage the resilience of cloud services against security issues. Experiments were conducted by simulating VM migration and varying the density of the anomalous traffic in the network. The deep learning based detection mechanism and the features extracted from the network traces helped to retain the resilience of the cloud environment. The reconstruction error of the AE model is used as the anomaly score to detect deviations in the network traffic patterns. The anomaly detection was tested in a simulation environment, running in parallel with the other events of the simulation. This work focused on detecting two attacks, NS and DoS; future work will focus on detecting more attacks by enhancing the feature set used in this work with more optimal features. The results are found to be satisfactory in resolving the security concerns of cloud services for application in the manufacturing sector.
As a benchmark dataset is not available for testing the resilience of cloud infrastructure, data samples were generated from the simulated network, balanced using a GAN, and classified as either anomalous or normal using an AE model. The trained model is able to detect anomalous traffic only in cloud environments similar to the one simulated in the experiments. To overcome this limitation and develop a generic deep learning based anomaly detection system, further data samples must be collected from a real network, benchmarked, and used for training the deep learning model.