Wireless sensor networks (WSNs) are considered promising for applications such as military surveillance and healthcare. The security of these networks must be ensured to obtain reliable applications. Securing such networks requires particular attention because they typically include no dedicated security appliance. In addition, sensors have limited computing, power, and storage resources, which makes WSNs vulnerable to various attacks, especially denial of service (DoS). The main types of DoS attacks against WSNs are blackhole, grayhole, flooding, and scheduling attacks. There are two primary techniques for building an intrusion detection system (IDS): signature-based and data-driven. This study uses the data-driven approach, since the signature-based method fails to detect zero-day attacks. Several publications have proposed data-driven approaches to protect WSNs against such attacks, based on either traditional machine learning (ML) methods or deep learning models. A fundamental limitation of these methods is their use of raw features to build an intrusion detection model, which can result in low detection accuracy. This study applies entity embedding to transform the raw features into a more robust representation that enables more precise detection, and demonstrates that the proposed method outperforms state-of-the-art solutions in terms of recognition accuracy.
With the variety of applications of wireless sensor networks (WSNs) in areas such as environmental observation, smart cities, military surveillance, and healthcare [
Deep learning often outperforms traditional machine learning on unstructured data problems such as image and audio recognition and natural language processing, and is increasingly regarded as the ideal solution for such data. This has raised the question of whether these methods can be similarly successful with structured (tabular) data, such as that used in this study. Structured data are organized in a tabular format whose columns and rows represent features and data points, respectively. Current state-of-the-art solutions for structured data are ensemble methods such as random forest [
The present study proposes an end-to-end ANN architecture with entity embedding that outperforms recent state-of-the-art intrusion detection models for WSNs on a public WSN-DS dataset [
The remainder of this paper is organized as follows. Section 2 examines related studies. Section 3 explains materials and methods used in the proposed solution. Section 4 discusses experiments using the proposed model on the WSN-DS dataset. Section 5 presents conclusions and suggested future research.
Applications of WSNs include recognizing climate changes, monitoring habitats and environments, and other surveillance applications [
IDSs use two main approaches. Rule- or signature-based IDS [
Some work has been evaluated on the WSN-DS dataset [
We used entity embedding to develop a more accurate classifier, and experimentally demonstrated how the proposed method can outperform state-of-the-art solutions.
This section has six subsections. Subsection 3.1 describes the dataset. Subsection 3.2 explains the artificial neural network (ANN) model primarily used in this work. Subsection 3.3 examines the entity embedding of categorical variables [
We used the WSN-DS simulated dataset [
We explore the ANN, the main model used in this paper. The multilayer perceptron (MLP) is the primary ANN architecture [
The goal of backpropagation is to fine-tune a network's weights through a sequence of forward and backward iterations that minimize the cost function until the error stops improving or the maximum number of epochs is reached. The algorithm first initializes the weights of the network using a statistical initialization scheme.
Seq.# | Feature | # of levels | Description
---|---|---|---
1 | Time | 196 | Present time for current sensor node
2 | Is-CH | 2 | Set to 1 if the node is a cluster head, and 0 otherwise
3 | Who-CH | 7088 | Cluster head ID in present round
4 | Dist-To-CH | 13956 | Distance between each sensor node and its cluster head in current round
5 | Consumed Energy | 69352 | Energy consumed by each sensor in prior round
6 | ADV-S | 85 | # of advertise broadcast messages sent by cluster head
7 | ADV-R | 31 | # of advertise messages received by cluster head
8 | JOIN-S | 2 | # of join request messages sent by nodes to cluster head
9 | JOIN-R | 101 | # of join request messages received by cluster head node from its belonging sensor nodes
10 | SCH-S | 95 | # of advertise broadcast messages sent to sensor nodes during time division multiple access (TDMA) schedule
11 | SCH-R | 2 | # of advertise broadcast messages received by sensor nodes from cluster head for TDMA schedule
12 | Rank | 100 | Order of sensor during TDMA schedule
13 | Data-S | 192 | # of data packets sent by each sensor node to its cluster head
14 | Data-R | 1345 | # of data packets received by node from its cluster head
15 | Data-Sent-BS | 237 | # of data packets sent from sensor node to base station
16 | Dist-CH-BS | 305 | Distance between cluster head and base station
17 | Send-code | 16 | Send code for cluster
18 | Attack-Type | 5 | Label or traffic class (normal, blackhole, grayhole, flooding, scheduling)
A forward step finds the network output for randomly selected training examples: each layer multiplies the previous layer's output by its weights, adds the biases, and applies the activation function. After applying the forward step, we obtain the network's output and its error with respect to the true labels. Then the gradient of the error is calculated by computing the partial derivative of the error with respect to each weight.

The second step is backward, and the calculation is performed from the output layer toward the first hidden layer. The gradient of each layer is obtained from that of the layer above via the chain rule, and each weight is updated in the direction that reduces the error.
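The forward and backward steps described above can be sketched in a few lines of numpy. This is a minimal illustration for a one-hidden-layer MLP on a toy batch; the layer sizes, data, and learning rate are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                   # toy batch: 8 examples, 4 features
Y = np.eye(3)[rng.integers(0, 3, size=8)]     # one-hot labels, 3 classes

# Statistical weight initialization (small Gaussian values).
W1 = rng.normal(scale=0.1, size=(4, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 3)); b2 = np.zeros(3)
lr = 0.5

def forward(X):
    h = np.maximum(0, X @ W1 + b1)            # hidden layer (ReLU)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)  # softmax output

def cross_entropy(p):
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))

loss_before = cross_entropy(forward(X)[1])
for _ in range(50):
    h, p = forward(X)
    # Backward step: gradients computed from the output layer toward the input.
    d_logits = (p - Y) / len(X)               # softmax + cross-entropy gradient
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    dh = (d_logits @ W2.T) * (h > 0)          # chain rule through ReLU
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1            # gradient-descent update
    W2 -= lr * dW2; b2 -= lr * db2
loss_after = cross_entropy(forward(X)[1])
```

Each iteration performs one forward pass, one backward pass, and one weight update, so the cross-entropy loss on the toy batch decreases over the 50 iterations.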
Entity embedding of categorical variables [
As an example, assume that entity embedding is applied over a day of the week as a categorical variable of the neural network model. This leads to the development of a
For the above example, the one-hot encoding input layer includes seven neurons that indicate the number of levels of the categorical variable, and the embedding layer has four neurons, corresponding to the desired mapping dimension, as shown in
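The day-of-week example can be made concrete with a small numpy sketch: a 7-level categorical variable is mapped to a 4-dimensional continuous vector. The embedding matrix below is randomly initialized purely for illustration; in the model it is learned jointly with the network weights during training.

```python
import numpy as np

rng = np.random.default_rng(42)
n_levels, emb_dim = 7, 4
E = rng.normal(size=(n_levels, emb_dim))   # 7 x 4 embedding matrix (learned in practice)

day = 2                                    # e.g. Wednesday encoded as level 2
one_hot = np.eye(n_levels)[day]            # one-hot input layer (7 neurons)
embedded = one_hot @ E                     # embedding layer (4 neurons)

# Multiplying a one-hot vector by E is equivalent to a row lookup in E.
assert np.allclose(embedded, E[day])
```

This equivalence is why embedding layers are usually implemented as table lookups rather than explicit one-hot matrix multiplications.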
In this work, a multi-layer ANN-based architecture is proposed for a WSN intrusion detection system, as shown in
As most input features for WSN intrusion detection have a limited number of levels, as shown in
Seq.# | Notation | Description
---|---|---
1 | | # of categorical input features (14)
2 | m | Size of continuous embedding vector for categorical features, equal to 10 in the proposed model
3 | |
4 | |
5 | | Obtained by concatenating the learned embedding vectors of all features plus the three binary features (with two levels only)
6 | One-hot encoding | Produces the one-hot representation of a categorical input feature
7 | Embedding matrix | Weights connecting the one-hot encoding and the embedding vector; this matrix is learned during training
8 | \|\| | Concatenation between the neurons of all embedding vectors and the other binary features
9 | W | Weights connecting layers in the classification phase to learn optimized classification parameters
Then, in the first layer, each categorical input feature is transformed to a one-hot representation. The embedding layer of size 10 is added on top of it. This layer is used to learn the embedding matrix that maps each input feature to a 10-dimensional continuous vector. This layer helps place values with similar effects near each other in the vector space, thus exposing the fundamental continuity of the data to ensure a robust representation of the input features.
Next, all learned embedding vectors are concatenated with the binary features (which have only two levels and therefore do not require a learned embedding).
Then, two fully connected layers (256–256 neurons) are added on top of the concatenated layer to learn the classification weights. A softmax layer of five neurons is added on top of these layers to compute the probability distribution over the attack labels. The intuition behind the selected hyperparameter values (number of layers, number of neurons in each layer, embedding size, learning rate, and activation function) is given in Subsection 3.6.
Finally, the training data are fed into the network, whose parameters (weights) are optimized using an adaptive variant of stochastic gradient descent (Adam) and the backpropagation algorithm. Algorithm 2 presents the steps used to train the proposed architecture.
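A forward pass through the proposed architecture can be sketched as follows. The shapes follow the paper (14 categorical features embedded to 10 dimensions each, 3 binary features, two 256-neuron hidden layers, and a 5-way softmax), but the per-feature level counts and all weights below are randomly chosen purely to illustrate the data flow, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cat, emb_dim, n_bin, n_classes = 14, 10, 3, 5
levels = [50] * n_cat                         # per-feature level counts (illustrative)

# One embedding matrix per categorical feature (learned during training).
E = [rng.normal(scale=0.1, size=(L, emb_dim)) for L in levels]
W1 = rng.normal(scale=0.1, size=(n_cat * emb_dim + n_bin, 256)); b1 = np.zeros(256)
W2 = rng.normal(scale=0.1, size=(256, 256)); b2 = np.zeros(256)
W3 = rng.normal(scale=0.1, size=(256, n_classes)); b3 = np.zeros(n_classes)

def forward(cat_idx, bin_feats):
    # Embedding lookup for each categorical feature, then concatenation
    # with the binary features (which skip the embedding step).
    emb = np.concatenate([E[j][cat_idx[:, j]] for j in range(n_cat)], axis=1)
    x = np.concatenate([emb, bin_feats], axis=1)       # shape (batch, 14*10 + 3)
    h1 = np.maximum(0, x @ W1 + b1)                    # dense 256, ReLU
    h2 = np.maximum(0, h1 @ W2 + b2)                   # dense 256, ReLU
    logits = h2 @ W3 + b3
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)            # softmax over 5 classes

batch = 32
cat_idx = rng.integers(0, 50, size=(batch, n_cat))
bin_feats = rng.integers(0, 2, size=(batch, n_bin)).astype(float)
probs = forward(cat_idx, bin_feats)
```

The concatenated layer has 14 × 10 + 3 = 143 neurons, and each output row is a probability distribution over the five traffic classes.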
Different measures are available to compute evaluation scores for the classification problem. For intrusion detection systems, significant metrics include the detection rate (also called recall, or true positive rate, TPR), precision, false alarm rate (false positive rate, FPR), and F-score, defined as follows:
where TP, FP, FN, and TN are the numbers of true positives, false positives, false negatives, and true negatives, respectively.
In terms of a multi-class classification problem, the evaluation measures for all classes can be averaged as either a micro- or macro-average to obtain the general performance of a model. A micro-average is dominated by the majority class, as it treats each data point (instance) equally, whereas a macro-average gives equal weight to each class. We study an intrusion detection dataset with significantly more data points of a normal class than of abnormal classes. We use macro-averaging to prevent the majority (normal) class from dominating the performance of the model. To accelerate decision-making during model optimization and compare the proposed model to state-of-the-art methods, we use a macro-average F-score. In addition, a micro-average F-score is calculated to compare certain related work that uses a micro-average as a metric. Metrics such as recall, precision, and false-alarm rate are also used to obtain detailed information about the model’s performance.
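The difference between the two averaging schemes can be seen in a short sketch. The per-class counts below are made up solely to illustrate the effect: with a large "normal" class, the micro average is pulled toward that class's high score, while the macro average weights every class equally.

```python
counts = {                     # class -> (TP, FP, FN); illustrative numbers only
    "normal":    (9500, 50, 40),
    "grayhole":  (60, 30, 40),
    "blackhole": (70, 20, 30),
}

def f_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Macro-average: compute F per class, then take the unweighted mean.
macro_f = sum(f_score(*c) for c in counts.values()) / len(counts)

# Micro-average: pool the counts over all classes, then compute one F-score.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_f = f_score(tp, fp, fn)

# The majority "normal" class pulls the micro average well above the macro.
assert micro_f > macro_f
```

With these counts the micro F-score is about 0.99 while the macro F-score is about 0.79, which is why the macro average is the primary metric in this study.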
The proposed ANN architecture has several hyperparameters that must be selected carefully, as they impact the model's performance. A grid search technique is implemented to select the optimal combination of hyperparameters. We adjust the embedding size, the number of hidden layers, the number of neurons in each layer, the activation function, and the learning rate.
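A grid search of this kind enumerates every combination of candidate values and keeps the best-scoring one. In the sketch below, the candidate lists include the values ultimately selected in the paper alongside assumed alternatives, and the scoring function is a placeholder (a real run would train the model and measure the validation macro F-score for each configuration).

```python
from itertools import product

grid = {
    "embedding_size": [5, 10, 20],
    "hidden_layers": [1, 2, 3],
    "neurons": [128, 256, 512],
    "activation": ["relu", "tanh"],
    "learning_rate": [0.01, 0.001],
}

def validation_score(config):
    # Placeholder stand-in for training + validation: here it simply favours
    # the configuration reported in the paper.
    target = {"embedding_size": 10, "hidden_layers": 2, "neurons": 256,
              "activation": "relu", "learning_rate": 0.001}
    return sum(config[k] == v for k, v in target.items())

keys = list(grid)
best = max((dict(zip(keys, vals)) for vals in product(*grid.values())),
           key=validation_score)
```

The exhaustive enumeration evaluates 3 × 3 × 3 × 2 × 2 = 108 configurations, which is feasible here because the candidate lists are short.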
The best performance was achieved with the ReLU activation function, which is also computationally efficient, as it applies only a simple threshold to its input.
Subsection 4.1 presents the experimental settings used in this study. Subsection 4.2 shows the effectiveness of the proposed feature representation using entity embedding. Subsection 4.3 compares the proposed model to state-of-the-art solutions.
To ensure fair comparisons, the settings for comparative methods were those of the works in which they were proposed. The holdout technique was used to split the dataset into 60% training and 40% testing, with stratified sampling to ensure a consistent ratio of classes in both splits.
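A stratified 60/40 holdout split can be implemented by shuffling the indices within each class and cutting each class at the 60% mark, so the class ratio is preserved on both sides. The labels below are a toy stand-in, not the WSN-DS data.

```python
from collections import defaultdict
import random

def stratified_split(labels, train_frac=0.6, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # shuffle within each class
        cut = int(round(len(idx) * train_frac))   # 60% of this class to training
        train += idx[:cut]
        test += idx[cut:]
    return train, test

labels = ["normal"] * 100 + ["grayhole"] * 20 + ["flooding"] * 10
train, test = stratified_split(labels)
```

With the toy labels, each class contributes exactly 60% of its examples to the training set (60, 12, and 6 indices, respectively).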
Hyperparameter | Value |
---|---|
Embedding size | 10 |
Learning rate | 0.001 |
Activation function | ReLU |
Number of hidden layers | 2 |
Number of neurons in each hidden layer | 256 |
Optimizer | Adam |
Loss | Categorical cross-entropy |
Batch size | 4096 |
Dropout rate | 0.5 |
Epochs | 100 |
To demonstrate the power of the proposed feature representation method, its performance was compared using two approaches.
In the first approach, the feature representation methods were varied and the network architecture was unchanged.
In the second approach, the proposed method was evaluated using varying network architectures to demonstrate its effectiveness.
Feature representation | Description | Micro F-score (%) | Macro F-score (%) |
---|---|---|---|
Ordinal encoding | Transform nonnumeric labels to numeric labels with order | 99.50 | 97.20 |
Binary encoding | Transform ordinal encoding to binary code; digits from binary string are split into separate columns. | 99.60 | 97.90 |
Proposed (Entity embedding) | Explained in Subsection 3.3 | 99.60 | 98.30 |
Number of hidden layers | # Neurons in each layer | Micro F-score (%) | Macro F-score (%) |
---|---|---|---|
1 | 128 | 99.60 | 97.90 |
1 | 512 | 99.60 | 98.20 |
1 | 1024 | 99.60 | 98.20 |
2 | 512–256 | 99.60 | 98.30 |
2 (proposed) | 256–256 | 99.60 | 98.30 |
3 | 512–256–128 | 99.60 | 98.20 |
3 | 256–256–256 | 99.60 | 98.20 |
We show how the proposed model can outperform the state-of-the-art in terms of micro and macro F-scores. The proposed method significantly outperforms all baselines in macro F-score, and slightly outperforms them in micro F-score. The small margin in the micro average is due to its domination by the majority (normal) class.
Ref. | Year | Approach | Micro F-score (%) | Macro F-score (%)
---|---|---|---|---
[ | 2020 | Naive Bayes | 88.00 | 51.20
[ | 2020 | SVM | 90.00 | 41.20
[ | 2020 | Decision Tree | 93.00 | 49.80
[ | 2020 | K-Nearest Neighbor | 96.00 | 78.80
[ | 2020 | CNN | 97.00 | 82.40
[ | 2016 | ANN with one hidden layer | 98.60 | 91.40
[ | 2019 | Feature selection using information gain ratio + bagging with C4.5 | 99.40 | 96.30
Proposed model | 2020 | Entity Embedding-ANN | 99.60 | 98.30
It is evident from
Attack type | TPR (Recall) | FPR | Precision |
---|---|---|---|
Normal | 0.998 | 0.018 | 0.998 |
Flooding | 0.994 | 0.001 | 0.904 |
Scheduling | 0.922 | 0.000 | 0.995 |
Grayhole | 0.756 | 0.003 | 0.911 |
Blackhole | 0.928 | 0.009 | 0.730 |
Micro Avg. | 0.985 | 0.017 | 0.987 |
Macro Avg. | 0.920 | 0.006 | 0.908 |
Attack type | TPR (Recall) | FPR | Precision |
---|---|---|---|
Normal | 0.998 | 0.021 | 0.998 |
Flooding | 0.982 | 0.001 | 0.902 |
Scheduling | 0.924 | 0.000 | 0.976 |
Grayhole | 0.961 | 0.002 | 0.963 |
Blackhole | 0.982 | 0.001 | 0.965 |
Micro Avg. | 0.994 | 0.019 | 0.994 |
Macro Avg. | 0.965 | 0.005 | 0.961 |
Attack type | TPR (Recall) | FPR | Precision |
---|---|---|---|
Normal | 0.999 | 0.002 | 0.998 |
Flooding | 0.989 | 0.000 | 0.996 |
Scheduling | 0.936 | 0.000 | 0.995 |
Grayhole | 0.981 | 0.000 | 0.978 |
Blackhole | 0.987 | 0.000 | 0.978 |
Micro Avg. | 0.997 | 0.000 | 0.996 |
Macro Avg. | 0.978 | 0.000 | 0.989 |
We proposed a deep learning architecture that uses ANNs with categorical entity embedding, and demonstrated that it can produce an effective intrusion detection system for WSNs. Entity embedding was shown to create a robust representation of raw features that leads to better performance than the state-of-the-art. In future work, the entity embedding representation will be combined with ensemble classification instead of a classical ANN to take advantage of the classification power of ensemble methods such as random forest and gradient boosted trees. This work was limited to a single dataset because of the lack of publicly available datasets of WSN DoS attacks. In future work, we intend to obtain another dataset to detect different DoS attacks.
This publication was supported by the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University.