The negative selection algorithm (NSA) is an adaptive technique inspired by how the biological immune system discriminates self from non-self. It is one of the most important algorithms of the artificial immune system. A key element of the NSA is its heavy reliance on randomly generated detectors to monitor for abnormalities. However, these detectors have limited performance: redundant detectors are generated, making it difficult for the detector set to effectively occupy the non-self space. To alleviate this problem, we propose using the nature-inspired metaheuristic cuckoo search (CS), a stochastic global search algorithm, to improve the random generation of detectors in the NSA. Built-in mutation, crossover, and selection operators enable CS to attain global convergence. With the use of Lévy flights and a distance measure, efficient detectors are produced. Experimental results show that integrating CS into the negative selection algorithm elevates the detection performance of the NSA, with an average increase of 3.52% in detection rate on the tested datasets. The proposed method outperforms other models, achieving detection rates of 98% and 99.29% on Fisher’s IRIS and Breast Cancer datasets, respectively. Thus, high detection rates and low false alarm rates can be achieved.
The biological immune system (BIS), a unique, powerful, and orchestrated system against the influx of pathogens, viruses, and bacteria, protects the body from being damaged and infected. The BIS handles this process through the recognition and detection of foreign elements (non-self) and thereby causing their annihilation. The white blood cells (lymphocytes), that is, the
Despite the success rate of the NSA in different application domains, it comes with its own deficiencies and drawbacks, which are attributed to its random detectors [
The organization of this article is as follows. Section 2 reviews related improvements to the negative selection algorithm. Section 3 details the proposed cuckoo search algorithm as applied to the optimization of the negative selection algorithm. Experiments are presented in Section 4. Conclusions and directions for future work are provided in Section 5.
Various distinctive improvements have been proposed to optimize the random detectors of the negative selection algorithm. Completely different solutions for the generation of more robust detectors have also been explored. The fruit fly optimization (FFO) and
Relying on the NSA, an efficient proactive artificial immune system for anomaly detection and prevention (EPAADPS) was proposed [
The detector generation scheme of the real-valued negative selection algorithm with variable-sized detectors (V-Detectors) plays a crucial role in achieving adequate performance, stability, and efficiency. Detectors are produced by random acquisition; however, effective coverage of the non-self space is not guaranteed. The cuckoo search (CS) algorithm is introduced to improve the quality of the V-Detectors’ detectors; the resulting algorithm is referred to as CS-V-Detectors. The mutation, crossover, and selection operators enable CS to attain global convergence and optimality. Through these processes, the best candidate detectors are produced, ultimately enhancing the traditional random generation of detectors. A fitness function is needed to acquire potent detectors; it depends on the Euclidean distance between two overlapping detectors. The implementation of the proposed algorithm is detailed next.
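The core of CS is the Lévy-flight perturbation of candidate solutions. A minimal Python sketch of that perturbation is given below; the step scale `alpha`, the exponent `beta`, and the coupling to the current best detector follow the standard CS formulation (Mantegna’s algorithm for heavy-tailed step lengths) rather than values prescribed by this article, so all names and constants are illustrative:

```python
import math
import random

def mantegna_sigma(beta: float) -> float:
    """Scale sigma_u in Mantegna's algorithm for Levy-stable step lengths."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(dim, beta=1.5, rng=random):
    """One Levy-flight step per dimension: u / |v|**(1/beta),
    with u ~ N(0, sigma_u^2) and v ~ N(0, 1)."""
    sigma = mantegna_sigma(beta)
    return [rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)
            for _ in range(dim)]

def cuckoo_update(x, best, alpha=0.01, beta=1.5, rng=random):
    """Perturb a candidate detector centre x with a Levy flight scaled by
    its distance from the current best: x' = x + alpha * step * (x - best)."""
    step = levy_step(len(x), beta, rng)
    return [xi + alpha * s * (xi - bi) for xi, s, bi in zip(x, step, best)]

# Example: perturb a 4-dimensional candidate detector centre.
random.seed(0)
x_new = cuckoo_update([0.2, 0.4, 0.6, 0.8], best=[0.5, 0.5, 0.5, 0.5])
```

The heavy-tailed step distribution is what gives CS its global search character: most steps are small and refine the current region, while occasional very long jumps let candidate detectors escape poorly covered areas.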
The cuckoo search (CS) algorithm is a population-based stochastic global search algorithm. The main steps of detector generation with CS are enumerated below and summarized in
The generation of detectors with CS begins by initiating a random population of detectors with the use of a lower bound and upper bound based on the designed variables. The random detectors are uniformly distributed. A candidate
where
Upon population initialization and identifying the best candidate detector
where an
The steps develop into a formative process of random walk, with a power-law step length distribution. Mantegna’s algorithm [
where
where
Here,
where
where
The crossover operator acts on the detector solution obtained from
where
These processes for CS are repeated for the detector solutions. Each detector is then matched against the self-samples using the Euclidean distance matching rule. The training (self) dataset samples are represented in
The self-sample
We have a generated detector
The value of distance
The detector is then checked against previously stored detectors, using the Euclidean distance, to ascertain whether it is already covered. If the minimum distance between the detector and a previously stored detector is less than the radius of the previous detector, the detector is eliminated; otherwise, it is stored for the detection stage. This continues until the required number of detectors covering the non-self space is reached. These detectors then effectively monitor the system’s status during the detection stage.
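This censoring loop can be sketched as follows (hypothetical Python, not the article’s code; the variable-radius rule — radius equals the distance to the nearest self sample minus the self radius — follows the usual V-Detectors formulation, and `self_radius` and all example values are assumptions for illustration):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def try_store(candidate, self_samples, self_radius, detectors):
    """Censoring step: keep a candidate detector centre only if it avoids
    the self region and is not already covered by a stored detector.
    detectors is a list of (centre, radius) pairs, mutated in place."""
    d_self = min(euclidean(candidate, s) for s in self_samples)
    if d_self <= self_radius:
        return detectors          # candidate falls inside self space: discard
    radius = d_self - self_radius # variable-sized detector radius
    for centre, r in detectors:
        if euclidean(candidate, centre) < r:
            return detectors      # already covered by a stored detector
    detectors.append((candidate, radius))
    return detectors

dets = []
selfs = [[0.1, 0.1], [0.2, 0.15]]
try_store([0.80, 0.80], selfs, 0.1, dets)  # far from self: stored
try_store([0.75, 0.75], selfs, 0.1, dets)  # covered by the first: discarded
```

After the two calls only the first candidate survives, since the second lies well inside the first detector’s hypersphere.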
For the detection stage, the test dataset samples are matched against the detectors from the generation stage, again using the Euclidean distance. There is a match if the minimum distance between a test sample and the detectors is less than the detector’s radius, in which case the sample is labeled as non-self. If the distance is greater than the detector’s radius, there is no match and the sample is classified as self.
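The detection-stage rule reduces to a point-in-hypersphere test. A minimal sketch (hypothetical Python; detectors are represented as (centre, radius) pairs, and the example detector and samples are illustrative):

```python
import math

def classify(sample, detectors):
    """Label a test sample non-self if it lies inside any detector's
    hypersphere, else self. detectors: list of (centre, radius) pairs."""
    for centre, radius in detectors:
        d = math.sqrt(sum((x - c) ** 2 for x, c in zip(sample, centre)))
        if d < radius:
            return "non-self"
    return "self"

detectors = [([0.8, 0.8], 0.3)]
label_anomaly = classify([0.7, 0.9], detectors)  # inside the detector
label_normal = classify([0.1, 0.2], detectors)   # outside every detector
```

A sample needs to fall inside only one detector to be flagged, which is why thorough coverage of the non-self space by the detector set is decisive for the detection rate.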
The goal of the V-Detectors’ detectors is to thoroughly cover the non-self space; however, the detectors overlap. This overlap hinders the detectors’ coverage and ultimately has a negative effect on the performance of the V-Detectors algorithm. To tackle the overlapping of detectors, a fitness function is introduced and implemented. The fitness function poses a minimization problem. The minimum distance between two detectors
In more explicit terms, we are given two detectors
The driving force behind this study is to evaluate the detection potency of the proposed CS-V-Detectors algorithm. The UCI repository [
CS-V-Detectors is evaluated using standard measures: the detection rate and the false alarm rate. The equations for these metrics are:
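Written out in their conventional form (assuming the usual confusion-matrix definitions, with TP the non-self samples correctly detected, FN the non-self samples missed, FP the self samples falsely flagged, and TN the self samples correctly passed):

```latex
\mathrm{DR}  = \frac{TP}{TP + FN} \times 100\%,
\qquad
\mathrm{FAR} = \frac{FP}{FP + TN} \times 100\%
```

Both rates are reported as percentages, matching the result tables below.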
Simulations are performed on a 3.40 GHz Intel Pentium® Core i7 processor with 4 GB RAM.
The results on Fisher’s IRIS dataset are contained in
Algorithm | Detection Rate (%) | False Alarm Rate (%) |
---|---|---|
SVM | 96.00 | 2.00 |
ANN | 97.33 | 1.30 |
Naïve Bayes | 96.00 | 2.00 |
95.30 | 2.30 | |
FuzzyNN | 96.70 | 1.70 |
Random Forest | 94.00 | 3.00 |
V-Detectors | 92.16 | 0.00 |
CS-V-Detectors | 98.00 | 0.00 |
The ANN came close to CS-V-Detectors, with a detection rate of 97.33%. FuzzyNN was next, with 96.70%, followed by the SVM and Naïve Bayes (both with a 96% detection rate). The
It can be seen from
As shown in
Algorithm | Detection Rate (%) | False Alarm Rate (%) |
---|---|---|
SVM | 97.00 | 3.40 |
ANN | 95.30 | 5.40 |
Naïve Bayes | 96.00 | 3.30 |
95.10 | 6.30 | |
FuzzyNN | 96.40 | 4.40 |
Random Forest | 95.70 | 5.20 |
V-Detectors | 96.85 | 0.00 |
CS-V-Detectors | 99.29 | 0.00 |
Algorithm | Detection Rate (%) | False Alarm Rate (%) |
---|---|---|
SVM | 58.30 | 57.60 |
ANN | 71.60 | 32.40 |
Naïve Bayes | 55.40 | 38.80 |
62.90 | 38.80 | |
FuzzyNN | 68.40 | 36.40 |
Random Forest | 69.60 | 31.20 |
V-Detectors | 74.44 | 0.00 |
CS-V-Detectors | 76.71 | 0.00 |
The receiver operating characteristic (ROC) curves corresponding to the experiments on Fisher’s IRIS and Breast Cancer datasets are presented in
Algorithm performance can also be compared using the area under the ROC curve (AUC). The AUC reduces the ROC curve to a single scalar, with values ranging from 0 to 1. Algorithms with an area below 0.5 are considered unrealistic; the best algorithms have AUC values near 1. The AUC values for the algorithms are listed in
Algorithm | Fisher’s IRIS | Breast Cancer | Liver Disorders |
---|---|---|---|
SVM | 0.9700 | 0.9680 | 0.5035 |
ANN | 0.9802 | 0.9495 | 0.6960 |
Naïve Bayes | 0.9700 | 0.9635 | 0.5830 |
0.9650 | 0.9440 | 0.6205 | |
FuzzyNN | 0.9750 | 0.9600 | 0.6600 |
Random Forest | 0.9550 | 0.9525 | 0.6920 |
V-Detectors | 0.9608 | 0.9843 | 0.8722 |
CS-V-Detectors | 0.9900 | 0.9965 | 0.8836 |
This section compares CS-V-Detectors with variants of NSA models that have already been proposed. The focus is directed at performance on Fisher’s IRIS and Breast Cancer datasets. The various NSA algorithms considered for Fisher’s IRIS are artificial bee colony-NSA (ABC-NSA) [
Algorithm | Detection Rate (%) |
---|---|
ABC-NSA | 97.67 |
I-detector | 99.00 |
FB-NSA | 95.33 |
FFB-NSA | 94.00 |
Specialized detectors-NSA | 90.00 |
CS-V-Detectors (Proposed) | 98.00 |
For the Breast Cancer dataset, the algorithms for comparison are antigen density clustering-NSA (ADC-NSA) [
Algorithm | Detection Rate (%) |
---|---|
ADC-NSA | 99.41 |
ASSC-NSA | 97.97 |
DNSA | 92.39 |
HC-RNSA | 94.50 |
PRR-2NSA | 94.68 |
ASD-RNSA | 97.92 |
CS-V-Detectors (Proposed) | 99.29 |
This research proposed a detector generation scheme based on cuckoo search (CS) for the negative selection algorithm, with particular focus on the real-valued negative selection algorithm with variable-sized detectors (V-Detectors). The scheme embodies the properties of Lévy flights in attaining global convergence and optimality, resulting in the generation of efficient detectors. The proposed algorithm improved the performance of the standard V-Detectors and outperformed other existing algorithms. Hence, it can be concluded that the optimization technique enhances the detection ability and efficiency of the negative selection algorithm. Future work will involve hybridizing cuckoo search with other optimization algorithms for enhanced detection.