Traditional image fusion techniques struggle to integrate complementary or heterogeneous infrared (IR)/visible (VS) images. Dissimilarities among the various kinds of features in these images are vital to preserve in the single fused image, so preserving both aspects simultaneously is a challenging task. Moreover, most existing methods rely on manual feature extraction and complicated hand-designed fusion rules, which result in blurry artifacts in the fused image. Therefore, this study proposes a hybrid algorithm for the integration of multiple features from two heterogeneous images. Firstly, the two IR/VS images are fuzzified by feeding them to fuzzy sets to remove the uncertainty present in the background and the object of interest. Secondly, the images are learned by two parallel branches of a siamese convolutional neural network (CNN) to extract prominent features as well as high-frequency information, producing focus maps that contain the source image information. Finally, the obtained focus maps, which contain the detailed integrated information, are directly mapped with the source image
Infrared sensors or multi-sensors are used to capture infrared and visible images. Different objects, such as the environment, people, and animals, emit thermal (infrared) radiation, which is further used for target detection and parametric inversion. These images are largely insensitive to illumination variations and disguise. Thus, they overcome hurdles in target detection by working day and night [
Therefore, there is a need for an automatic fusion method that can fuse the two complementary images into a single image,
In recent years, more attention has been paid to the field of IR and VS image fusion. Researchers have presented many IR/VS image fusion approaches, which are roughly classified into categories such as multi-scale decomposition (MST), principal component analysis (PCA), sparse representation (SR), fuzzy sets (FS), and deep learning (DL). In view of this problem, the main motivation behind this work was to extend research toward fused images that are helpful in object tracking, object detection, biometric recognition, and RGB-infrared fusion tracking. Therefore, the goal is to propose a reliable automatic anti-noise infrared/visible image fusion technique for generating a fused image with the greatest degree of visual representation of environmental scenes.
Major contributions of this study are: (1) A unique integration of fuzzification and a siamese-CNN-based infrared/visible fusion technique for combining complementary infrared/visible images is put forward. (2) Fuzzification is performed using fuzzy sets to efficiently model uncertainties such as ambiguity, vagueness, unclearness, and distortion present in the image by determining the membership grade of the background environment as well as the detected target, whereas feature classification is performed by the CNN model through the extraction of low-level as well as high-level infrared/visible features. Furthermore, fusion rules are generated automatically to fuse the obtained features. (3) The proposed technique is more reliable and robust than classical infrared/visible techniques and is less laborious. (4) A publicly accessible dataset consisting of 78 infrared/visible images is used for the experiments. (5) Qualitative as well as quantitative evaluation is conducted against six classical infrared/visible techniques, namely discrete cosine transform (DCT), anisotropic diffusion & Karhunen-Loeve (ADKL), guided filter (GF), random walk (RW), principal component analysis (PCA), and convolutional neural network (CNN) methods, using five metrics,
The key motivation of this research is to combine the advantages of a spatial-domain (CNN) method and a fuzzy-based method to achieve accurate extraction of IR targets while maintaining the background features of VS images, which is not easy to attain because various challenges arise during this process. The quality of pixels must be evaluated effectively, with target features and background features extracted and then integrated to generate a clear, focused fused image; this is a laborious task, and determining the belongingness of pixels is therefore an issue of relevance. Furthermore, the literature shows that FS can represent uncertain features, so the indeterminacies, noise, and imprecision present in the images can be treated as a fuzzy image processing problem. Subsequently, owing to the powerful ability of the CNN for automatic feature extraction, this work generates data-driven decision maps using a CNN. To the best of our knowledge, no previous attempt has been made to integrate FS with a CNN for IR/VS image fusion. Therefore, in this research work, a novel fuzzy-CNN-based IR/VS image fusion method is proposed. The key contributions of this study are outlined as follows.
It helps to integrate images of different modalities to produce a clearer, more informative fused image, and it improves the infrared image recognition quality of modern imaging systems. Both subjective and objective experimental analyses have been performed.
The remainder of this study is organized as follows: Section 2 briefly describes the background and related approaches for infrared/visible image fusion. Section 3 gives a detailed description of the proposed methodology. Section 4 presents the dataset and evaluation metrics, and validates the experimental results through an extensive comparison with existing techniques. Section 5 draws concluding remarks and discusses future work.
In the past, numerous techniques for infrared/visible fusion had been developed like pyramid decomposition [
In order to handle the former problems, hybridization of the fuzzy set and Siamese CNN has been employed to fuse the infrared/visible images. The proposed technique is presented as follows.
Zadeh et al. [
For the processing of an image, input images
where
The membership grade describes the element's degree of belongingness to a FS. Here, 1 indicates an element with complete belongingness to the FS, whereas 0 implies no belongingness to it. The summation of all the membership functions of the element ‘
where,
The input grayscale image includes darker, brighter, and mid-gray pixels whose values range from 0 to 255. Therefore, the image is mapped from the pixel scale to the fuzzy domain by assigning a triangular membership function.
Now, an image with pixel values between 0 and 255 is converted to the range 0 to 1, indicating the fuzziness of each pixel.
where
The triangular membership function of the image has been applied whose mathematical representation is shown in
So, by using the above equation, pixels with the minimum intensity value are assigned 0 while pixels with the maximum value are assigned 1, and the uncertainty and ambiguity are removed without diminishing the image quality.
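The fuzzification step above can be sketched as follows. This is a minimal illustration, not the exact formulation in the paper: the triangular membership parameters `a`, `b`, `c` are assumptions, and the `fuzzify` helper simply maps the image minimum to 0 and maximum to 1 as described in the text.

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular membership: 0 at a, rising linearly to 1 at b,
    then falling linearly back to 0 at c."""
    x = np.asarray(x, dtype=np.float64)
    rising = np.clip((x - a) / max(b - a, 1e-12), 0.0, 1.0)
    falling = np.clip((c - x) / max(c - b, 1e-12), 0.0, 1.0)
    return np.minimum(rising, falling)

def fuzzify(img):
    """Map a grayscale image from the pixel scale [0, 255] to the fuzzy
    domain [0, 1]: the minimum intensity becomes 0, the maximum becomes 1."""
    lo, hi = float(img.min()), float(img.max())
    return (img.astype(np.float64) - lo) / max(hi - lo, 1e-12)
```

For example, `fuzzify` applied to an image containing intensities 0 and 255 yields membership grades 0 and 1 at those pixels, so every pixel value then expresses a degree of fuzziness rather than a raw intensity.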
The proposed siamese CNN (ConvNet) model designed for the fusion of IR/VS images is described here. It automatically learns mid- and high-level abstractions of the data in the two heterogeneous images. In a siamese network, the same weights are shared between the two branches: one branch handles the infrared image and the other processes the visible image. Each branch has the standard step-wise stages of a CNN, namely convolution layers, max pooling, flattening, and full connection,
These layers generate feature maps in parallel at each level of feature abstraction from an image [
Then, the pooled feature maps produced by the pooling layers are flattened, and the resulting features from the two branches are concatenated by the fully connected (FC) layer. The last layer consists of an output neuron that assigns a probability to the image; the CNN thus gives a scalar output whose value ranges from 0 to 1.
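A toy numpy sketch of the two-branch structure is given below. It is an illustration only, assuming one convolution layer per branch, a 3×3 kernel, 2×2 max pooling, and 16×16 input patches; the paper's actual network is deeper and trained, whereas here the weights would be supplied externally.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel image with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

def branch(patch, kernel):
    """One siamese branch: conv -> ReLU -> max pool -> flatten."""
    return max_pool(np.maximum(conv2d(patch, kernel), 0)).ravel()

def siamese_score(ir_patch, vs_patch, kernel, w, b):
    """Shared-weight branches (same kernel for both inputs), concatenated
    features, and a sigmoid output neuron producing a scalar in (0, 1)."""
    feats = np.concatenate([branch(ir_patch, kernel), branch(vs_patch, kernel)])
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))
```

The key design point is that `branch` is called with the identical `kernel` for both inputs, so the IR and VS patches are embedded by the same learned transform before comparison.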
The proposed fusion scheme consists of five steps: fuzzification, focus detection via feature map generation, segmentation, unwanted region removal, and infrared/visible image fusion. An attempt has thus been made to generate a fused image containing all the useful features, as illustrated by the schematic block diagram of the proposed technique for infrared/visible image fusion in
Firstly,
For the first three convolutional layers, the fixed stride of 1 has been used. Max pooling has been applied for the localization of the parts of the images using a window size of 2
Thus, during the fusion, the network which has been trained using the patch size of 16
If
Moreover,
Now, more detailed information is contained in the regions of the focus map whose values are near 0 or 1. From
Further processing of the focus map has been done to preserve the maximum number of useful features
The obtained binary map contains some misclassified pixels and unwanted small objects or holes, as clearly seen in
Here, the area threshold value is manually adjusted to 0.03
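The small-object and hole removal step can be sketched with connected-component labelling. This is a plausible reading of the post-processing, not the paper's exact implementation: the 0.03 value is interpreted here as a fraction of the image area, which is an assumption.

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(binary_map, area_frac=0.03):
    """Remove small foreground objects, then fill small background holes,
    keeping only connected regions of at least area_frac * (image area)."""
    out = binary_map.astype(bool)
    min_area = area_frac * out.size
    for target in (True, False):          # pass 1: objects, pass 2: holes
        labels, n = ndimage.label(out == target)
        sizes = np.bincount(labels.ravel())
        for lab in range(1, n + 1):
            if sizes[lab] < min_area:
                out[labels == lab] = not target
    return out.astype(np.uint8)
```

Running both passes with the same area threshold removes isolated misclassified specks and fills pinholes inside the detected focus region in one call.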
Now the computed
where
Lastly, the pixel-wise weighted average method has been used to obtain the resultant single fused image as described in
where,
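The pixel-wise weighted average can be sketched as below, assuming the refined decision map acts as a per-pixel weight W in [0, 1] applied to the IR image with complement (1 - W) applied to the VS image; the symbol names are illustrative.

```python
import numpy as np

def fuse(ir, vs, weight_map):
    """Pixel-wise weighted average fusion: F = W * IR + (1 - W) * VS,
    where W is the refined focus/decision map with values in [0, 1]."""
    ir = ir.astype(np.float64)
    vs = vs.astype(np.float64)
    fused = weight_map * ir + (1.0 - weight_map) * vs
    return np.clip(fused, 0, 255).astype(np.uint8)
```

Where W equals 1 the fused pixel comes entirely from the infrared image, where it equals 0 it comes from the visible image, and intermediate values blend the two sources smoothly.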
The proposed algorithm for infrared/visible image fusion is described in detail in Algorithm 1.
In this research work, both subjective and objective assessments have been performed to validate the superiority of the proposed technique. For this purpose, six pre-existing infrared/visible image fusion techniques such as DCT [
Here, IR/VS images are obtained under changing environmental conditions. The publicly available datasets are acquired from RoadScene [
The RoadScene dataset consists of 221 IR/VS image pairs in total. The images depict rich road traffic scenes, for instance pedestrians, roads, and vehicles. These highly representative scenes are acquired from naturalistic driving videos. The images have no uniform resolution.
The TNO dataset is commonly used publicly for IR/VS research. It includes images of varied military-relevant scenes registered with distinct multi-band cameras at non-uniform resolutions.
The CVC-14 dataset includes pedestrian scenes and is widely utilized in the development of autonomous driving technologies. It is composed of two pairs of sequences, namely day and night pairs. There are 18710 images in total, of which 8821 belong to the daytime sequence and 9589 to the nighttime sequence. All images have a resolution of 640
Towards this approach,
where, ‘
where, two source input images are described by
where,
where,
where
All metrics values have ranges in the [0, 1] interval [
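As an illustration of one of the metrics, mutual information between two grayscale images can be computed from their joint histogram. This is a generic MI sketch, not the paper's exact evaluation code; the bin count is an assumption, and fusion-quality MI scores typically combine MI(fused, IR) and MI(fused, VS).

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information (in bits) of two images from their joint
    histogram: MI = sum p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))
```

A fused image that shares more information with a source image yields a higher MI against that source, which is why larger MI values indicate better fusion in the comparisons below.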
In this study, a siamese CNN has been presented. It consists of two branches having the same neural structure with the same weights for extracting the features of the two different infrared/visible images. The network has been trained using the Caffe framework [
Training has been done on 50,000 natural images derived from the ImageNet dataset [
where random flipping denotes both horizontal and vertical flipping; rotation denotes both horizontal and vertical rotation of the images by 90° and 180°; and the Gaussian filter produces blurred copies of the images for noise smoothing.
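These augmentations can be sketched as below. It is an illustrative generator, assuming grayscale numpy arrays and a Gaussian blur standard deviation of 1.0, which is not specified in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def augment(img, sigma=1.0):
    """Yield augmented variants of a grayscale image: horizontal and
    vertical flips, 90-deg and 180-deg rotations, and a Gaussian-blurred
    copy for noise smoothing."""
    yield np.fliplr(img)                                  # horizontal flip
    yield np.flipud(img)                                  # vertical flip
    yield np.rot90(img, 1)                                # 90-deg rotation
    yield np.rot90(img, 2)                                # 180-deg rotation
    yield gaussian_filter(img.astype(np.float64), sigma)  # blurred copy
```

Each training patch thus contributes several variants, which enlarges the effective training set without collecting new images.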
Fusion results on six different sets of infrared/visible images have been obtained. Based on the fused images, it can be observed that infrared images have apparent objects while visible images have an obvious background. Techniques such as GF, DL, RW, PCA, and ADKL fail to retain the objects present in the images well.
From
It is evident from the
For further illustrations of the fusion effects, five evaluation metrics such as MI, HP, ISS,
Therefore, from the above discussions, it can be concluded that the proposed technique attained the highest values in terms of every metric as shown in bold in
| Fusion methods | MI | EG | EI | ISS | HP |
|---|---|---|---|---|---|
| DCT | 0.7318 | 0.7202 | 0.6099 | 0.5543 | 0.5472 |
| PCA | 0.6111 | 0.7040 | 0.6370 | 0.6728 | 0.5304 |
| RW | 0.7355 | 0.7994 | 0.5647 | 0.7421 | 0.5634 |
| GF | 0.3850 | 0.7689 | 0.6174 | 0.7510 | 0.5232 |
| CNN | 0.6490 | 0.7962 | 0.5015 | 0.8109 | 0.5115 |
| ADKL | 0.6709 | 0.6790 | 0.6180 | 0.6994 | 0.6978 |
| Proposed | | | | | |
This paper designed an infrared/visible image fusion technique based on fuzzification and a convolutional neural network. The main goal of this study is to solve the problem of maintaining thermal radiation features, with which pre-existing IR/VS-based methods struggle. Therefore, the benefits of two theories have been combined by integrating FS and a CNN into a single strong and adaptable scheme. The proposed technique retains the thermal-radiation details of the infrared images while simultaneously accumulating the visibility of the visible image. As a result, the correct target location can be observed, which aids further processing and is vital for increasing the precision and focus of the output image. The technique has been evaluated on 78 sets of infrared/visible images, and high-quality, enhanced images have been produced even under poor illumination and varied conditions. The broader aim of this work is to design an advanced automatic technique that obtains a fused image containing the contour, brightness, and texture information of the IR/VS images, illustrating clear target features from the infrared image against a distinctly visible background, which will be helpful in military surveillance and object detection. Both subjective and objective evaluations indicate that the proposed technique outperforms the existing techniques in feature extraction and information gathering.
In the future, we intend to optimize the developed technique through hybridization of neuro-fuzzy systems and CNNs. Moreover, the technique can be generalized to fuse more than two images at the same time by adapting the convolutional operations. We also intend to extend this research to other domains.