Obtaining clear images of underwater scenes with descriptive details is an arduous task. Conventional imaging techniques fail to provide clear-cut features and attributes, which ultimately results in object recognition errors. Consequently, a system that produces clear images for underwater image study is needed. To overcome problems in resolution and to make better use of the Super-Resolution (SR) method, this paper introduces a novel method derived from the Alpha Generative Adversarial Network (AlphaGAN) model, named Alpha Super Resolution Generative Adversarial Network (AlphaSRGAN). The model put forth in this paper helps enhance the quality of underwater imagery and yields images with greater resolution and more concise details. Images undergo pre-processing before they are fed into a generator network whose structure is optimized and reformed to enhance its stability. After the images are processed by the generator network, they are passed through an adversarial model-training method. The dataset used in this paper to learn Single Image Super Resolution (SISR) is the USR-248 dataset. Training supervision is performed by an unbiased function that simultaneously scrutinizes and improves the image quality. Images are appraised with reference to factors such as local style information, global content and color. The USR-248 dataset, which holds a huge collection of images, is composed of three collections of images—high (640
Procuring high quality and clear images of underwater scenes is often difficult owing to the complex nature of the ecosystem and environment present underwater [
A generation model based on deep residual networks for Single Image Super Resolution (SISR) was provided by Islam et al. [
In this paper, we attempt to resolve the problems faced previously by introducing a super-resolution model for underwater images for real-time applications. The problem defined here is an image-to-image translation issue and assumes that a non-linear mapping exists between the distorted images (which make up the input data) and the enhanced images (of which the output is made). Next, an AlphaSRGAN-based model is designed, which learns the mapping between the two image domains by adversarial training on the USR-248 dataset. After careful consideration of the design, implementation, experimental results and validations of the model, we make the following contributions.
Several models and techniques have been introduced for unsupervised learning. However, for image generation, GAN [
The LR-HR domain is used in DeblurGAN Yuan et al. [
Another work by Yu et al. [
In SISR, an input of low resolution could give rise to multiple high-resolution images, and the HR space that we wish to map the low-resolution image to is typically unmalleable [
Here, the observed low-resolution image Y is modeled as the convolution of the unknown high-resolution picture X with the blur kernel K, followed by down-sampling and additive noise:

Y = (X ⊗ K) ↓s + N

where ↓s denotes down-sampling by the scale factor s and N is additive noise.
One of the widely studied problems in recent years is increasing the spatial resolution of LR images, otherwise known as Single Image Super Resolution (SISR). Bicubic up-sampling, nearest-neighbor interpolation and other similar methods are a few solutions to this problem. SISR for terrestrial applications has been widely studied; however, this is not the case for images captured underwater. SISR techniques for improving such images have received little attention owing to the shortage of detailed and extensive datasets that effectively capture the distortions present in underwater images. Datasets that are currently available contain synthetic images [
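As a point of reference, the simplest of these interpolation baselines, nearest-neighbor up-sampling, can be sketched in a few lines of NumPy. This is an illustrative baseline only, not the method proposed in the paper:

```python
import numpy as np

def nearest_neighbor_upsample(img, scale):
    """Upscale an H x W (or H x W x C) image by an integer factor by
    repeating every pixel `scale` times along both spatial axes."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

# A tiny 2 x 2 "image" upscaled 2x into a blocky 4 x 4 image
lr = np.array([[1, 2],
               [3, 4]])
hr = nearest_neighbor_upsample(lr, 2)
print(hr)
```

Bicubic up-sampling replaces the pixel repetition with a cubic interpolation kernel, trading blockiness for smoothness; neither recovers the high-frequency detail that learned SISR models target.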
Nevertheless, the existing techniques are not capable of retrieving finer details in images, thus making the output blurry and of low quality. A few studies have been made in this domain, focusing mostly on rebuilding underwater images of better quality by removing noise and blurry areas [
One dataset that contains a vast collection of paired HR-LR images is the USR-248 dataset [
Preprocessing is done with Contrast Limited Adaptive Histogram Equalization (CLAHE) and white balance to remove the low contrast and severe deformations present in underwater images. A white balance is first applied to correct the seafloor color and recreate natural-looking underwater scenes. CLAHE is then employed to enhance the image and improve the visibility of the aquatic animals. The results after the pre-processing stage have been presented in
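The white-balance step can be illustrated with a simple gray-world correction in NumPy. This is a sketch under the gray-world assumption; the exact balancing used in the paper, as well as the CLAHE step (available, for example, through OpenCV's createCLAHE), may differ:

```python
import numpy as np

def gray_world_white_balance(img):
    """Gray-world white balance: scale each channel so its mean matches
    the global mean, countering the blue-green cast of water.
    `img` is an H x W x 3 float array with values in [0, 1]."""
    channel_means = img.reshape(-1, 3).mean(axis=0)
    global_mean = channel_means.mean()
    balanced = img * (global_mean / (channel_means + 1e-8))
    return np.clip(balanced, 0.0, 1.0)

# A synthetic image with a strong blue-green tint
rng = np.random.default_rng(0)
tinted = rng.random((8, 8, 3)) * np.array([0.4, 0.8, 0.9])
balanced = gray_world_white_balance(tinted)
print(balanced.reshape(-1, 3).mean(axis=0))  # channel means roughly equalized
```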
Once the input images are preprocessed and trained with the help of the AlphaSRGAN system, the generator produces the super-resolution image. Linear photometric models, affine motion and Speeded-Up Robust Features (SURF) image registration are used to register the image in the generator. After the registered image is fed into the network architecture, the image resolution is enhanced by fusing the sharpest area of every image. As a first step, the image is decomposed by the discrete wavelet transform into four sub-bands: the High-High (HH), High-Low (HL), Low-High (LH) and Low-Low (LL) sub-bands. Among these, the LL sub-band retains the original features of the image and acts as the approximation coefficient, while the detailed parameters of the image are represented by the remaining LH, HL and HH sub-bands. Subsequently, the features of the image that stand out from the rest in terms of clarity and detail are extracted and represented with the help of Linear Discriminant Analysis (LDA). In this step, a new axis is generated onto which the data from both feature sets is projected, such that the within-class variance is minimized and the distance between the class means is maximized. Variance is used here because the signal and noise generally have variances at the extremes, with the signal having the greater variance and the noise the lesser, and because the ratio between the variances can be expressed easily as a signal-to-noise ratio. Finally, the Inverse Discrete Wavelet Transform (IDWT) is used to reconstruct the image. The fused image is then refined toward super-resolution by repeated processing in the generator and discriminator networks.
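The sub-band split described above can be illustrated with a one-level 2-D Haar wavelet transform in NumPy (the Haar basis is an assumption for illustration; the paper does not name a specific wavelet). The inverse transform reconstructs the image exactly:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform on an even-sized
    grayscale image, returning the LL (approximation) and LH, HL, HH
    (detail) sub-bands, each a quarter of the original size."""
    a, b = img[0::2, :], img[1::2, :]
    lo, hi = (a + b) / 2.0, (a - b) / 2.0            # filter rows
    LL, LH = (lo[:, 0::2] + lo[:, 1::2]) / 2.0, (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    HL, HH = (hi[:, 0::2] + hi[:, 1::2]) / 2.0, (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2; reconstructs the image perfectly."""
    h, w = LL.shape
    lo, hi = np.empty((h, 2 * w)), np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = LL + LH, LL - LH      # undo column filtering
    hi[:, 0::2], hi[:, 1::2] = HL + HH, HL - HH
    img = np.empty((2 * h, 2 * w))
    img[0::2, :], img[1::2, :] = lo + hi, lo - hi    # undo row filtering
    return img

img = np.arange(16.0).reshape(4, 4)
LL, LH, HL, HH = haar_dwt2(img)
rec = haar_idwt2(LL, LH, HL, HH)
print(np.allclose(rec, img))  # True
```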
The architecture of the different networks proposed here is based on the AlphaGAN [
The discriminator’s sigmoid output layer is removed and replaced by a binary cross-entropy loss together with the power-function formulation used in Alpha-GAN [
The AlphaGAN architecture resolves problems in optimization shown in
In addition, the proposed method introduces two more hyper-parameters, a and b, which balance the emphasis on D(x) and D(G(R)) during training and act as the order indices of the two terms, respectively. In this model, a and b are assumed to be greater than 0 so as to prevent degenerate cases in the loss function that arise when these values are equal to 0. This assumption also improves the convergence stability of the proposed model; otherwise, when the discriminator’s output drops below unity, the loss becomes very large, making the model unstable and difficult to converge during training. Here, the absolute value of the output from the discriminator is considered, which also prevents the output from taking an arbitrary value when a and b are greater than 1. It is shown that the objective functions and formulas in Alpha-GAN are not associated with the alpha divergence formulation (
Low Resolution or High Resolution (LR/HR) content loss is the factor that stimulates the restoration of features identical to the ground truth. The representation is usually produced by the generator in the form of high-level features, and has been used effectively for style transfer, SISR and image enhancement. High-level attributes extracted by the final convolutional layer of a pre-trained VGG-19 network have been used to define the image content function
The generative element of GAN has also been taken with the perceptual loss along with the content loss described in Section 3.3. Adversarial loss promotes our model to prefer responses from the vast plethora of natural images available by attempting to deceive the discriminator network. The probabilities of the discriminator
In
The Alpha-SRGAN model has been implemented using TensorFlow libraries [
PSNR is often utilized as a standard to assess images. It is a common method to quantify signal reconstruction quality during image compression, and it can be evaluated via the Mean Square Error (MSE) [
The MSE between two m×n monochromatic images I and K, where one denotes the noisy approximation of the other, is given by:

MSE = (1 / (m·n)) · Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]²

The peak signal-to-noise ratio can then be evaluated using the following equation:

PSNR = 10 · log10(MAX_I² / MSE)

where MAX_I stands for the maximum value that represents the color of the image points; MAX_I is usually 255 when a particular sampling point is depicted using eight bits.
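The MSE and PSNR computations translate directly into NumPy; the sketch below assumes 8-bit images, whose maximum pixel value is 255:

```python
import numpy as np

def psnr(img_i, img_k, max_val=255.0):
    """Peak signal-to-noise ratio between a reference image I and its
    noisy approximation K, computed through the mean squared error."""
    mse = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * np.log10(max_val ** 2 / mse)

reference = np.full((4, 4), 200, dtype=np.uint8)
degraded = reference.copy()
degraded[0, 0] = 180          # corrupt a single pixel
print(round(psnr(reference, degraded), 2))  # 34.15
```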
| Image Scale | SRGAN | ESRGAN | EDSRGAN | RSRGAN | ISGAN | SRDRMGAN | Deep SESR | AlphaSRGAN (Ours) |
|---|---|---|---|---|---|---|---|---|
| 2× | 28.05 | 26.66 | 27.12 | 25.11 | 26.34 | 28.55 | 27.03 | 29.86 |
| 4× | 24.76 | 23.79 | 21.65 | 24.96 | 23.87 | 24.62 | 24.59 | 25.96 |
| 8× | 20.14 | 19.75 | 19.87 | 19.89 | 20.19 | 20.25 | 21.62 | 21.89 |
A measure of the similarity between two images is given by the Structural Similarity Index (SSIM). The Image and Video Engineering Laboratory at the University of Texas at Austin was the first to coin the term. There are two key concepts here: structural information and distortion, both usually defined with respect to the image composition. Structural information is the property of the object structure that is independent of contrast and brightness, while a combination of structure, contrast and brightness gives rise to distortion. Brightness is estimated using mean values, contrast using the standard deviation, and structural similarity is measured with the help of the covariance.
The SSIM of two images x and y can be calculated by:

SSIM(x, y) = [(2·μ_x·μ_y + C1)(2·σ_xy + C2)] / [(μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2)]

where μ_x and μ_y are the means, σ_x² and σ_y² the variances, σ_xy the covariance of x and y, and C1 and C2 are small constants that stabilize the division.
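A whole-image variant of SSIM following this description (means for brightness, standard deviations for contrast, covariance for structure) can be sketched in NumPy. Practical SSIM is computed over sliding windows and averaged; this single-window form is only an illustration:

```python
import numpy as np

def global_ssim(x, y, max_val=255.0):
    """SSIM from whole-image statistics: brightness via means, contrast
    via variances, structure via the covariance of the two images."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # stabilizers
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

img = np.arange(64, dtype=np.float64).reshape(8, 8)
print(global_ssim(img, img))  # identical images score exactly 1.0
```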
In the
| Image Scale | SRGAN | ESRGAN | EDSRGAN | RSRGAN | ISGAN | SRDRMGAN | Deep SESR | AlphaSRGAN (Ours) |
|---|---|---|---|---|---|---|---|---|
| 2× | 0.78 | 0.75 | 0.77 | 0.75 | 0.95 | 0.81 | 0.88 | 0.93 |
| 4× | 0.69 | 0.66 | 0.65 | 0.69 | 0.84 | 0.69 | 0.71 | 0.88 |
| 8× | 0.60 | 0.58 | 0.58 | 0.65 | 0.72 | 0.61 | 0.63 | 0.75 |
Mean Opinion Score [
| Image Scale | SRGAN | ESRGAN | EDSRGAN | RSRGAN | ISGAN | SRDRMGAN | Deep SESR | AlphaSRGAN (Ours) |
|---|---|---|---|---|---|---|---|---|
| 2× | 2.56 | 1.85 | 2.21 | 1.21 | 1.78 | 2.98 | 2.14 | 3.58 |
| 4× | 1.19 | 1.03 | 0.89 | 1.22 | 1.13 | 1.16 | 1.09 | 1.78 |
| 8× | 0.80 | 0.65 | 0.67 | 0.69 | 0.73 | 0.76 | 0.84 | 0.95 |
Underwater Image Quality Measure (UIQM) [
Altogether, the comprehensive quality measure for underwater images is then depicted by a linear combination of the three components:

UIQM = c1 · UICM + c2 · UISM + c3 · UIConM

where UICM, UISM and UIConM measure colorfulness, sharpness and contrast, respectively, and c1, c2 and c3 are application-dependent weights.
| Image Scale | SRGAN | ESRGAN | EDSRGAN | RSRGAN | ISGAN | SRDRMGAN | Deep SESR | AlphaSRGAN (Ours) |
|---|---|---|---|---|---|---|---|---|
| 2× | 2.72 | 2.70 | 2.67 | 2.42 | 2.72 | 2.77 | 3.15 | 3.23 |
| 4× | 2.42 | 2.38 | 2.40 | 2.55 | 2.35 | 2.48 | 2.96 | 2.85 |
| 8× | 2.10 | 2.05 | 2.12 | 2.10 | 2.01 | 2.17 | 2.39 | 2.41 |
The comparative findings in terms of PSNR and SSIM are represented in
The Deep SESR technique can perform two tasks simultaneously and produce a better picture. Nevertheless, compared with these methods, our proposed AlphaSRGAN model performs the tasks effectively in a synchronized manner and obtains the best-quality image for pre-eminent results. To validate the performance of our proposed model, we compared it with other techniques, and the comparison results are graphically illustrated in
This technique failed to recover the required information, whereas the ESRGAN and EDSRGAN methods resulted in slightly higher UIQM and MOS values but also failed to recover the complete information. In further comparison with other techniques, RSRGAN yields significantly higher UIQM and MOS values, but its picture quality is comparatively low. The ISGAN and SRDRMGAN techniques result in higher UIQM and MOS values, similar to the Deep SESR technique; however, although they can obtain a clear image, these two techniques do not recover the required part of the information.
Further, with higher values of UIQM and MOS, Deep SESR produces better images. However, our proposed AlphaSRGAN model yields notably higher UIQM and MOS values; in addition, it performs the tasks efficiently in a coordinated manner and obtains a picture of the highest standard.
In this work, a novel generative image super-resolution model named AlphaSRGAN has been introduced, which amalgamates traditional image reconstruction approaches with deep learning methods for underwater image super-resolution. In addition, several qualitative and quantitative tests have been performed on the USR-248 dataset. The peak signal-to-noise ratio of our model for image scales 2×, 4× and 8× is superior to that of other models, and parameters such as SSIM and UIQM of the proposed model also prove better than those of existing systems. Enhanced Mean Opinion Scores have also been obtained by our model. Consequently, the model proposed here shows greater performance than pre-existing models. Taking its enhanced performance, computational efficiency and model design into consideration, AlphaSRGAN proves to be a strong alternative, making it suitable for real-time applications and for applications in other fields as well.