Intelligent Automation & Soft Computing DOI:10.32604/iasc.2023.029644
Article
Using GAN Neural Networks for Super-Resolution Reconstruction of Temperature Fields
1School of Artificial Intelligence, Nanjing University of Information Science & Technology, Nanjing, 210000, China
2Unit 93117 of PLA, Nanjing, 210000, China
3International Business Machines Corporation (IBM), New York, 100014, USA
4School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, 210000, China
*Corresponding Author: Zhiwei Jiang. Email: jzw0659@outlook.com
Received: 08 March 2022; Accepted: 21 April 2022
Abstract: A Generative Adversarial Network (GAN) based on deep learning is designed for the Super-Resolution (SR) reconstruction of temperature fields (comparable to downscaling in meteorology). Temperature-field data are limited by the small number of ground stations and the sparse distribution of observations, which results in a lack of fine detail. To improve the network’s generalization performance, a residual structure and batch normalization are used. Nearest interpolation is applied instead of the Bicubic interpolation conventional in computer vision, to avoid over-smoothing the climate element values. Sub-pixel convolution is used for up-sampling instead of transposed convolution or interpolation methods, to speed up network inference. The experimental dataset is the European Centre for Medium-Range Weather Forecasts Reanalysis v5 (ERA5) with a bidirectional resolution of
Keywords: Super-resolution; deep learning; ERA5 dataset; GAN networks
Advances in artificial intelligence have shown considerable promise not only in computer vision (CV) and language processing but also in meteorology [1–5]. Image super-resolution [6] is a classic CV task; it generally refers to increasing the resolution of an image, for example from 512 × 512 to 1024 × 1024 pixels. In meteorology, downscaling of the temperature field is a super-resolution reconstruction that follows the same notion as in CV [7]: it converts low-resolution data into high-resolution data. Most conventional statistical downscaling models in the meteorological disciplines were based on normal distribution assumptions. However, various extremes do not follow a normal distribution, and bottlenecks were encountered in the study of extreme climate events with non-normal distributions [8].
GANs [9] take a random sample from the latent space as input and learn to produce outputs close to the true distribution of the training set, which surpasses the feature extraction performance of conventional neural networks. In this paper, we propose a GAN network structure and use the ERA5-Land hourly dataset [10,11] to scale up 2 m temperature field data with a resolution of
Overall, the contributions of this study are mainly in three aspects:
1. To provide the model with adequate capacity and generalization performance, the residual structure [12] and batch normalization [13], both widely used in deep learning, are included. In addition, a recurrent structure is used to fuse temporal information, since the data are correlated in time.
2. Every point in the image matrix of a meteorological element such as temperature has a defined physical meaning, so using Bicubic interpolation to scale down the data during processing, which loses the original information, is ill-advised. In our work, the reduced grid is exactly aligned with the original image, and we use nearest interpolation to retain the original image’s information.
3. The network enlarges the meteorological element field in stages of 2-fold zooming, which allows flexible 2-, 4-, and 8-fold zooming. To make the GAN more efficient, sub-pixel convolution [14] is employed instead of the standard transposed convolution and interpolation approaches.
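The down-sampling choice in the second contribution can be illustrated with a minimal NumPy sketch. It assumes that "nearest interpolation" on an aligned grid amounts to keeping every `factor`-th grid point; the function names, the synthetic field, and the block-averaging contrast (used here as a simple stand-in for a smoothing down-sampler, not Bicubic interpolation itself) are all illustrative assumptions.

```python
import numpy as np

def downsample_nearest(field: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th grid point; no values are blended."""
    return field[..., ::factor, ::factor]

def downsample_mean(field: np.ndarray, factor: int) -> np.ndarray:
    """Block-average down-sampling, shown only as a smoothing contrast."""
    h, w = field.shape[-2:]
    blocks = field.reshape(*field.shape[:-2], h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(-3, -1))

rng = np.random.default_rng(0)
hr = 280.0 + 10.0 * rng.random((64, 64))  # synthetic temperature field in K (made up)

lr_nearest = downsample_nearest(hr, 8)    # every value is an original grid value
lr_mean = downsample_mean(hr, 8)          # every value is an average of 64 points
```

Every entry of `lr_nearest` is a value that actually occurs in `hr` at the aligned location, whereas `lr_mean` contains blended values with a narrower spread.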
Bilinear interpolation and Bicubic interpolation are extensively employed to recover lost information in images, whether for image super-resolution or for downscaling in the meteorological field. The basic idea is to interpolate along the horizontal and vertical pixel coordinates with linear and cubic functions, respectively. The bilinear and bicubic approaches are presented in Fig. 1. Bilinear and Bicubic interpolation gather the nearest 4 and 16 points surrounding the target point, respectively, and use 1st-order and 3rd-order polynomials to infer the value of the target pixel.
The experimental results (Section 6) indicate that the interpolation methods produce overly smooth images, so deep learning approaches such as GAN-based models are preferred. A GAN differs from ordinary neural networks in that it has two components: a generator and a discriminator. In the SR reconstruction task, the generator takes a low-resolution image as input and generates a new high-resolution image, while the discriminator distinguishes between real images and fake images produced by the generator. As the number of iterations increases, the generated data approximate the real distribution. In the CV field, GANs are used in a variety of applications, including image style conversion and image demosaicing.
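As a small illustration of the 4-neighbour case, the following sketch interpolates a value at a fractional grid position. The function name `bilinear` and the toy grid are assumptions for illustration; it expects a grid of at least 2 × 2 points and non-negative coordinates inside the grid.

```python
import numpy as np

def bilinear(grid: np.ndarray, y: float, x: float) -> float:
    """Interpolate `grid` at fractional (row y, column x) from its 4 neighbours."""
    # Clamp the top-left neighbour so that (y0 + 1, x0 + 1) stays inside the grid.
    y0 = min(int(np.floor(y)), grid.shape[0] - 2)
    x0 = min(int(np.floor(x)), grid.shape[1] - 2)
    dy, dx = y - y0, x - x0
    # First-order (linear) interpolation along x on the two rows, then along y.
    top = (1 - dx) * grid[y0, x0] + dx * grid[y0, x0 + 1]
    bottom = (1 - dx) * grid[y0 + 1, x0] + dx * grid[y0 + 1, x0 + 1]
    return float((1 - dy) * top + dy * bottom)

grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])
center = bilinear(grid, 0.5, 0.5)  # equal-weight average of the 4 corners
```

Because each output is a weighted average of its neighbours, local extremes are flattened, which is consistent with the over-smoothing the paper reports for interpolation methods.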
Numerous deep learning methods for SR reconstruction have been proposed over the years. The Super-Resolution Convolutional Neural Network (SRCNN) [15] introduced deep learning into image super-resolution for the first time, using only three convolution layers to achieve state-of-the-art (SOTA) performance, and Fast SRCNN (FSRCNN) improved SRCNN in several respects to make the network more efficient. Very Deep Super-Resolution (VDSR) [16] lets the network learn the residual high-frequency part of the image, which further improves performance. GANs were first applied to super-resolution in the Super-Resolution GAN (SRGAN) [17], shown in Fig. 2, which enhances realism via a perceptual loss and an adversarial loss, alleviating the loss of high-frequency details that occurs when RMSE is used as the loss function.
Because of the temporal correlation of the data, SRGAN has to generate the high-resolution output for all time frames by concatenating the frames. We demonstrate experimentally that SRGAN does not reach satisfactory results on the temperature field SR reconstruction task, even though it is a baseline model in the image super-resolution domain (results shown in Section 6).
Super-resolution achievements outside the computer vision domain have proliferated over the years [18,19]. Unlike in image processing [20–22], it is critical in weather and climate applications to understand and quantify forecast uncertainty. The classical precipitation downscaling algorithm [23] uses techniques such as stochastic autoregressive models. Downscaling of meteorological elements is also being attempted with deep learning. Leinonen et al. [24] used a GAN for the downscaling of rainfall and cloud thickness; we differ from them by using pixel extraction in the down-sampling stage to retain the original information of the meteorological elements, and by introducing sub-pixel convolution to make the network more efficient. The China Meteorological Administration’s Land Assimilation System Statistical Downscaling Model (CLDASSD) [25] used traditional convolution, proposed a quality control algorithm for pairs of high- and low-resolution data, and achieved SOTA performance for downscaling the temperature field over the Chinese region.
The experimental dataset is the ERA5-Land hourly [11], a reanalysis dataset with a higher resolution than ERA5 on pressure levels. It uses the laws of physics to combine model data with observations from across the world into a consistent dataset. The resolution of this dataset is 0.1° × 0.1° with one record per hour, and the meteorological element used is the 2 m temperature.
We crop the global matrix to the region around Beijing, China, with a longitude range of 111.2°E to 117.5°E and a latitude range of 36.0°N to 42.3°N, as shown in Fig. 3. The original dataset provides 24 hourly records per day (00:00 to 23:00) in the temporal dimension, and we sample the data at 3-h intervals to obtain 8 records per day. After cropping and sampling the original data, the dimension shape in one day is
For training, in the same way as SR in computer vision, the original image needs to be reduced to low resolution while remaining aligned with the original image. Bicubic interpolation is a commonly used down-sampling method for converting a high-resolution image to a target resolution, but it has a blurring effect and is unsuitable for processing meteorological data. To maintain the correct values of the real meteorological element field and prevent numerical smoothing, we adopt nearest interpolation down-sampling to upscale two points from the whole element field at
The training set contains all days of 2019 and 2020. The validation set is randomly split from the training set and is used to choose the best model during training. The test set contains 151 days of data from 2021-1-1 to 2021-5-31, which avoids temporal overlap with the training set.
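The bookkeeping behind the cropping, temporal sampling, and test period can be checked with a short sketch. It assumes that both endpoints of each longitude and latitude range are grid points of the 0.1° grid; the helper `n_points` and the variable names are hypothetical, and the computed grid size is only what follows from that assumption.

```python
from datetime import date
import numpy as np

RES = 0.1  # grid spacing in degrees, as stated for ERA5-Land

def n_points(start: float, stop: float, step: float = RES) -> int:
    """Number of grid points from start to stop, both endpoints included (assumed)."""
    return int(round((stop - start) / step)) + 1

n_lon = n_points(111.2, 117.5)  # longitude points in the cropped region
n_lat = n_points(36.0, 42.3)    # latitude points in the cropped region

hours = np.arange(24)           # hourly records 00:00 ... 23:00
sampled_hours = hours[::3]      # one record every 3 h

# Inclusive day count of the test period 2021-1-1 .. 2021-5-31.
n_test_days = (date(2021, 5, 31) - date(2021, 1, 1)).days + 1
```

The sampled hours give the 8 records per day and the day count gives the 151 test days stated in the text.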
This experiment’s GAN is composed of a generator and a discriminator. The generator inputs low-resolution meteorological elements (2-D matrix), and the output can be dynamically adjusted to upscale to high-resolution element fields by factors of 2, 4, and 8 on demand. In this section, we show the model network structure for a generator and a discriminator, followed by an introduction to the network’s submodules such as residual structure, recurrent structure, and sub-pixel convolution.
The complete proposed GAN network structure is illustrated in Fig. 4. The generator is the main super-resolution model: it takes the low-resolution input and outputs the high-resolution 2-D meteorological element field. The low-resolution input is first fed to the conventional convolution layer
The third layer of the generator
Fig. 5 shows the ConvolutionBlock (ConvBaseBlock) and ResBlock submodules. ConvBaseBlock is an ordinary convolutional layer that pads, normalizes, and non-linearly activates the input feature map (padding is required before the 3 × 3 convolutional layer so that the height and width of the image remain constant after convolution). ResBlock is the residual structure [12], which has become a dominant structure in the CV domain because gradients tend to vanish or explode as network depth increases [27]. The residual structure addresses the problem of network degradation, which speeds up convergence and drastically reduces the difficulty of training, so the network can be designed deeper. A residual block processes its input through two activation layers and convolutional layers and finally adds the input to the output. It replaces the traditional single convolutional layer, makes an identity mapping between input and output easy to realize, and allows the network depth to be increased to improve the learning ability of the network. On the other hand, the Internal Covariate Shift (ICS) problem arises as the network deepens, so we add batch normalization [13] after each convolution layer and before the activation function (except in the last layer). This keeps the activation function’s input in the non-saturated gradient region, mitigating the risk of vanishing gradients and speeding up training.
Sub-pixel convolution [14], whose operation is shown in Fig. 6, is an up-sampling method that, unlike interpolation functions such as Bicubic interpolation, is model-based (its parameters are learned). To scale a feature map up by a factor of 2, a Convolutional Neural Network (CNN) layer first expands the number of channels by a factor of 4, yielding 4 low-resolution feature maps of the same shape. The pixels at each location across these channels are then rearranged into the spatial plane, trading the channel dimension for spatial resolution.
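The interplay of the skip connection and batch normalization can be sketched in NumPy. This is a simplified illustration, not the paper's ResBlock: it assumes 1 × 1 convolutions (pure channel mixing) in place of the 3 × 3 convolutions, training-mode batch statistics, ReLU activations, and one plausible layer ordering; all function names are hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel over the batch and spatial axes (training mode)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def conv1x1(x, w):
    """Channel-mixing 1x1 convolution; x is (N, C, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,nchw->nohw", w, x)

def res_block(x, w1, w2):
    """Two conv -> BN -> ReLU stages (assumed ordering), then add the input back."""
    h = np.maximum(batch_norm(conv1x1(x, w1), 1.0, 0.0), 0.0)
    h = np.maximum(batch_norm(conv1x1(h, w2), 1.0, 0.0), 0.0)
    return x + h  # skip connection: the block only has to learn a residual

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))            # (batch, channels, height, width)
w_zero = np.zeros((4, 4))
identity_out = res_block(x, w_zero, w_zero)  # zero residual branch returns the input
normalized = batch_norm(x, 1.0, 0.0)         # per-channel mean ~0, variance ~1
```

With all-zero weights the residual branch vanishes and the block reduces to the identity, which is the property that makes deeper stacks easier to train.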
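The channel-to-space rearrangement at the end of a sub-pixel convolution can be sketched as follows. The function name `pixel_shuffle` and the channel ordering (channel index `c*r*r + i*r + j` mapping to the sub-pixel offset `(i, j)`) are assumptions chosen to match a common convention; the learned convolution that produces the expanded channels is omitted.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)  # out[c, h*r + i, w*r + j]

# Four 2x2 low-resolution maps become one 4x4 map when scaling by r = 2.
lr_maps = np.arange(16).reshape(4, 2, 2)
hr_map = pixel_shuffle(lr_maps, 2)
```

Each 2 × 2 block of `hr_map` collects the values that the four channels hold at one low-resolution location, so no interpolation is involved in the rearrangement itself.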
5 Model Optimization Objectives
Our goal is to train a Generator (
The role of
We choose a gradient-based method to calculate the gradient of the network model’s weight parameters (
The training procedure optimizes the GAN network for the objective Eq. (1), the discriminator
1. When
2. When
Optimizer: we choose stochastic gradient descent with a momentum of 0.2 (SGDM; the gradient ascent counterpart is SGAM) [28] as the optimization method. Although many optimization algorithms with adaptive moments, such as Adaptive Moment Estimation (Adam) [29], have been shown to converge faster and more efficiently in a large number of experiments, there is also experimental evidence that adaptive methods can be detrimental in some machine learning tasks. Reddi et al. [30] found that Adam may not converge in some cases, and stochastic gradient descent (SGD) [31] and SGDM are still dominant optimization methods.
The stochastic gradient descent update strategy with momentum is as follows:
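One common form of the momentum update, the classical formulation discussed in [28], is sketched below; it may differ in detail from the paper's own equations. The function name `sgdm_step` and the toy quadratic objective are illustrative assumptions, while the momentum of 0.2 and the learning rate of 0.005 are the values quoted in the text. The gradient ascent variant used for the discriminator would flip the sign of the gradient term.

```python
import numpy as np

def sgdm_step(theta, velocity, grad, lr=0.005, momentum=0.2):
    """Classical momentum: v <- mu * v - lr * grad, then theta <- theta + v."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity

# Toy problem: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(2000):
    grad = 2.0 * theta
    theta, velocity = sgdm_step(theta, velocity, grad)
```

The velocity accumulates a decaying sum of past gradients, which damps step-to-step oscillation compared with plain SGD.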
Strategy for updating the learning rate: we also want to reduce the risk of the learning process becoming trapped at saddle points. In this experiment, we use a cosine annealing learning rate decay strategy to update the learning rate periodically. The learning rate update strategy is given in Eq. (4).
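A minimal sketch of cosine annealing with periodic restarts follows, assuming the widely used form lr = lr_min + (lr_max - lr_min)(1 + cos(pi * t / T)) / 2; the paper's Eq. (4) may be parameterized differently. The function name, the `period` argument, and the minimum rate of 0 are hypothetical, and 0.005 is the initial learning rate quoted in the experiments.

```python
import math

def cosine_annealing_lr(step: int, period: int,
                        lr_max: float = 0.005, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_max towards lr_min, restarting every `period` steps."""
    t = step % period  # position inside the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / period))

# Learning rates over three cycles of 100 steps each (cycle length is made up).
schedule = [cosine_annealing_lr(s, period=100) for s in range(300)]
```

The periodic jump back to the maximum rate is what gives the optimizer a chance to leave flat regions it has settled into.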
The complete update process algorithm is as follows:
In this section, evaluation metrics widely used in the CV field are introduced and applied to evaluate the downscaling results of the meteorological element field in this paper.
For super-resolution tasks, the mean square error (MSE) has limitations: it has been shown that using the root mean square error as the primary loss function loses high-frequency information from the image. The Peak Signal-to-Noise Ratio (PSNR, Eq. (6)) is the ratio between the maximum power of the signal and the power of the noise. It measures the quality of a reconstructed image, is usually expressed in decibels (dB), and the higher the PSNR, the better the image quality.
Because MSE and PSNR are not consistent with the visual perception of the human eye, we also apply Structural Similarity (SSIM) [32] as an evaluation metric. SSIM is designed around the visual characteristics of the human eye and is therefore more consistent with human visual perception than the traditional metrics. It measures the similarity of two images and its value range is [0,1]; the larger the SSIM, the less distorted and the better the quality of the image. SSIM is calculated as in Eq. (7), where
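The relation between MSE and PSNR can be sketched as follows, using the standard definition PSNR = 10 log10(MAX^2 / MSE). The function names and the `data_range` argument (the assumed maximum signal value, which for a temperature field has to be chosen by the user) are illustrative.

```python
import numpy as np

def mse(ref, est) -> float:
    """Mean square error between a reference field and an estimate."""
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.mean((ref - est) ** 2))

def psnr(ref, est, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for a signal with the given value range."""
    err = mse(ref, est)
    if err == 0.0:
        return float("inf")  # identical fields: no noise power
    return float(10.0 * np.log10(data_range ** 2 / err))

ref = np.zeros((4, 4))
est = np.full((4, 4), 0.1)  # constant error of 0.1 on a unit-range signal
score = psnr(ref, est, data_range=1.0)
```

Since PSNR is a monotone function of MSE for a fixed range, it ranks reconstructions the same way MSE does and inherits its insensitivity to structure.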
We verified the convergence of the GAN by empirical methods, observing the average metric performance of the output of the generator
In this section, we compare the performance of interpolation methods (Bilinear, Bicubic) and GAN-based methods on the test dataset (dates from 2021-1-1 to 2021-5-31). To evaluate the generalization performance of our method with different parameters, we applied different
We sampled 3 days of the test dataset (dates: 1–1, 3–7, 5–31) to represent the early, middle, and late periods, respectively; the actual downscaling performance is shown in Fig. 8. In the early period, the proposed model is visually almost indistinguishable from the other methods but still performs slightly better. As the days progress, the advantages of our GAN model gradually become apparent, and it restores more details.
We fix the parameters of the optimizer and the initial network weights and vary only the learning rate decay strategy, using the SGD optimizer with a momentum of 0.2, an initial learning rate of 0.005, and 300 iterations. The performance of each metric during the training process is recorded in Fig. 9 and the result is shown in Tab. 1, where the learning rate is represented by the last line, and the others represent MSE, PSNR, SSIM and
GAN is one of the most prominent deep learning approaches and has made significant progress in image and video super-resolution. The enhancement of resolution has wide applicability in observation and model data processing in climate science. This work addresses the growing demand by developing a conditional super-resolution GAN that operates on a 2-dimensional image sequence for each input. Rather than processing each image in a sequence independently, our generator and discriminator structures adopt the concept of recurrent neural networks to handle temporal data. The results demonstrate that GAN-based models generally outperform traditional interpolation methods, and that our proposed GAN performs better than the ordinary GAN-based model.
The proposed model also has limitations. Since it is a GAN-based model, the limitations of GANs are also potential threats to our model: (1) The model parameters may oscillate and fail to converge; although this did not occur in our experiments, it should still be taken seriously. (2) The model is more complex than SRGAN, so its training and inference speed is not advantageous in comparison. (3) Because of its complexity, the model requires sufficient time or training rounds to converge adequately. (4) The current magnification is not flexible enough; we hope to extend it to more applications.
1. Optimization of the network structure to improve performance and memory usage.
2. Generalization to different scale factors, producing high-resolution images with multiple scale factors at once. The current version is specific to a factor of 8; although it is possible to switch flexibly to 2x or 4x, each factor requires a supporting dataset and retraining, and the network cannot output multiple scale factors in a single pass.
3. Frame interpolation in the temporal dimension in addition to super-resolution in the spatial dimension.
4. Extrapolation of time series to allow short-term prediction.
5. Employing auxiliary variables, which may help the generator output approximate the real distribution better. For example, altitude could be input to our network as an auxiliary variable to fuse more meteorology-related information.
Funding Statement: This research was supported by the National Natural Science Foundation of China under Grant Nos. 61772280 and 62072249.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
1. C. Liu, S. Yang, D. Di, Y. Yang, C. Zhou et al., “A machine learning-based cloud detection algorithm for the himawari-8 spectral image,” Advances in Atmospheric Sciences, vol. 38, pp. 1–14, 2021.
2. H. Li, C. Yu, J. Xia, Y. Wang, J. Zhu et al., “A model output machine learning method for grid temperature forecasts in the Beijing area,” Advances in Atmospheric Sciences, vol. 36, no. 10, pp. 1156–1170, 2019.
3. H. Dai, “Machine learning of weather forecasting rules from large meteorological data bases,” Advances in Atmospheric Sciences, vol. 13, no. 4, pp. 471–488, 1996.
4. J. Xia, H. Li, Y. Kang, C. Yu, L. Ji et al., “Machine learning-based weather support for the 2022 winter olympics,” Advances in Atmospheric Sciences, vol. 37, no. 9, pp. 927–932, 2020.
5. L. Han, M. Chen, K. Chen, H. Chen, Y. Zhang et al., “A deep learning method for bias correction of ECMWF 24–240 h forecasts,” Advances in Atmospheric Sciences, vol. 38, no. 9, pp. 1444–1459, 2021.
6. M. Irani and S. Peleg, “Improving resolution by image registration,” CVGIP: Graphical Models and Image Processing, vol. 53, no. 3, pp. 231–239, 1991.
7. B. C. Hewitson and R. G. Crane, “Climate downscaling: Techniques and application,” Climate Research, vol. 7, no. 2, pp. 85–95, 1996.
8. C. Qian, W. Zhou, S. K. Fong and K. C. Leong, “Two approaches for statistical prediction of non-gaussian climate extremes: A case study of macao hot extremes during 1912–2012,” Journal of Climate, vol. 28, no. 2, pp. 623–636, 2015.
9. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Montreal, Quebec, Canada, pp. 2672–2680, 2014.
10. H. Hersbach, B. Bell, P. Berrisford, S. Hirahara, A. Horányi et al., “The ERA5 global reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.
11. J. Muñoz Sabater, “Copernicus climate change service (C3S) climate data store (CDS): ERA5-land hourly data from 1981 to present,” 2019. [Online]. Available: http://dx.doi.org/10.24381/cds.e2161bac.
12. K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770–778, 2016.
13. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. of the 32nd Int. Conf. on Machine Learning, Proc. of Machine Learning Research, Lille, France, pp. 448–456, 2015.
14. W. Shi, J. Caballero, F. Huszár, J. Totz and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 1874–1883, 2016.
15. C. Dong, C. C. Loy, K. He and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
16. J. Kim, J. K. Lee and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 1646–1654, 2016.
17. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 105–114, 2017.
18. K. C. Aswathy and E. Poovammal, “A novel alphaSRGAN for underwater image super resolution,” Computers, Materials & Continua, vol. 69, no. 2, pp. 1537–1552, 2021.
19. K. Sathya and M. Rajalakshmi, “CNN: Enhanced super resolution method for rice plant disease classification,” Computer Systems Science and Engineering, vol. 42, no. 1, pp. 33–47, 2022.
20. W. El-Shafai, A.-M. Ali, E.-S.-M. El-Rabaie, N.-F. Soliman, A.-D. Algarni et al., “Automated COVID-19 detection based on single-image super-resolution and CNN models,” Computers, Materials & Continua, vol. 70, no. 1, pp. 1141–1157, 2022.
21. X. Liu, Z. Chen, W. Song, F. Li and Y. Yang, “Data matching of solar images super-resolution based on deep learning,” Computers, Materials & Continua, vol. 68, no. 3, pp. 4017–4029, 2021.
22. J. Zhou, J. Liu, J. Li, M. Huang, J. Cheng et al., “Mixed attention densely residual network for single image super-resolution,” Computer Systems Science and Engineering, vol. 39, no. 1, pp. 133–146, 2021.
23. N. Rebora, L. Ferraris, J. von Hardenberg and A. Provenzale, “RainFARM: Rainfall downscaling by a filtered autoregressive model,” Journal of Hydrometeorology, vol. 7, no. 4, pp. 724–738, 2006.
24. J. Leinonen, D. Nerini and A. Berne, “Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 9, pp. 7211–7223, 2021.
25. R. Tie, C. Shi, G. Wan, X. Hu, L. Kang et al., “CLDASSD: Reconstructing fine textures of the temperature field using super-resolution technology,” Advances in Atmospheric Sciences, vol. 38, pp. 1–14, 2021.
26. X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong et al., “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, vol. 28, pp. 802–810, 2015.
27. Y. Bengio, P. Y. Simard and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
28. I. Sutskever, J. Martens, G. E. Dahl and G. E. Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. of the 30th Int. Conf. on Machine Learning, Proc. of Machine Learning Research, Atlanta, GA, USA, pp. 1139–1147, 2013.
29. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014. [Online]. Available: https://arxiv.org/abs/1412.6980.
30. S. J. Reddi, S. Kale and S. Kumar, “On the convergence of adam and beyond,” 2019. [Online]. Available: https://arxiv.org/abs/1904.09237.
31. H. Robbins and S. Monro, “A stochastic approximation method,” The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400–407, 1951.
32. Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.