Low dynamic range (LDR) images captured by consumer cameras have a limited luminance range. Because the conventional method for generating high dynamic range (HDR) images merges multiple-exposure LDR images of the same scene (assuming a stationary scene), we introduce a learning-based model for single-image HDR reconstruction. An input LDR image is sequentially segmented into local region maps based on the cumulative histogram of the input brightness distribution. Using the local region maps, SParam-Net estimates the parameters of an inverse tone mapping function to generate a pseudo-HDR image. The segmented region maps are then processed as input sequences by a long short-term memory network. Finally, a fast super-resolution convolutional neural network is used for HDR image reconstruction. The proposed method was trained and tested on the HDR-Real, LDR-HDR-pair, and HDR-Eye datasets. The experimental results show that HDR images can be generated more reliably than with contemporary end-to-end approaches.
The dynamic range of a digital image is the luminance range from its darkest to its brightest area. Digital images are generally stored with 8 bits per color channel, so pixel intensities take values between 0 and 255 in the R, G, and B channels. However, this range cannot represent the wide luminance range of real-world objects. Images represented with 8 bits are known as low dynamic range (LDR) images; in contrast, a high dynamic range (HDR) image has a wider dynamic range. HDR techniques are actively used in photography, physically based rendering, film, and medical and industrial imaging, and the most recent displays support HDR content [
To reconstruct an HDR image conventionally, it is necessary to sequentially capture multiple images with different exposures, estimate the camera response function, and merge the brightness values of the images [
Single-image HDR inference is referred to as inverse tone mapping. Because image details are frequently lost in very bright and/or dark regions (i.e., over- and/or under-exposed regions), HDR reconstruction from a single LDR image is a challenging problem. Inverse tone mapping can transform a large amount of legacy LDR content into HDR images suitable for enhanced viewing on HDR displays and for various applications, such as image-based lighting with HDR environment maps. Inverse tone mapping algorithms typically expand the luminance range, adjust the image contrast, and fill in the clipped (saturated) regions [
Banterle [
Recently, many methods have been actively developed to reconstruct an HDR image from a single LDR input using deep convolutional neural networks (CNNs). These technologies expand the dynamic range of conventional LDRs, increasing the contrast ratio of the image [
Endo [
Jang [
Liu [
Eilertsen [
To reconstruct an HDR image from a single LDR image, we introduce a deep learning model: SParam-Net and convolutional long short-term memory (ConvLSTM)-Net. The input LDR image is sequentially segmented into local region maps based on the cumulative histogram of the brightness distribution. SParam-Net derives a global inverse tone mapping function from the local region maps. Using local segmentation regions within specific brightness ranges, the relationship between the inverse tone mapping function and the details of the LDR image can be learned effectively. A pseudo-HDR image is generated using the inverse tone mapping, and the local details in the segmented regions are inputted sequentially to ConvLSTM-Net. Thereafter, the correlation between the brightness distribution in the local region maps and the global luminance distribution of the pseudo-HDR image can be encoded. The encoded feature map is transferred into a fast super-resolution convolutional neural network (FSRCNN) to reconstruct the HDR image. Our contribution is that local region maps based on the cumulative distribution can be used both to estimate the inverse tone mapping function and as sequential inputs to the LSTM model for HDR image reconstruction.
The remainder of this paper is organized as follows. In Section 2, we describe the proposed deep learning model: SParam-Net and ConvLSTM-Net. We then explain the dataset and experimental results in Section 3 and conclude the paper in Section 4.
The input LDR image is sequentially segmented into local region maps based on the cumulative histogram of the brightness distribution. Specifically, we obtained a histogram of the brightness values of the LDR image and computed its cumulative brightness distribution. From the relatively brighter regions to the darker regions, the local region maps are segmented based on
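The segmentation step can be sketched as follows. This is a minimal illustration that assumes a single-channel brightness image with values in [0, 255] and splits the cumulative histogram at evenly spaced quantiles; the paper's exact split rule is given by its (omitted) equation and may differ.

```python
import numpy as np

def segment_region_maps(brightness, num_regions=4):
    """Split an LDR brightness image into binary local region maps using
    quantiles of its cumulative brightness histogram.
    The evenly spaced quantile split is an illustrative assumption."""
    hist, _ = np.histogram(brightness, bins=256, range=(0, 256))
    cdf = np.cumsum(hist) / brightness.size  # cumulative distribution in [0, 1]
    # brightness thresholds where the CDF crosses k / num_regions
    thresholds = [int(np.searchsorted(cdf, k / num_regions))
                  for k in range(1, num_regions)]
    edges = [0] + thresholds + [256]
    # one binary map per brightness band, ordered dark -> bright
    return [((brightness >= lo) & (brightness < hi)).astype(np.float32)
            for lo, hi in zip(edges[:-1], edges[1:])]
```

Because the bands tile the full brightness range, the resulting maps partition the image: every pixel belongs to exactly one region map.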
Banterle [
We derived a quadratic equation from
We introduced a deep learning model, known as SParam-Net, to estimate the parameters of S-curve:
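Once the S-curve parameters are estimated, the pseudo-HDR image is obtained by applying the curve pixel-wise. The paper's exact S-curve is given by its (omitted) equation; the sketch below uses a hypothetical sigmoid-style curve whose slope `a` and midpoint `b` stand in for the parameters SParam-Net would estimate.

```python
import numpy as np

def s_curve_expand(ldr, a=8.0, b=0.5):
    """Illustrative sigmoid-style S-curve used as an inverse tone mapping.
    `a` (slope) and `b` (midpoint) are hypothetical stand-ins for the
    SParam-Net-estimated parameters; `ldr` is normalized to [0, 1]."""
    y = 1.0 / (1.0 + np.exp(-a * (ldr - b)))
    # rescale so the curve maps 0 -> 0 and 1 -> 1 exactly
    y0 = 1.0 / (1.0 + np.exp(a * b))
    y1 = 1.0 / (1.0 + np.exp(-a * (1.0 - b)))
    return (y - y0) / (y1 - y0)
```

A steep slope (large `a`) compresses mid-tones and stretches the extremes, which is the qualitative behavior expected of an expansion curve for clipped LDR content.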
Following early deep learning studies on image restoration and reconstruction, we employed the L2 loss function. Zhao [
To generate an HDR image, the segmented local maps were sequentially inputted to ConvLSTM-Net in order of brightness. Here, the pseudo-HDR image generated using SParam-Net was also used. The LSTM is a recurrent neural network architecture with feedback connections, which is well suited to classifying and making predictions from time-series data, such as voice and text [
ConvLSTM-Net is composed of two parts: an encoder and a decoder. The encoder extracts the relationship between the local brightness distribution in each brightness band, which is divided using the cumulative histogram, and the global luminance of the pseudo-HDR image. A deconvolution operation is performed on the lower-resolution feature map to reconstruct a feature map with the same resolution as the input LDR image [
The convolution operation of ConvLSTM-Net is represented as Conv(
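A ConvLSTM replaces the matrix multiplications of a standard LSTM with convolutions, so the hidden and cell states retain spatial structure across time steps. The following is a minimal single-channel numpy sketch of one ConvLSTM step (the actual network uses multi-channel learned kernels, biases, and a deep-learning framework; this only illustrates the gate arithmetic):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation for one channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, Wx, Wh):
    """One ConvLSTM time step. Wx/Wh map gate names 'i', 'f', 'o', 'g'
    to 2D kernels for the input and hidden-state convolutions."""
    i = sigmoid(conv2d_same(x, Wx['i']) + conv2d_same(h, Wh['i']))  # input gate
    f = sigmoid(conv2d_same(x, Wx['f']) + conv2d_same(h, Wh['f']))  # forget gate
    o = sigmoid(conv2d_same(x, Wx['o']) + conv2d_same(h, Wh['o']))  # output gate
    g = np.tanh(conv2d_same(x, Wx['g']) + conv2d_same(h, Wh['g']))  # candidate
    c_new = f * c + i * g          # convex blend of old cell state and candidate
    h_new = o * np.tanh(c_new)     # gated hidden state, same spatial size as x
    return h_new, c_new
```

Feeding the local region maps (ordered by brightness band) as successive `x` inputs is what lets the recurrent state accumulate the correlation between local bands and the global luminance distribution.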
The encoded feature map was transferred into the decoder to reconstruct the HDR image. We employed the FSRCNN, which was constructed using feature extraction, shrinking, mapping, expanding, and deconvolution processes [
The proposed network model, which was constructed using the SParam-Net and ConvLSTM-Net, was jointly learned for HDR reconstruction from a single image. SParam-Net was learned to estimate the inverse tone mapping function. The weight values of SParam-Net were used as the initial weight values of the proposed model during the learning process. The final loss function was defined as
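The exact joint objective is given by the (omitted) equation. Purely as an illustration of the structure described above, a hypothetical weighted combination of L2 terms, with the reconstruction term and a pseudo-HDR (SParam-Net) term weighted by α and β, might look like this; the paper's actual formula may differ.

```python
import numpy as np

def l2_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

def joint_loss(hdr_pred, hdr_gt, pseudo_hdr, alpha=1.0, beta=0.2):
    """Hypothetical joint objective: an L2 reconstruction term plus a
    beta-weighted L2 term tying the pseudo-HDR image to the ground truth.
    The role of alpha and beta here is an assumption, not the paper's
    definition."""
    return alpha * l2_loss(hdr_pred, hdr_gt) + beta * l2_loss(pseudo_hdr, hdr_gt)
```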
In this experiment, HDR-Real [
The LDR images were represented with the dynamic range of an 8-bit pixel (i.e., [0, 255]). Because different HDR images have different dynamic ranges and maximum values, normalizing the dataset is generally required for effective learning. Moreover, if the luminance values of an HDR image are normalized by its maximum value into the range [0, 1], many pixels end up with near-zero values, because an HDR image has a large dynamic range and its bright regions are generally small. The network then finds it difficult to learn to reconstruct HDR images whose pixel values are very small. Therefore, prior to training our network model, the average pixel value of each HDR image was normalized to 0.5.
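This mean-based normalization can be sketched in one line; unlike max-based normalization, it is not dominated by a few bright outlier pixels:

```python
import numpy as np

def normalize_hdr_mean(hdr, target_mean=0.5):
    """Scale an HDR image so its average pixel value equals target_mean,
    instead of dividing by the (often outlier-dominated) maximum."""
    return hdr * (target_mean / hdr.mean())
```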
There are many image quality assessment methods for the evaluation of the performance of inverse tone mappings [
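Of the measures used later in the evaluation, PSNR is the simplest to state; a minimal sketch for images scaled to a known peak value:

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM and HDR-VDP-2 additionally model structural and perceptual similarity and are typically computed with dedicated library implementations rather than a few lines of code.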
To train and test our proposed deep learning model, we conducted experiments on a computer equipped with a Core i5-9600K CPU, 16 GB RAM, and a GeForce GTX 1080 Ti GPU, using the Python language and the PyTorch deep learning library. Our network model was implemented and run on this commodity computer system, considering the computation and memory complexity. We trained our model with mini-batch stochastic gradient descent using a batch size of four and the Adam optimizer (
In our experiment, the proposed network model converged after training for 50 epochs. SParam-Net was trained to generate the global inverse tone mapping from the local region maps. The trained weight parameters of SParam-Net were used as the initial values for the joint learning of SParam-Net and ConvLSTM-Net. It took 10 min to train SParam-Net and 18 min per epoch for the joint training of our model (SParam-Net and ConvLSTM-Net). The inference time of our network model is approximately 76 ms for 256 × 256 images and 262 ms for 512 × 512 images.
First, the luminance distribution of the ground-truth HDR image was compared to that of the pseudo-HDR image generated by the inverse tone mapping function estimated with SParam-Net. Because these luminance values are derived from the color channels, the pixel values could be evaluated without considering additional chromatic components [
The proposed model was compared to contemporary deep learning-based methods: Expand-Net [
| Methods | HDR-VDP-2 | PSNR | SSIM |
|---|---|---|---|
| ExpandNet [ | 56.11 ± 5.37 | 20.91 | 0.8221 |
| DrTMO [ | 56.39 ± 5.09 | 20.54 | 0.8221 |
| HDRCNN [ | 55.95 ± 5.96 | 18.95 | 0.7939 |
| SingleHDR [ | 58.83 ± 4.96 | 25.73 | 0.8932 |
| Our model | 57.91 ± 5.74 | 23.84 | 0.8547 |
| | 57.81 ± 5.63 | 23.73 | 0.8521 |
| | 57.77 ± 5.73 | 23.78 | 0.8524 |
| | 57.90 ± 5.71 | 23.70 | 0.8521 |
| | 57.30 ± 5.65 | 23.20 | 0.8506 |
Compared with ExpandNet, DrTMO, and HDRCNN, the proposed method achieved better scores on all three measures, while SingleHDR performed somewhat better than our model. SingleHDR utilizes three CNNs for the sub-tasks of dequantization, linearization, and hallucination [
| Our model (α = 1, β = 0.2) | HDR-VDP-2 | PSNR | SSIM |
|---|---|---|---|
| | 57.84 ± 5.73 | 23.55 | 0.8498 |
| | 57.67 ± 5.74 | 23.43 | 0.8352 |
| | 57.91 ± 5.74 | 23.84 | 0.8547 |
| | 57.67 ± 5.70 | 23.52 | 0.8537 |
| | 57.89 ± 5.69 | 23.88 | 0.8501 |
The following is a summary of the main contributions of this paper: 1) The local region maps, segmented based on the cumulative brightness distribution, are used to estimate an inverse tone mapping function. 2) The globally tone-mapped local region maps are used as input sequences to the LSTM model for HDR image reconstruction. In our network model, the correlation between the brightness distribution in the local region maps and the global luminance distribution of the pseudo-HDR image can be encoded. However, the proposed model still encounters difficulty in reconstructing over- and/or under-exposed regions. Therefore, we need to further consider the scattered noise and contouring artifacts that often occur in the quantization process.
In this paper, we presented a deep learning model for reconstructing an HDR image from a single LDR image. SParam-Net was used to estimate the inverse tone mapping function that generates a pseudo-HDR image. Both the pseudo-HDR image and the local region maps segmented using the cumulative histogram were inputted sequentially into the convolutional LSTM. The weights obtained from SParam-Net were transferred for the joint learning of the end-to-end reconstruction model. It was demonstrated that, based on the order of the brightness values obtained from the LDR image, the local region maps can be used effectively in the convolutional LSTM to learn the relationship between LDR and HDR images. The model can therefore sequentially learn the details of the brightness bands while considering the global luminance distribution. The proposed model was compared to contemporary deep learning-based methods using the HDR-VDP-2, PSNR, and SSIM measures. The experimental results show that the proposed deep learning model can reconstruct an HDR image from a single LDR image more reliably than contemporary end-to-end methods. However, the proposed model still encounters difficulty in reconstructing over- and/or under-exposed regions; the scattered noise and contouring artifacts that often occur in the quantization process require further consideration.