FS-MSFormer: Image Dehazing Based on Frequency Selection and Multi-Branch Efficient Transformer
Chunming Tang*, Yu Wang
Tianjin Key Laboratory of Intelligent Control for Electrical Equipment, School of Artificial Intelligence, Tiangong University, Tianjin, 300387, China
* Corresponding Author: Chunming Tang. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.062328
Received 16 December 2024; Accepted 06 March 2025; Published online 31 March 2025
Abstract
Image dehazing aims to recover clear images that are critical for subsequent visual tasks. Convolutional neural networks (CNNs) have made significant progress in image dehazing; however, due to the inherent limitations of convolution operations, they struggle to model global context and long-range spatial dependencies effectively. Although the Transformer can address this issue, it suffers from excessive computational cost. We therefore propose the FS-MSFormer network, an asymmetric encoder-decoder architecture that combines the advantages of CNNs and Transformers to improve dehazing performance. Specifically, the encoder employs two branches for multi-scale feature extraction: one branch integrates an improved Transformer that enriches local and global contextual information while achieving linear complexity, and the other dynamically selects the most suitable frequency components in the frequency domain for enhancement. A single decoding branch performs feature recovery. After the local and global features are enhanced, they are fused with the encoded features, which reduces information loss and improves the model's robustness. A perceptual consistency loss function is also designed to minimize image color distortion. Experiments on the synthetic datasets SOTS-Indoor and Foggy Cityscapes and on the real-world dataset Dense-Haze show improved dehazing results. Compared with FSNet, our method improves PSNR by 0.95 dB and SSIM by 0.007 on the SOTS-Indoor dataset, and by 1.89 dB and 0.0579 on the Dense-Haze dataset, demonstrating its effectiveness.
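To make the frequency-selection idea concrete, the following is a minimal sketch (not the paper's actual module) of selecting dominant frequency components of a feature map: transform to the frequency domain, keep only the highest-magnitude components, and transform back. The function name `frequency_select` and the `keep_ratio` parameter are illustrative assumptions; the paper's branch learns its selection dynamically rather than by a fixed magnitude threshold.

```python
import numpy as np

def frequency_select(feat, keep_ratio=0.25):
    """Hypothetical sketch of frequency-domain selection:
    retain the top `keep_ratio` fraction of frequency components
    by magnitude and suppress the rest, then invert the transform.
    """
    spec = np.fft.fft2(feat)                    # 2D spectrum of the feature map
    mag = np.abs(spec)
    k = max(1, int(keep_ratio * mag.size))      # number of components to keep
    thresh = np.partition(mag.ravel(), -k)[-k]  # k-th largest magnitude
    mask = mag >= thresh                        # binary frequency mask
    return np.real(np.fft.ifft2(spec * mask))   # back to the spatial domain

feat = np.random.rand(8, 8)
out = frequency_select(feat, keep_ratio=0.5)
```

In a learned variant, the binary mask would be replaced by weights predicted from the input, so the network can emphasize whichever frequency bands best aid dehazing.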
Keywords
Asymmetric encoder-decoder architecture; perceptual consistency loss; unified Transformer