Speech Enhancement via Mask-Mapping Based Residual Dense Network

Lin Zhou¹,*, Xijin Chen¹, Chaoyan Wu¹, Qiuyue Zhong¹, Xu Cheng², Yibin Tang³

1 School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
2 Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, FI-90014, Finland
3 College of IOT Engineering, Hohai University, Changzhou, 213022, China

* Corresponding Author: Lin Zhou. Email: email

Computers, Materials & Continua 2023, 74(1), 1259-1277. https://doi.org/10.32604/cmc.2023.027379

Received 16 January 2022; Accepted 06 April 2022; Issue published 22 September 2022

Abstract

Masking-based and spectrum mapping-based methods are the two main classes of speech enhancement algorithms built on deep neural networks (DNNs). Mapping-based methods, however, use only the phase of the noisy speech, which limits the upper bound of enhancement performance, while masking-based methods depend on accurate estimation of the mask, which remains the key problem. Combining the advantages of both types of methods, this paper proposes MM-RDN (masking-mapping residual dense network), a speech enhancement algorithm based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio mask (IRM) matrix of those frames. The RDN makes full use of the feature maps of all layers. In addition, by using global residual learning to combine shallow and deep features, the RDN obtains global dense features from the LPS and thereby improves the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in perceptual evaluation of speech quality (PESQ) and other evaluation metrics. This indicates that the proposed algorithm generalizes better to untrained conditions.
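
To make the two quantities named in the abstract concrete, the following is a minimal sketch of how an LPS feature and an IRM target can be computed from magnitude spectrograms. It is not the authors' implementation. The IRM form sqrt(S² / (S² + N²)), the natural logarithm, the constant `EPS`, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify them.

```python
import math

EPS = 1e-12  # assumed small constant guarding against log(0) and division by zero


def log_power_spectrum(magnitude):
    """Return the LPS of a magnitude spectrogram (list of frames x frequency bins)."""
    return [[math.log(m * m + EPS) for m in frame] for frame in magnitude]


def ideal_ratio_mask(clean_mag, noise_mag):
    """Return an IRM per time-frequency bin, assuming sqrt(S^2 / (S^2 + N^2))."""
    return [
        [math.sqrt(s * s / (s * s + n * n + EPS)) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(clean_mag, noise_mag)
    ]


def apply_mask(noisy_mag, mask):
    """Scale each noisy magnitude bin by its mask value."""
    return [[m * g for m, g in zip(m_frame, g_frame)]
            for m_frame, g_frame in zip(noisy_mag, mask)]


# Toy spectrograms with 2 frames and 3 frequency bins (illustrative values only).
clean = [[1.0, 0.5, 0.0], [2.0, 1.0, 0.1]]
noise = [[0.0, 0.5, 1.0], [1.0, 1.0, 0.1]]

# Assumption: speech and noise are uncorrelated, so |Y|^2 is about S^2 + N^2.
noisy = [[math.sqrt(s * s + n * n) for s, n in zip(sf, nf)]
         for sf, nf in zip(clean, noise)]

lps = log_power_spectrum(noisy)       # network input feature in this sketch
irm = ideal_ratio_mask(clean, noise)  # training target in this sketch
enhanced = apply_mask(noisy, irm)     # recovers the clean magnitude under the assumption
```

In a real system the mask would be predicted by the network from the LPS of consecutive frames rather than computed from the clean and noise signals, which are only available when building training targets.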

Keywords

Mask-mapping-based method; residual dense block; speech enhancement

Cite This Article

APA Style

Zhou, L., Chen, X., Wu, C., Zhong, Q., Cheng, X. et al. (2023). Speech Enhancement via Mask-Mapping Based Residual Dense Network. Computers, Materials & Continua, 74(1), 1259–1277. https://doi.org/10.32604/cmc.2023.027379

Vancouver Style

Zhou L, Chen X, Wu C, Zhong Q, Cheng X, Tang Y. Speech Enhancement via Mask-Mapping Based Residual Dense Network. Comput Mater Contin. 2023;74(1):1259–1277. https://doi.org/10.32604/cmc.2023.027379

IEEE Style

L. Zhou, X. Chen, C. Wu, Q. Zhong, X. Cheng, and Y. Tang, “Speech Enhancement via Mask-Mapping Based Residual Dense Network,” Comput. Mater. Contin., vol. 74, no. 1, pp. 1259–1277, 2023. https://doi.org/10.32604/cmc.2023.027379


Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
