Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks

Lin Zhou; Siyuan Lu; Qiuyue Zhong; Ying Chen; Yibin Tang; Yan Zhou

doi:10.32604/cmc.2020.010182


ARTICLE

Lin Zhou¹,*, Siyuan Lu¹, Qiuyue Zhong¹, Ying Chen¹,², Yibin Tang³, Yan Zhou³

1 School of Information Science and Engineering, Southeast University, Nanjing, 210096, China.
2 Department of Psychiatry, Columbia University and NYSPI, New York, 10032, USA.
3 College of Internet of Things Engineering, Hohai University, Changzhou, 213022, China.

* Corresponding Author: Lin Zhou.

Computers, Materials & Continua 2020, 63(3), 1373-1386. https://doi.org/10.32604/cmc.2020.010182

Received 15 February 2020; Accepted 28 February 2020; Issue published 30 April 2020

Abstract

Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers are often stationary or move slowly during normal communication. In this case, the spatial features of consecutive speech frames are highly correlated, and this correlation provides additional spatial information that helps speaker separation. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of spatial features. In detail, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, in which a modified ideal ratio mask (IRM) is defined as the training target for LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. Specifically, in tests under untrained acoustic conditions with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing DNN-based method in terms of PESQ and STOI. This indicates that our method is more robust in untrained conditions.
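As a rough illustration of the mask-based training target mentioned in the abstract, the sketch below computes the commonly used textbook form of the ideal ratio mask for each TF unit, i.e. the ratio of target power to total power raised to an exponent. This is not the modified IRM of the paper, whose definition is not given on this page; the function name `ideal_ratio_mask`, the exponent `beta = 0.5`, and the toy power values are all assumptions made for illustration.

```python
import numpy as np

def ideal_ratio_mask(target_power, interference_power, beta=0.5, eps=1e-12):
    """Textbook IRM per TF unit: (S / (S + N)) ** beta.

    Assumption: this is the standard definition, not the paper's
    modified IRM. Inputs are power spectrograms of the same shape.
    """
    target_power = np.asarray(target_power, dtype=float)
    interference_power = np.asarray(interference_power, dtype=float)
    # eps avoids division by zero in TF units that contain no energy.
    ratio = target_power / (target_power + interference_power + eps)
    return ratio ** beta

# Toy power spectrograms (2 frequency bins x 3 frames), made-up values.
target = np.array([[4.0, 1.0, 0.0],
                   [9.0, 1.0, 2.0]])
interf = np.array([[0.0, 1.0, 3.0],
                   [7.0, 3.0, 2.0]])

mask = ideal_ratio_mask(target, interf)
print(np.round(mask, 3))
```

Each mask value lies between 0 and 1: it approaches 1 where the target dominates a TF unit and 0 where the interference dominates. In a supervised setup of the kind the abstract describes, a network would be trained to predict such a mask from the per-unit feature vectors.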

Keywords

Binaural speech separation, long and short time memory networks, feature vectors, ideal ratio mask.

Cite This Article

APA Style

Zhou, L., Lu, S., Zhong, Q., Chen, Y., Tang, Y. et al. (2020). Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks. Computers, Materials & Continua, 63(3), 1373–1386. https://doi.org/10.32604/cmc.2020.010182

Vancouver Style

Zhou L, Lu S, Zhong Q, Chen Y, Tang Y, Zhou Y. Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks. Comput Mater Contin. 2020;63(3):1373–1386. https://doi.org/10.32604/cmc.2020.010182

IEEE Style

L. Zhou, S. Lu, Q. Zhong, Y. Chen, Y. Tang, and Y. Zhou, “Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks,” Comput. Mater. Contin., vol. 63, no. 3, pp. 1373–1386, 2020. https://doi.org/10.32604/cmc.2020.010182

Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.