A New Speech Encoder Based on Dynamic Framing Approach

Renyuan Liu; Jian Yang; Xiaobing Zhou; Xiaoguang Yue

doi:10.32604/cmes.2023.021995

Open Access

ARTICLE


Renyuan Liu¹, Jian Yang¹, Xiaobing Zhou¹,*, Xiaoguang Yue²,³,⁴

1 School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
2 Rattanakosin International College of Creative Entrepreneurship, Rajamangala University of Technology Rattanakosin, Nakhon Pathom, 73170, Thailand
3 Department of Computer Science and Engineering, School of Sciences, European University Cyprus, Nicosia, 1516, Cyprus
4 CIICESI, ESTG, Politécnico do Porto, Felgueiras, 4610-156, Portugal

* Corresponding Author: Xiaobing Zhou.

Computer Modeling in Engineering & Sciences 2023, 136(2), 1259-1276. https://doi.org/10.32604/cmes.2023.021995

Received 16 February 2022; Accepted 26 September 2022; Issue published 06 February 2023

Abstract

Latent information is difficult to obtain from text alone in speech synthesis. Studies show that features extracted from speech carry additional information that can help text encoding. Existing work on speech encoding falls into two groups: encoding speech frame by frame, and encoding a whole utterance into a single vector. In both cases the encoding scale is fixed, so encoding speech at an adjustable scale to capture more latent information is worth investigating. Current alignment approaches, however, support only frame-by-frame and speech-to-vector encoding, and designing an alignment approach that supports adjustable-scale speech encoding remains a challenge. This paper presents a dynamic speech encoder with a new alignment approach that works in conjunction with frame-by-frame encoding and speech-to-vector encoding. The speech feature produced by our model serves three functions. First, it can reconstruct the original speech while its length equals the text length. Second, the model can obtain a text embedding from speech, and the encoded speech feature is similar to the text embedding result. Finally, it can transfer the style of synthesized speech so that the output is more similar to a given reference speech.
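To make the notion of encoding scale concrete, the following is a minimal illustrative sketch and not the authors' model. It assumes frame features are already available as a list of vectors and uses uniform segment boundaries with mean pooling as a stand-in for a learned alignment; the function name `pool_frames` and the toy data are hypothetical. Setting the number of segments to the frame count corresponds to the frame-by-frame scale, setting it to 1 corresponds to the speech-to-vector scale, and setting it to the text length gives a speech feature whose length equals the text length.

```python
from typing import List

Frame = List[float]


def pool_frames(frames: List[Frame], num_segments: int) -> List[Frame]:
    """Mean-pool T frame vectors into num_segments vectors.

    Assumption: segment boundaries are uniform. A learned alignment
    would choose the boundaries instead.
    """
    total = len(frames)
    if not 1 <= num_segments <= total:
        raise ValueError("num_segments must be between 1 and the number of frames")
    dim = len(frames[0])
    pooled = []
    for seg in range(num_segments):
        # Integer arithmetic keeps every segment non-empty when num_segments <= total.
        start = seg * total // num_segments
        end = (seg + 1) * total // num_segments
        chunk = frames[start:end]
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


# Toy input: 6 frames with 2 feature dimensions each (hypothetical values).
frames = [[float(t), float(2 * t)] for t in range(6)]

per_frame = pool_frames(frames, len(frames))  # frame-by-frame scale: 6 vectors
per_utterance = pool_frames(frames, 1)        # speech-to-vector scale: 1 vector
per_token = pool_frames(frames, 3)            # adjustable scale: e.g. a 3-token text
```

In this sketch only the number of output vectors changes between the three calls, which is the sense in which the scale is adjustable; how the boundaries are chosen is left open here.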

Keywords

Speech synthesis; dynamic framing convolution network; speech encoding

Cite This Article

APA Style

Liu, R., Yang, J., Zhou, X., Yue, X. (2023). A New Speech Encoder Based on Dynamic Framing Approach. Computer Modeling in Engineering & Sciences, 136(2), 1259–1276. https://doi.org/10.32604/cmes.2023.021995

Vancouver Style

Liu R, Yang J, Zhou X, Yue X. A New Speech Encoder Based on Dynamic Framing Approach. Comput Model Eng Sci. 2023;136(2):1259–1276. https://doi.org/10.32604/cmes.2023.021995

IEEE Style

R. Liu, J. Yang, X. Zhou, and X. Yue, “A New Speech Encoder Based on Dynamic Framing Approach,” Comput. Model. Eng. Sci., vol. 136, no. 2, pp. 1259–1276, 2023. https://doi.org/10.32604/cmes.2023.021995


Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
