Renyuan Liu¹, Jian Yang¹, Xiaobing Zhou¹,*, Xiaoguang Yue²,³,⁴
CMES-Computer Modeling in Engineering & Sciences, Vol.136, No.2, pp. 1259-1276, 2023, DOI:10.32604/cmes.2023.021995
- 06 February 2023
Abstract Latent information is difficult to extract from text in speech synthesis. Studies show that features derived from speech carry additional information that helps text encoding. Work on speech encoding has followed two approaches: encoding speech frame by frame, and encoding the whole utterance into a single vector. In both approaches the scale is fixed, so encoding speech at an adjustable scale to capture more latent information is worth investigating. However, current alignment approaches only support frame-by-frame encoding…