Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Le, Thanh X.; Le, An T.; Nguyen, Quang H.

doi:10.32604/csse.2023.026234

Open Access

School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, 10000, Vietnam

Corresponding Author: Quang H. Nguyen.

Computer Systems Science and Engineering 2023, 44(2), 1263-1278. https://doi.org/10.32604/csse.2023.026234

Received 19 December 2021; Accepted 21 February 2022; Issue published 15 June 2022

Abstract

In recent years, speech synthesis systems have become capable of producing very high-quality voices, so research in this domain is now turning to the problem of integrating emotions into speech. However, building a separate speech synthesizer for each emotion has several limitations. First, this approach usually requires an emotional-speech data set with many sentences, and such data sets are time- and labor-intensive to build. Second, training each model requires substantial computing resources as well as considerable effort and time for tuning. Third, a model trained for one emotion cannot take advantage of the data sets for other emotions. In this paper, we propose a new method for synthesizing emotional speech in which a Flowtron model learns the latent expressions of emotions from a small data set recorded by professional actors. We also provide a new method for building a speech corpus that is scalable and whose quality is easy to control. To obtain a high-quality speech synthesis model, we trained Tacotron 2 on this corpus and then used it as the pre-trained model for training Flowtron. We applied this method to synthesize Vietnamese speech expressing sadness and happiness. Mean opinion score (MOS) evaluations yield 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves effective, offering a high degree of automation and fast generation of emotional sentences from a small emotional-speech data set.
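The MOS values above are arithmetic means of listener ratings, conventionally collected on a 1 to 5 absolute category rating scale. The abstract does not say how many listeners took part or whether confidence intervals were reported, so the following is only a minimal sketch, assuming that five-point scale and a normal-approximation 95% confidence interval. The function name `mos_with_ci` and the ratings are hypothetical illustrations, not the paper's code or data.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of a normal-approximation confidence interval).

    Hypothetical helper: assumes `ratings` are listener scores on a 1-5 scale
    and that z=1.96 (about 95% coverage) is an acceptable approximation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    m = mean(ratings)  # the MOS itself is just the mean rating
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # z * standard error
    return m, half_width


# Hypothetical ratings for one emotion; NOT taken from the paper.
happy = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
mos, ci = mos_with_ci(happy)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a gap such as 3.61 versus 3.95 is larger than the listener-to-listener variation.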

Keywords

Emotional speech synthesis; Flowtron; speech synthesis; style transfer; Vietnamese speech

Cite This Article

APA Style

Le, T. X., Le, A. T., & Nguyen, Q. H. (2023). Emotional Vietnamese speech synthesis using style-transfer learning. Computer Systems Science and Engineering, 44(2), 1263-1278. https://doi.org/10.32604/csse.2023.026234

Vancouver Style

Le TX, Le AT, Nguyen QH. Emotional Vietnamese speech synthesis using style-transfer learning. Comput Syst Sci Eng. 2023;44(2):1263-78. https://doi.org/10.32604/csse.2023.026234

IEEE Style

T. X. Le, A. T. Le, and Q. H. Nguyen, “Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning,” Comput. Syst. Sci. Eng., vol. 44, no. 2, pp. 1263-1278, 2023. https://doi.org/10.32604/csse.2023.026234


Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.