Integrating Deep Learning and Machine Translation for Understanding Unrefined Languages

HongGeun Ji; Soyoung Oh; Jina Kim; Seong Choi; Eunil Park

doi:10.32604/cmc.2022.019521

Open Access

ARTICLE

HongGeun Ji^1,2, Soyoung Oh^1, Jina Kim^3, Seong Choi^1,2, Eunil Park^1,2,*

1 Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, 03063, Korea
2 Raon Data, Seoul, 03073, Korea
3 Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 55455, MN, USA

* Corresponding Author: Eunil Park. Email: email

Computers, Materials & Continua 2022, 70(1), 669-678. https://doi.org/10.32604/cmc.2022.019521

Received 16 April 2021; Accepted 23 May 2021; Issue published 07 September 2021

Abstract

In natural language processing (NLP), advances in neural machine translation have paved the way for cross-lingual research. Yet most NLP studies have evaluated their proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, in particular chat messages on Twitch. To address this question, we collected a dataset of 7,066,854 chat messages from English streams and 3,365,569 chat messages from Korean streams. We employed several machine learning classifiers and neural networks with two types of embedding: word-sequence embedding and the final layer of a pre-trained language model. Across the employed models, the accuracy difference between the English data and its Korean translation (English to Korean) was relatively high, ranging from 3% to 12%, whereas the difference between the Korean data and its English translation (Korean to English) ranged from 0% to 2%. These results imply that translating from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) yields higher performance than translating in the opposite direction. Several implications and limitations of the presented results are also discussed. For instance, we suggest that translating from resource-poor languages is a feasible way to apply the tools of resource-rich languages in further analysis.

Keywords

Twitch; multilingual; machine translation; machine learning

Cite This Article

APA Style

Ji, H., Oh, S., Kim, J., Choi, S., Park, E. (2022). Integrating Deep Learning and Machine Translation for Understanding Unrefined Languages. Computers, Materials & Continua, 70(1), 669–678. https://doi.org/10.32604/cmc.2022.019521

Vancouver Style

Ji H, Oh S, Kim J, Choi S, Park E. Integrating Deep Learning and Machine Translation for Understanding Unrefined Languages. Comput Mater Contin. 2022;70(1):669–678. https://doi.org/10.32604/cmc.2022.019521

IEEE Style

H. Ji, S. Oh, J. Kim, S. Choi, and E. Park, “Integrating Deep Learning and Machine Translation for Understanding Unrefined Languages,” Comput. Mater. Contin., vol. 70, no. 1, pp. 669–678, 2022. https://doi.org/10.32604/cmc.2022.019521

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
