Deep Learning Multimodal for Unstructured and Semi-Structured Textual Documents Classification

Katamesh, Nany; Abu-Elnasr, Osama; Elmougy, Samir

doi:10.32604/cmc.2021.015761

Open Access

ARTICLE

by Nany Katamesh, Osama Abu-Elnasr*, Samir Elmougy

Faculty of Computers and Information, Department of Computer Science, Mansoura University, 35516, Egypt

* Corresponding Author: Osama Abu-Elnasr. Email: email

(This article belongs to the Special Issue: Deep Learning Trends in Intelligent Systems)

Computers, Materials & Continua 2021, 68(1), 589-606. https://doi.org/10.32604/cmc.2021.015761

Received 05 December 2020; Accepted 23 January 2021; Issue published 22 March 2021

Abstract

Because a huge number of electronic text documents representing unstructured and semi-structured information is available from a variety of sources, document classification has become an important area for controlling data behavior. This paper presents a document classification multimodal for categorizing unstructured and semi-structured textual documents. The multimodal implements several individual deep learning models, including Deep Neural Networks (DNN), Recurrent Convolutional Neural Networks (RCNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. A stacked-ensemble meta-model technique combines the outputs of these individual classifiers to produce better results than any of them achieves individually. A series of textual preprocessing steps normalizes the input corpus, followed by a text vectorization step that uses either Term Frequency-Inverse Document Frequency (TF-IDF) or Continuous Bag of Words (CBOW) to convert the text into a numeric form that deep learning models can process. The proposed model is validated on a dataset collected from several sources, with a large number of documents in every class, and the experimental results show that it performs effectively. For PDF document classification, the proposed model achieves accuracy of up to 0.9045 with TF-IDF features and 0.959 with CBOW features. For JSON document classification, it achieves accuracy of up to 0.914 and 0.956 with TF-IDF and CBOW features, respectively. For XML document classification, it achieves accuracy of up to 0.92 and 0.959 with TF-IDF and CBOW features, respectively.
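The TF-IDF vectorization step named in the abstract weights each term by how often it occurs in a document, discounted by how many documents contain it: tfidf(t, d) = tf(t, d) · ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The following is a minimal pure-Python sketch, assuming relative term frequency and an unsmoothed natural-log IDF; the paper does not state which TF-IDF variant it uses, and the function name `tfidf` and the sample tokens are hypothetical. Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, smooths the IDF and L2-normalizes each vector by default, so its values will not match these.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-preprocessed (tokenized, lowercased) documents.
docs = [
    ["the", "invoice", "total"],
    ["the", "invoice", "date"],
    ["the", "xml", "schema"],
]
vectors = tfidf(docs)
# "the" appears in every document, so its weight is 0.0 everywhere.
print(vectors[2])
```

A term that occurs in every document gets an IDF of ln(1) = 0 and contributes nothing, while a term confined to a single document, such as "xml" here, receives the largest weight.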
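The stacked-ensemble meta-model described in the abstract can be illustrated with a minimal sketch. It assumes, as in standard stacking, that each base model (standing in for the DNN, RCNN and Bi-LSTM) emits a class-probability vector for documents it was not trained on (out-of-fold predictions), that these vectors are concatenated into one meta-feature vector per document, and that a multinomial logistic regression serves as the meta-learner. The abstract does not specify the meta-learner, so logistic regression is an assumption; the helper names (`stack_features`, `train_meta_model`, `predict`) and all probability values are hypothetical toy data, not results from the paper.

```python
import math


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def stack_features(base_probs: list[list[float]]) -> list[float]:
    """Concatenate each base model's class-probability vector into one meta-feature vector."""
    return [p for probs in base_probs for p in probs]


def train_meta_model(X: list[list[float]], y: list[int], n_classes: int,
                     lr: float = 0.5, epochs: int = 500) -> list[list[float]]:
    """Fit a multinomial logistic-regression meta-learner by full-batch gradient descent.

    Returns W[class][feature]; the last column is the bias weight.
    """
    n_feat = len(X[0]) + 1  # +1 for the constant bias input
    W = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_feat for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = x + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                # d(cross-entropy)/d(logit_k) = p_k - 1[k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_feat):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(n_feat):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W: list[list[float]], x: list[float]) -> int:
    """Return the class index with the highest meta-model score."""
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical out-of-fold probabilities from three base models
# (DNN, RCNN, Bi-LSTM) for six documents over three classes; toy values only.
oof_probs = [
    ([0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]),
    ([0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.7, 0.2, 0.1]),  # RCNN disagrees
    ([0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]),
    ([0.3, 0.6, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1]),  # Bi-LSTM disagrees
    ([0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]),
    ([0.1, 0.3, 0.6], [0.2, 0.2, 0.6], [0.1, 0.5, 0.4]),  # Bi-LSTM disagrees
]
labels = [0, 0, 1, 1, 2, 2]

X = [stack_features(list(p)) for p in oof_probs]
W = train_meta_model(X, labels, n_classes=3)
print([predict(W, x) for x in X])
```

Training the meta-learner on out-of-fold rather than in-sample predictions matters: base models tend to be overconfident on their own training data, and a meta-learner fitted to those predictions would learn to trust them too much.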

Keywords

Document classification; deep learning; text vectorization; convolutional neural network; bi-directional neural network; stacked ensemble

Cite This Article

APA Style

Katamesh, N., Abu-Elnasr, O., & Elmougy, S. (2021). Deep learning multimodal for unstructured and semi-structured textual documents classification. Computers, Materials & Continua, 68(1), 589-606. https://doi.org/10.32604/cmc.2021.015761

Vancouver Style

Katamesh N, Abu-Elnasr O, Elmougy S. Deep learning multimodal for unstructured and semi-structured textual documents classification. Comput Mater Contin. 2021;68(1):589-606. https://doi.org/10.32604/cmc.2021.015761

IEEE Style

N. Katamesh, O. Abu-Elnasr, and S. Elmougy, “Deep Learning Multimodal for Unstructured and Semi-Structured Textual Documents Classification,” Comput. Mater. Contin., vol. 68, no. 1, pp. 589-606, 2021. https://doi.org/10.32604/cmc.2021.015761

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.