Open Access

ARTICLE

LAME: Layout-Aware Metadata Extraction Approach for Research Articles

by Jongyun Choi¹, Hyesoo Kong², Hwamook Yoon², Heungseon Oh³, Yuchul Jung¹,*

¹ Department of Computer Engineering, Kumoh National Institute of Technology (KIT), Gumi, Korea
² Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea
³ School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH), Cheonan, Korea

* Corresponding Author: Yuchul Jung.

Computers, Materials & Continua 2022, 72(2), 4019-4037. https://doi.org/10.32604/cmc.2022.025711

Abstract

The volume of academic literature, such as conference papers and journal articles, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction remains challenging because layout formats vary widely across journal publishers. To accommodate the diversity of academic journal layouts, we propose a novel LAyout-aware Metadata Extraction (LAME) framework with three components: the design of an automatic layout analysis, the construction of a large metadata training set, and the implementation of a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, covering the title, abstract, author names, author affiliations, and keywords, was automatically extracted. Moreover, we constructed a pre-trained model, Layout-MetaBERT, to extract metadata from academic journals with varying layout formats. In experiments, our metadata extractor achieved robust performance (Macro-F1, 93.27%) on unseen journals with different layout formats.
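The Macro-F1 figure reported above averages per-class F1 scores with equal weight, so a rare field such as keywords counts as much as a frequent one. The following is a minimal sketch of that metric over the five metadata fields named in the abstract. The function name `macro_f1`, the label strings, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter

# Hypothetical label set, taken from the fields listed in the abstract.
LABELS = ["title", "abstract", "author", "affiliation", "keywords"]


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores (single-label classification)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F1 is defined as 0 when precision and recall are both 0.
        scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
    return sum(scores) / len(scores)


# Toy example (fabricated for illustration only): one text block per entry.
gold = ["title", "author", "affiliation", "abstract", "keywords"]
pred = ["title", "author", "author", "abstract", "keywords"]
score = macro_f1(gold, pred, LABELS)
print(f"Macro-F1: {score:.4f}")
```

Because every class contributes equally, a single misclassified affiliation block in the toy data lowers the score noticeably even though four of five predictions are correct.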

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.