Jongyun Choi1, Hyesoo Kong2, Hwamook Yoon2, Heungseon Oh3, Yuchul Jung1,*
CMC-Computers, Materials & Continua, Vol.72, No.2, pp. 4019-4037, 2022, DOI:10.32604/cmc.2022.025711
- 29 March 2022
Abstract The volume of academic literature, such as conference papers and journal articles, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction remains challenging because layout formats vary widely across journal publishers. To accommodate the diversity of academic journal layouts, we propose a novel LAyout-aware Metadata Extraction (LAME) framework with three components: automatic layout analysis, construction of a large metadata training set, and a metadata extractor. In the framework, we designed an automatic layout analysis using PDFMiner. Based on the …
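As a rough illustration of what layout-aware processing of a first page can look like, the sketch below orders text blocks top to bottom and picks a title candidate from their geometry. It is not the LAME method: the `TextBox` type, the `reading_order` and `guess_title` helpers, and the `top_fraction` threshold are all hypothetical and chosen only for illustration. In practice, such boxes could be filled from the bounding boxes and text that a PDF layout tool such as PDFMiner reports for each text block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """One text block on a page (hypothetical type for this sketch).

    Coordinates are in PDF points with the origin at the bottom-left,
    so a larger y value means higher on the page.
    """
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def reading_order(boxes: List[TextBox]) -> List[TextBox]:
    """Sort blocks top to bottom, then left to right."""
    return sorted(boxes, key=lambda b: (-b.y1, b.x0))


def line_height(box: TextBox) -> float:
    """Approximate font size as block height divided by its line count."""
    lines = box.text.strip().count("\n") + 1
    return (box.y1 - box.y0) / lines


def guess_title(boxes: List[TextBox], page_height: float,
                top_fraction: float = 0.35) -> Optional[TextBox]:
    """Assumed heuristic: the title is the block with the largest
    per-line height among blocks reaching into the top part of the page."""
    cutoff = page_height * (1.0 - top_fraction)
    candidates = [b for b in boxes if b.y1 >= cutoff]
    if not candidates:
        return None
    return max(candidates, key=line_height)
```

A rule like this breaks down quickly across publishers with different layouts, which is the difficulty the abstract describes and the motivation for learning an extractor from a large training set instead of hand-writing rules.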