Information Classification and Extraction on Official Web Pages of Organizations

Jinlin Wang; Xing Wang; Hongli Zhang; Binxing Fang; Yuchen Yang; Jianan Liu

doi:10.32604/cmc.2020.011158

Open Access

ARTICLE


Jinlin Wang¹, Xing Wang¹,*, Hongli Zhang¹, Binxing Fang¹, Yuchen Yang¹, Jianan Liu²

1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150006, China.
2 China Electronic Equipment System Engineering Company, Beijing, 100039, China.

* Corresponding Author: Xing Wang.

Computers, Materials & Continua 2020, 64(3), 2057-2073. https://doi.org/10.32604/cmc.2020.011158

Received 30 April 2020; Accepted 15 May 2020; Issue published 30 June 2020


Abstract

As real-time and authoritative sources, the official Web pages of organizations contain a large amount of information. Because Web content and formats are diverse, pre-processing is essential to obtain unified, attributed data that is valuable for organizational analysis and mining. Existing research does not adequately handle multiple Web scenarios and falls short on accuracy. This paper proposes a method for transforming organizational official Web pages into attributed data. After the active blocks in the Web pages are located, structural and content features are used to classify the information with a specific model. Extraction methods based on a trigger lexicon and LSTM (Long Short-Term Memory) then efficiently process the classified information and extract the data that matches the attributes. Together, these steps form an accurate and efficient method for classifying and extracting information from organizational official Web pages. Experimental results on a real data set from organizational official Web pages show that the approach improves the performance indicators and exceeds the state of the art.
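The trigger-lexicon idea mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon contents, the attribute names, and the `extract_attributes` function are hypothetical, and the sketch assumes the simplest case, where a classified text block contains lines of the form "trigger: value". The LSTM-based extraction is not shown.

```python
import re

# Hypothetical trigger lexicon (illustrative only, not taken from the paper):
# each attribute name maps to the trigger words that may introduce its value.
TRIGGER_LEXICON = {
    "phone": ["tel", "phone", "telephone"],
    "email": ["email", "e-mail"],
    "address": ["address", "addr"],
}


def extract_attributes(block_text, lexicon=TRIGGER_LEXICON):
    """Return {attribute: value} for lines shaped like '<trigger>: <value>'."""
    found = {}
    for line in block_text.splitlines():
        for attribute, triggers in lexicon.items():
            # A trigger word at the start of the line, then ':' and the value.
            alternatives = "|".join(re.escape(t) for t in triggers)
            pattern = r"^\s*(?:" + alternatives + r")\s*:\s*(.+)$"
            match = re.match(pattern, line, flags=re.IGNORECASE)
            # Keep only the first value seen for each attribute.
            if match and attribute not in found:
                found[attribute] = match.group(1).strip()
    return found


if __name__ == "__main__":
    block = "Tel: 010-1234\nE-mail: info@example.org\nAbout us"
    print(extract_attributes(block))
```

Lines without a known trigger word, such as "About us" in the usage example, are ignored. Such free-text content is presumably where a learned model like the LSTM would be needed instead of a fixed lexicon.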

Keywords

Web pre-process, feature classification, data extraction, trigger lexicon, LSTM.


Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
