Jinlin Wang¹, Xing Wang¹,*, Hongli Zhang¹, Binxing Fang¹, Yuchen Yang¹, Jianan Liu²
CMC-Computers, Materials & Continua, Vol.64, No.3, pp. 2057-2073, 2020, DOI:10.32604/cmc.2020.011158
- 30 June 2020
Abstract: As real-time and authoritative sources, the official Web pages of organizations
contain a large amount of information. Because Web content and formats are diverse,
pre-processing is essential to obtain unified, attributed data, which is valuable for
organizational analysis and mining. Existing research does not adequately handle multiple
Web scenarios, and its accuracy is insufficient. This paper proposes a method to
transform official organizational Web pages into attributed data. After locating the
active blocks in the Web pages, the structural and content features are proposed to classify …