Open Access
ARTICLE
Webpage Matching Based on Visual Similarity
1 School of Cyberspace Science, Harbin Institute of Technology, Harbin, 150001, China
2 Department of Computer and Information Science, Temple University, Philadelphia, 42101, USA
* Corresponding Author: Xiangzhan Yu. Email:
Computers, Materials & Continua 2022, 71(2), 3393-3405. https://doi.org/10.32604/cmc.2022.017220
Received 24 January 2021; Accepted 03 July 2021; Issue published 07 December 2021
Abstract
With the rapid development of the Internet, webpages have become far more varied than in previous decades. At the same time, users face increasingly serious network security risks and heavy losses from phishing webpages, which imitate the interface of legitimate webpages to deceive victims. To better identify and distinguish phishing webpages, a visual feature extraction method and a visual similarity algorithm are proposed. First, the visual feature extraction method improves the Vision-based Page Segmentation (VIPS) algorithm to extract visual blocks and computes a signature for each block using perceptual hashing. Second, the visual similarity algorithm establishes a one-to-one correspondence between the visual blocks of two pages based on the blocks’ coordinates and thresholds. Weights are then assigned according to the tree structure, and the similarity of each pair of visual blocks is calculated from the Hamming distance between their visual features. The visual similarity of two webpages is obtained by combining the similarities and weights of the individual visual blocks. Finally, multiple pairs of phishing and legitimate webpages are evaluated to verify the feasibility of the algorithm. The experimental results show that the method achieves 94% accuracy.
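The final two steps of the pipeline (block similarity from Hamming distance, then a weighted combination into a page score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 64-bit signature length, the linear mapping from distance to similarity, the weighted average, and all function names are assumptions made for illustration, since the abstract does not give the exact formulas. Block matching by coordinates and weight assignment from the tree are assumed to have been done already.

```python
from typing import List, Tuple

HASH_BITS = 64  # assumed signature length; the abstract does not state it


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer signatures."""
    return bin(a ^ b).count("1")


def block_similarity(sig_a: int, sig_b: int, bits: int = HASH_BITS) -> float:
    """Map Hamming distance to [0, 1]; 1.0 means identical signatures.

    The linear mapping is an assumption, not the paper's stated formula.
    """
    return 1.0 - hamming_distance(sig_a, sig_b) / bits


def page_similarity(pairs: List[Tuple[int, int, float]]) -> float:
    """Weighted average of block similarities over matched block pairs.

    Each item is (signature_a, signature_b, weight), where the pairing and
    the weight are assumed to come from earlier steps of the pipeline.
    """
    total_weight = sum(w for _, _, w in pairs)
    if total_weight == 0:
        return 0.0
    weighted = sum(block_similarity(a, b) * w for a, b, w in pairs)
    return weighted / total_weight
```

In this sketch, a page pair whose heavily weighted blocks have near-identical signatures scores close to 1.0, which is the situation a phishing page imitating a legitimate one would be expected to produce.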
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.