On Multi-Thread Crawler Optimization for Scalable Text Searching

Guang Sun; Huanxin Xiang; Shuanghu Li

doi:10.32604/jbd.2019.07235

ARTICLE

Guang Sun¹, Huanxin Xiang², Shuanghu Li¹,*

1 Hunan University of Finance and Economics, Changsha, 410205, China.
2 The University of Alabama, Tuscaloosa, 35401, USA.

*Corresponding Author: Shuanghu Li.

Journal on Big Data 2019, 1(2), 89-106. https://doi.org/10.32604/jbd.2019.07235

Abstract

Web crawlers are an important part of modern search engines. Data volumes have grown explosively, and we have entered the era of big data. Wikipedia, for example, gathers knowledge from all over the world, records the news as it happens each day, and offers users a rich database; its sheer size, however, makes searching it a burden for users. Single-threaded crawling can no longer meet the requirements of text crawling at this scale. To improve on the performance and generality of single-threaded crawlers, a high-speed multi-threaded web crawler is designed for crawling hyper-scale text databases on the web. The crawler uses multiple threads to process web pages in parallel and combines breadth-first and depth-first algorithms to control the crawl. The accompanying project, written in Python, implements a multi-thread-optimized method for crawling books from Wikipedia, a hyper-large-scale online text database. The project was inspired by an article about Wikipedia published on the Big Data Digest official account.
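The abstract does not say how the breadth-first and depth-first strategies are combined, so the following Python sketch shows only one possible arrangement and should not be read as the authors' implementation. It assumes a breadth-first phase in which each level is fetched in parallel by a thread pool, followed by a depth-first phase in which each worker descends from one frontier page up to a depth limit. The link table `LINKS`, the helper `fetch_links`, and the parameters `bfs_depth`, `max_depth`, and `workers` are all hypothetical; `fetch_links` stands in for a real HTTP request plus link extraction.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical in-memory "web": page -> outgoing links (replaces real HTTP fetches).
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": ["A"],
}


def fetch_links(page):
    """Placeholder for downloading a page and extracting its links."""
    return LINKS.get(page, [])


def crawl(seed, bfs_depth=1, max_depth=3, workers=4):
    """Return the set of pages discovered from seed, up to max_depth links away."""
    seen = {seed}
    lock = threading.Lock()

    def claim(links):
        """Atomically mark unseen links as discovered and return only those."""
        fresh = []
        with lock:
            for url in links:
                if url not in seen:
                    seen.add(url)
                    fresh.append(url)
        return fresh

    def dfs(page, depth):
        """Depth-first descent from one page, run entirely inside one worker."""
        stack = [(page, depth)]
        while stack:
            current, d = stack.pop()
            if d >= max_depth:
                continue  # discovered, but its links are not followed
            for url in claim(fetch_links(current)):
                stack.append((url, d + 1))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        frontier = [seed]
        # Breadth-first phase: fetch every page of the current level in parallel.
        for _ in range(bfs_depth):
            next_frontier = []
            for links in pool.map(fetch_links, frontier):
                next_frontier.extend(claim(links))
            frontier = next_frontier
        # Depth-first phase: each remaining frontier page is explored by a worker.
        list(pool.map(lambda p: dfs(p, bfs_depth), frontier))
    return seen
```

The shared `seen` set is guarded by a lock so that two threads never claim the same page, which keeps the result independent of thread scheduling. In a real crawler the threads would spend most of their time waiting on network I/O, which is where this kind of parallelism is expected to help.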

Keywords

Multi-threading, text database, optimization, breadth-first search, depth-first search

Cite This Article

APA Style

Sun, G., Xiang, H., & Li, S. (2019). On Multi-Thread Crawler Optimization for Scalable Text Searching. Journal on Big Data, 1(2), 89–106. https://doi.org/10.32604/jbd.2019.07235

Vancouver Style

Sun G, Xiang H, Li S. On Multi-Thread Crawler Optimization for Scalable Text Searching. J Big Data. 2019;1(2):89–106. https://doi.org/10.32604/jbd.2019.07235

IEEE Style

G. Sun, H. Xiang, and S. Li, “On Multi-Thread Crawler Optimization for Scalable Text Searching,” J. Big Data, vol. 1, no. 2, pp. 89–106, 2019. https://doi.org/10.32604/jbd.2019.07235

Copyright © 2019 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
