As data grows in size, search engines face new challenges in extracting the most relevant content for users' searches. A number of retrieval and ranking algorithms have therefore been employed to ensure that results match users' requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages using a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher rankings. To address the issue of bias, it proposes a new ranking algorithm based on the PageRank (PR) algorithm, one of the most widely used page ranking algorithms, and presents weighted PageRank (WPR) algorithms to test the relationship between the various ranking measures. The Weighted Page Rank (WPR) model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings show that ranking final pages on multiple criteria is better than ranking on only one, and that some criteria have a greater impact on ranking results than others.
The World Wide Web (WWW) comprises billions of web pages containing massive amounts of data. Users rely on search engines to find useful information within this vast collection. Current search engines, however, do not fully satisfy the demand for high-quality information search services. This creates difficulties in retrieving information, and several ranking systems are used to navigate through the search results. Page rank algorithms are well recognised for ordering web pages, and ranking algorithms have evolved into a useful tool for sorting and retrieving relevant web pages based on the user's interests.
Making use of the wide range of data on the internet requires time and money to collect and analyse it. Extracting useful information from web data and producing beneficial content necessitates data extraction, transformation, and rework. Web crawlers have recently gained popularity as a method of extracting vital information from websites. Because their crawling behaviour resembles a spider crawling through a web, web crawlers are also known as web spiders. A web crawler regularly visits the web server, gathers vital information from every linked homepage, automates the process of following each link, analyses the content of each online page, and visits each web page in turn to collect its data.
Weighted Page Ranking (WPR) is a metric that measures how well web pages are ranked. It combines web structure and content mining techniques: the importance of a page is determined by web structure mining, while the relevancy of a page is determined by web content mining. Popularity is defined here as the number of pages that point to, or are referred to by, a page; it can be computed from the page's number of in-links and out-links. Relevancy refers to the page's compatibility with the executed query: the more closely a page matches the query, the more relevant it becomes.
To address the issue of bias, this study proposes a new set of ranking criteria based on the PageRank (PR) algorithm, one of the most widely used page ranking algorithms, and presents weighted PageRank algorithms to test the relationship between these various variables. In three separate studies, the Weighted Page Rank (WPR) model was utilised to compare the ranking scores of documents and pages based on one or more user preference criteria. The outcomes revealed that ranking based on a variety of criteria outperformed ranking based on a single criterion, and that some criteria had a greater impact on ranking outcomes than others. The model used the top seven criteria for data acquisition and result retrieval, while disregarding other metrics that could be significant for other needs and requirements. It included an evaluation of relevant related works, identifying the top page ranking factors, with the goal of achieving high application performance and obtaining the maximum mark for each page result.
By utilising the web's structure, the Weighted PageRank (WPR) algorithm delivers vital information about a particular query. A page may be irrelevant to a certain query and nevertheless achieve the highest ranking because of its large number of inbound and outbound links; the pages' relevance to a given query is thus less predictable. This method is primarily based on the number of inbound and outbound links.
To compare the Weighted Page Rank (WPR) with the original PageRank, the query result pages were categorised into four groups based on their relevance to the given query. It works as follows: Very Relevant Pages (VR) are pages that contain crucial information on the queried topic. Relevant Pages (R) are pages that are relevant to a query but do not contain crucial information. Weakly Relevant Pages (WR) are pages that contain the query keywords but insufficient information. Irrelevant Pages (IR) are pages with no relevant content or query keywords.
On the basis of the user's query, both PageRank and Weighted PageRank (WPR) return ranked pages in category order. Users therefore care about the number of relevant pages in the resulting list, as well as their positions. The Relevance Rule was used to determine the relevancy value of each page in the list, and it is in this respect that Weighted Page Rank differs from PageRank.
The Weighted Page Rank algorithm prioritises the weighting of inbound and outbound links, while the standard Page Rank algorithm considers only the number of links.
The suggested technique for detecting malicious web pages makes use of a 30-parameter feature vector. The recommended deep learning model employs the Adam optimizer and a listwise approach to discriminate between fake and legitimate websites.
When the network is vast, ranking nodes in an uncertain graph takes a long time due to the large number of possible worlds.
This paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to measure the relationship between these various factors. The Weighted Page Rank method makes use of the web's structure to transmit crucial information about a query. Even though a page may be irrelevant to a specific query, it can receive the highest priority ranking due to its massive quantity of in-links and out-links, so the pages' relevance to a single query is less consistent. The number of in-links and out-links is an important part of this strategy.
Consumers now have access to a huge amount of information on the internet, spread among several portal sites. Web information, however, can often only be obtained through the user's own registration and effort, and only the title and introduction of an online publication can be used to find current information. We present a method for extracting the title, content, author, writing time, and other data from web documents by automating the crawler, without requiring the user's interaction. Two things must be considered when implementing a theme-driven crawler. First, there is no uniform formal structure for web documents. Second, key website address links are usually found on the home page; consequently, the number of links considered from a web site's main page should be limited when searching for the location of an online document. The crawler proposed in this research makes use of the portal site's random access module to address these concerns and maximise the effectiveness of gathering online documents. The crawler collects website URLs via the module's calls on a regular basis, limits the size of the search queue to the number of links on the main page, and checks links breadth-first. If the website's main page is in frame format, however, more effort is needed: because frames do not contain genuine links, the pages that make up the frame must themselves be examined in order to find a web page with a link.
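To make the strategy concrete, the following is a minimal Python sketch of such a breadth-first, main-page-bounded crawl, assuming the third-party requests and BeautifulSoup libraries; the queue cap, page limit, and metadata fields are illustrative placeholders, not the crawler implemented in this research.

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl_portal(portal_url, max_queue_size=50, max_pages=200):
    """Breadth-first crawl seeded from a portal's main page.

    The search queue is capped at roughly the number of links found on
    the main page, mirroring the bounding strategy described above.
    """
    queue = deque([portal_url])
    visited = set()
    documents = []

    while queue and len(documents) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)

        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            continue  # skip unreachable pages

        soup = BeautifulSoup(response.text, "html.parser")

        # Frame pages carry no genuine links of their own, so enqueue
        # the frame sources and examine those pages instead.
        for frame in soup.find_all(["frame", "iframe"], src=True):
            queue.append(urljoin(url, frame["src"]))

        # Collect metadata from the current page (title only, here).
        documents.append({
            "url": url,
            "title": soup.title.string if soup.title else "",
        })

        # Breadth-first expansion, bounded by the queue cap.
        for anchor in soup.find_all("a", href=True):
            if len(queue) >= max_queue_size:
                break
            queue.append(urljoin(url, anchor["href"]))

    return documents
```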
The proposed weighted page rank (WPR) algorithm contains a set of rules that works on both online and offline web sites and records. It makes use of a variety of retrieval techniques to help the indexing and statistics extraction methods provide more relevant outcomes to users, while addressing most of the issues that plagued previous web page ranking algorithms.
In the weighted page rank approach, more significant (popular) web pages obtain higher rank scores. The number of inbound and outbound links on a page determines its popularity, and each page is assigned a proportional rank score.
The variables used in the Weighted PageRank formula are described below.

Parameter | Description |
---|---|
$W^{in}_{(v,u)}$ | weight of link $(v, u)$, computed from the number of in-links of page $u$ and the number of in-links of all reference pages of page $v$ |
$W^{out}_{(v,u)}$ | weight of link $(v, u)$, computed from the number of out-links of page $u$ and the number of out-links of all reference pages of page $v$ |

These weights are defined as

$$W^{in}_{(v,u)} = \frac{I_u}{\sum_{p \in R(v)} I_p}, \qquad W^{out}_{(v,u)} = \frac{O_u}{\sum_{p \in R(v)} O_p}$$

where $I_u$ and $O_u$ are the numbers of in-links and out-links of page $u$, and $R(v)$ denotes the reference (out-linked) page list of page $v$.
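As an illustration of these definitions, the following minimal Python sketch computes $W^{in}$ and $W^{out}$ for a single link on a toy graph; the graph dictionary and page names are hypothetical.

```python
def in_links(graph):
    """Count in-links for every page in a {page: [out-link pages]} graph.

    Assumes every page appears as a key of the graph dictionary.
    """
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for t in targets:
            counts[t] += 1
    return counts

def link_weights(graph, v, u):
    """W_in and W_out for the link (v, u), following the formulas above.

    Assumes the reference pages of v have at least one in-link and one
    out-link in total, so the denominators are nonzero.
    """
    refs = graph[v]                      # reference page list R(v)
    inc = in_links(graph)
    w_in = inc[u] / sum(inc[p] for p in refs)
    w_out = len(graph[u]) / sum(len(graph[p]) for p in refs)
    return w_in, w_out

# Toy graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(link_weights(graph, "A", "C"))   # (0.666..., 0.5)
```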
Instead of sharing the rank value of a page evenly across its out-link pages, Weighted PageRank assigns larger rank values to more significant pages. Popularity is determined from the number of inbound and outbound connections, and the rank is written as

$$WPR(u) = (1 - d) + d \sum_{v \in B(u)} WPR(v)\, W^{in}_{(v,u)}\, W^{out}_{(v,u)}$$

where $B(u)$ is the set of pages linking to page $u$, the in-link counts of pages $u$ and $v$ enter through the weights defined above, and $d$ is the damping factor, set to a value between 0 and 1.
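The formula above can be evaluated iteratively. The following is a minimal, self-contained Python sketch on a toy graph; the tolerance, iteration cap, and link structure are illustrative choices, not the experimental setup used in this study.

```python
def weighted_pagerank(graph, d=0.85, tol=1e-6, max_iter=100):
    """Iterate WPR(u) = (1 - d) + d * sum over v in B(u) of
    WPR(v) * W_in(v, u) * W_out(v, u) until convergence."""
    pages = list(graph)
    inc = {p: 0 for p in pages}
    backlinks = {p: [] for p in pages}     # B(u): pages linking to u
    for v, targets in graph.items():
        for u in targets:
            inc[u] += 1
            backlinks[u].append(v)

    def w_in(v, u):
        return inc[u] / sum(inc[p] for p in graph[v])

    def w_out(v, u):
        return len(graph[u]) / sum(len(graph[p]) for p in graph[v])

    wpr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {
            u: (1 - d) + d * sum(wpr[v] * w_in(v, u) * w_out(v, u)
                                 for v in backlinks[u])
            for u in pages
        }
        if max(abs(new[p] - wpr[p]) for p in pages) < tol:
            return new
        wpr = new
    return wpr

# Toy graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(weighted_pagerank(graph, d=0.85))
```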
The relevance calculator estimates a page's relevance on the fly based on two factors: one expresses the likelihood that the query will be answered in the page, and the other represents the query's maximum match to the page.

The first factor is computed from the number of query terms that appear in the supplied document. Denoting this count by $t$ and the total number of query terms by $n$, the probability factor is $t/n$.

The maximum number of strings is determined such that each string represents a meaningful word combination in its own right. This yields the Content Weight

$$CW = \frac{S_m}{S}$$

where $S_m$ is the number of meaningful query strings matched in the page and $S$ is the total number of all possible meaningful query strings in order.
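A minimal Python sketch of these two factors follows; the whitespace tokenization and the interpretation of "meaningful strings" as contiguous in-order runs of query terms are simplifying assumptions made for illustration.

```python
def probability_factor(query_terms, doc_text):
    """Fraction of query terms that occur in the document (t / n)."""
    words = set(doc_text.lower().split())
    hits = sum(1 for t in query_terms if t.lower() in words)
    return hits / len(query_terms)

def content_weight(query_terms, doc_text):
    """Matched in-order query strings over all possible ones (S_m / S).

    Meaningful strings are taken here to be contiguous, in-order runs
    of query terms: for "a b c" these are "a", "b", "c", "a b", "b c",
    and "a b c" -- an illustrative reading of the scheme above.
    """
    text = doc_text.lower()
    n = len(query_terms)
    strings = [" ".join(query_terms[i:j]).lower()
               for i in range(n) for j in range(i + 1, n + 1)]
    matched = sum(1 for s in strings if s in text)
    return matched / len(strings)

doc = "weighted pagerank ranks web pages by link popularity"
query = ["weighted", "pagerank", "popularity"]
print(probability_factor(query, doc))  # 1.0
print(content_weight(query, doc))      # 4/6 = 0.666...
```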
The category and position of a page in the page list determine its relevance to a query: the relevancy value increases as the result improves. Given the category and position of each page, the Relevance Rule computes the relevancy $K$ of a result page-list as

$$K = \sum_{i \in R(p)} (n - i) \times W_i$$

where $i$ is the $i$-th page in the result page-list $R(p)$, $n$ is the number of pages chosen from the top of $R(p)$, and $W_i$ is the weight of the $i$-th page according to its category (Very Relevant, Relevant, Weakly Relevant, or Irrelevant).
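As a worked illustration of the Relevance Rule, the following Python sketch scores a hypothetical top-5 result list; the numeric category weights are assumed values, since the rule only requires that the four categories receive distinct weights.

```python
# Illustrative (assumed) weights for the four categories above:
# Very Relevant (VR), Relevant (R), Weakly Relevant (WR), Irrelevant (IR).
CATEGORY_WEIGHTS = {"VR": 4, "R": 3, "WR": 2, "IR": 1}

def relevance(categories):
    """K = sum over the first n pages of (n - i) * W_i,
    where i is the position of the page in the result list."""
    n = len(categories)
    return sum((n - i) * CATEGORY_WEIGHTS[c]
               for i, c in enumerate(categories, start=1))

# A hypothetical top-5 result list returned for some query:
print(relevance(["VR", "R", "WR", "R", "IR"]))  # 16 + 9 + 4 + 3 + 0 = 32
```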
We offer an efficient technique for computing the proposed Weighted Page Rank (WPR) in large networks. The power iteration approach is the most common way to calculate classical PageRank, but power iteration is known to be slow, particularly for large and dense networks. The iteration matrix can be written as $M = dA + (1 - d)B$; in other words, matrix $M$ is non-negative and column stochastic. Here $A$ is the column-stochastic link matrix and $B$ is a rank-one matrix whose entries are all $1/n$, modelling a uniform random jump among the $n$ pages.
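A minimal NumPy sketch of power iteration on this matrix form is shown below; the toy column-stochastic link matrix is illustrative.

```python
import numpy as np

def power_iteration(A, d=0.85, tol=1e-9, max_iter=1000):
    """Power iteration on M = d*A + (1 - d)*B, where A is a
    column-stochastic link matrix and B is the rank-one matrix
    with all entries 1/n (the uniform random-jump term)."""
    n = A.shape[0]
    M = d * A + (1 - d) * np.full((n, n), 1.0 / n)
    r = np.full(n, 1.0 / n)               # start from the uniform vector
    for _ in range(max_iter):
        r_next = M @ r
        if np.linalg.norm(r_next - r, 1) < tol:
            return r_next
        r = r_next
    return r

# Toy column-stochastic link matrix: column j spreads page j's rank
# evenly over the pages it links to.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(power_iteration(A))
```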
To evaluate the Weighted Page Rank method, we compared the results of the Weighted Page Rank (WPR) algorithm with those of the traditional PageRank (PR) algorithm.
Web page | Weighted Page Rank (d = 0.85) | Weighted Page Rank (d = 0.7) | Weighted Page Rank (d = 0.5) |
---|---|---|---|
A | 0.6343 | 0.9149 | 1.0143 |
B | 0.3783 | 0.5580 | 0.6571 |
C | 0.6314 | 0.8885 | 0.9476 |
D | 0.5368 | 0.7603 | 0.9172 |
E | 0.3783 | 0.6580 | 0.7371 |
Weighted Page Rank (WPR) generates higher relevancy values, implying that it outperforms PageRank. The resulting rank scores for different damping factors are shown in the table above.
This work introduces the Weighted Page Rank (WPR) algorithm, an extension of PageRank. Weighted Page Rank assigns rank scores to pages based on their popularity, taking both inbound and outbound links into account. In the current Weighted Page Rank (WPR) model, only the in-links and out-links of the pages within the reference page list are used to determine the rank scores. In three separate studies, the Weighted Page Rank model was utilised to compare ranking scores of documents and pages based on one or more user preference criteria. According to the Weighted Page Rank model, ranking results based on many criteria are preferable to ranking results based on a single criterion, and certain criteria have a bigger impact on ranking outcomes than others.
The author, with a deep sense of gratitude, would like to thank the supervisor for his guidance and constant support rendered during this research.